MLPC Midterm

Uploaded by

Aarohan Subedi

SCHOOL OF COMPUTER SCIENCE

MIDTERM TEST (Weightage 10%)


MAY 2024 SEMESTER

MODULE NAME : Machine Learning and Parallel Computing
MODULE CODE : ITS66604
DUE DATE : Week 7 (Midterm week)
PLATFORM : MyTIMES

This paper consists of Eight (8) pages, inclusive of this page.

Project Title:
STUDENT DECLARATION
1. I confirm that I am aware of the University’s Regulation Governing Cheating in a University Test and
Assignment and of the guidance issued by the School of Computing and IT concerning plagiarism and proper
academic practice, and that the assessed work now submitted is in accordance with this regulation and guidance.
2. I understand that, unless already agreed with the School of Computing and IT, assessed work may not be
submitted that has previously been submitted, either in whole or in part, at this or any other institution.
3. I recognise that should evidence emerge that my work fails to comply with either of the above declarations,
then I may be liable to proceedings under Regulation

No Student Name Student ID Date Signature Score


Osin Dhamala 0364812 1st July, 2024

Page 1 of 18
Important Notes:

Note 1: Copying, cheating, attempts to cheat, plagiarism, collusion and any other
attempts to gain an unfair advantage in assessment will result in 0 marks being awarded to
all parties concerned.
Note 2: The Turnitin similarity for this module is 20% overall and less than 1% from a
single source, excluding program source code.
Note 3: All the submitted documents will be cross-checked with other students’ reports in
the current and previous semesters. Therefore, any similarities other than those covered by
Note 2 will be considered a violation of the assessment rules, and a Zero (0)
mark will be given to all group members.
Note 4: Severe disciplinary action will be taken against those caught violating assessment
rules such as colluding, plagiarizing or transcribing.

Motivation of Purpose Learning Assignment

Module Learning Outcome: On completion of this assessment, students should be able to do
the following:

MLO2: To design and develop machine learning algorithms to solve a problem.

Case study for diabetes using machine learning:

Modern lifestyle is a contributing factor in various diseases such as Heart Disease, Obesity,
Type 2 Diabetes, Stroke, Hypertension, etc. You have been given a dataset on diabetes with
various factors that affect it. Develop a machine learning model that can predict whether a
patient has diabetes or not, based on various factors such as age, BMI, glucose level,
blood pressure, insulin level, and family history.

Notes:

• You need to write descriptive answers to the questions under each task and also use
a proper program written in Python and execute the code.
• Make sure you add screenshots of the code and results.
• The screenshots must show the date and time of your computer as evidence.
• The program code must be added to the main report
• The original program files (*.py) are required to be attached to the report upon
submission.

Each test report should entail the application of one or more machine learning methods to a
dataset. Use of third party code is strictly prohibited.
Section 1: Problem Understanding and Objectives
(10 marks)

Problem Definition
(5 marks)

Clearly define the problem. What is the primary objective of the project? Describe why
predicting diabetes is important and how it can impact patient care.

The scenario focuses on diabetes, a chronic condition. A machine learning model must be
developed that detects whether or not a patient has diabetes from the given factors: BMI,
pregnancies, skin thickness, glucose level, blood pressure, insulin level, Diabetes Pedigree
Function and age. The aim is a predictive model that classifies each patient as diabetic or
non-diabetic based on these factors.

The primary objectives of this project are as follows:

1. Prediction: Based on the patient’s data, the model predicts whether or not the patient
has diabetes.
2. Clinical Support: The model works as a clinical tool that supports healthcare
professionals in making informed decisions.

Predicting diabetes is important for the following reasons:

1. Patient awareness: The predictive model can raise awareness of the risk factors and
encourage regular monitoring. Patients who are aware of their risk are in turn
encouraged to use the predictive model.
2. Early Detection: Predicting diabetes helps identify the disease at an early stage,
which can reduce the risk of severe complications.

The model also has an impact on patient care, as described below:
1. Easy screening: The model provides an easy way to screen patients who may not yet
show enough symptoms.
2. Educational purpose: It can serve as an educational tool that helps patients and
medical students understand the level of risk.

Success Metrics
(5 marks)

Identify and justify the evaluation metrics to be used (e.g., accuracy, precision, recall,
F1-score, ROC-AUC). Explain why these metrics are appropriate for the problem.
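
As an illustration of how the threshold-based metrics named above relate to one another, the following minimal sketch computes accuracy, precision, recall and F1-score from binary labels. It uses only the Python standard library; the function name `classification_metrics` and the label lists are hypothetical and invented purely for this example (1 is assumed to mean "diabetic").

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, not taken from the assignment dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, recall is often weighted heavily because a false negative corresponds to a missed diabetic patient, whereas precision reflects how many flagged patients are actually diabetic.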

Section 2: Data Understanding and Preprocessing
(20 marks)

Dataset Description
(5 marks)

Describe the dataset, including the number of rows, columns, features, and source. Mention
any specific characteristics of the data, such as the types of features (categorical,
numerical).

Data Quality Analysis
(5 marks)

Identify and discuss any missing values, outliers, or inconsistencies in the dataset. Provide
examples and quantify these issues.
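
One possible way to quantify such issues for a single numeric column is sketched below, using only the standard library. It assumes that missing entries are represented as `None` and flags outliers with the 1.5 × IQR rule; the helper names and the glucose readings are hypothetical, not taken from the assignment dataset.

```python
import statistics


def count_missing(values):
    """Number of missing entries, assuming missing values are stored as None."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values, k=1.5):
    """Return the non-missing values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < lower or v > upper]


# Hypothetical glucose readings with two gaps and one implausible value.
glucose = [85, 90, None, 100, 95, 110, 300, 105, None, 98]

missing = count_missing(glucose)
outliers = iqr_outliers(glucose)
print(f"missing: {missing} of {len(glucose)} ({100 * missing / len(glucose):.0f}%)")
print(f"IQR outliers: {outliers}")
```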

Data Pre-processing
(10 marks)

Explain the steps taken for data cleaning, handling missing values, dealing with outliers, and
feature engineering. Describe how categorical variables are handled and if the data is
normalized or standardized.

Include the following:

• Missing value handling (e.g., mean imputation, median imputation).


(2 marks)

• Outlier treatment (e.g., IQR method, Z-score method).


(2 marks)

• Encoding categorical variables (e.g., one-hot encoding, label encoding).


(3 marks)

• Normalization or Standardization (e.g., Min-Max Scaling)


(3 marks)
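
The numeric steps listed above can be sketched as a small pipeline: median imputation, capping outliers at the IQR fences, then Min-Max scaling. This is a minimal standard-library illustration under stated assumptions, not a prescribed solution: missing values are assumed to be stored as `None`, capping is only one of several possible outlier treatments, and the function names and BMI values are hypothetical. Categorical encoding is left out because the factors named in this paper are all numeric.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Cap values at the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical BMI column with one gap and one extreme value.
bmi = [22.0, 25.0, None, 27.0, 24.0, 26.0, 60.0, 23.0]

imputed = impute_median(bmi)
clipped = clip_iqr(imputed)
scaled = min_max_scale(clipped)
print(scaled)
```

When a train/test split is used, the median, fences and min/max would normally be computed on the training portion only and then applied to the other portions, so that no information leaks from the test data.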


Section 3: Exploratory Data Analysis (EDA) (20 marks)

Data Visualization
(10 marks)

Provide visualizations that show the relationships between features and the target variable.
Include at least three types of plots (e.g., histograms, scatter plots, correlation heat
maps). Explain the insights gained from these visualizations.

Statistical Analysis
(10 marks)

Perform and explain statistical analyses to understand the data distribution and correlations
between features. Include summary statistics, correlation coefficients, and any other
relevant analyses.
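
As a minimal illustration of summary statistics and a correlation coefficient, the sketch below computes the mean, median and sample standard deviation of one feature and its Pearson correlation with the binary outcome. The `pearson` helper and the six data points are hypothetical and serve only to show the calculation.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical glucose values and outcomes (1 = diabetic).
glucose = [85, 168, 90, 150, 110, 180]
outcome = [0, 1, 0, 1, 0, 1]

summary = {
    "mean": statistics.fmean(glucose),
    "median": statistics.median(glucose),
    "stdev": statistics.stdev(glucose),  # sample standard deviation
}
r = pearson(glucose, outcome)
print(summary)
print(f"correlation(glucose, outcome) = {r:.3f}")
```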

Section 4: Model Development (25 marks)

Algorithm Selection
(5 marks)

Justify the selection of machine learning algorithms considered for the task (e.g., Logistic
Regression, Decision Trees, Random Forest, SVM, Neural Networks). Discuss the pros and
cons of each algorithm.

Data Splitting
(5 marks)

Describe how the data was split into training, validation, and test sets. Explain the rationale
behind the chosen split ratio.
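
A stratified split keeps the proportion of diabetic and non-diabetic patients roughly equal across the three sets. The sketch below shows one way to do this with the standard library; the 70/15/15 ratio, the seed, the function name and the synthetic labels are all assumptions chosen for illustration, not values specified by this paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical labels: 60 non-diabetic (0) and 40 diabetic (1) patients.
labels = [0] * 60 + [1] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```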

Model Training and Hyper Parameter Tuning


(10 marks)
Explain the process of training the model. Discuss the techniques used for hyper parameter
tuning (e.g., Grid Search, Random Search). Provide details on the parameters tuned and the
chosen values.
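
Grid Search evaluates every combination in a predefined parameter grid and keeps the best-scoring one. The sketch below illustrates that loop only: the "model" is a deliberately simple two-threshold rule standing in for a real algorithm, and its parameters (`glucose_cut`, `bmi_cut`), the grid values and the validation rows are hypothetical.

```python
from itertools import product


def rule_predict(row, glucose_cut, bmi_cut):
    """Toy stand-in model: predict diabetic (1) when both cut-offs are reached."""
    glucose, bmi = row
    return 1 if glucose >= glucose_cut and bmi >= bmi_cut else 0


def accuracy(rows, labels, **params):
    """Fraction of rows the toy rule classifies correctly."""
    hits = sum(rule_predict(row, **params) == label for row, label in zip(rows, labels))
    return hits / len(labels)


def grid_search(rows, labels, grid):
    """Try every parameter combination; keep the first one with the highest score."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = accuracy(rows, labels, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical validation rows of (glucose, BMI) with outcomes.
val_rows = [(85, 22.0), (168, 33.0), (90, 35.0), (150, 31.0), (110, 24.0), (180, 29.0)]
val_labels = [0, 1, 0, 1, 0, 1]
grid = {"glucose_cut": [175, 140, 100], "bmi_cut": [25.0, 30.0]}

best_params, best_score = grid_search(val_rows, val_labels, grid)
print(best_params, best_score)
```

Random Search follows the same pattern but samples a fixed number of combinations instead of enumerating all of them, which is cheaper when the grid is large.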

Cross-Validation
(5 marks)

Describe the cross-validation technique used (e.g., K-Fold, Stratified K-Fold) and justify its
choice. Explain how cross-validation was implemented.
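
To show what Stratified K-Fold does, the following minimal sketch deals the indices of each class round-robin into k folds, so every fold has about the same class balance, and then yields each fold once as the held-out set. It does not shuffle, the function name is hypothetical, and the labels are synthetic.

```python
from collections import defaultdict


def stratified_k_fold(labels, k=5):
    """Yield (train_indices, test_indices) pairs with class balance kept per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's indices round-robin across the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


# Hypothetical labels: 20 non-diabetic (0) and 10 diabetic (1) patients.
labels = [0] * 20 + [1] * 10
splits = list(stratified_k_fold(labels, k=5))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(f"train={len(train_idx)} test={len(test_idx)} positives_in_test={positives}")
```

Stratification matters most when the positive class is the minority, because a plain K-Fold split can leave some folds with very few diabetic cases.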

Section 5: Model Evaluation (10 marks)

Performance Metrics
(10 marks)

Present the performance metrics for the model on the validation and test sets. Provide a
detailed interpretation of these metrics. Include confusion matrices, precision, recall, F1-
score, ROC-AUC, and any other relevant metrics.
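
ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the four scores are hypothetical, and the pairwise loop is meant for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and predicted probabilities of diabetes.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```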

Section 6: Model Interpretation


(5 marks)

Model Interpretation
(5 marks)

Use techniques like SHAP, LIME, or feature importance to interpret the model. Discuss the
importance of the features in the model's predictions.

Section 7: Submission Requirements


(10 marks)

The submission report should be 15-20 pages long, excluding appendices. Please include all the
specific requirements and details in your report.

The following elements must be included in the slides submission:

1. Font type: Times New Roman
2. Font size: 12
3. Line spacing: 1.5
4. Alignment: Justify Text
5. Document type: .pdf
6. Number of pages: 15 – 20 pages (do not exceed the page limit)
7. Your full report should consist of the following:
a) Cover page (Name, ID, Date, Signature, Score)
b) Marking Rubrics (attach as second page in the report)
Appendixes (line spacing = 1.0)
• List of references (APA format)
• Report of similarity score (percentage of similarity score from each source
needs to be shown).

8. All figures and tables are labelled properly.

9. File naming conventions: UniversityID_FirstnameLastname_MidTermTest.pdf


Notes:

• Students are not allowed to transcribe directly (cut and paste) any material from another
source into their submission.

• Include in-text citation to support your answers and add the list of references at the
end of your report (APA format). The list of references is to be alphabetized by the
first author's last name, or (if no author is listed) the organization or title.

• The Turnitin similarity for this module is 20% overall and less than 1% from a
single source, excluding program source code.
• Utilization of AI tools is discouraged. Students whose AI-generated content exceeds
30% of the assignment shall be invited for a discussion session with the lecturer, and
their work will be graded accordingly.

• If the assignment is returned to students due to format non-compliance, they will be
required to revise it. At that juncture, the maximum achievable marks for the
assignment will be limited to 50%.
• Students are only required to submit a single PDF document. That document should
contain a link to a GitHub repository which contains the .ipynb file. You are not
required to submit a separate .py or .ipynb file in the submission portal.
• Students will be provided 48 hours to complete the midterm test. The test will be
uploaded to myTimes and information regarding the start time will be provided
beforehand.

Submission Requirements:

Question (Marks), with descriptors for Excellent, Good, Average and Poor

Problem Definition (5 marks)
Excellent: Thoroughly defined with clear objectives, relevance, and potential impact.
Good: Well-defined with clear objectives and relevance.
Average: Defined but lacks clarity in objectives and relevance.
Poor: Poorly defined or unclear objectives and relevance.

Success Metrics (5 marks)
Excellent: Metrics are well justified, appropriate, and explained thoroughly.
Good: Metrics are appropriate and justified but with less detail.
Average: Metrics are somewhat justified and appropriate.
Poor: Metrics are not justified or inappropriate.

Dataset Description (5 marks)
Excellent: Comprehensive and clear description of dataset characteristics.
Good: Clear description with minor details missing.
Average: Partial description with some key details missing.
Poor: Poor or incomplete description.

Data Quality Analysis (5 marks)
Excellent: Thorough analysis with all issues identified and quantified.
Good: Good analysis with most issues identified.
Average: Basic analysis with some issues identified.
Poor: Poor analysis with few or no issues identified.

Data Preprocessing (10 marks)
Excellent: Thorough and effective preprocessing steps, including all required components.
Good: Good preprocessing steps with minor issues or missing components.
Average: Basic preprocessing with several issues or missing components.
Poor: Poor preprocessing with major issues or missing most components.

Data Visualization (10 marks)
Excellent: Excellent and insightful visualizations with clear explanations.
Good: Good visualizations with clear explanations but minor improvements needed.
Average: Basic visualizations with explanations, several improvements needed.
Poor: Poor or missing visualizations with unclear or missing explanations.

Statistical Analysis (10 marks)
Excellent: Thorough and insightful statistical analysis with clear explanations.
Good: Good analysis with some insights and clear explanations.
Average: Basic analysis with explanations, several insights missing.
Poor: Poor or missing statistical analysis with unclear or missing explanations.

Algorithm Selection (5 marks)
Excellent: Excellent justification and selection, with thorough discussion of pros and cons.
Good: Good justification and selection with clear pros and cons.
Average: Basic justification and selection with some pros and cons.
Poor: Poor or missing justification and selection with unclear or missing pros and cons.

Data Splitting (5 marks)
Excellent: Well-executed and justified data splitting with clear rationale.
Good: Good data splitting with clear rationale but minor issues.
Average: Basic data splitting with some rationale, several issues.
Poor: Poor or missing data splitting with unclear or missing rationale.

Model Training and Hyperparameter Tuning (10 marks)
Excellent: Thorough and effective training and tuning, with detailed explanations.
Good: Good training and tuning with clear explanations but minor issues.
Average: Basic training and tuning with explanations, several issues.
Poor: Poor or missing training and tuning with unclear or missing explanations.

Cross-Validation (5 marks)
Excellent: Excellent use and justification of technique with clear implementation.
Good: Good use and justification of technique with minor issues in implementation.
Average: Basic use and justification of technique with several issues in implementation.
Poor: Poor or missing use and justification with unclear or missing implementation.

Performance Metrics (10 marks)
Excellent: Excellent presentation and interpretation of metrics with detailed explanations.
Good: Good presentation and interpretation of metrics with clear explanations but minor issues.
Average: Basic presentation and interpretation of metrics with explanations, several issues.
Poor: Poor or missing presentation and interpretation with unclear or missing explanations.

Model Interpretation (5 marks)
Excellent: Excellent interpretation and discussion with clear explanations.
Good: Good interpretation and discussion with clear explanations but minor issues.
Average: Basic interpretation and discussion with explanations, several issues.
Poor: Poor or missing interpretation and discussion with unclear or missing explanations.

Compliance with Formatting Guidelines (10 marks)
Excellent: Submission adheres perfectly to all formatting guidelines specified (font type, size, line spacing, alignment).
Good: Submission mostly adheres to formatting guidelines, with minor inconsistencies.
Average: Some aspects of formatting guidelines are followed, but there are notable inconsistencies.
Poor: Formatting guidelines are largely ignored or not followed.

END OF MIDTERM TEST
