MLPC Midterm
MODULE NAME : Machine Learning and Parallel Computing
MODULE CODE : ITS66604
DUE DATE : Week 7 (Midterm week)
PLATFORM : MyTIMES
Project Title:
STUDENT DECLARATION
1. I confirm that I am aware of the University’s Regulation Governing Cheating in a University Test and
Assignment and of the guidance issued by the School of Computing and IT concerning plagiarism and proper
academic practice, and that the assessed work now submitted is in accordance with this regulation and guidance.
2. I understand that, unless already agreed with the School of Computing and IT, assessed work may not be
submitted that has previously been submitted, either in whole or in part, at this or any other institution.
3. I recognise that should evidence emerge that my work fails to comply with either of the above declarations,
then I may be liable to proceedings under Regulation
Important Notes:
Note 1: Copying, cheating, attempts to cheat, plagiarism, collusion and any other
attempts to gain an unfair advantage in assessment will result in 0 marks being awarded
to all parties concerned.
Note 2: The Turnitin similarity limit for this module is 20% overall and less than 1% from
a single source, excluding program source code.
Note 3: All submitted documents will be cross-checked against other students' reports from
the current and previous semesters. Any similarity beyond the limits stated in Note 2
will therefore be considered a violation of the assessment rules, and a mark of zero (0)
will be given to all group members.
Note 4: Severe disciplinary action will be taken against those caught violating assessment
rules such as colluding, plagiarizing or transcribing.
Modern lifestyle is a contributing factor in various diseases such as heart disease, obesity,
type 2 diabetes, stroke, and hypertension. You have been given a diabetes dataset with
various factors that influence the condition. Develop a machine learning model that can
predict whether a patient has diabetes or not, based on factors such as age, BMI, glucose
level, blood pressure, insulin level, and family history.
Notes:
• You need to write descriptive answers to the questions under each task, supported by
a proper program written and executed in Python.
• Make sure you add screenshots of the code and results.
• The screenshots must show the date and time of your computer as evidence.
• The program code must be added to the main report.
• The original program files (*.py) are required to be attached to the report upon
submission.
Each test report should entail the application of one or more machine learning methods to a
dataset. Use of third-party code is strictly prohibited.
Section 1: Problem Understanding and Objectives
(10 marks)
Problem Definition
(5 marks)
Clearly define the problem. What is the primary objective of the project? Describe why
predicting diabetes is important and how it can impact patient care.
The scenario focuses on diabetes, a chronic condition. The task is to develop a machine
learning model that detects whether or not a patient has diabetes using the given factors:
BMI, pregnancies, skin thickness, glucose level, blood pressure, insulin level, Diabetes
Pedigree Function, and age. The primary objective is therefore a predictive model that
classifies a patient as diabetic or non-diabetic based on these factors.
Such a model can also affect patient care in several ways, including:
1. Easy screening: The model offers a simple way to screen patients who may not yet
show clear symptoms.
2. Educational purpose: The model can serve as an educational tool that helps patients
and medical students understand the level of risk.
Success Metrics
(5 marks)
Identify and justify the evaluation metrics to be used (e.g., accuracy, precision, recall,
F1-score, ROC-AUC). Explain why these metrics are appropriate for the problem.
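To make the relationship between these metrics concrete, the following is a minimal sketch that derives accuracy, precision, recall and F1-score from the four confusion-matrix counts. The function name and the label lists are hypothetical illustrations, not values from the provided dataset. ROC-AUC is left out because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, a false negative (a diabetic patient predicted as healthy) is usually the costlier error, which is one reason recall is often reported alongside accuracy.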
Section 2: Data Understanding and Preprocessing
(20 marks)
Dataset Description
(5 marks)
Describe the dataset, including the number of rows, columns, features, and source. Mention
any specific characteristics of the data, such as the types of features (categorical,
numerical).
Data Quality Analysis
(5 marks)
Identify and discuss any missing values, outliers, or inconsistencies in the dataset. Provide
examples and quantify these issues.
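One possible way to quantify such issues is sketched below using only the Python standard library. It assumes that a zero in a physiological measurement (for example glucose) is a placeholder for a missing value, and it flags outliers with the common 1.5 x IQR rule. Both choices are assumptions to be checked against the actual dataset, and the sample values are hypothetical.

```python
import statistics


def count_zeros(values):
    """Count entries equal to zero (assumed here to encode missing values)."""
    return sum(1 for v in values if v == 0)


def iqr_outliers(values):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical glucose readings, for illustration only.
glucose = [0, 85, 90, 95, 100, 105, 110, 115, 120, 300]
print("zeros:", count_zeros(glucose))
print("outliers:", iqr_outliers(glucose))
```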
Data Pre-processing
(10 marks)
Explain the steps taken for data cleaning, handling missing values, dealing with outliers, and
feature engineering. Describe how categorical variables are handled and if the data is
normalized or standardized.
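As an illustration of two of the steps named above, the sketch below replaces zero placeholders with the median of the non-zero values and then standardizes the column to zero mean and unit variance. Treating zeros as missing and choosing median imputation are assumptions for this example, and the BMI values are hypothetical.

```python
import statistics


def impute_zeros_with_median(values):
    """Replace zeros (assumed missing) with the median of the non-zero values."""
    non_zero = [v for v in values if v != 0]
    median = statistics.median(non_zero)
    return [median if v == 0 else v for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]


# Hypothetical BMI column, for illustration only.
bmi = [0, 22.0, 30.0, 26.0]
bmi_imputed = impute_zeros_with_median(bmi)
bmi_scaled = standardize(bmi_imputed)
print(bmi_imputed)
print(bmi_scaled)
```

To avoid data leakage, the median, mean and standard deviation would normally be computed on the training set only and then reused for the validation and test sets.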
Section 3: Exploratory Data Analysis (EDA) (20 marks)
Data Visualization
(10 marks)
Provide visualizations that show the relationships between features and the target variable.
Include at least three types of plots (e.g., histograms, scatter plots, correlation
heatmaps). Explain the insights gained from these visualizations.
Statistical Analysis
(10 marks)
Perform and explain statistical analyses to understand the data distribution and correlations
between features. Include summary statistics, correlation coefficients, and any other
relevant analyses.
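A correlation coefficient between a feature and the target can be computed as in the following minimal sketch, which implements the Pearson coefficient by hand. The glucose and outcome values are hypothetical, and with a binary outcome the result is the point-biserial correlation.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical feature and target values, for illustration only.
glucose = [85, 90, 110, 150, 160, 180]
outcome = [0, 0, 0, 1, 1, 1]
r = pearson(glucose, outcome)
print("mean glucose:", statistics.fmean(glucose))
print("correlation with outcome:", round(r, 3))
```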
Section 4: Model Development (25 marks)
Algorithm Selection
(5 marks)
Justify the selection of machine learning algorithms considered for the task (e.g., Logistic
Regression, Decision Trees, Random Forest, SVM, Neural Networks). Discuss the pros and
cons of each algorithm.
Data Splitting
(5 marks)
Describe how the data was split into training, validation, and test sets. Explain the rationale
behind the chosen split ratio.
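A stratified split keeps the proportion of diabetic and non-diabetic patients roughly equal across the three sets. The sketch below shows one way to do this with the standard library. The 60/20/20 ratio, the seed and the label list are hypothetical choices for illustration, not requirements of this test.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical imbalanced labels: 60 non-diabetic, 40 diabetic.
labels = [0] * 60 + [1] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```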
Cross-Validation
(5 marks)
Describe the cross-validation technique used (e.g., K-Fold, Stratified K-Fold) and justify its
choice. Explain how cross-validation was implemented.
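For reference, the idea behind Stratified K-Fold can be sketched as follows: the indices of each class are shuffled and dealt round-robin into k folds, and each fold serves once as the validation set. This is an illustrative standard-library version with hypothetical labels and k = 5, not a prescribed implementation.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_indices, val_indices) pairs with class balance kept per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


# Hypothetical labels: 60 non-diabetic, 40 diabetic.
labels = [0] * 60 + [1] * 40
splits = list(stratified_k_fold(labels, k=5))
for fold_number, (train_idx, val_idx) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in val_idx)
    print(f"fold {fold_number}: {len(val_idx)} validation rows, {positives} diabetic")
```

A model would be trained on `train_idx` and scored on `val_idx` in each iteration, and the k scores averaged to estimate performance.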
Performance Metrics
(10 marks)
Present the performance metrics for the model on the validation and test sets. Provide a
detailed interpretation of these metrics. Include confusion matrices, precision, recall, F1-
score, ROC-AUC, and any other relevant metrics.
Model Interpretation
(5 marks)
Use techniques like SHAP, LIME, or feature importance to interpret the model. Discuss the
importance of the features in the model's predictions.
The submitted report should be 15-20 pages long, excluding appendices. Please include all
the specific requirements and details in your report.
• Students are not allowed to transcribe directly (cut and paste) any material from
another source into their submission.
• Include in-text citations to support your answers and add the list of references at the
end of your report (APA format). The list of references is to be alphabetized by the
first author's last name, or (if no author is listed) the organization or title.
• The Turnitin similarity limit for this module is 20% overall and less than 1% from a
single source, excluding program source code.
• The use of AI tools is discouraged. Students whose AI-generated content exceeds
30% of the assignment will be invited for a discussion session with the lecturer, and
their work will be graded accordingly.
• If the assignment is returned to students due to format non-compliance, they will be
required to revise it. At that juncture, the maximum achievable marks for the
assignment will be limited to 50%.
• Students are only required to submit a single PDF document. That document should
contain a link to a GitHub repository which contains the .ipynb file. You are not
required to submit a separate .py or .ipynb file in the submission portal.
• Students will be given 48 hours to complete the midterm test. The test will be
uploaded to MyTIMES and information regarding the start time will be provided
beforehand.
Submission Requirements:
| Question | Marks | Excellent | Good | Average | Poor |
|---|---|---|---|---|---|
| Problem Definition | 5 | Thoroughly defined with clear objectives, relevance, and potential impact. | Well-defined with clear objectives and relevance. | Defined but lacks clarity in objectives and relevance. | Poorly defined or unclear objectives and relevance. |
| Success Metrics | 5 | Metrics are well justified, appropriate, and explained thoroughly. | Metrics are appropriate and justified but with less detail. | Metrics are somewhat justified and appropriate. | Metrics are not justified or inappropriate. |
| Dataset Description | 5 | Comprehensive and clear description of dataset characteristics. | Clear description with minor details missing. | Partial description with some key details missing. | Poor or incomplete description. |
| Data Quality Analysis | 5 | Thorough analysis with all issues identified and quantified. | Good analysis with most issues identified. | Basic analysis with some issues identified. | Poor analysis with few or no issues identified. |
| Data Preprocessing | 10 | Thorough and effective preprocessing steps, including all required components. | Good preprocessing steps with minor issues or missing components. | Basic preprocessing with several issues or missing components. | Poor preprocessing with major issues or missing most components. |
| Data Visualization | 10 | Excellent and insightful visualizations with clear explanations. | Good visualizations with clear explanations, but minor improvements needed. | Basic visualizations with explanations, several improvements needed. | Poor or missing visualizations with unclear or missing explanations. |
| Statistical Analysis | 10 | Thorough and insightful statistical analysis with clear explanations. | Good analysis with some insights and clear explanations. | Basic analysis with explanations, several insights missing. | Poor or missing statistical analysis with unclear or missing explanations. |
| Algorithm Selection | 5 | Excellent justification and selection, with thorough discussion of pros and cons. | Good justification and selection with clear pros and cons. | Basic justification and selection with some pros and cons. | Poor or missing justification and selection with unclear or missing pros and cons. |
| Data Splitting | 5 | Well-executed and justified data splitting with clear rationale. | Good data splitting with clear rationale but minor issues. | Basic data splitting with some rationale, several issues. | Poor or missing data splitting with unclear or missing rationale. |
| Model Training and Hyperparameter Tuning | 10 | Thorough and effective training and tuning, with detailed explanations. | Good training and tuning with clear explanations but minor issues. | Basic training and tuning with explanations, several issues. | Poor or missing training and tuning with unclear or missing explanations. |
| Cross-Validation | 5 | Excellent use and justification of technique with clear implementation. | Good use and justification of technique with minor issues in implementation. | Basic use and justification of technique with several issues in implementation. | Poor or missing use and justification with unclear or missing implementation. |
| Performance Metrics | 10 | Excellent presentation and interpretation of metrics with detailed explanations. | Good presentation and interpretation of metrics with clear explanations but minor issues. | Basic presentation and interpretation of metrics with explanations, several issues. | Poor or missing presentation and interpretation with unclear or missing explanations. |
| Model Interpretation | 5 | Excellent interpretation and discussion with clear explanations. | Good interpretation and discussion with clear explanations but minor issues. | Basic interpretation and discussion with explanations, several issues. | Poor or missing interpretation and discussion with unclear or missing explanations. |
| Compliance with Formatting Guidelines | 10 | Submission adheres perfectly to all formatting guidelines specified (font type, size, line spacing, alignment). | Submission mostly adheres to formatting guidelines, with minor inconsistencies. | Some aspects of formatting guidelines are followed, but there are notable inconsistencies. | Formatting guidelines are largely ignored or not followed. |