MLPC Midterm

Uploaded by

Aarohan Subedi

SCHOOL OF COMPUTER SCIENCE

MIDTERM TEST (Weightage 10%)

MAY 2024 SEMESTER

MODULE NAME : Machine Learning and Parallel Computing
MODULE CODE : ITS66604
DUE DATE : Week 7 (Midterm week)
PLATFORM : MyTIMES

This paper consists of Eight (8) pages, inclusive of this page.

Project Title:
STUDENT DECLARATION
1. I confirm that I am aware of the University’s Regulation Governing Cheating in a University Test and
Assignment and of the guidance issued by the School of Computing and IT concerning plagiarism and proper
academic practice, and that the assessed work now submitted is in accordance with this regulation and guidance.
2. I understand that, unless already agreed with the School of Computing and IT, assessed work may not be
submitted that has previously been submitted, either in whole or in part, at this or any other institution.
3. I recognise that should evidence emerge that my work fails to comply with either of the above declarations,
then I may be liable to proceedings under Regulation

No Student Name Student ID Date Signature Score

Osin Dhamala 0364812 1st July, 2024

Important Notes:

Note 1: Copying, cheating, attempts to cheat, plagiarism, collusion and any other
attempts to gain an unfair advantage in assessment will result in 0 marks being awarded
to all parties concerned.
Note 2: The Turnitin similarity limit for this module is 20% overall and less than 1% from a
single source, excluding program source code.
Note 3: All submitted documents will be cross-checked against other students' reports from
the current and previous semesters. Any similarity beyond the limits stated in Note 2
will be considered a violation of the assessment rules, and a Zero (0) mark will be given
to all group members.
Note 4: Severe disciplinary action will be taken against those caught violating assessment
rules such as colluding, plagiarizing or transcribing.

Motivation of Purpose Learning Assignment

Module Learning Outcome: On completion of this assessment, students should be able to do
the following:

MLO2: To design and develop machine learning algorithms to solve a problem.

Case study for diabetes using machine learning:

Modern lifestyles contribute to various diseases such as heart disease, obesity, type 2
diabetes, stroke and hypertension. You have been given a diabetes dataset with various
factors that affect the condition. Develop a machine learning model that can predict
whether a patient has diabetes or not, based on factors such as age, BMI, glucose level,
blood pressure, insulin level, and family history.

Notes:

• Write descriptive answers to the questions under each task, supported by a working
program written in Python that you have executed.
• Add screenshots of the code and the results.
• Each screenshot must show the date and time of your computer as evidence.
• The program code must be added to the main report
• The original program files (*.py) are required to be attached to the report upon
submission.

Each test report should entail the application of one or more machine learning methods to a
dataset. Use of third party code is strictly prohibited.
Section 1: Problem Understanding and Objectives
(10 marks)

Problem Definition
(5 marks)

Clearly define the problem. What is the primary objective of the project? Describe why
predicting diabetes is important and how it can impact patient care.

The scenario concerns a chronic condition, diabetes. A machine learning model must be
developed that can detect whether or not a patient has diabetes from the given factors:
BMI, pregnancies, skin thickness, glucose level, blood pressure, insulin level, Diabetes
Pedigree Function and age. The aim is a predictive model that classifies a patient as
diabetic or non-diabetic based on these factors.

The primary objectives of this project are as follows:

1. Prediction: Based on the patient’s data, the model predicts whether or not the patient
has diabetes.
2. Clinical Support: The model serves as a clinical tool that supports healthcare
professionals in making informed decisions.

Predicting diabetes is important for the following reasons:

1. Patient awareness: The predictive model can raise awareness of the risk factors and
encourage regular monitoring, so that patients stay informed about their own risk.
2. Early Detection: The model helps identify diabetes at an early stage, which can
reduce the risk of severe complications.

The model also affects patient care in the following ways:
1. Easy screening: The model provides an easy way to screen patients who may not yet
show clear symptoms.
2. Educational purpose: It can serve as an educational tool that helps patients and medical
students understand the level of risk.

Success Metrics
(5 marks)

Identify and justify the evaluation metrics to be used (e.g., accuracy, precision, recall,
F1-score, ROC-AUC). Explain why these metrics are appropriate for the problem.
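
As a reminder of how the threshold-based metrics named above relate to one another, the following is a minimal sketch that derives them from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the assignment dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the patients predicted diabetic, how many really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly diabetic patients, how many the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
print(metrics)
```

In a screening setting a missed diabetic patient (false negative) is usually costlier than a false alarm, which is one common argument for reporting recall and F1 alongside accuracy.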

Section 2: Data Understanding and Preprocessing
(20 marks)

Dataset Description
(5 marks)

Describe the dataset, including the number of rows, columns, features, and source. Mention
any specific characteristics of the data, such as the types of features (categorical,
numerical).

Data Quality Analysis
(5 marks)

Identify and discuss any missing values, outliers, or inconsistencies in the dataset. Provide
examples and quantify these issues.
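
One way to quantify these issues is sketched below, assuming pandas is available. The toy DataFrame, its column names and the helper `iqr_outlier_mask` are hypothetical stand-ins for the provided dataset; the 1.5 × IQR fence is a common convention, not a requirement of this paper.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; column names and values are hypothetical.
df = pd.DataFrame({
    "Glucose": [85.0, 90.0, np.nan, 110.0, 95.0, 300.0],
    "BMI": [22.5, 30.1, 27.3, np.nan, 25.0, 26.4],
})

# Number of missing values per column.
missing_counts = df.isna().sum()


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Number of IQR outliers per column.
outlier_counts = df.apply(lambda col: int(iqr_outlier_mask(col).sum()))

print(missing_counts)
print(outlier_counts)
```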

Data Pre-processing
(10 marks)

Explain the steps taken for data cleaning, handling missing values, dealing with outliers, and
feature engineering. Describe how categorical variables are handled and if the data is
normalized or standardized.

Include the following:

• Missing value handling (e.g., mean imputation, median imputation). (2 marks)

• Outlier treatment (e.g., IQR method, Z-score method). (2 marks)

• Encoding categorical variables (e.g., one-hot encoding, label encoding). (3 marks)

• Normalization or Standardization (e.g., Min-Max Scaling). (3 marks)
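
The four components listed above can be sketched on a toy table as follows. This is an illustration under stated assumptions, not a prescribed pipeline: the column names `Glucose` and `AgeGroup` are hypothetical, median imputation, IQR clipping, one-hot encoding and Min-Max scaling are just one choice from each bullet, and the data is invented.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; column names and values are hypothetical.
df = pd.DataFrame({
    "Glucose": [85.0, 90.0, np.nan, 110.0, 95.0, 300.0],
    "AgeGroup": ["young", "adult", "adult", "senior", "young", "adult"],
})

# 1) Missing values: median imputation.
df["Glucose"] = df["Glucose"].fillna(df["Glucose"].median())

# 2) Outliers: clip values to the 1.5 * IQR fences.
q1 = df["Glucose"].quantile(0.25)
q3 = df["Glucose"].quantile(0.75)
iqr = q3 - q1
df["Glucose"] = df["Glucose"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3) Categorical variables: one-hot encoding.
df = pd.get_dummies(df, columns=["AgeGroup"], dtype=int)

# 4) Min-Max scaling to the range [0, 1].
g = df["Glucose"]
df["Glucose"] = (g - g.min()) / (g.max() - g.min())

print(df)
```

In a real workflow the imputation value, the fences and the scaling range would normally be computed on the training split only and then applied to the validation and test splits, to avoid leaking information.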


Section 3: Exploratory Data Analysis (EDA) (20 marks)

Data Visualization
(10 marks)

Provide visualizations that show the relationships between features and the target variable.
Include at least three types of plots (e.g., histograms, scatter plots, correlation heat
maps). Explain the insights gained from these visualizations.

Statistical Analysis
(10 marks)

Perform and explain statistical analyses to understand the data distribution and correlations
between features. Include summary statistics, correlation coefficients, and any other
relevant analyses.
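
A minimal sketch of summary statistics and correlation coefficients with pandas follows. The toy DataFrame and its column names (`Glucose`, `BMI`, `Outcome`) are assumptions made for illustration; the values are invented and carry no meaning about the real dataset.

```python
import pandas as pd

# Toy data for illustration only; column names and values are hypothetical.
df = pd.DataFrame({
    "Glucose": [85, 90, 120, 150, 170, 200],
    "BMI": [22.0, 24.5, 28.0, 31.0, 33.5, 36.0],
    "Outcome": [0, 0, 0, 1, 1, 1],
})

# Summary statistics: count, mean, std, min, quartiles and max per column.
summary = df.describe()

# Pearson correlation matrix between all numeric columns.
corr = df.corr(method="pearson")

# Correlation of each feature with the target, strongest first.
corr_with_target = corr["Outcome"].drop("Outcome").sort_values(ascending=False)

print(summary)
print(corr_with_target)
```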

Section 4: Model Development (25 marks)

Algorithm Selection
(5 marks)

Justify the selection of machine learning algorithms considered for the task (e.g., Logistic
Regression, Decision Trees, Random Forest, SVM, Neural Networks). Discuss the pros and
cons of each algorithm.

Data Splitting
(5 marks)

Describe how the data was split into training, validation, and test sets. Explain the rationale
behind the chosen split ratio.
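
A three-way split can be produced with two calls to scikit-learn's `train_test_split`, as sketched below. The 70/15/15 ratio, the random seed and the synthetic data are assumptions for illustration; the paper does not prescribe a ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 8 features, imbalanced binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = np.array([0] * 130 + [1] * 70)

# First split: 70% training, 30% held out, preserving the class ratio.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Second split: divide the held-out 30% equally into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

Stratifying on the target keeps the proportion of diabetic cases similar in every split, which matters when the classes are imbalanced.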

Model Training and Hyperparameter Tuning
(10 marks)

Explain the process of training the model. Discuss the techniques used for hyperparameter
tuning (e.g., Grid Search, Random Search). Provide details on the parameters tuned and the
chosen values.
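
Grid Search can be sketched with scikit-learn's `GridSearchCV` as below. The model (logistic regression), the grid over `C`, the F1 scoring choice and the synthetic data from `make_classification` are all assumptions made for illustration, not the expected answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Scaling lives inside the pipeline so it is re-fitted on each training fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Candidate values for the inverse regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

best_c = search.best_params_["logisticregression__C"]
print(best_c, search.best_score_)
```

`RandomizedSearchCV` has a similar interface and samples a fixed number of candidates instead of trying every combination, which can be cheaper for large grids.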

Cross-Validation
(5 marks)

Describe the cross-validation technique used (e.g., K-Fold, Stratified K-Fold) and justify its
choice. Explain how cross-validation was implemented.
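
A minimal sketch of Stratified K-Fold cross-validation with scikit-learn follows. Five folds, ROC-AUC scoring, logistic regression and the synthetic data are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, mildly imbalanced stand-in for the diabetes data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.65, 0.35], random_state=0
)

# Each fold keeps roughly the same class ratio as the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One ROC-AUC score per fold.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
)

print(scores.mean(), scores.std())
```

Reporting the mean together with the standard deviation across folds gives a sense of how stable the estimate is.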

Section 5: Model Evaluation (10 marks)

Performance Metrics
(10 marks)

Present the performance metrics for the model on the validation and test sets. Provide a
detailed interpretation of these metrics. Include confusion matrices, precision, recall, F1-
score, ROC-AUC, and any other relevant metrics.
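
The sketch below shows one way to obtain a confusion matrix, F1-score and ROC-AUC with scikit-learn. The labels and predicted probabilities are invented for illustration, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Invented labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.45, 0.70, 0.90, 0.20])

# Turn probabilities into class labels with a 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

f1 = f1_score(y_true, y_pred)

# ROC-AUC uses the scores themselves, so it does not depend on the threshold.
auc = roc_auc_score(y_true, y_score)

print(tn, fp, fn, tp, f1, auc)
```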

Section 6: Model Interpretation (5 marks)

Model Interpretation
(5 marks)

Use techniques like SHAP, LIME, or feature importance to interpret the model. Discuss the
importance of the features in the model's predictions.
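
As one example of a feature-importance technique, the sketch below uses permutation importance from scikit-learn. This is a swapped-in alternative to SHAP or LIME, which need extra packages; the random forest model, the synthetic data and the `feature_i` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)

# Features sorted from most to least important.
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

A larger drop in score after shuffling a feature suggests the model relies more heavily on that feature for its predictions.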

Section 7: Submission Requirements (10 marks)

The submission report should be 15-20 pages long, excluding appendices. Please include all
the specified requirements and details in your report.

The following elements must be included in the slides submission:

1. Font type: Times New Roman

2. Font size: 12
3. Line spacing: 1.5
4. Alignment: Justify Text
5. Document type: .pdf
6. Number of pages: 15 – 20 pages (do not exceed the page limit)
7. Your full report should consist of the following:
a) Cover page (Name, ID, Date, Signature, Score)
b) Marking Rubrics (attach as second page in the report)
c) Appendices (line spacing = 1.0)
• List of references (APA format)
• Report of similarity score (the percentage of similarity from each source
needs to be shown).

8. All figures and tables are labelled properly.

9. File naming conventions: UniversityID_FirstnameLastname_MidTermTest.pdf

Notes:

• Students are not allowed to transcribe directly (cut and paste) any material from another
source into their submission.

• Include in-text citation to support your answers and add the list of references at the
end of your report (APA format). The list of references is to be alphabetized by the
first author's last name, or (if no author is listed) the organization or title.

• The Turnitin similarity limit for this module is 20% overall and less than 1% from a
single source, excluding program source code.
• Utilization of AI tools is discouraged. Students whose AI-generated content exceeds
30% of the assignment will be invited for a discussion session with the lecturer, and
their work will be graded accordingly.

• If the assignment is returned to students due to format non-compliance, they will be
required to revise it. At that juncture, the maximum achievable marks for the
assignment will be limited to 50%.
• Students are only required to submit a single PDF document. That document should
contain a link to a GitHub repository which contains the .ipynb file. You are not
required to submit a separate .py or .ipynb file in the submission portal.
• Students will be provided 48 hours to complete the midterm test. The test will be
uploaded to myTimes and information regarding the start time will be provided
beforehand.

Submission Requirements:

Question | Marks | Excellent | Good | Average | Poor
Problem Definition | 5 | Thoroughly defined with clear objectives, relevance, and potential impact. | Well-defined with clear objectives and relevance. | Defined but lacks clarity in objectives and relevance. | Poorly defined or unclear objectives and relevance.
Success Metrics | 5 | Metrics are well justified, appropriate, and explained thoroughly. | Metrics are appropriate and justified but with less detail. | Metrics are somewhat justified and appropriate. | Metrics are not justified or inappropriate.
Dataset Description | 5 | Comprehensive and clear description of dataset characteristics. | Clear description with minor details missing. | Partial description with some key details missing. | Poor or incomplete description.
Data Quality Analysis | 5 | Thorough analysis with all issues identified and quantified. | Good analysis with most issues identified. | Basic analysis with some issues identified. | Poor analysis with few or no issues identified.
Data Preprocessing | 10 | Thorough and effective preprocessing steps, including all required components. | Good preprocessing steps with minor issues or missing components. | Basic preprocessing with several issues or missing components. | Poor preprocessing with major issues or missing most components.
Data Visualization | 10 | Excellent and insightful visualizations with clear explanations. | Good visualizations with clear explanations, but minor improvements needed. | Basic visualizations with explanations, several improvements needed. | Poor or missing visualizations with unclear or missing explanations.
Statistical Analysis | 10 | Thorough and insightful statistical analysis with clear explanations. | Good analysis with some insights and clear explanations. | Basic analysis with explanations, several insights missing. | Poor or missing statistical analysis with unclear or missing explanations.
Algorithm Selection | 5 | Excellent justification and selection, with thorough discussion of pros and cons. | Good justification and selection with clear pros and cons. | Basic justification and selection with some pros and cons. | Poor or missing justification and selection with unclear or missing pros and cons.
Data Splitting | 5 | Well-executed and justified data splitting with clear rationale. | Good data splitting with clear rationale but minor issues. | Basic data splitting with some rationale, several issues. | Poor or missing data splitting with unclear or missing rationale.
Model Training and Hyperparameter Tuning | 10 | Thorough and effective training and tuning, with detailed explanations. | Good training and tuning with clear explanations but minor issues. | Basic training and tuning with explanations, several issues. | Poor or missing training and tuning with unclear or missing explanations.
Cross-Validation | 5 | Excellent use and justification of technique with clear implementation. | Good use and justification of technique with minor issues in implementation. | Basic use and justification of technique with several issues in implementation. | Poor or missing use and justification with unclear or missing implementation.
Performance Metrics | 10 | Excellent presentation and interpretation of metrics with detailed explanations. | Good presentation and interpretation of metrics with clear explanations but minor issues. | Basic presentation and interpretation of metrics with explanations, several issues. | Poor or missing presentation and interpretation with unclear or missing explanations.
Model Interpretation | 5 | Excellent interpretation and discussion with clear explanations. | Good interpretation and discussion with clear explanations but minor issues. | Basic interpretation and discussion with explanations, several issues. | Poor or missing interpretation and discussion with unclear or missing explanations.
Compliance with Formatting Guidelines | 10 | Submission adheres perfectly to all formatting guidelines specified (font type, size, line spacing, alignment). | Submission mostly adheres to formatting guidelines, with minor inconsistencies. | Some aspects of formatting guidelines are followed, but there are notable inconsistencies. | Formatting guidelines are largely ignored or not followed.

END OF MIDTERM TEST
