
MIMIC ML/DL

Learning objectives
• Make use of MIMIC-IV for mortality prediction on patients diagnosed with sepsis.
• Make use of scikit-learn built-in binary classification models for
mortality prediction.
• Build, train, and evaluate a neural network (PyTorch) for
mortality prediction.
Setup and prerequisites
• First, we make sure the latest google-colab is installed and import the required Python modules.
• The main libraries required are PyTorch, scikit-learn, NumPy, pandas, and matplotlib.
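The setup cell itself is only shown as a screenshot; a minimal sketch of what it might contain is below (the exact package list and print statement are assumptions):

# In Colab, upgrade google-colab first (the '!' prefix is Colab shell syntax):
# !pip install --upgrade google-colab

# Import the main libraries used throughout the notebook.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import torch

print('torch', torch.__version__, '| sklearn', sklearn.__version__)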
Setup and prerequisites
• The Jupyter notebook assumes the MIMIC-IV dataset (https://physionet.org/content/mimiciv/2.2/) has been downloaded from PhysioNet, unzipped, and uploaded to Google Drive.
• Thus, we need to mount Google Drive in Google Colab.
• String constants for the many column names found in MIMIC-IV are also defined.
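A minimal sketch of this mounting/constants cell (the Drive path and constant names are our own, not necessarily the notebook's):

from google.colab import drive

# Make the unzipped MIMIC-IV folder on Google Drive visible to the notebook.
drive.mount('/content/drive')

# Hypothetical dataset path and a few of the column-name constants.
MIMIC_DIR = '/content/drive/MyDrive/mimic-iv-2.2'
SUBJECT_ID = 'subject_id'
HADM_ID = 'hadm_id'
HOSPITAL_EXPIRE_FLAG = 'hospital_expire_flag'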
Data processing and features
• The helper method shown on the right is used to read MIMIC-IV CSV data files into a pandas data frame.
• MIMIC-IV files are split into hosp and icu directories, so we need to check whether the target CSV file exists in either directory.
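The helper is only shown as a screenshot; a sketch of the idea (the name read_mimic_csv is hypothetical):

import os
import pandas as pd

def read_mimic_csv(mimic_dir: str, filename: str) -> pd.DataFrame:
    # MIMIC-IV splits its CSVs between hosp/ and icu/, so try both.
    for subdir in ('hosp', 'icu'):
        path = os.path.join(mimic_dir, subdir, filename)
        if os.path.exists(path):
            return pd.read_csv(path)
    raise FileNotFoundError(f'{filename} not found under hosp/ or icu/ in {mimic_dir}')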
Data processing and features
• The helper method (first half) shown on the right is used to read, filter, merge, and return raw MIMIC-IV data (as a pandas data frame) for hospital admissions diagnosed with sepsis.
• The MIMIC-IV tables merged include the dictionary of ICD-9 and ICD-10 codes, the diagnoses table, the admissions table, the patients table, and the ICU stays table.
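A sketch of what this first half might look like, reusing read_mimic_csv from above (the exact filtering logic, e.g. matching 'sepsis' in the ICD long titles, is an assumption):

def get_raw_sepsis_data(mimic_dir: str) -> pd.DataFrame:
    # Keep ICD-9/ICD-10 dictionary rows whose long title mentions sepsis.
    d_icd = read_mimic_csv(mimic_dir, 'd_icd_diagnoses.csv')
    sepsis_codes = d_icd[d_icd['long_title'].str.contains('sepsis', case=False, na=False)]

    # Restrict the diagnoses table to those codes, then merge in the
    # admissions, patients, and ICU stays tables.
    diagnoses = read_mimic_csv(mimic_dir, 'diagnoses_icd.csv')
    sepsis_dx = diagnoses.merge(sepsis_codes, on=['icd_code', 'icd_version'])

    admissions = read_mimic_csv(mimic_dir, 'admissions.csv')
    patients = read_mimic_csv(mimic_dir, 'patients.csv')
    icustays = read_mimic_csv(mimic_dir, 'icustays.csv')

    return (sepsis_dx
            .merge(admissions, on=['subject_id', 'hadm_id'])
            .merge(patients, on='subject_id')
            .merge(icustays, on=['subject_id', 'hadm_id']))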
Data processing and features
• The second half of the helper method to get raw sepsis data is shown on the right.
• Complete blood count (CBC) tests are commonly used to track the progression of sepsis, so we would like to include them in our data as potentially helpful features for mortality prediction: specifically, results for hematocrit, platelet count, and hemoglobin.
• One issue is that the MIMIC-IV lab events table is large: even with Google Colab Pro high-RAM, trying to load all lab events data runs out of memory. The workaround is to load and filter the lab events data in chunks.
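A sketch of the chunked loading idea (the itemids shown are the usual MIMIC lab codes for hematocrit, hemoglobin, and platelet count, but treat them as assumptions; the real notebook may look them up in d_labitems):

# Hypothetical CBC itemids: 51221 hematocrit, 51222 hemoglobin, 51265 platelet count.
CBC_ITEMIDS = {51221, 51222, 51265}

def load_cbc_labevents(mimic_dir, hadm_ids, chunksize=1_000_000):
    # Stream labevents.csv in chunks so it never has to fit in memory at once,
    # keeping only CBC rows that belong to our sepsis admissions.
    path = os.path.join(mimic_dir, 'hosp', 'labevents.csv')
    kept = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        mask = chunk['itemid'].isin(CBC_ITEMIDS) & chunk['hadm_id'].isin(hadm_ids)
        kept.append(chunk[mask])
    return pd.concat(kept, ignore_index=True)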
Data processing and features
• Below shows the execution of the previous helper method to load the MIMIC-IV raw sepsis data.
• This will take several minutes because the lab events table is very large.
Data processing and features
• The helper method (first half) shown on the right is used to process the previously fetched raw sepsis data and create features from it.
• At a high level:
• Age at admission is calculated
• Race and first care unit column values are simplified
• CBC lab event results become separate columns
• Categorical columns become one-hot encoded
Data processing and features
• The second half of the helper method to process raw sepsis data and create features is shown on the right.
• Only hospital admission rows with lab results for all CBC tests are included.
• Furthermore, duplicate rows and rows with missing values are dropped.
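A sketch covering both halves of this feature-creation helper (column names follow MIMIC-IV; the simplification of race/first care unit values is omitted here, and the age derivation is an assumption):

def create_features(raw_df, cbc_df):
    df = raw_df.copy()

    # Age at admission from the patients table's anchor fields.
    df['admit_age'] = df['anchor_age'] + (
        pd.to_datetime(df['admittime']).dt.year - df['anchor_year'])

    # One column per CBC test (mean value per admission), merged back in.
    cbc = (cbc_df.pivot_table(index='hadm_id', columns='itemid',
                              values='valuenum', aggfunc='mean')
                 .rename(columns={51221: 'hematocrit', 51222: 'hemoglobin',
                                  51265: 'platelet_count'})
                 .reset_index())
    df = df.merge(cbc, on='hadm_id')

    # One-hot encode the categorical columns.
    df = pd.get_dummies(df, columns=['race', 'first_careunit', 'gender'])

    # Keep only rows with all CBC results; drop duplicates and missing values.
    # (The real notebook selects its feature columns before dropping.)
    return df.drop_duplicates().dropna()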
Data processing and features
• Below shows the execution of the previous helper method to create features from the raw sepsis data.
Train, validation, and test split
• The helper method on the right is used to rebalance the data and split it into train, validation, and test sets.
• Rebalancing is done by randomly down-sampling the data so that the target values, i.e., hospital_expire_flag, are equally distributed for binary classification.
• The split is 70% train, 15% validation, and 15% test.
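A sketch of this rebalance-and-split helper (the 70/15/15 split comes from the slide; the random seed and stratification are our additions):

import pandas as pd
from sklearn.model_selection import train_test_split

def rebalance_and_split(df, target='hospital_expire_flag', seed=42):
    # Down-sample the majority class so both target values appear equally often.
    pos, neg = df[df[target] == 1], df[df[target] == 0]
    n = min(len(pos), len(neg))
    balanced = pd.concat([pos.sample(n, random_state=seed),
                          neg.sample(n, random_state=seed)])

    # 70% train; split the remaining 30% in half for 15% validation / 15% test.
    train, rest = train_test_split(balanced, test_size=0.30,
                                   random_state=seed, stratify=balanced[target])
    val, test = train_test_split(rest, test_size=0.50,
                                 random_state=seed, stratify=rest[target])
    return train, val, test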
Scikit-learn and random baseline binary
classifiers
• We now define a random binary classifier (shown on the right) as a baseline.
• The end goal is to eventually train a neural network that performs better than random guessing.
• This random binary classifier learns the probability distribution of classes from the training data; during inference, it randomly samples from that learned distribution.
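A sketch of such a baseline (the class name is hypothetical; it mimics scikit-learn's fit/predict interface):

import numpy as np

class RandomClassifier:
    # Learns the empirical class distribution, then samples predictions from it.
    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes
        self.probs_ = counts / counts.sum()
        return self

    def predict(self, X):
        rng = np.random.default_rng()
        return rng.choice(self.classes_, size=len(X), p=self.probs_)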
Scikit-learn and random baseline binary
classifiers
• But first, shown on the right is
a simple helper method to
print results.
• We’ll store result metrics for
multiple models inside a
dictionary.
• Some of the metrics we’ll
explore further (and print)
include AUC (ROC) score, F1
score, precision score, recall
score, and accuracy.
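A sketch of the results dictionary and printing helper (the metric key names are our own):

results = {}  # model name -> {metric name -> value}

def print_results(name, metrics):
    # Store one model's metrics and print them in a fixed order.
    results[name] = metrics
    for key in ('roc_auc', 'f1', 'precision', 'recall', 'accuracy'):
        print(f'{name} {key}: {metrics[key]:.4f}')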
Scikit-learn and random baseline binary
classifiers
• The method on the right evaluates the following binary classifiers (most from scikit-learn):
• Logistic regression
• Linear support vector machines
• Decision tree classifier
• Random forest classifier
• Gaussian naïve Bayes
• K-neighbors classifier
• Random (baseline) classifier
• Each model is trained on the training data and evaluated on the test dataset. Result metrics are stored in the dictionary and printed to the console.
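A sketch of this evaluation loop, reusing RandomClassifier and print_results from the sketches above (model hyperparameters are left at scikit-learn defaults, which may differ from the notebook):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate_classifiers(X_train, y_train, X_test, y_test):
    models = {
        'logistic_regression': LogisticRegression(max_iter=1000),
        'linear_svm': LinearSVC(),
        'decision_tree': DecisionTreeClassifier(),
        'random_forest': RandomForestClassifier(),
        'gaussian_nb': GaussianNB(),
        'k_neighbors': KNeighborsClassifier(),
        'random_baseline': RandomClassifier(),
    }
    for name, model in models.items():
        preds = model.fit(X_train, y_train).predict(X_test)
        print_results(name, {
            'roc_auc': roc_auc_score(y_test, preds),
            'f1': f1_score(y_test, preds),
            'precision': precision_score(y_test, preds),
            'recall': recall_score(y_test, preds),
            'accuracy': accuracy_score(y_test, preds),
        })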
Scikit-learn and random baseline binary
classifiers
• Below shows the execution of evaluating the scikit-learn binary classifiers and the random (baseline) classifier.
• Result metrics are printed to the console, but these are visualized and discussed in the slides that follow.
Scikit-learn and random baseline binary
classifiers
• The method on the right visualizes the result metrics (stored in the dictionary).
• For each evaluation metric, a horizontal bar plot is created.
• The horizontal bar for the neural network (in future slides) is colored red for easy comparison.
• These plots appear in the slides that follow.
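A sketch of the plotting helper (the red highlight for the neural network matches the slide's description; the dictionary key 'neural_network' is our assumption):

import matplotlib.pyplot as plt

def plot_metric(results, metric):
    # One horizontal bar per model; the neural network's bar is drawn in red.
    names = list(results.keys())
    values = [results[n][metric] for n in names]
    colors = ['red' if n == 'neural_network' else 'tab:blue' for n in names]
    plt.barh(names, values, color=colors)
    plt.xlabel(metric)
    plt.title(f'{metric} by model')
    plt.tight_layout()
    plt.show()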
Scikit-learn and random baseline binary classifiers
(Bar plots of the result metrics for each classifier.)
Scikit-learn and random baseline binary classifiers
• Key insights:
• The random forest classifier performs best at mortality prediction overall, with an accuracy of ~0.644.
• The random (baseline) classifier performs the worst overall (as expected), with ~0.5155 accuracy and ~0.5164 AUC (ROC).
Train and evaluate PyTorch neural
network
• Now, we will build, train, and evaluate a very simple PyTorch neural network.
• First, we need to define a device. This is helpful if running on a GPU; however, a GPU is not required.
• On the right is the implementation of a very simple neural network. It takes as input:
• input_size – The number of features in the training data.
• hidden_layer_size – The number of neurons in the hidden layer.
• The architecture of this neural network is very simple: a single hidden layer followed by a single output neuron and a sigmoid layer. Between these we make use of dropout (20%) for regularization and a leaky ReLU function (for non-linearity).
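A sketch of this architecture as described (the class name and the exact ordering of dropout and leaky ReLU are assumptions):

import torch
from torch import nn

class SimpleNet(nn.Module):
    # Hidden layer -> dropout (20%) -> leaky ReLU -> single output neuron -> sigmoid.
    def __init__(self, input_size: int, hidden_layer_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_layer_size),
            nn.Dropout(0.2),
            nn.LeakyReLU(),
            nn.Linear(hidden_layer_size, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Use a GPU if one is available; a CPU works too, just more slowly.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')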
Train and evaluate PyTorch neural
network
• Before proceeding, we need to define a custom dataset wrapper object, shown on the right.
• The data (features and labels) are stored as float PyTorch tensors on the device.
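A sketch of the dataset wrapper (the class name is hypothetical; it expects NumPy arrays):

import torch
from torch.utils.data import Dataset

class SepsisDataset(Dataset):
    # Holds features and labels as float tensors already moved to the device.
    def __init__(self, features, labels, device):
        self.X = torch.tensor(features, dtype=torch.float32).to(device)
        self.y = torch.tensor(labels, dtype=torch.float32).to(device)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]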
Train and evaluate PyTorch neural
network
• On the right we define helper methods to train and evaluate the neural network for a single epoch.
• At a high level, the train loop makes a model prediction, calculates the loss, and performs one step of backpropagation. A running loss over the training data is computed and returned.
• The evaluation loop returns the loss of the model's predictions over the whole validation dataset.
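A sketch of the per-epoch loops (assumes the model outputs probabilities, matching the sigmoid layer above):

import torch

def train_one_epoch(model, loader, loss_fn, optimizer):
    model.train()
    running_loss = 0.0
    for X, y in loader:
        optimizer.zero_grad()
        preds = model(X).squeeze(1)   # (batch, 1) -> (batch,)
        loss = loss_fn(preds, y)
        loss.backward()               # one step of backpropagation
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(loader)

def evaluate_one_epoch(model, loader, loss_fn):
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for X, y in loader:
            preds = model(X).squeeze(1)
            total_loss += loss_fn(preds, y).item()
    return total_loss / len(loader)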
Train and evaluate PyTorch neural
network
• Putting all the previous helper methods together, the train method on the right performs the end-to-end training of the neural network.
• The train method takes as input the model to train, a training data loader, a validation data loader, and multiple hyperparameters such as the number of epochs, the loss function, and an optimizer.
• We’ll train for 1000 epochs using a batch size of 64.
• The Adam optimizer and binary cross-entropy loss are used for training.
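A sketch putting it together with the hyperparameters the slide names (hidden_layer_size=64 and the X_train/y_train/X_val/y_val arrays are assumptions):

from torch.utils.data import DataLoader

def train(model, train_loader, val_loader, epochs, loss_fn, optimizer):
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        train_losses.append(train_one_epoch(model, train_loader, loss_fn, optimizer))
        val_losses.append(evaluate_one_epoch(model, val_loader, loss_fn))
        print(f'epoch {epoch}: train={train_losses[-1]:.4f} val={val_losses[-1]:.4f}')
    return train_losses, val_losses

model = SimpleNet(input_size=X_train.shape[1], hidden_layer_size=64).to(device)
train_loader = DataLoader(SepsisDataset(X_train, y_train, device),
                          batch_size=64, shuffle=True)
val_loader = DataLoader(SepsisDataset(X_val, y_val, device), batch_size=64)

# BCELoss pairs with the model's sigmoid output; Adam with its default learning rate.
loss_fn = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())
train_losses, val_losses = train(model, train_loader, val_loader, 1000, loss_fn, optimizer)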
Train and evaluate PyTorch neural
network
• Running the end-to-end training of the neural network takes a minute or two. Train and validation losses are printed for each epoch (screenshot on the right) and returned at the end.
Train and evaluate PyTorch neural
network
• Plotting the train and validation losses after end-to-end training of the neural network, we make some observations:
• Train loss decreases throughout the epochs.
• However, validation loss plateaus around ~0.66.
• We need to watch carefully for both underfitting the training data and overfitting it, which would show up as the validation loss rising while the train loss keeps falling.
Train and evaluate PyTorch neural
network
• The helper methods on the right are used to evaluate the neural network on the test dataset. This includes computing multiple metrics such as AUC (ROC), F1, precision, recall, and accuracy.
• The best threshold is also computed based on the ROC curve.
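A sketch of the test-set evaluation, picking the threshold that maximizes TPR - FPR (Youden's J) on the ROC curve; whether the notebook uses exactly this criterion is an assumption:

import torch
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)

def evaluate_on_test(model, X_test, y_test, device):
    model.eval()
    with torch.no_grad():
        X = torch.tensor(X_test, dtype=torch.float32).to(device)
        probs = model(X).squeeze(1).cpu().numpy()

    # Best threshold from the ROC curve.
    fpr, tpr, thresholds = roc_curve(y_test, probs)
    best_threshold = thresholds[(tpr - fpr).argmax()]
    preds = (probs >= best_threshold).astype(int)

    return {
        'roc_auc': roc_auc_score(y_test, probs),
        'f1': f1_score(y_test, preds),
        'precision': precision_score(y_test, preds),
        'recall': recall_score(y_test, preds),
        'accuracy': accuracy_score(y_test, preds),
    }, best_threshold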
Train and evaluate PyTorch neural
network
• When evaluating the neural network, metrics are printed to the console and stored in the existing results dictionary.
• In the next slides we plot and compare all models, including this trained neural network.
Compare all models
(Bar plots comparing all models on each result metric; the neural network's bar is in red.)
Compare all models
• Key insights:
• The trained neural network performs better than random guessing.
• However, the random forest classifier still performs better at mortality prediction than the neural network with regard to accuracy.
• The neural network outperforms all other models on recall but underperforms on precision.
Compare all models
• In the last few slides we will plot ROC curves for all models.
• The code (shown on the right) comes from https://jovian.com/vipul0036vipul/how-to-find-optimal-threshold-for-binary-classification-roc-curve
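The linked code is not reproduced here; a minimal sketch of the kind of per-model ROC plot it produces:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

def plot_roc(name, y_true, y_score):
    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.plot(fpr, tpr, label=name)
    plt.plot([0, 1], [0, 1], linestyle='--', color='grey')  # chance line
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title(f'ROC curve: {name}')
    plt.legend()
    plt.show()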
Compare all models
• Key insights:
• The ROC curve for the random classifier is as expected (it hugs the chance diagonal).
Compare all models
(ROC curve plots.)
Compare all models
• Key insights:
• Gaussian naïve Bayes, which performed the worst, has a ROC similar to random guessing.
Compare all models
• Key insights:
• The ROC curves for the neural network and the random forest classifier are similar, but there is still a lot of room for improvement.
Learnings and future work
• Some things learned:
• Mortality prediction on MIMIC-IV (framed as binary classification) is difficult.
• We see that a very simple neural network does perform better than a random classifier. However, there is still a lot of room for improvement.
• Future work / enhancements include:
• Making use of data augmentation to train a better-performing model.
• Playing with different model architectures and training hyperparameters, though we need to be careful not to overfit on the training data.
• Making use of additional features in the training data, for example, more lab event results, prescribed medications, etc.
