The Machine Learning Audit Andrew Clark

This document discusses machine learning audits. It defines machine learning and why audits are important as algorithms increasingly impact lives. A machine learning audit examines a model's purpose, process, execution and monitoring. The document outlines the CRISP-DMA framework for auditing, which extends an existing data mining process to include steps for business understanding, data preparation, modeling, evaluation and deployment. It provides an example audit of a weather prediction model built with a Raspberry Pi to predict rain.

The Machine Learning Audit

Andrew Clark, Principal Machine Learning Auditor
Capital One
Overview
❖ What is machine learning?
❖ Why is it important?
❖ Why do we need machine learning audits?
❖ What exactly is a machine learning audit?
❖ What would a machine learning audit entail?
❖ Full-length example using the CRISP-DMA framework

Kong, Qingkai. "Machine Learning 1 - What is machine learning and real world example." Qingkai's Blog (web log), October 4, 2016. Accessed February 21, 2017. http://qingkaikong.blogspot.com/2016/10/machine-learning-1-what-is-machine.html?showComment=1484689212391#c4748865641151946089.
What is Machine Learning?
❖ A computer recognizing patterns without having to be explicitly programmed.
Why is Machine Learning Important?
❖ Disrupting business: ML-powered businesses have disrupted Blockbuster, taxis, etc.
❖ Revolutionizing existing business models: predictive maintenance in manufacturing, retailing, credit card fraud detection, loan underwriting.
❖ One of the key technologies driving economic growth.
❖ One of the most talked about but least understood topics in modern discourse, e.g. “Facebook shuts down robots after they invent their own language” (The Telegraph, August 1, 2017) and “Elon Musk: regulate AI to combat 'existential threat' before it's too late” (The Guardian, July 17, 2017).
❖ Sensational stories are clickbait.
What Machine Learning is not:

❖ Magic
❖ Going to take your job (for the majority of professionals)
❖ Always the best tool for the job
Why do we need machine learning audits?
❖ With algorithms increasingly dictating our lives, how do we know that they are operating as intended?
❖ e.g. Weapons of Math Destruction by Cathy O'Neil
❖ Some believe the EU General Data Protection Regulation (GDPR) provides a “Right to Explanation”, although this is not explicitly stated and is untested in the courts.
What exactly is a machine learning audit?
❖ Examination of the purpose, process, execution, and monitoring of a machine learning model ‘in the wild’.
❖ As assurance professionals, how do we know that the model is doing what it should be doing? What is the risk to the business?
❖ Data science is a new discipline, without the formal rigor and mature processes that exist in other disciplines. Statistics is a profession that has been around for years, yet there are many issues with the peer review process in statistics, and its models aren’t as complicated!
What would a machine learning audit entail?
❖ Understand the business use case.
❖ Model integration into existing architecture.
❖ Potential regulatory or risk constraints.
❖ “Data Sciencey stuff”, i.e.:
  ❖ How was the test data obtained?
  ❖ How was the data cleaned?
  ❖ How was the feature engineering conducted?
  ❖ How was the specific algorithm decided upon?
  ❖ Are there correction cascades?
  ❖ How was the model evaluated?
  ❖ What was the process to prevent overfitting, etc.?
❖ Is the model accomplishing what the business wanted it to accomplish?
Introducing the CRISP-DMA framework
❖ Framework written by yours truly that extends the industry-standard data mining framework, CRISP-DM, to auditing machine learning implementations.
❖ Leverages the existing six iterative steps of the CRISP-DM model:
  ❖ Business Understanding
  ❖ Data Understanding
  ❖ Data Preparation
  ❖ Modeling
  ❖ Evaluation
  ❖ Deployment
Business Understanding
❖ What is the goal of the algorithm?
❖ Have models been used in this use case before?
❖ What attributes, e.g. temperature and humidity, have been identified by the business as key factors for deriving the desired decision in the given use case?
❖ Are there any regulatory constraints or considerations of which to be aware?
Data Understanding
❖ What dataset[s] was utilized to train the model?
❖ What dataset[s] is utilized for production prediction?
❖ Where did the dataset[s] identified in the two questions above originate? E.g. web-scraped data, log files, relational databases.
❖ Are all of the input variables in the same format? E.g. miles or kilometers.
❖ Have the correlations and covariances been examined?
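
The last two questions can be checked mechanically. The following is a minimal sketch using only the Python standard library; the readings, the Fahrenheit source, and the names `pearson` and `fahrenheit_to_celsius` are illustrative assumptions and do not come from the deck.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def fahrenheit_to_celsius(temp_f):
    """Convert one temperature so that all inputs share a unit."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Hypothetical readings: one source reports Fahrenheit, so convert first.
temps_c = [fahrenheit_to_celsius(t) for t in (50.0, 59.0, 68.0, 77.0)]
humidity = [80.0, 70.0, 60.0, 50.0]

# Strongly correlated inputs are worth flagging to the model owner.
r = pearson(temps_c, humidity)
```

An auditor would run the same check on every pair of input variables and ask how highly correlated pairs were handled.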
Data Preparation
❖ How was the data cleaned?
❖ If supervised learning was used, how was the training dataset created?
❖ Were standard software development techniques used for the ETL process for production models?
❖ How was the data scaled?
❖ How were the variables selected? Was an automated variable selection technique utilized?
❖ What process was used to separate the data into train and test sets? Was care taken to avoid peeking at the test set?
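
One concrete form of peeking is computing scaling statistics on the full dataset before splitting. The sketch below shows the safer order under stated assumptions: the humidity values are made up, the data is assumed to be split already, and `fit_standardizer` and `standardize` are hypothetical helper names.

```python
def fit_standardizer(train_values):
    """Compute mean and standard deviation from the training values only."""
    mean = sum(train_values) / len(train_values)
    variance = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    std = variance ** 0.5
    if std == 0.0:
        std = 1.0  # constant column: avoid dividing by zero
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to any set of values."""
    return [(v - mean) / std for v in values]


# Hypothetical humidity readings, already separated into train and test.
train_humidity = [40.0, 50.0, 60.0, 70.0]
test_humidity = [90.0]

# The test set is transformed with the training statistics, never its own.
mean, std = fit_standardizer(train_humidity)
train_scaled = standardize(train_humidity, mean, std)
test_scaled = standardize(test_humidity, mean, std)
```

If the audited pipeline fits its scaler before the split, the test metrics are likely to be optimistic.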
Modeling
❖ What was the thought process behind choosing the algorithm[s] for the model?
❖ What steps were used to guard against overfitting?
❖ What process was used to optimize the chosen algorithm?
❖ Was the algorithm coded from scratch or was a standard library used? If a library was used, what are its license terms?
❖ What type of version control was utilized?
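
One common answer to the overfitting question is k-fold cross-validation, in which every sample is held out exactly once. The deck does not say which technique the audited team used, so this is only a sketch of the index bookkeeping; `k_fold_indices` is a hypothetical name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, 3))
```

A large gap between training and validation scores across folds is the usual sign that the model has memorized the training data.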
Evaluation
❖ What metrics were used to evaluate the model?
❖ What process and metrics are in place to monitor the continued accuracy and stability of the model?
❖ Create a mock dataset that covers all of the relevant assumptions and run it through the algorithm to test that it is operating as intended.
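
The mock-dataset test can be sketched as a small table of cases with the outcome the business expects. Everything here is an assumption for illustration: `predict_rain` is a hypothetical rule-based stand-in for the model under audit, and the thresholds and cases are invented.

```python
def predict_rain(humidity_pct, pressure_hpa):
    """Hypothetical stand-in for the model under audit: 1 = rain, 0 = no rain."""
    return 1 if humidity_pct >= 85.0 and pressure_hpa <= 1005.0 else 0


# Mock cases: (humidity %, pressure hPa, outcome the business expects).
MOCK_CASES = [
    (95.0, 995.0, 1),   # humid and low pressure: rain expected
    (40.0, 1025.0, 0),  # dry and high pressure: no rain expected
    (95.0, 1025.0, 0),  # humid but high pressure
    (40.0, 995.0, 0),   # low pressure but dry
]


def audit_mock_cases(model, cases):
    """Return every case where the model disagrees with the expected outcome."""
    return [(h, p, expected, model(h, p))
            for h, p, expected in cases
            if model(h, p) != expected]


failures = audit_mock_cases(predict_rain, MOCK_CASES)
```

An empty `failures` list means the model matched every stated assumption; any entry is a finding to raise with the model owner.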
Deployment
❖ How was the model moved to production? Was it rewritten by the engineering team, or does it rely on an API, etc.? If it was rewritten, a code review for accuracy should be performed.
❖ Is the model accomplishing what the business wanted it to accomplish?
Raspberry Pi Machine Learning Weather Prediction - A simple example
Architecture Diagram
Raspberry Pi readings and actual weather
Aggregate readings to one average reading every thirty minutes
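
The code from this slide did not survive extraction. As a rough sketch of the step the title describes, assuming readings arrive as (timestamp, value) pairs; the sample timestamps and values and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def half_hour_bucket(ts):
    """Floor a datetime to the start of its thirty-minute window."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)


def aggregate_half_hourly(readings):
    """Average (timestamp, value) readings within each thirty-minute window."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[half_hour_bucket(ts)].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}


# Hypothetical temperature readings.
readings = [
    (datetime(2017, 8, 1, 9, 5), 20.0),
    (datetime(2017, 8, 1, 9, 25), 22.0),
    (datetime(2017, 8, 1, 9, 40), 24.0),
]
averages = aggregate_half_hourly(readings)
```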
Aggregation cont.
Convert the status to 1 if the status is rain or thunderstorm, 0 otherwise
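
A minimal sketch of this labeling rule. The deck does not show the actual status strings, so the case-insensitive exact match and the sample values below are assumptions.

```python
# Statuses that count as the positive (rain) class, per the slide title.
RAIN_STATUSES = {"rain", "thunderstorm"}


def encode_status(status):
    """Return 1 for rain or thunderstorm, 0 for any other weather status."""
    return 1 if status.strip().lower() in RAIN_STATUSES else 0


# Hypothetical status values from a weather feed.
labels = [encode_status(s) for s in ["Rain", "Clear", "Thunderstorm", "Clouds"]]
```

An auditor would confirm the full set of status values in the real data, since any value that is not matched exactly falls into the 0 class.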
Split the data into training and test sets
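
A sketch of a reproducible split, assuming a shuffled hold-out with a fixed seed; the deck does not state how the split was actually made, and `shuffle_split` is a hypothetical helper.

```python
import random


def shuffle_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed and hold out a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten aggregated observations
train_rows, test_rows = shuffle_split(rows, test_fraction=0.2, seed=42)
```

Fixing the seed lets an auditor reproduce the exact split. For time-ordered data such as weather readings, it is also worth asking whether a chronological split would be more appropriate.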
View model accuracy
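
Accuracy alone can mislead when rain is rare, so this sketch also computes the score of always predicting the most common class. The labels and predictions are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Hypothetical test labels (1 = rain) and model predictions.
y_true = [0, 0, 0, 1, 0, 1, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 0]

model_accuracy = accuracy(y_true, y_pred)
baseline_accuracy = majority_baseline(y_true)
```

A model is only adding value to the extent that `model_accuracy` exceeds `baseline_accuracy`.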
Examine model weights
Test the model by manually passing in observations
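
A sketch of what manual probing can look like, assuming a logistic model over standardized inputs. The deck does not name the model type, and the weights, intercept, and feature names are hypothetical; a real audit would read them from the trained model.

```python
from math import exp

# Hypothetical weights for standardized inputs (not taken from the deck).
WEIGHTS = {"humidity": 2.0, "pressure": -1.5}
INTERCEPT = -1.0


def rain_probability(observation):
    """Logistic model: sigmoid of the intercept plus the weighted inputs."""
    z = INTERCEPT + sum(WEIGHTS[name] * observation[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


# Hand-built observations that should fall clearly on either side.
humid_low_pressure = rain_probability({"humidity": 1.5, "pressure": -1.0})
dry_high_pressure = rain_probability({"humidity": -1.5, "pressure": 1.0})

# Direction check: raising humidity alone should not lower the probability.
more_humid = rain_probability({"humidity": 2.5, "pressure": -1.0})
```

If a hand-built observation produces a result that contradicts domain knowledge, that is a prompt to revisit the weights and the training data.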
Conclusion and Recap
❖ What machine learning is.
❖ Why machine learning is important.
❖ Why we need machine learning audits.
❖ What constitutes a machine learning audit.
❖ What a machine learning audit entails.
❖ Overview of the CRISP-DMA framework.
❖ Simple end-to-end machine learning audit example using the CRISP-DMA framework.
Thank you!

❖ Email: andrewtaylorclark@gmail.com
❖ GitHub: aclarkData
❖ Blog: https://aclarkdata.github.io/
❖ LinkedIn: www.linkedin.com/in/andrew-clark-b326b767
