
How To Document Your Data Science Project

The document provides guidance on how to effectively document a data science project. It recommends including comments in code to explain major steps and rules for writing clear comments. Key areas that should be documented in a technical report include background, success metrics, data collection, tools used, data cleaning, feature selection, model training methods, model evaluation, output presentation, best performing model, model explainability, deployment, improvements, challenges faced, new research ideas, team members, and references. Documentation of a project in a technical report or Word/PDF format helps distinguish the work from others and makes it publishable or reproducible.

Uploaded by

yinka omojesu

HOW TO DOCUMENT YOUR DATA SCIENCE PROJECT


Learn how to document your data
science/machine learning project

www.datalab.com.ng
Introduction
Good Data Scientists know the value of
documentation. Project documentation
helps in tracking every step in a project
while maintaining workflow. Good
documentation also allows stakeholders
to comprehend every aspect of a
project. We examine some steps on
how to effectively document your next
data science project.
COMMENTS
Comments are text added to your code to explain it. When
working on a data science project in a programming language
such as Python, R or SQL, you can use the comment marker
(# in Python and R, -- in SQL) to annotate the major steps in
your code, which helps you keep track of what the code is doing.
Good use of comments also makes code easier to maintain and
helps you find bugs faster. Clean code is well commented and
easy for others to understand. As a Data Scientist, make it a
practice to add a few short statements alongside the major steps
in your project. Do this only for major steps so you don't
overdo it. Below are some rules for commenting your code:
9 rules on writing excellent
comments on your code
Rule 1: Comments should not duplicate the code.
Rule 2: Good comments do not excuse unclear code.
Rule 3: If you can't write a clear comment, there may be a problem with the code.
Rule 4: Comments should dispel confusion, not cause it.
Rule 5: Explain unidiomatic code in comments.
Rule 6: Provide links to the original source of copied code.
Rule 7: Include links to external references where they will be most helpful.
Rule 8: Add comments when fixing bugs.
Rule 9: Use comments to inform about incomplete implementations or updates.

(List Source: Best practices for writing code comments, by Ellen Spertus,
https://stackoverflow.blog/2021/07/05/best-practices-for-writing-code-comments/ )
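
A few of these rules can be seen in a short Python sketch. The function name fill_missing_ages and the sample data are hypothetical and exist only for illustration; they are not taken from the source list.

```python
from statistics import median


def fill_missing_ages(ages):
    """Replace missing ages (None) with the median of the known ages.

    Hypothetical helper, used only to illustrate commenting style.
    """
    known = [a for a in ages if a is not None]

    # Rules 1 and 4: explain the reason, do not restate the code.
    # The median is used instead of the mean because a few very
    # large values would pull the mean away from a typical age.
    fill_value = median(known)

    # Rule 9: flag incomplete work.
    # TODO: decide what to return when every age is missing;
    # median() raises statistics.StatisticsError on an empty list.
    return [fill_value if a is None else a for a in ages]


cleaned = fill_missing_ages([20, None, 30, 100])
print(cleaned)
```

Note that none of the comments says "compute the median" or "loop over the list": the code already says that, so the comments are reserved for the reasoning and the known gap.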

TECHNICAL REPORT
Documenting your data science project as a
technical report can help distinguish your
work from others. Reporting makes your work
publishable and reproducible. It takes your data
science project from the coding environment to
a technical document that can be consumed
by others. Below are the major areas you need
to cover when writing a professional report for
your data science project.
1. Background to the project (what problem you are trying to solve and why).

2. The success metrics or key performance indicators (KPIs) your project should meet if your model works well (e.g. achieve prediction accuracy of up to 90% in predicting fraud cases, reduce customer waiting time down to 10%).

3. How you collected your data (public data, scraped data, augmented data, data that requires permission to access, etc.).

4. Your tools (programming languages or Business Intelligence dashboards).

5. How you cleaned your data (what data cleaning techniques did you adopt?).

6. How you selected features from your data (How did you select features? Did you do any feature transformation or engineering? Did you find any interesting interactions between features? What are the important drivers/features relevant to your stated KPIs that you used for model building? Did you discard any feature?).

7. The training method(s) you used (Linear Regression, Logistic Regression, XGBoost, etc. Did you ensemble models?).

8. How long it took your model to train and make predictions (it is very important that you report model computation time).

9. How you evaluated your model performance (e.g. Did you explain your confusion matrix and AUC-ROC curve? Did you explain your regression estimates and p-values? Did you explain your adjusted R-squared?).

10. How you presented your output or predictions (e.g. accuracy, precision and recall scores, probability or likelihood percentage, adjusted R-squared, visualizations, etc.).

11. Which model achieved the best result overall?

12. Is there a subset of features that would get 90-95% of your final model performance? Which features?

13. Did you try any simpler model that can achieve the same result as your best model? (Aim for model simplicity.)

14. How you made your model explainable (e.g. Did you use any model explainability library?).

15. How you deployed your model (e.g. Did you prototype it with a web or mobile application? What technology stack did you use for the web or mobile application?).

16. Are there suggestions on how the project can be improved if other people wish to work on it?

17. Are there technical challenges you faced during the project that other people can avoid?

18. Are there new ideas generated from the project which could form a new research area?

19. Did you mention the team members who also worked on the project (if any)?

20. Did you reference the original source of copied code or external materials used?

(Some tips taken from: www.kaggle.com/WinningModelDocumentationGuidelines)
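
The items on computation time, model evaluation and output presentation ask for concrete numbers. Below is a minimal sketch, in plain Python, of how such numbers could be captured for a report. The threshold rule standing in for a trained model, the toy labels and scores, and all function names are assumptions made for illustration; a real project would use its own model and data.

```python
import time


def predict(scores, threshold=0.5):
    """Hypothetical stand-in for a model: label 1 if score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision and recall from the confusion counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Guard against division by zero when a class is never predicted
        # or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy data, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]

y_pred, seconds = timed(predict, scores)
counts = confusion_counts(y_true, y_pred)
report = summarize(counts)

print(f"prediction time: {seconds:.6f} s")
print(f"confusion counts: {counts}")
print(f"scores: {report}")
```

Reporting the raw confusion counts next to the derived scores lets a reader check the arithmetic and see which kind of error the model makes most often.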

Conclusion
As a Data Scientist, keep in mind that your project may
be read by people with technical and non-technical
backgrounds, so it should be clear and well
documented. Documentation can be written in Word
or PDF format.
NEED MORE INSIGHT?

Join Our Data Science Internship Program

We are a team of Data Scientists
and AI specialists. We analyze
and generate actionable insights
from data to execute innovative
Artificial Intelligence solutions
that drive various businesses.
ADDRESS:
Suite 33 Mazfalah
Plaza, Karu Site,
Abuja, Nigeria

PHONE:
+2348038518576
WEBSITE:
www.datalab.com.ng
