Detection of Autism Spectrum Disorder
This is to certify that the Mini Project Report entitled “DETECTION OF AUTISM
SPECTRUM DISORDER” is being submitted by SRIRAMOJU SAI
SHARANYA (21271A6638), SARDARNI HARPREETH KOUR (21271A6617), KATTEKOLA
STANLEY RICHARDS (21271A6648), GADAPA MANIKANTA (21271A6617) in partial
fulfillment of the requirements for the award of the Degree of Bachelor of Technology in
Computer Science & Engineering to the Jyothishmathi Institute of Technology & Science,
Karimnagar, during the academic year 2024-2025, is a bona fide work carried out by them under my
guidance and supervision.
The results presented in this Project Work have been verified and are found to be satisfactory.
The results embodied in this Project Work have not been submitted to any other University for
the award of any other degree or diploma.
R. SATYA TEJA Dr. R. Jegadeesan
Asst. Professor Professor & HOD
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our advisor, R. SATYA TEJA, Asst.
Professor, CSE Dept, whose knowledge and guidance have motivated us to achieve goals we
never thought possible. The time we have spent working under her/his supervision has truly
been a pleasure.
The experience from this kind of work is great and will be useful to us in the future. We thank
Dr. R. JEGADEESAN, Professor & HOD, CSE Dept, for his effort, kind cooperation,
guidance, and encouragement, and also for providing the facilities to carry out
this work.
We thank all the faculty members of the Department of Computer Science & Engineering
for sharing their valuable knowledge with us. We extend our thanks to the technical staff of
the department for their valuable suggestions on technical problems. Finally, special thanks to
our parents for their support and encouragement throughout our lives and this course. Thanks to
all our friends and well-wishers for their constant support.
DECLARATION
We hereby declare that the work which is being presented in this dissertation entitled,
“DETECTION OF AUTISM SPECTRUM DISORDER”, submitted towards the partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science & Engineering, Jyothishmathi Institute of Technology & Science,
Karimnagar, is an authentic record of our own work carried out under the supervision of R.
SATYA TEJA, Asst. Professor, CSE Dept, Jyothishmathi Institute of Technology and
Science, Karimnagar.
To the best of our knowledge and belief, this project bears no resemblance to any report
submitted to JNTUH or any other University for the award of any degree or diploma.
Date:
Place: Karimnagar
ABSTRACT
The dataset comprises key features, including behavioral assessment scores (A1 to A10), age in
months, Qchat-10 scores, gender, ethnicity, history of jaundice at birth, family history of autism,
and the relationship of the person completing the screening. The target variable indicates the
presence of autism traits.
The data preprocessing phase focused on converting categorical variables into numerical formats,
addressing missing or irrelevant data, and standardizing column names. Exploratory Data Analysis
(EDA) provided visual insights into feature distributions and correlations, identifying
Qchat-10-Score and family history as significant predictors.
Five machine learning classifiers were trained and evaluated: Random Forest, Support Vector
Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree, and Gaussian Naive Bayes.
Performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC metrics.
Among these, the Random Forest Classifier emerged as the best-performing model with high
accuracy and interpretability.
Key outputs include a serialized model for deployment, visualizations of feature importance and
model performance, and actionable insights on factors influencing ASD detection. This project
demonstrates the potential of machine learning in autism screening, offering a robust, scalable
solution to support early diagnosis and intervention. The integration of this model into practical
applications can significantly aid caregivers and healthcare professionals in identifying autism traits
effectively.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Project Overview
2 LITERATURE REVIEW
3 EXISTING & PROPOSED SYSTEM
3.1 Existing System
4 SYSTEM REQUIREMENTS
4.1 Software Requirements
5 PROJECT DESCRIPTION
6 SYSTEM DESIGN
6.1 System Architecture
7 SOFTWARE SPECIFICATIONS
7.1 Python
8 IMPLEMENTATION
8.1 Installations
8.2 Code
9 SOFTWARE TESTING
9.1 Unit Testing
9.4 Acceptance Testing
9.5 Performance Testing
9.6 Security Testing
10 RESULTS
11 FUTURE SCOPE
12 CONCLUSION
REFERENCES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 PROJECT OVERVIEW
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition that requires early
detection to ensure timely intervention and improved developmental outcomes for toddlers. The
Detection of Autism Spectrum Disorder project aims to build a machine learning model capable of
identifying autism traits in toddlers by analyzing behavioral and demographic data.
The project utilizes a comprehensive dataset containing behavioral assessment scores (A1 to A10),
age in months, Qchat-10 scores (quantifying communication and social skills), gender, ethnicity,
jaundice history at birth, family history of autism, and the relationship of the person completing the
screening. The target variable indicates the presence of autism traits.
The pipeline begins with data preprocessing, including converting categorical variables into
numerical formats, handling missing data, and standardizing column names. Exploratory Data
Analysis (EDA) highlights significant correlations between features, offering visual insights
through bar plots, heatmaps, and feature distribution graphs.
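A minimal pandas sketch of these preprocessing steps is shown below. It is an illustration under stated assumptions, not the project's actual code: the inline rows and the column names (`Age_Mons`, `Sex`, `Jaundice`, `Family_mem_with_ASD`, `Class/ASD Traits `) are hypothetical stand-ins for the real CSV, and median imputation is only one reasonable choice for a missing age.

```python
import re

import pandas as pd

# Hypothetical sample rows standing in for the real CSV (column names assumed).
raw = pd.DataFrame({
    "Age_Mons": [28, 36, None],
    "Sex": ["m", "f", "m"],
    "Jaundice": ["yes", "no", "No"],
    "Family_mem_with_ASD": ["no", "yes", "no"],
    "Class/ASD Traits ": ["Yes", "No", "Yes"],
})


def standardize_columns(df):
    """Lower-case names and collapse spaces/punctuation into single underscores."""
    out = df.copy()
    out.columns = [re.sub(r"[^0-9a-z]+", "_", c.strip().lower()).strip("_")
                   for c in out.columns]
    return out


def encode_yes_no(df, columns):
    """Map yes/no answers to 1/0, ignoring case and surrounding whitespace."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.strip().str.lower().map({"yes": 1, "no": 0})
    return out


clean = standardize_columns(raw)
clean = encode_yes_no(clean, ["jaundice", "family_mem_with_asd", "class_asd_traits"])
clean["sex"] = clean["sex"].map({"m": 1, "f": 0})
# Fill the missing age with the median of the observed ages.
clean["age_mons"] = clean["age_mons"].fillna(clean["age_mons"].median())
print(clean)
```

Standardizing names first means later code can refer to columns such as `jaundice` without worrying about stray spaces or capitalization in the raw header.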
Five machine learning models—Random Forest, Support Vector Machine (SVM), K-Nearest
Neighbors (KNN), Decision Tree, and Gaussian Naive Bayes—were trained and evaluated using
metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. The Random Forest Classifier
outperformed the other models, achieving the highest evaluation metrics and proving to be the most
suitable for ASD detection.
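The training-and-evaluation loop described above can be sketched with scikit-learn as follows. The data come from `make_classification`, a synthetic generator used here only as a stand-in for the screening dataset, so the printed scores say nothing about the project's actual results. Each model is wrapped in a pipeline with `StandardScaler` so that scaling is learned from the training split alone.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the screening data: 12 numeric features, binary target.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gaussian NB": GaussianNB(),
}

rows = []
for name, model in models.items():
    # The scaler inside the pipeline is fitted on the training split only.
    pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
    pred = pipe.predict(X_test)
    score = pipe.predict_proba(X_test)[:, 1]  # probability of the positive class
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, score),
    })

results = pd.DataFrame(rows).set_index("model").sort_values("f1", ascending=False)
print(results.round(3))
```

The table is sorted by F1 purely for illustration; in a screening setting, where a missed case is usually costlier than a false alarm, sorting by recall may be the more relevant choice.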
The project outputs include a serialized model ready for deployment, comprehensive visualizations
of data and model performance, and insights into feature importance. This project demonstrates
how machine learning can aid caregivers and healthcare professionals in efficiently screening for
autism traits, promoting early diagnosis and intervention.
1.2 PROJECT PURPOSE
The purpose of the “Detection of Autism Spectrum Disorder” project is to develop an efficient,
accurate, and scalable solution for the early screening of Autism Spectrum Disorder (ASD) traits in
toddlers. ASD is a neurodevelopmental condition that manifests through challenges in
communication, social interaction, and repetitive behaviors. Early identification of autism traits is
crucial for timely interventions, as they can significantly improve developmental outcomes,
enhance quality of life, and reduce the long-term impact on families and communities.
Despite the importance of early detection, traditional screening methods often face challenges such
as subjectivity, accessibility, and cost, limiting their effectiveness in large-scale applications. This
project aims to address these challenges by leveraging machine learning technology to provide an
automated, objective, and data-driven approach to ASD detection.
The project’s machine learning model is designed to analyze a variety of inputs, including
behavioral assessment scores, demographic factors, and health-related variables, to predict the
presence of autism traits. By training multiple classifiers—Random Forest, Support Vector Machine
(SVM), K-Nearest Neighbors (KNN), Decision Tree, and Gaussian Naive Bayes—the project
evaluates their performance to identify the most accurate and reliable model for ASD prediction.
This initiative not only offers a tool for healthcare professionals and caregivers to screen for autism
traits but also enhances understanding of the factors contributing to ASD diagnosis. Insights derived
from the data, such as the impact of Qchat-10 scores, family history, and demographic variables,
can further support research into autism risk factors.
Moreover, the serialized machine learning models produced in this project are designed for
integration into real-world applications, such as web-based tools or mobile apps, making them
accessible to a broader audience. This ensures that the solution is not only scientifically robust but
also practical and user-friendly, addressing a critical need for early ASD screening in diverse
settings.
Ultimately, this project strives to advance ASD diagnosis capabilities and provide a foundation for
future innovations in autism research and intervention strategies.
1.3 PROJECT SCOPE
The dataset includes behavioral assessment scores, demographic information, and health-related
factors such as Qchat-10 scores, family history of autism, and jaundice history at birth. These
features enable the model to predict the likelihood of autism traits accurately. The project aims
to preprocess the data, including converting categorical variables to numerical values, handling
missing data, and ensuring consistency in column formatting.
EDA is conducted to uncover relationships between features and their impact on autism traits,
providing visual insights through plots and heatmaps. Five machine learning algorithms—
Random Forest, SVM, KNN, Decision Tree, and Gaussian Naive Bayes—are trained and
evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics.
The project delivers a serialized machine learning model, visualizations, and insights into
significant predictors of autism traits. It is designed for practical use in real-world applications,
such as integration into web or mobile platforms, ensuring accessibility and scalability for
caregivers and healthcare professionals.
1.4 PROJECT FEATURES
The Detection of Autism Spectrum Disorder project incorporates several key features to ensure a
robust, accurate, and scalable solution for early autism detection. These features include:
1. Comprehensive Dataset
• The project utilizes a rich dataset that includes behavioral assessment scores (A1 to A10),
age, gender, ethnicity, Qchat-10 scores, jaundice history at birth, and family history of
autism.
• The target variable, “Traits,” indicates the presence of autism traits in toddlers.
2. Data Preprocessing
• Categorical variables are converted into numerical formats for seamless analysis (e.g.,
Yes/No to 1/0).
• Missing and irrelevant data are handled effectively to improve model performance.
• Consistent formatting ensures compatibility and accuracy in downstream processes.
3. Exploratory Data Analysis (EDA)
• Visual tools like Seaborn and Matplotlib are used to generate heatmaps, bar plots, and count
plots.
• These visuals highlight significant relationships and feature distributions, helping in feature
selection and insight generation.
4. Model Training
• Five classifiers are trained: Random Forest, SVM, KNN, Decision Tree, and Gaussian Naive
Bayes.
5. Model Evaluation
• Models are evaluated based on metrics like accuracy, precision, recall, F1-score, and
ROC-AUC.
6. Serialized Models
• Trained models are saved in .sav format, enabling real-time and scalable deployment in
applications.
7. Practical Applications
• The project output can be integrated into web or mobile platforms for use by healthcare
professionals and caregivers.
These features collectively ensure the project achieves its goal of providing a reliable tool for early
ASD detection.
CHAPTER 2
LITERATURE REVIEW
The early identification of ASD traits has been emphasized in numerous studies. According to
Dawson et al. (2010), early intervention during the first few years of life can substantially enhance
cognitive, social, and communication skills in children diagnosed with ASD. However, traditional
diagnostic processes rely heavily on behavioral assessments conducted by clinicians, which are
time-consuming, costly, and often subjective (Lord et al., 2000). These limitations underline the
need for scalable and objective methods for ASD screening.
Several screening tools have been developed to assist in the early detection of ASD traits. The
Modified Checklist for Autism in Toddlers (M-CHAT) is a widely used parent-reported
questionnaire designed to identify early signs of ASD (Robins et al., 2001). While effective in many
cases, M-CHAT has limitations, including dependency on parental understanding and potential for
high false-positive rates. Similarly, tools like the Autism Diagnostic Observation Schedule (ADOS)
provide comprehensive assessments but require trained professionals, making them less accessible
for large-scale use.
Traditional ASD screening methods face challenges such as subjectivity, reliance on parent-
reported data, and accessibility barriers. Furthermore, these tools may not capture nuanced
behavioral patterns or correlations between features that are detectable using data-driven
techniques. These limitations create a gap that machine learning approaches aim to fill by
leveraging patterns and relationships within large datasets.
4. Machine Learning in Autism Screening
Machine learning (ML) has emerged as a promising approach for improving ASD screening. ML
techniques can analyze complex datasets and identify patterns that are often undetectable by
humans. A study by Thabtah (2019) highlights the effectiveness of ML algorithms, such as decision
trees and neural networks, in classifying ASD traits with high accuracy. Similarly, Bone et al.
(2016) demonstrated the potential of ML in analyzing facial expressions and speech patterns for
autism detection.
Behavioral and demographic factors are pivotal in ASD screening. Studies have shown that
behavioral assessment scores, communication ability (e.g., Qchat-10), and demographic details like
age, gender, and family history significantly contribute to ASD diagnosis (Zwaigenbaum et al.,
2005). The inclusion of these features in ML models enhances predictive accuracy.
Supervised learning methods are commonly used in ASD-related research. Random Forest
classifiers, for instance, are valued for their ability to handle high-dimensional data and provide
feature importance rankings, as noted by Breiman (2001). Support Vector Machines (SVMs) are
also effective in classifying autism traits due to their robustness with small datasets (Cortes &
Vapnik, 1995). Naive Bayes classifiers have demonstrated potential in scenarios where simplicity
and computational efficiency are paramount.
A key focus in ASD research is comparing the performance of various ML algorithms. Studies
often evaluate models based on metrics such as accuracy, precision, recall, and F1-score. For
example, Duda et al. (2016) compared Random Forest, SVM, and Logistic Regression models for
ASD screening, concluding that Random Forest achieved the highest performance. Such
comparisons guide researchers in selecting the most suitable algorithm for their datasets and
objectives.
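The metrics named above have standard definitions in terms of the confusion-matrix counts TP, FP, FN and TN. The worked numbers below use hypothetical counts chosen only for illustration, not results from any cited study or from this project.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN},\\
F_1 &= \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}.\\[4pt]
\text{Example } (TP{=}40,\ FP{=}10,\ FN{=}5,\ TN{=}45):\quad
\text{Accuracy} &= \tfrac{85}{100} = 0.85,\quad
\text{Precision} = \tfrac{40}{50} = 0.80,\quad
\text{Recall} = \tfrac{40}{45} \approx 0.889,\quad
F_1 = \tfrac{80}{95} \approx 0.842.
\end{aligned}
```

Recall (sensitivity) measures the share of true ASD cases that a screener flags, which is why it is often weighted heavily when comparing screening models.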
The success of ML models for ASD screening depends heavily on data preprocessing and feature
engineering. Converting categorical variables into numerical formats, handling missing data, and
standardizing feature representations are critical steps (García et al., 2015). Proper preprocessing
ensures that the models can learn effectively from the data, improving predictive accuracy and
reliability.
ML-based tools have shown promise in real-world applications. Mobile applications and web
platforms equipped with ML models enable caregivers to conduct preliminary ASD screenings from
the comfort of their homes. For instance, a study by Abbas et al. (2020) showcased an ML-powered
mobile app that screens toddlers for ASD traits using parental inputs. Such tools reduce dependency
on clinical settings and make screening more accessible.
Despite their potential, ML models face challenges such as data quality, interpretability, and
generalizability. Small sample sizes, class imbalances, and missing data in ASD datasets can affect
model performance. Moreover, ensuring that ML models are interpretable and explainable is
essential for their adoption in clinical settings (Lipton, 2016). Addressing these challenges requires
rigorous data preprocessing, balanced datasets, and techniques to enhance model transparency.
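Two common mitigations for class imbalance are a stratified train-test split and class weighting. The sketch below assumes synthetic data with roughly a 4:1 class ratio (generated by `make_classification`, not the project's dataset) and uses scikit-learn's built-in `stratify` and `class_weight="balanced"` options; resampling methods are an alternative not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 80% negative, 20% positive.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2],
                           random_state=1)

# stratify=y keeps the class ratio (almost) identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# class_weight="balanced" weights samples inversely to their class frequency.
model = RandomForestClassifier(class_weight="balanced", random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

train_ratio = float((y_train == 1).mean())
test_ratio = float((y_test == 1).mean())
print(f"positive share: train={train_ratio:.3f}, test={test_ratio:.3f}")
print("recall (positive class):", round(recall_score(y_test, pred), 3))
print("balanced accuracy:", round(balanced_accuracy_score(y_test, pred), 3))
```

Balanced accuracy averages the recall of each class, so unlike plain accuracy it does not reward a model that simply predicts the majority class.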
The Detection of Autism Spectrum Disorder project builds upon the existing body of research by
developing a comprehensive ML-based solution for early ASD screening in toddlers. The project
integrates behavioral and demographic data, performs rigorous data preprocessing, and trains
multiple supervised learning models. By evaluating these models using metrics such as accuracy,
precision, recall, and F1-score, the project identifies the best-performing classifier for ASD
detection. The outputs, including serialized models and visual insights, contribute to the broader
goal of making ASD screening accessible, efficient, and scalable.
Future research in ML-based ASD screening could explore the integration of additional data
sources, such as genetic or environmental factors, to enhance predictive accuracy. Advances in deep
learning may further improve the ability to analyze complex patterns in behavioral and
physiological data. Additionally, improving model interpretability and addressing ethical concerns,
such as data privacy and algorithmic bias, will be critical for widespread adoption.
Conclusion
This literature review underscores the importance of early ASD detection and the potential of
machine learning to transform screening methodologies. By addressing the limitations of traditional
methods, ML-based approaches provide a scalable, objective, and efficient alternative. The
Detection of Autism Spectrum Disorder project contributes to this growing field by offering a
practical and scientifically robust solution for early autism screening, advancing both research and
real-world applications.
CHAPTER 3
EXISTING & PROPOSED SYSTEM
The current systems for Autism Spectrum Disorder (ASD) detection primarily rely on traditional
diagnostic tools such as the Modified Checklist for Autism in Toddlers (M-CHAT) and the Autism
Diagnostic Observation Schedule (ADOS). These tools require caregivers or professionals to fill out
detailed questionnaires and observe behavioral patterns. While effective in many cases, they are
time-consuming, subjective, and prone to errors based on individual interpretations. Moreover,
accessibility to trained professionals and resources often limits widespread adoption, especially in
low-resource settings.
Additionally, existing methods lack scalability and automation, making them unsuitable for large-
scale screening. There is a pressing need for more efficient and objective systems capable of
analyzing complex patterns in behavioral and demographic data.
1. Subjectivity in Diagnosis: Traditional autism screening methods like M-CHAT and ADOS
heavily rely on subjective inputs from caregivers and professionals. This dependency
introduces variability in the results and increases the chances of misdiagnosis or false
positives.
2. Time and Resource Intensive: Current diagnostic processes often require significant time
and skilled professionals, which limits their scalability and accessibility, particularly in
remote or under-resourced areas.
3. Limited Automation: Existing systems lack automation, making it challenging to handle
large-scale screening efficiently. Manual assessments are labor-intensive and prone to
human error.
4. High Cost of Implementation: Diagnostic tools like ADOS involve specialized training
and resources, which can be prohibitively expensive for many families or organizations.
5. Inability to Capture Complex Patterns: Traditional systems do not leverage data-driven
techniques to uncover subtle patterns in behavioral and demographic data, which could
enhance diagnostic accuracy.
These limitations highlight the need for innovative, efficient, and scalable solutions like
machine learning-based models.
The lack of scalable, automated, and objective screening methods creates a significant gap in early
ASD detection. Current systems fail to leverage advanced data analysis techniques to identify subtle
patterns in behavioral and demographic data. This project addresses these limitations by developing
a machine learning-based model for efficient and accurate ASD screening.
The Detection of Autism Spectrum Disorder project introduces a machine learning-based system to
address the limitations of traditional diagnostic methods. The proposed system offers the following
features and improvements:
1. Comprehensive Data Integration
o Combines behavioral assessment scores, demographic factors, and health-related
information to create a robust dataset for analysis.
2. Machine Learning Models
o Trains multiple supervised learning algorithms, including Random Forest, Support
Vector Machine, Decision Tree, K-Nearest Neighbors, and Naive Bayes, for accurate
predictions.
3. Automated Feature Selection
o Uses data preprocessing and exploratory data analysis (EDA) to identify significant
predictors, such as Qchat-10 scores and family history of ASD.
4. Evaluation and Optimization
o Compares classifiers based on accuracy, precision, recall, and F1-score, selecting the
best-performing model for deployment.
5. Deployment Readiness
o Serializes the trained models for integration into real-world applications like web-
based tools or mobile apps.
6. Scalable Design
o Developed for large-scale implementation, suitable for diverse settings and resource-
constrained environments.
This proposed system bridges the gap in ASD screening, providing a robust, automated, and
scalable solution.
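One simple, automatable way to surface significant predictors during EDA is to rank features by their absolute correlation with the target. In the sketch below, the columns `score`, `family_history` and `noise` are hypothetical synthetic features, and the 0.1 cut-off is an arbitrary illustrative threshold, not a value taken from the project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
# Hypothetical data: "score" and "family_history" drive the target, "noise" does not.
score = rng.integers(0, 11, size=n)          # e.g. a 0-10 questionnaire total
family_history = rng.integers(0, 2, size=n)  # 0 = no, 1 = yes
noise = rng.normal(size=n)
target = ((score + 2 * family_history + rng.normal(scale=1.5, size=n)) > 6).astype(int)

df = pd.DataFrame({"score": score, "family_history": family_history,
                   "noise": noise, "target": target})

# Absolute Pearson correlation of every feature with the target, strongest first.
corr_with_target = (df.corr()["target"].drop("target").abs()
                    .sort_values(ascending=False))
selected = corr_with_target[corr_with_target > 0.1].index.tolist()
print(corr_with_target.round(3))
print("selected:", selected)
```

The same `df.corr()` matrix is what `seaborn.heatmap(df.corr(), annot=True)` would draw as a correlation heatmap. Pearson correlation only captures linear relationships, so a model-based ranking such as Random Forest feature importances is a useful cross-check.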
The proposed system offers several advantages that address the limitations of the existing system:
1. Efficient and Automated: Reduces manual effort and time required for ASD screening
through automated processes.
2. Objective Analysis: Provides unbiased predictions using data-driven insights.
3. Widespread Accessibility: Ensures scalability for large-scale ASD screening in various
environments, including remote areas.
4. Accurate Predictions: Delivers high precision and recall, reducing false positives and
negatives.
5. Cost-Effective: Minimizes reliance on expensive diagnostic tools and professional expertise,
lowering costs.
6. Practical Integration: Deployment-ready models can be integrated into accessible platforms
for real-time usage.
7. Early Intervention Support: Facilitates timely diagnosis, enabling early intervention and
better outcomes for toddlers.
8. Customizable Framework: Allows further enhancement by incorporating additional datasets
or features to improve accuracy and relevance.
Together, these features and advantages keep the proposed system aligned with the project's
objectives.
CHAPTER 4
SYSTEM REQUIREMENTS
CHAPTER 5
PROJECT DESCRIPTION
The Detection of Autism Spectrum Disorder project leverages machine learning to develop an
efficient and scalable screening tool for identifying autism traits in toddlers. ASD is a
neurodevelopmental condition characterized by challenges in communication, social interaction,
and repetitive behaviors. Early detection is critical for timely intervention and improved outcomes,
but traditional methods like M-CHAT and ADOS are often subjective, time-consuming, and
resource-intensive.
This project uses a dataset containing behavioral assessment scores, demographic details, and
health-related factors to train machine learning models, including Random Forest, SVM, Decision
Tree, KNN, and Naive Bayes. By preprocessing the data and evaluating models using metrics such
as accuracy, precision, recall, and F1-score, the project identifies the best-performing classifier. The
resulting serialized model is ready for integration into real-world applications, offering a cost-
effective and automated solution for early ASD detection.
CHAPTER 6
SYSTEM DESIGN
▪ Grievance submission.
▪ Contact us.
▪ Campus.
▪ Department.
▪ Principal's desk.
3. Data Layer
• Purpose: Manages data storage and retrieval.
• Technology Used: MongoDB.
• Responsibilities:
o Secure storage of grievance form submissions and other necessary data.
o Facilitates authorized access for data review and action.
Component Interactions
1. User Requests and Interactions:
o Processed by the Presentation Layer through Flutter widgets.
o Handles inputs and displays necessary information to users.
2. Data Processing:
o Managed by the Application Layer.
o Processes user inputs, displays relevant data, and handles notifications.
3. Data Storage and Retrieval:
o Performed by the Data Layer using the MongoDB database.
o Ensures secure storage and provides data access to the application layer as needed.
A Data Flow Diagram (DFD) is a graphical representation of the flow of data within a
system or process. It is a modeling technique that shows how data moves through processes
and how it is stored, transformed, or exchanged within a system. DFDs are often used in
system analysis and design to visualize and understand the data processing and flow of
information in a structured manner.
6.3 UML DESIGN
elements of a system. The five UML behavioral diagrams considered here are:
1. Use case diagram
2. Sequence diagram
3. Collaboration diagram
4. State chart diagram
5. Activity diagram
6.3.1 Use Case Diagram
Use Case Diagrams in UML describe interactions between a system and external
entities known as actors. Use cases represent specific functionalities or scenarios that the
system provides to its users.
6.3.2 Class Diagram
The Class Diagram in UML illustrates the static structure of a system, detailing
classes, attributes, methods, and their relationships. Rectangles represent classes, and lines
indicate associations, dependencies, and inheritance.
6.3.3 Activity Diagram
6.3.4 Sequence Diagram
CHAPTER 7
SOFTWARE SPECIFICATIONS
1. Programming Language
• Python 3.7 or above for implementing data preprocessing, machine learning models,
evaluation, and visualizations.
• Google Colab: Preferred for cloud-based development; GPU/TPU runtimes are available,
although the scikit-learn models used in this project train on the CPU.
• Jupyter Notebook: Alternative for local experimentation and debugging.
4. Operating System
• Fully compatible with major operating systems such as Windows, Linux, and MacOS.
• Google Drive API: Useful for storing and retrieving datasets and serialized models when
using Google Colab.
CHAPTER 8
IMPLEMENTATION
The implementation of the Detection of Autism Spectrum Disorder project involves a systematic
pipeline comprising data preprocessing, exploratory data analysis (EDA), machine learning model
training, evaluation, and deployment. Below is a detailed breakdown:
2. Data Preprocessing
• Objective: Prepare the dataset for machine learning models.
• Steps:
o Handle missing values using imputation techniques.
o Convert categorical variables (e.g., Sex, Jaundice) into numerical formats (e.g., Yes
→ 1, No → 0).
o Normalize or scale numerical features to standardize data.
o Split the dataset into training and testing sets using train_test_split from scikit-learn.
5. Model Evaluation
• Objective: Evaluate the performance of trained models.
• Steps:
o Use metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
o Generate classification reports and confusion matrices for each model.
o Visualize model performance using bar plots and ROC curves.
o Identify the best-performing model (e.g., Random Forest Classifier) based on
evaluation metrics.
7. Model Serialization
• Objective: Save the trained models for deployment and real-time predictions.
• Steps:
o Serialize the best-performing model using pickle.dump().
o Save the model as a .sav file for easy reuse.
8. Deployment Readiness
• Objective: Prepare the system for integration into applications.
• Steps:
o Test the serialized model by loading it using pickle.load() and making predictions on
sample inputs.
o Design a basic user interface or API for interacting with the model.
o Deploy the system on a web or mobile platform to enable real-time autism screening.
9. Outputs
• Preprocessed Data: Clean and structured dataset ready for modeling.
• Visual Insights: Correlation heatmaps, feature importance charts, and performance bar
plots.
• Serialized Models: Saved .sav files of trained models for deployment.
• Best Model: Random Forest Classifier with high accuracy and recall.
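The serialization and reload steps listed above can be sketched with the standard `pickle` module. The model here is trained on synthetic data, and the file name `autism_model.sav` and the temporary directory are hypothetical choices for the example.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small model on synthetic data (stand-in for the project's best model).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Serialize with pickle to a .sav file.
path = os.path.join(tempfile.mkdtemp(), "autism_model.sav")  # hypothetical name
with open(path, "wb") as fh:
    pickle.dump(model, fh)

# Later (e.g. inside an API handler): load the file and score one sample.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

sample = X[:1]  # one row, shape (1, n_features)
label = loaded.predict(sample)[0]
prob = loaded.predict_proba(sample)[0, 1]
print(f"predicted label={label}, P(traits)={prob:.2f}")
```

Because the project scales features with `StandardScaler` before training, a deployed model also needs the fitted scaler; saving a single scikit-learn `Pipeline` (scaler plus classifier) keeps the two together. `pickle.load` should only be used on trusted files, since unpickling can execute arbitrary code.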
8.2 CODE
Here are the core parts of the code for the Detection of Autism Spectrum Disorder:
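import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report, roc_curve, auc)
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
file_path = r"C:\Users\saish\OneDrive\Documents\Autism\Autism\Toddler Autism dataset July 2023.csv"
df = pd.read_csv(file_path)

# Preprocessing (not shown in the original excerpt). Assumed minimal version:
# label-encode every text column; the target column is assumed to be "Traits".
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))
X = df.drop(columns=["Traits"])
y = df["Traits"]
feature_names = X.columns  # keep the names; scaling below returns NumPy arrays

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Feature scaling: fit on the training split only, then apply to the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define classifiers
classifiers = {
    'Naive Bayes': GaussianNB(),
    'Support Vector Machine': SVC(probability=True),
    'Random Forest': RandomForestClassifier(),
    'Decision Tree': DecisionTreeClassifier(),
    'K-Nearest Neighbors': KNeighborsClassifier()
}

# Train every classifier and report its test accuracy
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")

# Feature importance of the fitted Random Forest
feature_importance = classifiers['Random Forest'].feature_importances_
plt.figure(figsize=(10, 6))
sns.barplot(x=feature_importance, y=feature_names, hue=feature_importance,
            palette="viridis", legend=False)
plt.title("Feature Importance in Random Forest Model")
plt.xlabel("Importance Score")
plt.ylabel("Feature")
plt.show()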
CHAPTER 9
SOFTWARE TESTING
Software testing is a crucial phase in the development lifecycle that ensures the reliability,
functionality, and performance of the autism detection system. In the context of this project,
testing verifies that the system processes data accurately, predicts autism traits effectively, and
performs robustly under diverse conditions, meeting stakeholder expectations.
Acceptance testing in the Detection of Autism Spectrum Disorder project ensures the system
meets all stakeholder requirements and project objectives. It validates that the machine learning
models accurately predict autism traits, achieving acceptable precision, recall, and accuracy
metrics. The system is tested for usability by verifying outputs like visualizations, feature
importance analysis, and serialized models for deployment readiness. For example, predictions
made by the best-performing model, such as Random Forest, are compared against expected
results to ensure reliability. Acceptance testing confirms that the project delivers a scalable,
accurate, and practical solution for early autism detection, ready for real-world applications.
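Acceptance criteria of this kind can be written as an automated check that compares evaluation metrics against agreed minimums. The thresholds below are hypothetical placeholders, and the model is trained on synthetic `make_classification` data rather than the project's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical acceptance thresholds; real values would come from stakeholders.
THRESHOLDS = {"accuracy": 0.80, "precision": 0.75, "recall": 0.80}


def acceptance_check(y_true, y_pred, thresholds=THRESHOLDS):
    """Return (passed, metrics); passed is True only if every metric meets its minimum."""
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
    }
    passed = all(metrics[name] >= minimum for name, minimum in thresholds.items())
    return passed, metrics


X, y = make_classification(n_samples=800, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    stratify=y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
passed, metrics = acceptance_check(y_test, model.predict(X_test))
print("PASS" if passed else "FAIL", {k: round(v, 3) for k, v in metrics.items()})
```

Running such a check in a test suite turns "acceptable precision, recall, and accuracy" into a concrete pass/fail gate that fails loudly if a retrained model regresses.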
Performance testing in the Detection of Autism Spectrum Disorder project evaluates the
system’s efficiency under various conditions. It measures key metrics such as model training
time, prediction speed, and resource utilization. For example, the system is tested to ensure it
handles large datasets efficiently without significant delays or memory issues. The Random
Forest model's ability to maintain high accuracy and recall under heavy data loads is also
analyzed. Performance testing ensures that the system delivers optimal results in both resource-
constrained and high-demand environments, making it scalable and reliable for real-world
applications like web or mobile platform integration.
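Training time and prediction speed can be measured with a simple wall-clock harness. The following sketch assumes synthetic data and hypothetical dataset sizes. It times only `fit` and `predict`, and does not cover memory or other resource utilization.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def time_model(n_samples, n_features=10, random_state=0):
    """Train and predict on synthetic data; return (fit seconds, predict ms per sample)."""
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               random_state=random_state)
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)

    start = time.perf_counter()
    model.fit(X, y)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X)
    predict_ms = (time.perf_counter() - start) / n_samples * 1000
    return fit_seconds, predict_ms


# Hypothetical dataset sizes; increase them to probe the "heavy data load" case.
for n in (1_000, 5_000):
    fit_s, pred_ms = time_model(n)
    print(f"n={n:>5}: fit {fit_s:.2f} s, predict {pred_ms:.4f} ms/sample")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring short intervals.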
Security testing in the Detection of Autism Spectrum Disorder project ensures the system
handles sensitive data securely and adheres to privacy standards. It verifies that behavioral and
demographic information is protected during data preprocessing, model training, and
deployment. For example, encryption is tested for serialized models to prevent unauthorized
access or tampering. The system is evaluated for compliance with data protection regulations
such as GDPR or HIPAA, ensuring safe storage and processing of personal information.
Security testing safeguards against vulnerabilities, such as data breaches or leaks, making the
project reliable for real-world use in healthcare and other sensitive environments.
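A tamper test for serialized models can be sketched with Python's standard `hmac` module. Note that this swaps in a different technique from encryption: an HMAC-SHA256 tag detects modification of the model file but does not hide its contents. Confidentiality would require an actual encryption layer on top. The helper names, the in-memory key, and the placeholder model object are all hypothetical.

```python
import hashlib
import hmac
import pickle
import secrets


def sign_model(model, key):
    """Serialize a model and return (payload bytes, HMAC-SHA256 tag)."""
    payload = pickle.dumps(model)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload, tag


def load_verified(payload, tag, key):
    """Unpickle only after the tag checks out; never unpickle unverified bytes."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):  # constant-time comparison
        raise ValueError("model file failed integrity check")
    return pickle.loads(payload)


# Hypothetical: in practice the key would come from a secrets store, not code.
key = secrets.token_bytes(32)
# Placeholder object; a fitted RandomForestClassifier is pickled the same way.
model = {"name": "placeholder classifier", "threshold": 0.5}

payload, tag = sign_model(model, key)
print(load_verified(payload, tag, key))

# Simulated tampering: flip one bit of the stored payload.
tampered = payload[:-1] + bytes([payload[-1] ^ 1])
try:
    load_verified(tampered, tag, key)
except ValueError as err:
    print("rejected:", err)
```

The check runs before `pickle.loads` because unpickling untrusted bytes can execute arbitrary code.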
Compatibility testing in the Detection of Autism Spectrum Disorder project ensures the system
functions consistently across different platforms, environments, and configurations. It verifies
that the code runs consistently on various operating systems, including Windows, Linux, and
macOS. Additionally, compatibility is tested across different Python versions and library
dependencies to avoid conflicts. For deployment, the serialized machine learning models are
evaluated for integration with web or mobile platforms. This testing also ensures that
visualization outputs, such as plots and graphs, render correctly across devices. Compatibility
testing therefore supports the system's adaptability, scalability, and reliability in diverse usage scenarios.
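Part of this testing can be automated by reporting the platform and checking installed library versions. The sketch below makes these assumptions:

- the minimum versions in `REQUIREMENTS` and `MIN_PYTHON` are hypothetical, since the report does not pin any;
- the small version parser is a simplification (the `packaging` library's `Version` class handles the full version grammar).

```python
import platform
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical minimums; the project does not specify required versions.
MIN_PYTHON = (3, 8)
REQUIREMENTS = {
    "numpy": "1.21",
    "pandas": "1.3",
    "scikit-learn": "1.0",
    "matplotlib": "3.5",
    "seaborn": "0.13",
}


def parse_version(text):
    """Keep the leading digits of each dot-separated part, stopping at a part
    with none, and pad to three numbers: '1.0rc1' -> (1, 0, 0)."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def check_environment(requirements=REQUIREMENTS):
    """Return {package: (installed version or None, meets minimum?)}."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, parse_version(installed) >= parse_version(minimum))
    return report


print(platform.system(), platform.release(), "Python", platform.python_version())
print("Python version OK:", sys.version_info >= MIN_PYTHON)
for package, (installed, ok) in check_environment().items():
    print(f"{package:<13} {installed or 'missing':<10} {'OK' if ok else 'CHECK'}")
```

Running the same script on each target operating system gives a comparable report per machine.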
CHAPTER 10
RESULTS
This table provides detailed evaluation metrics for each classifier, including
precision, recall, F1-score, and support. It compares the performance of models like
Random Forest, SVM, Decision Tree, KNN, and Naive Bayes, highlighting their strengths
in correctly identifying autism traits.
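A per-classifier report of this kind can be produced with scikit-learn's `classification_report`. This is a minimal sketch on synthetic data. The class labels "No ASD traits" / "ASD traits" and the default hyperparameters are assumptions, not the project's settings. SVM and KNN are wrapped with feature scaling because both are distance-sensitive.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the screening dataset.
X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
}

reports = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_train, y_train).predict(X_test)
    # Dictionary form is convenient for building tables or plots later.
    reports[name] = classification_report(y_test, y_pred, output_dict=True)
    print(f"=== {name} ===")
    print(classification_report(y_test, y_pred,
                                target_names=["No ASD traits", "ASD traits"]))
```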
• Feature Correlation Heatmap
The heatmap illustrates the correlation between different features in the dataset,
such as Qchat-10 scores, behavioral assessments, and family history. Strong correlations
are highlighted, providing insights into relationships that significantly impact ASD
prediction.
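A heatmap like this is typically drawn from a pandas correlation matrix. The sketch below builds a synthetic DataFrame whose column names only echo the kinds of features mentioned above. The 0/1 answer columns, the summed score, the family-history column, and the score-above-3 label rule are all hypothetical stand-ins, not the project's data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
n = 300

# Synthetic placeholder data: ten 0/1 screening answers A1..A10.
df = pd.DataFrame(rng.integers(0, 2, size=(n, 10)),
                  columns=[f"A{i}" for i in range(1, 11)])
df["Qchat_10_Score"] = df.sum(axis=1)              # sum of the ten answers
df["Family_history"] = rng.integers(0, 2, size=n)  # random 0/1 flag
df["ASD_traits"] = (df["Qchat_10_Score"] > 3).astype(int)  # hypothetical cutoff

corr = df.corr()  # Pearson correlation between every pair of columns

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Feature Correlation Heatmap")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
plt.close(fig)
```

Fixing `vmin=-1` and `vmax=1` keeps the colour scale comparable across datasets.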
This bar plot visualizes the precision, recall, and F1-scores for all classifiers,
offering a clear comparison of model performance. It helps identify the classifiers whose
performance is most balanced across these metrics.
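A grouped bar plot of these three metrics can be built from `precision_recall_fscore_support` and pandas plotting. This is a sketch on synthetic data. Only three of the five classifiers are included, for brevity, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=7)

models = {
    "Random Forest": RandomForestClassifier(random_state=7),
    "Decision Tree": DecisionTreeClassifier(random_state=7),
    "Naive Bayes": GaussianNB(),
}

rows = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # average="binary" scores the positive class (1); support is returned as None.
    p, r, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
    rows[name] = {"Precision": p, "Recall": r, "F1-score": f1}

metrics = pd.DataFrame(rows).T  # one row per model, one column per metric
ax = metrics.plot.bar(rot=0, ylim=(0, 1), figsize=(8, 5))
ax.set_ylabel("Score")
ax.set_title("Precision, Recall and F1-score by Classifier")
plt.tight_layout()
plt.savefig("metric_comparison.png")
plt.close()
```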
• Receiver Operating Characteristic (ROC) Curve
The ROC curves compare the performance of classifiers based on their ability
to distinguish between classes (autism traits detected vs. not detected). AUC scores
for each model indicate their effectiveness, with higher AUC values reflecting better
performance.
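ROC curves and AUC are computed from continuous scores, not hard labels. The sketch below makes these assumptions:

- synthetic data replaces the project's dataset;
- only classifiers that provide `predict_proba` are included;
- the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

aucs = {}
fig, ax = plt.subplots(figsize=(6, 6))
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    fpr, tpr, _ = roc_curve(y_test, scores)
    aucs[name] = roc_auc_score(y_test, scores)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.2f})")

ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance (AUC = 0.50)")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("ROC Curves")
ax.legend(loc="lower right")
fig.savefig("roc_curves.png")
plt.close(fig)
```

An SVM without `probability=True` has no `predict_proba`. Its `decision_function` output can be passed to `roc_curve` instead, because ROC analysis only needs scores that rank the samples.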
• Feature Importance Plot
This plot ranks the features based on their contribution to the predictions in the
Random Forest classifier. For instance, Qchat-10 scores and family history may emerge as
the most influential predictors, guiding future research and interventions.
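The `feature_importance` and `feature_names` arrays passed to the `sns.barplot` call earlier in the report can be obtained from a fitted forest's `feature_importances_` attribute. The sketch below uses synthetic data with placeholder feature names. `shuffle=False` keeps the three informative columns first, so the ranking is easy to sanity-check.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: columns 0-2 are informative, the rest are noise.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Impurity-based (mean decrease in impurity) importances; they sum to 1.
ranked = pd.Series(forest.feature_importances_, index=names).sort_values(ascending=False)
print(ranked.round(3))

# Arrays in the form used by the bar plot code.
feature_importance = ranked.values
feature_names = ranked.index.tolist()
```

Impurity-based importances can favour features with many distinct values. A held-out check such as permutation importance is a useful cross-check.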
This bar chart compares the overall accuracy of all classifiers. It visually identifies the
best-performing models, such as Random Forest and Decision Tree, which achieve the highest
accuracy in autism trait detection.
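Accuracy from a single train/test split can vary with the split, so a comparison chart is often built from cross-validated scores. The following sketch assumes synthetic data, default hyperparameters, and 5-fold stratified cross-validation. It does not reproduce the project's exact evaluation protocol.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=3)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=3),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=3),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)

rows = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    rows[name] = {"mean": scores.mean(), "std": scores.std()}

summary = pd.DataFrame(rows).T.sort_values("mean", ascending=False)
print(summary.round(3))

# Error bars show the spread of accuracy across the five folds.
ax = summary["mean"].plot.bar(yerr=summary["std"], rot=30, ylim=(0, 1))
ax.set_ylabel("Mean accuracy (5-fold CV)")
ax.set_title("Accuracy Comparison of Classifiers")
plt.tight_layout()
plt.savefig("accuracy_comparison.png")
plt.close()
```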
CHAPTER 11
FUTURE SCOPE
The Detection of Autism Spectrum Disorder project lays a solid foundation for utilizing machine
learning to enhance early ASD detection. Its future scope extends to various areas, promising both
technological advancements and broader practical applications:
• Expanding the dataset by incorporating more diverse demographic and behavioral data will
enhance the model's accuracy and generalizability.
• Including additional features such as genetic, environmental, and neurological data could
provide deeper insights into ASD risk factors.
• Utilizing deep learning algorithms like Convolutional Neural Networks (CNNs) for image-
based ASD detection (e.g., brain scans or facial patterns).
• Employing ensemble techniques or hybrid models to further improve prediction accuracy
and minimize false positives/negatives.
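One form the ensemble idea could take is a soft-voting combination of existing classifiers. The sketch below makes these assumptions:

- synthetic data replaces the project's dataset;
- the choice of member models is hypothetical.

It reports false positives and false negatives from the confusion matrix so the ensemble can be compared with a single Random Forest. It does not claim the ensemble will win.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=5)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=5)),
        # probability=True gives SVC the predict_proba that soft voting needs
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=5))),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average the members' predicted class probabilities
)

candidates = {
    "Random Forest": RandomForestClassifier(random_state=5),
    "Soft-voting ensemble": ensemble,
}

results = {}
for name, model in candidates.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = confusion_matrix(y_test, y_pred)
    tn, fp, fn, tp = results[name].ravel()
    print(f"{name:<22} false positives={fp}, false negatives={fn}")
```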
3. Real-Time Applications
• Developing user-friendly web and mobile applications to deploy the model for real-time
ASD screening.
• Integrating the system into telehealth platforms for remote diagnosis, making ASD
screening accessible globally, especially in underserved areas.
• Enhancing the model to provide personalized risk profiles and tailored recommendations for
intervention plans.
• Leveraging feedback from caregivers and healthcare professionals to continuously refine the
system.
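A web or mobile backend would typically load the serialized model once and call it per request. The sketch below makes these assumptions:

- `joblib` (installed alongside scikit-learn) handles serialization;
- the file name `asd_model.joblib` and the `screen` helper are hypothetical;
- synthetic data replaces the project's dataset.

The function returns a screening probability, not a diagnosis.

```python
import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

MODEL_PATH = "asd_model.joblib"  # hypothetical file name

# Training side: fit once and serialize.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
joblib.dump(RandomForestClassifier(random_state=1).fit(X, y), MODEL_PATH)

# Serving side (e.g. inside a web or mobile backend): load once, reuse per request.
model = joblib.load(MODEL_PATH)


def screen(features):
    """Return the model's estimated probability of ASD traits for one child.

    `features` must use the same column order as the training data.
    """
    row = np.asarray(features, dtype=float).reshape(1, -1)
    if row.shape[1] != model.n_features_in_:
        raise ValueError(f"expected {model.n_features_in_} features, got {row.shape[1]}")
    return float(model.predict_proba(row)[0, 1])


print(f"Estimated probability of ASD traits: {screen(X[0]):.2f}")
```

Validating the input length before calling the model catches mismatches between the app's form fields and the training columns early.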
5. Cross-Domain Expansion
• Adapting the model for other neurodevelopmental conditions like ADHD or learning
disabilities, using similar datasets and methodologies.
• Ensuring data privacy and ethical use of patient data by adhering to regulations like GDPR
and HIPAA.
• Developing explainable AI models to increase trust and transparency in predictions,
especially for healthcare professionals and caregivers.
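One simple, model-agnostic explainability technique is permutation importance: shuffle one feature at a time on held-out data and measure how much accuracy drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data with placeholder feature names. It gives a global explanation only. Per-prediction explanations would need other tools.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                           n_redundant=0, random_state=11)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=11)

model = RandomForestClassifier(random_state=11).fit(X_train, y_train)

# Each feature is shuffled n_repeats times on the test set; the mean drop in
# accuracy estimates how much the model relies on that feature.
result = permutation_importance(model, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=11)

explanation = pd.DataFrame(
    {"mean_drop": result.importances_mean, "std": result.importances_std},
    index=feature_names,
).sort_values("mean_drop", ascending=False)
print(explanation.round(3))
```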
By addressing these future opportunities, the project can significantly advance autism research,
improve accessibility to early screening, and empower caregivers and healthcare providers with
innovative tools for timely and effective intervention.
CHAPTER 12
CONCLUSION
The Detection of Autism Spectrum Disorder project demonstrates the effectiveness of machine
learning in developing a scalable, efficient, and accurate screening tool for identifying autism traits
in toddlers. By leveraging behavioral and demographic data, the project addresses the limitations of
traditional diagnostic methods, such as subjectivity, high cost, and limited scalability.
The project successfully implemented multiple machine learning algorithms, including Random
Forest, SVM, Decision Tree, KNN, and Naive Bayes, to predict ASD traits. Through rigorous
evaluation using metrics like accuracy, precision, recall, and F1-score, the Random Forest Classifier
was identified as the best-performing model. Insights from feature importance analysis revealed key
predictors, such as Qchat-10 scores and family history, which can guide further research and
intervention strategies.
This project also emphasizes practical deployment by providing serialized models for real-world
applications, including integration into web and mobile platforms. Such applications can make ASD
screening more accessible to caregivers and healthcare professionals, enabling early diagnosis and
timely intervention, which are critical for improving developmental outcomes.
In conclusion, this project not only enhances early ASD detection but also lays the groundwork for
future innovations in autism research and neurodevelopmental screening. It represents a significant
step toward leveraging data-driven technology to improve the lives of individuals with autism and
their families.
REFERENCES