
International Journal of Engineering & Technology, 5 (x) (2017) xxx-xxx

International Journal of Engineering & Technology

Website: www.sciencepubco.com/index.php/IJET
Research paper

A machine learning based approach to classify Autism with optimum behaviour sets

Vaishali R 1, Sasikala R 2*

1 School of Computer Science and Engineering, VIT University, Vellore, India
2 School of Computer Science and Engineering, VIT University, Vellore, India
*Corresponding author E-mail: sasikala.ra@vit.ac.in

Abstract

Machine learning based behavioural analytics emphasizes the need to develop accurate prediction models that detect the risk of autism faster than traditional diagnostic methods. The quality of prediction relies on the accuracy of the supplied dataset and the machine learning model. To improve prediction accuracy, dimensionality reduction with feature selection is applied to eliminate noisy features from a dataset. In this work, an ASD diagnosis dataset with 21 features obtained from the UCI machine learning repository is experimented on with a swarm intelligence based binary firefly feature selection wrapper. The alternative hypothesis of the experiment claims that a machine learning model can achieve better classification accuracy with minimal feature subsets. Using the swarm intelligence based single-objective binary firefly feature selection wrapper, it is found that 10 of the 21 features of the ASD dataset are sufficient to distinguish between ASD and non-ASD patients. The results obtained with our approach support the hypothesis, producing an average accuracy in the range of 92.12%-97.95% with optimum feature subsets, approximately equal to the average accuracy produced by the entire ASD diagnosis dataset.

Keywords: Autism Spectrum Disorder, Behavioural Analytics, Machine Learning, Feature Selection

1. Introduction

Autism is a childhood disorder that has become more prevalent among younger generations in the recent decade. According to the Centers for Disease Control and Prevention, there is sustained growth in the number of children diagnosed with autism: 1 in 68 children under the age of 8 in the United States of America is diagnosed with autism [1]. Autism diagnosis is a clinical examination procedure conducted according to the DSM-V standards for disorder classification [2]. These standards were coined by US mental health professionals based on their successful diagnostic experiences and contributions, and the procedures are widely incorporated in behavioural analytics for classification of ASD from non-ASD. In addition to the DSM-V standards, interview and questionnaire based clinical examinations are also followed for behaviour classification. ADI-R and ADOS are common behaviour tests carried out by pediatricians for detection of childhood autism symptoms. These clinical examinations are conducted by certified professionals in laboratory conditions and can last up to 60 minutes depending on the patient's responsiveness. The certified professional awards a binary score based on the quality of each response, and the consolidated scores decide the severity of autism in the patient.

In [3], an ASD diagnostic dataset comprising 21 behavioural attributes is taken for the task of classifying ASD patients from non-ASD. That work adapted a mobile application based ASD screening approach obeying DSM-V fulfillment for autism detection; the behaviour dataset collected 292 samples of children's autism screening episodes. The author also suggests feature selection as a measure for improving the prediction accuracy of machine learning models. This

Copyright © 2016 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

process of obtaining better accuracy with an optimum feature subset that represents the structure of the entire dataset is called dimensionality reduction. Of the two approaches to dimensionality reduction, feature selection is recommended for real-world datasets. Any dataset that exceeds 10 features falls under the problem of high dimensionality; with 21 features, the ASD dataset is high dimensional by this criterion. In the literature, a work to classify ASD from ADHD applied a filter based forward feature elimination approach on a different ASD behaviour dataset consisting of 64 features [4] and claimed that 5 attributes are sufficient for efficient classification. In [5], a backward feature elimination approach was used to select features for classification of ASD patients from ADHD based on differences in behaviour patterns. Very few research works exist on machine learning based diagnosis of ASD, owing to the unavailability of publicly accessible datasets. In 2017, Fadi Thabtah published ASD screening datasets of children, adults and adolescents in the UCI machine learning repository for public access; these datasets were analyzed in his work on ASD prediction with machine learning [3], [6]. R [7] and the Weka tool [8] were used for building machine learning models for classification of ASD from non-ASD patients. In this paper we analyze the ASD diagnosis dataset of children, with 292 instances, for behavioural analytics and prediction tasks. As the repository mentions the presence of missing values in the dataset, a missing data imputation approach for noise reduction is applied to check the completeness of the dataset. The organization of this paper includes (i) discussion of the problem statement and solution, (ii) materials and methods of study, (iii) dataset and pre-processing tasks, (iv) analysis, (v) interpretations from the results and (vi) future scope of the research.

2. Problem statement and Proposed Solution

The ASD dataset with 21 attributes contains 20 features and 1 binary class attribute, so it can produce 2^20 candidate feature subsets for evaluation. Exhaustive search based feature subset selection algorithms face an exponential increase in time complexity, as feature subset selection is classified as an NP-hard problem [9]. Stochastic search algorithms with an objective evaluation function and feature elimination algorithms with candidate evaluation are the best solutions to overcome the NP-hard search problem. In [5] and [4], feature elimination by individual candidate evaluation and ranking based selection is adopted as the feature selection strategy. According to [10], features selected by a ranking approach are highly prone to inter-feature correlation bias, which leads to redundancy among features and inversely affects the performance of the machine learning model. [11], [12] have proposed swarm intelligence based feature selection wrappers as better alternatives that avoid inter-feature correlation by using correlation bias as an objective function in feature subset evaluation. Among stochastic algorithms, bio-inspired swarm intelligence wrappers are better explorers in feature selection, which makes them a better choice to explore more possibilities in fewer iterations and produce results that meet the objective of selection. In this paper we propose a swarm intelligence based feature selection wrapper combining the binary firefly algorithm for feature selection with a single objective function that considers maximum accuracy and minimum features to decide the fitness of subsets [13].

3. Materials and Methods of Study

3.1. Feature selection for machine learning

In module 1, the ASD children dataset is trained with 8 different machine learning algorithms using 10-fold cross validation. The results of module 1 are compared with the results of machine learning models obtained after feature selection with the binary firefly feature selection algorithm. The configuration and pseudocode of the feature selection algorithm are listed below. The binary firefly algorithm implemented in this work was introduced as an optimizer and developed as a feature selection algorithm in [21].

3.2. Binary Firefly feature selection wrapper

The binary firefly feature selection algorithm is proposed in [22] and [23] for optimization of classification and regression algorithms. This feature selection algorithm is a recent and fast performer that has outperformed benchmark algorithms on 40 datasets. The binary firefly algorithm is accelerated with a logistic chaotic map to boost attractiveness, and the local and global search strategy of feature selection is enhanced by simulated annealing. Thus the algorithm converges towards the global best solution within a minimum number of iterations. The binary firefly feature selection algorithm is classified as a swarm intelligence optimizer based wrapper feature selection algorithm with a single objective function. The general architectural working model of wrappers is shown in Fig. 1.

Fig. 1: Architectural working model of swarm intelligence based feature selection for feature subset evaluation with a single objective evaluation function on ASD datasets

In addition to the architectural working model, the flow chart of the binary firefly feature selection algorithm can provide deeper insight into the working of the algorithm.
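As a complement to the flow chart, the selection loop can be sketched in code. The snippet below is a minimal, illustrative sketch of a binary firefly wrapper, not the implementation used in this work: the fitness function uses a toy nearest-centroid evaluator as a stand-in for the paper's k-NN wrapper, and the function names (`fitness`, `binary_firefly`) and parameter choices are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y, w=0.9):
    """Single weighted objective: reward accuracy, penalize subset size.
    The nearest-centroid evaluator is a toy stand-in for the paper's
    k-NN (k = 5) wrapper evaluation."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask.astype(bool)]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) <
            np.linalg.norm(Xs - c0, axis=1)).astype(int)
    acc = (pred == y).mean()
    return w * acc + (1 - w) * (1 - mask.sum() / mask.size)

def binary_firefly(X, y, n_fireflies=30, n_iter=100, gamma=1.0, beta0=1.0):
    """Binary firefly search over feature subsets (hypothetical sketch)."""
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(n_fireflies, n)).astype(float)
    fit = np.array([fitness(p, X, y) for p in pop])
    chaos = 0.7                                  # logistic chaotic map state
    for _ in range(n_iter):
        chaos = 4.0 * chaos * (1.0 - chaos)      # logistic map: x' = 4x(1 - x)
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] > fit[i]:              # firefly i moves toward brighter j
                    r2 = np.sum((pop[i] - pop[j]) ** 2)
                    beta = beta0 * chaos * np.exp(-gamma * r2)
                    step = (pop[i] + beta * (pop[j] - pop[i])
                            + 0.1 * (rng.random(n) - 0.5))
                    prob = 1.0 / (1.0 + np.exp(-step))   # sigmoid transfer to binary
                    cand = (rng.random(n) < prob).astype(float)
                    f = fitness(cand, X, y)
                    if f > fit[i]:               # greedy acceptance of the move
                        pop[i], fit[i] = cand.copy(), f
    best = int(np.argmax(fit))
    return pop[best].astype(bool), float(fit[best])
```

A run on the ASD dataset would pass the 21-column feature matrix and binary class labels; the boolean mask returned is the brightest firefly after the final iteration.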

Fig. 2: Flowchart on the working of the Binary Firefly algorithm for feature selection on the ASD diagnosis dataset

Table 1: Parameter setup of the Firefly feature selection wrapper algorithm and machine learning algorithms

System Configuration: Intel i5 5th Gen, 12GB RAM
Tools: R and Weka
Feature Selection Algorithm: Firefly wrapper
Category: Swarm Intelligence search
Evaluation algorithm: k-NN wrapper (k = 5)
Distance: Euclidean
Number of Particles: 30
No. of Iterations: 100
Objective: Maximum accuracy and minimum features
Objective Type: Single weighted fuzzy fitness function
Chaotic function: Logistic map
CV Partition: 10-fold cross validation

Machine learning model configuration:
Naive Bayes: a simple probability based Bayesian classification algorithm for prediction [14]
J48 Decision Tree: a decision tree classifier based on C4.5 by R. Quinlan [15]
SVM: a discriminant function based classifier that separates data with hyperplanes and kernels [16]-[18]
k-NN: a distance based classification algorithm based on nearest values [19]
MLP: a back propagation neural network based classification algorithm [20]

Evaluation metrics:
Feature Reduction Ratio: ratio of the number of features selected to the total feature set
Accuracy: percentage of instances correctly classified
TP Rate: ratio of correctly predicted positive instances
RMSE: bias rate in prediction
ROC Area: area under the curve calculated by integrating between the start and end points of the graph

4. Dataset and Pre-processing

Experimental results of the machine learning algorithms before and after feature selection on the ASD children diagnosis dataset are tabulated for analysis in Table 2. The experiments are executed according to the setup and configuration discussed above. The binary firefly feature selection algorithm selected a 10-feature subset among the 21 features in the dataset as optimum, giving a feature reduction ratio of 0.48.
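The study ran its experiments in R and Weka (Table 1); as an illustration of the module-1 protocol, the following sketch shows 10-fold cross validation around a k-NN (k = 5) evaluator in plain Python with NumPy. The function names and the random (unstratified) fold assignment are our own assumptions, not the paper's code.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Distance based k-NN classifier, matching the k = 5 wrapper evaluator."""
    # pairwise Euclidean distances between test and training points
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    neighbours = np.argsort(d, axis=1)[:, :k]
    votes = y_train[neighbours]
    return (votes.mean(axis=1) > 0.5).astype(int)  # majority vote, binary labels

def cross_val_accuracy(X, y, n_folds=10, seed=0):
    """Average accuracy over n_folds random folds (10-fold CV in the paper)."""
    order = np.random.default_rng(seed).permutation(len(y))
    accs = []
    for fold in np.array_split(order, n_folds):
        test = np.zeros(len(y), dtype=bool)
        test[fold] = True
        pred = knn_predict(X[~test], y[~test], X[test])
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))
```

The same harness would be run once on all 21 features and once on the 10-feature subset to produce the before/after columns of Table 2.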

Table 2: Analysis of performance of various machine learning algorithms on the ASD children dataset before (B) and after (A) feature selection with the Binary Firefly algorithm

              Accuracy          TP Rate       ROC Area      RMSE
              B       A         B      A      B      A      B       A
Naive Bayes   93.15   95.55     0.93   0.96   0.99   1.00   0.22    0.20
J48           91.10   92.12     0.91   0.92   0.89   0.90   0.30    0.28
SVM           99.66   97.95     1.00   0.98   1.00   0.98   0.06    0.14
k-NN          87.67   93.84     0.88   0.93   0.97   0.97   0.30    0.23
MLP           99.66   97.60     1.00   0.98   1.00   1.00   0.052   0.14

Fig. 3: A graphical representation of the variation in accuracy of the machine learning models before and after feature selection with Binary Firefly. From the figure it is clear that after feature selection the models perform better than, or close to, the machine learning models built on the entire dataset.

Evaluation of the various machine learning models on the ASD children diagnosis dataset observed accuracies in the range of 87.67% to 99.66% on the original dataset. The k-NN classifier with k = 5 produced the lowest accuracy of 87.67% with an RMSE score of 0.30. The multilayer perceptron and support vector machine classifiers produced 99.66% prediction accuracy on the original dataset, while the J48 decision tree and naive Bayes classifiers showed medium performance. The maximum ROC of 1 was achieved by the MLP and SVM classifiers, with considerably low RMSE scores of 0.05 and 0.06 respectively. These algorithms achieved a maximum true positive rate of 1, whereas the other algorithms suffered misclassification errors affecting the true positive rate.

After feature selection the number of features is reduced to 10. The features selected are A1_Score, A2_Score, A3_Score, A4_Score, A5_Score, A7_Score, A8_Score, A9_Score, A10_Score and relation. On training machine learning models with these selected features, the accuracies obtained are in the range of 92.12%-97.95%. The k-NN model produced 93.84% accuracy, a 6.17% improvement over the k-NN model trained on the original dataset. Except for the SVM and MLP models, the other three models trained with the optimum behaviour set showed considerable improvement in accuracy, TP rate, ROC and RMSE.

Fig. 4: TP rate obtained by the machine learning models before and after feature selection.

True positive refers to the number of positive instances correctly classified as true. This measure is given importance because positive instances falsely classified as negative can have a lethal effect in practice: if a child suffering from autism is wrongly classified as autism free, it may affect the child's treatment and delay the diagnostic process, resulting in complications. Hence the TP rate is considered an important evaluation factor for medical datasets. False positive rates are negotiable, as the therapist's intervention may disprove them at any time without interrupting or affecting the diagnosis of the child. After feature selection there is an improvement in the TP rate, and in the cases of SVM and MLP the TP rate remains considerably good and close to that of the original model.
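The four metrics of Table 2 can be reproduced from a model's predicted labels and class-1 scores. The helper below is a hedged sketch with our own function name: it computes RMSE on the predicted probabilities and ROC area via the rank (Mann-Whitney) formulation, rather than reproducing Weka's exact routines.

```python
import numpy as np

def evaluate(y_true, y_pred, y_score):
    """Accuracy, TP rate, RMSE and ROC area for a binary classifier."""
    acc = float((y_pred == y_true).mean())
    tp_rate = float((y_pred[y_true == 1] == 1).mean())       # sensitivity / recall
    rmse = float(np.sqrt(np.mean((y_score - y_true) ** 2)))  # on class-1 probabilities
    pos = y_score[y_true == 1][:, None]
    neg = y_score[y_true == 0][None, :]
    # probability that a positive outranks a negative (ties count half)
    auc = float((pos > neg).mean() + 0.5 * (pos == neg).mean())
    return {"accuracy": acc, "tp_rate": tp_rate, "rmse": rmse, "roc_area": auc}
```

For example, with true labels [0, 0, 1, 1], predictions [0, 0, 0, 1] and scores [0.1, 0.4, 0.35, 0.8], the helper reports accuracy 0.75, TP rate 0.5 and ROC area 0.75.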

5. Interpretations

Among the 292 instances in the ASD children dataset, there are 151 instances with class 'yes' and 141 instances with class 'No'. This shows that the chosen dataset is free of the class imbalance problem. Due to the presence of 21 attributes, the dataset becomes high dimensional and faces an NP-hard feature selection problem. Stochastic swarm intelligence algorithms with a fixed number of iterations and good exploration capacity are better choices for optimum feature subset selection.
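The class-balance check described above is straightforward to script. This small sketch (with a hypothetical helper name) counts the labels and reports the majority-to-minority ratio, which for 151 'yes' versus 141 'No' is about 1.07, close to the balanced ideal of 1.

```python
from collections import Counter

def class_balance(labels):
    """Count classes and report the majority/minority ratio (1.0 = balanced)."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# The ASD children dataset: 151 'yes' and 141 'No' instances
counts, ratio = class_balance(["yes"] * 151 + ["No"] * 141)
```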

The binary firefly algorithm adopted for feature selection is a faster explorer than existing swarm intelligence search algorithms. Comparison of the results of the machine learning models before and after feature selection showed that 3 of 5 machine learning models have considerable performance improvement with the optimum behaviour sets.

The presence of 15% missing values in the selected relation attribute might have caused deterioration in the quality of the functional classifiers such as SVM and MLP. However, the performance of the functional models built with the optimum behaviour set is still better than that of the other classification models. Due to the small number of instances in the dataset, there is a chance of model overfitting on the dataset.

From the above interpretations it is clear that the optimum behaviour set improved the prediction performance of the machine learning models in 3 of 5 cases, and in the remaining 2 of 5 cases the behaviour set exhibited decent performance with minimum features. These observations validate the alternative hypothesis: minimum behaviour sets can retain the structure of the entire dataset in machine learning.

6. Conclusion

This paper aimed to design an automated ASD prediction model with minimum behaviour sets selected from the ASD diagnosis dataset with the binary firefly algorithm for feature selection. The hypothesis of this paper is to find whether machine learning models trained with minimum behaviour sets are capable of better performance or not. To select features, a swarm intelligence based wrapper is considered a better alternative to ranking based feature elimination algorithms. From the above results and discussions, the hypothesis is validated.

7. Future work

The UCI repository indicates the presence of missing instances in the ASD child dataset, which is not handled in the present work; rather, the dataset is assumed to be complete and evaluation is done. This assumption could have impacted the performance of feature selection and machine learning. In future, a suitable missing data imputation framework should be designed to check for the presence of missing data in the dataset. Even though swarm intelligence wrappers are better explorers than traditional feature selection methods, they have their own disadvantages in terms of risk of overfitting, time complexity and search complexity. These factors should be addressed in future work.

Acknowledgement

The authors thank VIT University for providing the 'VIT SEED GRANT' for carrying out this research.

References

[1] J. Baio, "Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years—Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014," MMWR. Surveill. Summ., vol. 67, 2018.
[2] A. P. Association and others, Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub, 2013.
[3] F. Thabtah, "Autism Spectrum Disorder Screening: Machine Learning Adaptation and DSM-5 Fulfillment," in Proceedings of the 1st International Conference on Medical and Health Informatics 2017, 2017, pp. 1–6.
[4] M. Duda, R. Ma, N. Haber, and D. P. Wall, "Use of machine learning for behavioral distinction of autism and ADHD," Transl. Psychiatry, vol. 6, no. 2, p. e732, 2017.
[5] J. A. Kosmicki, V. Sochat, M. Duda, and D. P. Wall, "Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning," Transl. Psychiatry, vol. 5, no. 2, p. e514, 2015.
[6] F. Thabtah, "Machine learning in autistic spectrum disorder behavioral research: A review and ways forward," Informatics Heal. Soc. Care, vol. 0, no. 0, pp. 1–20, 2018.
[7] R. C. Team and others, "R: A language and environment for statistical computing," 2013.
[8] G. Holmes, A. Donkin, and I. H. Witten, "Weka: A machine learning workbench," in Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on, 1994, pp. 357–361.
[9] W. Siedlecki and J. Sklansky, "On automatic feature selection," Int. J. Pattern Recognit. Artif. Intell., vol. 2, no. 02, pp. 197–220, 1988.
[10] L. Tolosi and T. Lengauer, "Classification with correlated features: unreliability of feature ranking and solutions," Bioinformatics, vol. 27, no. 14, pp. 1986–1994, 2011.
[11] X. Wang, J. Yang, X. Teng, W. Xia, and R. Jensen, "Feature selection based on rough sets and particle swarm optimization," Pattern Recognit. Lett., vol. 28, no. 4, pp. 459–471, 2007.
[12] A. Unler, A. Murat, and R. B. Chinnam, "mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification," Inf. Sci. (Ny)., vol. 181, no. 20, pp. 4625–4641, 2011.
[13] H. Banati and M. Bajaj, "Fire fly based feature selection approach," IJCSI Int. J. Comput. Sci. Issues, vol. 8, no. 4, 2011.
[14] G. H. John and P. Langley, "Estimating continuous distributions in Bayesian classifiers," in Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, 1995, pp. 338–345.
[15] J. R. Quinlan, "C4.5: Programming for machine learning," Morgan Kauffmann, vol. 38, p. 48, 1993.
[16] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to Platt's SMO algorithm for SVM classifier design," Neural Comput., vol. 13, no. 3, pp. 637–649, 2001.
[17] J. C. Platt, "Fast training of support vector machines using sequential minimal optimization," Adv. kernel methods, pp. 185–208, 1999.
[18] T. Hastie and R. Tibshirani, "Classification by pairwise coupling," in Advances in neural information processing systems, 1998, pp. 507–513.
[19] D. W. Aha, D. Kibler, and M. K. Albert, "Instance-based learning algorithms," Mach. Learn., vol. 6, no. 1, pp. 37–66, 1991.
[20] S. K. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets, and classification," IEEE Trans. neural networks, vol. 3, no. 5, pp. 683–697, 1992.

[21] X.-S. Yang, "Firefly algorithm, Levy flights and global optimization," in Research and development in intelligent systems XXVI, Springer, 2010, pp. 209–218.
[22] H. Banati and M. Bajaj, "Fire fly based feature selection approach," IJCSI Int. J. Comput. Sci. Issues, vol. 8, no. 4, 2011.
[23] L. Zhang, K. Mistry, C. P. Lim, and S. C. Neoh, "Feature selection using firefly optimization for classification and regression models," Decis. Support Syst., 2017.
