Artificial Intelligence: Semester Project

This document presents a semester project on artificial intelligence using clustering algorithms on cricket statistics. It introduces the topic, dataset, and three clustering algorithms (K-means, Agglomerative, and Mean-Shift) that will be applied. The dataset contains statistics on the top batsmen from T20 cricket matches in 2016. The document outlines the process, advantages, and disadvantages of each clustering algorithm and provides the table of contents for the subsequent sections that will present and analyze the results of applying the three algorithms to the cricket statistics dataset.

Uploaded by

Abdullah Ammar

Artificial Intelligence

CSC-411

Semester Project

Instructor
Dr. Samabia Tehseem

Submitted by
Abdur Rehman Anwar
01-134142-199

Suneel Kumar
01-134142-202

Muhammad Ejaz
01-134142-201

Mian Usama Tariq
01-134142-119

Department of Computer Science
Bahria University, Islamabad

19-12-2017
Table of Contents
1. Introduction
1.1 Domain
1.2 Application
2. Dataset
2.1 Details
2.2 Source
3. Algorithms
3.1 K-means
3.1.1 Process
3.1.2 Advantages
3.1.3 Disadvantages
3.2 Agglomerative
3.2.1 Process
3.2.2 Advantages
3.2.3 Disadvantages
3.3 Mean-Shift
3.3.1 Process
3.3.2 Advantages
3.3.3 Disadvantages
4. Result and Analysis
4.1. Screen Shots of Python Based Graphical User Interface (GUI)
4.1.1. Initial Layout of this system.
4.1.2. When the user wants to load the required file, the system opens a dialog box to load the file.
4.1.3. If the user tries to load a file with an extension that the system does not allow, it shows an error message in a message box.
4.1.4. If the user loads a file with an extension that the system allows, it shows a confirmation message in a message box.
4.1.5. This is the complete layout of the system. From here, the system runs the three clustering algorithms, with one button for each algorithm.
4.2. K-means Result
4.2.1. K-means Graph
4.2.2. K-means Analysis
4.3. Agglomerative Result
4.3.1. Agglomerative Graph
4.3.2. Agglomerative Analysis
4.4. Mean-Shift Result
4.4.1. Mean-Shift Graph
4.4.2. Mean-Shift Analysis
1. Introduction
1.1 Domain
We selected Machine Learning as the domain for our Artificial Intelligence project. Many results can be
produced by applying clustering algorithms to cricket statistics, which offer a vast domain for these
types of algorithms.
1.2 Application
• To predict match outcomes.
• To predict players' performance.

2. Dataset
2.1 Details
The dataset we chose for our project covers the most runs scored by batsmen in T20 cricket in 2016. It
is a numeric dataset which contains 14 columns and 50 rows. The columns are player name, matches,
innings, not outs, runs, average runs, strike rate, run rate, best score, 100s, 50s, 6s, 4s and 0s.
2.2 Source
We took this dataset from Kaggle, the well-known dataset repository. It is available on the Kaggle
website under the name T20 Cricket Most Runs 2016 at the link below:
https://www.kaggle.com/frankfernandes/t20mostruns2016/
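As an illustration of how a single numeric column could be pulled out of such a file for clustering, here is a minimal sketch using only the Python standard library. The helper name `read_numeric_column` and the two sample rows are made up for this example. The column names "RUNS" and "NumberOfMatches" follow the attribute names used in Section 4, and the header of the actual Kaggle file may differ.

```python
import csv
import io


def read_numeric_column(csv_text, column):
    """Return the values of one column of CSV text as floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


# Made-up sample rows, NOT taken from the Kaggle dataset.
sample = "Player,NumberOfMatches,RUNS\nPlayer A,10,320\nPlayer B,12,415\n"
runs = read_numeric_column(sample, "RUNS")
print(runs)
```

For a file on disk, the same function can be given the text returned by reading the file.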

3. Algorithms
3.1 K-means
K-means clustering is one of the unsupervised learning algorithms in machine learning. As its name
suggests, it forms k clusters based on k means. It computes the distance from each data point to
every mean, compares these distances, and assigns the point to the cluster whose mean is nearest.
It then recalculates the mean of each cluster and repeats the process until the new means are equal
to the previous means.
3.1.1 Process
Assume that we have a dataset and a set of k centroids.
• Iterate through the whole dataset.
• Calculate the distance between each data point and each centroid.
• Assign each data point to the cluster whose centroid is at the minimum distance from it.
• At the end, recalculate the centroids using the values in their clusters.
• Repeat the process until the new centroids are equal to the previous centroids.
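The steps above can be sketched in plain Python for one-dimensional data. This is a minimal sketch and not the project's actual code: the function name `kmeans_1d`, the starting centroids, and the sample run totals are all assumptions made for illustration.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up run totals, NOT taken from the Kaggle dataset.
runs = [120, 150, 135, 610, 640, 590]
centroids, clusters = kmeans_1d(runs, centroids=[120, 610])
print(centroids, clusters)
```

The starting centroids are passed in explicitly here so that the result is reproducible. Picking them at random, as is common in practice, can give a different result from run to run.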
3.1.2 Advantages
• It is a simple algorithm that beginners can implement.
• It is fast compared to other clustering algorithms.
• It works best when the data values in the dataset are distinct.
3.1.3 Disadvantages
• It does not work well for overlapping data values, because it cannot decide which cluster those
values belong to.
• It gives different results for different representations of the same dataset.
• Euclidean distance is not an effective distance measure for all kinds of data values in the
dataset.
• Choosing the initial centers randomly may not lead to accurate results.
• It handles noisy data and outliers poorly.
• The clusters should be linearly separable; otherwise it will not give correct results.
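The sensitivity to outliers follows from the centroid being an arithmetic mean. A small sketch with made-up values (not from the dataset) shows how a single extreme value drags the centroid away from the bulk of the cluster:

```python
from statistics import mean

# Made-up values for illustration only.
cluster = [10, 12, 11]

centroid_without_outlier = mean(cluster)
centroid_with_outlier = mean(cluster + [100])

print(centroid_without_outlier)  # 11
print(centroid_with_outlier)     # 33.25
```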
3.2 Agglomerative
It is a hierarchical clustering algorithm. It clusters the dataset from the bottom up: every cluster at
the top has sub-clusters, and these sub-clusters in turn have their own sub-clusters.
3.2.1 Process
• Consider each value as a cluster.
• Construct a distance matrix.
• Merge the two clusters with the minimum distance.
• Reconstruct the distance matrix with the new clusters.
• Repeat the process until the distance matrix is reduced to two elements.
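A minimal sketch of this bottom-up merging for one-dimensional data is given below. It is not the project's code. It assumes single linkage (the distance between two clusters is the distance between their closest members), which the report does not specify, and it recomputes distances on every pass instead of storing an explicit matrix. The name `agglomerative_1d` and the sample values are hypothetical.

```python
def agglomerative_1d(values, n_clusters=2):
    """Merge 1-D values bottom-up until n_clusters remain (assumes n_clusters >= 1)."""
    # Start with every value in its own cluster.
    clusters = [[v] for v in values]

    def single_link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest distance.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        # Merge cluster j into cluster i; distances are recomputed on the next pass.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up run totals, NOT taken from the Kaggle dataset.
runs = [120, 150, 135, 610, 640, 590]
result = agglomerative_1d(runs, n_clusters=2)
print(result)
```

Stopping at two clusters mirrors the last step in the list above. A different linkage rule or distance measure can lead to a different grouping.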
3.2.2 Advantages
• It can generate an ordering of the objects, which can be useful for data visualization.
• Fewer clusters are created, which can be helpful for detection.
• It is simple to implement and has multiple applications.
3.2.3 Disadvantages
• It can produce different results depending on the distance metric used to calculate
distances.
• It can produce imbalanced clusters.
• It is very difficult to choose the number of clusters.

3.3 Mean-Shift
Mean shift is a nonparametric clustering algorithm which does not need the number of clusters to
be specified in advance and places no restriction on the shape of the clusters. It shifts means
towards regions of high density until convergence occurs. It iterates over the region of interest,
calculates the mean of all values in that region, and shifts the center to the newly calculated mean.
In this way it calculates the centroids of all the clusters.
3.3.1 Process
• First, take random centroids and a radius for the area of interest.
• Calculate the distance of every data point from the centroid.
• Choose the values that lie within the defined area of interest.
• Calculate the mean of all chosen values.
• Shift the centroid to the newly calculated mean.
• Repeat the process until the centroids no longer shift.
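The procedure above can be sketched for a single centroid on one-dimensional data as follows. This is a minimal sketch under stated assumptions and not the project's implementation: it uses a flat window (every value within the radius counts equally), fixed starting points instead of random ones so that the output is reproducible, and a hypothetical function name `mean_shift_1d` with made-up sample values.

```python
def mean_shift_1d(values, start, radius, tol=1e-6, max_iter=100):
    """Shift one centroid towards the local mean until it stops moving."""
    centroid = float(start)
    for _ in range(max_iter):
        # Keep only the values inside the window around the centroid.
        window = [v for v in values if abs(v - centroid) <= radius]
        if not window:
            break  # nothing in range, so there is nowhere to shift
        new_centroid = sum(window) / len(window)
        shift = abs(new_centroid - centroid)
        centroid = new_centroid
        # Converged once the shift is negligible.
        if shift < tol:
            break
    return centroid


# Made-up run totals, NOT taken from the Kaggle dataset.
runs = [120, 150, 135, 610, 640, 590]
low = mean_shift_1d(runs, start=120, radius=100)
high = mean_shift_1d(runs, start=640, radius=100)
print(low, high)
```

Running the function from several starting points and grouping the points that converge to the same centroid gives the clusters. The radius plays the role of the bandwidth, so changing it can change the number of clusters found.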
3.3.2 Advantages
• The mean shift algorithm is independent of the type of application.
• It is very simple and easy to implement.
• It does not require a predefined shape for the clusters.
• It can be used to cluster datasets with any type of features.
• It depends on a single parameter, the bandwidth.
3.3.3 Disadvantages
• The bandwidth must be chosen carefully; otherwise the result will be inaccurate.
• Choosing the window size for the kernel is non-trivial.
• It picks the first centroid randomly, and that centroid may be an outlier.
4. Result and Analysis
4.1. Screen Shots of Python Based Graphical User Interface (GUI)

4.1.1. Initial Layout of this system.

4.1.2. When the user wants to load the required file, the system opens a dialog box
to load the file.

4.1.3. If the user tries to load a file with an extension that the system does not allow,
it shows an error message in a message box.

4.1.4. If the user loads a file with an extension that the system allows,
it shows a confirmation message in a message box.
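The extension check described in 4.1.3 and 4.1.4 could look roughly like the sketch below. The report does not say which extensions the system accepts, so the `.csv` entry in `ALLOWED_EXTENSIONS` and the helper name `is_allowed_file` are assumptions. The message boxes themselves are left out so that the example stays independent of any GUI toolkit.

```python
import os

# Assumption: the report does not list the accepted extensions.
ALLOWED_EXTENSIONS = {".csv"}


def is_allowed_file(path, allowed=ALLOWED_EXTENSIONS):
    """Return True if the file's extension is in the allowed set."""
    _, ext = os.path.splitext(path)
    return ext.lower() in allowed


for name in ("t20.csv", "notes.txt"):
    # A GUI would show a confirmation or an error message box here.
    print(name, "accepted" if is_allowed_file(name) else "rejected")
```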
4.1.5. This is the complete layout of the system. From here, the system runs the three
clustering algorithms, with one button for each algorithm.

4.2. K-means Result


4.2.1. K-means Graph

4.2.2. K-means Analysis


In this system, the K-means algorithm is used to cluster the attribute “RUNS” from the
T20 dataset. It uses K=2 to form two clusters, as shown in the graph above. We conclude
that the decision boundary between these clusters is clear for the attribute “RUNS”.
4.3. Agglomerative Result
4.3.1. Agglomerative Graph

4.3.2. Agglomerative Analysis


In this system, the Agglomerative algorithm is used to cluster the attribute “RUNS”
from the T20 dataset. It gives two clusters, as shown in the graph above. We conclude that the
decision boundary is not as clear as the one K-means gives for the attribute “RUNS”.

4.4. Mean-Shift Result


4.4.1. Mean-Shift Graph

4.4.2. Mean-Shift Analysis


In this system, the Mean-Shift algorithm is used to cluster the attributes “RUNS” and
“NumberOfMatches” from the T20 dataset. Applying Mean-Shift to a single attribute of a dataset
is not useful, so we took two attributes for clustering with this algorithm.
The graph shows two clusters, in blue and green respectively. There are also some outliers,
which are shown in yellow.
