An Introduction To The WEKA Data Mining System
Zdravko Markov
Central Connecticut State University
markovz@ccsu.edu
Ingrid Russell
University of Hartford
irussell@hartford.edu
Agenda
• Data Mining
• Weka Project
DBMS, OLAP and Data Mining: a comparison

Task
• DBMS: extraction of detailed and summary data
• OLAP: summaries, trends and forecasts
• Data Mining: knowledge discovery of hidden patterns and insights

Type of result
• DBMS: information
• OLAP: analysis
• Data Mining: insight and prediction

Method
• DBMS: deduction (ask the question, verify with data)
• OLAP: multidimensional data modeling, aggregation, statistics
• Data Mining: induction (build the model, apply it to new data, get the result)

Example question
• DBMS: Who purchased mutual funds in the last 3 years?
• OLAP: What is the average income of mutual fund buyers, by region, by year?
• Data Mining: Who will buy a mutual fund in the next 6 months, and why?
Example of DBMS, OLAP and Data Mining: Weather data
Assume we have recorded the weather conditions over a two-week period, along with a
tennis player's decision whether or not to play tennis on each particular day. Thus we
have generated tuples (also called examples or instances) consisting of values of four
independent variables (outlook, temperature, humidity, windy) and one dependent
variable (play).
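In ARFF format (the data format used by Weka) this is the standard weather dataset, weather.arff, distributed with Weka; the queries below can be checked against it:

@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no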
Database Management System (DBMS)
• What was the temperature on the sunny days? {85, 80, 72, 69, 75}
• On which days was the humidity less than 75? {6, 7, 9, 11}
• On which days was the temperature greater than 70? {1, 2, 3, 8, 10, 11, 12, 13, 14}
• On which days was the temperature greater than 70 and the humidity less than 75?
The intersection of the two sets above: {11}
OLAP: Multidimensional Model (Data Cube)
Dimensions:
• Time: Week 1={1, 2, 3, 4, 5, 6, 7}, Week 2={8, 9, 10, 11, 12, 13, 14}
• Outlook: {sunny, rainy, overcast}
Unit: play (yes/no)
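Filling the cube with the play decisions from the dataset above gives, per cell, the count of days played out of the total days in that cell:

           sunny    overcast   rainy
Week 1      0/2       2/2       2/3
Week 2      2/3       2/2       1/2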
The SIGKDD Service Award is the highest service award in the field of data mining and knowledge discovery. It is given
to one individual or one group who has performed significant service to the data mining and knowledge discovery
field, including professional volunteer services in disseminating technical information to the field, education, and
research funding.
The 2005 ACM SIGKDD Service Award is presented to the Weka team for their development of the freely available
Weka Data Mining Software, including the accompanying book Data Mining: Practical Machine Learning Tools and
Techniques (now in its second edition) and much other documentation.
The Weka team includes Ian H. Witten and Eibe Frank, and the following major contributors (in alphabetical order of
last names): Remco R. Bouckaert, John G. Cleary, Sally Jo Cunningham, Andrew Donkin, Dale Fletcher, Steve
Garner, Mark A. Hall, Geoffrey Holmes, Matt Humphrey, Lyn Hunt, Stuart Inglis, Ashraf M. Kibriya, Richard
Kirkby, Brent Martin, Bob McQueen, Craig G. Nevill-Manning, Bernhard Pfahringer, Peter Reutemann, Gabi
Schmidberger, Lloyd A. Smith, Tony C. Smith, Kai Ming Ting, Leonard E. Trigg, Yong Wang, Malcolm Ware, and
Xin Xu.
The Weka team has put a tremendous amount of effort into continuously developing and maintaining the system since
1994. The development of Weka was funded by a grant from the New Zealand Government's Foundation for
Research, Science and Technology.
There are 15 well-documented substantial projects that incorporate, wrap or extend Weka reported on SourceForge,
and no doubt many more that have not been reported.
Ian H. Witten and Eibe Frank also wrote the very popular book "Data Mining: Practical Machine Learning
Tools and Techniques" (now in its second edition), which seamlessly integrates the Weka system into the
teaching of data mining and machine learning. In addition, they provide excellent teaching material on the
book's website.
This book became one of the most popular textbooks for data mining and machine learning, and is very
frequently cited in scientific publications.
Weka is a landmark system in the history of the data mining and machine learning research communities,
because it is the only toolkit that has gained such widespread adoption and survived for an extended period
of time (the first version of Weka was released 11 years ago). Other data mining and machine learning
systems that have achieved this are individual systems, such as C4.5, not toolkits.
Since Weka is freely available for download and offers many powerful features (sometimes not found in
commercial data mining software), it has become one of the most widely used data mining systems. Weka
also became one of the favorite vehicles for data mining research and helped to advance it by making many
powerful features available to all.
In sum, the Weka team has made an outstanding contribution to the data mining field.
Now …
Machine Learning, Data and Web Mining
by Example
(“learning by doing” approach)
Client data:
• unemployed clients: s3, s10, s12
• loan is to buy a personal computer: s1, s2, s3, s4, s5, s6, s7, s8, s9, s10
• loan is to buy a car: s11, s12, s13, s14, s15, s16, s17, s18, s19, s20
• male clients: s6, s7, s8, s9, s10, s16, s17, s18, s19, s20
• not married: s1, s2, s5, s6, s7, s11, s13, s14, s16, s18
• live in problematic area: s3, s5
• age: s1=18, s2=20, s3=25, s4=40, s5=50, s6=18, s7=22, s8=28, s9=40, s10=50, s11=18, s12=20,
s13=25, s14=38, s15=50, s16=19, s17=21, s18=25, s19=38, s20=50
• money in a bank (x10000 yen): s1=20, s2=10, s3=5, s4=5, s5=5, s6=10, s7=10, s8=15, s9=20, s10=5,
s11=50, s12=50, s13=50, s14=150, s15=50, s16=50, s17=150, s18=150, s19=100, s20=50
• monthly pay (x10000 yen): s1=2, s2=2, s3=4, s4=7, s5=4, s6=5, s7=3, s8=4, s9=2, s10=4, s11=8,
s12=10, s13=5, s14=10, s15=15, s16=7, s17=3, s18=10, s19=10, s20=10
• months for the loan: s1=15, s2=20, s3=12, s4=12, s5=12, s6=8, s7=8, s8=10, s9=20, s10=12, s11=20,
s12=20, s13=20, s14=20, s15=20, s16=20, s17=20, s18=20, s19=20, s20=30
• years with the last employer: s1=1, s2=2, s3=0, s4=2, s5=25, s6=1, s7=4, s8=5, s9=15, s10=0, s11=1,
s12=2, s13=5, s14=15, s15=8, s16=2, s17=3, s18=2, s19=15, s20=2
Data preprocessing and visualization
Relations, attributes, tuples (instances)
http://www.cs.ccsu.edu/~markov/MDLclustering/data.zip
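As a concrete example, the client data above can be written as an ARFF relation. The sketch below is illustrative (the attribute names are not necessarily those used in the distributed files); only instance s1 is shown:

@relation clients
% bank_money and monthly_pay are in units of 10000 yen
@attribute unemployed {yes, no}
@attribute purpose {computer, car}
@attribute sex {male, female}
@attribute married {yes, no}
@attribute problematic_area {yes, no}
@attribute age numeric
@attribute bank_money numeric
@attribute monthly_pay numeric
@attribute loan_months numeric
@attribute years_employed numeric
@data
no,computer,female,no,no,18,20,2,15,1
% s2 through s20 follow the same pattern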
Data preprocessing and visualization
Department data: Create ARFF file
http://www.cs.ccsu.edu/~markov/MDLclustering/MDL.jar
Data preprocessing and visualization
Department data: Create ARFF file in string format (using SimpleCLI)
1. Create file deptA from the files in folder data/departments/A, with class label A:
java ARFFstring data/departments/A A deptA
2. Create file deptB from the files in folder data/departments/B, with class label B:
java ARFFstring data/departments/B B deptB
• Preprocess.html
• Visualization.html
Attribute Selection
Finding a minimal set of attributes that preserves the class distribution
Attribute relevance with respect to the class – an example of an irrelevant attribute (accounting)
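Attribute selection can also be run programmatically. A minimal sketch using Weka's attribute selection API, with the CFS subset evaluator and best-first search (the file name is a placeholder):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SelectAttributes {
    public static void main(String[] args) throws Exception {
        // load a dataset (placeholder file name); the class is the last attribute
        Instances data = DataSource.read("departments.arff");
        data.setClassIndex(data.numAttributes() - 1);
        // evaluate attribute subsets by their correlation with the class (CFS),
        // searching the space of subsets with best-first search
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);
        // print the indices of the selected attributes
        System.out.println(java.util.Arrays.toString(selector.selectedAttributes()));
    }
}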
• Attribute Selection.html
Classification – creating models (hypotheses)
Mapping (independent attributes -> class)
Classification – creating models (hypotheses)
Inferring one-attribute rules - OneR
ID3, C4.5, J48 (Weka): Select the attribute that minimizes the class
entropy in the split.
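Spelled out: if S is the set of instances at a node and p_c is the proportion of class c in S, the class entropy is H(S) = − Σ_c p_c log2(p_c). Splitting on an attribute A with values v partitions S into subsets S_v, and the attribute chosen is the one minimizing the weighted entropy Σ_v (|S_v|/|S|) H(S_v), i.e. maximizing the information gain H(S) − Σ_v (|S_v|/|S|) H(S_v).

The same tree built in the Explorer can be built from code. A minimal sketch using Weka's Java API (J48 is Weka's implementation of C4.5; weather.arff is assumed to be in the working directory):

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BuildTree {
    public static void main(String[] args) throws Exception {
        // load the weather data; the class is the last attribute (play)
        Instances data = DataSource.read("weather.arff");
        data.setClassIndex(data.numAttributes() - 1);
        // build a C4.5-style decision tree
        J48 tree = new J48();
        tree.buildClassifier(data);
        // print the induced tree in text form
        System.out.println(tree);
    }
}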
Classification – numeric attributes
weather.arff
Classification – predicting class
Click on Set… Click on Open file…
Classification – predicting class
Right click on the highlighted line in Result list and choose Visualize classifier errors
Click on the square
Classification – predicting class
Click on Save
Classification
Student Projects
• Classification.html
Prediction (no model, lazy learning)
• K-nearest neighbor (IBk): take the class of the nearest neighbor, or the majority class among the K nearest neighbors
test instance (nominal version of the weather data): (sunny, cool, high, TRUE, ?)
K=1 -> no
K=3 -> no
K=5 -> yes
K=14 -> yes (Majority predictor, ZeroR)
X                  2    8    9    11   12   …   10
Distance(test,X)   1    2    2    2    2    …   4
play               no   no   yes  yes  yes  …   yes
• Distance is calculated as the number of different attribute values
• Euclidean distance for numeric attributes
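A minimal sketch of the same prediction with Weka's IBk, assuming a recent Weka release (3.7+, where DenseInstance replaced the older Instance constructor) and the nominal weather data file:

import weka.classifiers.lazy.IBk;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KnnPredict {
    public static void main(String[] args) throws Exception {
        // load the nominal weather data; the class is the last attribute (play)
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);
        // k-nearest-neighbor classifier with K = 3
        IBk knn = new IBk(3);
        knn.buildClassifier(data);
        // build the test instance (sunny, cool, high, TRUE, ?); class left missing
        Instance test = new DenseInstance(data.numAttributes());
        test.setDataset(data);
        test.setValue(data.attribute("outlook"), "sunny");
        test.setValue(data.attribute("temperature"), "cool");
        test.setValue(data.attribute("humidity"), "high");
        test.setValue(data.attribute("windy"), "TRUE");
        // predict and print the class label
        double label = knn.classifyInstance(test);
        System.out.println(data.classAttribute().value((int) label));
    }
}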
Prediction (no model, lazy learning)
Prediction
Student Projects
• Prediction.html
Model evaluation – holdout (percentage split)
Model evaluation – cross validation
Model evaluation – leave one out cross validation
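All three evaluation schemes are also available programmatically. A minimal sketch of 10-fold cross-validation with Weka's Evaluation class (leave-one-out is the special case where the number of folds equals the number of instances):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff");
        data.setClassIndex(data.numAttributes() - 1);
        // 10-fold cross-validation of a J48 tree;
        // pass data.numInstances() as the fold count for leave-one-out
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}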
Model evaluation – confusion (contingency) matrix
                 predicted
                 yes   no
actual   yes      3     1
         no       1     0

                 predicted
                 yes   no
actual   yes     TP    FN
         no      FP    TN
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
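For the matrix above, TP = 3, FN = 1, FP = 1 and TN = 0, giving Precision = 3/(3+1) = 0.75 and Recall = 3/(3+1) = 0.75.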
Model evaluation
Student Projects
• Evaluation.html
Clustering – k-means
Click on Ignore attributes
Clustering – classes to clusters evaluation
Right click on Result list, select Visualize cluster assignments
Click on Save
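The same k-means clustering can be run from Java. A minimal sketch that first removes the class attribute (the programmatic counterpart of Ignore attributes) and then builds a 2-cluster model:

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class Cluster {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff");
        // drop the class attribute (play, the last one), as with Ignore attributes
        Remove remove = new Remove();
        remove.setAttributeIndices("last");
        remove.setInputFormat(data);
        Instances input = Filter.useFilter(data, remove);
        // k-means with 2 clusters
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);
        km.buildClusterer(input);
        // print cluster centroids and sizes
        System.out.println(km);
    }
}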
Clustering
Student Projects
• Clustering.html
Association Rules (A => B)
• Confidence (accuracy): P(B|A) = (# of tuples containing both A and B) / (# of tuples containing A)
• Support (coverage): P(A,B) = (# of tuples containing both A and B) / (total # of tuples)
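A minimal sketch of mining association rules with Weka's Apriori implementation (all-nominal data such as weather.nominal.arff is assumed; default minimum support and confidence thresholds):

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MineRules {
    public static void main(String[] args) throws Exception {
        // association rule mining needs nominal attributes and no class
        Instances data = DataSource.read("weather.nominal.arff");
        Apriori apriori = new Apriori();
        apriori.buildAssociations(data);
        // prints the best rules with their support and confidence
        System.out.println(apriori);
    }
}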
Association Rules
Student Projects
• Association.html
Document classification and clustering
Predict the class of the Theatre document
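To classify text documents such as these, the string attribute must first be turned into word features. A minimal sketch with Weka's StringToWordVector filter (deptStrings.arff is a placeholder name for the string-format ARFF file produced by the ARFFstring steps above):

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class ClassifyDocs {
    public static void main(String[] args) throws Exception {
        // placeholder name for the ARFF file built from the department documents;
        // it has one string attribute (the text) and one nominal class attribute
        Instances raw = DataSource.read("deptStrings.arff");
        raw.setClassIndex(raw.numAttributes() - 1);
        // convert the string attribute into word-based attributes
        StringToWordVector s2wv = new StringToWordVector();
        s2wv.setInputFormat(raw);
        Instances vectors = Filter.useFilter(raw, s2wv);
        // train a classifier on the word vectors
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(vectors);
        System.out.println(nb);
    }
}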