
Predicting Students' Academic Performance Using the Naive Bayes Algorithm

Sujith Jayaprakash
Sr. Lecturer, Department of ICT
BlueCrest College
Accra, Ghana.
sujith.jayaprakash@bluecrest.edu.gh
0263011390

Balamurugan E.
Assoc. Professor, Department of ICT
BlueCrest College
Accra, Ghana.
e.balamurugan@bluecrest.edu.gh
0279509431

Vibin Chandar
Lecturer, Department of ICT
BlueCrest College
Accra, Ghana.
Vibin.chandar@bluecrest.edu.gh
0263011399

Abstract: Education plays a vital role in helping people lead more comfortable lives. With the rapid growth in the number of educational institutions around the world, many institutions are struggling to survive, and those offering higher education in particular are striving to maintain the quality they offer to students. Many factors influence the quality of an educational institution, such as infrastructure, teaching and learning methods, laboratories, campus placements and linkages with industry. One of the major factors is student feedback. Institutions now pay closer attention to student feedback on lecturers and on the quality of course delivery in the classroom. Retaining a good number of students depends on understanding and satisfying their needs. Maintaining high quality standards is therefore essential for any institution that wants to improve students' academic performance and retain them in the system. In this paper, the Naive Bayes algorithm is applied to predict students' academic performance in the end-of-semester exams by analysing student feedback and performance in the mid-semester exams. This work helps educational institutions identify weaker students in advance and arrange the necessary training before they sit their final exams.

Keywords: Naive Bayes Algorithm, Student Feedback, Academic Performance, Student Retention, Knowledge discovery.

1. INTRODUCTION

Data mining has been used in many areas of science and engineering, such as education, genetics, medicine, bioinformatics and electrical power engineering. Data mining techniques and tools are used to extract meaning from the large sets of data generated by people's learning activities. In business, it has been widely used to analyse customer relationship management, human resource management, marketing and so on. Data mining has had a high impact in the business sector, and education is now also tapping into its power.

Data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information, that is, information that can be used to increase revenue, cut costs, or both. Its methods can be classified as supervised or unsupervised learning. In supervised learning, the training data specifies what we are trying to learn (the classes), whereas in unsupervised learning the training data does not specify what we are trying to learn (the clusters). Supervised learning is analogous to human learning from past experience to gain new knowledge and improve our ability to perform real-world tasks [1]. Various algorithms are used for supervised learning, among them symbolic machine learning algorithms, semi-symbolic machine learning algorithms, the nearest neighbour algorithm and the Naive Bayes algorithm.

The Naive Bayes algorithm is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. It is one of the most basic classification techniques, with applications in email spam detection, personal email sorting, document categorization, sexually explicit content detection, language detection and sentiment detection. Despite its naive design and oversimplified assumptions, Naive Bayes performs well in many complex real-world problems. It is highly scalable and requires a number of parameters linear in the number of variables. In simple terms, a Naive Bayes classifier assumes that, given the class, the presence (or absence) of a particular feature is unrelated to the presence (or absence) of any other feature. Several colleges and universities have adopted feedback analysis systems built on various data mining models to improve student retention and to channel students to the courses and programs that the institutions judge most appropriate.

In this paper, a supervised learning approach based on the Naive Bayes algorithm [2] is used to predict students' final examination results, with their feedback and mid-semester results serving as training data for analysing their academic performance. Various attributes of the student feedback are taken as dependent variables, and the mid-semester exam result is taken as an explanatory variable. The paper is organised as follows: Section 1, Introduction; Section 2, Related Works; Section 3, Methodology; Section 4, Results and Discussions; Section 5, Conclusions.

2. RELATED WORKS

Recent research has applied various data mining techniques to analyse the academic performance of students at various levels. A few studies that are particularly relevant to academic progression are summarized below.

M. Wook, Y. Hani Yamaya, N. Wahab, M. Rizal Mohd Isa, N. Fatimah Awang and H. Yann Seong compared two data mining techniques, an Artificial Neural Network and a combination of clustering and decision tree classification, for predicting and classifying students' academic performance. The technique that provided the more accurate prediction and classification was chosen as the best model, and this model was then used to identify the patterns that influence students' academic performance.

S. Kumar Yadav, B. Bharadwaj and S. Pal obtained university student data such as attendance, class test, seminar and assignment marks from the students' database to predict performance at the end of the semester using three algorithms, ID3, C4.5 and CART, and showed that CART is the best algorithm for classifying the data [3].

N. Thai Nghe, P. Janecek and P. Haddawy compared the accuracy of decision tree and Bayesian network algorithms for predicting the academic performance of undergraduate and postgraduate students at two very different academic institutes. Such predictions are most useful for identifying and assisting failing students and for better determining scholarships. In their study, the decision tree classifier provided better accuracy than the Bayesian network classifier [4].

M. Alam and S. A. Alam presented a novel algorithm that uses decision trees to maximize a profit-based objective function under resource constraints. More specifically, it takes any decision tree as input and mines the best actions to choose in order to maximize the expected net profit over all customers.

The Naive Bayesian tree learner, NBTree (Kohavi 1996), combines Naive Bayesian classification and decision tree learning. Bayesian classifiers are statistical classifiers. The Naive Bayes algorithm is a simple probabilistic classifier that calculates a set of probabilities by counting the frequency and combinations of values in a given data set. In an NBTree, a local Naive Bayes classifier is deployed on each leaf of a traditional decision tree: after the tree is grown, a Naive Bayes classifier is constructed for each leaf using the data associated with that leaf. An NBTree classifies an example by sorting it to a leaf and applying the Naive Bayes classifier in that leaf to assign a class label.

3. METHODOLOGY

The data mining technique proposed in this research predicts students' academic performance by analysing student feedback using the Naive Bayes algorithm [5]. The research process consists of the following steps (Figure 1).

A. Data Selection
B. Data Transformation
C. Implementation of Naive Bayes algorithm
D. Classification

A. Data Selection

The data was collected by means of a feedback rating-scale questionnaire, presented in Table 1. Table 1 contains nine questions, all related to the teaching and learning process of an institute. Each question is rated on a scale of 1 to 5, as defined in Table 2. The data was collected from 700 students in various departments of BlueCrest College, Accra, Ghana, in the academic year 2014, together with their internal examination scores. The internal score is the average course-wise score (the average of Internal Test I and Internal Test II).


Data Selection -> Data Transformation -> Naive Bayes -> Classification Results

Figure 1: Proposed student's academic performance analysis model.

B. Data Transformation

The data derived from the feedback questionnaire was transformed into the format required for analysis with the Naive Bayes model.

Table 1: Variables for Data Classification (Questionnaire)

S.No.  Description
1.     Do you feel supported by your lecturers?
2.     Are your lecturers helpful?
3.     Are lecturers readily available for consultation out of lectures/tutorials?
4.     Do lecturers display concern for students as individuals?
5.     Do lecturers make the subject content of the lecture/tutorial as interesting as possible?
6.     Is the lecturer able to keep your interest at high level throughout the class?
7.     Do lecturers engage with you during the lecture/tutorial?
8.     Do lecturers seem interested in student's queries during the lecture/tutorial?
9.     Are the methods of teaching used by your lecturers innovative?

Value (all questions): All the described parameters are measured using a feedback scale score: 1. Below average, 2. Average, 3. Satisfactory, 4. Good, 5. Excellent.

Table 2: Variables for Feedback Measures (Scale)

S.No  Measures       Scale Values
a     Below average  1
b     Average        2
c     Satisfactory   3
d     Good           4
e     Excellent      5
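The transformation step can be illustrated with a small Python sketch. It assumes that responses arrive as the text labels of the feedback scale and that the internal score is the plain average of the two internal tests; the names `SCALE`, `encode_feedback` and `internal_score` are hypothetical and do not come from the paper.

```python
# Feedback scale (label -> numeric score), as listed in the scale table.
SCALE = {
    "below average": 1,
    "average": 2,
    "satisfactory": 3,
    "good": 4,
    "excellent": 5,
}


def encode_feedback(responses):
    """Map the nine questionnaire answers (text labels) to 1-5 scale values."""
    if len(responses) != 9:
        raise ValueError("expected answers to all nine questions")
    return [SCALE[r.strip().lower()] for r in responses]


def internal_score(test_1, test_2):
    """Average of Internal Test I and Internal Test II (assumed equal weights)."""
    return (test_1 + test_2) / 2


# Made-up answers for one student, used only to show the encoding.
answers = ["Good", "Excellent", "Average", "Good", "Satisfactory",
           "Good", "Below average", "Excellent", "Good"]
print(encode_feedback(answers))
print(internal_score(62, 70))
```

Each student then becomes one row of nine integers plus an internal score, which is a convenient input shape for a Naive Bayes classifier.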

C. Implementation of Naive Bayes algorithm

The Naive Bayesian algorithm is based on Bayes' theorem with independence assumptions between predictors. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and is widely used; in some settings it is competitive with, or even outperforms, more sophisticated classification methods.

Bayes' theorem provides a way of calculating the posterior probability, P(c|x), from P(c), P(x) and P(x|c). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors. This assumption is called class conditional independence.

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \tag{1}$$

• P(c|x) is the posterior probability of the class (target) given the predictor (attribute).

• P(c) is the prior probability of the class.

• P(x|c) is the likelihood, which is the probability of the predictor given the class.

• P(x) is the prior probability of the predictor.
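As a worked illustration of Bayes' theorem, the Python sketch below computes the posterior for two classes. The priors use the 166 PASS and 34 FAIL records of the 200-record sample reported in the results section; the likelihood values are made up for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood):
    """Return P(c|x) for every class c using Bayes' theorem.

    prior:      dict mapping class -> P(c)
    likelihood: dict mapping class -> P(x|c)
    """
    # P(x) follows from the law of total probability over the classes.
    evidence = sum(likelihood[c] * prior[c] for c in prior)
    return {c: likelihood[c] * prior[c] / evidence for c in prior}


# Priors taken from the 200-record sample (166 PASS, 34 FAIL).
prior = {"PASS": 166 / 200, "FAIL": 34 / 200}
# Made-up likelihoods of observing one particular rating on one question.
likelihood = {"PASS": 0.40, "FAIL": 0.10}

post = posterior(prior, likelihood)
print(post)
```

With these illustrative numbers the PASS posterior is 0.332 / 0.349, so the observation raises the probability of PASS above its prior of 0.83.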

D. Classification rule

A classification rule is generated from the classification process according to the user's request or the research needs. Rules can be derived specifically to give a better understanding of each class of data in a database.

MAP: Maximum A Posteriori rule generation for feedback prediction.

$$\text{Assign } x \text{ to } c^{*} \text{ if } P(C = c^{*} \mid X = x) > P(C = c \mid X = x), \quad c \neq c^{*},\; c = c_1, \ldots, c_L \tag{2}$$

Generative classification with the MAP rule

Bayes' rule (1) converts the class-conditional likelihoods and the priors into the posterior probabilities that the MAP rule (2) compares:

$$P(C = c_i \mid X = x) = \frac{P(X = x \mid C = c_i)\,P(C = c_i)}{P(X = x)} \propto P(X = x \mid C = c_i)\,P(C = c_i), \quad i = 1, 2, \ldots, L \tag{3}$$
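The MAP decision can be sketched in a few lines of Python. Because P(X = x) is the same for every class, the sketch compares only likelihood times prior, and it works with logarithms to avoid underflow when many features are multiplied. The per-feature likelihoods are made-up values and the function name `map_class` is hypothetical; the product over features relies on the class conditional independence assumption.

```python
import math


def map_class(priors, likelihoods):
    """Pick the class that maximizes P(x|c) * P(c).

    priors:      dict mapping class -> P(C = c)
    likelihoods: dict mapping class -> list of per-feature P(X_j = x_j | C = c)
    """
    scores = {}
    for c, prior in priors.items():
        # Sum of logs equals the log of the product of the probabilities.
        scores[c] = math.log(prior) + sum(math.log(p) for p in likelihoods[c])
    best = max(scores, key=scores.get)
    return best, scores


priors = {"PASS": 0.83, "FAIL": 0.17}
likelihoods = {  # made-up values for three feedback questions
    "PASS": [0.40, 0.35, 0.30],
    "FAIL": [0.10, 0.20, 0.25],
}
label, scores = map_class(priors, likelihoods)
print(label)
```

Dividing both scores by the evidence would rescale them equally, so the chosen class would not change.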

The implementation is based on the collected data, which exhibits various data mining aspects [6]. The student data is used for performance prediction. The proposed work is organised into two modules: first the feedback results are analysed, and then they are compared with the internal test performance.

The procedure can be implemented in Java, an open-source language. The problem is set up as follows:

(i) Input: sample values for the Naive Bayes algorithm.

(ii) Input values are given on a scale of 1 to 5 for all nine feedback questions.

(iii) The mean and variance are computed for each question in the questionnaire.

(iv) The evidence value is computed.

(v) The posterior value is calculated for PASS.

(vi) The posterior value is calculated for FAIL.

(vii) Based on the evidence value, the prediction value is calculated.

(viii) Output: the predicted student result, PASS or FAIL.

Procedure 1: Implementation of the Naive Bayes algorithm
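The steps of Procedure 1 can be sketched as follows. This is a minimal illustration and not the authors' code: it is written in Python for brevity rather than Java, it assumes a Gaussian likelihood per question (suggested by the use of a mean and a variance in step (iii)), and the training rows are made-up toy data. The names `fit`, `gaussian` and `predict` are hypothetical.

```python
import math
from statistics import mean, variance


def fit(rows, labels):
    """Steps (i)-(iii): per class, the prior and (mean, variance) of each question."""
    model = {}
    for c in sorted(set(labels)):
        subset = [r for r, y in zip(rows, labels) if y == c]
        # A small floor keeps the variance positive when all ratings agree.
        stats = [(mean(col), max(variance(col), 1e-6)) for col in zip(*subset)]
        model[c] = (len(subset) / len(rows), stats)
    return model


def gaussian(x, mu, var):
    """Normal density, used here as the assumed likelihood P(x_j | c)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(model, row):
    """Steps (iv)-(viii): evidence, posteriors for each class, predicted label."""
    numerators = {}
    for c, (prior, stats) in model.items():
        p = prior
        for x, (mu, var) in zip(row, stats):
            p *= gaussian(x, mu, var)
        numerators[c] = p
    evidence = sum(numerators.values())
    posteriors = {c: v / evidence for c, v in numerators.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Toy training data: nine ratings (1-5) per student, made up for illustration.
rows = [
    [4, 5, 4, 4, 5, 4, 4, 5, 4],
    [5, 4, 4, 5, 4, 5, 4, 4, 5],
    [4, 4, 5, 4, 4, 4, 5, 4, 4],
    [2, 1, 2, 2, 1, 2, 2, 1, 2],
    [1, 2, 2, 1, 2, 1, 2, 2, 1],
    [2, 2, 1, 2, 2, 2, 1, 2, 2],
]
labels = ["PASS", "PASS", "PASS", "FAIL", "FAIL", "FAIL"]

model = fit(rows, labels)
label, post = predict(model, [4, 4, 4, 5, 4, 4, 4, 4, 5])
print(label, post)
```

For long feature vectors a log-space version of the product would be more robust numerically; the direct product is kept here because it mirrors the evidence and posterior steps listed above.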

For this implementation, test samples were taken from the student feedback and transformed into input for Procedure 1; the prediction result is reported as PASS or FAIL. A random sample of 200 records was taken from the student feedback dataset and imported into an MS SQL database, from which it is supplied as input to Procedure 1. The output is the result dataset with a value of PASS or FAIL for each record.

4. RESULTS AND DISCUSSIONS

From the full dataset of 700 student records, we chose a sample of 200 records for our analysis [7]. The confusion matrix [8] shows the number of passes and fails in the internal examination. The performance of the algorithm is evaluated using the measures explained below.

Performance Measures

The performance of the classifier was evaluated on several measures: TP rate, FP rate, precision, recall and F-measure. The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier. The error rate or misclassification rate of a classifier M is 1 - Acc(M), where Acc(M) is the accuracy of M. The confusion matrix is a useful tool for analysing how well a classifier can recognize tuples of different classes. The sensitivity and specificity measures can be used to calculate the accuracy of a classifier. Sensitivity is also referred to as the true positive rate (the proportion of positive tuples that are correctly identified), while specificity is the true negative rate (the proportion of negative tuples that are correctly identified). These measures are defined as follows:

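$$\text{sensitivity} = \frac{\text{T-Pos}}{\text{Pos}} \tag{4}$$

$$\text{specificity} = \frac{\text{T-Neg}}{\text{Neg}} \tag{5}$$

$$\text{precision} = \frac{\text{T-Pos}}{\text{T-Pos} + \text{F-Pos}} \tag{6}$$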

In (4), (5) and (6), T-Pos is the number of true positives (positive tuples that were correctly classified), Pos is the number of positive tuples, T-Neg is the number of true negatives (negative tuples that were correctly classified), Neg is the number of negative tuples, and F-Pos is the number of false positives (negative tuples that were incorrectly labelled as positive). It can be shown that accuracy is a function of sensitivity and specificity:

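$$\text{accuracy} = \text{sensitivity} \cdot \frac{\text{Pos}}{\text{Pos} + \text{Neg}} + \text{specificity} \cdot \frac{\text{Neg}}{\text{Pos} + \text{Neg}} \tag{7}$$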
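The relation between accuracy, sensitivity and specificity can be checked numerically. The Python sketch below uses the standard definitions of these measures and made-up counts; the function names are hypothetical, and the weighting of each rate by its class share is the usual textbook form, assumed here rather than taken from the paper.

```python
def sensitivity(t_pos, pos):
    """True positive rate: correctly classified positives over all positives."""
    return t_pos / pos


def specificity(t_neg, neg):
    """True negative rate: correctly classified negatives over all negatives."""
    return t_neg / neg


def accuracy(t_pos, t_neg, pos, neg):
    """Fraction of all tuples that are correctly classified."""
    return (t_pos + t_neg) / (pos + neg)


def accuracy_from_rates(sens, spec, pos, neg):
    """Accuracy as a weighted sum of sensitivity and specificity."""
    total = pos + neg
    return sens * pos / total + spec * neg / total


# Made-up counts for illustration only.
t_pos, pos, t_neg, neg = 150, 160, 30, 40
direct = accuracy(t_pos, t_neg, pos, neg)
via_rates = accuracy_from_rates(sensitivity(t_pos, pos),
                                specificity(t_neg, neg), pos, neg)
print(direct, via_rates)
```

Both routes give the same value because each rate is weighted by the share of tuples in its class.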

True Positive Rate: the proportion of actual positives that are predicted as positive. It is defined as

$$\text{TP rate} = \frac{Tp}{Tp + Fn} \tag{8}$$

where Tp is the number of true positives and Fn is the number of false negatives.

FP rate: the proportion of negative tuples that are incorrectly labelled as positive. It is defined as

(9.a)

(9.b)

TABLE 2. CONFUSION MATRIX

Total samples taken  PASS  FAIL
200                  0     0
0                    166   0
0                    0     34
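For reference, a two-class confusion matrix is usually built by counting (actual, predicted) label pairs. The Python sketch below shows that construction on made-up labels; it does not reproduce the paper's data, and the function name `confusion_matrix` is hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("PASS", "FAIL")):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Made-up labels for five students.
actual = ["PASS", "PASS", "FAIL", "FAIL", "PASS"]
predicted = ["PASS", "FAIL", "FAIL", "PASS", "PASS"]

matrix = confusion_matrix(actual, predicted)
# With PASS as the positive class: [[TP, FN], [FP, TN]].
(tp, fn), (fp, tn) = matrix
print(matrix)
print("TP rate:", tp / (tp + fn), "FP rate:", fp / (fp + tn))
```

The TP rate and FP rate printed at the end follow the standard definitions in terms of the four cells of the matrix.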

From the full dataset of 700 student records, we chose a sample of 200 records for our analysis. The confusion matrix shows the number of passes and fails in the internal examination: 166 students passed and 34 failed. The data analysis uses precision, recall and F-measure. These three measures are explained below:

Precision

Precision is the positive predictive value; in information retrieval terms, it is the fraction of retrieved documents that are relevant. It is calculated as:

$$\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \tag{10}$$

Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n.

Recall

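Recall in information retrieval is the fraction of the documents relevant to the query that are successfully retrieved. The formula for recall is given below.

$$\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} \tag{11}$$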

F-Measure

The F-measure combines precision and recall as their harmonic mean; this is known as the traditional F-measure.

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{12}$$

TABLE 3. Performance Measures

Algorithm    TP rate  FP rate  Precision  Recall  F-Measure
Naive Bayes  0.947    0.474    0.922      0.947   0.934
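As a consistency check on the reported values, the F-measure can be recomputed from the precision and recall in Table 3 using the harmonic mean. The short Python sketch below does this; the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the traditional F-measure)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for Naive Bayes in Table 3.
f = f_measure(0.922, 0.947)
print(round(f, 3))
```

Rounded to three decimals the result agrees with the F-measure of 0.934 reported in the table, and the recall equals the reported TP rate, as expected from their definitions.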

[Chart: performance measure values for Naive Bayes; vertical axis from 0 to 1]

Figure 2. Representation of Performance measure values

5. CONCLUSIONS

Using the Naive Bayes algorithm, we predicted the pass and fail percentages of all students who appeared for a particular examination, based on their feedback on course sessions and their internal marks. The results reflect the students' performance and appear to be accurate. When feedback is compared with internal examination marks, the Naive Bayes algorithm gives a good prediction result, as measured using the confusion matrix. The results are predicted within 2 seconds. This simple analysis shows that a properly applied data mining technique can efficiently retrieve vital hidden knowledge from large volumes of student performance data, which the management of an educational institution can use in decision making. It helps institutions identify weaker students in advance so that they can arrange special measures to help those students score well. The paper also concludes that data mining applications enable effective and faster result prediction, classification and clustering, and that institutions can improve their quality based on the analysis and conduct special training for their students.

6. REFERENCES

1. Alam, M. and Alam, S. (2012). Actionable Knowledge Mining from Improved Post Processing Decision Trees. International Conference on Computing and Control Engineering, pp. 1-8.

2. Ayinde, A., Adetunji, A., Bello, M. and Odeniyi, O. (2013). Performance Evaluation of Naive Bayes and Decision Stump Algorithms in Mining Students' Educational Data. IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 4.

3. Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad (2014). First Semester Computer Science Students' Academic Performances Analysis by Using Data Mining Classification Algorithms. Proceedings of the International Conference on Artificial Intelligence and Computer Science, pp. 15-16.

4. Durairaj, M. and Vanitha, M. (2014). Educational Data Mining for Prediction of Student Performance Using Clustering Algorithms. International Journal of Computer Science and Information Technologies, pp. 5987-5991.

5. Jason, D., Rennie, M. and Lawrence, S. (2010). Tackling the Poor Assumptions of Naive Bayes Text Classifiers. Artificial Intelligence Laboratory.

6. Liu, B. (2011). Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Data-Centric Systems and Applications, Springer-Verlag Berlin Heidelberg, pp. 63-68.

7. Rauf, A. and Sheeba (2012). Enhanced K-Mean Clustering Algorithm to Reduce Number of Iterations and Time Complexity. Middle-East Journal of Scientific Research, pp. 959-963.

8. Thai-Nghe, N., Busche, A. and Schmidt-Thieme, L. (2009). Improving Academic Performance Prediction by Dealing with Class Imbalance. International Conference on Intelligent Systems Design and Applications (ISDA), pp. 878-883.
