Comparison
In this section, we compare the performance of our machine learning models with results
reported in previous research. The goal is to assess how well our method performs, particularly
when trained on a combination of a prior dataset and actual university exam questions.
● Performance of our proposed models trained on 2,000 and 4,000 samples,
● Results from other referenced models, wherever accuracy is reported.
Model         Accuracy (%)
LSTM          71.00
BERT          88.50
RoBERTa       88.20
DistilBERT    88.50
TextCNN       89.75

BERT          87.57
RoBERTa       87.50
DistilBERT    86.30
TextCNN       81.99
Given the realistic and diverse nature of our dataset, the table shows that our
model-ensembling method is competitive. While the NCERT dataset yielded the highest accuracy
(94.10%), owing to its scale and structure, our combined dataset of real and research-based
questions still performed robustly.
● On the 2,000-sample dataset, our ensemble model matched, and in some cases outperformed,
the results of other BERT-based and CNN models.
● Overall, our contextual models and ensemble strategy outperformed prior work that relied
on standalone methods (TF+SVM, CNN, LSTM).
This comparison highlights the value of ensemble learning and demonstrates our model's
ability to handle a variety of question types and cognitive levels in authentic assessment
settings.