Legal Case Classification Using Machine Learning With NLP
Authorized licensed use limited to: MEHRAN UNIV OF ENGINEERING AND TECHNOLOGY. Downloaded on May 26,2024 at 21:40:17 UTC from IEEE Xplore. Restrictions apply.
improving algorithmic and hardware methods; (iii) these factors can help in classifying a case as civil or criminal.
Fig. 1. Legal Case Classifier
C. Objectives
Whenever a crime is committed, many people need to know what steps to take to obtain justice, and this is especially true when the offence is serious. Often they do not know whether the matter is a civil or a criminal case, and the two follow different procedures. The advantages of artificial intelligence are now being applied in many fields, and law is no exception. This research addresses this problem with the help of NLP.
D. Challenges
Building an accurate model requires collecting real data on civil and criminal cases, which is time-consuming. In India there is little research on the benefits of AI in law because, unlike in many other countries, Indian legal records are largely stored in hard copy. Digitization is increasing and records are being scanned and stored as soft copies, but the process is still ongoing. Moreover, Indian law spans a vast range of case types, so covering every kind of case would be impractical, time-consuming and tedious.
II. RELATED WORK
John J. Nay endeavoured to incorporate explicit legal knowledge into legal judgment prediction. In the proposed deep learning model, factual knowledge about laws is represented as first-order logic rules, which are then integrated into a neural co-attention network to form an end-to-end approach. Including this knowledge imparts an inductive bias that aids in comprehending legal data [1].
In 2021, Leilei Gan created a novel English-language dataset for legal judgment prediction, consisting of cases from the European Court of Human Rights. Models trained on the new dataset outperformed the earlier feature-based model. The study also used data anonymization to investigate whether the models were biased towards demographic information [2].
Souvik Sengupta used the vector space model and NLP techniques to address a new problem: suggesting the appropriate IPC section for user input consisting of crime-related reports [4].
Ambrish Srivastav applied traditional machine learning classification techniques and obtained the findings with leave-one-out cross-validation. Standard metrics were used to assess the classifier's performance, the trial findings and the statistical distribution of characteristics [5].
Rafe Athar Shaikh proposed a method for measuring semantic similarity, an essential kind of semantic relation between entities. It is built on Google Directory, the search interface of the Open Directory Project. Two applications of semantic similarity measurement are the generation of entities through knowledge acquisition and the fine-grained classification of entities [6].
Another study takes Supreme Court decisions involving Indian law and generates a scorecard counting how frequently each Justice voted in favor of the “pro-Indian” result. It then links those findings to each Justice’s political philosophy to demonstrate that, despite a general trend, a “liberal” Justice is more likely to support the pro-Indian interest [7].
Grant Christensen used NLP with an index of keywords, each mapped to the laws it identifies. The system is modelled on positive and negative feedback: each piece of feedback revises the reward, so the system learns from the outcome and provides solutions based on user feedback [8].
D. Sangeetha compared five traditional ML models, including LR (Logistic Regression), a Bagging model, RF (Random Forest) and SVM. The four models other than SVM, under different settings, together with text carrying semantic information, were helpful for feature selection for the prediction models [9].
A 2017 study by Zhenyu Liu used data from actual judicial cases to compare decision trees, SVM, KNN and random forests. It describes the experimental setup, including the feature selection procedure and the train-test split, and presents a performance analysis of each model’s accuracy, precision, recall and F1 score. The study also discusses the advantages, disadvantages and factors affecting prediction performance in the context of judicial cases [10].
R. P. Kamdi and A. J. Agrawal described the methodology used to develop a question answering system for IPC sections and Indian amendment laws, using techniques such as natural language processing (NLP) and text classification. The authors describe curating a dataset of IPC sections and Indian amendment laws and the pre-processing steps applied to the legal texts. They present the architecture and functionality of the system, explaining how it handles user queries, identifies relevant keywords and retrieves appropriate answers from the legal corpus, and they assess its performance with evaluation metrics such as precision, recall and accuracy [12].
Rohit Patil et al. apply ML techniques to crime pattern detection, analysis and prediction. The paper covers various aspects of the process, including feature selection, model training and evaluation of the results. This may include supervised algorithms such as DT, RF and SVM, or
unsupervised learning techniques such as clustering algorithms [13].
III. RESEARCH METHODOLOGY
A. Data Corpus Collection
This research uses a Civil and Criminal Dataset, which is processed and then used to train the model. The dataset contains 393 samples in two classes:
1) 205 civil cases;
2) 188 criminal cases.
The samples were collected from the Central Bureau of Investigation and Indian Kanoon websites. Because the collected dataset was small, dummy data was also added.
The civil cases are divided into the following categories:
• Personal injury claims
• Property disputes
• Breach of contract
• Employment disputes
• Divorce and family law cases
• Debt collection cases
• Landlord-tenant disputes
• Intellectual property disputes
• Consumer protection cases
• Medical malpractice claims
BERT-Base has a hidden size of 768. BERT’s contextual representation learning makes it effective at understanding and capturing nuances in text, which can be valuable for classifying civil and criminal cases. The architecture of BERT-Base is shown in Figure 3.
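The methodology states only that BERT-Base has a hidden size of 768. As a rough check on what that implies for model size, the sketch below counts trainable parameters layer by layer. Every value other than the hidden size is an assumption taken from the standard bert-base-uncased configuration (30,522-token WordPiece vocabulary, 512 positions, 2 segment types, 12 encoder layers, feed-forward size 3,072), and the civil/criminal head is assumed to be a single linear layer over the pooled [CLS] vector. The number of attention heads (12 in BERT-Base) does not change the count, because the heads split the 768-dimensional projections rather than adding new weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    # Assumed bert-base-uncased values; the paper itself states only hidden_size=768.
    vocab_size: int = 30522
    max_position: int = 512
    type_vocab_size: int = 2
    hidden_size: int = 768
    num_layers: int = 12
    intermediate_size: int = 3072
    num_labels: int = 2  # assumed two-way head: civil vs criminal


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(h: int) -> int:
    """Scale (gamma) and shift (beta) vectors of a LayerNorm."""
    return 2 * h


def count_parameters(cfg: BertConfig) -> dict:
    h = cfg.hidden_size
    # Token, position and segment embeddings share one LayerNorm.
    embeddings = (cfg.vocab_size + cfg.max_position + cfg.type_vocab_size) * h + layer_norm(h)
    # Q, K, V and output projections, then the post-attention LayerNorm.
    attention = 4 * linear(h, h) + layer_norm(h)
    # Two-layer feed-forward block, then its LayerNorm.
    feed_forward = linear(h, cfg.intermediate_size) + linear(cfg.intermediate_size, h) + layer_norm(h)
    encoder = cfg.num_layers * (attention + feed_forward)
    pooler = linear(h, h)  # dense layer applied to the [CLS] vector
    classifier = linear(h, cfg.num_labels)
    return {
        "embeddings": embeddings,
        "encoder": encoder,
        "pooler": pooler,
        "classifier": classifier,
        "total": embeddings + encoder + pooler + classifier,
    }


if __name__ == "__main__":
    for name, n in count_parameters(BertConfig()).items():
        print(f"{name:>11}: {n:,}")
```

Under these assumptions the script reports about 109.5 million parameters in total, roughly 85 million of them in the 12 encoder layers and only 1,538 in the two-class head.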
Final working architecture of Bert-Base model
IV. RESULTS
After training the Bert-Base model, Naïve Bayes, Random Forest and SVM on our collected dataset, the results are as follows.
TABLE I
PREDICTION TABLE
ML Model        Accuracy   Precision   Recall   F1-Score
Bert-Base       0.929      0.93        0.93     0.93
SVM             0.878      0.88        0.88     0.88
Naive Bayes     0.919      0.92        0.92     0.92
Random Forest   0.878      0.88        0.88     0.88
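Table I reports accuracy alongside precision, recall and F1, but the paper does not say how the per-class scores were averaged; the near-identical values suggest a support-weighted average, which is an assumption here. The sketch below computes all four metrics from a confusion matrix with support-weighted averaging and applies it to a hypothetical majority-class baseline that labels all 393 samples as civil (205 civil, 188 criminal). Its accuracy of about 0.52 gives a floor against which the 0.878 to 0.929 accuracies in Table I can be read; it is computed on the full dataset because the test-split sizes are not given.

```python
def weighted_metrics(cm):
    """Accuracy plus support-weighted precision, recall and F1.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = cm[c][c]
        predicted = sum(cm[row][c] for row in range(n_classes))  # column sum
        support = sum(cm[c])                                       # row sum
        p = tp / predicted if predicted else 0.0  # undefined precision treated as 0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total  # weight each class by its number of true samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": correct / total, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical majority-class baseline: every sample predicted "civil".
    # Rows/columns: 0 = civil, 1 = criminal.
    baseline = [[205, 0],
                [188, 0]]
    for name, value in weighted_metrics(baseline).items():
        print(f"{name}: {value:.3f}")
```

Because each class's recall is weighted by its support, weighted recall always equals accuracy; this is consistent with, though it does not prove, the Recall column matching Accuracy to two decimal places in Table I.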
Fig. 6. Proposed Model Confusion Matrix
Fig. 9. Confusion Matrix
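Naïve Bayes was the strongest classical baseline in Table I (0.919 accuracy against 0.929 for Bert-Base). The paper does not describe its features or preprocessing, so the sketch below assumes a from-scratch bag-of-words multinomial Naïve Bayes with add-one (Laplace) smoothing. The class name NaiveBayesTextClassifier, the tokenizer and the toy training sentences are hypothetical and are not drawn from the paper's dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case alphabetic tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical bag-of-words multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # class -> word -> frequency
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, text, label):
        """Unnormalised log P(label | text): log prior plus smoothed log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


if __name__ == "__main__":
    # Toy, hand-written examples (hypothetical, not from the paper's dataset).
    train_texts = [
        "tenant failed to pay rent and landlord seeks eviction",
        "breach of contract claim for unpaid invoice",
        "property dispute over boundary wall between neighbours",
        "accused arrested by police for theft of jewellery",
        "murder charge filed after assault with a knife",
        "police registered fir for robbery and assault",
    ]
    train_labels = ["civil", "civil", "civil", "criminal", "criminal", "criminal"]
    model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
    print(model.predict("police arrested the accused for robbery"))
    print(model.predict("landlord and tenant dispute over unpaid rent"))
```

On the toy data this prints criminal for the first query and civil for the second. Scoring in log space avoids numerical underflow when long case texts multiply many small word probabilities.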
REFERENCES
[1] John J. Nay (2021), “Natural Language Processing for Legal Texts”, Legal Informatics, Cambridge University Press, pp. 1–35.
[2] Leilei Gan, Kun Kuang, Yi Yang, Fei Wu (2021), “Judgment Prediction via Injecting Legal Knowledge into Neural Networks”, AAAI Technical Track on Speech and Natural Language Processing I, vol. 35, no. 14.
[3] Ilias Chalkidis, Ion Androutsopoulos, Nikolaos Aletras (2019), “Neural
Legal Judgment Prediction in English”, arXiv:1906.02059v1 [cs.CL].
[4] Souvik Sengupta, Vishwang Dave (2022), “Predicting applicable law sections from judicial case reports using legislative text analysis with machine learning”, Journal of Computational Social Science (Springer), pp. 503–516.
[5] Ambrish Srivastav, Shaligram Prajapat (2021), “Text similarity algorithms to determine Indian penal code sections for offence report”, IAES International Journal of Artificial Intelligence (IJ-AI), March 2022, pp. 34–40.
[6] Rafe Athar Shaikh, Tirath Prasad Sahu, Veena Anand (2019), “Predicting outcomes of Legal Cases based on Legal Factors using Classifiers”, International Conference on Computational Intelligence and Data Science (ICCIDS 2019).
[7] Jiahui Liu, Larry Birnbaum (2007), “Measuring Semantic Similarity between Named Entities by Searching the Web Directory”, International Conference on Web Technology.
[8] Grant Christensen (2021), “Predicting Supreme Court Behavior in Indian Law Cases”, Michigan Journal of Race and Law, vol. 26.
[9] D. Sangeetha, R. Kavyashri, S. Swetha, S. Vignesh (2016), “Information retrieval system for laws”, Eighth International Conference on Advanced Computing (ICoAC).
[10] Zhenyu Liu, Huanhuan Chen (2017), “A predictive performance comparison of machine learning models for judicial cases”, IEEE Symposium Series on Computational Intelligence (SSCI).
[11] Afnan Iftikhar, Syed Waqar Ul Qounain Jaffry, Muhammad Kamran Malik (2019), “Information Mining from Criminal Judgments of Lahore High Court”, IEEE.
[12] R. P. Kamdi, A. J. Agrawal (2015), “Keywords based Closed Domain Question Answering System for Indian Penal Code Sections and Indian Amendment Laws”, International Journal of Intelligent Systems and Applications, vol. 7, no. 12, pp. 57–67.
[13] Rohit Patil, Muzamil Kacchi, Pranali Gavali, Komal Pimparia (2020), “Crime Pattern Detection, Analysis & Prediction using Machine Learning”, International Research Journal of Engineering and Technology (IRJET).
[14] Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, Vasileios Lampos (2016), “Predicting judicial decisions of the European Court of Human Rights: A Natural Language Processing perspective”, PeerJ Computer Science.
[15] Thaer Sahmoud, Dr. MohammadA (2022), “Spam Detection Using BERT”, Computer Engineering Department, Islamic University of Gaza, Palestine, arXiv:2206.02443.