Mini Project Report Format
2021-22
Name of Supervisor
Guide Name
Assistant Professor/Associate Professor/Professor
Name Team Member(s)
Student Name (Roll No)
Declaration
I/We hereby declare that the project work presented in this report entitled
“……………………………………………”, in partial fulfillment of the requirements for
the award of the degree of Bachelor of Technology in Information Technology,
submitted to A.P.J. Abdul Kalam Pradesh Technical University, Uttar Pradesh, is an
authentic record of my/our own work carried out in the Department of Information
Technology & Engineering, G.L. Bajaj Institute of Technology & Management, Greater
Noida. It contains no material previously published or written by another person except
where due acknowledgement has been made in the text. The project work
reported in this report has not been submitted by me/us for the award of any other degree
or certification.
Signature: Signature:
Name: Name:
Roll No : Roll No :
Signature:
Name:
Roll No :
Date:
Certificate
Date:
Acknowledgement
We would like to express our sincere thanks to our project supervisor Guide Name and
our Head of Department Dr. P.C Vashist for their invaluable guidance and suggestions.
This project helped us to understand the concepts of machine learning and IoT. It
enriched our knowledge and our experience of working in a team on a live project.
Also, we would like to express our gratitude to Faculty Name for his/her help in the
preparation and review of our project.
Lastly, we would like to thank all the faculty members for giving their valuable time
whenever we needed help in carrying on with our project.
TABLE OF CONTENTS (Sample)
ABSTRACT 8
1.1 INTRODUCTION 9
1.2 HISTORY 10
1.3 OPENCV 11
1.5.2 PREPROCESSING 13
1.5.3 SEGMENTATION 14
1.7 ADVANTAGES 15
2.1 VEO 18
2.2 SOLOSHOT 21
3.1 INTRODUCTION 23
3.8.1 WORKING 33
4.5 RASPBERRY PI 47
5.1 AGENDA 60
5.4 REQUIREMENTS 63
5.5 SETUP 64
5.11 STEPS(YOLO) 73
6.2 CONCLUSION 79
REFERENCES 82
Remark:
Instructions for Formatting the Project Report:
1. All text should be in Times New Roman.
2. Headings should be in font size 14, bold.
3. Body text should be in font size 12.
4. Line spacing should be 1.5 throughout the text.
5. Figures and their captions should be center-justified, with font size 10.
6. Tables and their captions should be center-justified, with font size 10.
7. All text should be justified (select text -> Ctrl+J) in the
project report.
8. The project report should be plagiarism-free (less than 10% similarity).
9. The project report should not be less than 60 pages and should be printed on bond
paper.
10. Hard binding should be blue with golden print (contact your
supervisor).
CHAPTER-1
INTRODUCTION
Introduction
News has been a source of information for centuries. Traditionally, news agencies were the
source of news, and hence reliability and confidentiality remained with the official
organizations themselves. In recent times, internet access has grown rapidly across both rural
and urban areas. With this growth, more users from all over the world gained access to the
internet and the ability to spread information in their own way [1].
According to an Economic Times report of 2019, there are 627 million internet users in India,
which means India is home to the world’s second-largest internet user base [2]. However, with
the increasing popularity of social media, the internet has become an ideal breeding ground for
fake news. Research by the BBC shows that nearly 72% of Indians struggled to distinguish between
fake and real news [3]. Websites like The Onion [4], News Thump [5], The Poke News [6], and The
Mash News [7] are among the top-ranked propagators of ‘fake’ or ‘misleading’ news [8]. Hence,
many online fact-checking resources such as Snopes [9], FactCheck.org [10], Factmata.com [11],
and PolitiFact.com [12] grew rapidly. Platforms such as Facebook, WhatsApp, and Google have
tried to address this concern, but their efforts have contributed little to solving the
issue.
1.1 Detection approaches based on machine learning: These include Support Vector Machines
(SVMs), random forests, logistic regression models, Conditional Random Field (CRF) classifiers,
and Hidden Markov Models (HMMs) [13].
1.2 Detection approaches based on deep learning: The two most widely implemented
paradigms in modern artificial neural networks are Recurrent Neural Networks (RNNs) and
Convolutional Neural Networks (CNNs) [13].
The proposed model will detect fake news by checking the credibility of the news provider, the
sentiment of the comments, and the content of the news itself. We will use Natural Language
Processing to pre-process the dataset and a machine learning approach to fight fake news.
In India, fact-checking services have recently been launched by India Today, Times of India, and
AFP India, but these resources do not provide a platform for users to check whether the news
article they are viewing is fake or real. AltNews [17] has been successful in India in providing
a platform for users to clear their doubts, though it has yet to become more efficient and reliable.
Models like Fact Finder only check whether the news is fake or real. The AltNews website and
app, on the other hand, investigate and publish viral fake news articles. Our model
performs both tasks simultaneously.
PROPOSED WORK
In this paper, a model is built by pre-processing the data with the NLTK library, removing all
stopwords such as “the”, “is”, and “are” and keeping only those words which are unique and
provide relevant information. We also removed punctuation and numbers and converted the
dataset to lowercase. We then used a Count Vectorizer or TF-IDF matrix, which tallies how
often a word is used in a given article in the dataset. Figure 2 depicts the process from
collecting the news articles dataset to applying the news classification algorithm. Since the
problem concerns text classification and information extraction, we used the Naïve Bayes
classifier for text-based classification. For training and testing, we used Multinomial NB and
the Passive Aggressive Classifier, with 33% of the dataset used for training. We will also
remove rare words occurring in our corpus with the help of the Count Vectorizer [18-20].
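To make the classification step concrete, the following is a minimal from-scratch sketch of a multinomial Naïve Bayes text classifier written with the Python standard library only. It is not the library class used in the project; the function names, the smoothing constant alpha and the four-document toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    docs   -- list of token lists (already pre-processed)
    labels -- list of class labels, e.g. "FAKE" / "REAL"
    alpha  -- Laplace smoothing constant (assumed value)
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}}
    total_docs = len(docs)
    for label, n_docs in class_doc_counts.items():
        model["log_prior"][label] = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        # Smoothed log P(word | label) for every word in the vocabulary.
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_proba(model, tokens):
    """Return the posterior probability of each class for one document."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        ll = model["log_likelihood"][label]
        # Words never seen during training are ignored.
        scores[label] = log_prior + sum(ll[t] for t in tokens if t in ll)
    # Turn log scores into probabilities (shifted by the max for stability).
    top = max(scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}


# Toy, made-up corpus purely for illustration.
train_docs = [
    ["shocking", "miracle", "cure", "doctors", "hate"],
    ["aliens", "secretly", "control", "government", "shocking"],
    ["parliament", "passes", "budget", "bill"],
    ["central", "bank", "holds", "interest", "rates"],
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

nb_model = train_multinomial_nb(train_docs, train_labels)
probs = predict_proba(nb_model, ["shocking", "cure"])
print(probs)
```

The returned dictionary sums to 1, so its values can be read as the fake and real percentages for a piece of text.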
The goal of the project is to build a website and app so that whenever a user selects a piece
of text, the app responds with a floating window showing the percentage of fake and real
news for the selected text. The advantage of this approach is that the app detects fake news
without the user having to open or upload any content in it.
Figure 2: Process Flow Diagram
METHODOLOGY
In this section, the methodology of the proposed model is described. Figure 3 represents the
workflow of the methods involved in creating the model. The major steps involved in building
the model are:
1. Corpus of Text Document
5. Modeling
Figure 3: Methodology
Currently, the model has been trained using a dataset from Kaggle [21] with 6335 rows and 4
columns. News articles will be scraped from inshorts [22] with the help of Python libraries,
along with NLTK and spaCy. A typical news article is contained in the HTML of the page, as
depicted in the following image:
Figure 4: The landing page for technology news articles and its corresponding HTML structure [23]
Specific HTML tags that contain the textual content can also be targeted [24]. Hence, with
the help of libraries such as BeautifulSoup and requests, the useful content will be scraped.
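As a rough illustration of tag-based extraction, the sketch below pulls text out of selected tags in an inline HTML string. It uses the standard-library html.parser module as a stand-in for BeautifulSoup and requests so that it runs without network access; the markup, the itemprop attribute values and the class name ArticleTextParser are hypothetical and do not describe the real structure of any news site.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collect the text inside tags that carry a wanted itemprop value."""

    def __init__(self, wanted_props):
        super().__init__()
        self.wanted_props = set(wanted_props)
        self._open_tag = None    # tag name of the element being captured
        self._open_prop = None   # its itemprop value
        self.fields = {}         # itemprop value -> collected text

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        if self._open_tag is None and prop in self.wanted_props:
            self._open_tag, self._open_prop = tag, prop
            self.fields.setdefault(prop, "")

    def handle_data(self, data):
        if self._open_prop is not None:
            self.fields[self._open_prop] += data

    def handle_endtag(self, tag):
        # Simplification: assumes the captured element does not contain
        # a nested tag with the same name.
        if tag == self._open_tag:
            self._open_tag = self._open_prop = None


# Hypothetical markup standing in for a downloaded news page.
SAMPLE_HTML = """
<div class="news-card">
  <span itemprop="headline">Sample headline about a new phone</span>
  <div itemprop="articleBody">The phone was announced on Monday.</div>
</div>
"""

parser = ArticleTextParser(["headline", "articleBody"])
parser.feed(SAMPLE_HTML)
article = {name: text.strip() for name, text in parser.fields.items()}
print(article)
```

In a real scraper the HTML string would come from an HTTP response, and the tag and attribute names would have to be read off the target page.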
The collected dataset contains 6335 rows and 4 columns; the head of the dataset is depicted
in Figure 5:
Here, both the nltk and spaCy packages have been leveraged to process the data. Stopword lists
are used to remove the most common words in our dataset, such as “and”, “the”, and “is”.
Along with stopwords, HTML tags, accented characters, punctuation, numbers, and special
characters also need to be removed, and contractions need to be expanded, since they do not
provide relevant information. Lemmatization and stemming are done with the help of
functions such as lemmatize_text() and simple_stemmer() respectively.
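The cleaning steps listed here can be sketched with the standard library alone. The short STOPWORDS set and CONTRACTIONS map below are illustrative assumptions (NLTK's English stopword list is much longer), preprocess() is a hypothetical helper name, and lemmatization and stemming are left out of this sketch.

```python
import re
import unicodedata

# Small illustrative lists; both are assumptions made for this sketch.
STOPWORDS = {"the", "is", "are", "and", "a", "an", "of", "to", "in", "it"}
CONTRACTIONS = {
    "isn't": "is not",
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
}


def preprocess(text):
    """Clean raw article text and return a list of tokens."""
    # Strip HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # Replace accented characters with their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Expand contractions before punctuation is removed.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Keep letters and whitespace only (drops punctuation, digits, symbols).
    text = re.sub(r"[^a-z\s]", " ", text)
    # Drop stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = preprocess("<p>The café isn't open: 3 of the 12 claims are FALSE!</p>")
print(tokens)  # ['cafe', 'not', 'open', 'claims', 'false']
```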
With the help of the TF-IDF vectorizer, the importance of each word in a given article
relative to the entire corpus is determined [25].
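A small worked example may help show what a TF-IDF weight measures. The sketch below uses one common textbook variant, tf = count / document length and idf = log(N / df); this choice, the function name tf_idf() and the three-document toy corpus are assumptions. Library implementations such as scikit-learn's TfidfVectorizer apply a smoothed idf and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document.

    Weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up corpus of already tokenized articles.
corpus = [
    ["election", "results", "announced"],
    ["election", "fraud", "claims", "fraud"],
    ["weather", "results", "delayed"],
]
scores = tf_idf(corpus)
print(scores[1])
```

In the second document, "fraud" occurs twice and appears in no other document, so it receives a higher weight than "election", which is shared with the first document.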
For a better understanding of the dataset, we use the matplotlib and seaborn libraries for
visualization and plotting graphs. Using the stripplot() method of the seaborn library, the
statistical plot depicted in Figure 6 was produced; it shows that entries with index 0~5000 are
REAL while entries with index 5000~10000 are FAKE. CountVectorizer was imported to remove the
rare words.
Figure 6: Dataset Visualization of Fake news and Real news using Seaborn
The x-axis represents the label (fake or real); the y-axis represents the index.
Modeling and Grid Search
With Multinomial NB and the Passive Aggressive Classifier, 33% of the dataset was used for
training and the remaining 67% for testing. The confusion matrix is used to identify the model
with the highest accuracy [26].
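The split described here can be sketched as a shuffled partition of the rows. The helper name split_dataset(), the fixed seed and the use of plain integers as stand-in rows are assumptions; the 0.33 training fraction and the 6335-row size are taken from the text.

```python
import random


def split_dataset(rows, train_fraction=0.33, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(6335))  # stand-in for the 6335 dataset rows
train, test = split_dataset(rows)
print(len(train), len(test))
```

With 6335 rows, this gives 2091 training rows and 4244 test rows.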
Let us consider the result positive when the classifier classifies a news article as fake:
● The number of True Positives is the number of news articles correctly classified as Fake
News;
● The number of False Positives is the number of news articles incorrectly classified as
Fake News;
● The number of True Negatives is the number of news articles correctly classified as True
News;
● The number of False Negatives is the number of news articles incorrectly classified as
True News.
The precision of a classifier is calculated as follows:
\[ \text{precision} = \frac{tp}{tp + fp} \]
where:
tp – number of true positive examples;
fp – number of false positive examples.
The recall of a classifier is calculated as follows:
\[ \text{recall} = \frac{tp}{tp + fn} \]
where fn is the number of false negative examples.
The figure shows the confusion matrix without normalization. The results in the matrix change
as the classification models or vectorizers are changed.
The precision for the given classification model is 0.902; the recall, on the other hand, is 0.486.
Precision is the fraction of retrieved instances that are relevant,
while recall is the fraction of all relevant instances that were actually retrieved.
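Both metrics follow directly from the confusion-matrix counts, as the sketch below illustrates. The label lists are made-up toy data and do not reproduce the reported 0.902 and 0.486 figures; the function names are hypothetical, and FAKE is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive="FAKE"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # fake article flagged as fake
            else:
                fp += 1  # real article flagged as fake
        else:
            if truth == positive:
                fn += 1  # fake article missed
            else:
                tn += 1  # real article passed as real
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels purely for illustration.
y_true = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "REAL", "FAKE"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, tn, fn)  # 2 1 3 2
print(round(precision, 3), round(recall, 3))  # 0.667 0.5
```

A high precision with a low recall, as reported for this model, means that most articles flagged as fake really are fake, but many fake articles go unflagged.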
In future, VADER can be used for sentiment analysis, as it is a more efficient algorithm,
together with a text classification model that provides the highest accuracy. Also, existing
fake news detection models have worked on news and politics only; there is still scope in
stock markets, where share prices rise and fall very frequently.
REFERENCES
1. Kuriakose, Ammu, et al. "ALIKAH-A Clickbait and Fake News Detection System using Natural
Language Processing." 2019 3rd International Conference on Trends in Electronics and Informatics
(ICOEI). IEEE, 2019.
2. “India has second highest number of Internet users after China” - economictimes.com, 2019[Online].
Available : https://economictimes.indiatimes.com
3. “Ordinary Indians are fueling the country’s fake-news crisis” – qz.com, 2018[Online]. Available:
https://qz.com/india
4. “The Onion” – theonion.com [Online]. Available: https://www.theonion.com/
5. “News Thump” – newsthump.com [Online]. Available: https://newsthump.com/
6. “Poke News” – pokenews.com [Online]. Available:
https://thepoke.co.uk/category/news/
7. “Mash News” – mashnews.com [Online].
Available: https://www.thedailymash.co.uk/news
8. “Top 50 Fake News Websites And Blogs on the Web in 2019” – blog.feedspot.com, 2019[Online].
Available: https://blog.feedspot.com/fake_news_blogs/
9. “Snopes” – snopes.com [Online]. Available: https://www.snopes.com/
10. “FACTCHECK.ORG” – factcheck.org [Online]. Available: https://www.factcheck.org/
11. “FACTMATA” – factmata.com [Online]. Available: https://factmata.com/
12. “Fact Checking U.S. Politics | PolitiFact ” – politifact.com [Online].
Available: https://politifact.com/
13. Bondielli, Alessandro, and Francesco Marcelloni. "A survey on fake news and rumour detection
techniques." Information Sciences 497 (2019): 38-55.
14. “Protecting the EU Elections From Misinformation and Expanding Our Fact-Checking Program to New
Languages” – aboutfb.com[Online]. Available: https://about.fb.com/news
15. "B.S. Detector - Browser extension to identify fake news sites", Bsdetector.tech, 2018. [Online].
Available: http://bsdetector.tech/.
16. “Messenger platform Flock launches feature to identify fake news”, economictimes.com, 2019 [Online].
Available: https://m.economictimes.com/small-biz
17. “Alt News”, altnews.in [Online]. Available: https://www.altnews.in/
18. N. J. Conroy, V. L. Rubin, and Y. Chen, “Automatic deception detection: Methods for finding fake
news,” Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4,
2015.
19. S. Feng, R. Banerjee, and Y. Choi, “Syntactic stylometry for deception detection,” in Proceedings of the
50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2,
Association for Computational Linguistics, 2012, pp. 171–175.
20. S. Gilda, “Evaluating Machine Learning Algorithms for Fake News Detection,” in 2017 IEEE
15th Student Conference on Research and Development (SCOReD), 2017.
21. “Kaggle”, kaggle.com [Online]. Available: https://kaggle.com
22. “inshorts - stay informed”, inshorts.com [Online]. Available: https://inshorts.com
23. “A Practitioner's Guide to Natural Language Processing (Part I) — Processing & Understanding Text”,
towardsdatascience.com, 2019 [Online]. Available: https://towardsdatascience.com
24. M. Pagliardini, P. Gupta, and M. Jaggi, “Unsupervised learning of sentence embeddings using
compositional n-gram features,” arXiv preprint arXiv:1703.02507, 2017.
25. H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, Y. Choi, and P. G. Allen, “Truth of Varying Shades:
Analyzing Language in Fake News and Political Fact-Checking,” in Proceedings of the 2017
Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2931–2937.
26. M. Balmas, “When Fake News Becomes Real: Combined Exposure to Multiple News Sources and
Political Attitudes of Inefficacy, Alienation, and Cynicism,” Communic. Res., vol. 41, no. 3, pp. 430–
454, 2014.
27. Naive Bayes classifier. (n.d.) Wikipedia. [Online]. Available:
https://en.wikipedia.org/wiki/Naive_Bayes_classifier. Accessed Feb. 6, 2017.