Eti 2 - Compressed

The document discusses 5 potential big data project ideas: 1) traffic control using big data, 2) building a search engine, 3) medical insurance fraud detection, 4) designing a data warehouse for an e-commerce site, and 5) text mining. It provides details on each project idea and includes source code examples. The document aims to provide readers with real-world examples of big data projects they could implement.

Uploaded by

gayatripophale4

MATA MAHAKALI POLYTECHNIC,

WARORA DIST-CHANDRAPUR
DIPLOMA IN COMPUTER TECHNOLOGY
SESSION:2023

TITLE OF PROJECT:
Prepared a Report on Big Data Analysis

A PROJECT REPORT

SUBMITTED BY
TEJASWINI LANDGE

GUIDED BY
D.M.Nagrikar
CERTIFICATE

This is to certify that Mr./Ms.
…………………………………………………………………………………..
From : Mata Mahakali Polytechnic Warora
Enroll No : 1193
has completed the third-year project titled “Prepare a
report on OSI reference model” during the academic year
2022-2023. The project was completed individually under
the guidance of the faculty guide, D.M.Nagrikar Sir.

Tejaswini Landge

Name & Signature of Guide :


………………………………………
Table Of Contents

SR.NO CONTENT

1 INTRODUCTION

2 BIG DATA PROJECT IDEAS

3 WHY PROJECT IMPORTANCE

4 CONCLUSION
Introduction:

Almost 6,500 million connected devices communicate data via
the Internet nowadays. This figure will climb to 20,000 million
by 2025. Big data analysis turns this “sea of data” into the
information that is reshaping our world. Big data refers to
massive data volumes, both structured and unstructured, that
bombard enterprises daily. But it’s not simply the type or
quantity of data that matters; it’s also what businesses do
with it. Big data can be analyzed for insights that help people
make key business decisions with more confidence. These vast,
diversified data sets are growing at an exponential rate.
Three factors, known as the “three v’s” of big data,
characterize them: the volume of data, the velocity or speed
with which it is created and collected, and the variety or
scope of the data points covered. Big data is frequently
derived by data mining and is available in a variety of
formats.

Big data comes in two types: structured and unstructured.
Structured data has a set length and format. Numbers, dates,
and strings (collections of words and numbers) are examples
of structured data. Unstructured data is unorganized data
that does not fit into a predetermined model or format. It
includes information gleaned from social media sources, which
helps organizations gather information on customer demands.

Big Data Project Ideas:

1. Traffic control using Big Data

Big Data projects that simulate and predict traffic in real
time have a wide range of applications and advantages.
Real-time traffic simulation has been modeled successfully.
However, predicting route traffic has long been a challenge,
because building predictive models for real-time traffic
prediction is a difficult endeavor that involves high latency,
large amounts of data, and ever-increasing costs.

The following project is a Lambda Architecture application
that monitors the traffic safety and congestion of each street
in Chicago. It shows current traffic collisions, red light and
speed camera violations, and traffic patterns on 1,250 street
segments within the city limits.

These datasets have been taken from the City of Chicago’s
open data portal:

• Traffic Crashes shows each crash that occurred within
city streets as reported in the electronic crash reporting
system (E-Crash) at CPD. Citywide data are available
starting September 2017.
• Red Light Camera Violations reflect the daily number of
red light camera violations recorded by the City of
Chicago Red Light Program for each camera since 2014.
• Speed Camera Violations reflect the daily number of
speed camera violations recorded by each camera in
Children’s Safety Zones since 2014.
• Historical Traffic Congestion Estimates estimates traffic
congestion on Chicago’s arterial streets in real time by
monitoring and analyzing GPS traces received from
Chicago Transit Authority (CTA) buses.
• Current Traffic Congestion Estimate shows the current
estimated speed for street segments covering 300 miles
of arterial roads. Congestion estimates are produced
every ten minutes.

The project implements the three layers of the Lambda
Architecture:

• Batch layer – manages the master dataset (the source of
truth), which is an immutable, append-only set of raw
data. It pre-computes batch views from the master
dataset.

batch layer) or building views from the processed dat
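
The batch-view idea described above can be illustrated with a
minimal Python sketch. It is not the project's code: the event
format, the segment names, and the functions compute_batch_view
and merge_views are hypothetical, and the way recent events are
merged with the batch view is an assumption about how a Lambda
Architecture query is typically answered.

```python
from collections import defaultdict

# Hypothetical raw events: (street_segment_id, observed_speed_mph).
# The master dataset is append-only; records are never updated in place.
master_dataset = [
    ("seg-1", 22.0),
    ("seg-1", 18.0),
    ("seg-2", 31.0),
]


def compute_batch_view(events):
    """Pre-compute (count, total speed) per street segment."""
    view = defaultdict(lambda: [0, 0.0])
    for segment, speed in events:
        view[segment][0] += 1
        view[segment][1] += speed
    return {segment: tuple(values) for segment, values in view.items()}


def merge_views(batch_view, recent_events):
    """Combine the batch view with events that arrived after it was built."""
    recent_view = compute_batch_view(recent_events)
    merged = {}
    for segment in set(batch_view) | set(recent_view):
        b_count, b_total = batch_view.get(segment, (0, 0.0))
        r_count, r_total = recent_view.get(segment, (0, 0.0))
        merged[segment] = (b_total + r_total) / (b_count + r_count)
    return merged


batch_view = compute_batch_view(master_dataset)
# Assumed: events newer than the last batch run, not yet in the batch view.
recent_events = [("seg-1", 20.0), ("seg-3", 40.0)]
average_speed = merge_views(batch_view, recent_events)
```

Storing counts and totals rather than finished averages lets the
batch result be combined with newer events without rereading the
master dataset.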


2. Search Engine

To understand what people are looking for, search engines must
deal with trillions of network objects and monitor the online behavior
of billions of people. Search engines convert website material into
quantifiable data. The given project is a full-featured search
engine built on top of a 75-gigabyte Wikipedia corpus with sub-second
search latency. In this project, we will use several datasets like
stopwords.txt (a text file containing all the stop words, in
the current directory of the code) and wiki_dump.xml (the XML file
containing the full data of Wikipedia). The results show wiki pages
sorted by TF/IDF (Term Frequency-Inverse Document Frequency)
relevance based on the search term/s entered.
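
As a rough illustration of TF/IDF ranking, here is a small Python
sketch. It is an assumption-laden toy, not the project's
implementation: the three-page corpus, the inline STOP_WORDS set
(standing in for stopwords.txt), and the tokenize, idf, and search
functions are all hypothetical.

```python
import math
from collections import Counter

# Stand-in for stopwords.txt (hypothetical, much shorter than a real list).
STOP_WORDS = {"the", "a", "of", "is", "in", "and"}

# Hypothetical miniature corpus standing in for parsed wiki pages.
pages = {
    "Hadoop": "hadoop is a framework for distributed storage and processing of big data",
    "Spark": "spark is an engine for big data processing in memory",
    "Chicago": "chicago is a city in the state of illinois",
}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


term_counts = {title: Counter(tokenize(body)) for title, body in pages.items()}


def idf(term):
    """Inverse document frequency: rarer terms get a higher weight."""
    docs_with_term = sum(1 for counts in term_counts.values() if term in counts)
    if docs_with_term == 0:
        return 0.0
    return math.log(len(term_counts) / docs_with_term)


def search(query):
    """Return page titles sorted by descending TF/IDF score for the query."""
    scores = {}
    for title, counts in term_counts.items():
        total = sum(counts.values())
        score = sum((counts[term] / total) * idf(term) for term in tokenize(query))
        if score > 0:
            scores[title] = score
    return sorted(scores, key=scores.get, reverse=True)


results = search("distributed data")
```

A real engine over a corpus of this size would precompute an
inverted index rather than scan every page per query; the scan
here only keeps the scoring formula visible.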
3. Medical Insurance Fraud Detection

This project is a data science model that uses real-time
analysis and classification algorithms to help predict fraud
in the medical insurance market. The government can use this
tool to benefit patients, pharmacies, and doctors,
ultimately assisting in improving industry

• Part D prescriber services: data such as the name of the
doctor, address of the doctor, disease, symptoms, etc.
• List of Excluded Individuals and Entities (LEIE)
database: a list of individuals and entities that are
excluded from participating in federally funded health
care programs (for example Medicare) because of past
health care fraud.
• Payments Received by Physician from Pharmaceuticals
• CMS Part D dataset: data from the Centers for Medicare &
Medicaid Services

The model was developed by selecting several key features
and applying different machine learning algorithms to see
which one performs better.
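
The sketch below shows one way to compare algorithms on held-out
data. Everything in it is assumed for illustration: the two
features, the tiny training and test sets, and the choice of a
nearest-neighbour and a nearest-centroid classifier. The source
does not say which algorithms or features the project used.

```python
# Hypothetical rows: ((claims_rate, payment_rate), label), features scaled 0..1.
# Label 1 stands in for a prescriber found in an exclusion list such as LEIE.
train = [
    ((0.1, 0.1), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.45, 0.45), 1),
]
test = [((0.15, 0.2), 0), ((0.85, 0.85), 1), ((0.4, 0.4), 1), ((0.3, 0.2), 0)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbour(rows):
    """Fit a 1-nearest-neighbour classifier; returns a predict function."""
    def predict(features):
        _, label = min(rows, key=lambda row: sq_dist(row[0], features))
        return label
    return predict


def nearest_centroid(rows):
    """Fit a nearest-centroid classifier; returns a predict function."""
    centroids = {}
    for label in {lbl for _, lbl in rows}:
        members = [feat for feat, lbl in rows if lbl == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))

    def predict(features):
        return min(centroids, key=lambda lbl: sq_dist(centroids[lbl], features))
    return predict


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(feat) == lbl for feat, lbl in rows) / len(rows)


candidates = {
    "nearest_neighbour": nearest_neighbour,
    "nearest_centroid": nearest_centroid,
}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
```

With fraud data, where positive cases are rare, precision and
recall would usually be more informative than plain accuracy;
accuracy is used here only to keep the comparison loop short.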

Source Code – Medical Insurance Fraud


4. Data Warehouse Design for an E-Commerce Site

A data warehouse is essentially a vast collection of a
company’s data that helps the company make informed
decisions based on data analysis. The data warehouse
designed in this project is a central repository for an
e-commerce site, containing unified data ranging from
searches to purchases made by site visitors. By establishing
such a data warehouse, the site can manage supply based on
demand (inventory management), logistics, pricing for
maximum profitability, and advertisements based on searches
and items purchased. Recommendations can also be made based
on trends in a certain area, as well as age groups, sex, and
other shared interests. This is a data warehouse
implementation for the e-commerce website “Infibeam”, which
sells digital and consumer electronics.
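
To make the idea of a central repository concrete, here is a
minimal sketch using Python's built-in sqlite3 module. The star
schema layout (one fact table with two dimension tables) is an
assumption, as are all table names, column names, and sample
rows; the source does not describe the actual schema.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Assumed star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, city TEXT, age_group TEXT);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity INTEGER,
    amount REAL);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Phone", "Electronics"), (2, "Headphones", "Electronics"),
     (3, "E-book", "Digital")],
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "Pune", "18-25"), (2, "Nagpur", "26-35")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 15000.0), (2, 2, 1, 2, 3000.0),
     (3, 3, 2, 1, 200.0), (4, 1, 2, 1, 15000.0)],
)

# Analytical query: total revenue per product category.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The same fact table could be joined to dim_customer to group
revenue by city or age group, which is the kind of query behind
the area and age-group recommendations mentioned above.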

Source Code – Data Warehouse Design

5. Text Mining Project

You will be required to perform text analysis and
visualization of the delivered documents as part of this
project. For beginners, this is one of the best deep
learning project ideas. Text mining is in high demand, and it
can help you demonstrate your abilities as a data scientist.
You can apply Natural Language Processing techniques to
gain some useful information from the link provided below.
The link contains a collection of NLP tools and resources for
various languages.

Source Code – Text Mining
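
A first step in most text analysis is counting terms after
removing stop words. The Python sketch below shows that step
only; the stop-word set, the sample sentence, and the top_terms
function are hypothetical and are not taken from the linked
resources.

```python
import re
from collections import Counter

# Small illustrative stop-word set (a real list would be much longer).
STOP_WORDS = {"a", "an", "the", "at", "into", "of", "and", "to"}


def top_terms(text, n=3):
    """Return the n most frequent non-stop-word terms with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


sample = "Big data tools process data at scale. Text mining turns raw text into data."
frequent = top_terms(sample, 2)
```

The resulting counts can be passed to any plotting tool for the
visualization part of the project.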

6. Big Data Cybersecurity

The major goal of this Big Data project is to use complex
multivariate time series data to exploit vulnerability
disclosure trends in real-world cybersecurity concerns. The
project’s outlier and anomaly detection technologies, based
on Hadoop, Spark, and Storm, are interwoven with the
system’s machine learning and automation engine for
real-time fraud detection, intrusion detection, and
forensics.
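
Outlier detection on a time series can be sketched with a
rolling z-score. This is a simplified, single-variable
illustration in plain Python, not the Hadoop, Spark, or Storm
pipeline the project describes; the find_anomalies function, its
window and threshold defaults, and the sample counts are all
assumptions.

```python
from statistics import mean, pstdev


def find_anomalies(series, window=5, threshold=3.0):
    """Return indexes whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Skip flat windows, where the z-score is undefined.
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily counts of vulnerability disclosures.
disclosures_per_day = [4, 5, 6, 5, 4, 5, 6, 30, 5, 4]
spikes = find_anomalies(disclosures_per_day)
```

Comparing each point only with the values before it means the
check can run as data arrives, which suits the real-time setting
described above.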

For independent Big Data Multi-Inspection / Forensics of
high-level risks, or of datasets whose volume exceeds local
resources, the project uses the Ophidia Analytics Framework.
This is an open-source big data analytics framework that
contains cluster-aware parallel operators for data analysis
and mining (subsetting, reduction, metadata processing, and
so on). The framework is fully integrated with Ophidia
Server: it takes commands from the server and responds with
alerts, allowing processes to run smoothly.

Source Code – Big Data Cybersecurity


Conclusion:

We’ve examined some of the best big data project
ideas in this article. We began with some simple
projects that you can complete quickly. After you’ve
completed these beginner tasks, I recommend going
back to understand a few additional principles before
moving on to the intermediate projects. After you’ve
gained confidence, you can go on to more advanced
projects.
