Big Data and Blockchain Basics
Presenter:
Dr. Poonam Saini
poonamsaini@pec.edu.in
Data platform landscape map
• Complex array of current data platform providers
• Compares platform capabilities
• Understand where providers intersect and diverge
• Identify a shortlist of providers that fit enterprise needs
Big Data Everywhere!
• Lots of data is being collected and warehoused
– Web data, e-commerce
– Purchases at grocery stores
– Bank/Credit Card transactions
– Social Network
– Healthcare Network
– Machines/Automobiles
What is “big data”?
• “Big Data are high-volume, high-velocity, high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery and
process optimization” (Gartner 2012)
• Complicated (intelligent) analysis may make even a small
dataset “appear” to be “big”
• Bottom line: Any data that exceeds our current capability of
processing can be regarded as “big”
Why is “big data” a “big deal”?
• Government
– Obama administration announced “big data” initiative
– Many different big data programs launched
• Private Sector
– Walmart handles more than 1 million customer transactions every hour, which is
imported into databases estimated to contain more than 2.5 petabytes of data
– Facebook handles 40 billion photos from its user base.
– Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts
world-wide
• Science
– Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days
– Biomedical computation, such as decoding the human genome and personalized medicine
– Social science revolution
Lifecycle of Data: 4 “A”s
• Acquisition → Aggregation → Analysis → Application, then back to Acquisition
• Data passed between stages (labels in the figure): scattered data, integrated data, knowledge, log data
Computational View of Big Data
Data Visualization
Formatting, Cleaning
Storage Data
Big Data: from 3 V’s to 4 V’s
Big Data’s Properties
• Variety - the stored data is not all of the same type or category
– Structured data - data that is organized in a structure so that it is
identifiable e.g. SQL data
– Semi-structured data - a form of structured data that has a self-
describing structure yet does not conform with the formal structure
of a relational database e.g. XML
– Unstructured data - data with no identifiable structure e.g. image
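The three categories above can be illustrated with a minimal sketch using only the Python standard library. The table name, the XML tags and the sample values are made up for illustration and are not from the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows follow a fixed schema and are queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.execute("INSERT INTO purchases VALUES ('alice', 12.5)")
structured_total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]

# Semi-structured: the tags describe the data, but there is no relational schema.
xml_doc = "<purchase><customer>alice</customer><amount>12.5</amount></purchase>"
root = ET.fromstring(xml_doc)
semi_structured_amount = float(root.find("amount").text)

# Unstructured: raw bytes (here the first four bytes of a PNG file signature).
# Without further processing, little more than the size is known.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])
unstructured_size = len(image_bytes)

print(structured_total, semi_structured_amount, unstructured_size)
```

The same purchase is easy to query in the first two forms, while the image bytes need dedicated processing before anything can be asked of them.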
Big Data’s Properties…
• Volume - the “Big” in Big Data; it refers to the sheer size of
the data
– At present, existing data is measured in petabytes and is expected to
grow to zettabytes in the near future
– For example, big social networking sites produce data on the order of
terabytes every day, and this amount of data is difficult to handle using
traditional systems
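To put the units above in proportion, here is a small sketch using decimal (power-of-1000) units. The rate of 1 terabyte per day is an assumed round number chosen for illustration, not a figure from the slides.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

def days_to_reach(target_bytes, daily_bytes):
    """Days needed to accumulate target_bytes at a constant daily rate."""
    return target_bytes / daily_bytes

petabyte = to_bytes(1, "PB")
zettabyte = to_bytes(1, "ZB")

# A zettabyte is a million petabytes.
print(zettabyte // petabyte)

# Assumed rate: a site producing 1 TB per day needs 1000 days to fill 1 PB.
print(days_to_reach(petabyte, to_bytes(1, "TB")))
```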
Big Data’s Properties…
www.imarticus.org
What is Hadoop?
Hadoop is Transforming Businesses Across Industries
• 1 in 4 organizations use Hadoop to manage their data today (up from 1 in 10 in 2012)
• Cost of ownership: Python is open source software that is free to download
• Versatility: a multi-purpose language that can be used to build an entire application
• Big data compatibility: Python has become one of the go-to languages for big data processing due to its wide selection of libraries
Companies Already Onboard Python
• Google
• IBM
• Yahoo
• National Weather Service
• Quora
• Nokia
• ABN AMRO Bank
• & many more…
What is Data Visualization?
Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on
visual representations such as charts and maps to understand information more easily and quickly.
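As a toy illustration of the idea, the sketch below turns a few numbers into a text bar chart using plain Python. It is only a stand-in for a real charting tool, and the sales figures are made up.

```python
def text_bar_chart(data, width=20):
    """Return one line per label, with a bar scaled to the largest value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<10} {bar} {value}")
    return lines

# Hypothetical sales per channel.
sales = {"web": 40, "store": 20, "mobile": 10}
for line in text_bar_chart(sales):
    print(line)
```

Even in this crude form, the relative sizes are easier to read from the bars than from the raw numbers.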
Why Tableau for Data Visualization?
Tableau is a powerful, flexible data visualization tool that is easy to learn and easy to use, with powerful libraries for data
visualization and presentation.
• Big data compatibility: Tableau has become one of the go-to software programs for data visualization due to the wide variety of tools it
provides and its compatibility with Big Data platforms such as Hadoop.
Profiling and monitoring tools
Technologies to handle big data: the layers
Yup! He sent the money
Blockchain
Overview and Fundamentals
Decentralization
• Harder to attack
• Malicious entities cannot exploit the system's users
– because the system is not located in one place
Hash Functions
Cryptographic Concepts
Digital Signing
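A minimal sketch of the hash function concept, assuming SHA-256 (the slides do not name a specific function). It shows the two properties that matter here: the same input always gives the same fixed-length digest, and a one-character change gives a completely different one. The transaction strings are hypothetical, and digital signing is not covered by this sketch.

```python
import hashlib

def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a text message as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

a = sha256_hex("Alice pays Bob 10")
b = sha256_hex("Alice pays Bob 11")  # one character changed

print(a)
print(b)
print(a == sha256_hex("Alice pays Bob 10"))  # deterministic: same input, same digest
```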
Anatomy of a Block
Nonce
Hash of previous block
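The two fields named above can be sketched as follows. This is a simplified illustration, not the layout of any real blockchain: the field names, the JSON serialization, SHA-256, and the "hash must start with two zeros" rule for choosing the nonce are all assumptions made for the example.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash the whole block, including its nonce and previous-block hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def mine(prev_hash: str, transactions: list, difficulty: int = 2) -> dict:
    """Try nonces until the block hash starts with `difficulty` zeros."""
    block = {"prev_hash": prev_hash, "transactions": transactions, "nonce": 0}
    while not block_hash(block).startswith("0" * difficulty):
        block["nonce"] += 1
    return block

# The first block has no predecessor, so a placeholder hash is used.
genesis = mine("0" * 64, ["genesis"])
second = mine(block_hash(genesis), ["Alice pays Bob 10"])

print(genesis["nonce"], block_hash(genesis))
print(second["nonce"], second["prev_hash"] == block_hash(genesis))
```

Because each block stores the hash of the one before it, changing an old block changes its hash and breaks the link held by its successor.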
Nodes and Network
Nodes are the computers that make up a blockchain network
• Full nodes store the entire blockchain and verify everything, every single transaction
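The "store everything and verify everything" role can be sketched as a node that keeps the full list of blocks and rechecks every link. The block layout and helper names are hypothetical, and this sketch only checks the hash links; a real full node also validates the transactions themselves.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical JSON form of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def build_chain(batches):
    """Link each batch of transactions to the hash of the previous block."""
    chain, prev = [], "0" * 64
    for transactions in batches:
        block = {"prev_hash": prev, "transactions": transactions}
        chain.append(block)
        prev = block_hash(block)
    return chain

def verify_chain(chain):
    """Recheck every link from the first block to the last."""
    for previous, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(previous):
            return False
    return True

chain = build_chain([["genesis"], ["Alice pays Bob 10"], ["Bob pays Carol 4"]])
print(verify_chain(chain))

# Tampering with an earlier block breaks the link stored in its successor.
chain[1]["transactions"] = ["Alice pays Bob 1000"]
print(verify_chain(chain))
```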