bdc notes
DATE
PAGE
TYPES OF DATA: Structured, Semi-structured, Unstructured
DRAWBACKS
2. SEMI-STRUCTURED DATA
Semi-structured data is not bound by any rigid schema for data storage. There are some features, like key-value pairs, which are used to help in differentiating the entities from each other.
This type of information typically comes from external sources such as social media platforms and other web-based data feeds.
Ex (semi-structured): Email, JSON, data serialization and other markup languages, key-value stores, NoSQL
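A minimal sketch of what "no rigid schema, but key-value pairs" can look like in practice. The records and field names below are made up for illustration; only the standard `json` module is used.

```python
import json

# Hypothetical records, e.g. from a web-based data feed (field names are invented).
raw_feed = """
[
  {"user": "asha", "post": "Hello", "likes": 3},
  {"user": "ravi", "post": "Hi", "tags": ["intro"]},
  {"user": "meena", "location": {"city": "Pune"}}
]
"""

records = json.loads(raw_feed)

# No rigid schema: each record may carry a different set of keys.
all_keys = sorted({key for record in records for key in record})

# The keys still let us tell the entities apart and query them.
users = [record["user"] for record in records]
likes = [record.get("likes", 0) for record in records]  # a missing key falls back to 0
```

Every record has a `user` key here, but nothing forces the other fields to be present, which is the difference from a fixed table schema.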
UNSTRUCTURED DATA
It is a kind of data that doesn't have a set of rules or a defined schema. Its management is difficult.
Ex: texts, messages, photos, files, audio, videos
HISTORY OF HADOOP
Hadoop is an open-source framework by the Apache S/W Foundation, which is written in JAVA, for the processing of huge datasets with commodity H/W.
It grew out of the Nutch project of Doug Cutting and Mike Cafarella. After a lot of research on Nutch, they concluded that such a system will cost around half a million dollars in hardware, along with a monthly running cost of $30,000 approx., which is very expensive.
In 2003, they came across a paper that described the architecture of Google's distributed file system, called GFS (Google File System), which they found to be half the solution to the problem.
In 2004, Google published one more paper, on the technique MapReduce. Now the MapReduce paper was the other half of the solution to the problem. Doug Cutting and Mike Cafarella started implementing these techniques (GFS and MapReduce) in their Nutch project.
In 2005, Cutting found that Nutch is limited to 20 to 40 node clusters, because there were only two engineers working on the project.
In 2006, Cutting joined Yahoo with the Nutch project, and the project was named Hadoop.
Why the name Hadoop? It comes from a yellow elephant toy's name.
GOPI
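RDBMS vs HADOOP
1) RDBMS: in this, structured data is mostly processed. Hadoop: structured as well as unstructured data is processed.
4) RDBMS: it is less scalable than Hadoop. Hadoop: it is highly scalable.
RDBMS: follows the ACID properties. Hadoop: does not follow the ACID properties.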
Descriptive: deals with what happened in the past.
Predictive: deals with what will happen in the future.
Prescriptive: how can we make it happen?
Diagnostic: why did it happen?
Predictive Analytics
Uses data to determine the probable outcome of an event.
Prescriptive Analytics
Synthesizes big data, mathematical science, business rules and machine learning to make a prediction, and then suggests a decision option to take advantage of the prediction.
Ex: Healthcare, Strategic planning
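The step from "probable outcome" to "suggested option" can be sketched with a tiny frequency estimate. This is only an illustration under stated assumptions: the delivery records, the 0.5 threshold and the function names are all hypothetical, and a real system would use a trained model rather than raw counts.

```python
# Hypothetical history: (weather, was_the_delivery_late)
history = [
    ("rain", True), ("rain", True), ("rain", False),
    ("clear", False), ("clear", False), ("clear", True), ("clear", False),
]

def probability_late(weather, records):
    """Predictive step: estimate P(late | weather) from past records."""
    outcomes = [late for w, late in records if w == weather]
    if not outcomes:
        raise ValueError(f"no history for weather {weather!r}")
    return sum(outcomes) / len(outcomes)

def suggest_option(weather, records, threshold=0.5):
    """Prescriptive step: turn the prediction into a suggested decision."""
    if probability_late(weather, records) > threshold:
        return "schedule extra delivery time"
    return "keep the normal schedule"

rain_risk = probability_late("rain", history)
rain_advice = suggest_option("rain", history)
clear_advice = suggest_option("clear", history)
```

The predictive function only reports a probability; the prescriptive function adds a business rule (the threshold) on top of it to pick an option.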
4: Diagnostic Analytics
We generally use historical data over other data to answer any question or for the solution of any problem.
Data organisation
Retail, Healthcare, Finance, (6) Transportation
to process big data; to store
[Figure: HDFS architecture. Client, block ops, read and write to DataNodes, replication between DataNodes, racks]
Replication factor: the no. of copies of a block, i.e. how many times it is stored.
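A small worked calculation of what the replication factor costs and buys. It assumes a replication factor of 3, which is the usual HDFS default; the function names are made up for this sketch.

```python
def raw_storage_mb(file_size_mb, replication_factor=3):
    """Total disk space used across the cluster for one file."""
    if file_size_mb < 0 or replication_factor < 1:
        raise ValueError("size must be >= 0 and replication factor >= 1")
    return file_size_mb * replication_factor

def tolerated_failures(replication_factor=3):
    """How many copies of a block can be lost while one copy still survives."""
    return replication_factor - 1

storage_for_1gb = raw_storage_mb(1024)   # a 1 GB file with 3 copies
losses_ok = tolerated_failures()
```

So a higher replication factor means more fault tolerance, paid for with proportionally more storage.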
Secondary NameNode
helper to the NameNode
saves the metadata in case of failure.
Status: replica of metadata storage
looks at the heartbeat of the DataNodes
DATA STORAGE AND REPLICATION
Files are stored in a sequence of blocks.
All blocks are of the same size except the last block.
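The rule "all blocks are the same size except the last" can be sketched as a simple split. The 128 MB block size is assumed here (the default in Hadoop 2.x and later); `split_into_blocks` is a hypothetical helper, not an HDFS API.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size > 0")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        blocks.append(remainder)  # only the last block may be smaller
    return blocks

blocks_500 = split_into_blocks(500)   # three full blocks and one partial block
blocks_256 = split_into_blocks(256)   # exact multiple: no partial block
```

For a 500 MB file this gives three 128 MB blocks and a final 116 MB block; the last block only occupies the space it needs.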
Block report: DataNodes report their blocks to the NameNode. If a DataNode fails, then the replication factor changes.
Block size: 128 MB in Hadoop
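A rough sketch of how a failed DataNode shows up as a changed replication count. It assumes the NameNode keeps the time of each DataNode's last heartbeat and a map from blocks to the nodes holding them; the timeout value, node names and function names are hypothetical.

```python
def dead_datanodes(last_heartbeat, now, timeout_s):
    """DataNodes whose last heartbeat is older than the timeout."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout_s}

def under_replicated(block_map, dead_nodes, replication_factor):
    """Map each affected block to the number of copies that must be re-created."""
    missing = {}
    for block, nodes in block_map.items():
        live_copies = len(nodes - dead_nodes)
        if live_copies < replication_factor:
            missing[block] = replication_factor - live_copies
    return missing

# Hypothetical cluster state (times in seconds).
last_heartbeat = {"dn1": 100.0, "dn2": 70.0, "dn3": 98.0}
block_map = {"B1": {"dn1", "dn2"}, "B2": {"dn1", "dn3"}, "B3": {"dn2", "dn3"}}

dead = dead_datanodes(last_heartbeat, now=101.0, timeout_s=30.0)
to_copy = under_replicated(block_map, dead, replication_factor=2)
```

Blocks B1 and B3 each lost one copy when `dn2` stopped sending heartbeats, so each needs one new replica on a live node.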
[Figure: Hadoop cluster. Switch, NameNode, JobTracker, DataNodes]
ADVANTAGE
1) Fault Tolerance
It can make copies of the blocks for back-up purposes (across the DataNodes of the cluster).
RACK AWARENESS (reduces n/w traffic)
A rack is a physical group of DataNodes in the Hadoop cluster; a Hadoop cluster has many racks.
[Figure: blocks B1, B2, B3 with their replicas spread over racks R1, R2, R3]
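A simplified sketch of rack-aware placement for one block with three replicas. It follows the commonly described HDFS default (first copy on the writer's node, second and third on two different nodes of one other rack), but picks nodes deterministically instead of randomly; the topology and `place_replicas` are hypothetical.

```python
def place_replicas(writer_node, topology):
    """Pick three DataNodes for a block: one local, two on a single remote rack.

    topology maps a rack name to the list of DataNodes in that rack.
    """
    writer_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)
    # First other rack that has at least two nodes (real HDFS chooses randomly).
    remote_rack = next(
        rack for rack, nodes in topology.items()
        if rack != writer_rack and len(nodes) >= 2
    )
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]

def racks_used(replicas, topology):
    """Set of racks that hold at least one of the given replicas."""
    return {rack for rack, nodes in topology.items() if any(r in nodes for r in replicas)}

topology = {"R1": ["dn1", "dn2"], "R2": ["dn3", "dn4"], "R3": ["dn5", "dn6"]}
replicas_b1 = place_replicas("dn1", topology)
```

Only one copy crosses between racks during the write (which keeps n/w traffic down), yet the block still survives the loss of a whole rack.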
NameNode: master
DataNodes: slaves (hold the blocks)
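HDFS: reading a file
[Figure: client JVM on the client node, Distributed FileSystem, FS input stream, NameNode, three DataNodes]
1: open (client to Distributed FileSystem)
2: get block locations (from the NameNode)
3: read (client to FS input stream)
4: read (from a DataNode)
5: read (from the next DataNode)
6: close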
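The read steps above (ask the NameNode where the blocks are, then read each block from a DataNode) can be imitated with plain dictionaries. This is a toy model, not the Hadoop client API: `ToyNameNode`, `read_file` and the sample data are all invented for illustration.

```python
class ToyNameNode:
    """Holds only metadata: which DataNodes store each block of a file."""

    def __init__(self, block_locations):
        self._block_locations = block_locations

    def get_block_locations(self, path):
        return self._block_locations[path]

def read_file(path, namenode, datanodes):
    """Open, get block locations, read block by block, then 'close'."""
    contents = []
    for block_id, nodes in namenode.get_block_locations(path):   # step 2
        # Steps 4 and 5: read the block from the first reachable DataNode.
        node = next(n for n in nodes if n in datanodes)
        contents.append(datanodes[node][block_id])
    return b"".join(contents)                                    # step 6

namenode = ToyNameNode({"/notes.txt": [("B1", ["dn1", "dn2"]), ("B2", ["dn3", "dn1"])]})
datanodes = {
    "dn1": {"B1": b"big ", "B2": b"data"},
    "dn2": {"B1": b"big "},
    "dn3": {"B2": b"data"},
}
text = read_file("/notes.txt", namenode, datanodes)
```

The NameNode never serves file data itself in this model; it only answers the location query, and the bytes come from the DataNodes.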
HDFS works on the streaming data access pattern.
This means it supports write once and read many.
Features
HDFS: writing a file
[Figure: client JVM on the client node, FileSystem, NameNode, output stream with data queue and ack queue, pipeline of DataNodes. Steps shown: create, write, write packet, acknowledgement]
NOTE (IMP)
What happens if a DataNode fails while writing a file in HDFS?
Ans: The pipeline gets closed, and the packets in the ack queue are then added to the front of the data queue, so that the DataNodes downstream from the failed node do not miss any packet.
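The data queue / ack queue behaviour in this answer can be sketched with two deques. This is a simplified model under assumptions: the class and method names are hypothetical, and dropping the failed node from the pipeline is the commonly described follow-up step rather than something stated in these notes.

```python
from collections import deque

class WritePipeline:
    """Toy model of the client-side queues used while writing a file."""

    def __init__(self, packets, datanodes):
        self.data_queue = deque(packets)   # packets waiting to be sent
        self.ack_queue = deque()           # packets sent but not yet acknowledged
        self.pipeline = list(datanodes)

    def send_next(self):
        """Send one packet down the pipeline and wait for its acknowledgement."""
        packet = self.data_queue.popleft()
        self.ack_queue.append(packet)
        return packet

    def on_ack(self):
        """All DataNodes in the pipeline acknowledged the oldest packet."""
        return self.ack_queue.popleft()

    def on_datanode_failure(self, failed_node):
        """Re-queue unacknowledged packets at the FRONT, keeping their order."""
        while self.ack_queue:
            self.data_queue.appendleft(self.ack_queue.pop())
        # Assumption: the write continues with the remaining good DataNodes.
        self.pipeline = [node for node in self.pipeline if node != failed_node]

writer = WritePipeline(["p1", "p2", "p3", "p4"], ["dn1", "dn2", "dn3"])
for _ in range(3):
    writer.send_next()
writer.on_ack()                    # p1 is safely stored
writer.on_datanode_failure("dn2")  # p2 and p3 were still unacknowledged
```

After the failure, `p2` and `p3` sit in front of `p4` again, so they are re-sent first and no packet is skipped.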