Introduction To Hadoop
Today's Topics
Functional programming
MapReduce
How is it different?
Traditional notions of "data" and "instructions" are not applicable
(+ 1 2) ⇒ 3
(* 3 4) ⇒ 12
(sqrt (+ (* 3 3) (* 4 4))) ⇒ 5
(define x 3) ⇒ x
(* x 5) ⇒ 15
'(1 2 3 4 5)
'((a 1) (b 2) (c 3))
Functions
(define (foo x y)
  (sqrt (+ (* x x) (* y y))))
(define foo
  (lambda (x y)
    (sqrt (+ (* x x) (* y y)))))
(foo 3 4) ⇒ 5
Other Features
In Scheme, everything is an s-expression
Higher-order functions
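As a minimal sketch of what "higher-order" means here: a function can take another function as an argument, or return one as its result. The names apply-twice and make-adder are hypothetical and chosen only for illustration.

```scheme
;; A function that takes another function as an argument.
;; apply-twice is a hypothetical name, not a built-in.
(define (apply-twice f x)
  (f (f x)))

(apply-twice (lambda (x) (* x x)) 3)   ; ⇒ 81

;; A function that returns a new function.
;; make-adder is likewise a hypothetical name.
(define (make-adder n)
  (lambda (x) (+ x n)))

((make-adder 3) 4)                     ; ⇒ 7
```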
(define (factorial n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))
(factorial 6) ⇒ 720
(define (factorial-iter n)
  (define (aux n top product)
    (if (= n top)
        (* n product)
        (aux (+ n 1) top (* n product))))
  (aux 1 n 1))
(factorial-iter 6) ⇒ 720
Lisp → MapReduce?
(map (lambda (x) (* x x))
     '(1 2 3 4 5))
⇒ '(1 4 9 16 25)
Fold examples:
(fold + 0 '(1 2 3 4 5)) ⇒ 15
(fold * 1 '(1 2 3 4 5)) ⇒ 120
Sum of squares:
(define (sum-of-squares l)
  (fold + 0 (map (lambda (x) (* x x)) l)))
(sum-of-squares '(1 2 3 4 5)) ⇒ 55
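Note that fold is not part of the core Scheme reports; it is usually provided by the SRFI 1 list library. Where that library is unavailable, a left fold can be sketched with plain recursion. The name my-fold is hypothetical, and the argument order (element first, accumulator second) is assumed to follow the SRFI 1 convention.

```scheme
;; A minimal left fold: combine each element with an accumulator,
;; starting from init. my-fold is a hypothetical stand-in for fold.
(define (my-fold f init lst)
  (if (null? lst)
      init
      (my-fold f (f (car lst) init) (cdr lst))))

(my-fold + 0 '(1 2 3 4 5))      ; ⇒ 15
(my-fold * 1 '(1 2 3 4 5))      ; ⇒ 120
(my-fold cons '() '(1 2 3))     ; ⇒ (3 2 1)
```

The last call shows that the fold walks the list from left to right: each element is consed onto the accumulator, so the result comes out reversed.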
Lisp → MapReduce
Observations:
Implementations:
Handles scheduling
Handles synchronization
Handles faults
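The step from map and fold to MapReduce can be sketched in a single process as a word count: a map phase emits key/value pairs, the pairs are grouped by key, and a reduce phase folds each group. This is only an illustration under assumed names (map-phase, add-to-groups, group-by-key, reduce-phase are all hypothetical); it leaves out the scheduling, synchronization, and fault handling that a real implementation takes care of.

```scheme
;; Map phase: turn each word (a symbol) into a (word . 1) pair.
(define (map-phase words)
  (map (lambda (w) (cons w 1)) words))

;; Add one value to the group for key, creating the group if needed.
;; Groups have the shape (key v1 v2 ...).
(define (add-to-groups key val groups)
  (cond ((null? groups) (list (list key val)))
        ((eq? (car (car groups)) key)
         (cons (cons key (cons val (cdr (car groups))))
               (cdr groups)))
        (else (cons (car groups)
                    (add-to-groups key val (cdr groups))))))

;; Group phase: collect all values that share a key.
(define (group-by-key pairs)
  (if (null? pairs)
      '()
      (add-to-groups (car (car pairs))
                     (cdr (car pairs))
                     (group-by-key (cdr pairs)))))

;; Reduce phase: sum the values of each group.
(define (reduce-phase groups)
  (map (lambda (g) (cons (car g) (apply + (cdr g)))) groups))

(reduce-phase (group-by-key (map-phase '(a b a c b a))))
;; ⇒ ((a . 3) (b . 2) (c . 1))
```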
New Problem
Why?
Not enough RAM to hold all the data in memory
Disk access is slow, disk throughput is good
No data caching
Little benefit due to large data sets, streaming reads
We know this is a:
Scalability bottleneck
GFS solutions:
Shadow masters
Minimize master involvement
Never move data through it, use only for metadata (and cache metadata at clients)
Large chunk size
Master delegates authority to primary replicas in data mutations (chunk leases)
Metadata storage
Namespace management/locking
Garbage Collection
Mutations
Lease mechanism:
Parallelization Problems