syllabus
syllabus
TB1: Ch 6: 6.1-6.5
MODULE-4
Introduction to Hive: What is Hive, Hive Architecture, Hive data types, Hive file formats, Hive Query
Language (HQL), RC File implementation, User Defined Function (UDF).
Introduction to Pig: What is Pig, Anatomy of Pig, Pig on Hadoop, Pig Philosophy, Use case for Pig, Pig Latin
Overview, Data types in Pig, Running Pig, Execution Modes of Pig, HDFS Commands, Relational Operators,
Eval Function, Complex Data Types, Piggy Bank, User Defined Function, Pig Vs Hive.
1
Text, Web Content and Link Analytics: Introduction, Text Mining, Web Mining, Web Content and Web
Usage Analytics, Page Rank, Structure of Web and Analyzing a Web Graph.
TB2: Ch5: 5.2,5.3, Ch 9: 9.1-9.4
1. Identify and list various Big Data concepts, tools and applications.
2. Develop programs using HADOOP framework.
3. Make use of Hadoop Cluster to deploy Map Reduce jobs, PIG, HIVE and Spark programs.
4. Analyze the given data set and identify deep insights from the data set.
5. Demonstrate Text, Web Content and Link Analytics.
CIE for the theory component of the IPCC (maximum marks 50)
● IPCC means practical portion integrated with the theory of the course.
● CIE marks for the theory component are 25 marks and that for the practical component is 25
marks.
● 25 marks for the theory component are split into 15 marks for two Internal Assessment Tests (Two
Tests, each of 15 Marks with 01-hour duration, are to be conducted) and 10 marks for other
2
assessment methods mentioned in 22OB4.2. The first test at the end of 40-50% coverage of the
syllabus and the second test after covering 85-90% of the syllabus.
● Scaled-down marks of the sum of two tests and other assessment methods will be CIE marks for the
theory component of IPCC (that is for 25 marks).
● The student has to secure 40% of 25 marks to qualify in the CIE of the theory component of IPCC.
CIE for the practical component of the IPCC
● 15 marks for the conduction of the experiment and preparation of laboratory record, and 10 marks
for the test to be conducted after the completion of all the laboratory sessions.
● On completion of every experiment/program in the laboratory, the students shall be evaluated
including viva-voce and marks shall be awarded on the same day.
● The CIE marks awarded in the case of the Practical component shall be based on the continuous
evaluation of the laboratory report. Each experiment report can be evaluated for 10 marks. Marks of
all experiments’ write-ups are added and scaled down to 15 marks.
● The laboratory test (duration 02/03 hours) after completion of all the experiments shall be
conducted for 50 marks and scaled down to 10 marks.
● Scaled-down marks of write-up evaluations and tests added will be CIE marks for the laboratory
component of IPCC for 25 marks.
● The student has to secure 40% of 25 marks to qualify in the CIE of the practical component of the IPCC.
SEE for IPCC
Theory SEE will be conducted by University as per the scheduled timetable, with common question
papers for the course (duration 03 hours)
1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a
maximum of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
4. Marks scored by the student shall be proportionally scaled down to 50 Marks
The theory portion of the IPCC shall be for both CIE and SEE, whereas the practical portion will have
a CIE component only. Questions mentioned in the SEE paper may include questions from the
practical component.
Suggested Learning Resources:
Books:
1. Seema Acharya and Subhashini Chellappan “Big data and Analytics” Wiley India Publishers, 2nd Edition,
2019.
2. Rajkamal and Preeti Saxena, “Big Data Analytics, Introduction to Hadoop, Spark and Machine Learning”,
McGraw Hill Publication, 2019.
Reference Books:
1. Adam Shook and Donald Mine, “MapReduce Design Patterns: Building Effective Algorithms and Analytics for
Hadoop and Other Systems” - O'Reilly 2012
2. Tom White, “Hadoop: The Definitive Guide” 4th Edition, O’reilly Media, 2015.
3. Thomas Erl, Wajid Khattak, and Paul Buhler, Big Data Fundamentals: Concepts, Drivers & Techniques,
Pearson India Education Service Pvt. Ltd., 1st Edition, 2016
4. John D. Kelleher, Brian Mac Namee, Aoife D'Arcy -Fundamentals of Machine Learning for Predictive Data
Analytics: Algorithms, Worked Examples, MIT Press 2020, 2nd Edition
3
Web links and Video Lectures (e-Resources):
● https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset
● https://www.youtube.com/watch?v=bAyrObl7TYE&list=PLEiEAq2VkUUJqp1k-g5W1mo37urJQOdCZ
● https://www.youtube.com/watch?v=VmO0QgPCbZY&list=PLEiEAq2VkUUJqp1kg5W1mo37urJQOdCZ&in
dex=4
● https://www.youtube.com/watch?v=GG-VRm6XnNk https://www.youtube.com/watch?v=JglO2Nv_92A