Important Da
UNIT-1
UNIT-2
What are the limitations of HDFS? What do you understand by HDFS Federation?
What are the three important classes of MapReduce?
What is the use of Hive in the Hadoop ecosystem?
What are the two MapReduce Daemons?
List the daemons that are part of the YARN architecture.
How many blocks will be created for a file that is 300 MB? The default block size is 64 MB and
the replication factor is 3.
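The block arithmetic can be checked with a short Python sketch. It assumes, as the question does, a 64 MB block size (the Hadoop 1.x default; Hadoop 2.x and later default to 128 MB), and that HDFS does not pad the final block. A 300 MB file therefore splits into four full 64 MB blocks plus one 44 MB block (5 blocks), and a replication factor of 3 stores 15 block replicas using 900 MB of raw disk.

```python
def hdfs_block_layout(file_mb: int, block_mb: int = 64, replication: int = 3):
    """Return (logical_blocks, last_block_mb, stored_replicas, raw_storage_mb)."""
    if file_mb <= 0:
        return 0, 0, 0, 0
    # Ceiling division: a partial final block still counts as a whole block
    logical_blocks = -(-file_mb // block_mb)
    # The last block holds only the leftover data; HDFS does not pad it to full size
    last_block_mb = file_mb - (logical_blocks - 1) * block_mb
    # Every logical block is stored `replication` times on different DataNodes
    stored_replicas = logical_blocks * replication
    raw_storage_mb = file_mb * replication
    return logical_blocks, last_block_mb, stored_replicas, raw_storage_mb


if __name__ == "__main__":
    blocks, last, replicas, raw = hdfs_block_layout(300)
    print(f"{blocks} blocks (last one {last} MB), {replicas} replicas, {raw} MB raw storage")
```

The distinction between logical blocks (5) and stored replicas (15) is the usual source of confusion in this question, so the sketch reports both.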
What are active and passive NameNodes?
UNIT-3
UNIT-4
UNIT-5
PART-B
UNIT-1
Define Big Data. Explain the evolution of Big Data and its characteristics.
Define data, web data and Big Data. Also explain structured, semi-structured and unstructured
data.
Analyse how unstructured data is processed. Explain the sources of unstructured data. What are
the challenges in handling Big Data?
Discuss the following in detail:
i) In-memory analytics
ii) In-database processing
iii) Shared-nothing architecture
What are the key questions to be answered by all organizations stepping into analytics? Justify
with an example.
Explain the process involved in data mining. Also explain the algorithms used in the data
mining process.
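As one illustration of a data-mining algorithm, the following Python sketch implements Apriori frequent-itemset mining. The question does not name specific algorithms, so the choice of Apriori is ours, and the basket data in the demo is hypothetical. The key idea is the Apriori property: an itemset can be frequent only if all of its subsets are frequent, which lets each pass prune candidates before counting.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_support` transactions, with its count."""
    baskets = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the frequent ones
    counts: Dict[FrozenSet[str], int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: combine frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count the surviving candidates with one scan over the baskets
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical market-basket data
    demo = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
    for itemset, count in sorted(apriori(demo, 2).items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), count)
```

Association rules (for example bread -> milk with a confidence threshold) would be derived from these frequent itemsets in a second step.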
UNIT-2
i) What are the goals of the Hadoop framework? Discuss and illustrate the Hadoop ecosystem.
ii) What are the alternatives to MapReduce in Hadoop 2.0? Analyse them.
Describe in detail the steps involved in MapReduce that help it achieve high throughput.
Explain the significance of the Hadoop Distributed File System (HDFS) and its applications.
UNIT-3
Analyse the use of numerical and categorical data. Support your answer with
relevant examples.
Analyse ways to handle missing data. Give examples to support your views.
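Two common strategies can be sketched in a few lines of Python: listwise deletion (drop any row with a gap) and simple imputation, using the column mean for numerical attributes and the column mode for categorical ones. The records below are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, mode
from typing import Dict, List, Optional, Sequence, Union

Value = Optional[Union[float, str]]
Row = Dict[str, Value]


def drop_incomplete(rows: Sequence[Row]) -> List[Row]:
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute(rows: Sequence[Row], numeric: Sequence[str], categorical: Sequence[str]) -> List[Row]:
    """Fill numeric gaps with the column mean and categorical gaps with the column mode."""
    filled = [dict(row) for row in rows]  # copy so the input rows are left untouched
    plan = [(col, mean) for col in numeric] + [(col, mode) for col in categorical]
    for col, summarize in plan:
        # Summaries are computed from observed values only (raises if a column is entirely None)
        observed = [row[col] for row in rows if row[col] is not None]
        fill = summarize(observed)
        for row in filled:
            if row[col] is None:
                row[col] = fill
    return filled


if __name__ == "__main__":
    # Hypothetical student records with gaps
    records = [
        {"age": 20, "city": "Pune"},
        {"age": None, "city": "Pune"},
        {"age": 40, "city": None},
    ]
    print(drop_incomplete(records))
    print(impute(records, numeric=["age"], categorical=["city"]))
```

Deletion is simple but discards two of the three records here; imputation keeps every record at the cost of shrinking the variance of the filled columns.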
UNIT-4
Explain the Alon-Matias-Szegedy (AMS) algorithm for second moments and higher-order moments.
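A minimal Python sketch of the AMS estimator, assuming the stream length n is known in advance and each variable X starts at a uniformly random position. X records X.element (the item at its start) and X.value (how many times that item occurs from the start position onward). The per-variable estimate is n(2 X.value - 1) for the second moment and n(v^k - (v - 1)^k) for the k-th moment. This sketch simply averages the estimates; the usual refinement takes the median of several group averages.

```python
import random
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Sequence


def ams_from_positions(stream: Sequence[Hashable], positions: Iterable[int], k: int = 2) -> float:
    """AMS estimate of the k-th moment using variables started at the given 0-based positions."""
    n = len(stream)
    starts = set(positions)
    variables: List[list] = []  # each entry is [X.element, X.value]
    for i, item in enumerate(stream):
        # Every existing variable tracking this element sees one more occurrence
        for var in variables:
            if var[0] == item:
                var[1] += 1
        # Start a new variable here: X.element = item, X.value = 1
        if i in starts:
            variables.append([item, 1])
    # Per-variable estimate n * (v^k - (v-1)^k); for k = 2 this is n * (2v - 1)
    estimates = [n * (v ** k - (v - 1) ** k) for _, v in variables]
    return sum(estimates) / len(estimates)


def ams_estimate(stream: Sequence[Hashable], num_vars: int, k: int = 2,
                 seed: Optional[int] = None) -> float:
    """Choose `num_vars` distinct start positions uniformly at random, then estimate."""
    rng = random.Random(seed)
    return ams_from_positions(stream, rng.sample(range(len(stream)), num_vars), k)


def exact_moment(stream: Sequence[Hashable], k: int = 2) -> int:
    """Exact k-th frequency moment: the sum over distinct elements of count**k."""
    return sum(c ** k for c in Counter(stream).values())


if __name__ == "__main__":
    s = "abcbdacdabdcaab"
    print(exact_moment(s))
    print(ams_from_positions(s, [2, 7, 12]))
    print(ams_estimate(s, num_vars=5, seed=1))
```

For the stream `abcbdacdabdcaab` the true second moment is 5^2 + 4^2 + 3^2 + 3^2 = 59, and variables started at 0-based positions 2, 7 and 12 give (75 + 45 + 45) / 3 = 55.0. Starting a variable at every position reproduces the exact moment, because the terms 2v - 1 (or v^k - (v - 1)^k) telescope to count^k for each element.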
UNIT-5
Explain how data mining techniques are used in Sales and Marketing.
Describe the data mining techniques used in finance and manufacturing sectors.
Create a case study to evaluate data mining and data analytics in the healthcare industry.
Analyse how innovative insurance organizations extract value from uncertain data.
PART-C
UNIT-1
You are at the university library. You see a few students browsing through the library catalogue
on a kiosk. You observe the librarians busy at work issuing and returning books. You see a
few students filling in the feedback form on the services offered by the library. Quite a few
students are learning from the e-learning content. Think about the different types of data that are
being generated in this scenario. Support your answer with reasoning.
UNIT-2
Create a MapReduce program to count the occurrences of similar words across 50 files.
Consider a collection of literature surveys on cloud and big data analytics, prepared by a
researcher as text documents. Using Hadoop and MapReduce, develop an
application to count the occurrences of the predominant words.
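The following Python sketch simulates the map, shuffle/sort and reduce phases in a single process so the word-count data flow can be checked locally; it is not Hadoop code. On a cluster, the same mapper and reducer logic would typically be packaged as a Java Mapper/Reducer or run through Hadoop Streaming, which feeds input lines to scripts on stdin. The tokenizing regex and the stop-word list are assumptions.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for each lower-cased word in one input line."""
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort: group all values by key, ordered by key, as Hadoop does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(word: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the 1s emitted for a single word."""
    return word, sum(ones)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over every line of every document (e.g. 50 files)."""
    pairs = (pair for doc in documents for line in doc.splitlines() for pair in mapper(line))
    return dict(reducer(word, ones) for word, ones in shuffle(pairs).items())


def predominant_words(counts: Dict[str, int], n: int = 10,
                      stopwords: Iterable[str] = ()) -> List[Tuple[str, int]]:
    """Top-n words by count after removing stop words such as 'the' or 'and'."""
    skip = set(stopwords)
    return Counter({w: c for w, c in counts.items() if w not in skip}).most_common(n)


if __name__ == "__main__":
    # Hypothetical stand-ins for the survey documents; in practice read each file's text
    docs = ["Big data and cloud computing", "Cloud platforms for big data analytics"]
    print(predominant_words(word_count(docs), n=3, stopwords={"and", "for"}))
```

Because the mapper emits one pair per word and the reducer only sums, the reducer could also serve as a combiner on each mapper's local output, which cuts the data shuffled across the network.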
UNIT-3
Here are the counts (in thousands) of earned degrees in the U.S. for a recent year, classified by
degree type and sex of degree recipient.
Problems:
i) If you choose a degree recipient at random, what is the probability you pick a woman?
ii) If you choose a male degree recipient at random, what is the probability that you pick
someone who earned a professional degree?
iii) If you pick a degree recipient at random, what is the probability you pick a woman with a
doctorate?
iv) If you pick a Bachelor's degree recipient at random, what is the probability you pick a
man?
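Each part reduces to a ratio of table entries: (i) P(woman) = women total / grand total; (ii) P(professional | man) = men with professional degrees / men total; (iii) P(woman and doctorate) = that cell / grand total; (iv) P(man | Bachelor's) = that cell / Bachelor's row total. Since the counts table is not reproduced here, this Python sketch takes the table as input rather than hard-coding figures; the row and column labels a caller uses (for example "Female", "Male", "Doctorate") are assumptions about how the table is keyed.

```python
from typing import Dict

# counts[degree][sex] -> number of recipients (in thousands), taken from the question's table
Counts = Dict[str, Dict[str, float]]


def grand_total(counts: Counts) -> float:
    return sum(sum(by_sex.values()) for by_sex in counts.values())


def sex_total(counts: Counts, sex: str) -> float:
    return sum(by_sex[sex] for by_sex in counts.values())


def p_sex(counts: Counts, sex: str) -> float:
    """(i) P(sex) = column total / grand total."""
    return sex_total(counts, sex) / grand_total(counts)


def p_degree_given_sex(counts: Counts, degree: str, sex: str) -> float:
    """(ii) P(degree | sex) = cell / column total."""
    return counts[degree][sex] / sex_total(counts, sex)


def p_joint(counts: Counts, degree: str, sex: str) -> float:
    """(iii) P(degree and sex) = cell / grand total."""
    return counts[degree][sex] / grand_total(counts)


def p_sex_given_degree(counts: Counts, sex: str, degree: str) -> float:
    """(iv) P(sex | degree) = cell / row total."""
    return counts[degree][sex] / sum(counts[degree].values())
```

With the table loaded into `counts`, part (i) is `p_sex(counts, "Female")` and part (iv) is `p_sex_given_degree(counts, "Male", "Bachelor's")`. The "in thousands" unit cancels in every ratio, so it does not need converting.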
2. Describe how a bank turned challenges into opportunities to serve its customers using a NoSQL
database. Demonstrate with an architectural and database design.
3. Consider the following sample document (partial) from the restaurants collection:
{
  "address": {
    "building": "1007",
    "zipcode": "10462"
  },
  "borough": "Bronx",
  "cuisine": "Bakery",
  "grades": [ ... ],
  ...
}
i) Write a MongoDB query to display all the documents in the collection restaurants.
ii) Write a MongoDB query to display the fields restaurant_id, name, borough and
cuisine for all the documents in the collection restaurants.
iii) Write a MongoDB query to display the fields restaurant_id, name, borough and
cuisine, but exclude the field _id, for all the documents in the collection restaurants.
iv) Write a MongoDB query to find the restaurants that achieved a score of more than 80
but less than 100.
v) Write a MongoDB query to find the restaurant_id, name, borough and cuisine of
restaurants whose names begin with 'Wil'.
vi) Write a MongoDB query to find the restaurant_id, name, borough and cuisine of
restaurants whose names contain 'Reg' anywhere.
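A sketch of the six queries, written as the (filter, projection) arguments that pymongo's `Collection.find(filter, projection)` accepts, with the equivalent mongo shell form in each comment. The field `grades.score` does not appear in the partial sample above and is assumed from the wording of part iv; `run` needs pymongo and a running server and is shown only for completeness.

```python
from typing import Any, Dict, List, Optional, Tuple

# (filter, projection) pairs in the form accepted by pymongo's Collection.find(filter, projection)
Query = Tuple[Dict[str, Any], Optional[Dict[str, int]]]

SUMMARY_FIELDS = {"restaurant_id": 1, "name": 1, "borough": 1, "cuisine": 1}


def all_documents() -> Query:
    # i) db.restaurants.find({}); a projection of None returns every field
    return {}, None


def summary_fields() -> Query:
    # ii) db.restaurants.find({}, {restaurant_id: 1, name: 1, borough: 1, cuisine: 1}); _id is kept by default
    return {}, dict(SUMMARY_FIELDS)


def summary_fields_without_id() -> Query:
    # iii) same projection with _id explicitly excluded: {..., _id: 0}
    return {}, {**SUMMARY_FIELDS, "_id": 0}


def score_between_80_and_100() -> Query:
    # iv) db.restaurants.find({grades: {$elemMatch: {score: {$gt: 80, $lt: 100}}}})
    return {"grades": {"$elemMatch": {"score": {"$gt": 80, "$lt": 100}}}}, None


def name_starts_with_wil() -> Query:
    # v) db.restaurants.find({name: /^Wil/}, {...}); the ^ anchors 'Wil' to the start
    return {"name": {"$regex": "^Wil"}}, dict(SUMMARY_FIELDS)


def name_contains_reg() -> Query:
    # vi) db.restaurants.find({name: /Reg/}, {...}); an unanchored regex matches anywhere
    return {"name": {"$regex": "Reg"}}, dict(SUMMARY_FIELDS)


def run(collection, query: Query) -> List[Dict[str, Any]]:
    """Execute a query against a pymongo Collection (e.g. a client's restaurants collection)."""
    flt, projection = query
    return list(collection.find(flt, projection))
```

For part iv, `$elemMatch` requires a single grades entry to satisfy both bounds. The shorter dotted form `{"grades.score": {"$gt": 80, "$lt": 100}}` can also match a document where one score exceeds 80 and a different score is below 100.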
4. i) Create a MongoDB instance and create a shoes collection with the following fields: ModelNo,
Brand, Color, Price, Size (Height, Width).
ii) Update the brand name from Adidas to Puma in MongoDB for the first matching record only.
iii) Insert one more brand of your choice.
iv) Print all shoes that are available in blue color with a width of 2 cm.
v) Print all shoes that are available in either blue or neon color using the $in operator.
vi) Delete all records for the Adidas brand from this collection.
vii) Update the height of Nike shoes to 12 cm.
viii) Drop the shoes collection.
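A pymongo sketch of the shoe exercise. The sample documents, model numbers, prices, the extra Reebok entry and the lower-case color values are all hypothetical; Size is assumed to be a nested document with Height and Width stored as numbers in centimetres, so dot notation such as `Size.Width` reaches the nested fields. `run_exercise` expects a pymongo Collection and a running server; only the query documents are exercised offline.

```python
from typing import Any, Dict, List

# Hypothetical sample documents; Size is a nested document, Height/Width in centimetres
SAMPLE_SHOES: List[Dict[str, Any]] = [
    {"ModelNo": "A100", "Brand": "Adidas", "Color": "blue", "Price": 4999, "Size": {"Height": 10, "Width": 2}},
    {"ModelNo": "N200", "Brand": "Nike", "Color": "neon", "Price": 5999, "Size": {"Height": 11, "Width": 3}},
    {"ModelNo": "A300", "Brand": "Adidas", "Color": "black", "Price": 3999, "Size": {"Height": 9, "Width": 2}},
]
# Hypothetical "brand of your choice" for step iii
NEW_SHOE = {"ModelNo": "R400", "Brand": "Reebok", "Color": "blue", "Price": 2999,
            "Size": {"Height": 10, "Width": 2}}

BLUE_WIDTH_2 = {"Color": "blue", "Size.Width": 2}          # iv) dot notation for the nested field
BLUE_OR_NEON = {"Color": {"$in": ["blue", "neon"]}}        # v) $in matches any listed value
ADIDAS_TO_PUMA = ({"Brand": "Adidas"}, {"$set": {"Brand": "Puma"}})
NIKE_HEIGHT_12 = ({"Brand": "Nike"}, {"$set": {"Size.Height": 12}})


def run_exercise(shoes) -> Dict[str, Any]:
    """Run steps i) to viii) against a pymongo Collection for the shoes collection."""
    shoes.insert_many([dict(doc) for doc in SAMPLE_SHOES])         # i) copies, since inserts add _id
    shoes.update_one(*ADIDAS_TO_PUMA)                               # ii) first matching record only
    shoes.insert_one(dict(NEW_SHOE))                                # iii) one more brand
    blue_w2 = list(shoes.find(BLUE_WIDTH_2))                        # iv)
    blue_or_neon = list(shoes.find(BLUE_OR_NEON))                   # v)
    deleted = shoes.delete_many({"Brand": "Adidas"}).deleted_count  # vi) remaining Adidas records
    shoes.update_many(*NIKE_HEIGHT_12)                              # vii) every Nike document
    shoes.drop()                                                    # viii) drop the collection
    return {"blue_width_2": blue_w2, "blue_or_neon": blue_or_neon, "deleted": deleted}
```

String matching in MongoDB is case-sensitive, so "Blue" and "blue" are different values; storing colors in one consistent case keeps queries iv and v simple.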