
Big Data Architecture and Ecosystem

Case Study
402

HDFS
1. Create a folder on HDFS named case_study.
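A possible command, assuming the default Cloudera HDFS home directory /user/cloudera:

hadoop fs -mkdir /user/cloudera/case_study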

2. Upload a file to the HDFS folder created earlier.
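For example, assuming a local file sample.txt in the home directory:

hadoop fs -put /home/cloudera/sample.txt /user/cloudera/case_study/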


3. List all the files in the HDFS folder.
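For example:

hadoop fs -ls /user/cloudera/case_study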

4. To copy files from a source path to a destination path.


hadoop fs -cp /source_path /destination_path

5. To move a file from the local file system to HDFS.


hadoop fs -moveFromLocal /localpath /hdfsdestination

6. To read the content of a file.


hadoop fs -cat /user/cloudera/case_study/sample.txt

7. To display free space.
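For example (the -h flag prints sizes in human-readable form):

hadoop fs -df -h /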


8. To return the checksum information of a file.
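For example, for the sample file uploaded earlier:

hadoop fs -checksum /user/cloudera/case_study/sample.txt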

9. To create a file of zero length.
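For example (empty_file.txt is just an illustrative name):

hadoop fs -touchz /user/cloudera/case_study/empty_file.txt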

10. To append a single source file, or multiple source files, from the local
file system to the destination file system.
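For example, assuming a local file extra.txt (the file name is illustrative):

hadoop fs -appendToFile /home/cloudera/extra.txt /user/cloudera/case_study/sample.txt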

11. To count the number of directories, files, and bytes under the paths
that match the specified file pattern.
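For example:

hadoop fs -count /user/cloudera/case_study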

12. To display the extended attribute names and values (if any) for a
file or directory.
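For example (-d dumps all extended attributes of the given path):

hadoop fs -getfattr -d /user/cloudera/case_study/sample.txt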

13. To concatenate existing source files into the target file.


hadoop fs -concat /case_study/target_file.txt /case_study/src_file1.txt
/case_study/src_file2.txt

14. To display the sizes of files and directories contained in the given
directory, or the length of a file in case it is just a file.
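For example, with human-readable sizes:

hadoop fs -du -h /user/cloudera/case_study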

15. To take a source directory and a destination file as input and
concatenate the files in the source directory into the destination local file.
hadoop fs -getmerge /user/cloudera/case_study /local/destination/concatenated_file.txt
HIVE
16. Create a database named exam_db.
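For example:

create database exam_db;
use exam_db;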

17. Create two tables in the same database.
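One possible pair of tables, consistent with the students columns used in
steps 22 and 24 (the courses table and all column types are illustrative
assumptions):

create table students (student_id int, name string, age int)
row format delimited fields terminated by ',';

create table courses (course_id int, student_id int, course_name string)
row format delimited fields terminated by ',';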


18. Load sample data into the relevant tables.
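For example, assuming comma-separated files on the local file system (the
file names are illustrative):

load data local inpath '/home/cloudera/students.csv' into table students;
load data local inpath '/home/cloudera/courses.csv' into table courses;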

19. Display the data loaded into the tables.
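For example:

select * from students;
select * from courses;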


20. Create a table with a partition and insert some random data.
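A minimal sketch, partitioning a copy of the students schema by year (table,
column, and data values are illustrative):

create table students_part (student_id int, name string, age int)
partitioned by (year int);

insert into table students_part partition (year=2023) values (1, 'Asha', 19);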

21. Create a table with 5 buckets and insert some random data.
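A minimal sketch (table and column names follow the students example above;
older Hive versions may also need set hive.enforce.bucketing = true; before
the insert):

create table students_bucketed (student_id int, name string, age int)
clustered by (student_id) into 5 buckets;

insert into table students_bucketed select student_id, name, age from students;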

22. Write a query to update a column's value. (In Hive, UPDATE and DELETE
only work on transactional tables.)

Query - update students set age = 20 where student_id = 2;

23. Create a table with the same schema as the table created earlier.
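For example (students_copy is an illustrative name):

create table students_copy like students;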

24. Write a query to delete a record.

delete from students where student_id = 3;

25. Write queries to demonstrate joins in Hive, such as an inner JOIN and a
FULL OUTER JOIN; illustrative queries are sketched below.
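The following queries are a sketch, assuming the students and courses tables
outlined under step 17 (column names are illustrative):

JOIN -

select s.name, c.course_name
from students s
join courses c on (s.student_id = c.student_id);

FULL OUTER JOIN -

select s.name, c.course_name
from students s
full outer join courses c on (s.student_id = c.student_id);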


PIG

26. Create a random dataset with the fields rollno, name, gpa and year.

Created using a text editor in Cloudera.
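A few comma-separated rows such a file might contain (the values are made up
purely for illustration):

1,Asha,8.2,2021
2,Ravi,7.5,2022
3,Meera,9.1,2021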

27. To load the dataset in the Pig shell.
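A possible LOAD statement, assuming the dataset was uploaded to HDFS as
/user/cloudera/case_study/students.txt (path and file name are illustrative):

students = LOAD '/user/cloudera/case_study/students.txt' USING PigStorage(',')
    AS (rollno:int, name:chararray, gpa:float, year:int);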

28. Pig script to display the structure and some random data loaded.
illustrate students;
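illustrate already samples some of the loaded data; the structure alone can
also be shown with:

describe students;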

29. Pig script to display resultset of name, gpa and year.
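The relation q29 dumped below could be defined like this (a sketch, assuming
the students relation from step 27):

q29 = FOREACH students GENERATE name, gpa, year;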

dump q29;
30. Pig script to group data by year.
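A possible grouping, again assuming the students relation from step 27:

group_by_year = GROUP students BY year;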

dump group_by_year;

31. Pig script to group data by gpa.
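Similarly, a sketch of the grouping by gpa:

group_by_gpa = GROUP students BY gpa;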

dump group_by_gpa;

32. Pig script to display count of records year-wise.
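A possible definition, reusing group_by_year from step 30:

count_by_year = FOREACH group_by_year GENERATE group AS year, COUNT(students) AS total;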

dump count_by_year;
33. Pig script to display sum and average of gpa.
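One way to compute both over the whole relation (a sketch):

all_students = GROUP students ALL;
sum_avg_gpa = FOREACH all_students GENERATE SUM(students.gpa) AS gpa_sum,
    AVG(students.gpa) AS gpa_avg;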

dump sum_avg_gpa;

34. Pig script to write the results to a file.

STORE students INTO '/home/cloudera/Desktop/cspigresults' USING PigStorage(',');


35. Pig script to display all records.

dump students;

Thank You
