
Pig Tutorial PDF

This document provides instructions for downloading and configuring Pig on Hadoop. It outlines the steps to download Pig, extract the files, edit the configuration files, start Hadoop and Pig, run Pig queries on sample data files, and compare Pig with HBase and Hive.

Uploaded by

vishnu
Copyright
© All Rights Reserved

Pig

For online Hadoop training, send mail to neeraj.ymca.2k6@gmail.com


Agenda
Download Pig tar.gz file
Extract the content of Pig tar.gz
Configure pig-env.sh file
Configure pig.properties file
Start your Hadoop
Start Pig shell
Input file for Pig query
Access HDFS from Pig shell
Execute Pig commands
Store Pig query's output into HDFS
Check the output
Comparison of HBase/Hive/Pig
Download Pig from the Apache website
www.apache.org/dyn/closer.cgi/pig
Select a stable version of Pig
Click on pig-0.11.0.tar.gz
Save the pig-0.11.0.tar.gz file
Untar the pig-0.11.0.tar.gz file
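The extract step can also be scripted. A minimal sketch, assuming the release tarball (named `pig-0.11.0.tar.gz` on the Apache mirrors) has already been saved locally; the helper name `install_pig` and the destination directory are illustrative, not part of Pig:

```shell
#!/usr/bin/env bash
# install_pig TARBALL DEST_DIR  (hypothetical helper)
# Extracts a Pig release tarball into DEST_DIR and prints the resulting
# PIG_HOME directory. Assumes the archive unpacks into a top-level
# directory named after the tarball, e.g. pig-0.11.0/.
install_pig() {
  local tarball=$1 dest=$2 name
  name=$(basename "$tarball" .tar.gz)   # pig-0.11.0.tar.gz -> pig-0.11.0
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  if [ ! -d "$dest/$name" ]; then
    echo "expected directory $dest/$name after extraction" >&2
    return 1
  fi
  echo "$dest/$name"
}
```

For example, `PIG_HOME=$(install_pig pig-0.11.0.tar.gz "$HOME/local_cluster_home")` would produce the same directory layout as the PIG_HOME path used in this tutorial.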

Configure pig-env.sh file

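Create a pig-env.sh file in PIG_HOME/conf

Add the following entries to PIG_HOME/conf/pig-env.sh, adjusting the paths to your own installation:

# JAVA_HOME should be the directory that contains bin/java
export JAVA_HOME=/usr
export PIG_HOME=/home/neeraj/local_cluster_home/pig-0.11.0
export HADOOP_HOME=/home/neeraj/local_cluster_home/hadoop-1.0.3
# Put the Hadoop configuration directory on Pig's classpath so Pig can find the cluster
export PIG_CLASSPATH=$HADOOP_HOME/conf/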
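Before starting Pig, it can help to confirm that the paths exported in pig-env.sh actually exist. A minimal sketch; the `check_pig_env` name is hypothetical, and it only checks for the directories and launcher scripts implied by the entries above:

```shell
#!/usr/bin/env bash
# check_pig_env  (hypothetical helper)
# Sanity-checks the variables exported in PIG_HOME/conf/pig-env.sh.
# Prints one line per problem and returns non-zero if anything is missing.
check_pig_env() {
  local status=0
  [ -x "$JAVA_HOME/bin/java" ] || { echo "java not found under JAVA_HOME=$JAVA_HOME"; status=1; }
  [ -x "$PIG_HOME/bin/pig" ]   || { echo "pig launcher not found under PIG_HOME=$PIG_HOME"; status=1; }
  [ -d "$HADOOP_HOME/conf" ]   || { echo "no conf/ directory under HADOOP_HOME=$HADOOP_HOME"; status=1; }
  [ -d "$PIG_CLASSPATH" ]      || { echo "PIG_CLASSPATH=$PIG_CLASSPATH is not a directory"; status=1; }
  return "$status"
}
```

Usage would be along the lines of `. /home/neeraj/local_cluster_home/pig-0.11.0/conf/pig-env.sh && check_pig_env`.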
Configure pig.properties file

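Add the following entries to PIG_HOME/conf/pig.properties:

# NameNode URI; should match fs.default.name in core-site.xml
fs.default.name=hdfs://localhost:9000
# JobTracker host:port; should match mapred.job.tracker in mapred-site.xml
mapred.job.tracker=localhost:9001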

Copy the core-site.xml, hdfs-site.xml and mapred-site.xml files from HADOOP_HOME/conf to PIG_HOME/conf
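The copy step as a small function. A minimal sketch; `copy_hadoop_conf` is a hypothetical name, and it presumably serves the same purpose as the PIG_CLASSPATH entry, giving Pig the same cluster settings as Hadoop:

```shell
#!/usr/bin/env bash
# copy_hadoop_conf HADOOP_HOME PIG_HOME  (hypothetical helper)
# Copies the three Hadoop site files into PIG_HOME/conf and stops at the
# first file that cannot be copied.
copy_hadoop_conf() {
  local hadoop_home=$1 pig_home=$2 f
  for f in core-site.xml hdfs-site.xml mapred-site.xml; do
    cp "$hadoop_home/conf/$f" "$pig_home/conf/" || return 1
  done
}
```

Called as `copy_hadoop_conf "$HADOOP_HOME" "$PIG_HOME"` once the variables from pig-env.sh are set.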
Start your Hadoop
Check Hadoop processes & safe mode
Make sure that safe mode is off before you start Pig: while the NameNode is in safe mode, HDFS is read-only, so Pig jobs that write to HDFS will fail.
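Both checks can be scripted. A minimal sketch, assuming a Hadoop 1.x pseudo-distributed setup like the hadoop-1.0.3 install used here, where `jps` lists the five daemons by class name and `hadoop dfsadmin -safemode get` reports the safe-mode state; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a Hadoop 1.x pseudo-distributed cluster.
# Start the daemons first, e.g. with "$HADOOP_HOME/bin/start-all.sh".

# hadoop_daemons_running: succeeds only if jps lists all five Hadoop 1.x daemons.
hadoop_daemons_running() {
  local procs d missing=0
  procs=$(jps)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w keeps "NameNode" from matching inside "SecondaryNameNode"
    printf '%s\n' "$procs" | grep -qw "$d" || { echo "not running: $d"; missing=1; }
  done
  return "$missing"
}

# safemode_off: succeeds if the NameNode reports that safe mode is off.
safemode_off() {
  hadoop dfsadmin -safemode get | grep -q 'Safe mode is OFF'
}
```

If safe mode stays on, `hadoop dfsadmin -safemode wait` blocks until the NameNode leaves it, and `hadoop dfsadmin -safemode leave` forces it off.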
Start Pig shell
Input file for Pig
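The input file itself is shown only as a screenshot in the slides. A minimal sketch for producing a comparable file, assuming (hypothetically) tab-separated year and temperature columns, which is what the query's LOAD statement expects from its default loader; the values are made up for illustration and are not the author's data:

```shell
#!/usr/bin/env bash
# write_sample_input FILE  (hypothetical helper)
# Writes made-up "year<TAB>temperature" lines; 9999 stands in for a missing
# reading so that the FILTER step in the query has something to remove.
write_sample_input() {
  printf '%s\t%s\n' \
    1949 111 \
    1949 78 \
    1950 0 \
    1950 22 \
    1950 -11 \
    1950 9999 > "$1"
}
```

Uploading it to the path used by the query could then look like `hadoop fs -mkdir /pig_input_files` followed by `hadoop fs -put temprature.txt /pig_input_files/`.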
Access HDFS from Pig shell
Execute Pig query
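-- Load (year, temperature) records; with no USING clause Pig uses PigStorage, which splits on tabs
records = LOAD '/pig_input_files/temprature.txt' AS (year:chararray, temperature:int);

-- Drop readings of 9999 (presumably a missing-value marker)
filtered_records = FILTER records BY temperature != 9999;

-- One group (a bag of records) per year
grouped_records = GROUP filtered_records BY year;

-- Maximum temperature per year
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);

-- Print the result to the console
DUMP max_temp;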
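To sanity-check what DUMP prints, the same FILTER / GROUP / MAX logic can be reproduced locally with awk on a small file. This is only a local cross-check, not how Pig executes the query; it assumes tab-separated year and temperature columns and the same 9999 marker, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# max_temp_by_year FILE  (hypothetical helper)
# Local equivalent of the Pig script above: drop 9999 readings (FILTER),
# bucket by year (GROUP), keep the largest temperature per year (MAX).
# Prints "year<TAB>max" lines sorted by year.
max_temp_by_year() {
  awk -F'\t' '
    $2 != 9999 {
      if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
    }
    END { for (y in max) printf "%s\t%d\n", y, max[y] }
  ' "$1" | sort
}
```

On a file with the rows 1949/111, 1949/78, 1950/0, 1950/22, 1950/-11 and 1950/9999, this prints `1949 111` and `1950 22` (tab-separated); DUMP would show the same pairs as the tuples `(1949,111)` and `(1950,22)`, although Pig does not guarantee their order without an ORDER BY.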
Execute Pig query
records = LOAD '/pig_input_files/temprature.txt' AS (year:chararray, temperature:int);

filtered_records = FILTER records BY temperature != 9999;

grouped_records = GROUP filtered_records BY year;

max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);

-- Write the result to HDFS instead of printing it; the output directory must not already exist
STORE max_temp INTO '/pig_output_files';
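The stored result can be read back from the command line, and the output directory has to be removed before the STORE is rerun, since Pig normally refuses to write into an existing directory. A minimal sketch using Hadoop 1.x `hadoop fs` commands; the `part-*` glob assumes Pig's usual part-file naming, and the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# show_pig_output DIR  (hypothetical helper)
# Prints every part file Pig wrote under DIR in HDFS. The glob is quoted so
# the local shell leaves it alone and hadoop expands it against HDFS.
show_pig_output() {
  hadoop fs -cat "$1/part-*"
}

# clear_pig_output DIR  (hypothetical helper)
# Removes DIR recursively so the STORE can be rerun.
# Hadoop 1.x syntax; later releases use "hadoop fs -rm -r".
clear_pig_output() {
  hadoop fs -rmr "$1"
}
```

For this tutorial, `show_pig_output /pig_output_files` would print the per-year maxima written by the STORE statement.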


Pig job details
Output of Pig query
Exit from Pig shell
HBase/Hive/Pig
HBase/Hive/Pig suitability

HBase is suitable when...


When you need to handle unstructured data
When you need to edit the data
When you need versioned data

Hive is suitable when...


When you need to handle structured data
When you don't need to edit the data
When you are comfortable with SQL syntax
Pig is suitable when...
When you need to handle structured data
When you don't need to edit the data
When you are comfortable with scripting
…Thanks…
