
UNIT - 5

Applications on Big Data Using Pig & Hive
Hadoop Tools:
● The Hadoop ecosystem contains different subprojects (tools) such as Sqoop, Pig, and Hive that complement the core Hadoop modules.

➔ Sqoop: It is used to import and export data between HDFS and relational databases (RDBMS).
➔ Pig: It is a procedural language platform used to develop scripts for MapReduce operations.
➔ Hive: It is a platform used to develop SQL-type scripts to perform MapReduce operations.
What is Pig?

Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze large data sets by representing them as data flows.
• Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
• To write data analysis programs, Pig provides a high-level language known as Pig Latin.
• This language provides various operators with which programmers can develop their own functions for reading, writing, and processing data.
Apache Pig
To analyze data using Apache Pig, programmers need to write scripts in the Pig Latin language.
● All these scripts are internally converted to Map and Reduce tasks.
● Apache Pig has a component known as the Pig Engine that accepts Pig Latin scripts as input and converts those scripts into MapReduce jobs.
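As a quick illustration (a minimal sketch; the file path and schema below are hypothetical), the following Pig Latin data flow is the kind of script the Pig Engine turns into MapReduce jobs:

-- load a comma-separated employee file (hypothetical path and schema)
emp = LOAD '/data/emp.csv' USING PigStorage(',')
      AS (empno:int, ename:chararray, sal:int, deptno:int);
-- keep only the well-paid employees
high_paid = FILTER emp BY sal > 5000;
-- DUMP (or STORE) triggers compilation of the flow into MapReduce jobs
DUMP high_paid;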
Pig Environment:

Pig is made up of two components:

● Pig Latin language: used to express data flows

● Execution environment: to run Pig Latin programs

Pig Components:
Installing and Running Pig

● Download a stable release from http://pig.apache.org/releases.html and unpack the tarball in a suitable place on your workstation:
% tar xzf pig-x.y.z.tar.gz
● It’s convenient to add Pig’s binary directory to your command-line path:
% export PIG_INSTALL=/home/tom/pig-x.y.z
% export PATH=$PATH:$PIG_INSTALL/bin
● Set the JAVA_HOME environment variable to point to a suitable Java installation.
● Run pig -help to get usage instructions.
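Once installed, Pig can be run either locally or against a Hadoop cluster (a sketch; the script name is hypothetical):

% pig -x local myscript.pig       # local mode: runs against the local filesystem, handy for testing
% pig -x mapreduce myscript.pig   # MapReduce mode: runs on the Hadoop cluster (the default)
% pig                             # with no script, starts the interactive Grunt shell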
Why do we need Apache Pig?

Using Pig Latin, programmers can perform MapReduce tasks easily without having to write complex code in Java.
• Apache Pig uses a multi-query approach, thereby reducing the length of code. For example, an operation that would require 200 lines of code (LoC) in Java can be done in as few as 10 LoC in Apache Pig (see the word-count sketch after this list). Ultimately, Apache Pig reduces development time by almost 16 times.
• Pig Latin is a SQL-like language, and it is easy to learn Apache Pig when you are familiar with SQL.
• Apache Pig provides many built-in operators to support data operations like joins, filters, ordering, etc. In addition, it also provides nested data types like tuples, bags, and maps that are missing from MapReduce.
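As a rough illustration of the line-count claim (a hedged sketch; the input and output paths are hypothetical), a complete word count fits in a handful of Pig Latin statements:

lines  = LOAD '/data/input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;   -- split each line into words
grpd   = GROUP words BY word;                                      -- one group per distinct word
counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt; -- count occurrences
STORE counts INTO '/data/wordcount_out';                           -- write the result to HDFS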
Features of Pig:
• Rich set of operators: It provides many operators to perform operations like join, sort, filter, etc.
• Ease of programming: Pig Latin is similar to SQL, and it is easy to write a Pig script if you are good at SQL.
• Optimization opportunities: The tasks in Apache Pig optimize their execution automatically.
• Extensibility: Using the existing operators, users can develop their own functions to read, process, and write data.
• UDFs: Pig provides the facility to create User-Defined Functions in other programming languages such as Java and to invoke or embed them in Pig scripts (see the sketch after this list).
• Handles all kinds of data: Apache Pig analyzes all kinds of data, both structured and unstructured. It stores the results in HDFS.
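A hedged sketch of how a Java UDF is used from a script (the JAR name and UDF class are hypothetical placeholders, not part of Pig itself):

REGISTER myudfs.jar;                          -- hypothetical JAR containing the compiled UDF
DEFINE TOUPPER com.example.pig.UpperCase();   -- alias for the hypothetical UDF class
emp_caps = FOREACH emp GENERATE TOUPPER(ename);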
Pig vs. MapReduce
Pig vs. SQL
Pig vs. Hive
Applications of Apache Pig:

• Processes large volumes of data
• Performs data processing in search platforms
• Processes time-sensitive data loads
• Used by telecom companies to de-identify user call data
• Used by data scientists for tasks involving ad-hoc processing and quick prototyping across large datasets
• Used to process huge data sources such as web logs
Applications of Apache Pig:

• Exploring large datasets with Pig scripting.
• Supporting ad-hoc queries across large datasets.
• Prototyping large datasets and processing algorithms.
• Collecting and processing large amounts of data in the form of search logs and web crawls.
• Deriving analytical insights using sampling.
• Processing time-sensitive data loads.
• Processing huge volumes of data.
• Performing data handling in search platforms.
• Supporting fast prototyping and impromptu (unplanned, ad-hoc) queries across huge datasets.
Apache Pig – History:

• In 2006, Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on very large datasets.
• In 2007, Apache Pig was open sourced via the Apache Incubator.
• In 2008, the first release of Apache Pig came out.
• In 2010, Apache Pig graduated as an Apache top-level project.
Pig Architecture :
Apache Pig – Components :
Parser: Initially, Pig scripts are handled by the Parser. It checks the syntax of the script, does type checking, and performs other miscellaneous checks. The output of the parser is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators.
Optimizer: The logical plan (the DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection pushdown.
Compiler: The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution engine: Finally, the MapReduce jobs are submitted to Hadoop in sorted order and executed on Hadoop, producing the desired results.
Apache Pig – Data Model :
Apache Pig – Elements:
• Atom
– Any single value in Pig Latin, irrespective of its data type, is known as an Atom.
– It is stored as a string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are the atomic types of Pig.
– A piece of data or a simple atomic value is known as a field.
– Example: ‘raja’ or ‘30’
• Tuple
– A record formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in an RDBMS table.
– Example: (Raja, 30)
Apache Pig – Elements:
• Bag
– A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a bag.
– Each tuple can have any number of fields (flexible schema).
– A bag is represented by ‘{}’. It is similar to a table in an RDBMS, but unlike an RDBMS table, it is not necessary that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.
– Example: {(Raja, 30), (Mohammad, 45)}
– A bag can be a field in a relation; in that context, it is known as an inner bag.
– Example: (Raja, 30, {(9848022338, raja@gmail.com)})
Apache Pig – Elements:

• Relation
– A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee that tuples are processed in any particular order).

• Map
– A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and should be unique. The value can be of any type. A map is represented by ‘[]’.
– Example: [name#Raja, age#30]
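These element types appear directly in a relation’s schema. A minimal sketch (the file name, delimiter, and fields below are hypothetical):

-- name and age are atoms (simple fields), contacts is an inner bag of tuples,
-- and details is a map of key-value pairs
students = LOAD 'student.txt' USING PigStorage('|')
           AS (name:chararray, age:int,
               contacts:bag{t:(phone:chararray)},
               details:map[]);
DUMP students;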
Pig Operators
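The most commonly used Pig Latin operators, shown as a hedged sketch (the relation and field names are hypothetical):

emp_10  = FILTER emp BY deptno == 10;          -- select tuples matching a condition
names   = FOREACH emp GENERATE ename, sal;     -- project/transform fields
by_dept = GROUP emp BY deptno;                 -- group tuples into bags per key
joined  = JOIN emp BY deptno, dept BY deptno;  -- join two relations on a key
sorted  = ORDER emp BY sal DESC;               -- sort a relation
top3    = LIMIT sorted 3;                      -- keep only the first n tuples
DUMP top3;                                     -- execute the flow and print the result
STORE top3 INTO '/out/top3';                   -- execute the flow and write to HDFS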
Applications on Big Data Using Hive
Introduction to Hive
What is Hive?
• Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
• It resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing easy.
• Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive.
• It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not:
• A relational database
• A design for OnLine Transaction Processing (OLTP)
• A language for real-time queries and row-level
updates
Features of Hive:
• It stores schema in a database and processed data in HDFS.
• It is designed for OLAP(Online Analytical Processing).
• It provides SQL type language for querying
called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.
Hive Architecture:
Hive Architecture:
• User Interface
❖ Hive is data warehouse infrastructure software that creates interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HD Insight (on Windows Server).
• Metastore
❖ Hive chooses a respective database server to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.
● HiveQL Process Engine
❖ HiveQL is similar to SQL and is used for querying the schema information in the Metastore.
❖ It is one of the replacements for the traditional approach of writing MapReduce programs: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and process it.
• Execution Engine
❖ The conjunction of the HiveQL Process Engine and MapReduce is the Hive Execution Engine.
❖ The execution engine processes the query and generates the same results as MapReduce would. It uses the MapReduce paradigm.

• HDFS or HBase
❖ The Hadoop Distributed File System (HDFS) or HBase is the data storage layer used to store data in the file system.
Working of Hive:
Execution of Hive:
• Execute Query
- The Hive interface, such as the Command Line or Web UI, sends the query to the Driver (any database driver such as JDBC or ODBC) to execute.
• Get Plan
- The driver takes the help of the query compiler, which parses the query to check the syntax and build the query plan (the requirements of the query).
• Get Metadata
- The compiler sends a metadata request to the Metastore (any database).
• Send Metadata
- The Metastore sends the metadata as a response to the compiler.
• Send Plan
- The compiler checks the requirements and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
• Execute Plan
- The driver sends the execute plan to the execution engine.
• Execute Job
- Internally, the execution of the job is a MapReduce job.
- The execution engine sends the job to the JobTracker, which is on the Name node, and the JobTracker assigns this job to the TaskTracker, which is on the Data node.
- Here, the query executes as a MapReduce job.
• Metadata Ops
- Meanwhile, during execution, the execution engine can perform metadata operations with the Metastore.
• Fetch Result
- The execution engine receives the results from the Data nodes.
• Send Results
- The execution engine sends those resultant values to the driver.
• Send Results
- The driver sends the results to the Hive interfaces.
Applications on Big Data using Hive
🠶 When to use Hive
• Most suitable for data warehouse applications, where relatively static data is analyzed.
• Fast response time is not required.
• Data is not changing rapidly.
• It provides an abstraction over the underlying MapReduce program.
• Hive is of course a good choice for queries that lend themselves to being expressed in SQL, particularly long-running queries where fault tolerance is desirable.
• Hive can be a good choice if you’d like to write feature-rich, fault-tolerant, batch (i.e., not near-real-time) transformation or ETL jobs in a pluggable SQL engine.
Applications Supported by Hive are:-
⮚ Log Processing
⮚ Text Mining
⮚ Document Indexing
⮚ Google Analytics
⮚ Predictive Modeling
⮚ Hypothesis Testing
Hive Services
The Hive shell is only one of several services that you can run using the hive command. You can specify the service to run using the --service option. Type hive --service help to get a list of available service names; the most useful are described below.
cli
The command line interface to Hive (the shell). This is the default service.

hiveserver
Runs Hive as a server exposing a Thrift service, enabling access from a range of clients
written in different languages.

Applications using the Thrift, JDBC, and ODBC connectors need to run a Hive server to
communicate with Hive.

Set the HIVE_PORT environment variable to specify the port the server will listen on (it defaults to 10000).
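A minimal sketch of starting the Thrift server (the port value is only illustrative):

% export HIVE_PORT=10000
% hive --service hiveserver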
hwi
The Hive Web Interface is an alternative to using the Hive command line
interface.

Using the web interface is a great way to get started with Hive.

The Hive Web Interface, abbreviated as HWI, is a simple graphical user interface (GUI).
The Hive Web Interface (HWI)

As an alternative to the shell, you might want to try Hive’s simple web interface. Start
it using the following commands:
% export ANT_LIB=/path/to/ant/lib
% hive --service hwi
jar
The Hive equivalent to hadoop jar, a convenient way to run Java applications that includes both Hadoop and Hive classes
on the classpath.
metastore

• By default, the metastore is run in the same process as the Hive service.
• Using this service, it is possible to run the metastore as a standalone (remote) process.
• Set the METASTORE_PORT environment variable to specify the port the server will
listen on.
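A sketch of running the metastore as a standalone service (the port shown is only illustrative):

% export METASTORE_PORT=9083
% hive --service metastore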
FileSystem :

Your Hive data is stored in HDFS, normally under /user/hive/warehouse.

The /user/hive and /user/hive/warehouse directories need to be created if they do not already exist.

Make sure this location (or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) exists and
is writable by the users whom you expect to be creating tables.
Attention:

● Cloudera recommends setting permissions on the Hive warehouse directory to 1777, making it accessible to all users, with the sticky bit set.
● This allows users to create and access their tables, but prevents them from deleting tables they do not own (see the commands after this list).
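A minimal sketch of creating the warehouse directory with these permissions (assuming the default location and a Hadoop 2-style shell):

% hadoop fs -mkdir -p /user/hive/warehouse
% hadoop fs -chmod 1777 /user/hive/warehouse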

JobClient:

org.apache.hadoop.mapred
Class JobClient

java.lang.Object
  org.apache.hadoop.mapred.JobClient

All Implemented Interfaces: AutoCloseable, Configurable, Tool

@InterfaceAudience.Public
@InterfaceStability.Stable
public class JobClient extends CLI implements AutoCloseable

JobClient is the primary interface for the user-job to interact with the cluster.
JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports/logs, get the MapReduce cluster status information, etc.
The job submission process involves:

1. Checking the input and output specifications of the job.
2. Computing the InputSplits for the job.
3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
4. Copying the job's jar and configuration to the MapReduce system directory on the distributed file system.
5. Submitting the job to the cluster and optionally monitoring its status.

Normally the user creates the application, describes the various facets of the job via JobConf, and then uses the JobClient to submit the job and monitor its progress.
Here is an example of how to use JobClient:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
JobClient.runJob(job);
Job Control
At times, clients need to chain MapReduce jobs to accomplish complex tasks that cannot be done in a single MapReduce job. This is fairly easy, since the output of a job typically goes to the distributed file system and can be used as the input for the next job. However, this also means that the onus of ensuring jobs are complete (success/failure) lies squarely on the clients.
In such situations, the various job-control options are:
1. runJob(JobConf): submits the job and returns only after the job has completed.
2. submitJob(JobConf): only submits the job; the client then polls the returned RunningJob handle to query status and make scheduling decisions (see the sketch after this list).
3. JobConf.setJobEndNotificationURI(String): sets up a notification on job completion, thus avoiding polling.
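A hedged sketch of option 2, polling the RunningJob handle (assumes the JobConf has already been configured as in the earlier example; uses org.apache.hadoop.mapred.JobClient, RunningJob, and java.io.IOException):

public static void submitAndWait(JobConf job) throws Exception {
    JobClient client = new JobClient(job);
    RunningJob running = client.submitJob(job);   // returns immediately with a handle
    while (!running.isComplete()) {               // poll the handle for completion
        Thread.sleep(5000);                       // wait five seconds between polls
    }
    if (!running.isSuccessful()) {
        throw new IOException("Job " + running.getID() + " failed");
    }
}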

Hive clients
If you run Hive as a server (hive --service hiveserver), then there are a number of different mechanisms for connecting to it from
applications. The relationship between Hive clients and Hive services is illustrated in Figure 12-1.

Thrift Client
The Hive Thrift Client makes it easy to run Hive commands from a wide range of programming languages.
Thrift bindings for Hive are available for C++, Java, PHP, Python, and Ruby.
They can be found in the src/service/src subdirectory in the Hive distribution.
JDBC Driver

Hive provides a Type 4 (pure Java) JDBC driver, defined in the class org.apache.hadoop.hive.jdbc.HiveDriver.
When configured with a JDBC URI of the form jdbc:hive://host:port/dbname, a Java application will connect to
a Hive server running in a separate process at the given host and port. (The driver makes calls to an interface
implemented by the Hive Thrift Client using the Java Thrift bindings.)

You may alternatively choose to connect to Hive via JDBC in embedded mode using the URI jdbc:hive://.
In this mode, Hive runs in the same JVM as the application invoking it, so there is no need to launch it as a
standalone server since it does not use the Thrift service or the Hive Thrift Client.
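A minimal sketch of connecting through the JDBC driver (the host, port, table, and credentials are illustrative placeholders):

import java.sql.*;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Driver class named in the text above
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT ename, sal FROM emp");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
        }
        con.close();
    }
}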

ODBC Driver
The Hive ODBC Driver allows applications that support the ODBC protocol to connect to Hive. (Like the JDBC driver, the ODBC
driver uses Thrift to communicate with the Hive server.)

The ODBC driver is still in development, so you should refer to the latest instructions on the Hive wiki for how to build and run it.

There are more details on using these clients on the Hive wiki at https://cwiki.apache.org/confluence/display/Hive/HiveClient.
The Metastore

The metastore is the central repository of Hive metadata. The metastore is divided into two pieces: a service and the backing store for the data.

By default, the metastore service runs in the same JVM as the Hive service and contains an embedded Derby database instance backed by the local disk.
This is called the embedded metastore configuration.

Using an embedded metastore is a simple way to get started with Hive;

however, only one embedded Derby database can access the database files on disk at any one time,
which means you can only have one Hive session open at a time that shares the same metastore.

Trying to start a second session gives the error:

Failed to start database 'metastore_db'

when it attempts to open a connection to the metastore.

The solution to supporting multiple sessions (and therefore multiple users) is to use a standalone database.

This configuration is referred to as a local metastore, since the metastore service still runs in the same process as the Hive service,

but connects to a database running in a separate process, either on the same machine or on a remote machine.
Any JDBC-compliant database may be used by setting the javax.jdo.option.* configuration properties listed in Table 12-1.
MySQL is a popular choice for the standalone metastore.
In this case, javax.jdo.option.ConnectionURL is set to jdbc:mysql://host/dbname?createDatabaseIfNotExist=true, and javax.jdo.option.ConnectionDriverName is set to com.mysql.jdbc.Driver. (The user name and password should be set too, of course.)

The JDBC driver JAR file for MySQL (Connector/J) must be on Hive’s classpath, which is simply
achieved by placing it in Hive’s lib directory.
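A sketch of the corresponding hive-site.xml entries (the host name, database name, and credentials are placeholders):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>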

Going a step further, there’s another metastore configuration called a remote metastore, where one
or more metastore servers run in separate processes to the Hive service.
This brings better manageability and security, since the database tier can be completely firewalled off,
and the clients no longer need the database credentials.

A Hive service is configured to use a remote metastore by setting hive.metastore.local to false and hive.metastore.uris to the metastore server URIs, separated by commas if there is more than one. Metastore server URIs are of the form thrift://host:port, where the port corresponds to the one set by METASTORE_PORT when starting the metastore server.
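A sketch of the client-side configuration for a remote metastore (the host and port are placeholders):

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastorehost:9083</value>
</property>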
HIVE (HQL) Commands
Q1: How to enter the HIVE Shell?

Go to the terminal and type hive; you will see the hive prompt.

[cloudera@quickstart Desktop]$ hive

Q2: Create a database

create database emp_details;

use emp_details;

Q3: How to create Managed Table in HIVE?

create table emp(empno int, ename string, job string, sal int, deptno int)
row format delimited fields terminated by ',';

Q4: How to load the data from LOCAL to HIVE TABLE

Suppose you created a comma-separated file in the local file system named empdetails.txt:

1,A,clerk,4000,10
2,A,clerk,4000,30
3,B,mgr,8000,20
4,C,peon,2000,40
5,D,clerk,4000,10
6,E,mgr,8000,50

hive> LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/empdetails.txt' OVERWRITE INTO TABLE emp;
# Note: If 'LOCAL' is omitted then it looks for the file in HDFS.
The keyword 'OVERWRITE' signifies that existing data in the table is deleted. If the 'OVERWRITE' keyword is omitted, data files are appended to existing data sets.
Q5: How to check where the managed table is created in hive db
[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/emp_details.db
Found 2 items
drwxrwxrwx - cloudera supergroup 0 2018-07-24 02:40 /user/hive/warehouse/emp_details.db/emp
drwxrwxrwx - cloudera supergroup 0 2018-07-24 02:28 /user/hive/warehouse/emp_details.db/emp1
Also check the contents inside emp:
[cloudera@quickstart Desktop]$ hadoop fs -ls
/user/hive/warehouse/emp_details.db/emp
Found 1 items
-rwxrwxrwx 1 cloudera supergroup 104 2018-07-24 02:40 /user/hive/warehouse/emp_details.db/emp/empdetails.txt
Now see the contents inside empdetails.txt
[cloudera@quickstart Desktop]$ hadoop fs -cat /user/hive/warehouse/emp_details.db/emp/empdetails.txt
1,A,clerk,4000,10
2,A,clerk,4000,30
3,B,mgr,8000,20
4,C,peon,2000,40
5,D,clerk,4000,10
6,E,mgr,8000,50
Q6: Check the schema of the created table emp

describe emp;

For a detailed schema, use:
describe extended emp;

Q7: How to see all the tables present in database
show tables;

Q8: Select all the enames from emp table


select ename from emp;

Q9: Get the records where name is 'A'


select * from emp where ename='A';

Q10: Count the total number of records in the created table

The count aggregate function is used to count the total number of records in a table.
select count(1) from emp;
OR
select count(*) from emp;
Q11: Group the sum of salaries as per the deptno

select deptno, sum(sal) from emp group by deptno;

Q12: Get the salary of people between 1000 and 2000

select * from emp where sal between 1000 and 2000;

Q13: Select the name of employees where job has exactly 5 characters
hive> select ename from emp where job LIKE '_____';

Q14: List the employee names where job has l as the second character

hive> select ename from emp where job LIKE '_l%';

Q15: Retrieve the total salary for each department

select deptno, sum(sal) from emp group by deptno;

Q16: Add a column to the table


alter table emp add COLUMNS(lastname string);

Q17: How to Rename a table


alter table emp rename to emp1;

Q18: How to drop a table


drop table emp;
Fundamentals of HBase
Fundamentals of ZooKeeper
