Hadoop Ecosystem Notes
You need to have the single-node machine ready up to the jps command. On the CLI (terminal) we can see
that the NameNode, DataNode, JobTracker and TaskTracker are running with their process IDs.
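( For example, running jps on a working single-node setup prints something like the lines below; the process IDs are illustrative and will differ on your machine. )
jps
2481 NameNode
2612 DataNode
2755 SecondaryNameNode
2890 JobTracker
3021 TaskTracker
3150 Jps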
( This command makes a new file called "WordCount.java"; write the code below into it. )
//package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums up the counts emitted for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: configures and submits the job; args[0] is the input path, args[1] the output path.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        //conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
( After entering this code, press CTRL + C to return to the machine's command prompt. )
( We export the CLASSPATH to point at our Hadoop jar file; the Java imports defined at the beginning are
resolved from that .jar file. )
( javac is the Java compiler. -d specifies the destination directory for the compiled classes; the three class files will be stored in the wordcount_classes folder. )
( We get three class files: the Map class, the Reduce class and WordCount.class, the driver class. )
Step 6. cd <Enter>
( tar is a tape archive and jar is a Java archive. c is for create, v is for verbose (show us what you are
doing), f is for file (the archive name that follows, "wordcount.jar"). -C changes into wordcount_classes so
the classes are picked up from there, and "." includes everything in that directory; the jar is created in the parent directory. )
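( Putting the notes above together, the compile and packaging commands would look roughly like this; the Hadoop core jar path and version below are assumptions, so use the core jar of your own installation. The wordcount_classes directory must exist before javac -d can write into it. )
export CLASSPATH=/usr/local/hadoop/hadoop-core-1.2.1.jar
mkdir wordcount_classes
javac -d wordcount_classes WordCount.java
jar -cvf wordcount.jar -C wordcount_classes/ .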
Step 8. ls <Enter>
( Exit from the machine's command prompt and return to the home folder so that we can send our Java program to the cluster. )
Step 11. Now connect again to your machine & type ls <Enter>
( We run the jar file. The jar contains WordCount; it will process the .txt data and save the output in
result. )
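( The run command would look roughly as follows; "input.txt" is a placeholder for whatever text file was uploaded to HDFS. )
hadoop jar wordcount.jar WordCount /user/ubuntu/input.txt /user/ubuntu/result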
Step 17. Go to the terminal and type this command: hadoop fs -lsr /user/ubuntu/result
( With this command we get the same information on the terminal that the GUI shows, including the new
output folder. )
( We sort the data in numerical order and paste it into a file called result. )
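( One way to do this, as a sketch: part-00000 is the usual output file for a single reducer, and the local file name here is only illustrative. )
hadoop fs -cat /user/ubuntu/result/part-00000 | sort -k 2 -n > results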
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
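( Optionally, assuming a bash shell, these two exports can be appended to ~/.bashrc so that they persist across sessions. )
echo 'export HIVE_HOME=/usr/local/hive' >> ~/.bashrc
echo 'export PATH=$PATH:$HIVE_HOME/bin' >> ~/.bashrc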
Step 5. cd $HIVE_HOME/conf
( This is Hive's configuration folder; after entering the command, Hive will be configured successfully. )
( As we don't have any data set, we download an event log file with this command. )
( We delete the result and results files from Hadoop with this command. )
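( With the Hadoop 1.x shell syntax used elsewhere in these notes, the deletion commands would look roughly like this. )
hadoop fs -rmr /user/ubuntu/result
hadoop fs -rmr /user/ubuntu/results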
Step 9. create table serverdata (time STRING, ip STRING, country STRING, status STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/user/ubuntu/';
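( For illustration only, a '|'-delimited record matching this table definition would look like the hypothetical line below. )
2014-01-01 12:00:01|10.0.0.5|IN|ERROR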
Step 13. select * from serverdata where country = "IN" AND status = "ERROR";
Step 14. select * from serverdata where country = "FR" AND status = "SUCCESS";
( These queries need processing, hence the MapReduce job runs and we get the result. )
Step 17. create table doc(text string) row format delimited fields terminated by '\n' stored as textfile;
Step 18. load data inpath '/user/ubuntu/serverlog.log' overwrite into table doc;
( To load data )
Step 19. SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(text, ' ')) lTable AS word GROUP BY word;
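( Here split(text, ' ') turns each line into an array of words and explode() emits one row per word, which GROUP BY then counts. As a quick sketch on the same table, the exploded words can be inspected without the aggregation like this: )
SELECT word FROM doc LATERAL VIEW explode(split(text, ' ')) lTable AS word LIMIT 10;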
Apache Pig Installation
Note: The consumer key & access token used here may not work. Replace them with your
personal consumer key & access token.