/tmp/hadoop-samim/dfs/name has been successfully formatted.
12/07/15 15:54:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Shamim-2.local/192.168.0.103
************************************************************/
Start all Hadoop components:
$ bin/hadoop-daemon.sh start namenode
$ bin/hadoop-daemon.sh start jobtracker
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
$ bin/hadoop-daemon.sh start secondarynamenode
starting namenode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-namenode-Shamim-2.local.out
starting jobtracker, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-jobtracker-Shamim-2.local.out
starting datanode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-datanode-Shamim-2.local.out
You can check the log files to make sure that everything is running correctly.
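For example, a quick way to scan the daemon logs for problems is a small helper like this (a minimal sketch, not from the tutorial; pass it your installation's logs/ directory):

```shell
# Sketch: scan Hadoop daemon log files for ERROR/FATAL lines.
# The directory argument is an assumption -- point it at your
# installation's logs/ folder (e.g. hadoop-0.20.2/logs).
scan_logs() {
  dir="$1"
  for f in "$dir"/hadoop-*.log "$dir"/hadoop-*.out; do
    [ -e "$f" ] || continue
    echo "== $f =="
    grep -E 'ERROR|FATAL' "$f" || echo "no errors"
  done
}
```

For example: `scan_logs hadoop-0.20.2/logs`.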
Use the hadoop command-line tool to test the file system:
$ hadoop dfs -ls /
$ hadoop dfs -mkdir /test_dir
$ echo "A few words to test" > /tmp/myfile
$ hadoop dfs -copyFromLocal /tmp/myfile /test_dir
$ hadoop dfs -cat /test_dir/myfile
A few words to test
And Hadoop is running! Remember these Linux tips:
» cd means change to a directory
» Linux uses forward slashes rather than backslashes
» Unless you set your PATH, you will need to change (cd) to this directory:
/home/ec2-user/hadoop-0.20.2/bin
or wherever you copied Hadoop to. You will need to run the commands as follows:
./hadoop dfs -ls /
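Alternatively, you can avoid the cd step by putting the Hadoop bin directory on your PATH (a sketch; the HADOOP_HOME location is an assumption, adjust it to wherever you unpacked Hadoop):

```shell
# Sketch: prepend the Hadoop bin directory to PATH so "hadoop"
# resolves from any directory. HADOOP_HOME is an assumed location.
HADOOP_HOME="$HOME/hadoop-0.20.2"
export PATH="$HADOOP_HOME/bin:$PATH"
# Now the command works without ./ or cd:
# hadoop dfs -ls /
```

Add the two lines to your shell profile (e.g. ~/.bashrc) to make the change permanent.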
Let’s Use Pig
Pig is described by the Apache foundation as:
Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.
Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without Hadoop cluster), in which case all processing takes place in a single local JVM.
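To make the dataflow idea concrete, here is a tiny Pig Latin sketch (not from the tutorial; the field names and file paths are illustrative) whose statements form exactly such a directed acyclic graph:

```
-- Illustrative Pig Latin: each statement is a node in the dataflow DAG.
raw     = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
clean   = FILTER raw BY query IS NOT NULL;   -- relational-style filter
grouped = GROUP clean BY query;
counts  = FOREACH grouped GENERATE group, COUNT(clean);
STORE counts INTO 'query_counts';
```

Running it with `pig -x local` uses the single-JVM local mode; running it with plain `pig` compiles the graph into map-reduce jobs on the Hadoop cluster.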
Change to Pig Directory and Run Sample Script From The Tutorial
This script queries the Excite search engine log file. Please be aware this will take some time to run! It checks the frequency of search phrases and uses Hadoop.
The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) processes a search query log file from the Excite search engine and finds search phrases that occur with particularly high frequency during certain times of the day.
The output reports the following fields and basic statistics:
hour, ngram, score, count, mean
Run the following command:
cd /home/ec2-user/pig-0.10.1/tutorial/pigtmp
And this command:
pig ../scripts/script1-hadoop.pig
You will see a lot of processing information dumped to the screen like:
2013-02-09 00:07:11,446 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-02-09 00:07:11,454 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-02-09 00:07:11,454 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-02-09 00:07:11,456 [Thread-5] INFO
2013-02-09 00:07:16,614 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - kvstart = 0; kvend = 262144; length = 327680
2013-02-09 00:07:20,808 [communication thread] INFO