/tmp/hadoop-samim/dfs/name has been successfully formatted.
12/07/15 15:54:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Shamim-2.local/192.168.0.103
************************************************************/
Start all Hadoop components:
$ bin/hadoop-daemon.sh start namenode
$ bin/hadoop-daemon.sh start jobtracker
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
$ bin/hadoop-daemon.sh start secondarynamenode
starting namenode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-namenode-Shamim-2.local.out
starting jobtracker, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-jobtracker-Shamim-2.local.out
starting datanode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-datanode-Shamim-2.local.out
You can check the log files to make sure that everything is running correctly.
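For example, a quick way to scan the daemon logs for problems is a small helper like this (a minimal sketch, not from the tutorial; pass it your installation's logs/ directory):

```shell
# Sketch: scan Hadoop daemon log files for ERROR/FATAL lines.
# The directory argument is an assumption -- point it at your
# installation's logs/ folder (e.g. hadoop-0.20.2/logs).
scan_logs() {
  dir="$1"
  for f in "$dir"/hadoop-*.log "$dir"/hadoop-*.out; do
    [ -e "$f" ] || continue
    echo "== $f =="
    grep -E 'ERROR|FATAL' "$f" || echo "no errors"
  done
}
```

For example: `scan_logs hadoop-0.20.2/logs`.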
Use the hadoop command-line tool to test the file system:
$ hadoop dfs -ls /
$ hadoop dfs -mkdir /test_dir
$ echo "A few words to test" > /tmp/myfile
$ hadoop dfs -copyFromLocal /tmp/myfile /test_dir
$ hadoop dfs -cat /test_dir/myfile
A few words to test
And Hadoop is running! Remember these Linux tips:
» cd means change to a directory
» Linux uses forward slashes rather than backslashes
» Unless you set your PATH, you will need to change (cd) to this directory:
/home/ec2-user/hadoop-0.20.2/bin
or wherever you copied Hadoop to. You will need to run the commands as follows:
./hadoop dfs -ls /
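Alternatively, you can avoid the cd step by putting the Hadoop bin directory on your PATH (a sketch; the HADOOP_HOME location is an assumption, adjust it to wherever you unpacked Hadoop):

```shell
# Sketch: prepend the Hadoop bin directory to PATH so "hadoop"
# resolves from any directory. HADOOP_HOME is an assumed location.
HADOOP_HOME="$HOME/hadoop-0.20.2"
export PATH="$HADOOP_HOME/bin:$PATH"
# Now the command works without ./ or cd:
# hadoop dfs -ls /
```

Add the two lines to your shell profile (e.g. ~/.bashrc) to make the change permanent.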
Let’s Use Pig
Pig is described by the Apache foundation as:
Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.
Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without Hadoop cluster), in which case all processing takes place in a single local JVM.
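To make the dataflow idea concrete, here is a tiny Pig Latin sketch (not from the tutorial; the field names and file paths are illustrative) whose statements form exactly such a directed acyclic graph:

```
-- Illustrative Pig Latin: each statement is a node in the dataflow DAG.
raw     = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
clean   = FILTER raw BY query IS NOT NULL;   -- relational-style filter
grouped = GROUP clean BY query;
counts  = FOREACH grouped GENERATE group, COUNT(clean);
STORE counts INTO 'query_counts';
```

Running it with `pig -x local` uses the single-JVM local mode; running it with plain `pig` compiles the graph into map-reduce jobs on the Hadoop cluster.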
Change to Pig Directory and Run Sample Script From The Tutorial
This script queries the Excite search engine log file. Please be aware this will take some time to run! It checks the frequency of search phrases and uses Hadoop.
The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) processes a search query log file from the Excite search engine and finds search phrases that occur with particularly high frequency during certain times of the day.
The output reports the following fields and basic statistics:
hour, ngram, score, count, mean
Run the following command:
cd /home/ec2-user/pig-0.10.1/tutorial/pigtmp
And this command:
pig ../scripts/script1-hadoop.pig
You will see a lot of processing information dumped to the screen like:
2013-02-09 00:07:11,446 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-02-09 00:07:11,454 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-02-09 00:07:11,454 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-02-09 00:07:11,456 [Thread-5] INFO
2013-02-09 00:07:16,614 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - kvstart = 0; kvend = 262144; length = 327680
2013-02-09 00:07:20,808 [communication thread] INFO