Thursday, March 14, 2013

BIG DATA HADOOP Testing with MapReduce Examples Part 1



hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar is the examples jar shipped with CDH, which is handy for testing a Hadoop installation.

The wordcount example reads text files and counts how often each word occurs. Here I am passing name.txt, which was copied to HDFS earlier, and writing the results to /bigdata1/output.
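The same computation can be sketched locally with standard Unix tools. This is only an illustration of what wordcount computes (split into words, group identical words, count each group), not of how Hadoop executes it; the sample text is made up:

```shell
#!/bin/sh
# What wordcount computes, in miniature:
printf 'hello hadoop\nhello world\n' |
  tr -s ' ' '\n' |   # one word per line
  sort |             # group identical words together
  uniq -c            # count each group, e.g. "2 hello"
```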


hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
13/03/13 14:58:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 14:58:06 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "localhost:9001"
13/03/13 14:58:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
        at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:83)
        at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:76)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1188)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1184)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1183)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
hadoop@bigdataserver1:~/hadoop>

The fix for the above error was to set HADOOP_MAPRED_HOME in the hadoop-env.sh file.
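A minimal sketch of that change, assuming Hadoop is installed under /home/hadoop/hadoop as in the transcripts above (adjust the path for your install):

```shell
# In etc/hadoop/hadoop-env.sh: point HADOOP_MAPRED_HOME at the
# Hadoop install so the client can locate the MapReduce framework.
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
```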


Running the job again resulted in a different error:


hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/partition/InputSampler$Sampler
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
        at java.lang.Class.getMethod0(Class.java:2670)
        at java.lang.Class.getMethod(Class.java:1603)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.(ProgramDriver.java:60)
        at org.apache.hadoop.util.ProgramDriver.addClass(ProgramDriver.java:103)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:51)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.partition.InputSampler$Sampler
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        ... 12 more
hadoop@bigdataserver1:~/hadoop>

The fix is to add the MapReduce jars to the classpath in the hadoop-env.sh file:



# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=/home/hadoop/hadoop/share/hadoop/mapreduce/*:$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
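One quirk worth noting in the loop above: if $HADOOP_HOME/contrib/capacity-scheduler contains no jar files (or HADOOP_HOME is unset), the glob does not match anything and the shell leaves the literal pattern in $f, which then lands on the classpath. That is why /contrib/capacity-scheduler/*.jar shows up verbatim in the hadoop classpath output below. A small sketch of this shell behavior, using a made-up path:

```shell
#!/bin/sh
# In POSIX sh/bash (without nullglob), a glob that matches nothing
# is left unexpanded, so $f becomes the pattern itself.
HADOOP_HOME=/nonexistent
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  echo "f=$f"   # prints f=/nonexistent/contrib/capacity-scheduler/*.jar
done
```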


hadoop@bigdataserver1:~/hadoop> hadoop classpath
/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/share/hadoop/mapreduce/*
hadoop@bigdataserver1:~/hadoop> ls /home/hadoop/hadoop/share/hadoop/mapreduce/share/hadoop/mapreduce/*
/bin/ls: /home/hadoop/hadoop/share/hadoop/mapreduce/share/hadoop/mapreduce/*: No such file or directory
hadoop@bigdataserver1:~/hadoop> pwd
/home/hadoop/hadoop
hadoop@bigdataserver1:~/hadoop> echo $CLASSPATH

hadoop@bigdataserver1:~/hadoop> vi etc/hadoop/hadoop-env.sh
hadoop@bigdataserver1:~/hadoop> echo $HADOOP_HOME

hadoop@bigdataserver1:~/hadoop> export HADOOP_HOME=/home/hadoop/hadoop
hadoop@bigdataserver1:~/hadoop> $HADOOP_HOME/contrib/capacity-scheduler/*.jar
hadoop@bigdataserver1:~/hadoop> ls $HADOOP_HOME/contrib/capacity-scheduler/*.jar
/bin/ls: /home/hadoop/hadoop/contrib/capacity-scheduler/*.jar: No such file or directory
hadoop@bigdataserver1:~/hadoop> echo $HADOOP_CLASSPATH

hadoop@bigdataserver1:~/hadoop> ls /home/hadoop/hadoop/share/hadoop/mapreduce
hadoop-mapreduce-client-app-2.0.0-cdh4.2.0.jar     hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0.jar        lib
hadoop-mapreduce-client-common-2.0.0-cdh4.2.0.jar  hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0-tests.jar  lib-examples
hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar    hadoop-mapreduce-client-shuffle-2.0.0-cdh4.2.0.jar
hadoop-mapreduce-client-hs-2.0.0-cdh4.2.0.jar      hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar
hadoop@bigdataserver1:~/hadoop>


hadoop@bigdataserver1:~/hadoop> vi etc/hadoop/hadoop-env.sh
Updated the classpath in hadoop-env.sh, then reran the job:

hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar WordCount /bigdata1/name.txt /bigdata1/output
Unknown program 'WordCount' chosen.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
hadoop@bigdataserver1:~/hadoop>



This is a good sign: the MapReduce framework is now loading correctly, and the remaining failure is only a syntax mistake. The example program names are case-sensitive and all lowercase, so the job must be invoked as wordcount, not WordCount.



