hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar is the examples jar used here to test the Hadoop installation.
The wordcount example reads text files and counts how often each word occurs. Here I pass it name.txt, which I had already copied to HDFS.
hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
13/03/13 14:58:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 14:58:06 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "localhost:9001"
13/03/13 14:58:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1188)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1184)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1183)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
hadoop@bigdataserver1:~/hadoop>
The fix for the error above was to set and export HADOOP_MAPRED_HOME in the hadoop-env.sh file.
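A minimal sketch of that change follows. The exact line added to hadoop-env.sh is not recorded in this log, so the value is an assumption: it uses the install root that pwd reports in this session (/home/hadoop/hadoop). The hadoop classpath output shown further down, where share/hadoop/mapreduce appears twice in one entry, suggests the Hadoop scripts append that subdirectory themselves, so the variable should probably point at the install root rather than at the mapreduce directory.

```sh
# etc/hadoop/hadoop-env.sh (sketch; the value below is an assumption)
# Point HADOOP_MAPRED_HOME at the install root, not at share/hadoop/mapreduce.
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
```

After editing the file, running hadoop classpath is a quick way to see which MapReduce entry the scripts derive from this value.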
Running the job again produced a different error:
hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/partition/InputSampler$Sampler
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
at java.lang.Class.getMethod0(Class.java:2670)
at java.lang.Class.getMethod(Class.java:1603)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.<init>(ProgramDriver.java:60)
at org.apache.hadoop.util.ProgramDriver.addClass(ProgramDriver.java:103)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.partition.InputSampler$Sampler
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 12 more
hadoop@bigdataserver1:~/hadoop>
The fix is to add the MapReduce jars to HADOOP_CLASSPATH in the hadoop-env.sh file:
# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=/home/hadoop/hadoop/share/hadoop/mapreduce/*:$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
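One thing to watch in the loop above: the MapReduce directory is only added when HADOOP_CLASSPATH is already non-empty, and when no capacity-scheduler jar exists the unmatched pattern itself ends up on the classpath. The following is a hedged sketch of a more defensive variant, not the stock hadoop-env.sh. The helper name add_to_hadoop_classpath is hypothetical, and the MapReduce path is the one used in this log.

```sh
# Sketch for etc/hadoop/hadoop-env.sh; add_to_hadoop_classpath is a
# hypothetical helper, not part of Hadoop.
add_to_hadoop_classpath() {
  # Append one entry without leaving a leading or trailing colon.
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$1"
  else
    HADOOP_CLASSPATH="$1"
  fi
}

# Always add the MapReduce jars. The quotes stop the shell from expanding
# the wildcard; the JVM expands a trailing /* on the classpath itself.
add_to_hadoop_classpath "/home/hadoop/hadoop/share/hadoop/mapreduce/*"

# Add capacity-scheduler jars only when the pattern matched real files.
for f in "${HADOOP_HOME:-}"/contrib/capacity-scheduler/*.jar; do
  if [ -e "$f" ]; then
    add_to_hadoop_classpath "$f"
  fi
done

export HADOOP_CLASSPATH
```

With this shape, an empty HADOOP_HOME or a missing contrib directory simply adds nothing instead of a literal *.jar entry.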
hadoop@bigdataserver1:~/hadoop> hadoop classpath
/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/share/hadoop/mapreduce/*
hadoop@bigdataserver1:~/hadoop> ls /home/hadoop/hadoop/share/hadoop/mapreduce/share/hadoop/mapreduce/*
/bin/ls: /home/hadoop/hadoop/share/hadoop/mapreduce/share/hadoop/mapreduce/*: No such file or directory
hadoop@bigdataserver1:~/hadoop> pwd
/home/hadoop/hadoop
hadoop@bigdataserver1:~/hadoop> echo $CLASSPATH
hadoop@bigdataserver1:~/hadoop> vi etc/hadoop/hadoop-env.sh
hadoop@bigdataserver1:~/hadoop> echo $HADOOP_HOME
hadoop@bigdataserver1:~/hadoop> export HADOOP_HOME=/home/hadoop/hadoop
hadoop@bigdataserver1:~/hadoop> $HADOOP_HOME/contrib/capacity-scheduler/*.jar
hadoop@bigdataserver1:~/hadoop> ls $HADOOP_HOME/contrib/capacity-scheduler/*.jar
/bin/ls: /home/hadoop/hadoop/contrib/capacity-scheduler/*.jar: No such file or directory
hadoop@bigdataserver1:~/hadoop> echo $HADOOP_CLASSPATH
hadoop@bigdataserver1:~/hadoop> ls /home/hadoop/hadoop/share/hadoop/mapreduce
hadoop-mapreduce-client-app-2.0.0-cdh4.2.0.jar hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0.jar lib
hadoop-mapreduce-client-common-2.0.0-cdh4.2.0.jar hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0-tests.jar lib-examples
hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar hadoop-mapreduce-client-shuffle-2.0.0-cdh4.2.0.jar
hadoop-mapreduce-client-hs-2.0.0-cdh4.2.0.jar hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar
hadoop@bigdataserver1:~/hadoop>
hadoop@bigdataserver1:~/hadoop> vi etc/hadoop/hadoop-env.sh
Updated the classpath in hadoop-env.sh.
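The manual checks above (running hadoop classpath, then ls on a suspicious entry) can be sketched as a small function that reports every classpath entry whose directory or file does not exist. The function name report_missing_classpath_entries is hypothetical; it only inspects the local filesystem and does not call Hadoop itself.

```sh
# Hypothetical helper: print each entry of a colon-separated classpath
# whose file or directory is missing on the local disk.
report_missing_classpath_entries() {
  # Subshell so the IFS and noglob changes do not leak to the caller.
  (
    IFS=':'
    set -f                      # keep the shell from expanding * in entries
    for entry in $1; do
      [ -n "$entry" ] || continue
      case "$entry" in
        *'*'*) path="${entry%/*}" ;;   # wildcard entry: check its directory
        *)     path="$entry" ;;
      esac
      [ -e "$path" ] || printf 'missing: %s\n' "$entry"
    done
  )
}
```

A typical call would be report_missing_classpath_entries "$(hadoop classpath)". Against the output shown earlier it should flag the /contrib/capacity-scheduler/*.jar entry and the entry with the doubled share/hadoop/mapreduce path, assuming those directories are absent as the ls errors indicate.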
hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar WordCount /bigdata1/name.txt /bigdata1/output
Unknown program 'WordCount' chosen.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
hadoop@bigdataserver1:~/hadoop>
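This is encouraging: the examples driver now loads and lists its programs, so the classpath problem appears to be resolved. The remaining failure is only the program name, which is case-sensitive and must be written as wordcount, not WordCount.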
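The corrected invocation can be sketched as follows. Every program name in the list above is lower case, so the sketch lower-cases the name before building the command. The helper normalize_program is hypothetical, and the command is only printed here; whether the job then succeeds is not shown in this log.

```sh
# Hypothetical helper: lower-case an example program name, since the
# examples driver lists only lower-case names such as "wordcount".
normalize_program() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

EXAMPLES_JAR=./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar
PROGRAM=$(normalize_program WordCount)

# Print the corrected command; drop the leading "echo" to actually run it.
echo hadoop jar "$EXAMPLES_JAR" "$PROGRAM" /bigdata1/name.txt /bigdata1/output
```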