Thursday 14 March 2013

BIG DATA HADOOP Testing with MapReduce Examples Part 3


In BIG DATA HADOOP Testing with MapReduce Examples Part 1 and BIG DATA HADOOP Testing with MapReduce Examples Part 2 I have resolved some of the issues in getting the HADOOP running but still I have some issues left over and this time it is "Invalid shuffle port number -1 returned" when the mapreduce jobs are submitted.


hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
13/03/13 14:59:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 14:59:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/13 14:59:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/13 14:59:40 INFO input.FileInputFormat: Total input paths to process : 1
13/03/13 14:59:41 INFO mapreduce.JobSubmitter: number of splits:1
13/03/13 14:59:41 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/13 14:59:41 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/13 14:59:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1363184126427_0001
13/03/13 14:59:41 INFO client.YarnClientImpl: Submitted application application_1363184126427_0001 to ResourceManager at /0.0.0.0:8032
13/03/13 14:59:42 INFO mapreduce.Job: The url to track the job: http://bigdataserver1:8088/proxy/application_1363184126427_0001/
13/03/13 14:59:42 INFO mapreduce.Job: Running job: job_1363184126427_0001
13/03/13 14:59:53 INFO mapreduce.Job: Job job_1363184126427_0001 running in uber mode : false
13/03/13 14:59:53 INFO mapreduce.Job:  map 0% reduce 0%
13/03/13 14:59:54 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_0, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_0
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:55 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_1, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000003 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_1
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:57 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_2, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000004 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_2
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:59 INFO mapreduce.Job:  map 100% reduce 0%
13/03/13 14:59:59 INFO mapreduce.Job: Job job_1363184126427_0001 failed with state FAILED due to: Task failed task_1363184126427_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

13/03/13 15:00:00 INFO mapreduce.Job: Counters: 4
        Job Counters
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=0
                Total time spent by all reduces in occupied slots (ms)=0
hadoop@bigdataserver1:~/hadoop>


Solution is to update yarn-site.xml with the values below and then restart the HADOOP cluster.














hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output4
13/03/13 15:23:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 15:23:15 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/13 15:23:15 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/13 15:23:16 INFO input.FileInputFormat: Total input paths to process : 1
13/03/13 15:23:16 INFO mapreduce.JobSubmitter: number of splits:1
13/03/13 15:23:16 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/13 15:23:16 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/13 15:23:16 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/13 15:23:16 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/13 15:23:16 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/13 15:23:16 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/13 15:23:16 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/13 15:23:16 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/13 15:23:16 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/13 15:23:16 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/13 15:23:16 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/13 15:23:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1363188167312_0001
13/03/13 15:23:17 INFO client.YarnClientImpl: Submitted application application_1363188167312_0001 to ResourceManager at /0.0.0.0:8032
13/03/13 15:23:17 INFO mapreduce.Job: The url to track the job: http://bigdataserver1:8088/proxy/application_1363188167312_0001/
13/03/13 15:23:17 INFO mapreduce.Job: Running job: job_1363188167312_0001
13/03/13 15:23:27 INFO mapreduce.Job: Job job_1363188167312_0001 running in uber mode : false
13/03/13 15:23:27 INFO mapreduce.Job:  map 0% reduce 0%
13/03/13 15:23:32 INFO mapreduce.Job:  map 100% reduce 0%
13/03/13 15:23:38 INFO mapreduce.Job:  map 100% reduce 100%
13/03/13 15:23:38 INFO mapreduce.Job: Job job_1363188167312_0001 completed successfully
13/03/13 15:23:38 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=2369
                FILE: Number of bytes written=140677
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1474
                HDFS: Number of bytes written=1631
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3851
                Total time spent by all reduces in occupied slots (ms)=4999
        Map-Reduce Framework
                Map input records=200
                Map output records=199
                Map output bytes=2165
                Map output materialized bytes=2369
                Input split bytes=104
                Combine input records=199
                Combine output records=183
                Reduce input groups=183
                Reduce shuffle bytes=2369
                Reduce input records=183
                Reduce output records=183
                Spilled Records=366
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=57
                CPU time spent (ms)=2730
                Physical memory (bytes) snapshot=358436864
                Virtual memory (bytes) snapshot=926806016
                Total committed heap usage (bytes)=303431680
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1370
        File Output Format Counters
                Bytes Written=1631
hadoop@bigdataserver1:~/hadoop>



Finally I have verified my HADOOP cluster.

BIG DATA HADOOP Testing with MapReduce Examples Part 2

In BIG DATA HADOOP Testing with MapReduce Examples Part 1 the map reduce example was not complete. I ran the wordcount with the proper lower case to see what happens and again it error-ed.


hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
13/03/13 14:58:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 14:58:06 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "localhost:9001"
13/03/13 14:58:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
        at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:83)
        at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:76)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1188)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1184)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1183)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
hadoop@bigdataserver1:~/hadoop>

Solution is to update configuration file mapred-site.xml with the correct property values.


















Running again results in another round of errors


hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
13/03/13 14:59:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 14:59:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/13 14:59:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/13 14:59:40 INFO input.FileInputFormat: Total input paths to process : 1
13/03/13 14:59:41 INFO mapreduce.JobSubmitter: number of splits:1
13/03/13 14:59:41 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/13 14:59:41 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/13 14:59:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1363184126427_0001
13/03/13 14:59:41 INFO client.YarnClientImpl: Submitted application application_1363184126427_0001 to ResourceManager at /0.0.0.0:8032
13/03/13 14:59:42 INFO mapreduce.Job: The url to track the job: http://bigdataserver1.f1:8088/proxy/application_1363184126427_0001/
13/03/13 14:59:42 INFO mapreduce.Job: Running job: job_1363184126427_0001
13/03/13 14:59:53 INFO mapreduce.Job: Job job_1363184126427_0001 running in uber mode : false
13/03/13 14:59:53 INFO mapreduce.Job:  map 0% reduce 0%
13/03/13 14:59:54 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_0, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_0
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:55 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_1, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000003 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_1
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:57 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_2, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000004 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_2
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:59 INFO mapreduce.Job:  map 100% reduce 0%
13/03/13 14:59:59 INFO mapreduce.Job: Job job_1363184126427_0001 failed with state FAILED due to: Task failed task_1363184126427_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

13/03/13 15:00:00 INFO mapreduce.Job: Counters: 4
        Job Counters
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=0
                Total time spent by all reduces in occupied slots (ms)=0
hadoop@bigdataserver1:~/hadoop>


Hunting for fix