Thursday, March 14, 2013

BIG DATA CDH Single Node Setup

BIG DATA is getting Bigger and Bigger
BIG DATA Getting Started with HADOOP
BIG DATA Cloudera and Oracle
BIG DATA CDH Single Node Setup
BIG DATA HADOOP Services Startup and Shutdown
BIG DATA Moving a file to HDFS
BIG DATA HADOOP Testing with MapReduce Examples Part 1
BIG DATA HADOOP Testing with MapReduce Examples Part 2
BIG DATA HADOOP Testing with MapReduce Examples Part 3


To get started with a single-node setup, here are some simple steps; a rough sketch of the preliminary commands follows the list.

- Get a machine that supports CDH; in my case it was a SUSE Linux server
- Create a hadoop user
- Download the hadoop tarball hadoop-2.0.0-cdh4.2.0.tar.gz
- Download and install JDK 1.6 or 1.7
- Unpack the tarball hadoop-2.0.0-cdh4.2.0.tar.gz
- Export the JAVA_HOME variable
- Modify the hadoop configuration files
- Set up passwordless SSH to localhost
- Format the hadoop namenode
- Start the services
- Validate the setup
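
For reference, the preliminary steps look roughly like the commands below. This is only a sketch: the JDK path, user and group names are assumptions for a typical Linux box, so adjust them to your environment.

# as root: create a dedicated hadoop user (group and home dir are assumptions)
groupadd hadoop
useradd -g hadoop -m hadoop

# as the hadoop user: unpack the tarball into the home directory
tar -xzf hadoop-2.0.0-cdh4.2.0.tar.gz
mv hadoop-2.0.0-cdh4.2.0 hadoop

# export JAVA_HOME and convenience variables (add these to ~/.profile and
# to etc/hadoop/hadoop-env.sh so the daemons also pick them up)
export JAVA_HOME=/usr/java/jdk1.6.0_18        # example path, point at your JDK
export HADOOP_HOME=$HOME/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# passwordless ssh to localhost
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost        # should log in without asking for a password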

I am jumping directly to the configuration file changes and then to formatting the namenode.
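
The key files live under $HADOOP_HOME/etc/hadoop: core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. The snippets below are a minimal single-node sketch, assuming everything runs on localhost; the port and directory values are only examples. (As the format output further down shows, leaving the name/data directories unset puts the metadata under /tmp, which is fine for a quick test but is lost on reboot.)

core-site.xml:
<configuration>
  <!-- 8020 is the usual CDH NameNode port; any free port works -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- optional: persistent locations instead of the /tmp defaults -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <!-- on Hadoop 2.0.x / CDH4 the aux-service name uses a dot;
       later releases renamed it to mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>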

Formatting the hadoop namenode

hadoop@bigdataserver1:~> hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

13/03/13 11:15:08 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = bigdataserver1/10.216.9.25
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.0.0-cdh4.2.0
STARTUP_MSG:   classpath = /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-lang-2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/junit-4.8.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/zookeeper-3.4.5-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/stax-api-1.0.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jline-0.9.94.jar:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-api-1.6.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/kfs-0.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-json-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-math-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-annotations-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-el-1.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/avro-1.7.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/hadoop/share/hadoop/common/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-auth-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jetty-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.
0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0-sources.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0-test-sources.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-lang-2.5.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/zookeeper-3.4.5-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.3.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jline-0.9.94.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0-sources.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0-test-sources.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-guice-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/netty-3.2.4.Final.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/hadoop-annotations-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/avro-1.7.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applic
ations-distributedshell-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-site-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/netty-3.2.4.Final.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/avro-1.7.3.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.0.0-cdh4.2.0.jar
STARTUP_MSG:   build = file:///var/lib/jenkins/workspace/CDH4.2.0-Packaging-Hadoop/build/cdh4/hadoop/2.0.0-cdh4.2.0/source/hadoop-common-project/hadoop-common -r 8bce4bd28a464e0a92950c50ba01a9deb1d85686; compiled by 'jenkins' on Fri Feb 15 10:42:32 PST 2013
STARTUP_MSG:   java = 1.6.0_18
************************************************************/
13/03/13 11:15:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-5d53a7be-005c-4d8b-9f93-088c795cbb35
13/03/13 11:15:10 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
13/03/13 11:15:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
13/03/13 11:15:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
13/03/13 11:15:10 INFO blockmanagement.BlockManager: defaultReplication         = 1
13/03/13 11:15:10 INFO blockmanagement.BlockManager: maxReplication             = 512
13/03/13 11:15:10 INFO blockmanagement.BlockManager: minReplication             = 1
13/03/13 11:15:10 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
13/03/13 11:15:10 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
13/03/13 11:15:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
13/03/13 11:15:10 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
13/03/13 11:15:10 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
13/03/13 11:15:10 INFO namenode.FSNamesystem: supergroup          = supergroup
13/03/13 11:15:10 INFO namenode.FSNamesystem: isPermissionEnabled = false
13/03/13 11:15:10 INFO namenode.FSNamesystem: HA Enabled: false
13/03/13 11:15:10 INFO namenode.FSNamesystem: Append Enabled: true
13/03/13 11:15:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/03/13 11:15:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
13/03/13 11:15:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/03/13 11:15:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
13/03/13 11:15:11 INFO namenode.NNStorage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
13/03/13 11:15:11 INFO namenode.FSImage: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/03/13 11:15:11 INFO namenode.FSImage: Image file of size 121 saved in 0 seconds.
13/03/13 11:15:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/03/13 11:15:11 INFO util.ExitUtil: Exiting with status 0
13/03/13 11:15:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdataserver1/10.216.9.25
************************************************************/
hadoop@bigdataserver1:~>
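
As the first two lines of the output point out, going through the hadoop script for this is deprecated; the equivalent command is:

hadoop@bigdataserver1:~> hdfs namenode -format

Also worth noting: the image was written to /tmp/hadoop-hadoop/dfs/name, which is the default when dfs.namenode.name.dir is not set. Since /tmp is usually cleaned on reboot, point that property (and dfs.datanode.data.dir) at a persistent directory before formatting if the cluster is meant to survive a restart.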



Start the services using start-all.sh
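
On this release start-all.sh is itself deprecated and simply calls the DFS and YARN scripts, so the equivalent explicit startup plus a quick jps sanity check looks like this (a sketch, assuming the sbin directory is on the PATH):

start-dfs.sh     # brings up NameNode, DataNode and SecondaryNameNode
start-yarn.sh    # brings up ResourceManager and NodeManager
jps              # should list the five daemons above (plus Jps itself)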

Health check: you can verify that the NameNode is up using the URL below
http://bigdataserver1.bigdata.com:50070/dfshealth.jsp
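
The same check can be done from the command line, which is handy when the web UI port is not reachable; this prints the cluster capacity and the live datanode list:

hadoop@bigdataserver1:~> hdfs dfsadmin -report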
