Thursday, March 14, 2013

BIG DATA Getting Started with HADOOP


Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It is capable of handling large data sets, both structured and unstructured. It can run on low cost clusters and scale up rapidly.

The Hadoop architecture helps applications run on clusters of nodes holding thousands of terabytes of data. It has a distributed file system called HDFS (Hadoop Distributed File System) that provides fast data transfer rates between the clustered nodes and tolerates node failures. It does not require RAID storage, because it achieves reliability by replicating data across multiple hosts.
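As a rough illustration of that replication behaviour, here is a minimal sketch using the standard HDFS FileSystem Java API (the path /demo/sample.txt and the class name are just illustrative); the dfs.replication property controls how many copies of each block HDFS keeps:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath on a Hadoop node.
        Configuration conf = new Configuration();
        // dfs.replication = number of copies of each block; replication across
        // hosts is what lets HDFS tolerate node failures without RAID. 3 is the usual default.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/demo/sample.txt");   // hypothetical HDFS path
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("hello hdfs");
        out.close();

        // Each block of the file is now stored on up to 3 different DataNodes.
        System.out.println("Replication: " + fs.getFileStatus(file).getReplication());
    }
}
```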

Hadoop is a collection of components like MapReduce, Hive, Pig, NoSQL databases, ZooKeeper, Ambari, HCatalog, Oozie, Hue and more.

MapReduce is one key component. It is a framework for writing applications that process large amounts of data, structured or unstructured. A MapReduce application can be designed on a single-node Hadoop cluster and then deployed unchanged on a 100-node cluster.
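To give a flavour of what such an application looks like, here is a minimal word-count sketch against the standard org.apache.hadoop.mapreduce API (the class names WordCountMapper and WordCountReducer are my own): the mapper emits a (word, 1) pair for every word it sees, and the reducer sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every word in an input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce phase: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```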

The MapReduce engine has one JobTracker, to which jobs are submitted. The JobTracker then pushes work out to the TaskTracker nodes in the cluster, and together the JobTracker and TaskTrackers complete the submitted job.
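To see where the JobTracker fits in, here is a minimal driver sketch (again with my own class name WordCountDriver, reusing the mapper and reducer above, and using the Hadoop 1.x-era Job constructor). The call to waitForCompletion() is what submits the job; the JobTracker then hands the map and reduce tasks to TaskTrackers on the cluster nodes and monitors them until the job finishes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output locations in HDFS, taken from the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits the job to the JobTracker and waits for it to finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```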

I will share my experience of getting a single-node Hadoop installation running, and then running a sample MapReduce application, in detail.
