Introduction to Hadoop

  • What is Big Data, Hadoop & NoSQL
  • Brief history of Hadoop
  • Hadoop distributions
  • Hadoop real-time use cases
  • Hadoop Architecture and Releases
  • Hadoop installation modes
    • Standalone installation.
    • Single Node Cluster
    • Multi Node Cluster
  • Hardware & Software Requirements
  • Hadoop Ecosystem
  • Enterprise Roles

Hadoop Distributed File System

  • HDFS architecture
  • HDFS Concepts
  • 5 daemons of Hadoop
    • NameNode and its functionality
    • DataNode and its functionality
    • JobTracker and its functionality
    • TaskTracker and its functionality
    • Secondary NameNode and its functionality
  • Data Storage in HDFS
  • Accessing HDFS
    • CLI (Command Line Interface)
    • Hadoop Admin commands
    • Java Based Approach
  • Demo examples and assignment

MapReduce Programming

  • MapReduce architecture
  • MapReduce Programming Model
  • Different phases of the MapReduce algorithm
  • Different data types in MapReduce
  • How to write a MapReduce application
    • The Driver Code
    • The Mapper
    • The Reducer
  • Input formats
    • TextInputFormat
    • KeyValueTextInputFormat
    • SequenceFileInputFormat
  • Introduction to MapReduce Streaming
  • Data locality in MapReduce
  • Combiner (Mini Reducer) and Partitioner
  • Distributed Cache
  • MapReduce – YARN
  • Sample programs to understand MapReduce applications.
  • Assignments on 5 different data sets.
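
The map, shuffle, and reduce phases named in this module can be sketched with a word count in plain Python. This is a minimal illustration that runs on one machine and does not use Hadoop; the names `mapper`, `shuffle`, `reducer`, and `run_job` are assumptions chosen for this sketch, not Hadoop APIs.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort phase: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse the values collected for one key into one result."""
    return key, sum(values)


def run_job(lines):
    """Driver (hypothetical helper): wire the three phases together."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle(mapped)]


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    for word, count in run_job(sample):
        print(word, count)
```

In a real cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle; here all three phases run in sequence in a single process.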

Apache Pig

  • Introduction to Apache Pig
  • MapReduce vs. Apache Pig
  • SQL Vs Apache Pig
  • Different data types in Pig
  • Modes of Execution in Pig
    • Local Mode
    • MapReduce or Distributed Mode
  • Execution Mechanism
    • Grunt Shell
    • Embedded Script
  • Transformations in Pig
  • How to write a simple pig script
  • UDFs in Pig
  • Sample Programs to understand Pig Latin
  • Assignments using 3 data sets.

Apache Hive

  • Hive Introduction
  • Hive architecture
  • Hive Meta Store
  • Integration with Hadoop
  • Query Language (HiveQL)
  • Configuring Hive with MySQL MetaStore
  • SQL vs. HiveQL
  • Hive UDF
  • Hive UDAF
  • Sample programs to understand HiveQL.
  • Assignments using 3 data sets.

Apache Sqoop

  • Introduction to Sqoop.
  • MySQL client and Server Installation
  • How to connect to Relational Database using Sqoop
  • Different Sqoop Commands
  • Different flavors of Imports
  • Assignment using a MySQL database.

Apache HBase

  • HBase introduction
  • HBase use cases
  • Basics
    • Column families
    • Scans
  • HBase Architecture
  • Clients
    • REST
    • Thrift
  • HBase Admin
  • Schema Definition
  • Basic CRUD Operations
  • Sample programs to understand HBase.
  • Assignments.

Apache Flume

  • Flume Introduction
  • Flume Architecture
  • Flume Configurations
  • How to configure Flume-NG
  • Real-time Twitter data use case using Apache Flume

Apache Oozie

  • Oozie Introduction
  • Oozie Architecture
  • Oozie Configuration Files
  • Oozie Job Submission
  • Workflow.xml
  • Coordinator.xml
  • job.coordinator.properties
  • Configuration of MapReduce job workflow

Apache ZooKeeper

  • Installing and running ZooKeeper
  • ZooKeeper Service
  • Building applications with ZooKeeper
  • ZooKeeper in Production

MongoDB (As part of NoSQL Databases)

  • Need of NoSQL Databases
  • Relational VS Non-Relational Databases
  • Introduction to MongoDB
  • Features of MongoDB
  • Installation of MongoDB
  • Basic operations

Introduction to Apache Mahout
Real-time use cases

Week-Wise Schedule

4 weeks, 2 hours each day (Monday to Friday)
Total: 40 hours of training.

Week 1:
  • Introduction to Hadoop
  • Hadoop Installation
  • Linux Basics
  • Core Java

Week 2:
  • Introduction to MapReduce
  • Writing MapReduce Applications
  • SQL Commands

Week 3:
  • Apache Pig
  • Apache Hive
  • Apache Sqoop

Week 4:
  • Apache HBase
  • Apache Flume
  • Apache Oozie
  • Apache ZooKeeper
  • Introduction to Mahout


Core Java: Additional classes will be provided for non-IT students.
Linux: We cover it from the ground up; basic knowledge of, and experience using, any one Linux OS is recommended.
SQL: Additional classes will be provided for non-IT students.

Benefits of the Course

  • In-depth coverage of every topic, including new features of MapReduce 2.
  • Covers all the modules

1. Installation of each and every tool
2. Using Cloudera
3. Using the latest stable Apache distribution.
4. Using Hortonworks distribution.

  • Content is very well planned and organized
  • Complete support with assignments, even after the course is completed.
  • Live software installation experience.
  • Complete course material with hands-on practice walkthroughs.
  • Mock interviews are conducted one-on-one at the completion of each major module, including a final mock interview at the end of the course.
  • 2 hours of special classes for Cloudera admin or developer aspirants.