Training Module

Big Data / Hadoop Modules

Module 1- Big Data-Introduction :-

  • What is Big Data?
  • Understanding Big Data
  • Characteristics of Big Data - The Five V's
  • Big Data vs. Data & Information
  • Big Data Adoption
  • Big Data Challenges

Module 2- Big Data Platform- Hadoop :-

  • What is Hadoop?
  • Hadoop & its components
  • Hadoop ecosystem
  • Hadoop HDFS (Hadoop Distributed File System)
  • Hadoop Processing (MapReduce)
  • Basic Hadoop commands
  • Single-node and multi-node Hadoop architecture
  • Limitations of conventional systems such as MySQL

Module 3- Hadoop HDFS (Hadoop Distributed File System) :-

  • Hadoop file system
  • HDFS architecture
  • HDFS Design & Concepts
  • Blocks, Name nodes and Data nodes
  • HDFS High-Availability and HDFS Federation
  • Hadoop DFS: The Command-Line Interface
  • Basic File System Operations
  • Anatomy of File Read and File Write
  • Block Placement Policy and Modes
  • Configuration files in detail
  • Metadata, FS image, Edit log, Secondary Name Node and Safe Mode
  • FSCK Utility (Block report)
  • How to override the default configuration at the system level and the programming level

Module 4- Hadoop Processing Paradigm- Map Reduce :-

  • Hadoop MapReduce framework
  • Processing concept of MapReduce
  • Traditional processing vs. MapReduce processing
  • MapReduce introduction
  • YARN architecture
  • Input splits and HDFS blocks
  • MapReduce combiner & partitioner

Module 5- Hadoop Ecosystem- Apache Hive-Hadoop Warehouse :-

  • What is an ecosystem?
  • Introduction to Apache Hive
  • Hive vs Pig
  • Hive Architecture and Components
  • Hive Metastore
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Hive Partitioning & Bucketing
  • Introduction to Partitioning & Bucketing
  • Comparison of MySQL partition vs. Hive partition
  • Hive Tables & importing data
  • Hive operations on demo database of Healthcare Dataset & Twitter Dataset
  • Hive Installation, Introduction and Architecture
  • Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
  • Metastore, HiveQL
  • OLTP vs. OLAP
  • Hands on Exercises

Module 6- HBase Database :-

  • Introduction to Hadoop database
  • Comparison of Relational database & Column-oriented database
  • HBase Installation & Architecture
  • HBase queries for creating tables & importing data
  • Linking of HBase & ZooKeeper

Module 7- Apache Pig :-

  • Introduction to Apache Pig
  • MapReduce vs Pig
  • Pig Components & Pig Execution
  • Pig Data Types & Data Models in Pig
  • Shell and Utility Commands
  • Pig Demo of Healthcare Dataset
  • Pig Installation
  • Data Processing
  • Schema on read
  • Primitive data types and complex data types
  • Loading and Storing
  • Filtering, Grouping and Joining
  • Working with Functions
  • User Defined Functions

Module 8- Apache Spark :-

  • What is Spark?
  • Spark Ecosystem
  • Spark Components
  • What is Scala?
  • Why Scala?
  • Spark Context
  • Spark RDD

Module 9- R Language :-

  • Introduction to the R language
  • Installation of RStudio & the R language
  • Why use R?
  • Integration of R and the Hadoop system
  • Study of RHadoop & RHIPE tools
  • Analysis of dataset using RHadoop
  • Live projects related to social media using RHadoop

Certification Project :-

  • Analysis of an Online Big Mart Database & Twitter server