Big Data and Hadoop Developer Weekend Batch

Training Cost: $395.00
Training Type: Instructor-Led Online Training
Schedule: 24-March-2018, 7:30 AM PST | 9:30 AM EST | 7:00 PM IST
Audience and Prerequisites

The program is best suited for developers, engineers, and architects who want to pursue a future in Hadoop and related tools for solving real-world data problems.

Participants should come from a programming background or have the ability to program in Java, Scala, or Python, as some of the hands-on exercises use these programming languages. Familiarity with Linux commands and SQL is helpful for better understanding.

No prior knowledge or understanding of Hadoop or Big Data is required.

Objectives

Get Certified as a Cloudera/Hortonworks Hadoop Developer.


Total 60 hours of training: 20 hours of live sessions + 20 hours of labs and assignments + 20 hours of project work.

Live Sessions: 10 sessions of 2-3 hours each.

Training will be conducted using an LMS (Learning Management System) and the GoToWebinar application; all session videos can be accessed by participants through the LMS after the training.

All code for assignments will be available through GitHub.

Separate access to our COSO IT Big Data Labs will be given for the project, assignments, and practice.

Learn the concepts and techniques of Big Data on the Hadoop ecosystem for solving real-world enterprise data problems on COSO IT's real-time clusters, not on a virtual machine.

Course Outcomes:

After completing the training program, participants will be ready to develop Big Data applications on Hadoop and to take Hadoop developer certification exams.

Certification: COSO IT Certified Big Data and Hadoop Developer.
Curriculum

1. Big Data Introduction.

  • Understanding the Business Analytics lifecycle.

  • Hadoop Introduction.

  • Understanding Hadoop Characteristics.

  • Understanding Hadoop Ecosystem.

  • Understanding Hadoop Core components.

  • Assignment - Installation and Basic Labs Documents.

  • Access to LMS (Learning Management system).

  • Access to Real-Time Clusters (see the quick check below).
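Once lab credentials arrive, a quick way to verify cluster access is a sketch like the following (the hostnames and directory layout of the COSO IT lab are assumptions):

    # Print the Hadoop version to confirm the client tools are on the PATH.
    hadoop version

    # List the HDFS root to confirm the cluster is reachable.
    hdfs dfs -ls /

    # List the YARN worker nodes in the cluster.
    yarn node -list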

2. HDFS Internal and Yarn:

  • Data Replication and Rack Awareness.

  • HDFS File Read and Write Anatomy.

  • Understanding HDFS Architecture.

  • Common Hadoop Shell Commands.

  • HDFS Federation.

  • Understanding YARN.

  • Firing our First Map Reduce Job.

  • Checking the output of an M/R job and understanding the Map Reduce job dump.

  • HDFS commands document, lab files, and an assignment based on the commands (see the sketch below).
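For a taste of the shell commands covered in this module, a minimal sketch (the /user/student paths are illustrative, not the actual lab layout):

    # Create a directory in HDFS and upload a local file into it.
    hdfs dfs -mkdir -p /user/student/data
    hdfs dfs -put localfile.txt /user/student/data/

    # Read the file back, then inspect its blocks, replicas, and rack locations.
    hdfs dfs -cat /user/student/data/localfile.txt
    hdfs fsck /user/student/data/localfile.txt -files -blocks -locations

    # Change the file's replication factor to 2.
    hdfs dfs -setrep 2 /user/student/data/localfile.txt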

3. Introduction to Map-Reduce:

  • Understanding Hadoop Cluster Modes.

  • Configuration Files, Web URLs, and Split vs Blocks.

  • Map Reduce Use-Cases.

  • Solving a problem the traditional way.

  • Understanding the Map Reduce way.

  • Map Reduce Anatomy.

  • Advantages of Map Reduce, and Map Reduce Flow.

  • Map-Reduce commands document, lab files, and an assignment based on the commands (see the sketch after this list).

  • Project 1 - On Map-Reduce.
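Firing a first Map Reduce job typically looks like the sketch below, using the word-count example that ships with Hadoop (the jar path varies by distribution; the one shown assumes a stock Apache Hadoop layout):

    # Run the bundled word-count example over data already in HDFS.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/student/data /user/student/wc-out

    # Inspect the reducer output.
    hdfs dfs -cat /user/student/wc-out/part-r-00000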

4. Pig and Advanced Pig:

  • Pig Background.

  • Need for Pig.

  • Pig Vs M/R.

  • Pig Definition.

  • Pig Latin.

  • Pig Users.

  • Pig usage at Yahoo.

  • Pig Interaction Modes.

  • Pig Program Execution.

  • Pig Data Model.

  • Pig Data Types.

  • Pig operators and specialized joins.

  • Pig commands document, lab document, and an assignment based on the commands (see the sketch after this list).

  • Project 2 - On Pig.
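A flavour of Pig Latin, run from the shell (the file path and field names are illustrative):

    # Save a small word-frequency script written in Pig Latin.
    cat > wordfreq.pig <<'EOF'
    lines  = LOAD '/user/student/data/localfile.txt' AS (line:chararray);
    words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grpd   = GROUP words BY word;
    counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS freq;
    DUMP counts;
    EOF

    # Execute it on the cluster in MapReduce mode.
    pig -x mapreduce wordfreq.pig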

5. Hive and Advanced Hive:

  • Hive Background.

  • Hive Definition.

  • Pig vs Hive.

  • Hive Components.

  • Hive Architecture.

  • Hive Metastore.

  • Hive Design.

  • Hive Data Model.

  • Partitions and Buckets.

  • Hive commands document, lab document, and an assignment based on the commands (see the sketch after this list).

  • Project 3 - On Hive.
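A small sketch of the partitioning concept covered above (the table, columns, and file path are illustrative):

    # Create a partitioned table, load a file into one partition, and query it.
    hive -e "
      CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
      PARTITIONED BY (sale_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      LOAD DATA INPATH '/user/student/data/sales.csv'
      INTO TABLE sales PARTITION (sale_date='2018-03-24');

      SELECT sale_date, COUNT(*) AS num_rows, SUM(amount) AS total
      FROM sales GROUP BY sale_date;
    "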

6. HBase and Advanced HBase:

  • NoSQL Background and Description.

  • Real-Time Scenarios.

  • NoSQL Landscape.

  • HBase Definition.

  • HBase Characteristics.

  • HBase History.

  • HBase vs RDBMS.

  • HBase Data Model - Graphical Representation.

  • HBase Data Model - Logical vs Physical Representation.

  • Version Concepts.

  • Regions, Region Servers, and Zookeeper.

  • HBase commands document, lab document, and an assignment based on the commands (see the sketch below).
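A first look at the HBase shell (the table, column family, and values are illustrative):

    # Create a table with one column family, write two cells, and read them back.
    hbase shell <<'EOF'
    create 'users', 'info'
    put 'users', 'row1', 'info:name', 'Alice'
    put 'users', 'row1', 'info:city', 'Pune'
    get 'users', 'row1'
    scan 'users', {VERSIONS => 3}
    EOF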

7. Oozie and Sqoop:

  • Oozie Workflow.

  • Oozie servers.

  • Oozie Coordinator.

  • Oozie Bundles.

  • Configuration XML and Properties file.

  • Creating Oozie applications.

  • Oozie Scheduling.

  • Sqoop setup between Hadoop and RDBMS.

  • Exporting Data from Hadoop into RDBMS.

  • Importing Data from RDBMS into Hadoop.

  • Sqoop commands document, lab document, and an assignment based on the commands.

  • Oozie commands document, lab document, and an assignment based on the commands (see the sketch after this list).
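Typical Sqoop and Oozie invocations look like the sketch below (the JDBC URL, credentials, table names, and Oozie host are all illustrative):

    # Import an RDBMS table into HDFS; -P prompts for the password.
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username student -P \
      --table orders \
      --target-dir /user/student/orders

    # Export processed results from HDFS back into an RDBMS table.
    sqoop export \
      --connect jdbc:mysql://dbhost/shop \
      --username student -P \
      --table order_summary \
      --export-dir /user/student/orders_out

    # Submit and run an Oozie workflow described by a job.properties file.
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run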

8. Zookeeper:

  • Zookeeper Master.

  • Zookeeper Slave.

  • Concept of Ephemeral Nodes.

  • Persistent and Optional Sequential Numbering.

  • Configuration Management.

  • Zookeeper commands document, lab document, and an assignment based on the commands (see the sketch after this list).
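The ephemeral and sequential znode concepts above can be tried directly from the ZooKeeper CLI (the paths and data are illustrative):

    # -e creates an ephemeral node (removed when the session ends);
    # -s appends a monotonically increasing sequence number to the name.
    zkCli.sh -server localhost:2181 <<'EOF'
    create /app "config-data"
    create -e /app/worker1 "alive"
    create -s /app/task- "queued"
    ls /app
    get /app
    EOF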

9. Flume:

  • What is Apache Flume?

  • Flume Architecture.

  • Flume Agent.

  • Source.

  • Sink.

  • Defining the Flume flow.

  • Configuring Individual Components.

  • Configuring the entire Flume setup.

  • Flume commands document, lab document, and an assignment based on the commands (see the sketch after this list).

  • Project 4 - Flume.
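A minimal single-agent Flume setup of the kind configured in this module: a netcat source feeding a logger sink through a memory channel (the agent and component names are illustrative):

    # Write the agent configuration: source -> channel -> sink wiring.
    cat > netcat-agent.conf <<'EOF'
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    a1.channels.c1.type = memory
    a1.sinks.k1.type = logger

    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    EOF

    # Start the agent and log events to the console.
    flume-ng agent --conf ./ --conf-file netcat-agent.conf \
      --name a1 -Dflume.root.logger=INFO,console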


A certificate will be issued after completion of all assignments and the 4 projects. 24x7 support from the instructor is provided, with one-on-one sessions for resolving problems.

Classroom Location: Online - at your desk.

Get ready to develop Big Data applications on Hadoop and to take on any Hadoop developer exam or job. Training and practice on real-time Hadoop clusters!

A certification program for expertise in developing Big Data solutions on Hadoop.

Why is Big Data technology training on real-time clusters important?

  • Real-world experience means working on real machinery in a real production environment. The experience of working in a real environment is often far different from a simulated one, and the Big Data world values it accordingly. We make that possible by providing training on a cluster of networked computers that comes pre-installed with the necessary technology stack, including Apache Hadoop, Apache Spark, and other related technologies.

  • Compared to a simulated environment such as a Hadoop virtual machine (a VM that must be downloaded and run on a single computer), cluster-based learning provides a far more realistic experience: a virtual machine runs on only one computer, while most Big Data technology components run across multiple computers.