Become Certified Hadoop Administrator
About The Course
The Big Data Hadoop Administration training course is designed for professionals and graduates who want to build a career as a Big Data Hadoop Administrator. It helps learners master Hadoop-based admin activities such as planning, installing, configuring, monitoring, and performance-tuning large and complex Hadoop clusters. The course comprises the fundamentals of Hadoop, HDFS, Hadoop clusters, MapReduce and HBase, and the training will make students proficient in working on Hadoop clusters and in applying the knowledge gained here to real-life Hadoop scenarios.
Audience & Pre-Requisites
There are no particular prerequisites for undergoing this training; however, basic knowledge of Linux is helpful. The Hadoop Administration training course is specifically designed for Hadoop Developers, Admins and Architects, along with IT Managers, QA professionals and Support Engineers.
Why Take Big Data Hadoop Administration Training Course?
Hadoop is the most important and sought-after framework for working with huge volumes of rapidly streaming Big Data in a distributed environment. Owing to the high velocity of Big Data and the need to retrieve insights in real time, Hadoop has become central to the IT departments of most organisations, which is why there has been a constant surge in demand for Hadoop administrators who can handle the criticality of such operations.
Objectives Of The Course
- Understanding the Hadoop architecture and its various components
- Learning installation and configuration of Hadoop
- Gaining deeper insights into HDFS (Hadoop Distributed File System)
- Understanding MapReduce abstraction process and its working
- Troubleshooting issues in cluster and recovering from node failures
- Understanding the concepts of Hive, Oozie, Pig, Flume and Sqoop
- Optimizing Hadoop clusters for improved performance
1. Introduction To Hadoop & HDFS
- Fundamentals of Hadoop, its usage in Big Data applications, how it differs from the traditional database management systems, main components of Hadoop and its architecture.
- Overview of the HDFS (Hadoop Distributed File System), the HDFS architecture, learning the process of file storage in a distributed environment by HDFS, various Hadoop file systems, failure components and recovery techniques in addition to understanding load balancing and block placement in Hadoop cluster.
- In-depth understanding of how HDFS works, learning about the commands and operations carried out in HDFS, how a file is read by HDFS, etc.
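To make the block-storage idea concrete, here is a minimal Python sketch (illustrative only, not Hadoop code) of how HDFS splits a file into fixed-size blocks, with the 128 MB default block size and replication factor 3 used by recent Hadoop versions:

```python
# Illustrative sketch only -- not actual Hadoop code. It mimics how HDFS
# splits a file into fixed-size blocks (128 MB by default in Hadoop 2+)
# and notes the replication factor applied to each block.

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size, in bytes
REPLICATION = 3                 # default number of replicas per block

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of the given size occupies."""
    full, remainder = divmod(file_size_bytes, block_size)
    blocks = [block_size] * full
    if remainder:
        blocks.append(remainder)  # the last block may be smaller
    return blocks

# A 300 MB file occupies three blocks: 128 MB, 128 MB and 44 MB,
# and each block is stored REPLICATION times across DataNodes.
sizes = split_into_blocks(300 * 1024 * 1024)
print(len(sizes), sizes[-1] // (1024 * 1024))  # -> 3 44
```

Note that HDFS does not pad the last block: a 44 MB tail consumes only 44 MB on disk, which is why large block sizes do not waste space for small files.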
2. Hadoop Cluster Planning & Deployment
- Learning how to design and configure multi-node Hadoop cluster, HDFS block replication, capacity management, understanding Hadoop cluster’s network topology and Hadoop based rack awareness.
- Understanding the steps involved in Hadoop installation, the types of Hadoop deployment, best practices for memory, disk and CPU allocation, and workload profiling, along with an in-depth understanding of the Hadoop cluster's distributed architecture.
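As a taste of what deployment-time configuration looks like, below is an illustrative hdfs-site.xml fragment setting two commonly tuned properties; the property names are standard HDFS configuration keys, but the values shown are examples, not recommendations for any particular cluster:

```xml
<!-- Illustrative hdfs-site.xml fragment; values are examples only. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value><!-- number of replicas kept for each block -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value><!-- block size in bytes (128 MB) -->
  </property>
</configuration>
```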
3. MapReduce Abstraction & Configuration Of Hadoop Cluster
- Fundamentals of MapReduce abstraction, its working pattern on large datasets, the mapping and reducing functions, the several components in the MapReduce process, associated terminologies, etc.
- Understanding the configuration of Hadoop in the cluster, the various values and parameters for configuration, learning the parameters in MapReduce and HDFS, include and exclude configuration files, Hadoop environment configuration files, etc.
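The map/shuffle/reduce pattern itself can be sketched in plain Python. This is a conceptual illustration of the classic word-count example, not Hadoop code: a real Hadoop job distributes these phases across the cluster rather than running them in one process.

```python
from collections import defaultdict

# Plain-Python sketch of the MapReduce word-count pattern -- illustrative
# only; Hadoop runs these phases in parallel across many nodes.

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # -> 2
```

The key insight the course develops is that because the map and reduce functions are side-effect-free, the framework is free to run them on whichever nodes hold the data.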
4. Hadoop Administration, Maintenance, Monitoring & Troubleshooting
- Introduction to Hadoop administration and maintenance, learning the various directory structures and files, understanding metadata and data backup, the NameNode and DataNode, failure and recovery procedures, maintaining Hadoop clusters, node addition and removal, understanding of schedulers, the MapReduce programming model, etc.
- Learning Hadoop cluster monitoring and troubleshooting, deploying stack traces and logs for this purpose, understanding the various open source tools for Hadoop cluster monitoring.
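A common first troubleshooting step is triaging daemon logs by severity before digging into stack traces. The sketch below shows the idea in Python on made-up sample lines (the log lines are fabricated for illustration, not taken from a real cluster):

```python
import re

# Illustrative log-triage sketch: count log lines by severity, the way an
# admin might eyeball a DataNode log. The sample lines below are made up.

LOG_LINES = [
    "2024-01-15 10:02:11 INFO  datanode.DataNode: Receiving block",
    "2024-01-15 10:02:14 WARN  datanode.DataNode: Slow BlockReceiver write",
    "2024-01-15 10:02:20 ERROR datanode.DataNode: IOException in offerService",
]

def count_by_severity(lines):
    """Tally INFO/WARN/ERROR occurrences across the given log lines."""
    counts = {"INFO": 0, "WARN": 0, "ERROR": 0}
    for line in lines:
        match = re.search(r"\b(INFO|WARN|ERROR)\b", line)
        if match:
            counts[match.group(1)] += 1
    return counts

print(count_by_severity(LOG_LINES))  # -> {'INFO': 1, 'WARN': 1, 'ERROR': 1}
```

In practice the same triage is often done with grep over the daemon log directory; the course covers the open source monitoring tools that automate it.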
5. Job Scheduling In Hadoop & Project Work
- What scheduling means in Hadoop, the Fair Scheduler for ensuring fair sharing of resources in every queue, the Capacity Scheduler, FIFO scheduling, configuring the Fair Scheduler, etc.
- Hadoop admin project work.
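The contrast between FIFO and fair scheduling can be shown with a toy simulation. This is conceptual only: real Hadoop schedulers allocate cluster slots and containers, not a single serial timeline, and the user and job names below are invented for illustration.

```python
from collections import deque

# Toy simulation contrasting FIFO and fair scheduling -- conceptual only;
# real Hadoop schedulers share cluster resources, not one serial timeline.

jobs = [("alice", "job1"), ("alice", "job2"), ("bob", "job3")]

def fifo_order(jobs):
    """FIFO: run jobs strictly in submission order."""
    return [job for _, job in jobs]

def fair_order(jobs):
    """Fair: round-robin across users' queues so no user starves."""
    queues = {}
    for user, job in jobs:
        queues.setdefault(user, deque()).append(job)
    order = []
    while any(queues.values()):
        for user in list(queues):
            if queues[user]:
                order.append(queues[user].popleft())
    return order

print(fifo_order(jobs))  # -> ['job1', 'job2', 'job3']
print(fair_order(jobs))  # -> ['job1', 'job3', 'job2']
```

Under FIFO, bob's job waits behind both of alice's jobs; under fair scheduling it is interleaved, which is the behaviour the Fair Scheduler's per-queue sharing aims for.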