26-November-2017, 7 PM IST | 6:30 AM PST | 8:30 AM EST
About The Course
This combo course is designed to give a hands-on understanding of how to build Spark applications in Scala. The Apache Spark & Scala training course covers in depth what Spark is and how it differs from Hadoop. Students learn techniques for improving application performance and enabling high-speed processing with Spark RDDs, and learn how to customise Spark using Scala.
Audience & Pre-Requisites
- The program is best suited for software engineers, ETL developers, data engineers, data scientists, analytics professionals and graduates looking to make a career in Big Data.
- The prerequisite is a basic knowledge of SQL, databases and any query language.
Curriculum: Apache Spark and Scala Developer
1. Introduction To Scala & Execution Of The Scala Code
- Understanding Scala and its deployment in Apache Spark analytics and Big Data applications.
- Importance of Scala, introduction to the REPL, in-depth learning of Scala pattern matching, higher-order functions, currying, using Scala for data analysis, etc.
- Learning what the Scala interpreter is, implicit classes, static object timer and string equality testing in Scala.
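Several of the topics in this module can be sketched in a few lines of plain Scala. The snippet below is only an illustration: the object and function names (`ScalaBasics`, `describe`, `applyTwice`, `scale`, `triple`, `sameContents`) are hypothetical and not taken from the course material.

```scala
object ScalaBasics {
  // Pattern matching: cases are tried top to bottom; the first match wins.
  def describe(x: Any): String = x match {
    case 0               => "zero"                          // constant pattern
    case n: Int if n < 0 => "negative int"                  // typed pattern with a guard
    case _: Int          => "positive int"
    case s: String       => s"string of length ${s.length}"
    case _               => "something else"                // wildcard pattern
  }

  // Higher-order function: accepts another function as an argument.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried function: arguments arrive one parameter list at a time.
  def scale(factor: Int)(value: Int): Int = factor * value

  // Supplying only the first parameter list yields a reusable Int => Int function.
  val triple: Int => Int = scale(3)

  // String equality: == compares contents, whereas eq compares references.
  def sameContents(a: String, b: String): Boolean = a == b

  def main(args: Array[String]): Unit = {
    println(describe(-5))
    println(applyTwice(_ + 3, 10))
    println(triple(7))
    println(sameContents(new String("spark"), "spark"))
  }
}
```

Each definition can also be pasted into the Scala REPL one at a time to try the expressions interactively.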
2. Classes & Trait Concepts In Scala
- Learning about classes in Scala, constructor overloading, object equality, types of hierarchy in Scala, and val and var declarations.
- Understanding sealed traits, and the constructor, tuple, wildcard, constant and variable patterns.
- Understanding traits in Scala, their advantages and their linearization, etc.
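As a rough sketch of two ideas from this module, the snippet below shows a sealed trait matched with constructor patterns, and how the order in which traits are mixed in changes the result of a `super` call (linearization). All names (`TraitsDemo`, `Shape`, `Greeter`, `Loud`, `Polite`, `A`, `B`) are hypothetical examples, not course code.

```scala
object TraitsDemo {
  // A sealed trait can only be extended in this file, so the compiler
  // can check that a match over Shape covers every case.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rect(width: Double, height: Double) extends Shape

  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r   // constructor pattern binding r
    case Rect(w, h) => w * h
  }

  // Stackable traits: each one decorates whatever super.greet resolves to.
  trait Greeter { def greet: String = "hello" }
  trait Loud extends Greeter { override def greet: String = super.greet.toUpperCase }
  trait Polite extends Greeter { override def greet: String = super.greet + ", please" }

  // Linearization runs right to left: the last trait mixed in is called first.
  class A extends Loud with Polite   // Polite -> Loud -> Greeter
  class B extends Polite with Loud   // Loud -> Polite -> Greeter

  def main(args: Array[String]): Unit = {
    println(area(Rect(2.0, 3.0)))
    println(new A().greet)
    println(new B().greet)
  }
}
```

In `A`, the upper-casing happens before ", please" is appended, while in `B` the whole phrase is upper-cased; only the mixin order differs.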
3. Scala Java Interoperability & Scala Collections
- Implementing traits in Scala and Java, and handling classes that extend multiple traits.
- Understanding the concept of Scala collections, their classification and the differences between Iterator and Iterable in Scala.
- Introduction to mutable and immutable collections in Scala: lists and arrays, list buffers and array buffers, queues, double-ended queues (deques), stacks, sets, maps and tuples.
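The distinctions covered in this module can be illustrated with the standard library alone. The following is a minimal sketch; the object and method names (`CollectionsDemo`, `growBuffer`, `firstOut`, `traverseTwice`, `drain`) are made up for the example.

```scala
import scala.collection.mutable

object CollectionsDemo {
  // Immutable: prepending builds a new list and leaves the original untouched.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base

  // Mutable: a ListBuffer is modified in place, then frozen into a List.
  def growBuffer(): List[Int] = {
    val buf = mutable.ListBuffer(1, 2, 3)
    buf += 4
    buf.toList
  }

  // A mutable Queue is first-in, first-out.
  def firstOut(): Int = {
    val q = mutable.Queue[Int]()
    q.enqueue(10)
    q.enqueue(20)
    q.dequeue()
  }

  // An Iterable can be traversed as many times as needed.
  def traverseTwice(xs: Iterable[Int]): (Int, Int) = (xs.sum, xs.sum)

  // An Iterator is single-pass: once summed, it has no elements left.
  def drain(xs: Iterable[Int]): (Int, Boolean) = {
    val it = xs.iterator
    val total = it.sum
    (total, it.hasNext)
  }

  def main(args: Array[String]): Unit = {
    println(extended)
    println(growBuffer())
    println(firstOut())
    println(traverseTwice(base))
    println(drain(base))
  }
}
```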
4. Fundamentals Of Spark & Working With RDDs
- Introduction to Spark and how it improves on Hadoop MapReduce; understanding in-memory MapReduce, Spark on Hadoop YARN, a revision of YARN and HDFS, and how to deploy Spark without Hadoop.
- Learning how to install Spark, the concept of the RDD (Resilient Distributed Dataset), working with the Spark shell, and understanding Spark's functional programming model and architecture.
- In-depth study of Spark RDDs and their general operations, and how to use RDDs for fast, efficient data processing.
- Learning how Spark improves on the processing speed of MapReduce, with an introduction to key-value pair RDDs and the operations available on them.
5. Writing & Implementing Spark Applications & Basic Spark Streaming
- Comparing Spark applications with the Spark shell, creating Spark applications in Java or Scala, and deploying a Spark application.
- Learning what parallel processing in Spark is, deploying applications on a cluster, understanding Spark partitions, file-based partitioning of RDDs, HDFS and data locality, and mastering parallel operation techniques.
- Understanding distributed persistence, RDD persistence overview and RDD lineage.
- Understanding Spark Streaming, creating a Spark Streaming application, and processing streams, including a streaming request count and DStreams.
- Understanding multi-batch operations in Spark, sliding window operations, stateful operations and advanced data sources.
- Learning common Spark use cases, the concept of iterative algorithms, and graph processing in Spark, plus an introduction to k-means and machine learning.
6. Learning To Improve Spark Performance & Understanding Spark SQL & DataFrames
- Understanding shared variables in Spark (broadcast variables and accumulators), common performance issues, and how to troubleshoot the problems that affect performance.
- Learning Spark SQL and the SQL context in Spark for structured data processing; understanding Spark DataFrames, querying and transforming data in DataFrames, how DataFrames combine the benefits of Spark SQL and Spark RDDs, and deploying Hive on Spark, with Spark as the execution engine.
7. Scheduling / Partitioning & Capacity Planning In Spark
- Understanding scheduling and partitioning in Spark: scheduling within and across applications, dynamic sharing, static partitioning, fair scheduling, standby Masters with ZooKeeper, Spark master high availability, single-node recovery with the local file system, etc.
- Learning how to plan capacity in Spark, creating maps and transformations, and the concept of concurrency in Java and Scala.
- Understanding Spark-based log analysis, working with buffers such as array, compact and protocol buffers, and building a first log analyzer in Spark.