Start Your Career in Analytics
About The Course:
R is a popular programming language that is widely used for statistical analysis, graphical representation and reporting. This training is designed to help students understand the core concepts of R: importing data in several formats for statistical computing, R functions, variables, data structures and flow of control.
Audience & Pre-Requisites
While there are no pre-requisites for this training, basic knowledge of a programming language can be helpful. The R Programming training course is specifically designed for Business Intelligence professionals, Software Engineers, Data Analysts, SAS Developers seeking to explore open source technology, and graduates or professionals wanting to make a career in data science.
1. Introduction To R & R-Packages
- Understanding the R language for statistical programming, introduction to RStudio, features of R, the statistical packages, getting familiar with different data types and functions, and learning how R is deployed in various scenarios.
- Using SQL to apply the ‘join’ function, visualization and debugging tools, components of RStudio such as the code editor, and learning about rbind.
- Learning how code, R functions, and data are bundled in a well-defined format called R packages, R package structure, package metadata and testing, CRAN (Comprehensive R Archive Network), vector creation and assigning values to variables.
2. Matrices, Vectors & Sorting Dataframe
- R functionality, the rep function for generating repeats, the transpose and stack functions, sorting, and generating factor levels.
- Fundamentals of matrices and vectors in R, understanding functions like merge, strsplit, rowSums, rowMeans, colMeans, colSums, and matrix manipulation, sequencing, repetition, indexing, etc.
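As a taste of the functions named in this module, here is a minimal sketch in base R. The data frames and variable names (`scores`, `names_df`, and so on) are made up for illustration and are not part of the course material.

```r
# Repetition and sequencing
v <- rep(c(1, 2), times = 3)          # 1 2 1 2 1 2
s <- seq(from = 2, to = 10, by = 2)   # 2 4 6 8 10

# A 2 x 3 matrix (filled column by column) and its transpose
m  <- matrix(1:6, nrow = 2)
tm <- t(m)                            # 3 x 2

# Row and column summaries
row_totals <- rowSums(m)              # 9 12
col_means  <- colMeans(m)             # 1.5 3.5 5.5

# Splitting a string on a delimiter
parts <- strsplit("a,b,c", ",")[[1]]  # "a" "b" "c"

# Sorting a data frame by one column (hypothetical data)
scores <- data.frame(id = c(3, 1, 2), score = c(70, 90, 80))
sorted <- scores[order(scores$id), ]

# Joining two data frames on a shared key
names_df <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
joined   <- merge(sorted, names_df, by = "id")

# Generating a factor with explicit level order
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
print(joined)
print(levels(f))
```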
3. Generating Plots & Reading Data From External Files
- Generating plots in R, bar plots, line plots, graphs, histograms, and understanding the components of a pie chart.
- Learning subscripts in plots in R, obtaining parts of vectors, using subscripts with arrays, using logical variables with lists, and reading data from external files.
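The topics in this module can be sketched with base R alone. The example below writes a small CSV to a temporary file so that it is self-contained; the file contents and column names are hypothetical stand-ins for a real external file.

```r
# Create a small CSV so the example does not depend on any external file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)), path, row.names = FALSE)

# Reading data from an external file
dat <- read.csv(path)

# Obtaining parts of vectors with subscripts
first_two <- dat$y[1:2]          # by position
large     <- dat$y[dat$y > 5]    # by logical condition

# Basic plots (in a non-interactive session R writes these to Rplots.pdf)
barplot(dat$y, names.arg = dat$x, main = "Bar plot of y")
plot(dat$x, dat$y, type = "l", main = "Line plot of y against x")
hist(dat$y, main = "Histogram of y")
pie(dat$y, labels = dat$x, main = "Pie chart of y")
```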
4. Variance Analysis & K-Means Clustering
- Understanding the Analysis of Variance (ANOVA) statistical technique, working with histograms and pie charts, deploying ANOVA with R, one-way and two-way ANOVA.
- K-Means clustering as a clustering algorithm, cluster and affinity analysis, finding cohesive subsets of items, working with large datasets, solving clustering issues, association rule mining and affinity analysis for data mining, and learning co-occurrence relationships.
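A minimal sketch of one-way ANOVA and K-Means clustering, assuming the `PlantGrowth` and `iris` datasets that ship with R are acceptable stand-ins for course data. The choice of three clusters and the seed value are illustrative assumptions.

```r
# One-way ANOVA: does mean plant weight differ between treatment groups?
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# K-Means clustering on the four numeric iris measurements
set.seed(42)                         # K-Means starts from random centres
km <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Compare the discovered clusters with the known species labels
print(table(Cluster = km$cluster, Species = iris$Species))
```

A two-way ANOVA uses the same `aov` call with a second factor on the right-hand side of the formula.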
5. Association Rule Mining & Understanding Relationship With Regression
- Understanding Association Rule Mining, the several concepts of Association Rule Mining, different methods to predict relations between variables in large datasets, the algorithm and rules of Association Rule Mining and learning what single cardinality is.
- Introduction to Simple Linear Regression, the equations for the slope, y-intercept and regression line, the least squares criterion, deploying analysis using regression, the standard error of the estimate, calculating and analysing the results, and measures of variation.
- Simple Linear Regression analysis, Two variable Relationship, Scatter Plots, Line of best fit, etc.
- In-depth understanding of measures of variation, the coefficient of determination, the F-test, test statistics with an F-distribution, prediction with linear regression and advanced regression in R.
- Logistic Regression in R, Logistic Regression Mean, advanced logistic regression, learning how to do prediction using logistic regression, ensuring the model accuracy, understanding sensitivity and specificity, confusion matrix, etc.
- Learning what a ROC (receiver operating characteristic) curve is: a graphical plot that illustrates the performance of a binary classifier, and using the ROC curve in R to determine the specificity / sensitivity trade-offs for a binary classifier.
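The regression topics in this module can be illustrated with a short base R sketch. It assumes the built-in `mtcars` dataset as example data; the 0.5 probability cut-off is an arbitrary illustrative choice, not a recommendation.

```r
# Simple linear regression: fuel economy as a function of car weight
lin <- lm(mpg ~ wt, data = mtcars)
print(coef(lin))                       # intercept and slope of the fitted line
r2 <- summary(lin)$r.squared           # coefficient of determination

# Logistic regression: probability of a manual transmission given weight
logit <- glm(am ~ wt, data = mtcars, family = binomial)
prob  <- predict(logit, type = "response")

# Confusion matrix at a 0.5 cut-off
pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(mtcars$am, levels = c(0, 1))
cm     <- table(Predicted = pred, Actual = actual)
print(cm)

# Sensitivity (true positive rate) and specificity (true negative rate)
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["0", "0"] / sum(cm[, "0"])
```

Repeating the last two steps over a range of cut-offs and plotting sensitivity against (1 - specificity) gives the ROC curve.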
6. ROC & Kolmogorov-Smirnov Chart
- Detailed learning about ROC, area under the ROC curve, data set partitioning, converting variables, understanding the process of checking for multicollinearity, correlation between two or more variables, advanced data set partitioning, interpretation of the output, predicting the output, detailed confusion matrix, and deployment of the Hosmer-Lemeshow test for analysing whether the observed event rates are in accordance with the expected event rates.
- Data analysis with R, learning about the Wald test, the importance of the area under the ROC curve and the Kolmogorov-Smirnov chart.
7. R Integration With Hadoop & Database Connectivity With R
- Understanding how to create an integrated environment for deployment of R on the Hadoop platform, working with RHadoop, RHIPE (R and Hadoop Integrated Programming Environment), R programming for MapReduce jobs, etc.
- Connecting to different databases from the R environment, using ODBC to read data from tables, and visualizing the performance of an algorithm by making use of the confusion matrix.