Hadoop Big Data Certification Training Course Outline
Module 1: Understanding Hadoop
- What is Hadoop?
- Why is Hadoop Important?
- Hadoop Architecture
- Challenges of Using Hadoop
Module 2: Processing Distributed Data
- HDFS
- MapReduce
- Architecture
- Processing Data
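To illustrate the MapReduce model listed in this module, the following is a minimal single-process sketch in Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only to mirror the map, shuffle, and reduce stages.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce functions would run in parallel on different nodes, with input read from HDFS; here everything runs in one process so the data flow is easy to follow.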
Module 3: Introduction to Data Storage and Processing
- Overview
- Projects for Structured Data Storage and Processing
Module 4: Defining Hadoop Cluster Requirements
- Hadoop Cluster
- Advantages
- Hadoop Cluster Architecture
- Best Practices for Building a Hadoop Cluster
Module 5: Configuring a Cluster
- Types of Configuration Files That Drive Hadoop Configuration
- Code Example
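As a rough illustration of the configuration files this module refers to, the sketch below generates the XML layout used by Hadoop site files such as core-site.xml and hdfs-site.xml (a configuration element containing property entries with a name and a value). The helper build_site_xml is hypothetical, and the hostname and port are placeholder assumptions, not values from the course.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return a Hadoop-style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# core-site.xml: default file system URI (hostname and port are assumed).
core_site = build_site_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})

# hdfs-site.xml: number of replicas kept for each block.
hdfs_site = build_site_xml({"dfs.replication": 3})

print(core_site)
print(hdfs_site)
```

In practice these files are usually edited by hand or managed by a cluster management tool; generating them in code is shown here only to make the structure explicit.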
Module 6: Maximising HDFS Robustness
- Three Types of Failures in HDFS
- Data Disk Failure, Heartbeats, and Re-Replication
- Cluster Rebalancing
- Data Integrity
- Metadata Disk Failure
- Snapshots
Module 7: Managing Resources and Cluster Health
- Managing Resources
- Managing HDFS Cluster
- Secondary NameNode Configuration
- MapReduce Cluster Management
Module 8: Maintaining a Cluster
- FileSystem Checks
- HDFS Balancer Utility
- Adding New Nodes to a Cluster
- Decommissioning a Node from a Cluster
- DataNode Volume Failures
- Database Backups
- HDFS Metadata Backup
- Purging Older Log Files
Module 9: Extending Hadoop
- Extending Hadoop Towards a Data Lake
Module 10: Implementing Data Ingress
- Hadoop Built-in Ingress and Egress Tools
Module 11: Planning for Backup, Recovery, and Security
- Introduction to Backup and Recovery
- Goals and Objectives
Module 12: Introduction to Big Data
- What is Big Data?
- Three V’s
- Sources of Big Data
Module 13: Storing Big Data
- Introduction to Big Data Storage
- Key Requirements of Big Data Storage
- Big Data Storage Architectures
Module 14: Processing Big Data
- Introduction to Data Processing
- Big Data Processing Frameworks
- What is a Traditional Approach?
- MapReduce
- Hadoop and Big Data
- Distributed Storage System
- YARN
- Hadoop 1.0/Hadoop 2.0
- Advantages of Hadoop
- Hadoop Ecosystem
- Hortonworks Data Platform
Module 15: Tools and Techniques to Analyse Big Data
- Apache Hadoop
- Microsoft HDInsight
- NoSQL
- Hive
- Sqoop
- PolyBase
- Big Data in Excel
- Presto
Module 16: Developing a Big Data Strategy
- Steps to Develop a Big Data Strategy
- Understanding Business Objectives
- Have a Clear Strategy for Hadoop
- Build a Data-Driven Culture
- Choose the Right Platform
- Start Small
Module 17: Implementing a Big Data Solution
- Steps for Implementing a Big Data Solution
- Collect and Load Data
- Process, Query, and Transform Data
- Consume and Visualise Data
- Build End-To-End Solutions