Apache Spark – Greater Insights

Apache Spark

Course Overview

Apache Spark is an open-source distributed computing system designed for fast, flexible, and expressive data processing. It is a general-purpose engine that handles a wide variety of data types and workloads, including batch processing, stream processing, machine learning, and graph processing. Training in Apache Spark typically covers the fundamental concepts and architecture of Spark, as well as how to develop and deploy Spark-based applications. It may also cover topics such as Spark Streaming, Spark SQL, and machine learning with Spark.

At the end of the training, participants will be able to:

  1. Understand the architecture of Spark and explain its business use cases
  2. Distribute, store, and process data using RDDs in a Hadoop cluster
  3. Use Spark SQL for querying databases
  4. Write, configure, and deploy Spark applications on a cluster
  5. Use the Spark shell for interactive data analysis
  6. Process and query structured data using Spark SQL

Prerequisites

Knowledge of the Apache Hadoop ecosystem, SQL, the Linux CLI, and Scala is required.

Duration

2 days

Course Outline

  1. What is Apache Spark?
  2. Starting the Spark Shell
  3. Using the Spark Shell
  4. Getting Started with Datasets and DataFrames
  5. DataFrame Operations
  1. Working with DataFrames and Schemas
  2. Creating DataFrames from Data Sources
  3. Saving DataFrames to Data Sources
  4. DataFrame Schemas
  5. Eager and Lazy Execution
  6. Analyzing Data with DataFrame Queries
  7. Querying DataFrames Using Column Expressions
  8. Grouping and Aggregation Queries
  9. Joining DataFrames
  1. RDD Overview
  2. RDD Data Sources
  3. Creating and Saving RDDs
  4. RDD Operations
  1. Writing and Passing Transformation Functions
  2. Transformation Execution
  3. Converting Between RDDs and DataFrames
  1. Querying Tables in Spark Using SQL
  2. Querying Files and Views
  3. The Catalog API
  4. Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark
  1. Apache Spark Applications
  2. Writing a Spark Application
  3. Building and Running an Application
  4. Application Deployment Mode
  5. The Spark Application Web UI
  6. Configuring Application Properties
  1. Review: Apache Spark on a Cluster
  2. RDD Partitions
  3. Example: Partitioning in Queries
  4. Stages and Tasks
  5. Job Execution Planning
  6. Example: Catalyst Execution Plan
  7. Example: RDD Execution Plan
  1. Data Processing
  2. Common Apache Spark Use Cases
  3. Iterative Algorithms in Apache Spark
  4. Machine Learning
  5. Example: k-means
