Spark Optimization 2 with Scala

MP4 | Video: h264, 1280x800 | Audio: AAC, 44.1 KHz, 2 Ch
Genre: eLearning | Language: English | Duration: 24 Lessons (7h 53m) | Size: 1.25 GB
Go fast or go home.
Master Spark internals so your jobs run blazing fast and your cluster pulls its maximum weight.
In this course, we cut the weeds at the root. We dive deep into Spark and understand what tools you have at your disposal - and you might just be surprised at how much leverage you have. You will learn 20+ techniques and optimization strategies. Each of them can individually give at least a 2x performance boost for your jobs (some of them even 10x), and I demonstrate each one on camera.
You'll understand Spark internals to explain how Spark is already pretty darn fast
You'll be able to predict in advance if a job will take a long time
You'll diagnose hanging jobs, stages and tasks
You'll spot and fix data skews
You'll make the right tradeoffs between speed, memory usage and fault-tolerance
You'll be able to configure your cluster with the optimal resources
You'll save hours of computation in this course alone (let alone in prod!)
You'll control the parallelism of your jobs with the right partitioning
And some extra perks:
You'll have access to the entire code I write on camera (~1400 LOC)
You'll be invited to our private Slack room where I'll share the latest updates, discounts, talks, conferences, and recruitment opportunities
(soon) You'll have access to the takeaway slides
(soon) You'll be able to download the videos for offline viewing
Deep understanding of Spark internals so you can predict job performance
stage & task decomposition
reading query plans before jobs run (see the sketch after this list)
reading DAGs while jobs are running
performance differences between the different Spark APIs
packaging and deploying a Spark app
configuring Spark in 3 different ways
understanding the state of the art in Spark internals
leveraging Catalyst and Tungsten for massive performance gains
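To give a flavour of the query-plan and configuration material, here is a minimal sketch. The session name, dataset and config value are illustrative assumptions, not code taken from the course:

import org.apache.spark.sql.SparkSession

object QueryPlanDemo {
  def main(args: Array[String]): Unit = {
    // One of the three ways to configure Spark: programmatic config on the builder
    // (the other two being spark-submit flags and spark-defaults.conf).
    val spark = SparkSession.builder()
      .appName("query-plan-demo")
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "8")
      .getOrCreate()

    // A small aggregation whose plan we can inspect before running it.
    val counts = spark.range(1000000)
      .selectExpr("id % 10 as key")
      .groupBy("key")
      .count()

    // Prints the parsed, analyzed, optimized (Catalyst) and physical plans
    // without executing the job.
    counts.explain(true)

    counts.show()
    spark.stop()
  }
}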
Understanding Spark Memory, Caching and Checkpointing
Tuning Spark executor memory zones
caching for speedy data reuse (caching and checkpointing are sketched after this list)
making the right tradeoffs between speed, memory usage and fault tolerance
using checkpoints when jobs are failing or you can't afford a recomputation
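A minimal caching and checkpointing sketch, assuming a local session, a throwaway checkpoint directory and made-up data (illustrative only, not the course code):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CachingDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("caching-checkpointing-demo")
      .master("local[*]")
      .getOrCreate()

    // Checkpoints need a reliable directory; in production this would be HDFS/S3.
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    val expensive = spark.range(10000000L).selectExpr("id", "id * 2 as doubled")

    // Trade memory for speed: MEMORY_AND_DISK spills to disk instead of recomputing.
    expensive.persist(StorageLevel.MEMORY_AND_DISK)
    expensive.count() // first action materializes the cache
    expensive.count() // served from the cache, no recomputation

    // Checkpointing writes the data out and truncates the lineage, which helps
    // when recomputation is too expensive or jobs fail deep into a long plan.
    val checkpointed = expensive.checkpoint()
    println(checkpointed.count())

    spark.stop()
  }
}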
Partitioning
leveraging repartitions
using coalesce to avoid shuffles
picking the right number of partitions at a shuffle to match cluster capability
using custom partitioners for custom jobs (see the sketch after this list)
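A rough sketch of repartition, coalesce and an RDD-level custom partitioner, again with made-up data and partition counts (not the course code):

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object PartitioningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioning-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val df = spark.range(1000000).toDF("id")

    // repartition triggers a full shuffle and redistributes rows evenly.
    val wide = df.repartition(100)

    // coalesce only merges existing partitions, so it avoids a full shuffle.
    val narrow = wide.coalesce(10)
    println(s"partitions after coalesce: ${narrow.rdd.getNumPartitions}")

    // Custom partitioners live at the RDD level: a plain HashPartitioner here,
    // but any Partitioner subclass tailored to the job's keys works the same way.
    val keyed = sc.parallelize(1 to 100000).map(n => (n % 10, n))
    val custom = keyed.partitionBy(new HashPartitioner(10))
    println(s"partitions with custom partitioner: ${custom.getNumPartitions}")

    spark.stop()
  }
}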
Cluster tuning, fixing problems
allocating the right resources in a cluster
fixing data skews and straggling tasks with salting
fixing serialization problems
using the right serializers for free performance improvements (salting and Kryo are sketched after this list)
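A sketch of the salting idea plus the Kryo serializer, using an artificially skewed dataset and an arbitrary salt bucket count (illustrative assumptions, not the course code):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SkewSaltingDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skew-and-serialization-demo")
      .master("local[*]")
      // Kryo is usually faster and more compact than Java serialization.
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()
    import spark.implicits._

    // Artificially skewed data: roughly 90% of the rows share key 0.
    val skewed = spark.range(1000000)
      .withColumn("key", when(rand() < 0.9, lit(0L)).otherwise($"id" % 10))

    val lookup = (0L to 9L).map(k => (k, s"label-$k")).toDF("key", "label")

    // Salting: split the hot key into saltBuckets sub-keys so the join work
    // spreads across many tasks instead of piling onto one straggler.
    val saltBuckets = 8
    val saltedLeft  = skewed.withColumn("salt", (rand() * saltBuckets).cast("int"))
    val saltedRight = lookup.withColumn("salt",
      explode(array((0 until saltBuckets).map(i => lit(i)): _*)))

    val joined = saltedLeft.join(saltedRight, Seq("key", "salt"))
    println(joined.count())

    spark.stop()
  }
}

The small side of the join is replicated once per salt value, which is the usual price you pay for spreading the hot key across tasks.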
This course is for Scala and Spark programmers who need to improve the runtime and memory footprint of their jobs. If you've never used Scala or Spark, this course is not for you. I generally recommend taking Spark Optimization 1 first, but it's not a requirement.
DOWNLOAD
uploadgig
https://uploadgig.com/file/download/048590F56e7392ca/4nUMhCoA__Spark_Opti.part1.rar
https://uploadgig.com/file/download/58bE5c9712da84Cf/4nUMhCoA__Spark_Opti.part2.rar
rapidgator
https://rapidgator.net/file/ccf21d1ba7873350a6a9a3a36f0b2402/4nUMhCoA__Spark_Opti.part1.rar
https://rapidgator.net/file/4a4852a1943ada3c60d9695d469e8da5/4nUMhCoA__Spark_Opti.part2.rar
nitroflare
http://nitroflare.com/view/8CBAF4F33603CEC/4nUMhCoA__Spark_Opti.part1.rar
http://nitroflare.com/view/39E80E84A4B76DD/4nUMhCoA__Spark_Opti.part2.rar

