Demystifying Spark Jobs to Optimize for Cost and Performance
-
Apache Spark is one of the most popular engines for distributed data processing on Big Data clusters. Spark jobs come in all shapes, sizes, and cluster form factors: from tens to thousands of nodes and executors, from seconds to hours or even days of job duration, from megabytes to petabytes of data, and from simple data scans to complicated analytical workloads. Throw in a growing number of streaming workloads alongside a huge body of batch and machine learning jobs —
The post Demystifying Spark Jobs to Optimize for Cost and Performance appeared first on Cloudera Engineering Blog.
https://blog.cloudera.com/blog/2019/04/demystifying-spark-jobs-to-optimize-for-cost-and-performance/