Faster Swarms of Data : Accelerating Hive Queries with Parquet Vectorization
Background
Apache Hive is a widely adopted data warehouse engine that runs on Apache Hadoop. Features that improve Hive performance can significantly improve the overall utilization of resources on the cluster. Hive processes data using a chain of operators within the Hive execution engine. These operators are scheduled in the various tasks (for example, MapTask, ReduceTask, or SparkTask) of the query execution plan. Traditionally, these operators are designed to process one row at a time.
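To make the row-at-a-time model concrete, here is a minimal Python sketch of a chain of operators driven by a task loop. It is an illustration only, not Hive's actual code or API: the class names (Operator, FilterOperator, SelectOperator, CollectOperator), the run_task function, and the dictionary-based row format are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]  # assumed row representation for this sketch


class Operator:
    """Hypothetical operator: handles one row, then hands it to its child."""

    def __init__(self, child: Optional["Operator"] = None) -> None:
        self.child = child

    def process(self, row: Row) -> None:
        raise NotImplementedError

    def forward(self, row: Row) -> None:
        # Pass the row down the chain, if there is a next operator.
        if self.child is not None:
            self.child.process(row)


class FilterOperator(Operator):
    """Forwards only the rows that satisfy the predicate."""

    def __init__(self, predicate: Callable[[Row], bool],
                 child: Optional[Operator] = None) -> None:
        super().__init__(child)
        self.predicate = predicate

    def process(self, row: Row) -> None:
        if self.predicate(row):
            self.forward(row)


class SelectOperator(Operator):
    """Keeps only the requested columns of each row."""

    def __init__(self, columns: List[str],
                 child: Optional[Operator] = None) -> None:
        super().__init__(child)
        self.columns = columns

    def process(self, row: Row) -> None:
        self.forward({name: row[name] for name in self.columns})


class CollectOperator(Operator):
    """Terminal operator that stores the rows it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Row] = []

    def process(self, row: Row) -> None:
        self.rows.append(row)


def run_task(rows: Iterable[Row], root: Operator) -> None:
    """Stand-in for a task: push every input row through the chain, one by one."""
    for row in rows:
        root.process(row)
```

For example, chaining FilterOperator(lambda r: r["qty"] > 1, SelectOperator(["id"], sink)) and running it over a few rows calls process once per row on each operator the row reaches, so the per-row call overhead is paid again for every row in the input.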
The post Faster Swarms of Data : Accelerating Hive Queries with Parquet Vectorization appeared first on Cloudera Engineering Blog.
https://blog.cloudera.com/blog/2018/12/faster-swarms-of-data-accelerating-hive-queries-with-parquet-vectorization/