Performance Analysis of MySQL Partition, Hive Partition-Bucketing and Apache Pig


Data is being generated at a rapidly increasing rate and must be handled in a timely manner. Several factors contribute to the growth in both the volume and the variety of data. Applications such as social media, banking, web search engines, and financial services play an important role in increasing data size. Handling this amount of data requires efficient tools that provide quick responses. The Hadoop framework provides systems that can process data at a fast pace.

MySQL partitioning provides a logical splitting of a table's data that is transparent to the user. Partitioning distributes different portions of a table and stores them as separate tables in different locations. The partitioning function is selected according to the type of partitioning needed; MySQL supports range, list, hash, and key partitioning. Apache Hive provides a partitioning-bucketing concept that organizes tables into multiple partitions based on the values of partition columns such as country, state, date, or city. Once a table is partitioned, a query can easily be restricted to one portion of the partitioned data. Tables or partitions can be further subdivided into buckets, which gives the data extra structure for efficient querying. Bucketing assigns rows to buckets based on a hash function applied to one or more columns of the table. In this analysis, a non-partitioned MySQL table is seen to take more time than a partitioned table when querying a large data set; the partitioned table takes less time to execute. Similarly, a bucketed Hive table takes less time than a Hive table that is only partitioned when a query is run against a large data set.
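The effect described above can be illustrated with a small in-memory sketch in Python. It is a toy model under stated assumptions, not how MySQL or Hive store data internally: the column names `country` and `user_id`, the bucket count of 4, and the modulo-based bucket function are all hypothetical choices made for illustration. The sketch counts how many rows each strategy has to examine to answer the same lookup.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count chosen for this sketch


def bucket_of(user_id: int) -> int:
    """Assign a row to a bucket; a simple modulo stands in for a hash function."""
    return user_id % NUM_BUCKETS


def build_layout(rows):
    """Organize rows as layout[country][bucket] -> list of rows."""
    layout = defaultdict(lambda: defaultdict(list))
    for row in rows:
        layout[row["country"]][bucket_of(row["user_id"])].append(row)
    return layout


def full_scan(rows, country, user_id):
    """No partitioning: every row in the table is examined."""
    hits = [r for r in rows
            if r["country"] == country and r["user_id"] == user_id]
    return hits, len(rows)


def partition_scan(layout, country, user_id):
    """Partitioned only: examine just the rows of the matching partition."""
    partition = layout.get(country, {})
    candidates = [r for bucket in partition.values() for r in bucket]
    hits = [r for r in candidates if r["user_id"] == user_id]
    return hits, len(candidates)


def bucket_scan(layout, country, user_id):
    """Partitioned and bucketed: examine a single bucket of one partition."""
    candidates = layout.get(country, {}).get(bucket_of(user_id), [])
    hits = [r for r in candidates if r["user_id"] == user_id]
    return hits, len(candidates)


# Synthetic sample data (hypothetical): 1200 rows spread over three countries.
COUNTRIES = ["IN", "US", "UK"]
rows = [{"user_id": i, "country": COUNTRIES[i % 3]} for i in range(1200)]
layout = build_layout(rows)

if __name__ == "__main__":
    for name, (hits, scanned) in [
        ("full scan", full_scan(rows, "IN", 600)),
        ("partition scan", partition_scan(layout, "IN", 600)),
        ("bucket scan", bucket_scan(layout, "IN", 600)),
    ]:
        print(f"{name}: found {len(hits)} row(s) after examining {scanned}")
```

With this sample data, the lookup for `user_id` 600 in country `IN` examines 1200 rows without partitioning, 400 rows with partitioning alone, and 100 rows with partitioning plus bucketing. The row counts only illustrate why pruning reduces work; actual execution times depend on the storage engine, data distribution, and query.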

 
