What follows is a brief comparison of the MySQL-based storage engines I have been working with lately for Big Data analytics. It covers disk space used, load time, and query performance, along with some comments. It is not intended as a formal benchmark.
During the last few days I have been running out of disk space on the 2TB partition I use for my research experiments. That partition mainly holds a MySQL database with tables partitioned by week, containing over 2 years of web log and performance data. At first, I compared InnoDB and MyISAM on query performance and disk usage. MyISAM is considerably faster than InnoDB at loading data, especially when running DISABLE KEYS first, but re-enabling the keys afterwards was a problem for MyISAM on large tables. MyISAM doesn’t seem to scale well to a large number of partitions, while InnoDB does. Besides fast loading, an advantage of MyISAM is that its tables occupy less disk space: InnoDB takes about 40% more space than MyISAM for this type of table, which consists of various numeric IDs and a couple of URLs. However, I had many crashes with MyISAM and had to repair tables many times. For data analysis that is an annoyance but not a serious problem. I wouldn’t use MyISAM on production/OLTP servers, though; maybe back in the early 2000s…
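For reference, the MyISAM load pattern and the disk usage check mentioned above can be sketched roughly as follows. This is only an illustration, not the exact scripts I used: the table name `weblog`, the schema name `research`, the CSV path, and both helper functions are hypothetical. The helpers just build the SQL text, so it can be pasted into the `mysql` client or passed to whatever driver you prefer.

```python
def myisam_bulk_load_statements(table, csv_path):
    """Build the SQL for a MyISAM bulk load with index updates deferred.

    Names are interpolated without escaping, so only use trusted values.
    """
    return [
        # Stop maintaining non-unique indexes while loading (MyISAM only).
        f"ALTER TABLE `{table}` DISABLE KEYS",
        # Assumes a comma-separated file; adjust the FIELDS clause as needed.
        f"LOAD DATA LOCAL INFILE '{csv_path}' INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ','",
        # Rebuild the indexes in one pass; the slow step on large tables.
        f"ALTER TABLE `{table}` ENABLE KEYS",
    ]


def table_size_query(schema):
    """Build a query listing size per table and engine, in MB, as reported by the server."""
    return (
        "SELECT table_name, engine, "
        "ROUND((data_length + index_length) / 1024 / 1024) AS size_mb "
        "FROM information_schema.tables "
        f"WHERE table_schema = '{schema}' "
        "ORDER BY size_mb DESC"
    )


if __name__ == "__main__":
    # Hypothetical table, file, and schema names, for illustration only.
    for stmt in myisam_bulk_load_statements("weblog", "/tmp/weblog.csv"):
        print(stmt + ";")
    print(table_size_query("research") + ";")
```

Keep in mind that DISABLE KEYS only affects non-unique indexes, and that the sizes in `information_schema.tables` are what the server reports, which for InnoDB is an approximation of the space allocated.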
Anyhow, after optimizing the configuration for both engines and having to choose between:
I decided to explore options other than distributed file systems such as Hadoop, ones that offer easy migration from MySQL, and found:
ARCHIVE: a compressed engine for MySQL that doesn’t support keys (except for the numeric primary key). I was already familiar with it from backups, and it is integrated into MySQL. It supports basic partitioning.
InfoBright ICE: a compressed, column-oriented table storage engine, forked from MySQL, open source, with fast loading. On the downside, it requires a separate installation, and advanced features are only available in the commercial version.
InfoBright IEE: the commercial version of the storage engine. It promises multi-core improvements for querying and loading over the open-source version. I decided to give it a try for comparison.