Real performance test Exasol vs SAP HANA vs Greenplum vs ClickHouse vs Impala vs MemSQL
When choosing an analytical database, you should of course test it on your own data and on the queries your users will actually run. But to draw up a short list of DBMSs for testing, everyone wants to see real results from other people's tests. The team from the Data Warehouse Department of Tinkoff Bank shared their results and observations from testing in-memory (and other) databases on Habr. Here is a brief summary of their performance tests of Exasol, Greenplum, ClickHouse, SAP HANA, MemSQL and Cloudera Impala.
DBMS requirements and use case
Query execution on the existing Greenplum-based data warehouse was not fast enough, so the team tested several DBMSs to choose a database for a front-end data store holding a selected subset of the data (up to 4 TB, possibly growing). The target database had to provide the following functionality:
- column-oriented data storage,
- horizontal scalability,
- the ability to perform local joins by using the "correct" distribution key in tables (in 90% of cases, ad-hoc user queries are SELECTs with 1 to 10 joins on equality conditions and, occasionally, on conditions that a date falls within an interval),
- efficient use of the cache and of a large amount of available memory,
- good integration with the SAP BusinessObjects BI system (which will query the database in addition to ad-hoc user queries),
- reliable, preferably incremental, data import from Greenplum (Tinkoff's current main data warehouse, from which data will be loaded into the new database),
- window functions,
- redundancy (the ability to store multiple copies of data on different nodes),
- ease of further cluster expansion,
- parallel data loading.
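The local-join requirement depends on how rows are spread across nodes. The sketch below is a minimal illustration, assuming simple hash distribution (it is not the actual mechanism of any of the tested products): if two tables are distributed by the same key, with the same hash function and the same node count, rows that join on that key always land on the same node, so each node can join its own partitions without shuffling data over the network. The table and column names (`accounts`, `txns`, `client_id`) and the helper functions are hypothetical.

```python
from collections import defaultdict
import zlib


def node_for(key, num_nodes):
    # CRC32 is deterministic across processes, unlike Python's salted str hash().
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_col, num_nodes):
    # Hypothetical hash distribution: each row goes to the node chosen by its key.
    parts = defaultdict(list)
    for row in rows:
        parts[node_for(row[key_col], num_nodes)].append(row)
    return parts


def local_join(left_parts, right_parts, key_col):
    # Equality join executed node by node; no row is compared with rows
    # stored on another node, which is what makes the join "local".
    result = []
    for node, left_rows in left_parts.items():
        index = defaultdict(list)
        for r in right_parts.get(node, []):
            index[r[key_col]].append(r)
        for l in left_rows:
            for r in index.get(l[key_col], []):
                result.append({**l, **r})
    return result


# Hypothetical tables, both distributed by client_id across 2 nodes.
accounts = [{"client_id": i, "segment": i % 3} for i in range(100)]
txns = [{"client_id": i % 100, "amount": i} for i in range(500)]
joined = local_join(distribute(accounts, "client_id", 2),
                    distribute(txns, "client_id", 2), "client_id")
print(len(joined))  # 500: every transaction finds its account on the same node
```

If one of the tables were distributed by a different column, matching rows could sit on different nodes and the join would first need a redistribution step; that is why the choice of distribution key matters for queries with many equality joins.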
Infrastructure for testing
Two physical servers were allocated for each database under test:
- 16 physical cores (32 with HT)
- 128 GB of RAM
- 3.9 TB of disk space (RAID 5 of 8 disks)
- The servers are connected by a 10 Gbit network.
- The operating system for each database was chosen according to that database's installation recommendations. The same applies to OS settings, the kernel, etc.
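RAID 5 spreads one disk's worth of parity across the array, so usable capacity is (n - 1) times the size of one disk. A small sketch of that arithmetic, assuming the 3.9 TB figure is the usable capacity of one server's 8-disk array; the source does not state the individual disk size, so the per-disk value computed here is only an inference, and the function names are hypothetical.

```python
def raid5_usable_gb(num_disks, disk_size_gb):
    # One disk's worth of capacity is consumed by distributed parity.
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (num_disks - 1) * disk_size_gb


def implied_disk_size_gb(num_disks, usable_gb):
    # Inverse of raid5_usable_gb: per-disk size implied by a usable total.
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return usable_gb / (num_disks - 1)


# 3.9 TB usable over 8 disks implies roughly 557 GB per disk.
print(round(implied_disk_size_gb(8, 3900)))  # 557
```

Disk vendors usually quote capacity in decimal units while operating systems often report binary units, so the real per-disk size may differ somewhat from this estimate.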
Performance test results
The test data set is 200 GB uncompressed and holds just over 522 million rows. The most resource-intensive queries are D1 and D2, which compute the Cartesian product over one column and over several columns, respectively. A detailed description of the data and test queries can be downloaded here.
The table below shows the results of the comparative testing of Exasol, Greenplum, ClickHouse, MemSQL, SAP HANA and Cloudera Impala. All times are query execution times in seconds.
Dmitry Pavlov, Head of Data Warehouse Administration at Tinkoff Bank, comments: "On the number of test query runs: in every database we recorded the time of the first execution of each query. We also found experimentally that the first and second executions differ significantly only in Exasol: the second run is much faster thanks to the indexes built during the first run (even so, Exasol's first-run time is always much lower than the second-run time of any other database). Example:
Of course, it would be more correct to execute each query 5-10 times, discard the best and the worst times and average the rest, but testing would then take longer. Besides, since we expect to run ad-hoc user queries on this database, the first-execution time is the more relevant one for us."
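The two measurement strategies mentioned in the quote can be sketched as follows. This is a minimal illustration, not the team's actual test harness: `run_query` is a hypothetical callable that executes one query and fetches its result (for example through a database driver), and the trimmed mean drops exactly one best and one worst run, as described above.

```python
import statistics
import time


def trimmed_mean(samples):
    # Discard the single best and single worst time, average the rest.
    if len(samples) < 3:
        raise ValueError("need at least 3 runs to drop the best and the worst")
    ordered = sorted(samples)
    return statistics.mean(ordered[1:-1])


def benchmark(run_query, repeats=5):
    # Run the query `repeats` times. The first (cold) run is reported on its own,
    # because for ad-hoc queries it is the time users actually experience.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {"first_run_s": timings[0], "trimmed_mean_s": trimmed_mean(timings)}


print(trimmed_mean([12.0, 3.0, 4.0, 5.0, 1.0]))  # 4.0
```

For a database whose first run builds indexes, as the quote reports for Exasol, `first_run_s` and `trimmed_mean_s` can diverge sharply; reporting both makes that effect visible instead of hiding it in an average.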
Comparison of DBMSs by key criteria for Tinkoff
I am also publishing a table comparing the databases against the criteria chosen by the bank's team. The data is from 2016 and the product versions are specified; most of the criteria still apply.
If you need an Exasol test license for more than 200 GB of raw data (the limit of Exasol Community Edition), please contact us at ATK Consulting Group, the official Exasol partner in Russia. We will issue a test license and help you with a pilot project.
Contact: consult@atkcg.ru or +7 (495) 937 16 50.
Source: Comparison of analytical in-memory databases