Performance Tuning

Memory Distribution

When OntoQuad starts, the available RAM is redistributed in a specific way. Namely, the main part of the memory is divided among:
  • the static code of the program;
  • the database page cache; and
  • a buffer for data sorting operations.

Volume of memory for the program static code

The amount of memory needed for the static code of the program is constant and takes about 50 MB.

Volume of memory for the Uncompressed Database Pages Cache

For the "Uncompressed Database Pages Cache" http://support.ontoquad.ru/redmine/projects/ontoquad/wiki/Conceptual_Overview#Database-pages-Cache an amount of memory equal to a value of the cachesize parameter of the "configuration file" http://support.ontoquad.ru/redmine/projects/ontoquad/wiki/Configuration_Parameters is allocated. The cachesize parameter value is specified in bytes.
Example:

cachesize = 11811160064
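
The example above allocates an 11 GB cache (11 × 1024³ bytes). If needed, the byte value for any size in gigabytes can be checked with plain shell arithmetic (a generic POSIX construct, not an OntoQuad tool):

# echo $((11 * 1024 * 1024 * 1024))
11811160064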

Volume of memory for the Compressed Database Pages Cache

For the "Compressed Database Pages Cache" http://support.ontoquad.ru/redmine/projects/ontoquad/wiki/Conceptual_Overview#Database-pages-Cache an amount of memory equal to a value of the compressed-page-cachesize parameter of the "configuration file" http://support.ontoquad.ru/redmine/projects/ontoquad/wiki/Configuration_Parameters is allocated. The compressed-page-cachesize parameter value is specified in bytes.
Example:

compressed-page-cachesize = 13958643712
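
This example allocates a 13 GB compressed cache (13 × 1024³ = 13958643712 bytes).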

The database size in compressed mode is naturally smaller. The actual compression ratio depends on the nature of the loaded data. For example, for a 100-million-triple dataset of the "Berlin SPARQL Benchmark (BSBM)" http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/, with the index-type parameter set to polymorphic2 (see the "configuration file" http://support.ontoquad.ru/redmine/projects/ontoquad/wiki/Configuration_Parameters), the ratio is about 2.
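
For example, at a compression ratio of about 2, a dataset occupying roughly 22 GB of uncompressed database pages fits into about 11 GB of compressed page cache; these are, in fact, the BSBM 100M figures used in the tests below.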

Volume of memory for data sorting

To sort datasets, the DBMS reserves a certain amount of RAM. The size of this sorting buffer is determined by the max-inmemory-rows configuration parameter.
Example:

max-inmemory-rows = 1000000
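
Note that the value is a number of rows, not bytes: with this setting, sorts of up to one million rows are performed in the reserved RAM buffer. Presumably, the amount of memory this translates to also depends on the width of the sorted rows.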

Memory Configuration Recommendations

Properly chosen memory configuration parameters improve OntoQuad performance. Let's consider different settings of these parameters depending on the database size, the available RAM and the nature of the queries.

Database size evaluation

The OntoQuad database size can be estimated as the size of the vm folder that contains all database files. For a Linux installation, this folder's path is normally /opt/eventos/ontoquad056/vm.
Example:

# du -h /opt/eventos/ontoquad056/vm
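
The -h flag prints human-readable per-directory sizes; to get a single total for the whole folder, du's standard -s (summarize) flag can be added:

# du -sh /opt/eventos/ontoquad056/vm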

Performance test configurations

OntoQuad performance was tested on the Berlin SPARQL Benchmark Explore use case. The benchmark query mix of the BSBM Explore use case illustrates a search-and-navigation pattern. Performance was evaluated with the Query Mixes per Hour (QMpH) metric for datasets of 100 million (100M) and 200 million (200M) triples. The measurements were taken for 1, 4, 8 and 64 parallel clients.

The on-disk size of the OntoQuad compressed database for the BSBM 100M dataset is 11 GB, and the test server RAM was limited to 22 GB. The test scenario was to measure performance for different values of the cachesize and compressed-page-cachesize parameters, under the constraint that the two parameters total 22 GB. Starting from 2 GB for cachesize and 20 GB for compressed-page-cachesize, we obtain the 10 configurations shown in Tab. 1 below.

Tab. 1. Tested cache configurations (ratio = cachesize / compressed-page-cachesize, in GB)

configuration   cachesize           compressed-page-cachesize   ratio
                GB   bytes          GB   bytes
0                2   2147483648     20   21474836480            2/20
1                4   4294967296     18   19327352832            4/18
2                6   6442450944     16   17179869184            6/16
3                8   8589934592     14   15032385536            8/14
4               10   10737418240    12   12884901888            10/12
5               12   12884901888    10   10737418240            12/10
6               14   15032385536     8   8589934592             14/8
7               16   17179869184     6   6442450944             16/6
8               18   19327352832     4   4294967296             18/4
9               20   21474836480     2   2147483648             20/2
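
The byte values in Tab. 1 are exact gigabyte multiples; as a quick sanity check, they can be regenerated with a shell loop (plain bash, not an OntoQuad utility):

# for gb in 2 4 6 8 10 12 14 16 18 20; do echo "$gb GB = $((gb * 1024 * 1024 * 1024)) bytes"; done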

Performance evaluation results

The tests were run on 100 and 200 million triples from the BSBM test datasets.

BSBM dataset test: 100 million triples

As previously mentioned, the compressed database size for the 100M BSBM dataset is 11 GB. Let's assume that the uncompressed database size for this dataset is roughly 22 GB. Note that under these conditions the database fits differently into the compressed and uncompressed page caches depending on the chosen configuration.

The performance evaluations for different cachesize to compressed-page-cachesize ratios and numbers of clients working with the database simultaneously are shown in Tab. 2 and Fig. 1.

Tab. 2. 100M dataset performance evaluation results, QMpH (columns: cachesize/compressed-page-cachesize ratio in GB; rows: number of parallel clients)

clients        2/20          4/18          6/16          8/14         10/12         12/10          14/8          16/6          18/4          20/2
1         50,839.33     49,701.73     51,288.05     49,857.27     47,970.02     46,584.63     40,691.61     39,574.96     24,995.40     39,334.18
4        179,195.32    174,615.03    176,748.60    174,418.97    175,675.37    161,654.62    141,692.94    130,723.53     89,033.59    140,450.97
8        241,808.65    231,114.89    235,010.50    232,496.46    231,122.27    219,086.06    196,406.15    186,176.11    133,645.73    181,816.64
64       284,091.82    283,687.27    260,868.76    274,421.35    276,020.52    233,498.29    230,055.83    171,943.15    146,305.72    161,767.50

Fig. 1. Performance evaluation results for different configurations

BSBM dataset test: 200 million triples

The compressed database size for the 200M BSBM dataset is 22 GB, and the uncompressed database size would be around 44 GB. As already stated, the database fits differently into the compressed and uncompressed page caches depending on the chosen configuration.

The performance evaluations for different cachesize to compressed-page-cachesize ratios and numbers of clients working with the database simultaneously are shown in Tab. 3 and Fig. 2.

Tab. 3. 200M dataset performance evaluation results, QMpH (columns: cachesize/compressed-page-cachesize ratio in GB; rows: number of parallel clients)

clients        2/20          4/18          6/16          8/14         10/12         12/10          14/8          16/6          18/4          20/2
1         30,569.69     30,171.37     29,491.28     26,154.15     23,710.05     21,209.30     19,346.05     18,755.89     12,757.27     19,375.60
4        113,157.64    104,080.96     99,980.48     88,460.32     80,560.41     64,705.61     57,764.23     55,675.83     42,204.92     56,750.26
8        147,921.63    145,503.63    139,522.03    124,153.31    112,866.64     90,572.59     77,504.98     76,235.40     63,573.78     75,543.92
64       176,734.26    168,953.35    156,245.01    148,104.03    134,320.88    114,336.79    105,198.49    100,356.97     95,881.19    111,619.40

Fig. 2. 200M dataset performance evaluation results

Analysis of experimental data

As the experiment has shown (Tab. 2, Fig. 1), when compression is enabled (the compressed-page-cachesize parameter is not set to -1) and the compressed database size is less than or equal to the available RAM, the best strategy is to set the compressed-page-cachesize parameter to the maximum possible value.

If the database size exceeds the available RAM, performance grows with the value of the compressed-page-cachesize parameter (Tab. 3, Fig. 2). The optimal performance was reached at the 2/20 ratio of the uncompressed to compressed database page caches.
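
As a concrete sketch, on the tested server with 22 GB of RAM the best overall results in both experiments correspond to configuration 0 from Tab. 1:

cachesize = 2147483648
compressed-page-cachesize = 21474836480

i.e., a 2 GB uncompressed page cache and a 20 GB compressed page cache.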
