Conceptual Overview
OntoQuad is an easy-to-install, compact, powerful RDF Store intended for dynamic aggregation of heterogeneous data from different domains in the global space of the Web of Data. OntoQuad provides a vector representation of quadruples using the SPOG pattern, where "S" stands for Subject, "P" for Predicate, "O" for Object, and "G" for Graph.
Logical Data Model
The OntoQuad data architecture employs a variant of the column-oriented vertical-partitioning (COVP) method, from which it takes the fundamentals of vertical partitioning and multiple indexing. In this approach, an RDF triple of the SPO type is indexed in a multiple-index structure that associates every RDF element with two value vectors, one for each of the two remaining elements of the triple.
We designed a database structure that supports certain permutations of elements not only for SPO triples but also for quads in SPOG relations, extending the vector representation of triples to quadruples. The model consists of four levels corresponding to the positions of elements in a quad; at each level, a vector of element values is generated.
The root level (level 1) contains a vector of all values of the four elements S, P, O, and G. Level 2 contains groups of vectors, each group having one of the level 1 element values as its parent: for every element value admissible at the first level, there are three vectors holding the values of the three remaining elements of a quad (e.g., if an instance of element P is chosen at level 1, then value vectors of the S, O, and G elements correspond to it at level 2). At level 3, two value vectors, one for each of the two remaining elements, correspond to an element value chosen at level 2 (e.g., if an instance of element P is chosen at level 1 and an instance of element S at level 2, then value vectors of the O and G elements correspond to it at level 3). By the same pattern, level 4 holds a single vector of the one remaining element (in the example, the G values for the chosen P, S, and O).
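The level structure above can be illustrated with a small Python sketch. It is not OntoQuad's storage code: the class name `QuadVectorModel`, its `vector()` method, and the use of in-memory dictionaries and sorted lists are assumptions made for illustration. Each node is identified by the path of (position, value) choices made on the levels above it and holds the sorted vector of values of one remaining position.

```python
from itertools import permutations

POSITIONS = "SPOG"  # Subject, Predicate, Object, Graph


class QuadVectorModel:
    """Hypothetical in-memory sketch of the four-level SPOG vector model.

    A node is identified by the (position, value) pairs chosen on the levels
    above it; it stores the sorted vector of values of one remaining position.
    """

    def __init__(self, quads):
        nodes = {}
        for quad in quads:
            # Every ordering of the four positions yields one path of 4 levels.
            for order in permutations(range(4)):
                path = ()
                for i in order:
                    nodes.setdefault((path, POSITIONS[i]), set()).add(quad[i])
                    path += ((POSITIONS[i], quad[i]),)
        # Freeze each node into a sorted vector so it can be binary-searched.
        self._vectors = {key: sorted(values) for key, values in nodes.items()}

    def vector(self, path, position):
        """Sorted values of `position` under the chosen (position, value) path."""
        return self._vectors.get((tuple(path), position), [])


quads = [
    ("alice", "knows", "bob", "g1"),
    ("alice", "knows", "carol", "g1"),
    ("bob", "knows", "carol", "g2"),
]
model = QuadVectorModel(quads)

print(model.vector([], "P"))                                              # level 1: ['knows']
print(model.vector([("P", "knows")], "S"))                                # level 2: ['alice', 'bob']
print(model.vector([("P", "knows"), ("S", "alice")], "O"))                # level 3: ['bob', 'carol']
print(model.vector([("P", "knows"), ("S", "alice"), ("O", "bob")], "G"))  # level 4: ['g1']
```

In a real store the values would be vocabulary IDs rather than strings, and the vectors would live in index files rather than Python lists.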
Figure 1 depicts the logical structure of OntoQuad's quadruple representation vector model.
Fig. 1. Logical storage structure of OntoQuad's quadruple representation vector model
The main system components and their interaction during query execution are shown in Fig. 2.
Fig. 2. OntoQuad Components
HTTP Server Application
The HTTP Server provides a SPARQL endpoint that implements the SPARQL 1.1 Protocol: it handles HTTP requests and sends back HTTP responses for SPARQL Protocol operations. It is implemented as a built-in module that also hosts web applications, in particular the database management tool.
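For a client, the SPARQL 1.1 Protocol comes down to ordinary HTTP requests. The Python sketch below only builds (and does not send) two standard forms of a query request using the standard library: query via GET, where the query is URL-encoded into the `query` parameter, and query via POST directly, where the request body has the media type `application/sparql-query`. The endpoint address is hypothetical, and which result formats OntoQuad returns is not stated here; `application/sparql-results+json` is simply one format defined by the W3C.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address; the real host, port, and path depend on the installation.
ENDPOINT = "http://localhost:8080/sparql"
ACCEPT_JSON = {"Accept": "application/sparql-results+json"}


def query_via_get(query, endpoint=ENDPOINT):
    """SPARQL 1.1 Protocol 'query via GET': the query travels as a URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers=ACCEPT_JSON, method="GET")


def query_via_post(query, endpoint=ENDPOINT):
    """SPARQL 1.1 Protocol 'query via POST directly': the query is the request body."""
    headers = {**ACCEPT_JSON, "Content-Type": "application/sparql-query"}
    return Request(endpoint, data=query.encode("utf-8"), headers=headers, method="POST")


request = query_via_get("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request) against a running endpoint.
```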
SPARQL Engine
The SPARQL Engine interprets SPARQL queries against the RDF data in the OntoQuad database. It consists of the SPARQL Parser, Iterators, Optimizer, and Functions components.
The SPARQL Parser performs the syntactic analysis of queries and generates an initial Query Execution Plan (QEP) tree. The tree nodes contain operators that implement the operations of the SPARQL algebra (see SPARQL 1.1 Query Language). The main arguments of leaf operators are quad patterns and the index files of the database. Higher-level operators in the plan tree may take one (e.g., ORDER BY or DISTINCT), two (e.g., join or Cartesian product), or more (e.g., a multiple join) lower-level operators as arguments.
In OntoQuad, operators are implemented as iterators. An iterator is an object whose interface includes the following methods:
- empty() checks whether the dataset is empty;
- next() moves to the next record of the dataset;
- lowerBound() searches ordered data with logarithmic complexity;
- setRange() searches for value ranges with logarithmic complexity.
The QEP tree is processed as follows. To obtain the result of a SPARQL operator, the next() method of the root iterator is called. The root iterator in turn calls the next() method of the iterator below it in the execution plan tree, and so on down the tree until the leaf iterators, which work with the database index files, are reached.
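The pull-based evaluation described above can be sketched in Python. The method names `empty()`, `next()`, `lowerBound()`, and `setRange()` follow the text, but their exact semantics here (for example, that `next()` returns the record it moves past, or `None` at the end) are assumptions, as are the helper `peek()` and the class names `VectorIterator` and `JoinIterator`. A leaf iterator stands in for an index file with a sorted vector, and a join iterator shows how a parent drives its children and uses `lowerBound()` to skip ahead instead of scanning.

```python
from bisect import bisect_left, bisect_right


class VectorIterator:
    """Leaf iterator over a sorted vector (a stand-in for an index file)."""

    def __init__(self, values):
        self._values, self._pos, self._end = values, 0, len(values)

    def empty(self):
        return self._pos >= self._end

    def peek(self):
        # Hypothetical helper: the current record, without advancing.
        return None if self.empty() else self._values[self._pos]

    def next(self):
        value = self.peek()
        if value is not None:
            self._pos += 1
        return value

    def lowerBound(self, value):
        # O(log n): skip to the first record >= value.
        self._pos = bisect_left(self._values, value, self._pos, self._end)

    def setRange(self, low, high):
        # O(log n): restrict the iterator to records in [low, high].
        self._pos = bisect_left(self._values, low, self._pos, self._end)
        self._end = bisect_right(self._values, high, self._pos, self._end)


class JoinIterator:
    """Intersects two sorted child iterators on a shared variable."""

    def __init__(self, left, right):
        self._left, self._right = left, right
        self._buffer = self._advance()  # one record of look-ahead

    def _advance(self):
        left, right = self._left, self._right
        while not left.empty() and not right.empty():
            a, b = left.peek(), right.peek()
            if a == b:
                left.next()
                right.next()
                return a
            # Let the child that is behind jump forward logarithmically.
            if a < b:
                left.lowerBound(b)
            else:
                right.lowerBound(a)
        return None

    def empty(self):
        return self._buffer is None

    def next(self):
        value = self._buffer
        if value is not None:
            self._buffer = self._advance()
        return value


def drain(iterator):
    """Collect results by repeatedly calling next() on the root iterator."""
    results = []
    while not iterator.empty():
        results.append(iterator.next())
    return results


root = JoinIterator(VectorIterator([1, 3, 5, 7, 9]), VectorIterator([2, 3, 4, 7, 10]))
print(drain(root))  # [3, 7]

ranged = VectorIterator([1, 3, 5, 7, 9])
ranged.setRange(3, 7)
print(drain(ranged))  # [3, 5, 7]
```

With one record of look-ahead, the join can answer `empty()` without losing output; a real engine would join full variable bindings rather than bare IDs.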
The Optimizer transforms the initial QEP into an equivalent plan that performs better in terms of execution time and resource consumption. The Optimizer can choose among alternative algorithms for implementing relational algebra operators. When constructing a plan, it uses knowledge of the computational complexity and resource intensity of the iterators, as well as the Vocabulary and the indices.
The Iterators and Optimizer interact with the Index Storage (indices), Vocabulary and Transaction Log.
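As a rough illustration of choosing an algorithm for one operator, the Python sketch below compares estimated costs of a few textbook join algorithms and picks the cheapest. The candidate algorithms, the `Input` summary, and the cost formulas are all assumptions; the document does not describe OntoQuad's actual cost model. The point is the shape of the decision: size estimates (which the indices and Vocabulary can help supply) and knowledge of which inputs already arrive sorted from an index determine which iterator is instantiated.

```python
import math
from dataclasses import dataclass


@dataclass
class Input:
    """Hypothetical optimizer-side summary of one join input."""
    cardinality: int          # estimated number of records
    sorted_on_join_key: bool  # True if an index delivers it in join-key order


def join_costs(left: Input, right: Input) -> dict:
    """Textbook-style cost estimates (assumed, not OntoQuad's cost model)."""
    n, m = left.cardinality, right.cardinality
    costs = {
        # For each left record, one logarithmic lowerBound() probe into the right input.
        "index_nested_loop": n * max(1.0, math.log2(m + 1)),
        # Build a hash table on the smaller input, then stream the larger one.
        "hash_join": 2 * min(n, m) + max(n, m),
    }
    if left.sorted_on_join_key and right.sorted_on_join_key:
        # Both inputs already ordered by the key: one linear merge pass.
        costs["merge_join"] = n + m
    return costs


def choose_join(left: Input, right: Input) -> str:
    costs = join_costs(left, right)
    return min(costs, key=costs.get)


print(choose_join(Input(1000, True), Input(5000, True)))      # merge_join
print(choose_join(Input(10, False), Input(1_000_000, True)))  # index_nested_loop
print(choose_join(Input(1000, False), Input(5000, False)))    # hash_join
```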
Functions are either built-in functions of the SPARQL language (e.g., coalesce, if, sameTerm, and others; see SPARQL 1.1 Query Language) or custom functions. The functions are called by the Iterators.
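A minimal Python sketch of how an expression evaluator might dispatch to built-in and custom functions. `coalesce` and `if` are modeled on their SPARQL 1.1 definitions: COALESCE returns the value of the first argument that evaluates without an error, and IF evaluates only the selected branch. Arguments are passed as zero-argument callables so that evaluation can be deferred. The registry, the `ExprError` class, the custom function IRI, and the simplifications (plain Python equality standing in for RDF term identity, Python truthiness standing in for effective boolean value) are assumptions.

```python
class ExprError(Exception):
    """Stands in for a SPARQL expression error (e.g., an unbound variable)."""


def fn_coalesce(*args):
    # Value of the first argument that evaluates without an error.
    for arg in args:
        try:
            return arg()
        except ExprError:
            continue
    raise ExprError("coalesce: every argument raised an error")


def fn_if(condition, then_branch, else_branch):
    # An error in the condition propagates; only the chosen branch is evaluated.
    # Simplification: Python truthiness instead of SPARQL effective boolean value.
    return then_branch() if condition() else else_branch()


def fn_same_term(a, b):
    # Simplification: RDF terms are modeled as plain, comparable Python values.
    return a() == b()


# Hypothetical registry: built-ins by name, custom functions by IRI.
FUNCTIONS = {"coalesce": fn_coalesce, "if": fn_if, "sameTerm": fn_same_term}
FUNCTIONS["http://example.org/fn#upper"] = lambda s: s().upper()


def call(name, *args):
    """What an iterator evaluating a FILTER or BIND expression might do."""
    return FUNCTIONS[name](*args)


def unbound():
    raise ExprError("?x is unbound")


print(call("coalesce", unbound, lambda: "default"))        # default
print(call("if", lambda: True, lambda: "yes", unbound))    # yes
print(call("http://example.org/fn#upper", lambda: "rdf"))  # RDF
```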
Vocabulary Application
The Vocabulary Application is a comprehensive lexicon of the URIs and literals "known" to the database. It associates the values of S, P, O, and G with vocabulary IDs that are unique within a DB instance. Introducing the Vocabulary provides:
- faster write and read operations on the values stored in the Vocabulary;
- less space occupied by the DB index files on disk and in memory, because each long value is stored in the Vocabulary only once, while the indices refer to it by a short numeric ID.
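The Vocabulary is essentially a dictionary encoding. The Python sketch below assigns each distinct term a small integer ID on first sight and translates in both directions; quads are then stored as tuples of IDs. Sequential integer IDs, the N-Triples-style string form of the terms, and the class and method names are assumptions, not OntoQuad's actual format.

```python
class Vocabulary:
    """Hypothetical term dictionary: one stored copy per distinct URI or literal."""

    def __init__(self):
        self._id_of = {}    # term -> ID
        self._term_of = []  # ID -> term (the ID is the list index)

    def encode(self, term):
        """Return the term's ID, assigning the next free ID to a new term."""
        term_id = self._id_of.get(term)
        if term_id is None:
            term_id = len(self._term_of)
            self._id_of[term] = term_id
            self._term_of.append(term)
        return term_id

    def lookup(self, term):
        """ID of a known term, or None if the term is not in the Vocabulary."""
        return self._id_of.get(term)

    def decode(self, term_id):
        return self._term_of[term_id]


vocab = Vocabulary()
alice = "<http://example.org/alice>"
knows = "<http://xmlns.com/foaf/0.1/knows>"
graph = "<http://example.org/g1>"
q1 = tuple(vocab.encode(t) for t in (alice, knows, "<http://example.org/bob>", graph))
q2 = tuple(vocab.encode(t) for t in (alice, knows, "<http://example.org/carol>", graph))
print(q1, q2)  # (0, 1, 2, 3) (0, 1, 4, 3)
```

The long URIs are stored once, while each quad costs only four small integers. In a design like this, a `lookup()` that returns `None` also tells the engine that a query constant occurs nowhere in the data.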
Index Application
The Index Application provides an abstraction layer over the Database component and implements methods for working with different index file configurations (different PSOG combinations).
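One job of a layer that works with several index orderings is to pick, for a given quad pattern, an index whose key prefix consists of the pattern's bound positions, so that the matching quads form one contiguous key range. The Python sketch below shows such a selection. The list of available orderings and the function names are hypothetical; the text does not say which PSOG combinations OntoQuad keeps.

```python
# Hypothetical set of available orderings; which PSOG combinations OntoQuad keeps is not specified.
AVAILABLE_ORDERS = ["SPOG", "POSG", "OSPG", "GSPO", "GPOS", "OGSP"]


def bound_prefix_length(order, bound):
    """Number of leading positions of `order` whose values are bound."""
    length = 0
    for position in order:
        if position not in bound:
            break
        length += 1
    return length


def choose_index(pattern, orders=AVAILABLE_ORDERS):
    """Pick the ordering whose key prefix covers the most bound positions.

    `pattern` maps each of "S", "P", "O", "G" to a constant, or None for a variable.
    Returns the ordering and the key prefix that delimits a contiguous key range.
    """
    bound = {pos for pos, value in pattern.items() if value is not None}
    best = max(orders, key=lambda order: bound_prefix_length(order, bound))
    prefix = tuple(pattern[pos] for pos in best[: bound_prefix_length(best, bound)])
    return best, prefix


print(choose_index({"S": None, "P": "knows", "O": "bob", "G": None}))  # ('POSG', ('knows', 'bob'))
print(choose_index({"S": "alice", "P": None, "O": None, "G": "g1"}))   # ('GSPO', ('g1', 'alice'))
```

Bound positions that no available prefix covers would have to be checked as filters on the scanned range.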
Database Application
The Database Application implements the database index structure and manages transactions.
Transaction Log Application
The Transaction Log Application records all database modifications made by each transaction. It records the REDO, UNDO, COMMIT, and ROLLBACK commands and replays them whenever necessary.
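The role of UNDO and REDO information can be illustrated with a toy write-ahead log in Python. Each update is logged with its before-image (used to undo it) and after-image (used to redo it) before the in-memory "database" changes; rollback applies before-images in reverse order, and recovery replays after-images of committed transactions only. The record layout, the class names, the use of a plain dict as the database, and recovery starting from the state the log began with (an empty dict) are all assumptions, not details taken from OntoQuad.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    kind: str              # "UPDATE", "COMMIT", or "ROLLBACK"
    tx: int                # transaction ID
    key: object = None
    before: object = None  # UNDO image (None means "key was absent")
    after: object = None   # REDO image (None means "delete the key")


def _apply(db, key, value):
    if value is None:
        db.pop(key, None)
    else:
        db[key] = value


class TransactionLog:
    """Toy write-ahead log; None is reserved to mean 'absent', not a stored value."""

    def __init__(self):
        self.records = []

    def update(self, db, tx, key, value):
        # Log first (write-ahead), then modify the database.
        self.records.append(LogRecord("UPDATE", tx, key, db.get(key), value))
        _apply(db, key, value)

    def commit(self, tx):
        self.records.append(LogRecord("COMMIT", tx))

    def rollback(self, db, tx):
        # UNDO: restore before-images of this transaction, newest first.
        for record in reversed(self.records):
            if record.kind == "UPDATE" and record.tx == tx:
                _apply(db, record.key, record.before)
        self.records.append(LogRecord("ROLLBACK", tx))

    def recover(self, db):
        # REDO: replay after-images of committed transactions in log order.
        committed = {r.tx for r in self.records if r.kind == "COMMIT"}
        for record in self.records:
            if record.kind == "UPDATE" and record.tx in committed:
                _apply(db, record.key, record.after)
        return db


log, db = TransactionLog(), {}
log.update(db, 1, "a", 10)
log.commit(1)
log.update(db, 2, "a", 20)
log.update(db, 2, "b", 5)
log.rollback(db, 2)        # db is back to {"a": 10}
log.update(db, 3, "c", 7)  # transaction 3 never commits (e.g., a crash)
print(log.recover({}))     # {'a': 10}
```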
Basic Level Algorithms
Basic Level Algorithms implement low-level operations on indexes and database pages, such as binary search, sorting and merging, data encryption, and data compression.
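The text does not say which compression scheme OntoQuad uses. As an illustration of why sorted ID vectors compress well, the Python sketch below applies delta encoding followed by a variable-length (LEB128-style) byte encoding: consecutive IDs are close together, so most gaps fit in a single byte. The scheme and the function names are assumptions.

```python
def encode_sorted_ids(ids):
    """Delta + varint encoding of an ascending vector of non-negative IDs."""
    out = bytearray()
    previous = 0
    for value in ids:
        delta = value - previous
        if delta < 0:
            raise ValueError("IDs must be sorted in ascending order")
        previous = value
        # Emit 7 bits per byte; the high bit marks "more bytes follow".
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_sorted_ids(data):
    """Inverse of encode_sorted_ids: rebuild IDs by summing the decoded gaps."""
    ids, gap, shift, current = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids


encoded = encode_sorted_ids([3, 7, 300, 301])
print(encoded.hex())               # 0304a50201 (5 bytes for 4 IDs)
print(decode_sorted_ids(encoded))  # [3, 7, 300, 301]
```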
Database Pages Cache
The Database Pages Cache is an intermediate layer between the Database File Storage and the Database Application. It holds the most recently used database pages from the Database File Storage in two caches, the Compressed Database Page Cache and the Uncompressed Database Page Cache. Both compressed and uncompressed pages come in two kinds, one for the Vocabulary and one for the indexes.
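A minimal Python sketch of a two-tier page cache in the spirit of this description: a small LRU cache of uncompressed pages in front of a larger LRU cache of compressed pages, with the file storage behind both. LRU eviction, `zlib` as the codec, the assumption that pages are stored compressed in the files, the capacities, and all names are illustrative assumptions; one cache instance is kept per page kind (Vocabulary and index).

```python
import zlib
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used page


class PageCache:
    """Uncompressed LRU tier in front of a compressed LRU tier (hypothetical design)."""

    def __init__(self, read_page, uncompressed_capacity, compressed_capacity):
        self._read_page = read_page  # page_no -> compressed bytes from file storage
        self.uncompressed = LRUCache(uncompressed_capacity)
        self.compressed = LRUCache(compressed_capacity)
        self.storage_reads = 0

    def get(self, page_no):
        page = self.uncompressed.get(page_no)
        if page is not None:
            return page
        packed = self.compressed.get(page_no)
        if packed is None:
            packed = self._read_page(page_no)  # miss in both tiers: go to storage
            self.storage_reads += 1
            self.compressed.put(page_no, packed)
        page = zlib.decompress(packed)
        self.uncompressed.put(page_no, page)
        return page


# Toy "file storage" of compressed pages; both kinds read from it here for brevity.
storage = {n: zlib.compress(f"index page {n}".encode()) for n in range(10)}
caches = {kind: PageCache(storage.__getitem__, 2, 4) for kind in ("vocabulary", "index")}

index_cache = caches["index"]
for page_no in (1, 2, 1, 3):
    index_cache.get(page_no)
# Page 2 was evicted from the uncompressed tier but is still in the compressed tier.
print(index_cache.get(2), index_cache.storage_reads)  # b'index page 2' 3
```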
Transaction Log Pages Cache
The Transaction Log Pages Cache stores the most recently used pages of the Transaction Log files.
Database File Storage
The Database File Storage consists of the files that store the indexes, the Vocabulary, and the Transaction Log. The indexes and the Vocabulary store their objects in B-trees.