Procedia Computer Science 109C (2017) 975–981
The 7th International Symposium on Frontiers in Ambient and Mobile Systems (FAMS 2017)
Integrity constraints in graph databases

Jaroslav Pokorný a,*, Michal Valenta b, Jiří Kovačič b
a Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, 118 00 Prague, Czech Republic
b Faculty of Information Technology, Czech Technical University, Thákurova 9, 160 00 Prague, Czech Republic
Abstract

One thing that is still being developed for graph databases is integrity constraint (IC) support. One possibility for an IC proposal is to consider a graph conceptual schema and a graph database schema. At least the inherent ICs coming from a graph conceptual schema should be considered as explicit ICs on the graph database level, i.e., expressed in a DDL. In the paper, we focus on the graph database Neo4j and its possibilities to express a database schema and ICs. We extend these possibilities through new constructs in the Neo4j DDL, including their prototype implementation and experiments.

1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the Conference Program Chairs.
10.1016/j.procs.2017.05.456

Keywords: graph databases; modelling graph databases; graph conceptual schema; graph database schema; integrity constraints; Neo4j
1. Introduction
Graph databases (GDB), also called graph-oriented databases, are used for managing highly connected data and complex queries over this data. Not only data values but also graph structures are involved in queries. By specifying a pattern and a set of starting points, it is possible to reach an excellent performance for local reads by traversing the graph and collecting and aggregating information from nodes and edges. On the other hand, there are some limitations of GDBs8. For example, they are usually not consistent, since they have very restricted tools to ensure consistency.
A GDB can contain one (big) graph or a collection of graphs. The former includes, e.g., graphs of social networks, the Semantic Web, and geographical databases; the latter is used in scientific domains such as bioinformatics and chemistry, or for datasets like DBLP. Graph search also occurs in other application scenarios, like recommender systems,
* Corresponding author. Tel.: +420-951-554-265; fax: +420-951-554-323.
E-mail address: [email protected]
complex object identification, software plagiarism detection, and traffic route planning. Following a similar concept in database technologies, we will talk about Graph Database Management Systems (GDBMS).
An important part of GDB technology is modelling graph databases. Most graph data models directly use a structure of directed graphs or property graphs. On the other hand, graph databases are normally schema-less, which means that there are no restrictions on the data stored in graph structures. Graph structures are also very general; they are restricted only by the basic graph definitions. However, customers working with graph databases would like to use some techniques available in relational databases that can restrict a database schema and support checking of the database data. This kind of restriction is called integrity constraints (ICs).
One approach to ICs in GDBs can be through appropriate schemas proposed on a conceptual and/or database level, as is usual in traditional databases. Both a graph conceptual schema and a graph database schema can provide an effective communication medium between users of any GDB. They can also significantly help GDB designers. But a general approach to GDBs does not require the notion of a database schema at all. Strict application of schemas is sometimes considered disadvantageous to application developers, particularly in dynamic domains where the data structures change very often1. Consequently, many GDBMSs are schema-less, e.g., Neo4j11. OrientDB (http://orientdb.com/) even distinguishes three possibilities: schema-full, schema-less, and schema-hybrid. Even when schema-less, some GDBMSs support specifying some types of ICs.
In the paper, we focus on Neo4j and its possibilities to express ICs. For this purpose Neo4j uses Cypher, a declarative (query) language over property graphs. Cypher commands have a partially SQL-like syntax and are targeted at ad hoc queries over graph data. They also enable creating graph nodes and relationships. Our goal is to extend Cypher with new functionality supporting more IC possibilities.
The rest of the paper is organized as follows. Section 2 introduces a graph database model based on (labelled) property graphs. The notions of a graph conceptual schema and a graph database schema are introduced here, including some ICs. In Section 3, we introduce an extended syntax for some proposals of ICs for the Cypher language. An implementation of some ICs and related experiments are described in Section 4. Section 5 gives conclusions.

2. Modelling of graph databases

A foundational step of any database technology is the precise definition of a database model. Here we will use a (labelled) property graph model whose basic constructs include: entities (nodes), properties (attributes), labels (types) of nodes, and relationships. Relationships (edges) have a direction, a start node and an end node, and identifiers. Entities and relationships can have any number of properties. Both nodes and edges are defined by a unique identifier (Id). Properties can be expressed in the form key:domain, i.e., only single-valued attributes are considered, although, e.g., Neo4j supports more complex property values. In graph-theoretic terms we also talk about labelled and directed attributed multigraphs, which means that edges of different types can exist between two nodes. These graphs are used both for a GDB and for its database schema (if any).
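To make these constructs concrete, the following minimal Cypher sketch creates a tiny property graph with labelled nodes, a typed relationship, and key:value properties. The labels and property keys (Teacher, Language, Teaches, Birth_year, Room) follow the running example of this paper; the concrete values are illustrative only.

  CREATE (t:Teacher {name: 'Ann Smith', Birth_year: 1980}),
         (l:Language {name: 'Czech'}),
         (t)-[:Teaches {Room: 'S7'}]->(l)

In Neo4j, both nodes and the relationship receive internal unique identifiers automatically, which corresponds to the inherent uniqueness of Ids mentioned above.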
2.1 Graph conceptual and graph database schemas

Current commercial GDBMSs need more improvements to meet the traditional definitions of conceptual and database schema known, e.g., from the relational database world. The graph database model is usually not presented explicitly, but is hidden in the constructs of a data definition language (DDL) available in the given GDBMS. These languages also enable specifying some simple ICs. Conceptual modelling of graph databases is not used at all. An exception is the GRAD database model4, which, although schema-less, uses conceptual constructs occurring in the E-R conceptual model and some powerful ICs.
Based on the Oracle Designer CASE tool2, we used a binary E-R model as a variant for graph conceptual modelling9, considering strong entity types, weak entity types, relationship types, attributes, identification keys, partial identification keys, ISA-hierarchies, and min-max ICs (cardinalities). Graphical min-max ICs (see Fig. 1) can be expressed equivalently by expressions (E1:(a, b), E2:(c, d)), where a, c ∈ {0, 1}, b, d ∈ {1, n}, and n means "any number greater than 1". As usual, only single-valued attributes of the form key:domain are considered here. The schemas in Fig. 1 and Fig. 2, respectively, are depicted without attributes.
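As a worked reading of this notation (with hypothetical cardinalities, since Fig. 1 fixes the actual ones for the running example): a min-max IC (Teacher:(0, n), Language:(1, n)) on the relationship type Teaches would say that each teacher participates in Teaches zero or more times, i.e., teaches any number of languages, while each language participates at least once, i.e., must be taught by at least one teacher.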
A correct graph conceptual schema may be mapped into an equivalent (or nearly equivalent) graph database schema with a straightforward mapping algorithm9, but with a weaker notion of a database schema, i.e., some inherent ICs, e.g., the min-max ICs from the conceptual level, have to be neglected to satisfy the usual notion of labelled and directed attributed multigraphs. Consequently, we can propose several different graph database schemas from one graph conceptual schema. For example, the edges Teaches and Is_born_in in Fig. 2 provide only partial information w.r.t. the associated schema in Fig. 1. The inverted arrow Is_taught_by could be used as well. Due to the loss of the inherent ICs from the conceptual level, we should put some explicit ICs into the graph database schema, e.g., that "A teacher can teach more languages" and "A teacher is born in exactly one town", e.g., in min-max style: (0,n):(1,m) for Teaches and (1,1):(0,n) for Is_born_in (a query-based check of the latter is sketched after Fig. 2). Some of these ICs are definable in the DDLs of today's (possibly schema-less) GDBMSs. Attribute values can be represented in key:value style in a GDB. In practice, their values can be missing in some GDB nodes or relationships.
Fig. 1. Graph conceptual schema (entity types Language, Teacher, Person, Town, Street; relationship types Teaches/Is_taught, Is_a, Is_born_in/Is_birthplace_of, Is_in, Has)

Fig. 2. Graph database schema (node labels Language, Teacher, Person, Town, Street; edge types Teaches, Is_a, Is_born_in, Has)
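As an illustration of how such an explicit IC could at least be checked in current Neo4j, the following plain Cypher query lists the Teacher nodes violating the (1,1) side of Is_born_in (each teacher is born in exactly one town), assuming, as in the examples of Section 3, that Is_born_in leads from a Teacher node to a Town node. It only demonstrates the intended meaning of the constraint and is not part of the proposed DDL.

  MATCH (t:Teacher)
  OPTIONAL MATCH (t)-[b:Is_born_in]->(:Town)
  WITH t, count(b) AS birthplaces
  WHERE birthplaces <> 1
  RETURN t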
2.2 Integrity constraints in GDBMSs

With a graph database schema, schema-instance consistency is required1. As in traditional databases, ICs provide a mechanism for capturing the semantics of the domain of interest represented by graphs. In the database area we usually distinguish three types of ICs. Inherent constraints are inherent to the data model itself; they do not need to be specified explicitly in the schema but are assumed to hold by the definition of the model constructs. There are at least two inherent ICs in our graph model: (1) node Ids in a GDB are unique; (2) edges of the GDB are composed of the labels and nodes of the database graph in which the edge occurs. An explicit constraint is any constraint that can be formulated in a DDL for a GDB. Sometimes also cardinalities of relationships are explicitly stated. Obviously, a goal is to develop a sufficiently expressive language for the formulation of explicit ICs. Such languages are not yet common in commercial GDBMSs. Implicit constraints are logical consequences of inherent and explicit ICs.
Other ICs concern property values of both nodes and edges. They include domain constraints for particular properties, and possibly logical restrictions on their mutual relationships. However, GDBs are well-suited for situations in which the data complexity is contained in the relationships between the entities rather than in the property values associated with single nodes and edges. Due to the graph structure of data in a graph database, associated explicit ICs can also have a graphical form (see, e.g., work9).

2.3 Examples of integrity constraints in GDBMSs

If an IC is enabled, data is checked as it is entered or updated in the database, and data that does not conform to the constraint is prevented from being entered. The case when an IC concerns a node and its neighbours is relatively easy; verifying a structural restriction can be more complex. Šestak et al.12 discuss ICs in GDBs and technical implementation issues that prevent these constraints from being specified. We present IC examples from three well-known GDBMSs.
Titan: The GDBMS Titan (http://titan.thinkaurelius.com/) enables defining some ICs in the graph schema definition, e.g., cardinality settings for node and edge properties, distinguishing simple edges and multiedges, and 1:1, 1:N, and N:1 cardinalities, which refer to various cases of min-max constraints (E1:(0, b), E2:(0, d)), where b, d ∈ {1, n}.
OrientDB: In the GDBMS OrientDB (http://orientdb.com/orientdb/) there are some possibilities for property restrictions. Properties can be constrained by ICs: Minimum Value: setMin(), Maximum Value: setMax(), Mandatory: setMandatory(), Read Only: setReadonly(), Not Null: setNotNull(), and Unique.
Neo4j: Although Neo4j is a schema-less GDBMS, a so-called schema exists there as a persistent database state that describes the available indexes and enabled ICs for the data graph, i.e., the GDB. Neo4j allows an optional schema for the graph, based around the concept of labels. Labels are used in the specification of indexes and for defining ICs on the graph. ICs can be applied to either nodes or relationships. Together, indexes and ICs are the schema of the graph. Neo4j does not use the labelled property graph model in its simplest version. For example, it is possible to put zero, one, or more labels on the nodes of a GDB. Property values can also be not only simple values but, e.g., lists. Neo4j supports a unique node property constraint, which makes it possible to assert, for nodes with a certain label, that the value of a particular property is unique; however, a constraint covering multiple properties simultaneously is not supported yet, and uniqueness is not supported for relationship properties either. Also, the IC feature called property existence constraint is at our disposal. This IC lets us specify a rule requiring that a property be set in a given node or relationship. This is an analogy to NOT NULL in relational databases. Suppose nodes with the label Teacher. Then the following ICs can be specified:

  CREATE CONSTRAINT ON (teacher:Teacher) ASSERT teacher.#T_ID IS UNIQUE,
  CREATE CONSTRAINT ON (teacher:Teacher) ASSERT exists(teacher.Birth_year),

i.e., all nodes with the label Teacher have the properties #T_ID and Birth_year.

  CREATE CONSTRAINT ON ()-[teaches:Teaches]-() ASSERT exists(teaches.Room),

i.e., all relationships with the type Teaches have the property Room.

3. Our proposal to Cypher syntax extension for integrity constraints

Sections 3 and 4 provide a shortened presentation of the main results of our attempt to design a consistent and complete syntax for IC declaration, maintenance, and implementation in the Cypher language. The complete analysis and design decisions can be found in the work6. A working proof-of-concept implementation is available on GitHub as well (see Section 4). Some proposals of ICs coming from Internet discussions have been critically studied and improved into a new, powerful Cypher syntax6. We will consider the following (rather minimal) range of ICs:
Node property uniqueness. Every node has attributes, called property keys, with values. Thus, there must be an option to require that only one node may have a particular value for a given property key. Moreover, we need to allow uniqueness of a particular combination of values for a particular set of attributes composing a property key. An example of the latter could be the couple {firstName, lastName} in some applications. The disadvantage of today's IC definition in Cypher is that it is not possible to assert more properties simultaneously.
In other words, a unique IC covering multiple properties simultaneously does not exist.
Mandatory properties. Mandatory property constraints should be applicable to both nodes and relationships. This IC specifies that a node with a particular label must have a value assigned for a given property; the same holds for a relationship with a particular type. For example, the edge Is_born_in can have a property Date. Each relationship of type Is_born_in must then have a value filled in for the mandatory property Date. The property existence constraints are an example of a mandatory property IC in today's Neo4j.
Property value limitations. This IC is a new proposal that we consider for Neo4j. It specifies the type of a property value, e.g., Boolean, List, String restricted by a regular expression, and Number.
Required relationships. This IC requires that a node with a particular label have one or more relationships of a particular type. For example, each node with the label Teacher must have an outgoing relationship of type Is_born_in. The verification of relationships should support three direction types: outgoing, incoming, and any direction.
Cardinality requirements. These ICs are closely related to the required relationships ICs, but instead of only requiring a relationship of a certain type at a particular node, this IC specifies the cardinality. The cardinality is the minimum and maximum number of relationships of a certain type that a given node with a particular label must contain. In other words, it is a restriction imposed on the number of relationships of a given type that a certain node can have. For example, each node with the label Person must have exactly two incoming relationships of type Parent_of.
Endpoint requirements. This kind of IC specifies that a relationship with a particular type must, or must not, start or end in nodes with a particular label, or set of labels. For example, each relationship with the type Owns must start at a node with either a Person label or an Organization label. The IC should be able to define a rule requiring that nodes with a certain label act as start nodes; the rule should also be applicable to several labels, all of whose nodes must then behave as start nodes. (A plain Cypher reading of this rule is sketched at the end of this section.)
Label coexistence. The label coexistence IC specifies that two certain node labels may not occur on the same node, or that a particular label may only occur on nodes with a different particular label. For example, we can specify that a node may be created either with the label Person or with the label Organization, but not with both. Or we can define the rule that the label User may only occur on a node that also carries the label Person.
These ICs reflect restrictions coming from a conceptual view on graph data with an associated conceptual model behind it. But we can see that such a model is not as strict as the E-R variant used in Section 2. In particular, endpoint requirements and label coexistence allow more: one label can be used for several different node types.
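As the plain Cypher reading promised in the endpoint requirements item above, the following query, using only constructs available in current Neo4j, lists Owns relationships whose start node carries neither the Person nor the Organization label. It sketches the semantics to be enforced, not the proposed DDL itself; the labels and the type are taken from the Owns example.

  MATCH (s)-[r:Owns]->()
  WHERE NOT (s:Person OR s:Organization)
  RETURN r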
4. Implementation and experiments

What the proposal in Section 3 lacks is a discussion of what the appropriate implementation should look like, i.e., what technical issues result from these ICs. Our proof of concept (POC) solution includes:
- introduction of appropriate IC metadata into Neo4j,
- design of new Cypher statements for IC definition and manipulation,
- location of a convenient Neo4j API to bind our IC enforcement solution,
- IC checking logic,
- a couple of experiments in order to prove that the solution is practically usable.
Details of our solution are available in the work6. The source code of the POC can be downloaded from GitHub (https://github.com/JiriKovacic/constraints).
Metadata about ICs in Neo4j. Each DBMS needs to operate with some kind of metadata, i.e., a data dictionary. Its data model is usually the same as the data model of the DBMS, i.e., a set of relations in the case of a relational DBMS. Similarly, we designed the IC metadata in Neo4j as a graph. There is an IC metadata top node serving as an access point, with two followers: one serves as a holder of node constraints, the other as a holder of relationship constraints. Each constraint holder contains a node representing the appropriate constraint type template, which is used as a template when a new constraint is created.
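A minimal Cypher sketch of how such a metadata graph might look follows. The labels, relationship types, and the property key used here (ConstraintsMetadata, HOLDS, TEMPLATE, type) are purely hypothetical placeholders; only the overall shape (top node, two holders, constraint type templates) follows the description above.

  // hypothetical vocabulary; the concrete data dictionary layout is described in the work6
  CREATE (meta:ConstraintsMetadata)
  CREATE (meta)-[:HOLDS]->(nc:NodeConstraintsHolder)
  CREATE (meta)-[:HOLDS]->(rc:RelationshipConstraintsHolder)
  CREATE (nc)-[:TEMPLATE]->(:ConstraintTemplate {type: 'node_property_uniqueness'})
  CREATE (rc)-[:TEMPLATE]->(:ConstraintTemplate {type: 'mandatory_relationship_property'})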
IC definition/manipulation statements. Thanks to the fact that the data dictionary is a graph, it is possible to design a constraint language in a way that fits smoothly into the Cypher language conception. Moreover, our design follows the same principle used in SQL: DDL processing means using a DML on the appropriate tables of the data dictionary.

  (1) CREATE CONSTRAINT (name:'ic_name') ON (PATTERN) ASSERT ACTION(properties)
      OPTIONS(enable:'{VALIDATE|NOVALIDATE}', validation:'{IMMEDIATE|DEFERRED}',
              delete:'{RESTRICT|CASCADE}', update:'{RESTRICT|CASCADE}', final:'{FALSE|TRUE}');
  (2) MATCH (all_constraints) WHERE name = 'ic_name' SET (PATTERN) ASSERT ACTION(properties)
      OPTIONS(enable:'{VALIDATE|NOVALIDATE}', validation:'{IMMEDIATE|DEFERRED}',
              delete:'{RESTRICT|CASCADE}', update:'{RESTRICT|CASCADE}', final:'{FALSE|TRUE}');
  (3) DROP (all_constraints) WHERE name = 'ic_name';
  (4) DISABLE (all_constraints) WHERE name = 'ic_name';
  (5) ENABLE (all_constraints) WHERE name = 'ic_name';

The first three statements allow creating, modifying, and dropping an IC. The last two allow enabling and disabling an IC. Again, this feature is inspired by RDBMSs, namely SQL:2003; we found it very practical. Inspiration from the RDBMS approach can also be seen in the IC properties, especially ON DELETE, ON UPDATE, and IMMEDIATE/DEFERRED. Non-standardized constraint properties include the clauses VALIDATE/NOVALIDATE; their meaning is obvious. The DDL design for the ICs described in Section 3 is complete; our POC implementation, on the other hand, covers only a part of it.
The logic of our IC enforcement implementation is obvious: we have to find an appropriate point in the Neo4j core API to bind the procedure which checks that ongoing data changes do not violate defined and enabled ICs. Fortunately, there is a TransactionEventHandler containing a beforeCommit method in the Neo4j core API. Let us note that the whole POC was realized on Neo4j version 2.3.
Experiment. To test the GDB we used the Movie database containing 12k movies and 50k actors. We used the Cineasts database together with the Spring framework extension written by M. Hunger, which is available as a GitHub project (https://github.com/neo4j-examples/sdn4-cineasts/wiki). This benchmark was one of the few available at the time the work was done, and it was recommended by the Neo4j community for such purposes. The Movie database stores data about movies, the actors who played in a specific movie and their roles, and users and their ratings for movies in their area of interest. The database contains exactly 106 651 relationships and 63 042 nodes. The aim of the POC is to demonstrate the usability of the proposed solution, not a complete and efficient implementation of all considered ICs. The selection of really implemented features was partially affected by the benchmark data we used and by the method we decided to use for measurement (see below). We implemented the following three constraint types (illustrative declarations are sketched below) and tried them on a number of nodes and properties of the Movie database:
- uniqueness of a node property on one attribute: the Name property for nodes of types User and Director,
- uniqueness of a node property on a couple of attributes: (name, id) for nodes of types User and Director,
- mandatory property value: the Name property for each node of type User, Director, and Actor, and the Title property for each node of type Movie.
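For illustration, the three implemented constraint types might be declared with statement (1) roughly as follows. The action names UNIQUE and EXISTS, the constraint names, and the exact property-key spelling are assumptions of this sketch (the excerpt does not fix the concrete ACTION vocabulary), and only some of the OPTIONS are shown.

  // hypothetical action names; the constraint names are illustrative only
  CREATE CONSTRAINT (name:'unique_user_name') ON (u:User)
    ASSERT UNIQUE(u.name) OPTIONS(enable:'VALIDATE', validation:'IMMEDIATE');
  CREATE CONSTRAINT (name:'unique_director_name_id') ON (d:Director)
    ASSERT UNIQUE(d.name, d.id) OPTIONS(enable:'VALIDATE', validation:'IMMEDIATE');
  CREATE CONSTRAINT (name:'mandatory_movie_title') ON (m:Movie)
    ASSERT EXISTS(m.title) OPTIONS(enable:'VALIDATE', validation:'IMMEDIATE');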
Fig. 3. Unique validation - comparison
The strategy of measurement is as follows: we set up constraints on the existing database and then enable them with the option VALIDATE (i.e., the IC is checked for all data already stored in the database) and measure the time necessary for constraint validation. Our results concerning the first two constraints are shown in Fig. 3. Two data sets were used: one satisfying the constraint and one with a constraint violation. ICs were checked on two types of nodes – User and
Director. The scale of the time axis is exponential; real values (in milliseconds) are shown in the table in the bottom part of the figure. Measurements of mandatory property enforcement are not presented here due to lack of space, but they are available in Section 5.1 of the work6.
Let us briefly comment on our POC and measurement. We should differentiate between two modes of IC checking: (1) normal operation, where constraint violations are checked against data changes during DML statement processing; (2) definition of new constraints or enabling of existing constraints with the VALIDATE option. Our measurement follows the second scenario. The measurement was done on a machine with an Intel i7 processor (2.3 GHz, 4 cores), 8 GB RAM, and MS Windows 10. Concerning the first scenario, a detailed analysis of the asymptotic complexity of the implementation of all proposed ICs is presented in Section 2.4 of the work6. However, an efficient implementation highly depends on the existence and usage of indexes. Nowadays, some efficient indexes for nodes and edges already exist in GDB implementations (see, e.g., the evaluation7), while structure-based indexes, which may be very useful mainly for relationship-based ICs, are still rather a subject of research (see, e.g., works3,10,13).

5. Conclusions

The objective of this paper was to provide possibilities for considering inherent ICs coming from the conceptual modelling of GDBs. Although GDBs are usually designed directly in a DDL, constructs for defining such ICs explicitly occur only rarely. We analysed these ICs and also some others useful for GDBs. We proposed an adequate extension of the Cypher language of the Neo4j GDBMS. Let us note that our implementation is rather naive and there is a lot of space to improve it, especially by extensive usage of node indexes. On the other hand, we believe we have shown that our approach is practically usable and that the method is general enough to cover a large range of practical needs of IC enforcement.
Certainly, we can observe that the considered ICs use an important feature of GDBs, namely index-free adjacency, meaning that every node is directly linked to its neighbour nodes, i.e., native graph processing is enabled. With Cypher indexes, a relatively powerful mechanism is at our disposal for both querying and IC testing. But for more complex ICs it is not enough; a more powerful index framework is necessary in some cases. For example, computing label-constraint reachability in a GDB5 is of this type.

Acknowledgements

This work was partially supported by the Charles University project Q47.

References
1. Angles R. A Comparison of Current Graph Database Models. In: IEEE 28th Int. Conf. on Data Engineering Workshops; 2012. p. 171-177.
2. Barker R. Case*Method: Entity Relationship Modeling. Addison-Wesley Publ. Comp.; 1990.
3. Gani A, Siddiqa A, Shamshirband S, Hanum F. A survey on indexing techniques for big data: taxonomy and performance evaluation. Knowl Inf Syst 2016; 46:241-284.
4. Ghrab A, Romero O, Skhiri S, Vaisman A, Zimányi E. GRAD: On Graph Database Modeling. Cornell University Library, arXiv:1602.00503; 2014.
5. Jin R, Hong H, Wang H, Ruan N, Xiang Y. Computing Label-Constraint Reachability in Graph Databases. In: Proc. of the 2010 ACM SIGMOD International Conference on Management of Data. ACM; 2010. p. 123-134.
6. Kovačič J. Schema enforcement in a schema-free graph database I. Master's thesis, Czech Technical University in Prague, Faculty of Information Technology; 2016. https://dspace.cvut.cz/handle/10467/65169
7. Mpinda SAT, Ferreira LC, Ribeiro MX, Santos MTP. Evaluation of Graph Databases Performance through Indexing Techniques. International Journal of Artificial Intelligence & Applications (IJAIA) 2015; 6(5):87-98.
8. Pokorný J. Graph Databases: Their Power and Limitations. In: Saeed K, Homenda W, editors. Proc. of 14th Int. Conf. on Computer Information Systems and Industrial Management Applications (CISIM 2015). LNCS 9339, Springer; 2015. p. 58-69.
9. Pokorný J. Conceptual and Database Modelling of Graph Databases. In: Desai B, editor. Proc. of IDEAS '16. ACM; 2016. p. 370-377.
10. Ramba J. Indexing of patterns in graph DB engine neo4j II. Master's thesis, Czech Technical University in Prague, Faculty of Information Technology; 2016. https://dspace.cvut.cz/handle/10467/63010
11. Robinson I, Webber J, Eifrém E. Graph Databases. O'Reilly Media; 2013.
12. Šestak M, Rabuzin K, Nova M. Integrity constraints in graph databases – implementation challenges. In: Proc. of Central European Conference on Intelligent and Information Systems, Varaždin, Vol. 1; 2016. p. 23-30.
13. Troup M. Indexing of patterns in graph DB engine neo4j I. Master's thesis, Czech Technical University in Prague, Faculty of Information Technology; 2016. https://dspace.cvut.cz/handle/10467/63010