NoSQL database types explained: Graph

NoSQL graph databases focus on the relationships between pieces of data. Two common graph models offer different advantages and disadvantages compared with other NoSQL database types.

The main objective of NoSQL databases is improved efficiency. This is achieved, at least in part, by using new technologies and thinking outside the box.

One example of applying new technologies is opting for solutions that require more storage space. Storage requirements used to be a bigger concern, but they have become less of an issue as the cost of storage has dropped significantly.

As for the metaphorical box, it can represent the strict limits of a SQL schema. Opening it up means looking for other ways to connect data into a meaningful whole and to manipulate and exploit it with the least friction.

What is a graph database?

A graph database certainly sits outside that box. It focuses on the relationships between pieces of data as much as on the data itself, so the connections are stored as part of the data. A graph data model also helps visualize data, which is valuable in the world of big data, where it is always good to be able to make quick sense of the data in front of you.

The elements

Based on graph theory, these databases consist of nodes and edges. Nodes are the entities of a graph database. Simply put, they are the agents and objects of relationships and can be thought of as the answers to the questions "who" and "whom."

Each entity holds a unique identifier. Nodes can also have properties, stored as key-value pairs, and labels, with or without metadata, that assign a node its role in a domain. Each node also has incoming and outgoing edges. Think of them as the two ends of an arrow showing which node is the agent and which is the object of a relationship.

Edges are as important as nodes because they hold a vital piece of information: they represent the relationships between entities. A SQL database would likely have a designated table for each class of relationships. A graph database does not require such mediation because it connects its entities directly. Edges also have unique identifiers and, just like nodes, can have other properties in addition to their defined type, direction, and starting and ending nodes.
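
To make the elements concrete, here is a minimal Python sketch of nodes and edges held in memory. The class names, field names and the sample "FOLLOWS" relationship are illustrative assumptions, not the storage format of any particular graph database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: str                                          # unique identifier
    labels: set[str] = field(default_factory=set)         # role(s) in the domain
    properties: dict[str, Any] = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    edge_id: str      # edges carry their own unique identifier
    rel_type: str     # the defined type of the relationship
    start: str        # id of the node the arrow leaves (the agent)
    end: str          # id of the node the arrow points to (the object)
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: Alice follows Bob.
nodes = {
    "n1": Node("n1", {"Person"}, {"name": "Alice"}),
    "n2": Node("n2", {"Person"}, {"name": "Bob"}),
}
edges = [Edge("e1", "FOLLOWS", "n1", "n2", {"since": 2021})]


def outgoing(node_id: str) -> list[Edge]:
    """Edges that start at the given node."""
    return [e for e in edges if e.start == node_id]


def incoming(node_id: str) -> list[Edge]:
    """Edges that end at the given node."""
    return [e for e in edges if e.end == node_id]
```

Calling outgoing("n1") returns the single FOLLOWS edge, while incoming("n1") returns an empty list. The direction of the arrow is what tells the agent from the object.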

Graph database models

There are two common graph database models: Resource Description Framework (RDF) graphs and property graphs. They share similarities but are built for different purposes: RDF graphs focus on data integration, whereas property graphs focus on analytics.

RDF graphs focus on data integration. They are built from RDF triples, each consisting of two nodes and an edge that connects them (subject, predicate, object). The elements of a triple are identified by uniform resource identifiers (URIs), although the object can also be a literal value. RDF graphs appear in knowledge graphs, where they are used to link data together, and are often used by healthcare companies, statistics agencies, etc.
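
As a rough illustration of the triple structure, the sketch below stores triples as plain Python tuples and matches them against a pattern. The example.org URIs and the predicate names are made up for the example, and real RDF stores are normally queried with SPARQL rather than a helper like this.

```python
# Each triple is (subject, predicate, object).
# The example.org namespace and all names below are hypothetical.
EX = "http://example.org/"

triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", EX + "worksFor", EX + "acme"),
    (EX + "acme", EX + "locatedIn", EX + "berlin"),
}


def match(s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who works for acme?
employees = [s for s, _, _ in match(p=EX + "worksFor", o=EX + "acme")]
```

Here employees contains the URIs for alice and bob. Because every element is a globally unique identifier, triples from different sources can be merged into one set, which is what makes the model a good fit for data integration.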

Property graphs are much more descriptive: each element carries properties, attributes that further describe it. They also consist of nodes and the edges connecting them, and they are better suited for data analysis.

Advantages

The emphasis placed on edges means graph databases are a powerful way to understand even the most complex relationships between data. Because relationships are stored directly, queries that follow them can also execute quickly.

With a clear representation of relationships in a graph database, it is easier to spot trends and recognize elements with the most influence.

Disadvantages

Graph databases share a common drawback of NoSQL databases: the lack of a uniform query language. While this can be an obstacle to adopting a database, it does not affect the performance of this database type. Certain graph databases are more prominent than others, and so are the languages they use. Some of the most common graph database languages are PGQL, Gremlin, SPARQL and AQL.

Another drawback is scalability. Because these databases are designed for a one-tier architecture, they are hard to scale across a number of servers.

As with all other NoSQL databases, they are designed to serve a specific purpose and excel at it. They are not a universal solution designed to replace all other databases.

Use cases and examples

Graph databases are designed with a focus on relationships and, coincidentally, so are social networks. A graph database is a great way to store all users of a certain social media platform and their engagements to analyze them. You can determine how "lively" or active a social media platform is based on the activity volume of its users. Furthermore, you can identify the "influencers," analyze user behavior, isolate target groups for marketing purposes, etc.
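
One simple way to approximate "influence" is to count incoming edges. The sketch below does that in plain Python over a hypothetical list of follow relationships. The user names are invented, and a production system would more likely run a query or a centrality algorithm inside the database itself.

```python
from collections import Counter

# Hypothetical "follows" edges: (follower, followed).
follows = [
    ("ann", "cat"),
    ("bob", "cat"),
    ("dan", "cat"),
    ("cat", "ann"),
    ("bob", "ann"),
]


def top_influencers(edges, k=1):
    """Rank users by in-degree, i.e. the number of incoming follow edges."""
    in_degree = Counter(followed for _, followed in edges)
    return in_degree.most_common(k)


ranking = top_influencers(follows, k=2)
```

With this sample data, ranking is [("cat", 3), ("ann", 2)]. In-degree is only the crudest measure of influence, but it shows how the question becomes a matter of counting edges.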

By being able to track and map out the most complex networks of relationships, graph databases are a good tool for fraud detection. Connections between elements that are hard to detect with traditional databases suddenly become prominent with graph databases.
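
As a sketch of the idea, assume accounts are linked to the phone numbers and devices they use. Walking those edges surfaces accounts that are indirectly connected even though no single record ties them together. The account names, identifiers and the "acct" prefix convention are all assumptions made for this example.

```python
from collections import defaultdict, deque

# Hypothetical edges between accounts and the identifiers they use.
uses = [
    ("acct1", "phone:555-0100"),
    ("acct2", "phone:555-0100"),
    ("acct2", "device:A"),
    ("acct3", "device:A"),
    ("acct4", "device:B"),
]


def build_adjacency(edges):
    """Treat each edge as undirected so the walk can go both ways."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def linked_accounts(edges, start):
    """Breadth-first walk returning every other account reachable from start."""
    adjacency = build_adjacency(edges)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(n for n in seen if n.startswith("acct") and n != start)


ring = linked_accounts(uses, "acct1")
```

Here ring is ["acct2", "acct3"]: acct1 shares a phone with acct2, which in turn shares a device with acct3. In a relational schema the same answer would typically take a chain of self-joins whose depth has to be known in advance.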

Some of the most popular graph databases, as well as multimodel databases that include graph data models, are Neo4j, which is suitable for a variety of business-related purposes, followed by the multimodel Microsoft Azure Cosmos DB, OrientDB, ArangoDB, etc.

Figure: A table comparing the elements of a graph database vs. a relational database.

Graph databases vs. relational databases

A major advantage of any NoSQL database over a SQL database is flexibility in storing data. Whenever data is less structured or highly complex, there is room for a NoSQL application. If you want to introduce new relationship types and properties into a SQL database, you may, depending on the change, have to add new tables.

On the other hand, with a graph database, it is as simple as adding a new edge or a property. By tracing the edges between nodes, you can get to the depth of the most complex relationship between two nodes in a database.
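
The following sketch illustrates both points with an in-memory adjacency list: a new relationship type is introduced by simply adding an edge, and a breadth-first search traces the chain of edges between two nodes. The node names and relationship types are hypothetical, and a real graph database would expose this through its own query language.

```python
from collections import deque

graph = {}  # node -> list of (relationship type, neighbor)


def add_edge(start, rel_type, end):
    """Add a directed edge, creating either node if it does not exist yet."""
    graph.setdefault(start, []).append((rel_type, end))
    graph.setdefault(end, [])


add_edge("alice", "FOLLOWS", "bob")
add_edge("bob", "FOLLOWS", "carol")
# A brand-new relationship type needs no schema change, just another edge.
add_edge("carol", "REVIEWED", "product42")


def trace(start, goal):
    """Breadth-first search; returns the shortest chain of hops, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel_type, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, rel_type, neighbor)]))
    return None


path = trace("alice", "product42")
```

The resulting path lists three hops: alice FOLLOWS bob, bob FOLLOWS carol, carol REVIEWED product42. Tracing in the opposite direction returns None because the edges are directed.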

Whether you need a graph database depends on how connected the data is: where the data is highly connected, there could be room for a graph database. Furthermore, given how powerful these connections are, a graph database is a better choice for data analysis than for simple data storage. Finally, if you want to stay flexible with data that changes often, a NoSQL graph database is likely the better option for you.