Types of NoSQL databases and key criteria for choosing them
In a book excerpt, consultant Dan Sullivan offers insights into how to select the right type of NoSQL database for the right application in your organization.
This is an excerpt from Chapter 15 of the book NoSQL for Mere Mortals by Dan Sullivan, an independent database consultant and author. In the chapter, Sullivan looks at the four primary types of NoSQL databases -- key-value, document, column family and graph databases -- and provides insights into which applications are best suited to each of them. He also discusses the differences between relational and NoSQL database design, and the need for coexistence between relational and NoSQL technologies in many organizations.
In relational database design, the structure and relations of entities drive design -- not so in NoSQL database design. Of course, you will model entities and relations, but performance is more important than preserving the relational model.
The relational model emerged for pragmatic reasons -- that is, to address data anomalies and the difficulty of reusing existing databases for new applications. NoSQL databases also emerged for pragmatic reasons -- specifically, the inability of relational databases to scale to meet growing demands for high volumes of read and write operations.
In exchange for improved read and write performance, you may lose other features of relational databases, such as immediate consistency and ACID transactions (although this is not always the case).
Throughout this book, queries have driven the design of data models. This is the case because queries describe how data will be used. Queries are also a good starting point for understanding how well various NoSQL databases will meet your needs. You will also need to understand other factors, such as:
The volume of reads and writes
Tolerance for inconsistent data in replicas
The nature of relations between entities and how that affects query patterns
Availability and disaster recovery requirements
The need for flexibility in data models
Latency requirements
The following sections provide some sample use cases and some criteria for matching different NoSQL database models to different requirements.
Copyright info
This chapter excerpt is from the book NoSQL for Mere Mortals by Dan Sullivan, published by Pearson/Addison-Wesley Professional, April 2015, ISBN 978-0-13-402321-2.
Criteria for selecting key-value databases
Key-value databases are well-suited to applications that have frequent small reads and writes along with simple data models. The values stored in key-value databases may be simple scalar values, such as integers or Booleans, but they may also be structured data types, such as lists and JSON structures.
Key-value databases generally have simple query facilities that allow you to look up a value by its key. Some key-value databases support search features that provide somewhat more flexibility. Developers can use tricks, such as enumerated keys, to implement range queries, but these databases usually lack the query capabilities of document, column family and graph databases.
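The enumerated-key trick mentioned above can be sketched as follows. This is a minimal illustration, assuming a plain Python dictionary as a stand-in for the key-value store; the key format order:<customer>:<sequence> and the function names are hypothetical and not taken from any particular product.

```python
# A dictionary stands in for a key-value store that only supports
# lookup by exact key (assumption for illustration).
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)

def order_key(customer_id, seq):
    # Hypothetical naming convention: entity type, owner, then a counter.
    return f"order:{customer_id}:{seq}"

# Write three orders for customer 42 under enumerated keys.
for seq, total in enumerate([19.99, 5.00, 42.50], start=1):
    put(order_key(42, seq), {"total": total})

def range_query(customer_id, first, last):
    # The store has no range operator, so the client generates every key
    # in the range and looks each one up individually.
    results = []
    for seq in range(first, last + 1):
        value = get(order_key(customer_id, seq))
        if value is not None:
            results.append(value)
    return results

orders = range_query(42, 2, 3)
```

The range logic lives entirely in the application, which is why this counts as a trick rather than a query feature: each key in the range costs one lookup, and gaps in the sequence simply return nothing.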
Key-value databases are used in a wide range of applications, such as the following:
Caching data from relational databases to improve performance
Tracking transient attributes in a Web application, such as a shopping cart
Storing configuration and user data information for mobile applications
Storing large objects, such as images and audio files
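The first use case in the list, caching relational data, is often implemented with a read-through pattern along these lines. The sketch assumes an in-memory SQLite database as the relational system and a dictionary as the key-value cache; the table, key format and function name are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the relational system of record.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.commit()

cache = {}                 # stand-in for a key-value database
stats = {"db_reads": 0}    # counts queries that reach the relational database

def get_customer_name(customer_id):
    key = f"customer:{customer_id}:name"   # hypothetical key format
    if key in cache:
        return cache[key]                  # cache hit: no SQL query needed
    stats["db_reads"] += 1
    row = db.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache[key] = name                  # populate the cache for later reads
    return name

first = get_customer_name(1)    # misses the cache and queries SQLite
second = get_customer_name(1)   # served from the cache
```

A production cache would also need an expiry or invalidation policy so that updates to the relational data eventually reach readers; that is omitted here.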
Note
In addition to key-value databases you install and run on premises, there are a number of cloud-based choices as well. Amazon Web Services offers SimpleDB and DynamoDB, whereas Microsoft Azure's Table service provides for key-value storage.
Use cases and criteria for selecting document databases
Document databases are designed for flexibility. If an application requires the ability to store varying attributes along with large amounts of data, then document databases are a good option. For example, to represent products in a relational database, a modeler may use a table for common attributes and additional tables for each subtype of product to store attributes used only in the subtype of product. Document databases can handle this situation easily.
Document databases provide for embedded documents, which are useful for denormalizing. Instead of storing data in different tables, data that is frequently queried together is stored together in the same document.
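As a rough sketch of both ideas, the documents below are written as Python dictionaries, which map closely to JSON. All field names and values are invented for illustration. The two products share a few common attributes but otherwise carry only what their subtype needs, and the order embeds its customer and line items rather than referencing separate tables.

```python
# Two product documents that could live in the same collection. Each one
# carries only the attributes that make sense for its subtype.
products = [
    {
        "sku": "B-100",
        "type": "book",
        "name": "Example Book",
        "author": "A. Writer",
        "pages": 320,
    },
    {
        "sku": "C-200",
        "type": "camera",
        "name": "Example Camera",
        "megapixels": 24,
        # Embedded document: lens details travel with the camera.
        "lens": {"mount": "X", "focal_length_mm": 35},
    },
]

# An order with its customer and line items embedded (denormalized), so a
# single document read returns everything needed to display the order.
order = {
    "order_id": 1001,
    "customer": {"id": 42, "name": "Ada"},
    "items": [
        {"sku": "B-100", "quantity": 1, "price": 25.00},
        {"sku": "C-200", "quantity": 2, "price": 300.00},
    ],
}

# No join is required to total the order.
order_total = sum(item["quantity"] * item["price"] for item in order["items"])
```

The trade-off is the usual one for denormalization: reads get simpler, while data copied into several documents, such as the customer name, must be updated in every place it appears.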
Additionally, document databases improve on the query capabilities of key-value databases with indexing and the ability to filter documents based on attributes in the document.
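To make the difference from key lookup concrete, here is a small sketch of attribute filtering and a secondary index over a list of dictionaries. It only imitates what a document database does internally; the find and build_index helpers are hypothetical names, not the API of any real product.

```python
from collections import defaultdict

documents = [
    {"_id": 1, "type": "book", "name": "Example Book", "pages": 320},
    {"_id": 2, "type": "camera", "name": "Example Camera", "megapixels": 24},
    {"_id": 3, "type": "book", "name": "Another Book", "pages": 150},
]

def find(docs, **criteria):
    # Filter by attribute values with a full scan of the collection.
    return [
        d for d in docs
        if all(d.get(field) == value for field, value in criteria.items())
    ]

def build_index(docs, field):
    # Secondary index: attribute value -> ids of the documents that have it.
    index = defaultdict(list)
    for d in docs:
        if field in d:
            index[d[field]].append(d["_id"])
    return index

books = find(documents, type="book")          # scans every document
type_index = build_index(documents, "type")
book_ids = type_index["book"]                 # direct lookup, no scan
```

The scan and the index return the same documents; the index trades extra storage and write-time work for faster reads on the indexed attribute.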
These databases are well-suited to a number of use cases, including:
Back-end support for websites with high volumes of reads and writes
Managing data types with variable attributes, such as products
Tracking variable types of metadata
Applications that use JSON data structures
Applications benefiting from denormalization by embedding structures within structures
Document databases are also available from cloud services, such as Microsoft Azure DocumentDB and Cloudant's database.
Use cases and criteria for selecting column family databases
Column family databases are designed for large volumes of data, read and write performance, and high availability. Google introduced Bigtable to address the needs of its services. Facebook developed Cassandra to back its Inbox Search service.
These database management systems run on clusters of multiple servers. If your data is small enough to fit on a single server, then a column family database is probably more than you need -- consider a document or key-value database instead.
Column family databases are well-suited for use with:
Applications that require the ability to always write to the database
Applications that are geographically distributed over multiple data centers
Applications that can tolerate some short-term inconsistency in replicas
Applications with dynamic fields
Applications with the potential for truly large volumes of data, such as hundreds of terabytes
Google demonstrated the capabilities of Cassandra running on Google Compute Engine. Google engineers deployed:
330 Google Compute Engine virtual machines
300 1 TB Persistent Disk volumes
Debian Linux
DataStax Cassandra 2.2
Data was written to two nodes (Quorum commit of 2)
30 virtual machines to generate 3 billion records of 170 bytes each
With this configuration, the Cassandra cluster reached 1 million writes per second, with 95% completing in under 23 milliseconds. When one-third of the nodes were lost, the rate of 1 million writes per second was sustained, but with higher latency.
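A quick back-of-the-envelope calculation puts these figures in perspective. It uses only the numbers quoted above and makes two simplifying assumptions that are not in the source: the write rate is held constant at 1 million per second, and only raw record payload is counted (no replication, indexes or storage overhead).

```python
records = 3_000_000_000        # records generated by the load machines
record_bytes = 170             # size of each record
writes_per_second = 1_000_000  # sustained write rate reported

# Raw payload volume, ignoring replication and storage overhead.
total_bytes = records * record_bytes
total_gb = total_bytes / 10**9

# Time to write every record once at the reported rate.
seconds = records / writes_per_second
minutes = seconds / 60

# Payload throughput implied by the write rate.
payload_mb_per_second = writes_per_second * record_bytes / 10**6

print(f"{total_gb:.0f} GB of payload, {minutes:.0f} minutes, "
      f"{payload_mb_per_second:.0f} MB/s")
```

Under those assumptions the data set is about 510 GB of payload and takes roughly 50 minutes to write at about 170 MB per second; the source does not state how long the actual test ran.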
Several areas can use this kind of big data processing capability, such as:
Security analytics using network traffic and log data
Big Science, such as bioinformatics using genetic and proteomic data
Stock market analysis using trade data
Web-scale applications such as search
Social network services
Key-value, document and column family databases are well-suited to a wide range of applications. Graph databases, however, are best suited to a particular type of problem.
Use cases and criteria for selecting graph databases
Problem domains that lend themselves to representations as networks of connected entities are well-suited for graph databases. One way to assess the usefulness of a graph database is to determine if instances of entities have relations to other instances of entities.
For example, two orders in an e-commerce application probably have no connection to each other. They might be placed by the same customer, but that is a shared attribute, not a connection.
Similarly, a game player's configuration and game state have little to do with other game players' configurations. Entities like these are readily modeled with key-value, document or relational databases.
Now, consider examples mentioned in the discussion of graph databases, such as highways connecting cities, proteins interacting with other proteins and employees working with other employees. In all of these cases, there is some type of connection, link or direct relationship between two instances of entities.
These are the types of problem domains that are well-suited to graph databases. Other examples of these types of problem domains include:
Network and IT infrastructure management
Identity and access management
Business process management
Recommending products and services
Social networking
These examples show that when you need to model explicit relations between entities and rapidly traverse paths between them, a graph database is a good option.
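Path traversal of this kind can be illustrated with a breadth-first search over an adjacency list. This is a minimal in-memory sketch, not how any particular graph database is implemented; the city names and highway links are invented for the example.

```python
from collections import deque

# Hypothetical highway network: each city maps to the cities it is
# directly connected to. "Fairview" has no highway connections.
highways = {
    "Ashton": ["Brookfield", "Clayton"],
    "Brookfield": ["Ashton", "Dunmore"],
    "Clayton": ["Ashton", "Eastvale"],
    "Dunmore": ["Brookfield"],
    "Eastvale": ["Clayton"],
    "Fairview": [],
}

def shortest_path(graph, start, goal):
    # Breadth-first search: explores cities in order of hop count, so the
    # first path that reaches the goal uses the fewest highway segments.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        city = path[-1]
        if city == goal:
            return path
        for neighbor in graph.get(city, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no route between the two cities

route = shortest_path(highways, "Dunmore", "Eastvale")
```

Each step follows a direct link from one entity instance to another, which is the access pattern graph databases are built around; answering the same question in a relational model would typically take a chain of self-joins.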
Large-scale graph processing, such as with large social networks, may actually use column family databases for storage and retrieval. Graph operations are built on top of the database management system. The Titan graph database and analysis platform takes this approach.
Key-value, document, column family and graph databases meet different types of needs. Unlike relational databases, which essentially displaced their predecessors, these NoSQL databases will continue to coexist with each other and with relational databases because there is a growing need for different types of applications with varying requirements and competing demands.