What are the Different Types of NoSQL Data Stores?

 

In my last post, I gave an overview of NoSQL databases and typical use cases where one can use NoSQL data stores.

As mentioned in the last post, NoSQL databases can be classified into four types:

  1. Key-Value (KV) Stores
  2. Document Stores
  3. Column Family Data stores or Wide column data stores
  4. Graph Databases

Here is an explanation for each of these types.

 

1. Key-Value (KV) Stores maintain data as pairs, each consisting of an index key and a value. KV stores look up values using the index key. Every item in the database is stored as a pair of a key (index) and a value. A KV store resembles a relational database in which every table has only two columns: a key and a value.

Some KV stores may even allow basic joins to help you scan through the data, but if your queries need composite joins, a KV store may not be a suitable option.

There are multiple KV stores available, each differing mainly in how it handles the trade-offs of the CAP theorem and in how it balances memory versus storage usage.

KV stores have fast query performance and are best suited for applications that require content caching, e.g. a gaming website that constantly updates its top 10 scores and players.
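To make the caching use case concrete, here is a minimal sketch of a leaderboard cached in a key-value store. The `KeyValueStore` class, the `refresh_top_scores` helper and the `leaderboard:top10` key are hypothetical names made up for illustration; a real KV store would expose its own client API.

```python
import heapq


class KeyValueStore:
    """Toy in-memory KV store (hypothetical): values are opaque, lookup is by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def refresh_top_scores(store, scores, limit=10):
    """Recompute the top scores and cache the result under a single key."""
    top = heapq.nlargest(limit, scores.items(), key=lambda item: item[1])
    store.put("leaderboard:top10", top)


store = KeyValueStore()
scores = {"alice": 9200, "bob": 8700, "carol": 9900}
refresh_top_scores(store, scores)

# Every page view reads the cached list with one key lookup.
leaderboard = store.get("leaderboard:top10")
print(leaderboard)
```

The store never inspects the value; the application decides what goes under each key, which is what keeps the model simple and fast.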

Pros:

–          Simple Data model

–          Scalable

Cons:

–          No relationships; you have to manage your own foreign keys

–          Not suitable for complex data

 

Popular KV stores include DynamoDB, Redis and Berkeley DB.

2. Document Stores extend the simplicity of key-value stores: the values are stored as structured documents, such as XML or JSON. Document stores make it easy to map objects in object-oriented software to stored documents.

A document database is schema-free: you don’t have to define a schema beforehand and adhere to it. It lets us store complex data in document formats (JSON, XML, etc.).

Document databases do not support relations. Each document in the document store is independent and there is no relational integrity.

Document stores can be used for all the use cases of a KV store, with some additional advantages: queries are not limited to the key but can also filter on attributes within a document, and each document can have a different structure. A typical example is a product review website where each product can be reviewed by zero or more users, and each review can be commented on by other users and liked or disliked by zero or more users.
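The review example can be sketched with plain Python dictionaries standing in for JSON documents. The field names and the `find` helper are assumptions made for illustration and do not belong to any particular document database.

```python
# Each review is an independent document; the documents need not share a structure.
reviews = [
    {"_id": "r1", "product": "kettle", "rating": 5, "likes": ["u2", "u3"],
     "comments": [{"user": "u2", "text": "Agreed"}]},
    {"_id": "r2", "product": "kettle", "rating": 2, "likes": []},
    {"_id": "r3", "product": "toaster", "rating": 4, "dislikes": ["u1"]},
]


def find(collection, **criteria):
    """Return documents whose top-level attributes match every criterion (hypothetical helper)."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Query by an attribute inside the document rather than by its key.
kettle_reviews = find(reviews, product="kettle")

# Documents that lack a field are simply skipped; no schema change is needed.
liked = [doc["_id"] for doc in reviews if doc.get("likes")]
print([doc["_id"] for doc in kettle_reviews], liked)
```

A real document store would run such queries on the server, usually with the help of indexes, instead of scanning a list in application code.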

Pros:

–          Simple & Powerful Data model

–          Scalable

Cons:

–          Not suitable for relational data

–          Querying limited to keys & indexes

–          Map Reduce for larger queries

 

Some popular Document stores are MongoDB, CouchDB, Lotus Notes.

3. Column Family Data stores or Wide column data stores take a hybrid approach, mixing the declarative characteristics of relational databases with the key-value pairs and fully variable schema of key-value stores. Wide column databases store tables as sections of columns of data rather than as rows of data.

Column family databases have their origins in Google’s Bigtable. According to Google’s paper on Bigtable, “A Bigtable is a sparse, distributed, persistent multidimensional sorted map.” This definition might leave you confused, just as it did me; it was all Greek to my RDBMS-oriented mind.

Here is a simpler explanation: a column family data store is a multi-dimensional key-value store (a map or associative array) that is persistent (values persist after creation or access), distributed (data is spread across multiple computing and storage nodes), sorted (keys are kept in sorted order) and sparse (values for certain dimensions may not be populated, similar to sparsely populated rows in an RDBMS).
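The "sparse, sorted, multi-dimensional map" part of that definition can be illustrated with a small in-memory sketch. It assumes a (row, column, timestamp) key, as in the Bigtable paper, and leaves out persistence and distribution entirely; the class name, method names and sample rows are hypothetical.

```python
class SparseSortedMap:
    """Toy map from (row, column, timestamp) to value; in-memory only."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, timestamp, value):
        # Only populated cells take up space, which is what "sparse" means here.
        self._cells[(row, column, timestamp)] = value

    def get_latest(self, row, column):
        """Return the newest value for a cell, or None if it was never written."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def scan(self):
        """Return all cells ordered by row key, then column, then timestamp."""
        return sorted(self._cells.items())


table = SparseSortedMap()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 2, "<html>v2</html>")
table.put("com.example/about", "anchor:home", 1, "About us")

latest = table.get_latest("com.example/index", "contents:html")
missing = table.get_latest("com.example/about", "contents:html")
first_row = table.scan()[0][0][0]
print(latest, missing, first_row)
```

A real store keeps the keys sorted on disk and splits key ranges across nodes; sorting on every scan is only acceptable in a toy like this one.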

The multi-dimensional aspect of column stores brings in another concept: column families.

Column-family databases store data in column families, as rows that have many columns associated with a row key. Column families are groups of related data that are often accessed together.

There are two types of column families:

  1. Standard Column family: A standard column family consists of key-value pairs, where each key is mapped to a value that is a set of columns. In analogy with relational databases, a standard column family is like a “table”, with each key-value pair being a “row”.
  2. Super Column family: A super column family consists of key-value pairs, where each key is mapped to a value that is itself a set of column families. In analogy with relational databases, a super column family is something like a “view” on a number of tables. It can also be seen as a map of tables.
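The difference between the two is easiest to see as nested maps. The sketch below uses plain dictionaries and assumes one extra level of nesting for the super column family; the data and the `read_column` helper are hypothetical and do not mirror any specific database API.

```python
# Standard column family: row key -> {column name: value}
users = {
    "user:1": {"name": "Asha", "email": "asha@example.com"},
    "user:2": {"name": "Ben"},  # sparse: this row has no email column
}

# Super column family: row key -> {super column name: {column name: value}}
user_addresses = {
    "user:1": {
        "home": {"city": "Pune", "zip": "411001"},
        "work": {"city": "Mumbai"},
    },
}


def read_column(family, row_key, column, super_column=None):
    """Look up one column, descending one extra level for a super column family."""
    row = family.get(row_key, {})
    if super_column is not None:
        row = row.get(super_column, {})
    return row.get(column)


name = read_column(users, "user:1", "name")
no_email = read_column(users, "user:2", "email")
work_city = read_column(user_addresses, "user:1", "city", super_column="work")
print(name, no_email, work_city)
```

Each row carries only the columns it actually has, which is why rows in the same column family can differ in shape.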

Pros:

–          Supports semi-structured data

–          Naturally indexed

–          Scalable

Cons:

–          Not suitable for relational data

Some of the popular Wide column data stores include Google’s Bigtable, Cassandra, HBase.

Note that wide column data stores are not to be confused with column-oriented databases; I may cover this in a separate post.

4. Graph Databases are built specifically for storing graph-oriented data structures. A graph database is any storage system that provides index-free adjacency. This means that every node contains a direct pointer to its adjacent elements, so no index lookups are necessary. As the number of nodes increases, the cost of a hop remains the same.

Graph databases are optimized for traversing through connected data, e.g. traversing through a list of contacts on your social network to find out the degree of connections.
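As a rough illustration of index-free adjacency, the sketch below lets each node hold direct references to its neighbours and uses a breadth-first search to find the degree of connection between two people. The `Person` class and both helper functions are hypothetical, and a graph database would do this inside its own storage engine.

```python
from collections import deque


class Person:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.contacts = []


def connect(a, b):
    """Create an undirected edge by storing a reference on both nodes."""
    a.contacts.append(b)
    b.contacts.append(a)


def degree_of_connection(start, target):
    """Breadth-first search; returns the number of hops, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person is target:
            return hops
        # Each hop follows a stored reference; no global index is consulted.
        for contact in person.contacts:
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, hops + 1))
    return None


ann, bob, cy, dee = Person("Ann"), Person("Bob"), Person("Cy"), Person("Dee")
connect(ann, bob)
connect(bob, cy)

print(degree_of_connection(ann, cy))   # Ann -> Bob -> Cy
print(degree_of_connection(ann, dee))  # Dee has no contacts
```

Because a hop only touches the current node's own list of contacts, its cost depends on that node's neighbours and not on the total size of the graph.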

Graph databases usually come with a flexible data model, which means there is no need to define the types of edges and vertices.

Typical use cases for graph databases include social networking sites and recommendation engines.

Pros:

–          Extremely powerful

–          Connected data is locally indexed

–          Can provide ACID

Cons:

–          Difficult to scale out, though it can scale up

Some of the popular graph databases are Neo4j, OrientDB and AllegroGraph.

NoSQL data stores provide an alternative to the traditional RDBMS, and you might not be sure which NoSQL database to select. The ideal way to identify the most suitable NoSQL database for your application is to figure out which requirement is not met by an RDBMS. If all your requirements are fulfilled by an RDBMS, you may not need a NoSQL data store.

If you have a requirement for managing large, unstructured data sources, feel free to contact us. I would also appreciate your feedback on this post.
