NoSQL Data Stores
NoSQL is one of the most frequently heard terms in technology today. NoSQL is not a single technology or framework but an umbrella term for a range of technologies with different use cases. NoSQL databases approach data storage in a less constrained way than the relational databases widely used today.
Any database that is not an RDBMS, has a schema-less structure, does not guarantee ACID transactions, and offers high availability and support for large data sets in horizontally scaled environments can be categorized as a “NoSQL data store”. Because a NoSQL database does not enforce a fixed schema, it is well suited to storing the unstructured data typically handled by internet-scale websites.
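To make the schema-less idea concrete, here is a minimal sketch in plain Python. It does not use any real NoSQL product: the `collection` list and the `find` helper are hypothetical names invented for illustration, and the records are made-up sample data. The point is only that records with different fields can sit side by side without a table definition.

```python
import json

# Hypothetical in-memory "collection": each record is a free-form dict,
# so no table definition has to be agreed on before storing data.
collection = [
    {"id": 1, "type": "post", "title": "Hello", "tags": ["intro"]},
    {"id": 2, "type": "like", "post_id": 1, "user": "asha"},
    {"id": 3, "type": "post", "title": "Update", "location": {"city": "Pune"}},
]


def find(records, **criteria):
    """Return records whose fields equal every given criterion.

    A record that lacks one of the fields simply does not match;
    it is not an error, because there is no schema to violate.
    """
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


posts = find(collection, type="post")
likes_by_asha = find(collection, user="asha")

print(json.dumps(posts, indent=2))
```

In a relational table, the `tags`, `post_id`, and `location` fields would each need a column (or a separate table) defined in advance; here they are just extra keys on the records that happen to need them.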
NoSQL is not a set of technologies competing with RDBMS; rather, it provides alternatives for use cases where an RDBMS may not be a suitable option. Many NoSQL databases do support SQL (Structured Query Language), and the term NoSQL is widely read as ‘Not only SQL’.
Traditionally, the emphasis for Relational Database Management Systems (RDBMS) has been on a set of properties called ACID (Atomicity, Consistency, Isolation, and Durability). An RDBMS typically guarantees these properties by locking records while they are being updated. This makes transactions reliable but affects the performance of the database, making it less suitable for internet-scale applications. For a database following ACID properties, two users querying the same data will get the same results once a transaction has been executed.
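As a small illustration of the all-or-nothing behaviour that ACID transactions provide, the sketch below uses SQLite through Python's standard `sqlite3` module. The `accounts` table, the sample balances, and the `transfer` function are assumptions made up for this example; it demonstrates atomicity and consistency only, not locking or isolation between concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the consistency rule: no balance may go negative.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
after_failed = balances(conn)
ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
print(failed, after_failed, ok, after_ok)
```

In the failed transfer, the credit to `bob` has already been applied when the debit from `alice` violates the constraint, yet the rollback undoes both, so no reader ever sees a half-finished transfer.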
Many NoSQL databases follow BASE (Basically Available, Soft state, Eventually consistent) properties. This means that while you can scale out your database and make it highly available, the data may not become consistent immediately. For a database following BASE properties, two users querying the same data might get different results immediately after a transaction is executed, but will eventually get the same data.
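The behaviour described above can be mimicked with a toy model. This is only a sketch under simplifying assumptions: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, there are just two copies of the data, and replication is triggered by hand instead of happening in the background as it would in a real system.

```python
class Replica:
    """One copy of the data, as a plain dictionary."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class EventuallyConsistentStore:
    """Writes land on the primary first and reach the secondary later."""

    def __init__(self):
        self.primary = Replica("primary")
        self.secondary = Replica("secondary")
        self.pending = []  # writes accepted but not yet replicated

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stand-in for asynchronous replication: apply queued writes in order.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("status", "Hello from Pune")

# Two users hitting different replicas right after the write disagree...
before = (store.primary.read("status"), store.secondary.read("status"))

store.replicate()

# ...but once replication catches up, they see the same value.
after = (store.primary.read("status"), store.secondary.read("status"))
print(before, after)
```

The window between `write` and `replicate` is the "soft state": the store stays available for reads and writes throughout, at the cost of the second user briefly seeing stale data.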
Lotus Notes from IBM is one of the first implementations of what we today identify as a NoSQL database. Damien Katz, one of the developers of Lotus Notes, built CouchDB, one of the first modern NoSQL databases, in the mid-2000s. However, the real impetus for the NoSQL movement came from Google’s paper on Bigtable (2006) and Amazon’s paper on Dynamo (2007), which became the reference points for anyone who wanted to develop a NoSQL database.
Currently there are about 150 NoSQL databases available, most of them open source.
Typical use cases where you can consider a NoSQL database include, but are not limited to:
– You have scalability issues with your RDBMS and cannot achieve scale at an acceptable cost
– Your application's data models change frequently, so you cannot fix the schema up front, which is a prerequisite for an RDBMS implementation
– Your application has a lot of data that does not need an explicit transaction, e.g. likes for each update on your website
– You deal with temporary data that is not stored in the main transaction tables, e.g. site personalization, lookups, or shopping carts
– You can tolerate temporary inconsistencies in data, e.g. updates on social networking sites can take time to become visible to all users
– You need to query data which is non-hierarchical, e.g. how many people in my extended social network have a bachelor’s degree in engineering from the University of Pune
– You have de-normalized data in your RDBMS
NoSQL databases can be classified mainly into four types:
- Key-Value (KV) Stores
- Document Stores
- Column-Family Stores (also called Wide-Column Stores)
- Graph Databases
I will explain these types in detail in my next post.