System Design Note #05: NoSQL

This blog post peeks into the world of NoSQL databases, the unsung heroes in managing your digital assets. We'll explore what NoSQL is, its various types, popular examples, and the challenges they bring to the table.

What is NoSQL?

Imagine a world where data isn't confined to rigid tables but flows freely, adapting to ever-changing needs. That's the essence of NoSQL databases. They are non-relational databases, usually distributed across many machines, designed for large-scale data storage and high-performance data processing. Unlike traditional relational databases, NoSQL databases are known for their flexibility in handling diverse data types, making them a go-to solution for big data and real-time web applications.

NoSQL databases come in various flavors, each serving different data storage and management needs:

  • Key-Value Stores: Think of a giant locker with infinite compartments, each holding a treasure (value) that can only be accessed with a unique key. This simplicity makes them incredibly fast and scalable.
  • Document Databases: Picture a library where each book (document) can have a different structure, but finding information is a breeze. They're perfect for content management and e-commerce platforms.
  • Column Stores: Imagine a supercharged spreadsheet that groups data by column rather than by row, so a query that scans a few columns across millions of rows stays efficient. This makes them ideal for analytics.
  • Graph Databases: Consider a network of friends on a social media platform, where each connection holds meaning. Graph databases excel in uncovering these relationships.
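
To make the four models above more concrete, here is a minimal sketch that mimics each one with plain Python data structures. It is an illustration only, not how any real database stores data; every name in it (kv_store, documents, columns, friends, friends_of_friends) is made up for this example.

```python
# Key-value store: an opaque value looked up by a unique key.
kv_store = {"session:42": "alice"}

# Document database: each document can carry its own set of fields.
documents = [
    {"_id": 1, "type": "book", "title": "Dune", "pages": 612},
    {"_id": 2, "type": "shirt", "size": "M", "colors": ["red", "blue"]},
]

# Column store: values are grouped by column, so an aggregate over one
# column does not need to touch the others.
columns = {
    "user_id": [1, 2, 3],
    "amount": [9.5, 20.0, 3.25],
}
total_amount = sum(columns["amount"])

# Graph database: nodes plus the connections between them.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}


def friends_of_friends(graph, person):
    """Return people two hops away who are not already direct friends."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}
```

Calling friends_of_friends(friends, "alice") walks two hops through the graph, which is the kind of relationship query a graph database is built to answer efficiently.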

Differences From Relational Databases

Relational and non-relational databases take contrasting approaches to data management, most visibly in how they handle schemas. Relational databases exemplify the 'schema-on-write' approach. Think of a meticulous planner who organizes every detail before an event. In a relational database, the schema (data structure) is defined upfront: before you can write (insert) data, it must conform to a predefined structure of tables and columns. The database enforces this structure on every write, which protects data integrity and consistency and makes relational databases ideal for applications where precision is key, such as financial record keeping.
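
As a rough illustration of schema-on-write, the sketch below uses SQLite through Python's standard sqlite3 module. The accounts table, its columns, and the try_insert helper are hypothetical and exist only for this example; the point is that writes which do not match the declared structure are rejected before they are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is declared before any data is written.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# A row that conforms to the schema is accepted.
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 100))


def try_insert(sql, params):
    """Attempt a write and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.Error as exc:
        return f"rejected: {type(exc).__name__}"


# Violates the NOT NULL constraint on owner.
missing_owner = try_insert(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)", (None, 50)
)

# Refers to a column the schema never declared.
unknown_column = try_insert(
    "INSERT INTO accounts (owner, nickname) VALUES (?, ?)", ("bob", "bobby")
)

row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

Both bad writes fail at insert time, so only the conforming row ends up in the table.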

Non-relational databases, however, embrace the 'schema-on-read' philosophy. This approach is akin to an improviser who adapts to the situation as it unfolds. Here, the data can be stored without a predefined schema; its structure is interpreted or 'imposed' at the time of reading or querying the data. This flexibility allows for the storage of unstructured or semi-structured data, such as JSON documents, which can vary in structure from one document to the next. It's a perfect fit for scenarios dealing with diverse data types and formats, like social media feeds or IoT sensor data streams. However, this flexibility can lead to challenges in data quality and consistency, as the burden of interpreting the data correctly falls on the application or the database query.
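
A small sketch of schema-on-read, assuming records are stored as raw JSON strings. The sample records and the read_post function are invented for this example. Nothing is validated when the records are stored; the application decides what shape it expects only when it reads them.

```python
import json

# Stored as-is: no schema was checked when these records were written.
raw_records = [
    '{"user": "alice", "text": "hello", "likes": 3}',
    '{"user": "bob", "image_url": "https://example.com/cat.png"}',
    '{"device": "sensor-7", "temperature_c": 21.5}',
]


def read_post(raw):
    """Impose a 'post' shape at read time; return None if the record does not fit."""
    doc = json.loads(raw)
    if "user" not in doc:
        return None  # not a post, e.g. a sensor reading
    return {
        "user": doc["user"],
        "text": doc.get("text", ""),        # default for missing fields
        "likes": int(doc.get("likes", 0)),
    }


posts = [p for p in (read_post(r) for r in raw_records) if p is not None]
```

The defaults and the None check are exactly the interpretation burden described above: the reading code, not the storage layer, has to decide what to do with missing or unexpected fields.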

Both schema-on-write and schema-on-read have their unique strengths and ideal use cases. Relational databases, with their schema-on-write model, offer robustness and precision, suited for applications requiring strict data consistency. In contrast, non-relational databases, with their schema-on-read approach, provide versatility and scalability, ideal for handling varied and voluminous data in dynamic environments.

Let's meet some of the stars of the NoSQL universe:

  • MongoDB: Famous for its flexible document model, MongoDB is a hit in content management systems.
  • Cassandra: Known for its excellent scalability, Cassandra is the go-to for applications like real-time bidding systems, where large datasets must be written and read with low latency.
  • Redis: An in-memory store with lightning-fast data retrieval, Redis shines in scenarios requiring real-time results, such as gaming leaderboards.
  • Neo4j: For applications that need to navigate and understand complex relationships, like social networks, Neo4j's graph database model is a natural fit.

Figure: Most popular NoSQL databases in 2022.

When To Use SQL and NoSQL Databases

Imagine you're developing an online banking system. This system requires complex transactions, like transferring money from one account to another. The database must maintain absolute consistency (every transaction is accurately recorded and the balance is always correct). The data here is highly structured and relational – customers have accounts, and accounts have transactions. In this scenario, we should use relational databases because:

  • ACID Compliance: SQL databases are ACID compliant (Atomicity, Consistency, Isolation, Durability), ensuring that transactions are processed reliably.
  • Complex Queries: SQL databases excel in handling complex queries, which is essential for financial reporting and auditing.
  • Data Integrity: They enforce data integrity through constraints and relationships, crucial for financial data.
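
The money-transfer scenario can be sketched with SQLite through Python's standard sqlite3 module. This is a minimal illustration under assumed names (the accounts table, transfer, and balances are hypothetical), not a production banking design. It shows atomicity and constraint enforcement working together: if any step of a transfer fails, the whole transfer is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance.
        return False


def balances(conn):
    """Return a {name: balance} snapshot of all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


first_ok = transfer(conn, "alice", "bob", 30)    # within alice's balance
second_ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final_balances = balances(conn)
```

In the second transfer the credit to bob is applied first and the debit from alice then violates the CHECK constraint. Because both updates run in one transaction, the credit is undone as well, so no money appears from nowhere.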

As another example, consider a social media platform handling millions of user interactions, posts, comments, and real-time messages. The data here is diverse (text, images, videos) and largely unstructured, and the platform experiences massive, unpredictable spikes in traffic. In this scenario, a NoSQL database is the better choice for the following reasons:

  • Scalability: NoSQL databases can handle a large volume of data and are designed to scale out by distributing the database across many servers.
  • Flexible Schema: They accommodate the unstructured and semi-structured data typical in social media (e.g., posts can have different types of content and formats).
  • High Performance for Simple Queries: NoSQL databases are optimized for specific types of queries (like key-value lookups) and can provide fast responses, essential for a real-time user experience.
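
One common way to scale out is to partition keys across servers by hashing them. The sketch below assumes a hypothetical ShardedStore class in which each "server" is just an in-memory dictionary; real systems add replication, failure handling, and rebalancing on top of this idea.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"post:{i}", {"id": i})

shard_sizes = [len(shard) for shard in store.shards]
```

Each lookup touches exactly one shard, which is why simple key-value reads stay fast as more servers are added. Note that plain modulo hashing like this remaps most keys when the number of shards changes; consistent hashing is a common way to reduce that reshuffling.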

In conclusion, the choice between SQL and NoSQL databases hinges on the specific needs and nature of your application.

Opt for SQL when your application revolves around structured data and demands stringent requirements in terms of complex transactions, unwavering data integrity, and consistent reliability. This scenario is typical in domains like banking systems, where every transaction's accuracy is paramount, or inventory management systems, where relationships between different data entities are complex and interdependent.

On the other hand, NoSQL databases are your ally in situations involving a vast expanse of unstructured or semi-structured data. They shine in environments that require rapid scalability to accommodate fluctuating data loads and prioritize flexibility in data models along with swift read/write operations. Such conditions are prevalent in applications like social media platforms, which must manage a variety of content types and user interactions, or in big data analytics, where the ability to quickly process large volumes of diverse data is crucial.

Thus, the decision to use SQL or NoSQL should be guided by the specific data requirements and operational dynamics of your application.