RDBMS 與 NoSQL 的比較

RDBMS vs NoSQL

了解 RDBMS 與 SQL 的關係《RDBMS 與 SQL 的差異,他們是一樣的嗎?》之後,讓我們來看看什麼是 NoSQL,並比較 Relation database 與 NoSQL database 的差異。

首先,我們來看看什麼是 Relational Model。

Relational Model & Relational Database

RDBMS 多用關聯式模型 Relational Model 作為儲存資料的結構。

儲存資料的模型 data model 有很多種,像是最早的網狀模型 network model 和層次模型 hierarchical model,以及現在也廣為使用的文件模型 document model。

而目前在 RDBMS 中最廣為人知的資料模型,非關聯式模型 relational model 莫屬。

資料彼此間用關係連結,在 SQL 中稱 tables,每個關聯是複數個 tuples 的非排序集合,在 SQL 中稱 rows。

data is organized into relations (called tables in SQL), where each relation is an unordered collection of tuples (rows in SQL).

NoSQL Database

只有簡單一句
Not Only SQL

所以基本上,只要不是 SQL 的都屬於這一類!

SQL 與 NoSQL 有各自擅長的部分

SQL

關聯式模型可以提供很好的 joins 支援,以及表單中很好的的 many-to-one 和 many-to-many 關聯。

The relational model counters by providing better support for joins, and many-to-one and many-to-many relationships.

NoSQL

而 NoSQL 中常用的文件模型,則是讓 schema 更具彈性,因為 locality 特性所以有較好的效能,而且在某些應用程式中,具有更貼近應用程式資料的結構(相比之下 SQL 就有 impedance mismatch 的問題,所以才會衍生出 ORM)。

The main arguments in favor of the document data model are schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application.

了解以上背景知識後,就藍正是比較兩種 database 囉!

以下為「傳統」的比較,因為隨著時代的演進,其實兩種資料庫之間的界線越來越模糊,Relational Database 添加了許多 NoSQL 資料庫的功能,而 NoSQL 資料庫往往也能做到 Relational Database 的資些功能。

Relational vs NoSQL databases

Aspect covered

Relational database

NoSQL database

Schema structure

schema, predefined schema

This ensures data consistency and integrity but can limit flexibility.

schema-on-write (the traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it)

schemaless or flexible, dynamic schema, there is an implicit schema, but it is not enforced by the database.

schema-on-read (the structure of the data is implicit, and only interpreted when the data is read)

Storage model

Relational model

Adopts a fixed storage model with data stored in rows and columns, suitable for structured data but less efficient for varied data types.

Offers diverse storage models like document, key-value, column-family, JSON or XML documents, and graph databases.

A hierarchical structure of arrays and key value pairs

Transaction properties

ACID (Atomicity, Consistency, Isolation, Durability)

Systems that do not meet the ACID criteria are sometimes called BASE, which stands for Basically Available, Soft state, and Eventual consistency

Query language

Utilizes SQL Structured Query Language, offering a standardized and powerful tool for complex data retrieval and manipulation.

Often employs non-standardized querying languages or simple API calls, providing specialized queries but lacking a universal language.

Scaling approach

Typically relies on vertical scaling, expanding by enhancing the power of existing servers, which can become costly and limited in scalability.

Designed for horizontal scaling, allowing the addition of more servers to handle larger volumes of data, ideal for big data applications.

Data organization

Employs data normalization to reduce redundancy and improve integrity, leading to cleaner databases but potentially more complex queries.

Often uses denormalization, duplicating data across tables to improve read performance at the expense of data redundancy.

Data locality for queries. There is a performance advantage to this storage locality.

Data relationships

Ideal for managing complex data relationships, with robust support for foreign keys and joins.

many-to-one — foreign key

many-to-many — foreign key

one-to-many — should joins many tables

use document reference in the document model

many-to-one — reference

many-to-many — reference

one-to-many — storing nested records (Figure 2-2)

Use cases

Well-suited for applications requiring complex queries, transactional integrity, and structured data, such as financial systems.

Preferred for applications dealing with large volumes of unstructured or semi-structured data, requiring scalability and flexibility, like social media platforms.


如果你有一張 one-to-many 的表,其為樹狀結構,是 user profile 跟 user positions, educational history 跟 contact information,比起你去 JOIN 多張表 educational history, user positions, contact information,用 JSON 去表示這個樹狀結構會更簡潔明瞭。

前面有說,現在 RDBMS 也有越來越多 NoSQL 資料庫的功能。看完這些差異後,舉 PostgreSQL 這個我們常聽到的 RDBMS 為例,讓我們來看看他具有哪些 NoSQL 資料庫特性吧《PostgreSQL 具有的 NoSQL 特性》!

實際案例

用正規化的 table 拿 user profile 資料,要 JOIN 4 張 table
SELECT
u.id AS user_id,
u.custom_id,
u.external_id,
up.nickname,
a.id AS avatar_id,
a.uri AS avatar_uri,
a.thumbnail_uri AS avatar_thumbnail_uri,
up.onboarding_interest_labels,
up.remaining_free_avatar_count,
up.has_onboarding_free_avatar,
u.email,
u.custom_id_updated_time,
us.is_onboard
FROM user_profiles AS up
JOIN users AS u ON up.user_id = u.id
JOIN user_status us ON us.user_id=u.id
LEFT JOIN avatars AS a ON up.avatar_id = a.id
WHERE u.id=197764038602182656;

如果把資料包成 JSON 格式先存進 user_profile_json table 的話,拿資料時只要下一行就搞定
select * from user_profile_json where id=197764038602182656;

再來看看 EXPLAIN 的分析

速度大概快了 4 倍!好讚!


Comments

Popular posts from this blog

《 Imgproxy 使用分析一:圖片下載速度優化分析:Akamai CDN vs Imgproxy 效能比較》

《 Akamai + S3 與 CloudFront + Imgproxy + S3 使用分析二:壓縮圖片設計流程:檔案大小 vs 載入時間的權衡》

程式語言初學者 Docker 入門第二章 —— 安裝與驗證 (Mac)