"Implement Idempotency With API Gateway Plugin", Gopher Day 2024, Gaston Chiu

July 14, 2024

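As a backend engineer writing transaction-related APIs, have you ever run into something like this?

A user is buying something in your app. The client sends a request to place the order and charge the payment, and the server completes the charge. The response is then lost (a dropped packet, a broken connection, and so on), so the client concludes the request failed and resends it through its retry mechanism. The user is charged twice.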

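This scenario reveals three problems:

Some APIs are not idempotent, so they need extra care
The network is unreliable, and messages can be lost
Retries are dangerous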

Now let's look at how to prevent duplicate operations caused by the client and server disagreeing about whether a request succeeded.

===== Background =====

First, some definitions:

What is idempotency?
What is an API gateway?

1. Idempotency

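When building API services, you often need to pay attention to idempotency.

An idempotent method is one where

the result is the same no matter how many times the request is sent.

By this definition, DELETE and PUT, as well as the safe method GET, are idempotent methods.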

GET v1/backend/members

# response 200
[
  {
    "id": 1,
    "name": "Eric"
  },
  {
    "id": 2,
    "name": "Tina"
  },
  {
    "id": 3,
    "name": "Rachel"
  }
]

DELETE v1/backend/members/1

# response 204 No Content (empty body)

PUT v1/backend/members/1

# request
{
  "name": "Eric Wang"
}

# response 204 No Content (empty body)

PATCH is a different story. Semantically, PATCH means modifying data, so a request might look like this:

PATCH http://shop.com/item?id=1

# body
{
  "title": "new title"
}

This request only updates the title. It fits PATCH semantics and is idempotent:
sending it 100 times or once leaves the title as "new title".

But there is another possibility:

PATCH http://shop.com/item/add?id=1

# body
{
  "number": 10
}

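This request means "increase the item's quantity by 10". It still fits PATCH semantics (modifying data), but it is not idempotent.

Sending it 100 times adds 1,000 items.

POST goes without saying: sending it once and sending it 100 times are certainly not the same.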

ref
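To make the PATCH contrast concrete, here is a minimal Go sketch. The `Item` type and both functions are hypothetical stand-ins for the two PATCH endpoints above: assigning a title is idempotent, while adding to a quantity is not, so 100 repetitions end at 1,000.

```go
package main

import "fmt"

// Item is a hypothetical stand-in for the shop item in the PATCH examples.
type Item struct {
	Title    string
	Quantity int
}

// setTitle mirrors PATCH /item?id=1 {"title": ...}: it assigns a value,
// so repeating it leaves the same final state.
func setTitle(it *Item, title string) { it.Title = title }

// addQuantity mirrors PATCH /item/add?id=1 {"number": ...}: it increments,
// so every repetition changes the state again.
func addQuantity(it *Item, n int) { it.Quantity += n }

func main() {
	once, many := &Item{}, &Item{}

	setTitle(once, "new title")
	addQuantity(once, 10)

	for i := 0; i < 100; i++ {
		setTitle(many, "new title")
		addQuantity(many, 10)
	}

	fmt.Println(once.Title == many.Title)     // true: assignment is idempotent
	fmt.Println(once.Quantity, many.Quantity) // 10 1000: increment is not
}
```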

2. API Gateway

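If you use a microservice architecture, you will very likely put an API gateway in front of your microservices. The gateway forwards client requests to the appropriate service, acting as a router and performing basic validation.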

Step 1 - The client sends an HTTP request to the API gateway.

Step 2 - The API gateway parses and validates the attributes in the HTTP request.

Step 3 - The API gateway performs allow-list/deny-list checks.

Step 4 - The API gateway talks to an identity provider for authentication and authorization.

Step 5 - The rate limiting rules are applied to the request. If it is over the limit, the request is rejected.

Steps 6 and 7 - Now that the request has passed basic checks, the API gateway finds the relevant service to route to by path matching.

Step 8 - The API gateway transforms the request into the appropriate protocol and sends it to backend microservices.

Steps 9-12 - The API gateway handles errors and deals with faults that take a longer time to recover from (circuit breaking). It can also leverage the ELK (Elasticsearch-Logstash-Kibana) stack for logging and monitoring. We sometimes cache data in the API gateway.

ref

===== End of background =====

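Solution

Add an idempotency key to build a safe retry mechanism for payments

The client and server agree on a unique key that identifies the transaction, and the client sends it in a header. When the server receives a request with the same idempotency key and the same payload hash, it can return success (2xx) directly instead of executing the operation again.

POST is where an idempotency key is most needed, because it is not idempotent (the same goes for non-idempotent PATCH requests such as the quantity example). GET, DELETE, and PUT do not need one, because they are idempotent: executing them repeatedly yields the same result.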

The implementation follows the RFC.

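Summary of cases

If a request with a given idempotency key has already completed successfully and its result is stored in the cache, then when a second request with the same key arrives, the server finds the cached result and can return 201 directly, indicating that the payment was created.

If the body/header hash of the second request differs from the hash stored for the same idempotency key, the server returns 422 Unprocessable Content.

Idempotency header missing - 400
Different request payload with the same idempotency key - 422
Original request still being processed when the retry arrives - 409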
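The cases above can be expressed as a small decision function. This is a sketch with a hypothetical in-memory `record` type, not the speaker's code; it assumes the server keeps, per key, the request fingerprint, whether processing has finished, and the stored status.

```go
package main

import (
	"fmt"
	"net/http"
)

// record is a hypothetical stored entry for one idempotency key.
type record struct {
	fingerprint string
	done        bool // false while the original request is still being processed
	status      int  // the stored response status once done
}

// decide maps the cases above to an HTTP status. ok=false means "no stored
// record: process the request normally".
func decide(key, fingerprint string, store map[string]record) (status int, ok bool) {
	if key == "" {
		return http.StatusBadRequest, true // header missing
	}
	rec, found := store[key]
	if !found {
		return 0, false // first time this key is seen
	}
	if rec.fingerprint != fingerprint {
		return http.StatusUnprocessableEntity, true // same key, different payload
	}
	if !rec.done {
		return http.StatusConflict, true // original still in flight
	}
	return rec.status, true // replay, e.g. 201 Created
}

func main() {
	store := map[string]record{
		"k1": {fingerprint: "abc", done: true, status: http.StatusCreated},
		"k2": {fingerprint: "def", done: false},
	}
	fmt.Println(decide("", "abc", store))   // 400 true
	fmt.Println(decide("k1", "abc", store)) // 201 true
	fmt.Println(decide("k1", "xyz", store)) // 422 true
	fmt.Println(decide("k2", "def", store)) // 409 true
	fmt.Println(decide("k3", "abc", store)) // 0 false
}
```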

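💡 Aside: learning from TCP

In an unreliable distributed system, how do we avoid the problems that retries cause for non-idempotent operations?

Take a page from TCP. After all, it is the granddaddy of reliable transport on top of unreliable IP.

Every TCP segment header carries a sequence number (seq). With it, the two sides of a TCP connection can handle connection establishment, flow control, connection teardown, and so on, in an unreliable environment.[2]

Take TCP's 3-way handshake as an example: the two sides use seq to confirm whether messages were sent and received successfully.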
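The handshake can be written out as data. This Go sketch models only the SYN/ACK flags and the seq/ack numbers (real segments carry much more, and real initial sequence numbers are chosen randomly). Each side acknowledges the other's sequence number with seq+1, which lets a receiver tell a stale or duplicated segment apart from a new one, much as an idempotency key identifies a retried request.

```go
package main

import "fmt"

// segment is a simplified TCP header: only the fields the handshake uses.
type segment struct {
	SYN, ACK bool
	Seq, Ack uint32
}

// handshake returns the three segments of a 3-way handshake, given the
// client's initial sequence number x and the server's y (arbitrary here).
func handshake(x, y uint32) [3]segment {
	return [3]segment{
		{SYN: true, Seq: x},                        // client -> server: SYN, seq=x
		{SYN: true, ACK: true, Seq: y, Ack: x + 1}, // server -> client: SYN-ACK, ack=x+1
		{ACK: true, Seq: x + 1, Ack: y + 1},        // client -> server: ACK, ack=y+1
	}
}

func main() {
	for _, s := range handshake(100, 300) {
		fmt.Printf("%+v\n", s)
	}
}
```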

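Implementation

Finally, the implementation.

Generating an Idempotency-Key

IDs generated by program logic fall roughly into three categories:

Purely random: e.g., UUID v4
Locally monotonically increasing: e.g., UUID v1
Globally monotonically increasing: e.g., Snowflake, Sonyflake, Leaf

Recommendation: Idempotency-Key: a random string, UUID v4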
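For reference, a UUID v4 is 122 random bits plus fixed version and variant bits. A minimal standard-library sketch follows; in practice a library such as `github.com/google/uuid` (`uuid.NewString()`) is the usual choice.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 builds a random (version 4) UUID from 16 bytes of crypto/rand,
// setting the version and variant bits as RFC 4122 / RFC 9562 specify.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // variant 10xx
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	key, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println("Idempotency-Key:", key)
}
```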

PostgreSQL's uuid type performs very well and is up to the task. Recommendation: if you want your idempotency key to be random, store the UUID v4 directly in a PostgreSQL uuid column.

ref

Calculate Fingerprint

Hash the payload. The hash can cover all fields or only some of them:

Checksum of the entire request payload
Checksum of selected element(s) in the request payload
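A minimal sketch of both options in Go, assuming a JSON object body; the `fingerprint` helper and its parameters are hypothetical, and whitelisted headers are left out for brevity. Sorting the field names means two bodies with the same fields in a different order produce the same hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// fingerprint hashes the URL plus the selected top-level fields of a JSON
// body. With fields == nil it covers every field (the whole payload).
func fingerprint(url string, body []byte, fields []string) (string, error) {
	var m map[string]any
	if err := json.Unmarshal(body, &m); err != nil {
		return "", err
	}
	// Copy so sorting never reorders the caller's slice.
	keys := append([]string(nil), fields...)
	if fields == nil {
		for k := range m {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	h := sha256.New()
	h.Write([]byte(url))
	for _, k := range keys {
		v, err := json.Marshal(m[k]) // nested objects marshal with sorted keys
		if err != nil {
			return "", err
		}
		h.Write([]byte{0}) // separators keep "ab"+"c" distinct from "a"+"bc"
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write(v)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	a, _ := fingerprint("/payments", []byte(`{"amount":100,"currency":"TWD"}`), nil)
	b, _ := fingerprint("/payments", []byte(`{"currency":"TWD","amount":100}`), nil)
	c, _ := fingerprint("/payments", []byte(`{"amount":200,"currency":"TWD"}`), nil)
	fmt.Println(a == b, a == c) // true false
}
```

Hashing only selected fields lets clients change harmless fields (a note, a client timestamp) between retries without tripping the 422 check.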

Pseudo Code:

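key := req.Header.Get("Idempotency-Key")
fingerprint := Fingerprint(url + body + whitelisted(headers))
lockValue := Lock(key)       // Lock returns a random value identifying the holder
defer Unlock(key, lockValue) // Unlock only succeeds with that same value
cachedResponse := GetFromCache(key, fingerprint)
if cachedResponse != nil {
    return cachedResponse // replay the stored response
}
resp := Process(req)
SetCache(key, fingerprint, resp) // store the result for future retries
return resp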

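Compute the request fingerprint from the URL, body, and whitelisted headers
Lock the idempotency key

Locking returns a value; when unlocking, pass that value back to the idempotency service's unlock call

Check whether a cached response already exists for this key + fingerprint
If not, process the request and store the result in the cache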
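Putting these steps together, here is a self-contained in-memory sketch. The `store` type is a hypothetical stand-in for Redis acting as lock server and cache server. Unlike the pseudocode, it takes the lock without blocking and returns 409 if the key is already held, and it returns 422 when the key is cached with a different fingerprint, matching the case list earlier.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"sync"
)

// response is the stored result of a processed request.
type response struct {
	Status int
	Body   string
}

// entry pairs a cached response with the fingerprint that produced it.
type entry struct {
	fp   string
	resp response
}

// store is an in-memory stand-in for Redis as lock server + cache server.
type store struct {
	mu    sync.Mutex
	locks map[string]string // idempotency key -> random lock value
	cache map[string]entry  // idempotency key -> cached response
}

func newStore() *store {
	return &store{locks: map[string]string{}, cache: map[string]entry{}}
}

// lock returns a random value that must be presented to unlock;
// ok=false means the key is already held by an in-flight request.
func (s *store) lock(key string) (value string, ok bool) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, held := s.locks[key]; held {
		return "", false
	}
	s.locks[key] = hex.EncodeToString(b)
	return s.locks[key], true
}

// unlock deletes the lock only if the caller still owns it.
func (s *store) unlock(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks[key] == value {
		delete(s.locks, key)
	}
}

// handle follows the steps above: lock, look up the cache, process, cache.
func handle(s *store, key, fp string, process func() response) response {
	value, ok := s.lock(key)
	if !ok {
		return response{http.StatusConflict, "original request still in progress"}
	}
	defer s.unlock(key, value)

	s.mu.Lock()
	e, hit := s.cache[key]
	s.mu.Unlock()
	if hit && e.fp != fp {
		return response{http.StatusUnprocessableEntity, "payload differs for this key"}
	}
	if hit {
		return e.resp // replay the stored result; process is not called again
	}

	resp := process()
	s.mu.Lock()
	s.cache[key] = entry{fp, resp}
	s.mu.Unlock()
	return resp
}

func main() {
	s := newStore()
	charges := 0
	charge := func() response { charges++; return response{http.StatusCreated, "payment created"} }

	first := handle(s, "key-1", "fp-a", charge)
	retry := handle(s, "key-1", "fp-a", charge) // same key, same payload
	other := handle(s, "key-1", "fp-b", charge) // same key, different payload
	fmt.Println(first.Status, retry.Status, other.Status, charges) // 201 201 422 1
}
```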

Architecture

microservice

Kong gateway plugin

Kong API Gateway

The idempotency logic is split out into its own service, which the gateway plugin calls

API Gateway Plugin

Idempotency service

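Request lock implementation; the request carries the idempotency key and the lock duration:

Request lock

Premise: Redis serves as both the lock server and the cache server
Random value: used when unlocking, to guarantee that whoever unlocks is the same party that locked
Go's redsync module is used for synchronization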

With only the gateway plugin

With the gateway plugin + Idempotencier Service

Two ways to acquire the lock:

1. On the Idempotencier Service

2. On Redis

Lock

Create a microservice for gateway plugin (GetCache)

Create a microservice for gateway plugin (Unlock & SetCache)

Unlock

Lock Storage

Locks can be stored in the following ways:

Redis replication: easy to set up, but may lose data

If the master is killed before it has propagated a write to its replica, the node promoted to master does not have that data in memory (nothing was persisted).
- Pros:
  - Easy to set up and maintain
- Cons:
  - Potential data loss during failover
Redlock: the Redis distributed lock algorithm
- Pros:
  - Safer for data
- Cons:
  - Complicated to set up and implement
Martin Kleppmann, author of DDIA (Designing Data-Intensive Applications): if a GC pause lasts longer than the lock timeout, Redlock can fail as well

https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
AWS MemoryDB for Redis: adds a transaction log to Redis so that failover does not cause data loss
- Pros:
  - Removes the risk of losing data
  - Redis compatible
- Cons:

The speaker ultimately chose Redis replication and may later migrate to MemoryDB for Redis.

ref


青技術