Ebay / Randy Shoup - QCon Tokyo 2009

よいまとめ。
QCon Tokyo 2009 に行ってきました。二日目のメモ - kawaguti の日記　(id:wayaguchi)

Ebay / Randy Shoup

Partition Everything
Pattern: Functional Segmentation
Pattern: Horizontal Split
Load-balance processing
Split data along primary access path: partition by modulo of a key, range, lookup etc

Corollary: No Session State
User session flow moves through multiple application pools
absolitely no session state ni application layer
Session state maintained in cookie, URL, or database

Best Practice 2: Asynchrony Everywhere
Prefer asyncronous processing
Move as much processing as possible to asynchronous flows
Where possible, integrate components asyncronously

Motivations
Scalability: Can independently scale component A and B
Availability: Allows component A or B to be temporarily unavailable, Can retry operations
Latency: Can reduce user experience latency, Can allocate more time to processing than user would toralate
Cost

Pattern: Event Queue
Primary application write data and produces event
Consumers subscribe to event
At least once delivery, rahter than exactory once
No Guaranteed order, rather than in-order
Idempotency and readback: イベント自体にはデータが入っていないので、データはマスタから読み出す

Pattern: Message Multicast
Search Feeder publishes item updates
Read item updates from primary database
Publishes sequensed updates via multicast to search grid

Search engines listen to assigned subset of messages
Update in-memory index in real time
Request recovery when messages are missed

Best Practice 3: Automate Everything
Prefer Adaptive / Automated systems to Maunal Systems
Pattern: Adaptie configuration
Do not manually configure event consumers
Define SLA for a given consumer
ex. process 99% of events within 15 seconds

Each consumer dynamically adjusts to meet defined SLA
Consumers automatically adapt to changes in
load
Event processing time
Number of consumer instances

Pattern: Machine Learning
Dynamicaly adapt search experience
Feedback loop enables system to learn and improve over time
collect user behaveor

Best Practice 4: Remember Everythig Fails
Build all systems to be tolerant
Pattern: Failure Detection
Servers log all requests
Over 2TB of log messages over day

Listeners automate failure detection and notification

Pattern: Rollback
Absolutely no changes to the site which cannot be undone(!)
Every features has on/off state driven by central configuration
Features can be immediately turned off for operational or business reasons
Features can be deployed "write-off"

Pattern: Graceful Degradation
Application "marks down" a resource if it is unavailable or distressed
Application removes or ignores non-critical operations
Application retries critical operations or defers them to an asynchronous event

Embrace Inconsistency
Brewer's CAP Theorem
any shared-data system cannot have at most two of the following properties: Consistency/Availability/Partition-tolerance

This trade-off is fundamental to all distributed systems.
Chooose Appropriate Consistency Guarantees
Immediate Consistency: Bids, Purchases
Eventual Consistency: Search Engine, Billing system etc
No Consistency: Preferences

Avoid Distributed Transactions
no two-phase commit
Minimize inconsistency through state machines and careful ordaring of database operations
Reach eventual consistency

Availabilty はギブアップした

[感想] すごく実践的な内容で、大規模サイト構築の悩むポイントがつまっていました。午前中の丸山先生の講演でも出てましたが、CAP理論で「２つまでしか成立しない」と言っている部分へのそれぞれの会社の対応がすごく興味深いですね。

A Life in Shinjuku.: 大規模ウェブサイトのベストプラクティス −eBayでの事例−

eBayシステムのプロファイル概略
* トランザクションデータは2ペタバイト
* DWH用データは50ペタバイト
* 1日あたり48億回のSQL実行
ペタバイトとか、もう多いか少ないかもわからん。
大規模システムの原則
ということで、eBayのシステム構築の原則。
* Partition Everything
* Asynchrony Everywhere
* Automate Everything
* Remember Everything Fails
* Embrace Inconsistency
なんだか清々しいです。
Partition Everything
分割方法は、データ、負荷、使用方法の特徴など、色々なものがあげられる。分割の動機としては…
* Scalability
* Availability
* Manageability
* Cost
このような感じ。PartitionとAvailabilityとは、Partitionによって故障箇所を局所化することでAvailabilityを向上できる、という点で関連する。PartitionとCostとは、Partitionによって問題を細分化することで、 price/performance比がベストの点を選択できるようになる、という点で関連する。特に、Availabilityを重視している印象を受けた。
eBayでは、いくつかの方法でシステムを分割している。
■機能による分割（処理）
* Selling
* Search
* ViewItem
* Bidding
* Checkout
* Feedback
■機能による分割（データ）
* User
* Item
* Transaction
* Product
* Account
* Feedback
■水平的な分割
* ノード分割＋ロードバランシング
* データをアクセスパスにしたがって分割（RDBのパーティションのことだと思う）
また、Partitionのため、セッションステートはアプリケーションティアに保持しないようにしている。CookieやURL、DBに持っているらしい。この辺は王道。
Asynchrony Everything
非同期化のメリットは…
* ピークロード（最大負荷）ではなくアベレージロード（平均負荷）をターゲットにリソースを配分することができる点
* ユーザーが許容可能な時間を越えて、リクエストを処理することができる点
# 他にもいくつかメリットがあったけど、個人的に重要だと思ったのだけなんとかメモ ^^;
非同期化の方法には、キューイングやマルチキャストメッセージがある。
eBayで採用しているキューでは順序性は保証していないが、メッセージ自体にはデータは入れず、イベント通知を受けた側がその時点で正しいデータを読みに行く（Read Back）ことで整合性の取れた処理を行えるようにしているらしい。
マルチキャストメッセージは、商品情報などのデータがアップデートされた際に、サーチエンジンのインデックス更新を通知する場合などに使われている。 Feed Daemonというプロセスが定期的にデータ更新をポーリングしており、更新があった場合（eBayの場合、ほぼ確実にあると思うが）に各サーチエンジンにマルチキャストメッセージを送信する、という形を取っている。
Automate Everything
興味深い例として、ユーザーの振る舞い（サイト上でどのように行動したか）情報を収集し、それに応じてシステムを最適化する、という一連のアクティビティを自動化している例が紹介された。
収集したデータをもとにメタデータを更新し、システムがその振る舞いを変えるところまで自動化されているのが面白い。データ収集や、分析のネタ作りの自動化くらいまでは結構やってると思うんだけど、ここまでやっているのは珍しい。いいアイデアのヒントをもらった。
Remember Everyghint Fails
システム上のすべての変更は、すべてもとに戻せるようにしているらしい。これは、データだけではなく、システム機能に対する変更にもあてはまるようで、すべての機能はコンフィグでON/OFFが切り替えられるそうだ。データ更新に対するロールバックをどう扱っているかに興味があるが、それは次の「Embrace Inconsistency」で扱われるトピックなのだろう。
また、Graceful Degrationも可能らしい。まぁ、このレベルだとある意味当たり前と言える。
Embrace Inconsistency
ここで、満を持して（？）CAP Theoremが登場。このセミナーで3回目。はじめて、CAPの正確な定義がスライド上に登場する。が、はやすぎてメモれず ;-(
* C : all clients see the same data even if system ...
* A : all clients will get a response even if system failure exist...
* P : 時間切れ ;-(
# おそらく、AmazonのCAPの定義と大差ないはず。でも、誰か知ってたら教えてください… ;-(
さて、ここでの基本的な主張は、もうおなじみ「Consistencyは"あり"か"なし"かの2者択一ではなく、Immediate ConsistencyとNo Consistencyの間に多数のConsistencyレベルがスペクトラムのように存在する」というもの。

-

Immediate <------- Eventually -------> No Consistency
Bids/Purchase Search Engine/ Preferences
Billing System

-

金融システムにおいてすら、必ずしもImmediate Consistencyが必要とは限らない。ちなみに、eBayでは分散トランザクションはまったく使っていない（これが噂のTransactionlessか）。そのかわり、DB操作の順序を厳密に決定することで、システム全体の整合性を高めている。
# ステートマシンを使っている、という発言もあったけど、分析に使っているんだろうか？