Goal
- Build a distributed Payment Gateway Service
- Use a PSP (Payment Service Provider, e.g. Stripe, Razorpay)
- Handle high traffic load, auth measures and failures
Non-Functional Requirements
- High Availability (99.9% uptime)
- Consistency over availability when forced to choose between them
- Atomicity: every transaction ends as pass or fail, never "maybe" (even if that is inconvenient for the user)
- Reliability, dealing with:
  - Cascading failures -> one dead service must not jam up traffic for the rest of the system
  - Service down -> detect when services are down and mitigate
  - Idempotency -> if a user submits the same payment multiple times, execute it only once within some TTL
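To put the 99.9% uptime target in concrete terms, the downtime it allows is plain arithmetic. The helper name below is hypothetical, and the sketch assumes a 365-day year and a 30-day month:

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Allowed downtime (minutes) over a period, for an availability given as a fraction."""
    return (1.0 - availability) * period_days * 24 * 60


per_year = downtime_budget_minutes(0.999, 365)  # about 525.6 minutes
per_month = downtime_budget_minutes(0.999, 30)  # about 43.2 minutes
print(f"{per_year / 60:.2f} h/year, {per_month:.1f} min/month")  # 8.76 h/year, 43.2 min/month
```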
Components
- Payment Service: interacts with the PSP and, if the payment passes, publishes it to the MQ (message queue)
- PSP -> Stripe / Razorpay (interacts with the banks)
- Wallet Service: aggregates account value (balances)
- Ledger Service: keeps user details and the record of their transactions
- Service Heartbeat: with many services running, checks which ones are alive
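A minimal in-process sketch of how these components could fit together. All class names are hypothetical, `FakePSP` only stands in for Stripe/Razorpay (it is not their API), and a `queue.Queue` stands in for the MQ. The Ledger consumes events from the queue and the Wallet aggregates from the Ledger:

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentEvent:
    payment_id: str
    user_id: str
    amount_cents: int


class FakePSP:
    """Hypothetical stand-in for a real PSP: declines non-positive or over-limit amounts."""

    def __init__(self, limit_cents: int = 100_000):
        self.limit_cents = limit_cents

    def charge(self, user_id: str, amount_cents: int) -> bool:
        return 0 < amount_cents <= self.limit_cents


class PaymentService:
    """Charges through the PSP and publishes to the MQ only if the charge passed."""

    def __init__(self, psp: FakePSP, mq: queue.Queue):
        self.psp = psp
        self.mq = mq

    def pay(self, payment_id: str, user_id: str, amount_cents: int) -> bool:
        ok = self.psp.charge(user_id, amount_cents)
        if ok:
            self.mq.put(PaymentEvent(payment_id, user_id, amount_cents))
        return ok


class Ledger:
    """Append-only record of successful payments."""

    def __init__(self):
        self.entries = []

    def record(self, event: PaymentEvent) -> None:
        self.entries.append(event)


class Wallet:
    """Aggregates a user's account value from the ledger entries."""

    def __init__(self, ledger: Ledger):
        self.ledger = ledger

    def total_cents(self, user_id: str) -> int:
        return sum(e.amount_cents for e in self.ledger.entries if e.user_id == user_id)


def drain(mq: queue.Queue, ledger: Ledger) -> None:
    """Consumer side: move every queued event into the ledger."""
    while True:
        try:
            ledger.record(mq.get_nowait())
        except queue.Empty:
            return


mq = queue.Queue()
ledger = Ledger()
svc = PaymentService(FakePSP(), mq)
svc.pay("p1", "alice", 2_500)
svc.pay("p2", "alice", 500_000)  # declined by the fake PSP, nothing published
drain(mq, ledger)
print(Wallet(ledger).total_cents("alice"))  # 2500
```

The charge and the publish are two separate steps here, so a crash between them would leave a charged payment that never reaches the Ledger; a real system needs to handle that (for example with the outbox pattern), which this sketch ignores.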
Solutions
Cascading Failures -> apply rate limiting based on 2 signals:
  - "x%" of users see latency >= "y" within a "z" time window
  - Service does not answer after "k" pings -> stop accepting new messages of that type until the old ones are processed (effectively a circuit breaker)
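A sketch of the two signals, assuming an injected clock (`now` in seconds) and measuring the "x%" over requests rather than distinct users to keep it short. `LatencyGuard` and `PingBreaker` are hypothetical names:

```python
from collections import deque


class LatencyGuard:
    """Throttle when at least fraction x of requests in the last z seconds took >= y seconds."""

    def __init__(self, x: float, y: float, z: float):
        self.x, self.y, self.z = x, y, z
        self.samples = deque()  # (timestamp, latency) pairs, oldest first

    def observe(self, now: float, latency: float) -> None:
        self.samples.append((now, latency))

    def should_throttle(self, now: float) -> bool:
        # Evict samples older than the z-second window.
        while self.samples and self.samples[0][0] <= now - self.z:
            self.samples.popleft()
        if not self.samples:
            return False
        slow = sum(1 for _, latency in self.samples if latency >= self.y)
        return slow / len(self.samples) >= self.x


class PingBreaker:
    """Opens after k consecutive missed pings; rejects new messages until the backlog drains."""

    def __init__(self, k: int):
        self.k = k
        self.missed = 0
        self.open = False
        self.backlog = 0  # messages of this type accepted but not yet processed

    def _maybe_close(self) -> None:
        # Close only when the service answers again AND the old messages are done.
        if self.open and self.missed == 0 and self.backlog == 0:
            self.open = False

    def record_ping(self, alive: bool) -> None:
        if alive:
            self.missed = 0
            self._maybe_close()
        else:
            self.missed += 1
            if self.missed >= self.k:
                self.open = True

    def try_enqueue(self) -> bool:
        if self.open:
            return False  # shed load instead of piling onto a dead service
        self.backlog += 1
        return True

    def mark_processed(self) -> None:
        self.backlog = max(0, self.backlog - 1)
        self._maybe_close()


guard = LatencyGuard(x=0.5, y=1.0, z=10.0)
for t, latency in [(0, 0.2), (1, 1.5), (2, 2.0)]:
    guard.observe(t, latency)
print(guard.should_throttle(now=3))   # True: 2 of 3 requests took >= 1.0 s
print(guard.should_throttle(now=13))  # False: every sample has left the window

breaker = PingBreaker(k=3)
breaker.try_enqueue()                 # accepted, backlog = 1
for _ in range(3):
    breaker.record_ping(alive=False)
print(breaker.try_enqueue())          # False: breaker is open
breaker.record_ping(alive=True)
print(breaker.open)                   # True: the old message is still pending
breaker.mark_processed()
print(breaker.open)                   # False
```

Production circuit breakers usually add a half-open state that lets a few trial requests through before fully closing; here the breaker closes only once pings succeed again and the backlog is empty, matching the "until the old ones are processed" rule.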
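Idempotency
- Use an idempotency key (a unique value such as a UUID v4; a 32-bit number is too small to stay unique at high traffic) -> pass it in an HTTP header -> duplicate requests with the same key (e.g. when the user refreshes) are not executed again within a limited, adjustable TTL.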
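A minimal sketch of the server side, assuming an in-memory dict in a single process; `IdempotencyStore` and `run_once` are hypothetical names. The client generates the key once per payment attempt (here with `uuid.uuid4()`) and resends the same key on refresh or retry (Stripe, for example, reads it from an `Idempotency-Key` header):

```python
import time
import uuid


class IdempotencyStore:
    """Remembers the result for each idempotency key for ttl_seconds (single process, in memory)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # key -> (expires_at, result); expired entries are overwritten on reuse, never purged
        self._results = {}

    def run_once(self, key: str, action):
        now = self.clock()
        hit = self._results.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # duplicate within TTL: replay the stored result, no second charge
        result = action()
        self._results[key] = (now + self.ttl, result)
        return result


charges = []


def charge():
    charges.append("charge")
    return f"charged #{len(charges)}"


store = IdempotencyStore(ttl_seconds=60)
key = str(uuid.uuid4())               # generated once by the client, sent in an HTTP header
print(store.run_once(key, charge))    # charged #1
print(store.run_once(key, charge))    # charged #1 (the refresh is replayed)
print(len(charges))                   # 1
```

A dict is not safe across multiple gateway instances or concurrent requests; that would need a shared store with an atomic set-if-absent plus expiry (for example Redis `SET key value NX EX ttl`), and a request still in flight under the same key also has to be handled.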
High Availability
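- Hash-based sharding (keep the hash function behind an abstraction, since it may change). What it solves:
  - if one shard is down, only the keys on that shard are affected; the other shards keep serving
  - if user traffic is high, spread it by opening connection pools on the other DB shards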
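A sketch of keeping the hash function abstract: the router accepts any `str -> int` function, with a SHA-256-based default because Python's built-in `hash()` is randomized per process for strings. `ShardRouter` and the shard names are hypothetical:

```python
import hashlib
from typing import Callable, Sequence

HashFn = Callable[[str], int]


def sha256_hash(key: str) -> int:
    # Stable across processes and machines (unlike the built-in hash() for str).
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ShardRouter:
    """Maps a key such as a user ID to a shard via a swappable hash function."""

    def __init__(self, shards: Sequence[str], hash_fn: HashFn = sha256_hash):
        self.shards = list(shards)
        self.hash_fn = hash_fn

    def shard_for(self, key: str) -> str:
        return self.shards[self.hash_fn(key) % len(self.shards)]


router = ShardRouter(["db-0", "db-1", "db-2"])
print(router.shard_for("user-42"))  # same shard every time for this user
```

With `hash % N`, changing the number of shards remaps most keys, which is one concrete reason to keep the function swappable (e.g. to move to consistent hashing later). Each shard name could also key its own connection pool.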
- Replication can be:
  - Async -> more available, but replicas can lag behind the primary, and we need consistency
  - Sync -> slower (less available), but stronger consistency (better for this case)
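A toy model of when a write is acknowledged under each mode, with plain dicts standing in for databases; `Primary` and `Replica` are hypothetical:

```python
class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    """Toy primary showing when a write is acknowledged under each replication mode."""

    def __init__(self, replicas, sync: bool):
        self.data = {}
        self.replicas = replicas
        self.sync = sync
        self.pending = []  # async writes not yet shipped to replicas

    def write(self, key, value) -> str:
        self.data[key] = value
        if self.sync:
            # Sync: every replica has the write before the client gets its ack.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Async: ack immediately; if the primary dies before flush(), the write is lost.
            self.pending.append((key, value))
        return "ack"

    def flush(self) -> None:
        # Background shipping of queued writes (async mode only).
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


sync_replica, async_replica = Replica(), Replica()
Primary([sync_replica], sync=True).write("txn-1", "SUCCEEDED")
async_primary = Primary([async_replica], sync=False)
async_primary.write("txn-1", "SUCCEEDED")
print(sync_replica.data.get("txn-1"))   # SUCCEEDED
print(async_replica.data.get("txn-1"))  # None: not replicated yet
async_primary.flush()
print(async_replica.data.get("txn-1"))  # SUCCEEDED
```

Real databases often let sync mode wait for a configurable subset of replicas rather than all of them, trading some latency back for availability.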