API Gateway
An API Gateway is a server that acts as the single entry point for all client requests in a microservices architecture. Instead of clients knowing the addresses of dozens of backend services, they send all requests to the gateway. The gateway handles cross-cutting concerns — authentication, rate limiting, routing, protocol translation — and forwards requests to the appropriate upstream services.
What Is an API Gateway
In a monolithic architecture, clients talk directly to one server. In a microservices architecture, a single user action may require calls to five different services — each with its own address, protocol, and authentication scheme. Exposing this complexity directly to clients creates tight coupling: every time a service is added, split, or renamed, every client must be updated.
An API gateway abstracts the internal service topology. Clients see a single, stable API surface. Internally, services can be reorganized without affecting clients. The gateway acts as a facade over the entire backend.
Core Functions
An API gateway consolidates concerns that would otherwise be duplicated across every service:
| Function | Without Gateway | With Gateway |
|---|---|---|
| Authentication | Each service validates tokens | Gateway validates once; services trust the gateway |
| Rate limiting | Each service enforces its own limits | Gateway enforces globally before requests reach services |
| TLS termination | Each service manages its own TLS | Gateway terminates client-facing TLS; internal traffic can be plain HTTP on a trusted network or re-encrypted with mTLS |
| Logging | Each service logs separately | Gateway provides a unified access log across all services |
| Routing | Clients know every service address | Clients know only the gateway; it knows where to route |
Routing and Load Balancing
The gateway inspects each incoming request and routes it to the correct upstream service based on configurable rules — typically matching on URL path, HTTP method, headers, or query parameters:
```text
GET  /api/users/*      → User Service
POST /api/orders       → Order Service
GET  /api/products/*   → Product Service
GET  /api/search?q=*   → Search Service
```
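The rules above can be sketched as an ordered list where the first match wins. This is a minimal illustration, not how any particular gateway implements routing: the `Route` type, the `resolve_upstream` function, and the upstream names are hypothetical, and matching covers only method and path (glob-style, with the query string ignored).

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class Route(NamedTuple):
    method: str        # HTTP method the rule applies to
    path_pattern: str  # glob-style path pattern, e.g. "/api/users/*"
    upstream: str      # hypothetical upstream service name


# Rules are checked in order; the first match wins.
ROUTES = [
    Route("GET", "/api/users/*", "user-service"),
    Route("POST", "/api/orders", "order-service"),
    Route("GET", "/api/products/*", "product-service"),
    Route("GET", "/api/search", "search-service"),
]


def resolve_upstream(method: str, path: str) -> Optional[str]:
    """Return the upstream for a request, or None if no rule matches."""
    path_only = path.split("?", 1)[0]  # ignore the query string when matching
    for route in ROUTES:
        if route.method == method.upper() and fnmatchcase(path_only, route.path_pattern):
            return route.upstream
    return None
```

A request that matches no rule returns `None`, which a gateway would usually turn into a 404 response.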
The gateway also performs load balancing across multiple instances of each service, using round-robin, least-connections, or weighted routing. It can perform health checks and automatically remove unhealthy instances from rotation — the same function a dedicated load balancer provides, but integrated with routing logic.
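As a rough sketch of round-robin selection combined with health state, the class below cycles through the instances currently marked healthy. The class name, method names, and instance addresses are assumptions for illustration; a real gateway would drive `mark_unhealthy` and `mark_healthy` from periodic health checks.

```python
from itertools import count


class RoundRobinBalancer:
    """Cycles through the healthy instances of one upstream service."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._counter = count()

    def mark_unhealthy(self, instance):
        # Called when a health check fails: take the instance out of rotation.
        self._healthy.discard(instance)

    def mark_healthy(self, instance):
        # Called when a health check passes again: put it back in rotation.
        if instance in self._instances:
            self._healthy.add(instance)

    def next_instance(self):
        candidates = [i for i in self._instances if i in self._healthy]
        if not candidates:
            raise RuntimeError("no healthy instances available")
        return candidates[next(self._counter) % len(candidates)]
```

For example, `RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])` alternates between the two addresses until one of them is marked unhealthy.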
Advanced routing capabilities include:
- Canary routing: Send 5% of traffic to a new service version, 95% to the stable version. Roll forward or back based on error rates.
- A/B routing: Route specific user segments to different service versions based on headers or cookies.
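- Shadow routing: Mirror production traffic to a new service and discard its responses, so production responses are unaffected. This tests the new version under real load with little user-facing risk, provided its side effects (writes, emails, payments) are isolated from production.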
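One way to implement the canary split is to hash a stable request attribute into a bucket from 0 to 99, so the same user always lands on the same version. This is a sketch under that assumption; `pick_version` and its parameters are hypothetical names, and some gateways split per request at random instead.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int = 5) -> str:
    """Deterministically assign a user to "canary" or "stable"."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Hashing keeps a user's experience consistent across requests, and rolling forward is just a matter of raising `canary_percent`.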
Authentication and Authorization
Without a gateway, every service must implement token validation — verifying JWTs, checking API keys, calling an identity provider. This is error-prone duplication: one service gets the auth logic wrong, and you have a security gap.
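The gateway validates credentials on every request before forwarding to upstream services. After validation, it injects the verified claims into request headers, so the upstream service can trust that X-User-Id: 42 and X-User-Role: admin are already verified and doesn’t need to re-validate. That trust only holds if the gateway strips any identity headers the client sent itself; otherwise a caller could forge them.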
Authorization policies (which roles can call which endpoints) can be enforced at the gateway or left to individual services. Coarse-grained authorization (only authenticated users can call /api/orders) belongs at the gateway. Fine-grained authorization (a user can only see their own orders) belongs in the service, which knows the domain context.
Don’t assume internal service-to-service traffic is safe just because it came through the gateway. A compromised internal service can make arbitrary requests. Apply mutual TLS (mTLS) for service-to-service communication and validate service identity, not just user identity. The gateway handles user auth; a service mesh handles service-to-service auth.
Rate Limiting and Throttling
The gateway is the ideal enforcement point for rate limiting — it sees all traffic before it reaches any service. Limits can be applied per API key, per user ID, per IP address, or globally per endpoint.
Common rate limiting strategies at the gateway:
- Token bucket: Each client has a bucket that refills at a fixed rate. Requests consume tokens; if the bucket is empty, the request is rejected with HTTP 429.
- Fixed window: Allow N requests per time window (e.g., 1000 requests per minute). Simple, but a client can send up to 2N requests in a short span by bursting at the end of one window and the start of the next.
- Sliding window: Tracks requests over a rolling time window. Smoother than fixed window; slightly more expensive to compute.
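Rate-limited responses typically carry a Retry-After header, and many gateways also send X-RateLimit-Remaining (a common convention rather than a standardized header) so clients can adapt their request rate. Well-behaved clients respect these; the gateway enforces limits for those that don’t.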
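The token bucket strategy, together with the response headers mentioned above, can be sketched as follows. The class and parameter names are assumptions, the clock is injectable so the behavior can be tested without sleeping, and a real gateway would keep one bucket per API key, user, or IP, usually in shared storage rather than process memory.

```python
import math
import time


class TokenBucket:
    """One client's bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity  # start full
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now

    def try_acquire(self):
        """Return (status_code, headers) for one incoming request."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return 200, {"X-RateLimit-Remaining": str(int(self._tokens))}
        # Seconds until one whole token is available, rounded up.
        wait = math.ceil((1 - self._tokens) / self.refill_per_second)
        return 429, {"Retry-After": str(wait), "X-RateLimit-Remaining": "0"}
```

With `TokenBucket(capacity=2, refill_per_second=1.0)`, a client can burst two requests immediately and then sustain one request per second.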
Request Aggregation
A client loading a dashboard may need data from five services. Without aggregation, the client makes five sequential or parallel HTTP calls — each with its own latency and failure mode. The gateway can aggregate these into a single request: the client sends one call, the gateway fans out to five services in parallel, merges the responses, and returns a single result.
This reduces client-side complexity, eliminates multiple round-trip latencies from the client to the data center, and hides the internal service decomposition from the client. The tradeoff is that the gateway now contains aggregation logic — it’s no longer a dumb router but a smart orchestrator, which increases its complexity and coupling to service response shapes.
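A minimal sketch of the fan-out and merge step, using `asyncio.gather`. The three `fetch_*` coroutines are stand-ins for upstream HTTP calls (the text's dashboard uses five; three keep the example short), and the policy of returning partial data with an `errors` map is one possible design choice, not the only one.

```python
import asyncio


# Stand-ins for upstream calls; a real gateway would issue HTTP requests here.
async def fetch_profile(user_id: int) -> dict:
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada"}


async def fetch_orders(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": 1001, "total": 25.0}]


async def fetch_recommendations(user_id: int) -> list:
    await asyncio.sleep(0.01)
    raise ConnectionError("recommendation service unavailable")


async def load_dashboard(user_id: int, timeout: float = 0.5) -> dict:
    """Call all upstreams in parallel and merge the results into one response."""
    calls = {
        "profile": fetch_profile(user_id),
        "orders": fetch_orders(user_id),
        "recommendations": fetch_recommendations(user_id),
    }
    # return_exceptions=True keeps one failing upstream from failing the whole page.
    results = await asyncio.gather(
        *(asyncio.wait_for(call, timeout) for call in calls.values()),
        return_exceptions=True,
    )
    dashboard, errors = {}, {}
    for name, result in zip(calls, results):
        if isinstance(result, Exception):
            dashboard[name] = None
            errors[name] = type(result).__name__
        else:
            dashboard[name] = result
    dashboard["errors"] = errors
    return dashboard
```

Because the calls run concurrently, total latency is roughly that of the slowest upstream (bounded by `timeout`) rather than the sum of all of them.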
Backend for Frontend (BFF)
A Backend for Frontend is a specialized API gateway variant — a dedicated gateway per client type (web, mobile, third-party). Instead of one gateway that tries to serve all clients with a generic API, each BFF is tailored to its client’s needs:
- Web BFF: Returns full page data, handles session cookies, aggregates detailed responses for desktop UIs.
- Mobile BFF: Returns compact payloads, handles push notification tokens, pre-aggregates data to minimize mobile network calls.
- Partner API BFF: Exposes a stable, versioned API with strict backward compatibility for third-party consumers.
BFFs are owned by the frontend teams — the web team owns the web BFF, the mobile team owns the mobile BFF. This gives frontend teams autonomy over their API contract without requiring backend service changes. Backend services remain generic; BFFs are the translation layer.
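To illustrate the translation-layer idea, the sketch below shapes one generic backend record into a web payload and a mobile payload. The record, its field names, and both view functions are hypothetical; the point is only that each BFF projects the same backend data differently.

```python
# Hypothetical record, shaped the way a generic product service might return it.
PRODUCT = {
    "id": 17,
    "name": "Trail Shoe",
    "price_cents": 8999,
    "currency": "USD",
    "description": "Lightweight shoe with a grippy outsole for wet trails.",
    "images": [
        {"url": "https://example.com/img/17-large.jpg", "width": 2000},
        {"url": "https://example.com/img/17-thumb.jpg", "width": 200},
    ],
    "reviews": [{"rating": 5, "text": "Great grip."}, {"rating": 4, "text": "Runs small."}],
}


def format_price(cents: int, currency: str) -> str:
    return f"{cents // 100}.{cents % 100:02d} {currency}"


def web_view(product: dict) -> dict:
    """Web BFF: detailed payload for a desktop product page."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price": format_price(product["price_cents"], product["currency"]),
        "description": product["description"],
        "images": [image["url"] for image in product["images"]],
        "reviews": product["reviews"],
    }


def mobile_view(product: dict) -> dict:
    """Mobile BFF: compact payload with one small image and a rating summary."""
    smallest = min(product["images"], key=lambda image: image["width"])
    ratings = [review["rating"] for review in product["reviews"]]
    return {
        "id": product["id"],
        "name": product["name"],
        "price": format_price(product["price_cents"], product["currency"]),
        "thumbnail": smallest["url"],
        "rating": round(sum(ratings) / len(ratings), 1) if ratings else None,
    }
```

The backend record stays unchanged; if the mobile team later wants a different summary, only `mobile_view` changes.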
Implementations
Kong: Open-source, Lua-based, built on nginx (via OpenResty). Rich plugin ecosystem (auth, rate limiting, logging, transformations). Can be run on-premises or as a managed service (Kong Konnect).
AWS API Gateway: Fully managed. Integrates natively with Lambda, ECS, and other AWS services. Handles scaling and availability automatically. Best suited for AWS-native architectures.
nginx / Traefik: Lightweight reverse proxies that can serve as simple API gateways. Traefik is particularly popular for Kubernetes workloads because it discovers services automatically from Kubernetes resources such as Ingress objects and its own IngressRoute custom resources. Both are less featureful than purpose-built gateways but carry minimal operational overhead.
Envoy: A high-performance proxy written in C++, used as the data plane in Istio and other service meshes. Can be used standalone as a gateway. Extremely extensible via WebAssembly filters.
GraphQL Federation (Apollo Router): For teams using GraphQL, the Apollo Router acts as a gateway that federates multiple GraphQL subgraphs into a single schema. Clients query one endpoint; the router distributes subqueries to the appropriate services.
Design Considerations
- The gateway is a critical path component. Every client request flows through it. It must be highly available — deploy multiple instances behind a load balancer, and ensure failover is automatic. A gateway outage is a total outage.
- Avoid business logic in the gateway. The gateway should handle cross-cutting infrastructure concerns, not domain logic. If you find yourself writing if-statements about order statuses or user tiers in the gateway, that logic belongs in a service. Gateways with business logic become a deployment dependency for every feature change.
- Timeout and circuit breaking. Configure timeouts on every upstream route. If an upstream service hangs, the gateway should return 504 after the timeout rather than holding connections open indefinitely. Add circuit breakers to stop sending requests to a failing upstream until it recovers.
- Observability at the gateway. The gateway has visibility into every request across every service — it’s the best place to emit unified access logs, latency histograms, and error rates per route. Use this data to identify slow services and high-error endpoints before users report them.
- Version your API at the gateway. Route /v1/ to stable services and /v2/ to newer versions. This lets you run multiple API versions simultaneously, deprecate old versions gradually, and avoid forcing clients to upgrade all at once.
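The circuit-breaking consideration in the list above can be sketched as a small state machine: closed while calls succeed, open after repeated failures, and half-open once a cool-down has passed. The class, its parameters, and the thresholds are assumptions for illustration, and the sketch is single-threaded; the gateways listed earlier ship their own implementations.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling an upstream whose circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial request
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("upstream circuit is open")
            # Cool-down elapsed: half-open, so let this one trial request through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In a gateway, `func` would be the upstream request with its own timeout; a timeout would typically surface to the client as 504 and a `CircuitOpenError` as 503, without waiting on the failing service.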