Architecture Patterns

API Gateway


An API Gateway is a server that acts as the single entry point for all client requests in a microservices architecture. Instead of clients knowing the addresses of dozens of backend services, they send all requests to the gateway. The gateway handles cross-cutting concerns — authentication, rate limiting, routing, protocol translation — and forwards requests to the appropriate upstream services.

What Is an API Gateway

In a monolithic architecture, clients talk directly to one server. In a microservices architecture, a single user action may require calls to five different services — each with its own address, protocol, and authentication scheme. Exposing this complexity directly to clients creates tight coupling: every time a service is added, split, or renamed, every client must be updated.

An API gateway abstracts the internal service topology. Clients see a single, stable API surface. Internally, services can be reorganized without affecting clients. The gateway acts as a facade over the entire backend.

Figure: API Gateway as the single entry point. Clients talk to the gateway, which routes requests to internal services.

Core Functions

An API gateway consolidates concerns that would otherwise be duplicated across every service:

| Function | Without Gateway | With Gateway |
|---|---|---|
| Authentication | Each service validates tokens | Gateway validates once; services trust the gateway |
| Rate limiting | Each service enforces its own limits | Gateway enforces globally before requests reach services |
| SSL termination | Each service manages its own TLS | Gateway terminates TLS; internal traffic can be plain HTTP |
| Logging | Each service logs separately | Gateway provides a unified access log across all services |
| Routing | Clients know every service address | Clients know only the gateway; it knows where to route |

Routing and Load Balancing

The gateway inspects each incoming request and routes it to the correct upstream service based on configurable rules — typically matching on URL path, HTTP method, headers, or query parameters:

GET  /api/users/*        → User Service
POST /api/orders         → Order Service
GET  /api/products/*     → Product Service
GET  /api/search?q=*     → Search Service
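The rules above can be sketched as a small lookup table. This is a minimal illustration, not how any particular gateway stores its routes: the upstream names, the `ROUTES` structure, and the `resolve_upstream` helper are all hypothetical, and the search rule is simplified to match on the path alone.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical route table mirroring the rules above:
# (HTTP method, path pattern, upstream service name).
ROUTES = [
    ("GET", "/api/users/*", "user-service"),
    ("POST", "/api/orders", "order-service"),
    ("GET", "/api/products/*", "product-service"),
    ("GET", "/api/search", "search-service"),
]


def resolve_upstream(method: str, url: str) -> Optional[str]:
    """Return the upstream for a request, or None if no rule matches."""
    # Match on the path only; the query string is passed through untouched.
    path = url.split("?", 1)[0]
    for route_method, pattern, upstream in ROUTES:
        if method.upper() == route_method and fnmatchcase(path, pattern):
            return upstream
    return None  # a real gateway would answer 404 here
```

Rules are checked in order and the first match wins, so more specific patterns would need to be listed before more general ones.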

The gateway also performs load balancing across multiple instances of each service, using round-robin, least-connections, or weighted routing. It can perform health checks and automatically remove unhealthy instances from rotation — the same function a dedicated load balancer provides, but integrated with routing logic.
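As a rough sketch of the round-robin strategy combined with health-based removal, the class below cycles through instances and skips any that have been marked unhealthy. The class name and its methods are assumptions made for illustration; a real gateway would drive `mark_unhealthy` and `mark_healthy` from periodic health probes.

```python
class RoundRobinBalancer:
    """Cycle through instances in order, skipping unhealthy ones."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._next = 0  # index of the next instance to consider

    def mark_unhealthy(self, instance):
        # Called when a health check fails: take the instance out of rotation.
        self._healthy.discard(instance)

    def mark_healthy(self, instance):
        # Called when a health check recovers: put the instance back.
        if instance in self._instances:
            self._healthy.add(instance)

    def pick(self):
        # Try each instance at most once per call, starting from the cursor.
        count = len(self._instances)
        for _ in range(count):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % count
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

Least-connections and weighted routing would replace only the `pick` logic; the health bookkeeping stays the same.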

Advanced routing capabilities include:

Authentication and Authorization

Without a gateway, every service must implement token validation — verifying JWTs, checking API keys, calling an identity provider. This is error-prone duplication: one service gets the auth logic wrong, and you have a security gap.

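The gateway validates credentials on every request before forwarding to upstream services. After validation, it injects the verified claims into request headers, so the upstream service can trust that X-User-Id: 42 and X-User-Role: admin are already verified and doesn’t need to re-validate. For that trust to hold, the gateway must strip any client-supplied copies of these headers before adding its own, and services must be reachable only through the gateway.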
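The validate-then-inject flow might look roughly like the sketch below. To stay self-contained it uses a simple HMAC-signed token as a stand-in for JWT verification; the token format, the `SECRET` key, the claim names `sub` and `role`, and every function name are assumptions for illustration. A production gateway would use a vetted JWT library and also check expiry and audience.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"  # hypothetical signing key; never hard-code in practice
IDENTITY_HEADERS = ("x-user-id", "x-user-role")


def sign(claims: dict) -> str:
    """Issue a demo token: base64url(JSON claims) + '.' + HMAC-SHA256 hex."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify(token: str) -> Optional[dict]:
    """Return the claims if the signature is valid, else None."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


def build_upstream_headers(client_headers: dict) -> Optional[dict]:
    """Return headers to forward upstream, or None to reject with 401."""
    # Drop identity headers sent by the client so they cannot be spoofed.
    headers = {k: v for k, v in client_headers.items()
               if k.lower() not in IDENTITY_HEADERS}
    lowered = {k.lower(): v for k, v in client_headers.items()}
    auth = lowered.get("authorization", "")
    if not auth.startswith("Bearer "):
        return None
    claims = verify(auth[len("Bearer "):])
    if claims is None:
        return None
    # Inject the verified claims for the upstream service.
    headers["X-User-Id"] = str(claims["sub"])
    headers["X-User-Role"] = claims["role"]
    return headers
```

Filtering the incoming headers before injecting the verified ones is what prevents a client from sending its own X-User-Role value and having it reach a service unchanged.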

Authorization policies (which roles can call which endpoints) can be enforced at the gateway or left to individual services. Coarse-grained authorization (only authenticated users can call /api/orders) belongs at the gateway. Fine-grained authorization (a user can only see their own orders) belongs in the service, which knows the domain context.

Zero Trust Inside the Perimeter

Don’t assume internal service-to-service traffic is safe just because it came through the gateway. A compromised internal service can make arbitrary requests. Apply mutual TLS (mTLS) for service-to-service communication and validate service identity, not just user identity. The gateway handles user auth; a service mesh handles service-to-service auth.

Rate Limiting and Throttling

The gateway is the ideal enforcement point for rate limiting — it sees all traffic before it reaches any service. Limits can be applied per API key, per user ID, per IP address, or globally per endpoint.

Common rate limiting strategies at the gateway:

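When a client exceeds its limit, the gateway rejects the request with 429 Too Many Requests and a Retry-After header telling the client how long to wait. Responses commonly also carry an X-RateLimit-Remaining header (a widely used convention rather than a formal standard) so clients can adapt their request rate before they hit the limit. Well-behaved clients respect these; the gateway enforces limits for those that don’t.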
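One widely used strategy is the token bucket, sketched below as an example rather than as the approach of any specific gateway. The `TokenBucket` class, its parameters, and the injectable `clock` argument (there to make the behavior easy to test) are assumptions for illustration.

```python
import math
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def check(self):
        """Return (status_code, headers) for one incoming request."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return 200, {"X-RateLimit-Remaining": str(int(self._tokens))}
        # Not enough tokens: tell the client how long until one is available.
        wait_seconds = math.ceil((1 - self._tokens) / self.refill_per_second)
        return 429, {"Retry-After": str(wait_seconds),
                     "X-RateLimit-Remaining": "0"}
```

To limit per API key, per user ID, or per IP address, a gateway could keep one bucket per key in a dictionary (or in a shared store when several gateway instances must agree on the count).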

Request Aggregation

A client loading a dashboard may need data from five services. Without aggregation, the client makes five sequential or parallel HTTP calls — each with its own latency and failure mode. The gateway can aggregate these into a single request: the client sends one call, the gateway fans out to five services in parallel, merges the responses, and returns a single result.

This reduces client-side complexity, eliminates multiple round-trip latencies from the client to the data center, and hides the internal service decomposition from the client. The tradeoff is that the gateway now contains aggregation logic — it’s no longer a dumb router but a smart orchestrator, which increases its complexity and coupling to service response shapes.
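The fan-out-and-merge step can be sketched with asyncio. The three `fetch_*` coroutines below are hypothetical stand-ins for HTTP calls to upstream services (one is made to fail on purpose), and returning partial data with an `errors` list is one possible policy, not the only one.

```python
import asyncio


# Hypothetical upstream calls; a real gateway would issue HTTP requests here.
async def fetch_profile(user_id):
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada"}


async def fetch_orders(user_id):
    await asyncio.sleep(0.01)
    return [{"order_id": 1, "total_cents": 2500}]


async def fetch_recommendations(user_id):
    await asyncio.sleep(0.01)
    raise TimeoutError("recommendation service timed out")


async def dashboard(user_id):
    """Fan out to all services in parallel and merge into one response."""
    names = ("profile", "orders", "recommendations")
    results = await asyncio.gather(
        fetch_profile(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
        return_exceptions=True,  # one failure must not discard the rest
    )
    body, errors = {}, []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            body[name] = None  # degrade gracefully for this section
            errors.append(name)
        else:
            body[name] = result
    body["errors"] = errors
    return body
```

Calling `asyncio.run(dashboard(42))` returns the profile and orders along with `"errors": ["recommendations"]`. The merge code depends on the shape of each service's response, which is the coupling the paragraph above warns about.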

Backend for Frontend (BFF)

A Backend for Frontend is a specialized API gateway variant — a dedicated gateway per client type (web, mobile, third-party). Instead of one gateway that tries to serve all clients with a generic API, each BFF is tailored to its client’s needs:

BFFs are owned by the frontend teams — the web team owns the web BFF, the mobile team owns the mobile BFF. This gives frontend teams autonomy over their API contract without requiring backend service changes. Backend services remain generic; BFFs are the translation layer.
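As a small illustration of BFFs acting as a translation layer, the sketch below shapes one generic backend record differently for two clients. The record, its field names, and the choice of what each client receives are assumptions made for the example, not requirements of the pattern.

```python
# Hypothetical generic record as returned by a backend product service.
PRODUCT = {
    "id": 7,
    "name": "Trail Shoe",
    "description": "Lightweight shoe for rocky terrain.",
    "price_cents": 12999,
    "images": [
        {"url": "https://example.com/shoe-1600.jpg", "width": 1600},
        {"url": "https://example.com/shoe-320.jpg", "width": 320},
    ],
}


def format_price(cents):
    """Render integer cents as a display string."""
    return f"${cents // 100}.{cents % 100:02d}"


def web_bff_product(product):
    """Web BFF (assumed): full detail for a large screen."""
    return {
        "id": product["id"],
        "name": product["name"],
        "description": product["description"],
        "price": format_price(product["price_cents"]),
        "images": [image["url"] for image in product["images"]],
    }


def mobile_bff_product(product):
    """Mobile BFF (assumed): trimmed payload with a single small image."""
    smallest = min(product["images"], key=lambda image: image["width"])
    return {
        "id": product["id"],
        "name": product["name"],
        "price": format_price(product["price_cents"]),
        "thumbnail": smallest["url"],
    }
```

Both functions read the same backend record, so changing what the mobile app receives touches only the mobile BFF.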

Implementations

Kong: Open-source, Lua-based, built on nginx (via OpenResty). Rich plugin ecosystem (auth, rate limiting, logging, transformations). Can be run on-premises or as a managed service (Kong Konnect).

AWS API Gateway: Fully managed. Integrates natively with Lambda, ECS, and other AWS services. Handles scaling and availability automatically. Best suited for AWS-native architectures.

nginx / Traefik: Lightweight reverse proxies that can serve as simple API gateways. Traefik is particularly popular for Kubernetes workloads, where it discovers routes automatically by watching Kubernetes resources such as Ingress objects and its own IngressRoute custom resources, with annotations used to tune behavior. Both offer fewer features than purpose-built gateways but carry minimal operational overhead.

Envoy: A high-performance proxy written in C++, used as the data plane in Istio and other service meshes. Can be used standalone as a gateway. Extremely extensible via WebAssembly filters.

GraphQL Federation (Apollo Router): For teams using GraphQL, the Apollo Router acts as a gateway that federates multiple GraphQL subgraphs into a single schema. Clients query one endpoint; the router distributes subqueries to the appropriate services.

Design Considerations