Production Concerns

Geohashing & Quadtrees

● Advanced ⏱ 13 min read production

Location-based services — find nearby drivers, restaurants within 2 km, points of interest on a map — require efficient proximity queries over geographic data. A naive approach (calculate distance from every location in the database) doesn’t scale. Geohashing and quadtrees are two spatial indexing techniques that solve this by transforming 2D geographic coordinates into structures that standard databases and distributed systems can query efficiently.

The Proximity Search Problem

Given a user at latitude 37.7749, longitude −122.4194 (San Francisco), find all restaurants within 5 km. The database contains 10 million restaurant locations worldwide.

A brute-force approach — compute the haversine distance from the user to every restaurant and filter those under 5 km — requires 10 million distance calculations per query. At scale, with thousands of concurrent users, this is unusable.

A naive SQL bounding box (WHERE lat BETWEEN x1 AND x2 AND lng BETWEEN y1 AND y2) helps, but a 2D range query on two separate indexed columns requires the database to intersect two range scans — much less efficient than a single-column index lookup. The ideal solution converts the 2D problem into a 1D problem that standard B-tree indexes can handle efficiently.

Geohashing

Geohashing encodes a latitude/longitude pair into a short alphanumeric string by recursively dividing the world into a grid of rectangular cells. The encoding works as follows:

Start with the entire world: latitude −90 to 90, longitude −180 to 180.
Divide into two halves along longitude. If the point is in the left half, encode 0; right half, encode 1.
Divide into two halves along latitude. Bottom half → 0; top half → 1.
Repeat, alternating longitude and latitude splits, encoding a bit at each step.
Group the bits in 5-bit chunks and map each chunk to a Base32 character (0–9, b–z minus a, i, l, o).

The result: a string where longer strings represent smaller, more precise cells, and any prefix of the string represents the containing larger cell.

San Francisco (37.7749, -122.4194):
  Geohash: 9q8yy   (precision 5 — roughly 4.9 km × 4.9 km cell)
  Geohash: 9q8yyh  (precision 6 — roughly 1.2 km × 0.6 km cell)
  Geohash: 9q8yyhb (precision 7 — roughly 153 m × 153 m cell)

Points in the same geohash cell share the same prefix. All restaurants with geohash starting with 9q8yy are in the same roughly 5 km × 5 km area. A string prefix query (WHERE geohash LIKE '9q8yy%') on a B-tree index is a single efficient range scan.

Geohashing divides the world into a hierarchical grid; quadtrees subdivide space adaptively based on point density

Precision and Cell Size

Precision (chars)	Cell width	Cell height	Use case
1	~5,000 km	~5,000 km	Country-level
3	~156 km	~156 km	State/region level
5	~4.9 km	~4.9 km	Neighborhood level
6	~1.2 km	~0.6 km	City block level
7	~153 m	~153 m	Street-level
9	~2.4 m	~4.8 m	Building-level

Choose precision based on your query radius. For "find restaurants within 1 km," precision 6 (1.2 km cells) is appropriate — query the user’s cell and its 8 neighbors to cover the full radius. For "find drivers within 500 m," precision 7 (153 m cells) with neighbor expansion works well.

Proximity Search with Geohash

A critical property of geohashing: two nearby points share a long common prefix — but not always. A point near a cell boundary has a similar geohash to other points in its own cell, but a very different geohash from points just across the boundary in a neighboring cell, even if those points are physically very close.

The solution: always query the target cell and its 8 neighbors. Every geohash cell has 8 adjacent cells, and their geohash strings can be computed directly from the target geohash.

// Python pseudocode
def nearby_restaurants(lat, lng, radius_km):
    precision = geohash_precision_for_radius(radius_km)  # e.g., 6
    center_hash = geohash.encode(lat, lng, precision)
    neighbors = geohash.neighbors(center_hash)  # 8 adjacent cells
    cells_to_query = [center_hash] + neighbors   # 9 cells total

    results = db.query(
        "SELECT * FROM restaurants WHERE geohash6 IN (?)",
        cells_to_query
    )
    # Filter by exact distance to exclude corners of bounding cells
    return [r for r in results if haversine(lat, lng, r.lat, r.lng) <= radius_km]

The database query uses a fast IN-list lookup on a geohash column index, returning a small candidate set. The final haversine distance filter eliminates false positives from cell corners. This two-step approach — coarse index filter then precise distance check — is efficient even at scale.

Quadtrees

A quadtree is a tree data structure where each node represents a rectangular region of space and has exactly four children, each representing one quadrant (NW, NE, SW, SE). The tree is built adaptively: a region is subdivided only when it contains more than a threshold number of points.

Building a quadtree:

Start with one node covering the entire world.
Insert points one by one. When a node exceeds the threshold (e.g., 100 points), split it into four child nodes and distribute its points among them.
Repeat until all nodes are under the threshold or the minimum cell size is reached.

The key difference from geohashing: quadtrees adapt to data density. Dense urban areas are subdivided many times (fine-grained cells); sparse rural areas remain as large coarse cells. Geohash cells are always the same size at a given precision, regardless of how many points they contain.

Proximity search with a quadtree: Traverse the tree from the root, descending into any node whose bounding box intersects the search radius. Collect all points in intersecting leaf nodes, then filter by exact distance. The traversal is O(log N + k) where k is the number of candidate points.

Quadtrees are typically stored in memory (the entire tree for a city fits comfortably in a few hundred MB) or in a distributed cache. They’re especially suited for workloads with highly uneven geographic distribution — which is most real-world location data.

Geohash vs Quadtree

	Geohash	Quadtree
Storage	Column in database (string)	In-memory tree or dedicated store
Cell size	Fixed per precision level	Adaptive to data density
Index type	B-tree index on geohash string	Tree traversal in memory
Query complexity	IN-list lookup (9 cells) + filter	Tree traversal + filter
Database integration	Easy — standard SQL column	Harder — requires in-memory structure
Uneven density	Hot cells with many points	Naturally handles dense regions
Updates	Simple column update	Tree rebalancing on insert/delete

Real-World Use

Uber: Uses a geohash-based approach to partition the city into cells. Drivers broadcast their location updates; the dispatch system routes requests to drivers in the same cell or adjacent cells. The geohash cell is also the unit of surge pricing calculation.

Yelp / Google Maps: Geohash or similar spatial indexing for "restaurants near me" queries. PostGIS (the PostgreSQL geospatial extension) uses R-trees internally, a generalization of quadtrees, for spatial queries.

Twitter / Instagram: Geohash-based location tagging allows fetching posts geotagged within a region efficiently — prefix search on geohash covers the bounding box.

PostGIS and spatial databases: PostgreSQL + PostGIS, MySQL spatial, and MongoDB support geospatial indexes natively — under the hood, these use R-trees or similar space-partitioning structures. Use these for general-purpose geospatial queries rather than implementing your own quadtree.

Design Considerations

Always expand to 9 cells (center + 8 neighbors). Querying only the center cell misses points just across cell boundaries. The neighbor expansion is cheap and prevents the boundary problem from silently degrading results.
Index the geohash column at the precision you query. Pre-compute and store geohash6 (for km-scale queries) alongside geohash7 (for 100m-scale queries). A single geohash9 column supports all precisions via prefix query (LIKE 'abc%'), but storing pre-computed precision columns avoids the prefix scan.
For driver tracking (high-frequency updates), use Redis. Redis Geo (backed by a sorted set with geohash scores) supports GEORADIUS queries and fast location updates. Drivers update their position every few seconds — a relational database index would thrash under this write rate. Redis handles it efficiently in memory.
Keep the quadtree in memory for read performance. For static or slowly changing data (restaurant locations, store locations), load the full quadtree into memory at service startup. Serve proximity queries from memory — microsecond latency. Rebuild the tree periodically as data changes.
Use PostGIS for complex geospatial queries. Proximity is the simple case. For polygon containment (is this point within a delivery zone?), road network routing, or spatial joins, use a dedicated geospatial database rather than building spatial indexing from scratch.