spatial indexes and data structures

After spent the past two years in the location/gis industry and read numerous good posts about spatial indexing techniques and data structure, I think its a good time to summarize this topic.

Linear space:

  • binary search tree - the fundamental of search in numeric space.

  • b tree - "the more you think about what the B in B-trees means, the better you understand B-trees."

Multidimensional space:

  • kd tree - is an extension of bst into multidimensional space, best for knn search. The primary distinction between kd tree and r tree is that the former partitions on space and the later partitions on data.

  • r tree - is an extension of b tree into multidimensional space. It is efficient in range search and frequent update, and spatial object (polygon) index. Since data cannot be sorted in k dimension, the rectangles are allowed to overlap and page fill is not as optimal compared to b tree. This is also a slightly confusing term as r(ectangle) implies two dimension but the data structure can support multidimensional data (maybe kdr tree?).

  • kdb tree - combines properties of K-D-trees and B-trees.

Geo-specific space:

  • quadtree - This is probably the most confusing concept as it can be a data structure for 2 dimensional data (aka 2d tree), the vanilla version of 2d spatial index, and a group of geo indexes, and a specific type of geo index! I will explain why this is the case:

  1. Quadtree literally means splitting the 2d space into 4 cells, often the same size. Quadtree can be classified as a special kd-tree, but since the splitting hyperplane(s) are predetermined instead of generating on the fly, they are quit different. Or if the quad cells are implemented in a way that allow different sizes, then it is indeed a 2d tree.

  2. Since the least cells you can split in 2 dimension is 4 (2^2), it is the vanilla version of spatial index (QuadKeys for map/geo).

  3. Since it is so vanilla the term is also used to describe similar hierarchal spatial indexes, even if the index has different number of leaf nodes. Quadtree-like structure do not scale well to high dimensions, due to the exponential dependency in the dimension, but is useful in 2 dimension space.

  • quadtree-like geo indexes - these indexes are intended for mapping points on earth into 2 dimensional space with 1 unique identifier in a hierarchal index system. Each implementation has various properties, mainly distortion, neighbor traversal and subdivision.

    • QuadKeys - consistent 4 cells

    • S2 - 6 top level cells, then 4 child cells

    • geohash - consistent 32 cells

    • H3 - 12 top level triangle cells, then 7 hexagon child cells. it is also the index that least similar to quad tree as it's grid is not rectangular.

Some recommended reading with regard to spatial indexing:

Unwinding Uber’s Most Efficient Service

A dive into spatial search algorithms

The Bkd Tree

Spatial Trees