spatial indexes and data structures
After spent the past two years in the location/gis industry and read numerous good posts about spatial indexing techniques and data structure, I think its a good time to summarize this topic.
Linear space:
binary search tree - the fundamental of search in numeric space.
b tree - "the more you think about what the B in B-trees means, the better you understand B-trees."
Multidimensional space:
kd tree - is an extension of bst into multidimensional space, best for knn search. The primary distinction between kd tree and r tree is that the former partitions on space and the later partitions on data.
r tree - is an extension of b tree into multidimensional space. It is efficient in range search and frequent update, and spatial object (polygon) index. Since data cannot be sorted in k dimension, the rectangles are allowed to overlap and page fill is not as optimal compared to b tree. This is also a slightly confusing term as r(ectangle) implies two dimension but the data structure can support multidimensional data (maybe kdr tree?).
kdb tree - combines properties of K-D-trees and B-trees.
Geo-specific space:
quadtree - This is probably the most confusing concept as it can be a data structure for 2 dimensional data (aka 2d tree), the vanilla version of 2d spatial index, and a group of geo indexes, and a specific type of geo index! I will explain why this is the case:
Quadtree literally means splitting the 2d space into 4 cells, often the same size. Quadtree can be classified as a special kd-tree, but since the splitting hyperplane(s) are predetermined instead of generating on the fly, they are quit different. Or if the quad cells are implemented in a way that allow different sizes, then it is indeed a 2d tree.
Since the least cells you can split in 2 dimension is 4 (2^2), it is the vanilla version of spatial index (QuadKeys for map/geo).
Since it is so vanilla the term is also used to describe similar hierarchal spatial indexes, even if the index has different number of leaf nodes. Quadtree-like structure do not scale well to high dimensions, due to the exponential dependency in the dimension, but is useful in 2 dimension space.
quadtree-like geo indexes - these indexes are intended for mapping points on earth into 2 dimensional space with 1 unique identifier in a hierarchal index system. Each implementation has various properties, mainly distortion, neighbor traversal and subdivision.
Some recommended reading with regard to spatial indexing:
Unwinding Uber’s Most Efficient Service