Node Splitting Algorithms
The paper gives three possible algorithms for node splitting. The first one is the exhaustive algorithm, which generates all possible splits and chooses the best. This takes quite long, and thus the other two described algorithms are better alternatives.
The second one is a quadratic-cost algorithm, which tries to find a small-area split while not insisting on the one with the smallest area. This algorithm chooses the two entries of the M+1 entries that would waste the most area if both were put in the same group. These two elements are the first elements of the two new groups. The remaining entries are then assigned to the two groups by calculating the area expansion required to add each entry to each group and then assigning the entry that shows the greatest difference of the two groups.
The third algorithm, a linear-cost algorithm, is quite similar to the quadratic algorithm. The only difference is in the process of choosing the two entries that are supposed to be the first elements of the groups. Instead of calculating the inefficiency of grouping entries together and then choosing the most wasteful pair, the linear algorithm searches for extreme rectangles along all dimensions and selects the most extreme pair. Extreme rectangles, according to Guttman’s interpretation, are rectangles with the highest low side or the lowest high side.
Outlook on Alternatives
Other scientists continuously worked on improving this data structure regarding specific queries or applications. Each variation focuses on certain aspects, such as optimal insertion or range query retrieval [2]. Some combined the R-tree with other index structures to obtain their advantages as well.
The R+-tree [2, 4] avoids that several subtrees have to be searched during the execution of a point location query. It focuses on small rectangles as large rectangles can worsen the performance for range queries due to the increased possibility of overlaps.
In 1990, the R*-tree [2, 4, 5] was published. The R-tree focuses on the minimization of the area of the rectangles. The R*-tree considers additional criteria: the minimization of overlap, the minimization of margins, and the maximization of storage utilization. The main idea is to find the best possible combination while following an engineering approach. Doing so, the data structure achieves much better performance for all queries than the R-tree. It is widespread in both research and commercial areas and is regarded as the most robust variation of the R-tree.
Another variant is the Priority R-tree [2, 4] that has a guaranteed worst-case performance. PR-trees perform similar to the R-tree regarding queries and outperform every variation of the R-tree when handling extreme data. However, there are faster variants, if the main wish is efficient bulk-loading.
To summarize, many applications in different sectors need to handle multi-dimensional data, which is why indexing spatial data is still an important area of research. Due to its excellent performance and simple implementation, the R-tree is widespread in modern applications and has resulted in a large number of variations.
For further research, please regard my source for the data structure in its original form [6] and publications on overviews of the different variations [2, 4].
References:
[1] www.bowdoin.edu/~ltoma/teaching/cs340/spring08/ (04.01.2020)
[2] Balasubramanian, Sugumaran, A State-of-Art in R-Tree Variants for Spatial Indexing, International Journal of Computer Applications 2012
[3] Kothuri, Ravada, Abugov, Quadtree and R-tree Indexes in Oracle Spatial: A Comparison using GIS Data, SIGMOD 2002
[4] www.bowdoin.edu/~ltoma/teaching/cs340/spring08/Papers/Rtree-chap2.pdf (05.01.2020)
[5] Beckmann, Kriegel, Schneider, Seeger, The R*-tree: an efficient and robust access method for points and rectangles, SIGMOD 1990
[6] Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching, SIGMOD 1984