Kleiner-Hartigan trees

Kleiner-Hartigan trees, also known as K-H trees, are a type of data structure used to represent data sets hierarchically. They are named after the two researchers, Beat Kleiner and John Hartigan, who first proposed the concept in 1980.

K-H trees are similar to other hierarchical data structures, such as quadtrees and binary trees, in that they divide a data set into smaller and smaller subsets according to some criterion. In the case of K-H trees, the criterion is the median value of the data set along a given dimension.

For example, consider a data set containing a collection of 2-dimensional points, such as (1,5), (3,4), (5,2), and (4,3). To construct a K-H tree for this data set, we would first choose a dimension to split the data set along, such as the x-dimension. We would then compute the median of the x-coordinates. Sorted, they are 1, 3, 4, 5, so the median is 3.5, halfway between the two middle values. We would then divide the data set into two subsets: the points with x-coordinates less than or equal to 3.5, namely (1,5) and (3,4), and the points with x-coordinates greater than 3.5, namely (5,2) and (4,3).

The two subsets would then be treated as separate data sets, and the same splitting process would be repeated on each of them. For example, the subset {(1,5), (3,4)} might be divided further using the median of its y-coordinates, 4.5, which separates (3,4) from (1,5). This process continues until each subset contains only a single point.
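The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the names `Node`, `build`, and `leaves` are hypothetical, the split dimension simply cycles with depth (x, then y, then x again), and coordinates along each axis are assumed to be distinct. The procedure closely resembles the way a k-d tree is usually built.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Hypothetical node layout: internal nodes carry a split, leaves carry a point.
    axis: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    point: Optional[Point] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split points at the middle of the sorted order."""
    if not points:
        return None
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % len(points[0])            # assumption: cycle through dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # Split value halfway between the two points around the cut.
    # For an even number of points this is exactly the median.
    threshold = (ordered[mid - 1][axis] + ordered[mid][axis]) / 2
    return Node(axis, threshold,
                build(ordered[:mid], depth + 1),
                build(ordered[mid:], depth + 1))


def leaves(node: Optional[Node]) -> List[Point]:
    """Collect the leaf points from left to right."""
    if node is None:
        return []
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)


points = [(1, 5), (3, 4), (5, 2), (4, 3)]
root = build(points)
print(root.axis, root.threshold)   # 0 3.5
print(leaves(root))                # [(3, 4), (1, 5), (5, 2), (4, 3)]
```

Splitting the sorted list by position, rather than filtering against a value, guarantees that both halves are non-empty, so the recursion always reaches single-point leaves.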

This hierarchical structure allows the data to be searched efficiently. For example, to search for a point with a particular set of coordinates, we start at the root node of the K-H tree and compare the point's coordinate along that node's split dimension with the node's median value; the result tells us which of the two child nodes to visit next. This process continues until we reach a leaf node, which either contains the point we are searching for or shows that the point is not in the data set.
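The lookup just described might look like the following sketch. The representation is an assumption made for brevity: internal nodes are plain dictionaries, leaves are the points themselves, and the helper names `build` and `contains` are hypothetical. The code also assumes a non-empty data set with distinct coordinates along each axis.

```python
def build(points, depth=0):
    """Build a nested-dict tree by splitting at the middle of the sorted order."""
    if len(points) == 1:
        return points[0]                      # a leaf is just the point
    axis = depth % len(points[0])             # assumption: cycle through dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "axis": axis,
        # Split value halfway between the two points around the cut.
        "threshold": (ordered[mid - 1][axis] + ordered[mid][axis]) / 2,
        "low": build(ordered[:mid], depth + 1),
        "high": build(ordered[mid:], depth + 1),
    }


def contains(node, target):
    """Walk one root-to-leaf path; return (found, number of comparisons)."""
    steps = 0
    while isinstance(node, dict):
        side = "low" if target[node["axis"]] <= node["threshold"] else "high"
        node = node[side]
        steps += 1
    return node == target, steps


tree = build([(1, 5), (3, 4), (5, 2), (4, 3)])
print(contains(tree, (4, 3)))   # (True, 2)
print(contains(tree, (2, 2)))   # (False, 2)
```

Each query touches one node per level, so on four points it needs two comparisons whether or not the point is present.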

One advantage of K-H trees over other hierarchical data structures, such as quadtrees, is that they are not tied to a particular number of dimensions. A quadtree divides a 2-dimensional region into four quadrants at every node, and extending the same idea to d dimensions requires 2^d children per node (eight in an octree, for example). A K-H tree, by contrast, splits along only one dimension at a time, so every node has two children no matter how many dimensions the data has, allowing for greater flexibility in representing the data.

Another advantage of K-H trees is their ability to handle outliers in the data set. A quadtree splits space at fixed midpoints, so a single point far from the rest can end up alone in one quadrant while all the other points crowd into another, leading to an inefficient use of space. A K-H tree splits at the median of the data rather than the midpoint of the space, so the two subsets always hold roughly the same number of points and an outlier is simply one more point in its subset.
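A small one-dimensional experiment illustrates the difference between splitting at the midpoint of the space and splitting at the median of the data. The data values and the helper `split_counts` are made up for this illustration.

```python
from statistics import median_low

# Seven clustered values and one far-away outlier (illustrative data).
xs = [1, 2, 3, 4, 5, 6, 7, 1000]


def split_counts(values, threshold):
    """Return how many values fall on each side of the threshold."""
    low = sum(1 for v in values if v <= threshold)
    return low, len(values) - low


# Midpoint of the occupied range, as a space-based split would use.
space_midpoint = (min(xs) + max(xs)) / 2
by_midpoint = split_counts(xs, space_midpoint)

# Lower median of the data, as a median-based split would use.
by_median = split_counts(xs, median_low(xs))

print(space_midpoint, by_midpoint)   # 500.5 (7, 1)
print(median_low(xs), by_median)     # 4 (4, 4)
```

The midpoint split isolates the outlier and leaves the other seven values undivided, while the median split halves the data regardless of how far away the outlier lies.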

One potential disadvantage of K-H trees is the time required to construct the tree. Because a median must be computed at every node as the data set is split again and again, construction costs noticeably more than a single pass over the data. Once the tree has been built, however, a search follows only one path from the root to a leaf, which in a balanced tree takes about log2(n) comparisons for n points.
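As a rough estimate of the construction cost, assume the tree is balanced and that each node finds its median by sorting its subset, which is one common approach but not the only one. With T(n) denoting the time to build a tree on n points, the cost then satisfies the recurrence below; the symbols are introduced here for illustration only.

```latex
% Median found by sorting the subset at every node:
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + O(n \log n)
\quad\Longrightarrow\quad
T(n) = O\!\left(n \log^{2} n\right)

% Median found by a linear-time selection algorithm instead:
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + O(n)
\quad\Longrightarrow\quad
T(n) = O(n \log n)
```

Either way, construction grows faster than linearly in n, while a single lookup in the finished tree stays at O(log n).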

In conclusion, K-H trees are a flexible and efficient data structure for organizing data sets hierarchically. They offer advantages over other hierarchical data structures, such as the ability to handle outliers and the ability to work with any number of dimensions. While they take some time to construct, their efficient searching makes them a useful tool in many applications.
