# K-D Tree

### Binary Search Tree (BST)

#### Tree

* In graph theory, a **tree** is a connected **acyclic undirected graph**, and a **forest** is a disjoint union of trees.
* In computer science, a **tree** is a widely used abstract data type (ADT) that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes. (This "linked" relation can be implemented by a linked list or an array with certain indexing rules.)

#### Binary Tree

* A binary tree is a tree data structure in which each node has at most two children. (This is the only constraint.)

**Binary Search Tree (BST)**

* In computer science, a binary search tree (BST), also called an **ordered** or **sorted** binary tree, is a rooted binary tree data structure whose internal nodes each store a key greater than all the keys in the node's left subtree and less than those in its right subtree.&#x20;
* Used for sorting: the construction step takes O(n log n) time, then the sorted list can be obtained by traversing all nodes in O(log n) time.&#x20;
* | Algorithm |   | Average  | Worst case |
  | --------- | - | -------- | ---------- |
  | Space     |   | O(n)     | O(n)       |
  | Search    |   | O(log n) | O(n)       |
  | Insert    |   | O(log n) | O(n)       |
  | Delete    |   | O(log n) | O(n)       |

### K-D Tree

* In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space
* k-d trees are a special case of binary space partitioning trees.
* | Algorithm |   | Average  | Worst case |
  | --------- | - | -------- | ---------- |
  | Space     |   | O(n)     | O(n)       |
  | Search    |   | O(log n) | O(n)       |
  | Insert    |   | O(log n) | O(n)       |
  | Delete    |   | O(log n) | O(n)       |

### Comparison of various data structures

![](https://3310611219-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LwRt8_BWYScDaMKj3wZ%2F-MglC3vmJu1V0nbsSiuY%2F-Mglybe2N54822Dvyf1p%2Fdata_structure.png?alt=media\&token=8b8c3047-d05e-474e-bfdc-413097424562)

### Discussions on K-D Tree

* It is said that Euclidean distance is a bad metric in high dimensional space.
* There are debates on using K-D Tree for high dimensional data.
  * Against: [When Is "Nearest Neighbor" Meaningful?](https://link.springer.com/content/pdf/10.1007%2F3-540-49257-7_15.pdf) by Kevin Beyer et al.
  * For: [When is ‘nearest neighbour’ meaningful: A converse theorem and implications](https://www.sciencedirect.com/science/article/pii/S0885064X09000260) by Robert Durrant et al. ([Bob's answer on Stack Exchange](https://stats.stackexchange.com/a/10976))
* More discussions on [Stackoverflow: Nearest neighbors in high-dimensional data?](https://stackoverflow.com/questions/5751114/nearest-neighbors-in-high-dimensional-data)
