K-nearest Neighbors Search by Random Projection Forests
K-nearest neighbors (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees with each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves a remarkable accuracy in terms of fast decaying missing rate of kNNs and that of discrepancy in the k-th nearest neighbor distances. rpForests has a very low computational complexity as a tree-based methodology. The ensemble nature of rpForests makes it easily parallelized to run on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing the exponential decay of neighboring points being separated by ensemble random projection trees when the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
- System : i3 Processor
- Hard Disk : 500 GB.
- Monitor : 15’’ LED
- Input Devices : Keyboard, Mouse
- Ram :1 GB
- Operating system : Windows 7/UBUNTU.
- Coding Language : Java 1.7 ,Hadoop 0.8.1
- IDE : Eclipse
- Database : MYSQL
Donghui Yan, Yingjie Wang, Jin Wang, Honggang Wang, and Zhenpeng Li, “K-nearest Neighbors Search by Random Projection Forests”, IEEE Transactions on Big Data, 2019.