TY - JOUR TI -

A segment tree based Top-k RMQ algorithm and its application to the autocomplete problem

T2 - IS - KW - All-Russian Classifier of Addresses KW - autocomplete KW - segment tree KW - range minimum query (RMQ) KW - top k range minimum query (Top-k RMQ) KW - algorithm AB - Mikhail S. Dvoretckii - MSc Program Student, Lomonosov Moscow State University; Programmer, IQ Systems LLCAddress: 1, Leninskie Gory, Moscow, 119991, Russian FederationE-mail: mike.dvorecky@gmail.com      An important way of ensuring data quality is controlling data input. One of the methods of doing that is checking the input data against the corresponding reference data where applicable. This may be done via autocomplete. Since reference data is usually stored in a centralized fashion, autocomplete algorithms usually run in client-server architectures and face strict time requirements.      In this article, a new autocomplete task decomposition is formulated using an existing method based on range minimum queries (RMQ). The Top-k RMQ problem is formulated and used in the autocomplete problem decomposition. A segment tree based algorithm is proposed for the Top-k RMQ problem. While the conventional segment tree based RMQ algorithm when used in autocomplete (in the Top-k RMQ sub-problem) repeatedly processes the same nodes on the tree, the proposed algorithm is adapted directly to the Top-k RMQ problem and does not require any node of the segment tree to be processed more than twice. A complexity analysis is made for both the new Top-k RMQ algorithm and the conventional segment tree-based RMQ approach. This analysis considers different implementations of priority queues used in these algorithms, specifically binary heaps and ordered arrays. The new algorithm has time complexity that is not lower than that of the conventional algorithms with any priority queue implementation.      To prove the practical value of the new algorithm, a series of experiments was conducted using the data from the All-Russian Classifier of Addresses - a practical source of reference data for Russian address inputs. The new algorithm demonstrates better time efficiency than the conventional one in all experiments with all priority queue implementations. AU - Mikhail Dvoretckii UR - https://bijournal.hse.ru/en/2017--1 (39)/205283637.html PY - 2017 SP - 48-54 VL -