ISSN 2587-814X (print),
ISSN 2587-8158 (online)

Russian version: ISSN 1998-0663 (print),
ISSN 2587-8166 (online)

Mikhail Dvoretckii (1)
  • (1) Lomonosov Moscow State University, 1, Leninskie Gory, Moscow, 119991, Russian Federation

A segment tree based Top-k RMQ algorithm and its application to the autocomplete problem

2017. No. 1 (39). P. 48–54

Mikhail S. Dvoretckii - MSc Program Student, Lomonosov Moscow State University; Programmer, IQ Systems LLC
Address: 1, Leninskie Gory, Moscow, 119991, Russian Federation
E-mail: mike.dvorecky@gmail.com

      Controlling data input is an important way of ensuring data quality. One method is to check the input against the corresponding reference data where applicable, which can be done via autocomplete. Because reference data is usually stored centrally, autocomplete algorithms typically run in client-server architectures and face strict time requirements.
      This article formulates a new decomposition of the autocomplete task that builds on an existing method based on range minimum queries (RMQ). The Top-k RMQ problem is formulated as a sub-problem of this decomposition, and a segment tree based algorithm is proposed for it. When the conventional segment tree based RMQ algorithm is used for the Top-k RMQ sub-problem of autocomplete, it repeatedly processes the same nodes of the tree. The proposed algorithm is adapted directly to the Top-k RMQ problem and does not require any node of the segment tree to be processed more than twice. A complexity analysis is given for both the new Top-k RMQ algorithm and the conventional segment tree based RMQ approach, considering different priority queue implementations, specifically binary heaps and ordered arrays. The time complexity of the new algorithm is no higher than that of the conventional algorithm with any of these priority queue implementations.
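      The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a segment tree can answer a Top-k RMQ directly, and it should not be read as the method from the article. It assumes that a smaller value means a higher priority, that the reference entries are sorted lexicographically so that a prefix maps to a contiguous index range, and that a binary heap serves as the priority queue. The names SegmentTree, top_k and autocomplete are hypothetical. The heap is seeded with the tree nodes that tile the query range, and each popped internal node is replaced by its two children, so every node enters and leaves the heap at most once per query.

```python
import heapq
from bisect import bisect_left


class SegmentTree:
    """Min segment tree; every node stores (value, index) of its minimum."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [None] * (4 * self.n)
        if self.n:
            self._build(1, 0, self.n - 1, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.tree[node] = (values[lo], lo)
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])

    def _cover(self, node, lo, hi, left, right, out):
        """Collect the nodes whose ranges exactly tile [left, right]."""
        if right < lo or hi < left:
            return
        if left <= lo and hi <= right:
            out.append((self.tree[node], node, lo, hi))
            return
        mid = (lo + hi) // 2
        self._cover(2 * node, lo, mid, left, right, out)
        self._cover(2 * node + 1, mid + 1, hi, left, right, out)

    def top_k(self, left, right, k):
        """Indices of the k smallest values in [left, right], best first."""
        heap = []
        if self.n and left <= right:
            self._cover(1, 0, self.n - 1, left, right, heap)
        heapq.heapify(heap)
        result = []
        while heap and len(result) < k:
            (_value, index), node, lo, hi = heapq.heappop(heap)
            if lo == hi:
                # A leaf at the top of the heap is the next smallest element.
                result.append(index)
                continue
            # Replace an internal node by its two children.
            mid = (lo + hi) // 2
            heapq.heappush(heap, (self.tree[2 * node], 2 * node, lo, mid))
            heapq.heappush(
                heap, (self.tree[2 * node + 1], 2 * node + 1, mid + 1, hi)
            )
        return result


def autocomplete(entries, tree, prefix, k):
    """Top-k completions of prefix; entries must be sorted lexicographically.

    Assumption: no entry continues the prefix with U+10FFFF, which is used
    here as an upper sentinel for the prefix range.
    """
    left = bisect_left(entries, prefix)
    right = bisect_left(entries, prefix + "\U0010ffff") - 1
    return [entries[i] for i in tree.top_k(left, right, k)]


# Hypothetical data: sorted entries and their ranks (lower = more popular).
entries = ["moscow", "moskva river", "mozhaysk", "murmansk", "mytishchi"]
ranks = [0, 3, 2, 1, 4]
tree = SegmentTree(ranks)
suggestions = autocomplete(entries, tree, "mo", 2)
```

      With the sample data above, the prefix "mo" maps to the index range 0..2, and the two best-ranked entries in that range are returned in order of rank.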
      To demonstrate the practical value of the new algorithm, a series of experiments was conducted on data from the All-Russian Classifier of Addresses, a practical source of reference data for Russian address input. The new algorithm was faster than the conventional one in all experiments with all priority queue implementations.

Citation:

Dvoretckii M.S. (2017) A segment tree based Top-k RMQ algorithm and its application to the autocomplete problem. Business Informatics, no. 1 (39), pp. 48–54. DOI: 10.17323/1998-0663.2017.1.48.54
