ISSN 2587-814X (print),
ISSN 2587-8158 (online)

Russian version: ISSN 1998-0663 (print),
ISSN 2587-8166 (online)

Nikolay Golov 1, Lars Ronnback 2
  • 1 National Research University Higher School of Economics, 20 Myasnitskaya Str., Moscow, 101000, Russian Federation
  • 2 Stockholm University, SE-106 91 Stockholm, Sweden.

SQL query optimization for highly normalized Big Data

2015. No. 3(33). P. 7–14

Nikolay I. Golov - Lecturer, Department of Business Analytics, School of Business Informatics, Faculty of Business and Management, National Research University Higher School of Economics. 
Address: 20, Myasnitskaya Street, Moscow, 101000, Russian Federation.

Lars Ronnback - Lecturer, Department of Computer Science, Stockholm University
Address: SE-106 91 Stockholm, Sweden

      This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to make maximal use of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, originally designed for classical databases and adapted by the authors for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and fast data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes).
      Different approaches to query plan optimization are described and evaluated for both row-based and column-based databases. Theoretical estimates and the results of experiments on real data, carried out in a column-based MPP environment (HP Vertica), are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that ad-hoc query execution must switch from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the largest classifieds site in Russia.
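To illustrate why a merge join suits highly normalized tables, here is a minimal sketch in Python. It assumes two hypothetical Anchor-style attribute tables, each a list of (anchor_id, value) rows already sorted by anchor_id; the table and function names are illustrative only and are not taken from the paper or from HP Vertica. A hash join typically builds an in-memory hash table over one input, whereas a merge join over pre-sorted inputs only advances a cursor through each input (plus the current group of equal keys), which is one plausible reason such plans degrade more gracefully when RAM is scarce.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (anchor_id, attribute_value); hypothetical layout


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner merge join of two inputs sorted ascending by anchor_id.

    Duplicate keys on either side produce the cross product of the
    matching groups, as in a standard equi-join.
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key too small: advance the left cursor
        elif lk > rk:
            j += 1  # right key too small: advance the right cursor
        else:
            # Find the end of the equal-key group on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            # Emit every pairing within the matching groups.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    yield (lk, lv, rv)
            i, j = i_end, j_end


# Hypothetical attribute tables of a "Visitor" anchor, sorted by anchor_id.
visitor_city = [(1, "Moscow"), (2, "Kazan"), (4, "Omsk")]
visitor_device = [(1, "mobile"), (3, "desktop"), (4, "tablet")]

print(list(merge_join(visitor_city, visitor_device)))
# → [(1, 'Moscow', 'mobile'), (4, 'Omsk', 'tablet')]
```

Because each input is read once in key order, the cost is linear in the input sizes once the data is sorted; in a real MPP database the sort order would come from how the tables are stored rather than from an explicit sort.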

Citation: Golov N., Ronnback L. (2015)
Optimizatsiya SQL-zaprosov dlya vysokonormalizovannykh bol'shikh dannykh
[SQL query optimization for highly normalized Big Data].
Biznes-informatika, no. 3(33), pp. 7–14 (in English)