Hilbert spectrum support vector regression for public companies’ sales forecast

Keywords: embeddings, support vector regression, Hilbert spectrum support vector regression, nonlinearity, non-stationarity, Hilbert spectrum, features density distribution, revenue forecast

Abstract

Accurate and timely revenue forecasting is critical for investors in public companies who base their investment decisions on the Discounted Cash Flow (DCF) model. In market practice, financial analysts build their financial models around revenue growth projections for the first year of the forecast horizon, which carries the highest weight in discounted cash flow calculations. These projections are typically derived through expert assessment of publicly available quarterly reporting. However, contemporary data science tools, particularly machine learning approaches that use linear approximation techniques to model nonlinear patterns, remain largely underutilized for current-quarter revenue analysis. The present study developed source code that applies the Hilbert Spectrum Support Vector Regression (Hilbert spectrum SVR, HSVR) method to sales prediction. Testing HSVR on data from companies listed on the Moscow Exchange and comparing its performance metrics with those of classical Support Vector Regression (SVR) led to the conclusion that HSVR can be deployed in industrial settings to forecast the revenue of public companies using news vectors, i.e., vector representations (embeddings) of news articles.
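
The claim that the first forecast year carries the highest weight in DCF calculations can be illustrated with a minimal Python sketch. It assumes equal cash flows in every year, a 10% discount rate, a five-year horizon and no terminal value; none of these figures come from the study, and the function name `discount_weights` is hypothetical.

```python
def discount_weights(rate: float, years: int) -> list[float]:
    """Share of total present value contributed by each year.

    Assumes equal cash flows in every year and no terminal value
    (illustrative assumptions, not taken from the study).
    """
    # Standard DCF discount factor for year t: 1 / (1 + r)^t
    factors = [1.0 / (1.0 + rate) ** t for t in range(1, years + 1)]
    total = sum(factors)
    # Normalise so the weights sum to 1
    return [f / total for f in factors]


# Assumed inputs: 10% discount rate, five-year forecast horizon
weights = discount_weights(rate=0.10, years=5)
for year, w in enumerate(weights, start=1):
    print(f"year {year}: weight {w:.3f}")
```

Under these assumptions the weights decrease monotonically with the year index, so an error in the first-year revenue projection affects the valuation more than an equal error in any later year. With growing cash flows or a large terminal value the distribution of weights can differ.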

Published
2026-06-30
How to Cite
Legenchuk, I. G. (2026). Hilbert spectrum support vector regression for public companies’ sales forecast. Business Informatics, 20(2), 94–117. Retrieved from https://bijournal.hse.ru/article/view/38881
Section
Articles