Improving bankruptcy prediction using Data Envelopment Analysis scores

Yuri A. Zelenkov

doi:10.17323/2587-814X.2025.3.7.21

Yuri A. Zelenkov, Graduate School of Business, HSE University, Moscow, Russia. ORCID: https://orcid.org/0000-0002-2248-1023

Keywords: bankruptcy prediction, bankruptcy factors, DEA specification, causal modeling

Abstract

Most current bankruptcy prediction models rely on financial ratios, although the use of these ratios is not supported by formal theory and their interpretation is problematic. One way to improve predictive models is to study other measures of firm performance, such as data envelopment analysis (DEA) scores. However, this raises the problem of choosing the optimal DEA specification, since the specification determines the shape of the efficiency frontier and the predictive properties of the model. This paper presents a method for automatically designing DEA models whose scores are then used as features to improve the quality of bankruptcy predictors. The method has two goals. The first is to improve prediction accuracy. The second is explanatory: if DEA scores improve the prediction, then the specification of the DEA model can provide information about the causes of failure. The method consists of three steps. In the first step, accounting measures that are potentially suitable for DEA are selected using hierarchical clustering. The second step explores the causal relationships between the selected measures. The third step calculates pure technical efficiency, scale efficiency and mix efficiency. Experiments with two datasets show that including these scores in the list of features improves the AUC-ROC by more than 20%, which is superior to previous works. Analysis of the DEA models provides insight into the reasons for a firm’s failure in both stable and crisis periods.
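
To make the third step concrete, the following is a minimal sketch of how two of the scores named above could be computed. It is not the paper's implementation: the abstract does not state which DEA formulation is used, so the sketch assumes the textbook input-oriented envelopment models, with constant returns to scale (CCR) for overall technical efficiency and variable returns to scale (BCC) for pure technical efficiency, and takes scale efficiency as their ratio. Mix efficiency requires a slacks-based model and is omitted. The function names and the four-firm toy data are hypothetical and serve only as illustration.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_oriented(X, Y, vrs=False):
    """Input-oriented DEA efficiency for every unit (illustrative sketch).

    X: (n, m) array of inputs, Y: (n, s) array of outputs, one row per unit.
    vrs=False gives the CCR (constant returns) score,
    vrs=True gives the BCC (variable returns) score.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
        c = np.concatenate(([1.0], np.zeros(n)))
        # Inputs: sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([np.zeros(m), -Y[o]])
        # Variable returns to scale adds the convexity constraint sum(lambda) = 1.
        A_eq = np.concatenate(([0.0], np.ones(n)))[None, :] if vrs else None
        b_eq = np.array([1.0]) if vrs else None
        # linprog's default bounds (0, None) keep theta and all lambdas non-negative.
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
        scores[o] = res.fun
    return scores


def efficiency_scores(X, Y):
    """Return (overall, pure technical, scale) efficiency per unit."""
    overall = dea_input_oriented(X, Y, vrs=False)
    pure = dea_input_oriented(X, Y, vrs=True)
    return overall, pure, overall / pure


# Toy data (assumed for illustration): four firms, one input, one output.
X_demo = np.array([[2.0], [4.0], [8.0], [6.0]])
Y_demo = np.array([[1.0], [4.0], [6.0], [3.0]])
overall_demo, pure_demo, scale_demo = efficiency_scores(X_demo, Y_demo)
```

In a pipeline like the one the abstract describes, the resulting arrays would be appended as extra columns to the feature matrix of a classifier. Under these assumptions a score of 1 marks a firm on the corresponding frontier, and a scale efficiency below 1 indicates that part of the inefficiency comes from the firm's size rather than from how it converts inputs into outputs.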
