Improving Bankruptcy Prediction Performance with Synthetic Data
Abstract
Predicting corporate financial insolvency is critical for investors, creditors, and regulators. However, access to high-quality, balanced training data is often limited by confidentiality concerns, scarcity of information, or the specifics of financial reporting. This paper investigates whether synthetic data generation methods can augment minority-class instances in imbalanced datasets and thereby improve insolvency prediction models. It compares classical imbalance-reduction methods, such as the Synthetic Minority Over-sampling Technique (SMOTE), with newer approaches to synthetic data generation based on Bayesian networks, marginal distributions, random forests, and generative adversarial networks. The methods are evaluated by their ability to improve classification metrics: the Gini coefficient, the geometric mean, and the false positive and false negative rates. The experimental sample consists of real financial indicators of Finnish industrial small and medium-sized enterprises for 2021. The results contribute to the growing body of knowledge on synthetic data generation and its use for handling imbalanced datasets in financial predictive modeling. They also offer insight into how effectively different generation methods resample imbalanced data and improve the accuracy and reliability of firm insolvency prediction models.
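The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on common usage: the Gini coefficient is taken as 2·AUC − 1, and the geometric mean as the square root of sensitivity times specificity. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """False positive rate, false negative rate and G-mean for binary labels.

    Class 1 is the minority (insolvent) class, class 0 the majority class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fpr = fp / (fp + tn)  # share of solvent firms flagged as insolvent
    fnr = fn / (fn + tp)  # share of insolvent firms that were missed
    # Assumed definition: G-mean = sqrt(sensitivity * specificity).
    g_mean = math.sqrt((1.0 - fnr) * (1.0 - fpr))
    return {"fpr": fpr, "fnr": fnr, "g_mean": g_mean}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed definition: Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


# Toy imbalanced sample: 2 insolvent firms out of 8 (illustrative values only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off

print(threshold_metrics(y_true, y_pred))
print(gini(y_true, scores))
```

Unlike plain accuracy, these quantities do not reward a model for simply predicting the majority class, which is presumably why they suit a comparison of resampling methods on imbalanced data.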
Copyright (c) 2025 National Research University Higher School of Economics

This work is licensed under a Creative Commons Attribution 4.0 International License.