TY - JOUR
TI - An approach to identifying threats of extracting confidential data from automated control systems based on internet technologies
T2 -
IS -
KW - information security
KW - countering information security threats
KW - confidential data
KW - personal data
KW - machine learning
KW - deep learning
KW - identifying the entities of natural language texts
AB - Together with ubiquitous, global digitalization, cybercrime is growing and developing rapidly. The state considers the creation of an environment conducive to information security to be a strategic goal for the development of the information society in Russia. However, the question of how the "state of protection of the individual, society and the state from internal and external information threats" should be achieved in accordance with the "Information Security" and the "Digital Economy of Russia 2024" programs remains open. The aim of this study is to increase the efficiency with which automated control systems identify confidential data in HTML pages, in order to reduce the risk of this data being used in the preparatory and initial stages of attacks on the infrastructure of government organizations. The article describes an approach to identifying confidential data that combines several neural network technologies: a universal sentence encoder and a bidirectional long short-term memory (BiLSTM) recurrent architecture. A comparison with a modern natural language processing tool (spaCy) showed the merits of the methodological approach and its prospects for practical application.
AU - Vladimir Kuzmin
AU - Artem Menisov
UR - https://bijournal.hse.ru/en/2021--3 Vol.15/510536041.html
PY - 2021
SP - 35-47
VL -