Reliable plagiarism detection service with limited number of search queries

V. Dyagilev; A. Tskhay; S. Butakov

Vadim Dyagilev – Post-Graduate Student, Department of Higher Mathematics and Mathematical Simulation, Faculty of Natural Sciences, Altai State Technical University.
Address: 46, Lenin pr., Altay region, Barnaul, 656038, Russian Federation.
E-mail: dyagilev@mail.ru

Alexander Tskhay – Professor, Head of Department of Mathematics and Applied Informatics, Faculty of Economics, Altai Academy of Economics and Law.
Address: 82, Komsomolsky pr., Altay region, Barnaul, 656038, Russian Federation.
E-mail: taa1956@mail.ru

Sergey Butakov – Associate Professor, Department of Mathematics and Applied Informatics, Faculty of Economics, Altai Academy of Economics and Law.
Address: 82, Komsomolsky pr., Altay region, Barnaul, 656038, Russian Federation.
E-mail: sergey.butakov@gmail.com

The research analyzes existing approaches to detecting textual plagiarism and identifies potential problems with outsourcing the web search for similar documents. The problems arise because third-party plagiarism detection services require the entire document to be submitted for checking. This may be unacceptable in some cases, e.g. because of copyright concerns.

Based on the findings of this analysis, an improved architecture for a plagiarism detection system is proposed, together with evidence supporting its efficiency. In the proposed architecture, the internet search is a separate module hosted by a third-party checking organization. In contrast to conventional architectures, the third party receives not the entire document but only a part of it containing key phrases sufficient to search for identical texts on the web. The third party then sends back preliminary search results identifying potential sources of copying, while the detailed comparison is carried out on the client side. The experiment conducted as part of the research relates the amount of text taken from the web to the quality of plagiarism detection achievable with a limited number of queries. For specialized texts, the results show that the proposed approach can locate the original sources from a limited set of queries when as little as 5% of the text is copied from the web, while making it almost impossible for the third party to fully restore the document being checked. For general texts, the minimum share of copied text needed for detection tends to be higher. Overall, the proposed approach makes it possible to avoid sending an entire document to a third party for checking.
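The division of work between client and third party described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the phrase-selection heuristic (the window of words with the most characters, a crude stand-in for "key phrases"), the word trigram containment measure and the default cut-off are all assumptions made for illustration. The callables `search` and `fetch` are hypothetical stand-ins for the third-party search module and for downloading a candidate source.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption: '.', '!' or '?' ends a sentence)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def select_queries(text, max_queries, phrase_len=6):
    """Pick at most max_queries short phrases, one candidate per sentence.

    Heuristic (an assumption, not taken from the paper): within each sentence
    keep the window of phrase_len words with the most characters, then prefer
    the highest-scoring windows overall.
    """
    scored = []
    for sentence in split_sentences(text):
        words = re.findall(r"\w+", sentence.lower())
        windows = [words[i:i + phrase_len]
                   for i in range(len(words) - phrase_len + 1)]
        if windows:
            best = max(windows, key=lambda w: sum(len(x) for x in w))
            scored.append((sum(len(x) for x in best), " ".join(best)))
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:max_queries]]


def shingles(text, n=3):
    """Set of word n-grams of the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(document, source, n=3):
    """Share of the document's word n-grams that also occur in the source."""
    doc = shingles(document, n)
    if not doc:
        return 0.0
    return len(doc & shingles(source, n)) / len(doc)


def check_document(document, search, fetch, max_queries=5, min_overlap=0.1):
    """Client-side check in which only short phrases leave the client.

    search(phrase) -> iterable of URLs   (hypothetical third-party module)
    fetch(url)     -> text of the page   (hypothetical downloader)
    min_overlap is an arbitrary illustrative cut-off, not a value from the study.
    """
    candidates = set()
    for phrase in select_queries(document, max_queries):
        candidates.update(search(phrase))  # the only data sent to the third party
    report = {}
    for url in candidates:
        score = containment(document, fetch(url))  # detailed comparison stays local
        if score >= min_overlap:
            report[url] = score
    return report
```

In this sketch the third party sees at most `max_queries` phrases of `phrase_len` words each, so the share of the document it receives shrinks as the document grows, while the full-text comparison against candidate sources never leaves the client.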
