6/15/2018
Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents: A Comparative Study | IGI Global
Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents: A Comparative Study
Saugata Bose (University of Liberal Arts Bangladesh, Bangladesh) and Ritambhra Korpal (Savitribai Phule Pune University, India)
Source Title: Feature Dimension Reduction for Content-Based Image Identification (/gateway/book/192042)
Copyright: © 2018
Pages: 18
ISBN13: 9781522557753
ISBN10: 152255775X
EISBN13: 9781522557760
DOI: 10.4018/978-1-5225-5775-3.ch007
Abstract
This chapter proposes an approach that combines natural language processing (NLP) techniques with supervised machine learning algorithms to detect external plagiarism. The main emphasis is on constructing a framework that detects plagiarism in monolingual texts using an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during the pre-processing steps using a simple NLP approach. Filter metrics are then applied to select the most relevant features, and a supervised classification learning algorithm is used to classify the documents into four levels of plagiarism. A confusion matrix is then built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarities method, the sentence similarity method, and the search space reduction method.
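To make the n-gram comparison idea concrete, here is a minimal Python sketch of a word n-gram containment score between a suspicious text and a candidate source. It is an illustration only: the abstract does not specify the chapter's 120 features or its exact similarity measure, so the containment formula, the whitespace tokenizer, the default n=3, and the function names `word_ngrams` and `containment` are all assumptions.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return a Counter of lowercase word n-grams (as tuples) found in text."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams (with multiplicity) that also occur in source."""
    susp = word_ngrams(suspicious, n)
    src = word_ngrams(source, n)
    total = sum(susp.values())
    if total == 0:
        return 0.0  # text shorter than n words: nothing to compare
    shared = sum(min(count, src[gram]) for gram, count in susp.items())
    return shared / total


# Hypothetical example texts, not taken from the chapter's corpus.
source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over a sleeping cat"
unrelated = "supervised learning assigns labels to documents"

print(containment(copied, source, n=3))
print(containment(unrelated, source, n=3))
```

A score like this, computed for several values of n, could serve as one family of numeric features passed to a feature-selection step and then to a classifier such as a decision tree or naive Bayes. Whether the chapter builds its features this way is not stated in the abstract.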
References
Androutsopoulos, I., & Malakasiotis, P. (2010). A Survey of Paraphrasing and Textual Entailment Methods. Journal of Artificial Intelligence Research, 38, 135-187. Retrieved from https://www.jair.org/media/2985/live-2985-5001-jair.pdf
Badge, J., & Scott, J. (2009). Dealing with plagiarism in the digital age. University of Leicester. Retrieved from http://evidencenet.pbworks.com/w/page/19383480/Dealing%20with%20plagiarism%20in%20the%20digital%20age
Baeza-Yates, R., & Ribeiro-Neto, B. (Eds.). (1999). Modern Information Retrieval. Addison Wesley Longman Limited. Retrieved from http://people.ischool.berkeley.edu/~hearst/irbook/print/chap10.pdf
Barrón-Cedeño, A., & Rosso, P. (2009). On Automatic Plagiarism Detection Based on n-Grams Comparison. Advances in Information Retrieval, 696-700. Retrieved from http://www.cs.upc.edu/~albarron/publications/2009/BarronNgramsECIR.pdf
Bennett, R. (2005). Factors associated with student plagiarism in a post-1992 University. Journal of Assessment and Evaluation in Higher Education, 30(2), 137-162. Retrieved from https://www.scribd.com/document/309860125/Bennett-2005-Factors-Associated-With-Student-Plagiarism-in-a-Post-1992-University
Bull, J., Collins, C., Coughlin, E., & Sharp, D. (2001). Technical review of plagiarism detection software report. Luton: Computer Assisted Assessment Centre. Retrieved from https://www.researchgate.net/publication/247703683_Technical_Review_of_Plagiarism_Detection_Software_Report
Buruiana, F., Scoica, A., Rebedea, T., & Rughinis, R. (2013). Automatic Plagiarism Detection System for Specialized Corpora. Proceedings of the 19th International Conference on Control Systems and Computer Science (CSCS), 77-82. Retrieved from https://www.researchgate.net/publication/251899410_Automatic_Plagiarism_Detection_System_for_Specialized_Corpora
Ceska, Z. (2009). Automatic Plagiarism Detection Based on Latent Semantic Analysis (Unpublished Ph.D Thesis). University of West Bohemia.
Chong, M., Specia, L., & Mitkov, R. (2010). Using Natural Language Processing for Automatic Plagiarism Detection. Proceedings of the 4th International Plagiarism Conference. Retrieved from http://www.academia.edu/326444/Using_Natural_Language_Processing_for_Automatic_Detection_of_Plagiarism
Clough, P. (Ed.). (2003). Old and new challenges in automatic plagiarism detection. National UK Plagiarism Advisory Service. Retrieved from http://ir.shef.ac.uk/cloughie/papers/pas_plagiarism.pdf
Clough, P., & Stevenson, M. (2011). Developing a corpus of plagiarised short answers. Language Resources and Evaluation, 45(1), 5-24. Retrieved from https://www.researchgate.net/publication/220147549_Developing_a_corpus_of_plagiarised_short_answers
Cosma, G., & Joy, M. (2008). Towards a definition of source-code plagiarism. IEEE Transactions on Education, 51(2), 195-200. Retrieved from https://www.researchgate.net/publication/3052950_Towards_a_Definition_of_Source-Code_Plagiarism
Eissen, S. M. Z., Stein, B., & Kulig, M. (2006). Plagiarism Detection without Reference Collections. Proceedings of the 30th Annual Conference of the Gesellschaft für Klassifikation e.V., 359-366. Retrieved from http://www.uni-weimar.de/medien/webis/publications/papers/stein_2007a.pdf
Lyon, C., Barrett, R., & Malcolm, J. (2003). Experiments in Electronic Plagiarism Detection. Computer Science Department, University of Hertfordshire. Retrieved from http://homepages.herts.ac.uk/~comqcml/TR5.3.5.doc
Lyon, C., Barrett, R., & Malcolm, J. (2004). A Theoretical Basis to the Automated Detection of Copying Between Texts, and its Practical Implementation in the Ferret Plagiarism and Collusion Detector. Proceedings of the Plagiarism: Prevention, Practice and Policies Conference. Retrieved from http://homepages.herts.ac.uk/~comqcml/LyonPaperFerretx.pdf
McCabe, D. (2002). Cheating: Why Students Do It and How We Can Help Them Stop. American Educator. Retrieved from http://www.aft.org/periodical/american-educator/winter-2001/cheating
Meuschke, N., & Gipp, B. (2014). Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space. In Proceedings of the 14th ACM/IEEE Conference on Digital Libraries (pp. 567–575). Retrieved from http://www.academia.edu/28340526/Reducing_Computational_Effort_for_Plagiarism_Detection_by_using_Citation_Characteristics_to_Limit_Retrieval_Space
Mihalcea, R., & Corley, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity. Proceedings of the Twenty-first National Conference on Artificial Intelligence (AAAI-06), 775–780. Retrieved from https://www.aaai.org/Papers/AAAI/2006/AAAI06-123.pdf
Mohler, M., & Mihalcea, R. (2009). Text-to-text semantic similarity for automatic short answer grading. Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), 567–575. Retrieved from http://www.aclweb.org/anthology/E09-1065
Oberreuter, G., L'Huillier, G., Ríos, S. A., & Velásquez, J. D. (2011). Approaches for intrinsic and external plagiarism detection - Notebook for PAN at CLEF 2011. Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2011) Labs and Workshop. Retrieved from http://ceur-ws.org/Vol-1177/CLEF2011wn-PAN-OberreuterEt2011.pdf
Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). Bleu: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311-318. Retrieved from http://www.aclweb.org/anthology/P02-1040.pdf
Reddy, K. D. (2013). Plagiarism Detection using Enhanced Relative Frequency Model (Unpublished M.Sc Thesis). Department of Computer Science and Engineering, National Institute of Technology Rourkela, India. Retrieved from http://ethesis.nitrkl.ac.in/5401/1/211CS3297.pdf
Runeson, P., Alexandersson, M., & Nyholm, O. (2007). Detection of Duplicate Defect Reports Using Natural Language Processing. Proceedings of the 29th International Conference on Software Engineering, ICSE'07, 499-510. Retrieved from https://www.semanticscholar.org/paper/Detection-of-Duplicate-Defect-Reports-Using-Natura-Runeson-Alexandersson/0d459e3be20f7f529bc0d92d42fa63e60fc1e1ba
Vani, K., & Gupta, D. (2015). Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system. Proceedings of the 4th International Conference on Advances in Computing, Communications and Informatics (ICACCI), 1578-1584. Retrieved from https://www.deepdyve.com/lp/institute-of-electrical-and-electronics-engineers/investigating-the-impact-of-combined-similarity-metrics-and-pos-Z29pbXpVIw
Vania, C., & Adriani, M. (2010). Automatic external plagiarism detection using passage similarities. Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010). Retrieved from http://ceur-ws.org/Vol-1176/CLEF2010wn-PAN-VaniaEt2010.pdf
Zubarev, D. V., & Sochenkov, I. V. (2017). Paraphrased plagiarism detection using sentence similarity. Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies: Dialogue 2017. Retrieved from http://www.dialog-21.ru/media/3965/zubarevdvsochenkoviv.pdf
https://www.igi-global.com/gateway/chapter/207231