Plagiarism Detection using Enhanced Relative Frequency Model
Abstract
As technology advances, it is becoming harder to protect data from being copied, so detecting copied content is often more practical than trying to secure it. Here, content covers digital documents such as scientific research papers, newspaper and journal articles, and assignments submitted by students. Many tools and algorithms exist for detecting plagiarism, but the time complexity of the algorithm matters greatly when a document must be compared against a very large data set. Vector-based methods are frequently used for plagiarism detection, but the existing ones have drawbacks. In the SCAM approach, choosing the value of 'e' (epsilon) is a drawback because 'e' determines the closeness set, and the Daniel approach fails to identify plagiarism when terms are repeated within a sentence. We propose a new algorithm, built on the concepts of the Relative Frequency Model, that overcomes the drawbacks of these existing methods. Our implementation employs a sentence splitter, stop-word removal, and stemming of words.
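To make the preprocessing steps and the role of 'e' concrete, the following is a minimal sketch and not the thesis's enhanced algorithm. It assumes a naive regex sentence splitter, a tiny illustrative stop-word list, a crude suffix-stripping stemmer standing in for a real one, and the SCAM-style relative frequency measure with uniform term weights. All function names here are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "of", "in", "on", "to", "and", "or", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequencies(sentence):
    """Lowercase, tokenize, drop stop words, stem, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def closeness_set(f1, f2, epsilon):
    """Shared terms whose frequencies satisfy epsilon - (f1/f2 + f2/f1) > 0."""
    return {w for w in f1.keys() & f2.keys()
            if epsilon - (f1[w] / f2[w] + f2[w] / f1[w]) > 0}


def subset_measure(f1, f2, epsilon):
    """Asymmetric overlap of f1 with f2, with every term weight assumed to be 1."""
    denom = sum(c * c for c in f1.values())
    if denom == 0:
        return 0.0
    close = closeness_set(f1, f2, epsilon)
    return sum(f1[w] * f2[w] for w in close) / denom


def similarity(s1, s2, epsilon=2.5):
    """Larger of the two subset measures; the default epsilon is an arbitrary choice."""
    f1, f2 = term_frequencies(s1), term_frequencies(s2)
    return max(subset_measure(f1, f2, epsilon), subset_measure(f2, f1, epsilon))
```

In this sketch, a term that occurs once in one sentence and three times in the other gives a ratio sum of 1/3 + 3, so it falls outside the closeness set at epsilon = 2.5 but inside it at epsilon = 4. This illustrates how strongly the result depends on the chosen 'e'.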
Reddy, Kotha Dinesh (2013) Plagiarism Detection using Enhanced Relative Frequency Model. MTech thesis.