Contributor(s): The Pennsylvania State University CiteSeerX Archives
Abstract: In intrinsic plagiarism analysis we are given a document, allegedly written by a single author, and the task is to find sufficient evidence either to accept or to reject this hypothesis. Existing research on intrinsic plagiarism analysis tries to quantify changes in writing style by analyzing the distributions of particular style markers. In this way, acceptable detection rates can be achieved if the portion of plagiarized sections is known a priori and if the document is of a single genre. However, neither assumption is guaranteed to hold in practice. Koppel and Schler propose a new approach to the authorship verification problem, where the task is to determine whether two texts are written by the same author. Their approach is ingenious in that it provides a means to detect relatively shallow differences in writing style while being independent of language, period, and genre. Since the approach requires two (relatively large) samples of text to be compared with each other, it cannot be applied directly to the intrinsic plagiarism analysis problem. The main contribution of our paper is the idea of addressing the shortcomings of existing approaches to intrinsic plagiarism analysis with the technology presented by Koppel and Schler. We propose a hybrid approach that employs style marker analysis for hypothesis generation; the resulting hypotheses are then accepted or rejected by an authorship verification analysis. A second contribution of our paper is the evaluation of style markers for German text and their application to a real-world plagiarism case.
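The hypothesis-generation step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the choice of average word length as the single style marker, the z-score threshold, and the function names `avg_word_length` and `suspicious_sections` are all hypothetical. The subsequent authorship verification step is not shown.

```python
import re
import statistics


def avg_word_length(text: str) -> float:
    """Hypothetical style marker: mean length of alphabetic words.

    The pattern matches Unicode letters, so German umlauts and ß count as
    word characters.
    """
    words = re.findall(r"[^\W\d_]+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


def suspicious_sections(sections, z_threshold=1.5):
    """Return indices of sections whose marker deviates from the document mean.

    Each returned index is only a hypothesis ("this section may stem from
    another author"); it would still have to be accepted or rejected by a
    separate authorship verification analysis.
    """
    values = [avg_word_length(s) for s in sections]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # No stylistic variation at all, so there is nothing to flag.
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > z_threshold]


if __name__ == "__main__":
    document = ["the cat sat on the mat"] * 5 + [
        "extraordinarily sophisticated terminological constructions"
    ]
    print(suspicious_sections(document))
```

A real system would combine many style markers and would need sections long enough for the statistics to be meaningful; the single marker here only keeps the sketch short.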