Contributor(s): The Pennsylvania State University CiteSeerX Archives
Abstract: Copy detection is a major problem today. Students plagiarize assignments from the web and from each other, so we need a technique that can detect even partial copies between assignments, including copied text that has been relocated within the document. The problem also arises on the web: search engines want to detect copies of entire web documents so that the same content is not displayed multiple times in the results. Document fingerprinting is an efficient technique for accurately detecting full and partial copies between documents. We present a new randomized algorithm that guarantees, with very high probability, that any match of at least W characters (W is an input parameter) will be detected. Moreover, the space needed by our algorithm has small deterministic bounds. This is the key difference from previous work, which either offers no guarantees or has very poor space bounds.
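The abstract does not describe the randomized algorithm itself, so the following is only a minimal sketch of the general fingerprinting idea, not the authors' method. It swaps in a different, well-known scheme: hash every k-character substring and keep the minimum hash of each sliding window (winnowing-style selection). The names `kgram_hashes`, `select_fingerprints`, and `shares_fingerprint`, the parameter `k`, and the use of SHA-1 are all assumptions made for illustration; only W comes from the abstract.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of text, in order of position."""
    return [
        int.from_bytes(
            hashlib.sha1(text[i:i + k].encode("utf-8")).digest()[:8], "big"
        )
        for i in range(len(text) - k + 1)
    ]


def select_fingerprints(text, k, W):
    """Keep the minimum k-gram hash of every window covering W characters.

    Illustrative winnowing-style selection, NOT the paper's randomized
    algorithm. A window of (W - k + 1) consecutive k-grams spans exactly
    W characters, so any shared substring of at least W characters
    contains a whole window, and both documents select that window's
    minimum hash.
    """
    if W < k:
        raise ValueError("W must be at least k")
    hashes = kgram_hashes(text, k)
    window = W - k + 1
    selected = set()
    for start in range(len(hashes) - window + 1):
        selected.add(min(hashes[start:start + window]))
    return selected


def shares_fingerprint(a, b, k=5, W=20):
    """Report whether two documents have at least one fingerprint in common."""
    return bool(select_fingerprints(a, k, W) & select_fingerprints(b, k, W))


if __name__ == "__main__":
    common = "the quick brown fox jumps over the lazy dog"
    doc_a = "AAAA " + common + " BBBB"
    doc_b = "zzzz zzzz " + common  # same passage, relocated
    print(shares_fingerprint(doc_a, doc_b))
```

In this sketch a shared passage of at least W characters always produces a common fingerprint, regardless of where it appears in each document. Unlike the algorithm the abstract describes, this scheme gives no small deterministic bound on how many fingerprints are stored.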