Inference of Fine-grained Attributes of Bengali Corpus for Stylometry Detection
Keywords
Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition
Online Access
http://arxiv.org/abs/1210.3729
Abstract
Stylometry, the science of inferring characteristics of an author from the characteristics of documents written by that author, is a problem with a long history. It is a core task of text categorization, with applications in authorship identification, plagiarism detection, forensic investigation, computer security, and copyright and estate disputes. In this work, we present a strategy for stylometry detection of documents written in Bengali. We adopt a set of fine-grained attribute features together with a set of lexical markers to analyze the text, and we use three semi-supervised measures to make decisions. Finally, a majority-voting approach is used for the final classification. The system is fully automatic and language-independent. Evaluation results for Bengali authorship stylometry detection show reasonably promising accuracy compared with the baseline model.
Comment: 5 pages, 2 figures, 4 tables. arXiv admin note: substantial text overlap with arXiv:1208.6268
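The abstract names the pipeline stages (fine-grained attribute features plus lexical markers, three decision measures, majority voting) but does not define them. The following Python code is a minimal sketch under stated assumptions, not the paper's method: the four attributes, the placeholder marker list `MARKERS`, and the three nearest-centroid distance measures (cosine, Euclidean, Manhattan) are hypothetical stand-ins. They are plain supervised distances substituted for the paper's unspecified semi-supervised measures. Only the final majority vote mirrors the abstract directly.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL lexical markers: common Bengali function words chosen for
# illustration only; the paper's actual marker set is not given in this record.
MARKERS = ["এবং", "কিন্তু", "যে", "না", "করে"]

PUNCT_CHARS = "।,;:?!-"
PUNCT = set(PUNCT_CHARS)
SENTENCE_END = re.compile(r"[।?!]+")  # the danda (।) ends most Bengali sentences


def features(text):
    """Map a document to an illustrative style vector (assumed attributes)."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    tokens = [w.strip(PUNCT_CHARS) for w in text.split()]
    tokens = [t for t in tokens if t]
    n = max(len(tokens), 1)
    counts = Counter(tokens)
    vec = [
        sum(len(t) for t in tokens) / n,      # mean word length (code points)
        n / max(len(sentences), 1),           # mean sentence length (words)
        len(counts) / n,                      # type-token ratio
        sum(ch in PUNCT for ch in text) / n,  # punctuation marks per word
    ]
    vec += [counts[m] / n for m in MARKERS]   # relative marker frequencies
    return vec


def centroid(vectors):
    """Average the style vectors of one author's training documents."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


# Three ASSUMED decision measures; each one votes for its nearest author profile.
MEASURES = [cosine_distance, euclidean, manhattan]


def classify(text, training):
    """training maps author label -> list of sample documents."""
    profiles = {a: centroid([features(d) for d in docs]) for a, docs in training.items()}
    x = features(text)
    votes = [min(profiles, key=lambda a: dist(x, profiles[a])) for dist in MEASURES]
    # Majority vote. If all three measures disagree, Counter.most_common keeps
    # first-encountered order, so the first measure's vote wins the tie.
    return Counter(votes).most_common(1)[0][0]
```

Calling `classify(doc, {"A": [...], "B": [...]})` returns the author label that receives the most votes. The attributes are left unscaled, so larger-magnitude attributes such as mean sentence length dominate all three distances. Standardizing each attribute over the training set would be a natural refinement.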
Date
2012-10-13
Type
text
Identifier
oai:arXiv.org:1210.3729
http://arxiv.org/abs/1210.3729
Polibits (44) 2011, pp. 79-83