Web Spam Detection by Learning from Small Labeled

Author(s)
Jaber Karimpour
Ali A. Noroozi
Somayeh Alizadeh
K. N. Toosi
Contributor(s)
The Pennsylvania State University CiteSeerX Archives
Keywords
General Terms: Information Retrieval, Search Engine, Machine Learning
Keywords: Adversarial Information Retrieval, Web Search, Web Spam Detection, Semi-supervised Learning, Expectation Maximization Algorithm

URI
http://hdl.handle.net/20.500.12424/828935
Online Access
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.258.8543
http://research.ijcaonline.org/volume50/number21/pxc3880993.pdf
Abstract
Web spamming tries to deceive search engines into ranking some pages higher than they deserve. Many methods have been proposed to combat web spamming and to detect spam pages. One basic method is classification: learning a classification model from previously labeled training data and using this model to classify web pages as spam or non-spam. A drawback of this method is that manually labeling a large number of web pages to generate the training data can be biased, inaccurate, labor-intensive, and time-consuming. In this paper, we propose a new method that addresses this drawback by using semi-supervised learning to label the training data automatically. To do this, we incorporate the Expectation-Maximization algorithm, an efficient and important semi-supervised learning algorithm. Experiments carried out on real web spam data show that the new method performs very well in practice.
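The general idea of labeling unlabeled pages with Expectation-Maximization can be illustrated with a minimal sketch. The abstract does not state which base classifier or features the paper uses, so everything below is an assumption for illustration only: a two-class Gaussian naive Bayes model, two hypothetical numeric features per page, toy data values, and the function names `m_step`, `e_step`, and `em_label`. Class 0 stands for non-spam and class 1 for spam.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def m_step(rows, resp, n_classes=2, var_floor=1e-3):
    """Fit class priors and per-feature Gaussians from (soft) labels."""
    n_features = len(rows[0])
    total = sum(sum(r) for r in resp)
    params = []
    for c in range(n_classes):
        weight = sum(r[c] for r in resp)
        means, variances = [], []
        for j in range(n_features):
            mean = sum(r[c] * x[j] for x, r in zip(rows, resp)) / weight
            var = sum(r[c] * (x[j] - mean) ** 2 for x, r in zip(rows, resp)) / weight
            means.append(mean)
            variances.append(max(var, var_floor))  # avoid zero variance
        params.append((weight / total, means, variances))
    return params

def e_step(rows, params):
    """Return class posteriors for each row under the current model."""
    resp = []
    for x in rows:
        logs = [
            math.log(prior)
            + sum(gaussian_logpdf(v, m, s) for v, m, s in zip(x, means, variances))
            for prior, means, variances in params
        ]
        top = max(logs)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logs]
        z = sum(exps)
        resp.append([e / z for e in exps])
    return resp

def em_label(labeled, labels, unlabeled, n_iter=20):
    """Semi-supervised EM: labeled rows keep hard labels, unlabeled rows get soft ones."""
    hard = [[1.0 if y == c else 0.0 for c in range(2)] for y in labels]
    params = m_step(labeled, hard)  # initialize from the small labeled set
    for _ in range(n_iter):
        soft = e_step(unlabeled, params)
        params = m_step(labeled + unlabeled, hard + soft)
    return params, e_step(unlabeled, params)

# Toy, made-up feature vectors (not data from the paper).
labeled = [[0.1, 0.2], [0.9, 0.8]]
labels = [0, 1]  # 0 = non-spam, 1 = spam
unlabeled = [[0.15, 0.1], [0.2, 0.25], [0.85, 0.9], [0.8, 0.75]]

params, posteriors = em_label(labeled, labels, unlabeled)
auto_labels = [max(range(2), key=lambda c: p[c]) for p in posteriors]
```

Here `auto_labels` holds the automatically assigned class for each unlabeled row, which could then be added to the training set of any downstream classifier. Each iteration re-estimates the model from both the labeled and the softly labeled rows, which is how a small labeled sample is stretched over a larger unlabeled pool.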
Date
2013-01-17
Type
text
Identifier
oai:CiteSeerX.psu:10.1.1.258.8543
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.258.8543
Copyright/License
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Collections
OAI Harvested Content
