Large Scale Web-Content Classification | IIT - CNR - Istituto di Informatica e Telematica

Large Scale Web-Content Classification

Web classification is used in many security devices to prevent users from accessing web sites that the current security policy does not allow, as well as for improving web search and implementing contextual advertising. Many commercial web classification services are available on the market, along with a few publicly available web directory services. Unfortunately, they mostly focus on English-language web sites, which makes their classification reliability and coverage unsuitable for other languages.
This paper covers the design and implementation of a web-based tool for classifying the domains registered under a top-level domain (TLD). Each domain is classified by analysing its main web site and assigning it to categories according to its content. The tool has been successfully validated by classifying all registered .it Internet domains; the results are presented in this paper.
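
As a rough illustration of content-based domain classification, the sketch below extracts the visible text of a home page and scores it against keyword lists. It is not the method used in the paper: the category names, the keyword sets, and the function `classify_home_page` are hypothetical, and a simple keyword count stands in for whatever classifier the tool actually uses.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical categories and keywords, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "travel": {"hotel", "viaggi", "vacanze", "booking"},
    "food": {"ristorante", "pizza", "menu", "cucina"},
    "technology": {"software", "cloud", "hosting", "web"},
}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def classify_home_page(html):
    """Return the category whose keywords occur most often, or "unknown"."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Count lowercase word occurrences in the visible text.
    words = Counter(re.findall(r"\w+", " ".join(parser.parts).lower()))
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

In this sketch the home page HTML is passed in as a string, so fetching the page for each registered domain (and handling redirects, timeouts, and parked domains) is left out. A page with no matching keywords falls back to "unknown" rather than being forced into a category.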

 


8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Lisbon, Portugal, 2015

External authors: Daniele Sartiano (Dipartimento di Informatica, Univ. di Pisa)
IIT authors:

Type: Article in proceedings of international peer-reviewed conference
Field of reference: Information Technology and Communication Systems

File: Large Scale Web Content Classification.pdf

Activity: Unità Sistemi e Sviluppo Tecnologico