Publication

Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences

Name: Frolov_etal_IDEA2019.pdf
Size: 248.68 KB
Format: Adobe PDF

Abstract

We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to as “gaps” and “offshoots”. Our method, ParGenFS, globally minimizes a penalty function that combines the numbers of head subjects, gaps and offshoots, each weighted differently. Two applications are considered: (1) analysis of research tendencies in Data Science; (2) audience extension for programmatic targeted advertising online. The former involves a taxonomy of Data Science derived from the celebrated ACM Computing Classification System 2012. Based on a collection of research papers published by Springer in 1998–2017, and applying in-house methods for text analysis and fuzzy clustering, we derive fuzzy clusters of leaf topics in learning, retrieval and clustering. The head subjects of these clusters inform us of some general tendencies of the research. The latter involves the publicly available IAB Tech Lab Content Taxonomy. Each of about 25 million users is assigned a fuzzy profile within this taxonomy, which is generalized offline using ParGenFS. Our experiments show that these head subjects at least double the size of targeted audiences without losing quality.
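
The penalty described in the abstract can be illustrated with a small sketch. This is not the ParGenFS algorithm: it only scores a few hand-picked candidate head-subject sets on a toy tree. The node names, the weights 0.3 and 0.8, the crisp (non-fuzzy) query set, and the leaf-level counting of gaps and offshoots are all assumptions made for illustration, with a gap taken to be a covered leaf outside the query set and an offshoot a query leaf left uncovered.

```python
# Toy taxonomy, parent -> children. All node names are hypothetical.
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
}


def leaves_under(node):
    """Return the set of leaves in the subtree rooted at `node`."""
    children = TREE.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child)
    return result


def penalty(heads, query, gap_weight, offshoot_weight):
    """Simplified penalty: #heads + gap_weight * #gaps + offshoot_weight * #offshoots.

    Assumption: gaps are covered leaves not in the query set, and
    offshoots are query leaves not covered by any head subject.
    Fuzzy membership values are ignored in this sketch.
    """
    covered = set()
    for head in heads:
        covered |= leaves_under(head)
    gaps = covered - query
    offshoots = query - covered
    return len(heads) + gap_weight * len(gaps) + offshoot_weight * len(offshoots)


# Crisp query set of leaf topics (an assumption; the paper uses fuzzy sets).
query = {"a1", "a2", "b1"}

# Hand-picked candidate head-subject sets, compared by brute force.
candidates = [["root"], ["A"], ["A", "B"], []]
best = min(candidates, key=lambda heads: penalty(heads, query, 0.3, 0.8))
print(best, penalty(best, query, 0.3, 0.8))
```

With these assumed weights, lifting the query set to a single higher-rank node costs less than keeping several lower-rank heads or leaving the leaves uncovered, which is the trade-off the penalty is meant to capture. Changing the weights shifts the balance between a tight cover and a small number of head subjects.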

Description

D.F. and B.M. acknowledge continuing support by the Academic Fund Program at the NRU HSE (grant-19-04-019 in 2018–2019) and by the DECAN Lab NRU HSE, in the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Russian Academic Excellence Project “5-100”. S.N. acknowledges the support by FCT/MCTES, NOVA LINCS (UID/CEC/04516/2019).

Keywords

Annotated suffix tree; Fuzzy thematic cluster; Generalization; Research tendencies; Targeted advertising; Theoretical Computer Science; General Computer Science

Publisher

Springer
