Publication

Clustering social sciences and humanities publications

Book contribution - Book abstract, conference contribution

Subtitle: Can word and document embeddings improve cluster quality?
We study how different document representation techniques affect the outcomes of clustering based on textual information. Our dataset consists of titles and abstracts for 15,907 publications from the social sciences and humanities. We compare established document representation techniques such as TF-IDF and Latent Semantic Indexing with word and document embedding techniques (Word2Vec and Doc2Vec) in terms of the quality of the clustering outcomes. Quality is assessed by calculating silhouette scores and the average textual coherence of the clusters. The results show that word and document embeddings are promising feature engineering techniques in the context of clustering social sciences and humanities publications. The average of Word2Vec embeddings works best for identifying textually coherent clusters.
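As a rough illustration of the evaluation idea described in the abstract, the sketch below builds TF-IDF vectors for a few documents and computes a mean silhouette score for two candidate cluster assignments. It is not the authors' pipeline: the four toy documents, the whitespace tokenisation, the smoothed IDF formula, the use of cosine distance, and the hand-assigned labels (standing in for the output of a clustering algorithm) are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse, L2-normalised TF-IDF dict per document.

    Assumption: whitespace tokenisation and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity for two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return 1.0 - sum(w * b.get(term, 0.0) for term, w in a.items())


def silhouette_score(vectors, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in members[label] if j != i]
        if not own:
            continue  # singleton cluster: silhouette taken as 0
        # a: mean distance to the other members of the same cluster
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


# Hypothetical toy "abstracts": two on history, two on voting behaviour.
docs = [
    "medieval church history archives",
    "church history in medieval towns",
    "voter turnout survey statistics",
    "survey of voter turnout elections",
]
vectors = tfidf_vectors(docs)

# Hand-assigned labels standing in for the output of a clustering algorithm.
good_score = silhouette_score(vectors, [0, 0, 1, 1])  # topics kept together
bad_score = silhouette_score(vectors, [0, 1, 0, 1])   # topics mixed
print("topic-aligned clustering:", round(good_score, 3))
print("mixed clustering:", round(bad_score, 3))
```

The same `silhouette_score` function would apply unchanged to other sparse representations, which is what makes a comparison across representation techniques possible in principle. Dense embeddings such as averaged Word2Vec vectors would need a distance function over lists rather than dicts.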
Book: 18th international conference on scientometrics & informetrics (ISSI2021), virtual event, 12-15 July, 2021, KU Leuven, Belgium / Glänzel, Wolfgang [ed.]; et al.
Pages: 369 - 374
Year of publication: 2021
Keywords: P1 Proceeding
Accessibility: Closed