Publication
Kernel Spectral Document clustering using unsupervised precision-recall metrics
Book contribution - Book chapter Conference contribution
© 2015 IEEE. Kernel Spectral Clustering (KSC) solves a weighted kernel principal component analysis problem in a primal-dual optimization framework. The KSC model is built on a small subset of the data using proper training, model selection and test phases. The clustering model is obtained from the dual solution of the problem and has a powerful out-of-sample extension property, which allows cluster affiliation to be determined for previously unseen data points. In the model selection phase, we estimate the appropriate number of clusters using a metric that evaluates the quality of the clusters. Traditional quality indices such as inertia, the Davies-Bouldin (DB) index and silhouette (SIL) are known to be method-dependent and not to perform well on complex heterogeneous data such as textual data. In this paper, we utilize the quality evaluation techniques based on an unsupervised version of Precision, Recall and F-measure proposed in [1] to come up with a new kernel spectral document clustering (KSDC) model which generates homogeneous clusters of documents. We compare the quality of the clusters obtained by the proposed KSDC technique with those of the k-means and neural gas algorithms, which are more oriented towards these metrics, on several real-world textual data sets.
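The model selection step described in the abstract, choosing the number of clusters by scoring candidate partitions with a quality index, can be illustrated with a minimal sketch. It assumes the Davies-Bouldin index (one of the traditional indices the abstract names), not the unsupervised Precision, Recall and F-measure of [1], and it does not implement KSC. The toy 2-D points and the three candidate labelings are hypothetical, hard-coded stand-ins for the partitions a clustering method would produce.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard partition (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = {c: centroid(members) for c, members in clusters.items()}
    # Scatter: mean Euclidean distance of the members to their centroid.
    scatter = {
        c: sum(math.dist(p, centers[c]) for p in members) / len(members)
        for c, members in clusters.items()
    }
    # For each cluster, keep the worst (largest) similarity ratio to any other.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Hypothetical toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

# Hypothetical candidate partitions for k = 2, 3 and 4 clusters.
candidates = {
    2: [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # first and third group merged
    3: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],  # one cluster per group
    4: [0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],  # first group split in two
}

scores = {k: davies_bouldin(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)  # lowest index value wins
print(best_k, scores)
```

Any other index can be swapped in by replacing `davies_bouldin` with a function of the same signature; for an index where higher is better, `max` would replace `min` in the selection line.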
Book: Proc. of the International Joint Conference on Neural Networks
Pages: 1 - 7
ISBN: 9781479919604
Year of publication: 2015
BOF-keylabel: yes
IOF-keylabel: yes
Authors from: Higher Education
Accessibility: Closed