Publication

Dutch Humor Detection by Generating Negative Examples

Book contribution - Book chapter, Conference contribution

Detecting whether a text is humorous is a hard task to perform computationally, as it usually requires linguistic and common-sense insight. In machine learning, humor detection is usually modelled as a binary classification task: a model is trained to predict whether a given text is a joke or another type of text. Rather than using completely different non-humorous texts as negative examples, we propose using text generation algorithms that imitate the original joke dataset, which makes the task harder for the learning algorithm. We constructed several joke and non-joke datasets to test the humor detection abilities of different language technologies. In particular, we test whether the RobBERT language model is more capable than previous technologies at detecting humor when given generated, similar non-jokes. In doing so, we create and compare the first Dutch humor detection systems. We found that RobBERT outperforms the other tested algorithms, and especially shines when distinguishing jokes from the generated negative examples. This performance illustrates the usefulness of text generation for creating negative datasets for humor recognition, and also shows that transformer models are a large step forward in humor detection.
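The abstract does not say which text generator produced the negative examples. Purely as an illustrative stand-in (an assumption, not necessarily the method used in the paper), a word-level Markov chain trained on the jokes themselves yields texts that reuse the jokes' vocabulary and local word order, so a classifier cannot separate them from real jokes by topic or vocabulary alone. The sketch below builds such a labeled binary dataset (1 = joke, 0 = generated non-joke); the names `train_markov`, `generate` and `build_dataset` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical sketch: a word-level Markov chain is used here only as an
# example generator for "similar non-jokes"; the paper may use another one.


def train_markov(texts, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for text in texts:
        words = ["<s>"] * order + text.split() + ["</s>"]
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, order=1, max_words=30, rng=random):
    """Sample one text by walking the chain from the start state."""
    state = ("<s>",) * order
    out = []
    while len(out) < max_words:
        nxt = rng.choice(chain[state])
        if nxt == "</s>":
            break
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)


def build_dataset(jokes, order=1, seed=0, max_tries=1000):
    """Pair the jokes (label 1) with up to len(jokes) generated non-jokes (label 0)."""
    rng = random.Random(seed)
    chain = train_markov(jokes, order)
    known = set(jokes)
    negatives = []
    tries = 0
    while len(negatives) < len(jokes) and tries < max_tries:
        tries += 1
        candidate = generate(chain, order, rng=rng)
        # Skip empty outputs and verbatim copies of real jokes.
        if candidate and candidate not in known:
            negatives.append(candidate)
    return [(j, 1) for j in jokes] + [(n, 0) for n in negatives]
```

Because the negatives are recombinations of the jokes' own words, a bag-of-words classifier gets little signal from vocabulary, which is the intuition behind making the task harder for the learner.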

Book: Proceedings of the 32nd Benelux Conference on Artificial Intelligence (BNAIC 2020) and the 29th Belgian Dutch Conference on Machine Learning (Benelearn 2020)

Pages: 313-323

Number of pages: 10

Year of publication: 2020

Institutional Repository URL: https://lirias.kuleuven.be/3274633

Access: Open