Publication

Discovering non-metadata contaminant features in intrusion detection datasets

Book contribution - Book chapter, conference contribution

Most newly proposed detection methods in intrusion detection incorporate machine learning models to distinguish between benign and malicious traffic. The models are validated on a handful of academic datasets and ranked by their classification performance. This paper aims to demonstrate that, unbeknownst to the new models' authors, these datasets contain features that heavily bias the results and obscure a realistic, reliable estimate of the datasets' separability. It proposes a methodology to estimate the contaminating influence of a dataset's features, based on the concept of blind generalization. The methodology is then used to assess the features of six widely adopted intrusion detection datasets. In each dataset, several features show a pattern in which, regardless of the attack class used for training, the models blindly generalize to all available attack classes with nearly identical classification metrics. These features give each dataset's baseline classification scores an undeserved boost. On their own, some contaminant features even push these baselines to upwards of 90% balanced accuracy.
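
The blind generalization check described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the single-feature threshold classifier, the attack class names, and the synthetic data are hypothetical. The idea shown is to train on benign traffic plus one attack class using a single feature, then score the resulting model on benign traffic plus each attack class it has never seen.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of attack recall and benign recall (labels: 1 = attack, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def predict(stump, values):
    """Apply a (threshold, sign) rule to one feature; 1 means 'attack'."""
    threshold, sign = stump
    return [1 if sign * (v - threshold) > 0 else 0 for v in values]


def fit_stump(values, labels):
    """Pick the threshold and direction with the best training balanced accuracy.

    Assumption: a one-feature threshold rule stands in for whatever
    models the paper actually uses.
    """
    best_score, best_stump = -1.0, None
    for threshold in sorted(set(values)):
        for sign in (1, -1):
            score = balanced_accuracy(labels, predict((threshold, sign), values))
            if score > best_score:
                best_score, best_stump = score, (threshold, sign)
    return best_stump


def blind_generalization(benign_train, benign_test, attacks):
    """Train on benign + one attack class, test on benign + every other class."""
    scores = {}
    for train_class, train_values in attacks.items():
        stump = fit_stump(
            benign_train + train_values,
            [0] * len(benign_train) + [1] * len(train_values),
        )
        for test_class, test_values in attacks.items():
            if test_class == train_class:
                continue  # only score attack classes unseen during training
            y_true = [0] * len(benign_test) + [1] * len(test_values)
            y_pred = predict(stump, benign_test + test_values)
            scores[(train_class, test_class)] = balanced_accuracy(y_true, y_pred)
    return scores


# Synthetic, hypothetical data: one feature whose value separates benign
# traffic from every attack class in the same way, which is the pattern
# the abstract attributes to contaminant features.
random.seed(0)
benign_train = [random.gauss(0, 1) for _ in range(200)]
benign_test = [random.gauss(0, 1) for _ in range(200)]
attacks = {
    name: [random.gauss(10, 1) for _ in range(200)]
    for name in ("dos", "scan", "bruteforce")
}

scores = blind_generalization(benign_train, benign_test, attacks)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this sketch, a feature would be flagged as a candidate contaminant when the scores are high and nearly identical across all train/test class pairs. Where to set that cut-off is left open here.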

Book: 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST)

Number of pages: 1

ISBN: 9781665473989

Year of publication: 2022

WoS Id: 000861070900014
Handle: http://hdl.handle.net/1854/LU-8769860

Accessibility: Open
