Publication

Sequence count data are poorly fit by the negative binomial distribution

Journal contribution - Journal article

Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness-of-fit test for the NB distribution in regression models and demonstrate that the NB assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that NB-based tests perform worse on the features for which the NB assumption was violated than on the features for which no significant deviation was detected. This explains the poor behaviour of NB-based tests in many published evaluation studies. We conclude that non-parametric tests should be preferred over parametric methods.

Journal: PLoS ONE

ISSN: 1932-6203

Issue: 4

Volume: 15

Year of publication: 2020

Keywords: Goodness-of-fit, RNA-Seq data, Models

DOI: https://doi.org/10.1371/journal.pone.0224909
Handle: http://hdl.handle.net/1942/31792
WoS Id: 000536673200005

BOF key label: yes

IOF key label: yes

BOF publication weight: 1

CSS citation score: 1

Authors: International

Authors from: Higher Education

Accessibility: Open
