Document Type
Article
Publication Date
9-10-2024
Identifier
DOI: 10.1002/sim.10149
Abstract
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
Journal Title
Statistics in Medicine
Volume
43
Issue
20
First Page
3899
Last Page
3920
MeSH Keywords
DNA Methylation; Humans; Algorithms; Computer Simulation; Models, Statistical; Multivariate Analysis; Arthritis, Rheumatoid; Likelihood Functions; Sulfites; Sequence Analysis, DNA
Keywords
DNA methylation; EM algorithm; additive dispersion; binomial; measurement error; multiplicative dispersion
Recommended Citation
Zhao K, Oualkacha K, Zeng Y, et al. Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data. Stat Med. 2024;43(20):3899-3920. doi:10.1002/sim.10149
Comments
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
Publisher's Link: https://onlinelibrary.wiley.com/doi/10.1002/sim.10149