Document Type

Article

Publication Date

4-17-2014

Identifier

DOI: https://dx.doi.org/10.4310/SII.2014.v7.n2.a4

Abstract

In genetic pathway analysis and other high-dimensional data analyses, thousands to millions of tests may be performed simultaneously. The p-values from these tests are often presented in negative log-transformed form. We construct a contaminated exponential mixture model for −ln(P) and propose a D_CDF test to determine whether some of the −ln(P) values come from tests with underlying effects. By comparing the cumulative distribution functions (CDFs) of −ln(P) under mixture models, the proposed method can detect the cumulative effect of many variants with small effect sizes. Weight functions and truncation can be incorporated into the D_CDF test to improve power and to better handle the correlation among the data.
By using modified maximum likelihood estimators (MMLE), the D_CDF tests have very tractable limiting distributions under H₀. A copula-based procedure is proposed to address the correlation among p-values. We also develop power and sample size calculations for the D_CDF test. Extensive empirical assessments on correlated data demonstrate that the D_CDF tests, including their weighted and/or c-level truncated versions, have well-controlled Type I error rates and high power for small effect sizes. We applied our method to gene expression data in mice and identified significant pathways related to mouse body weight.
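The starting point of the abstract can be illustrated with a small simulation. When a test has no underlying effect, its p-value is Uniform(0, 1), so −ln(P) follows the standard exponential distribution Exp(1). Tests with effects push mass into the upper tail. The Python sketch below assumes a contaminated mixture (1 − eps)·Exp(1) + eps·Exp(lam) with lam < 1. That parameterization and the names eps, lam, sample_mixture, max_cdf_gap and monte_carlo_p are hypothetical and were chosen only for illustration. The sketch compares the empirical CDF of −ln(P) with the Exp(1) CDF using a one-sided Kolmogorov–Smirnov-type distance. This distance is a generic stand-in, not the article's D_CDF statistic. The sketch does not reproduce the MMLE-based limiting distribution, the weight functions, the c-level truncation, or the copula adjustment. Its Monte Carlo null also assumes independent tests.

```python
import math
import random


def neg_log_p(p_values):
    """Transform p-values in (0, 1] to -ln(P); under H0 (uniform P) these are Exp(1)."""
    return [-math.log(p) for p in p_values]


def exp1_cdf(x):
    """CDF of the standard exponential distribution Exp(1), the null law of -ln(P)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def sample_mixture(n, eps, lam, rng):
    """Draw n values of -ln(P) from an ASSUMED contaminated exponential mixture:
    with probability 1 - eps from Exp(1) (null tests) and with probability eps
    from Exp(lam), lam < 1 (tests with effects, i.e. a heavier upper tail).
    Hypothetical parameterization, not taken from the article."""
    return [rng.expovariate(lam if rng.random() < eps else 1.0) for _ in range(n)]


def max_cdf_gap(x):
    """One-sided KS-type statistic sup_x [F0(x) - Fn(x)] with F0 = Exp(1) CDF.

    Large values mean the empirical CDF lies below the null CDF, i.e. there is
    excess mass at large -ln(P) (small p-values). Illustrative stand-in only;
    this is NOT the article's D_CDF statistic."""
    xs = sorted(x)
    n = len(xs)
    # The sup is approached just left of each order statistic, where Fn equals
    # i / n for the 0-based index i.
    return max(exp1_cdf(v) - i / n for i, v in enumerate(xs))


def monte_carlo_p(stat, n, reps, rng):
    """Monte Carlo p-value for max_cdf_gap under INDEPENDENT Exp(1) data.
    Ignores correlation among tests (the article uses a copula-based procedure)."""
    null_stats = [max_cdf_gap([rng.expovariate(1.0) for _ in range(n)])
                  for _ in range(reps)]
    return (1 + sum(s >= stat for s in null_stats)) / (reps + 1)


# Usage: 2000 tests with no effects vs. 2000 tests of which about 20% carry effects.
rng = random.Random(2014)
null_x = sample_mixture(2000, eps=0.0, lam=0.2, rng=rng)
alt_x = sample_mixture(2000, eps=0.2, lam=0.2, rng=rng)
print("gap under H0:", round(max_cdf_gap(null_x), 4))
print("gap under mixture:", round(max_cdf_gap(alt_x), 4))
print("Monte Carlo p-value:", monte_carlo_p(max_cdf_gap(alt_x), 2000, 200, rng))
```

The Monte Carlo null draws independent Exp(1) samples. For that reason it will generally be miscalibrated for correlated p-values, and handling that correlation is the purpose of the copula-based procedure described in the abstract.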

Journal Title

Statistics and Its Interface

Volume

7

Issue

2

First Page

187

Last Page

200

Keywords

D_CDF test; negative log transformed p-values; weight function; c-level truncated test; mixture model; modified maximum likelihood estimator (MMLE)
