Intro


This website accompanies the papers:
  • The p-filter: multilayer FDR control for grouped hypotheses.
    Rina Foygel Barber and Aaditya Ramdas. arXiv:1512.03397
  • A unified treatment of multiple testing with prior knowledge.
    Aaditya Ramdas, Rina Foygel Barber, Martin J. Wainwright, and Michael I. Jordan. arXiv:1703.06222

In many scientific applications it is necessary to test many hypotheses simultaneously. Suppose that we have a list of hypotheses H_1, H_2, ..., H_n with accompanying p-values p_1, p_2, ..., p_n. Some of the hypotheses correspond to true signals in the data, but many of them are null (no true signal).

If the hypotheses are partitioned into groups, we might want to simultaneously guarantee that we don't select too many false positives (null hypotheses), and that we don't select too many false groups (groups consisting entirely of nulls, with no true signals). More generally, our list of hypotheses may be partitioned in multiple ways, and we might want to guarantee a low proportion of false discoveries with respect to all of these different groupings. For example, given data measured across space and time, we may be interested in the set of selected spatial locations (across all time points) and the set of selected time points (across all spatial locations).
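The space-and-time example can be made concrete with a small sketch. It is written in Python for illustration only and is not the authors' R code. The grid size, the flattening order, the helper names groups_from_labels and selected_groups, and the selected set are all hypothetical, and the sketch assumes that a group counts as selected when it contains at least one selected hypothesis.

```python
from collections import defaultdict


def groups_from_labels(labels):
    """Map each group label to the list of hypothesis indices carrying that label."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return dict(groups)


def selected_groups(labels, selected):
    """Labels of groups that contain at least one selected hypothesis (assumed convention)."""
    return {labels[i] for i in selected}


# Hypothetical layout: 3 spatial locations x 4 time points, flattened row by row,
# so hypothesis i sits at location i // 4 and time point i % 4.
n_locations, n_times = 3, 4
n = n_locations * n_times
location_of = [i // n_times for i in range(n)]  # partition 1: group by location
time_of = [i % n_times for i in range(n)]       # partition 2: group by time point

# Hypothetical set of selected individual hypotheses (indices into the flat list).
selected = [1, 2, 9]

selected_locations = selected_groups(location_of, selected)
selected_times = selected_groups(time_of, selected)
```

The same list of n hypotheses is described by two label vectors, one per partition, so any selected set of hypotheses induces a selected set of locations and a selected set of time points at the same time.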

The p-filter procedure selects a set of "discoveries" among the n hypotheses that is guaranteed to have a bounded false discovery rate (FDR, i.e. the expected value of the proportion of discoveries that are actually false), simultaneously for every partition of interest. Our second paper extends the method to allow for null proportion adaptivity, weighting the p-values according to varying priors or penalties, overlapping groups, and incomplete partitions.
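To illustrate the quantity being controlled, here is a minimal Python sketch that computes the false discovery proportion (FDP) for one selected set, both for individual hypotheses and for the groups of one partition. It is not the p-filter algorithm and not the authors' code. The function names, the labels, and the ground-truth null indicators are hypothetical, and the sketch assumes that a group is selected when it contains a selected hypothesis and is null when all of its hypotheses are null. The false discovery rate is the expectation of this proportion over repeated data, which a single run cannot show.

```python
def false_discovery_proportion(selected, is_null):
    """Fraction of selected items that are null; taken to be 0 when nothing is selected."""
    selected = list(selected)
    if not selected:
        return 0.0
    return sum(1 for s in selected if is_null[s]) / len(selected)


def group_fdp(labels, selected, hypothesis_is_null):
    """FDP at the group level for one partition given by `labels`."""
    # A group is null only if every hypothesis inside it is null.
    group_is_null = {}
    for label, null in zip(labels, hypothesis_is_null):
        group_is_null[label] = group_is_null.get(label, True) and null
    # Assumed convention: a group is selected if it contains a selected hypothesis.
    chosen = {labels[i] for i in selected}
    return false_discovery_proportion(chosen, group_is_null)


# Hypothetical toy data: 6 hypotheses in 3 groups of 2, with known ground truth.
labels = [0, 0, 1, 1, 2, 2]
hypothesis_is_null = [False, True, True, True, False, False]
selected = [0, 1, 2, 4]

individual_fdp = false_discovery_proportion(selected, hypothesis_is_null)
groupwise_fdp = group_fdp(labels, selected, hypothesis_is_null)
```

In this toy data two of the four selected hypotheses are null, while only one of the three selected groups is entirely null, so the two proportions differ. This is why a guarantee at one level does not automatically give a guarantee at the other.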




Code


Here we provide R code: a function implementing the p-filter method, and scripts for reproducing the two simulations and the real data experiment in the p-filter paper (the first paper listed above).
  • The p-filter algorithm: pfilter.R (a simple example of how to use this function: example.R)
  • Script for simulation 1 (grouped hypotheses, 100 trials): script_grouped.R (runs in ~30 seconds)
  • Script for simulation 2 (row- and column-wise grouped hypotheses, 100 trials): script_row_col.R (runs in ~15 minutes)
  • Script for the fMRI data experiment, neuro.R, and the fMRI data set, fMRI_data.txt.
    This data set was obtained from:
    Leila Wehbe, Brian Murphy, Partha Talukdar, Alona Fyshe, Aaditya Ramdas, and Tom Mitchell. Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses. PLOS ONE, November Issue, 2014.
Code implementing the more general version of the p-filter, allowing for null proportion adaptivity, weighting the p-values according to varying priors or penalties, overlapping groups, and incomplete partitions, can be found here: