Introduction

This demo reproduces the simulated linear regression experiment in the paper:

The demo was run using R version 3.2.0 and the package selectiveInference 1.1.3.

Setup: selective sequential test p-values

Define our parameters:

m=200; n=100 # m = sample size, n = # of variables
prlist=c(0.2, 0.1) # proportion of true signals (nonzero coefficients)
gammalist=c(9,5,1) # signal strength (nonzero coefficient magnitude)
ntrials=50

First we produce selective sequential p-values for the LARS path, and store the results. (Code hidden from output, but visible in the .Rmd file. This code was run once and the results saved; to run again, remove the option ‘eval=FALSE’ in the .Rmd file.)

Since this has been run previously, we can just load in this data. We also truncate p-values at the threshold \(0.9999\) since p-values that are (nearly) equal to 1 can lead to degenerate results. NaN values indicate a p-value that should be zero.

load("SelectiveSeqPValue.rda")
ProcPVal = SettingList #ProcPVal saves processed p-value
for(i in 1:length(SettingList)){
  for (j in 1:length(SettingList[[1]])){
    ProcPVal[[i]][[j]][[1]][is.na(ProcPVal[[i]][[j]][[1]][,1]),1]=0
    ProcPVal[[i]][[j]][[1]][,1] = pmin(ProcPVal[[i]][[j]][[1]][,1], 0.9999)
  }  
}

Next some setup for the accumulation tests:

source('accumulation_test_functions.R')
hfuns=c(create_SeqStep_function(C=2),create_SeqStep_function(C=2),create_HingeExp_function(C=2),create_ForwardStop_function())
numerator_pluses=c(2,0,0,0)
denominator_pluses=c(1,0,0,0)
names=c('SeqStep+ (C=2)','SeqStep (C=2)','HingeExp (C=2)','ForwardStop')
set_alpha=(1:5)/20

Define some functions (code hidden from output, but visible in .Rmd file):

## [1] "Model selection simulation functions defined"
## [1] "Model selection simulation plotting functions defined"

Results

Now we apply accumulation tests to the p-values from the selective sequential test on the LARS path. First, for a desired FDR level \(\alpha\in\{0.05,0.1,\dots,0.25\}\), we run each method and display its actual FDR and its power to detect signals, averaged over 50 trials:

#pdf('FDR_vs_power_plot_ModelSel.pdf',18,12.5)
par(mfrow=c(length(prlist),length(gammalist)))
par(mar=c(7,3,3,3))
for(j in 1:length(prlist)){
  pr=prlist[j]
  for(i in 1:length(gammalist)){
    gamma = gammalist[i]
    printLegend=(i==3 && j==1)
    
    PvalList = ProcPVal[[i+length(gammalist)*(j-1)]]
    
    temp=CompEffectFunc_ModelSel(PvalList, hfuns, numerator_pluses, denominator_pluses, set_alpha)
    FDR_and_power_plot_ModelSel(temp[[2]],temp[[1]],names=names,cols=c('black','blue','purple','red'),pchs=15:18,alpha=(1:5)/20, title=get_title(n*prlist[j],gamma), printLegend)
}};#dev.off()