O'Brien-Fleming method

TL;DR

Adjusts the critical value used at each planned analysis of accumulating data so that the overall type I error rate stays at a predetermined level (for example, 0.05).
Sets very stringent critical values at early analyses and a final critical value close to the unadjusted single-test level.
Reduces the likelihood of a false-positive conclusion when a study (for example, a clinical trial) tests its hypothesis more than once.

Definition

The O’Brien-Fleming method is a group sequential statistical procedure that controls the overall type I error rate (not the false discovery rate, FDR) when the same hypothesis is tested repeatedly at planned interim analyses. It does so by adjusting the critical value for each analysis based on a predetermined overall significance level and the number of planned analyses.

Explanation

The method requires fixing two things in advance: the overall significance level and the number of planned analyses. Critical values for each analysis are then derived from both. The adjusted critical values are most stringent at the earliest analyses, when little data has accumulated, and relax toward the single-test threshold by the final analysis, which keeps the probability of a false positive across all analyses at the chosen level. The adjustment uses a formula in which the critical value for the test statistic is inversely proportional to the square root of the fraction of data collected so far, yielding per-analysis significance criteria that differ from the single-test threshold.
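
In the standard group sequential formulation, the formula can be written as follows. The notation is introduced here for illustration only: $K$ is the number of planned, equally spaced analyses, $Z_k$ is the standardized test statistic at analysis $k$, $\alpha$ is the overall two-sided significance level, and $C$ is the boundary constant.

```latex
\[
  \text{reject } H_0 \text{ at analysis } k \text{ if } |Z_k| \ge c_k,
  \qquad c_k = C\sqrt{K/k}, \quad k = 1, \dots, K,
\]
\[
  \text{where } C \text{ is chosen so that }
  \Pr\nolimits_{H_0}\!\left( |Z_k| \ge c_k \text{ for some } k \right) = \alpha .
\]
```

Because $c_k$ shrinks as $k$ grows, an early analysis needs much stronger evidence to reject than the final one.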

Examples

Single analysis

If only one analysis is conducted at a significance level of 0.05, no adjustment is needed and the threshold stays at p < 0.05. If the p-value is less than 0.05, the null hypothesis is rejected.

Two planned analyses

If two equally spaced analyses are planned at an overall two-sided significance level of 0.05, the O’Brien-Fleming adjustment does not simply halve the threshold to p < 0.025 at each analysis (that equal split is the Bonferroni correction). Instead, it sets a nominal threshold of approximately p < 0.005 at the interim analysis and p < 0.048 at the final analysis. The null hypothesis is rejected at the first analysis whose p-value falls below its threshold.
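
As a rough illustration, per-analysis thresholds under the standard group sequential form of the O’Brien-Fleming boundary can be computed with the sketch below. It assumes equally spaced analyses, a two-sided test, and a boundary constant of 1.977, the value given in published tables for two analyses at an overall level of 0.05. The function names `two_sided_p` and `obf_boundaries` are hypothetical and not taken from any library. For comparison, an equal split of 0.05 across two tests would give 0.025 for each.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def obf_boundaries(num_analyses: int, boundary_constant: float) -> list:
    """z-scale boundaries c_k = C * sqrt(K / k) for equally spaced analyses.

    boundary_constant (C) is NOT computed here; it is assumed to come from
    published tables or dedicated group sequential software.
    """
    return [
        boundary_constant * math.sqrt(num_analyses / k)
        for k in range(1, num_analyses + 1)
    ]


# Assumption: C = 1.977 for K = 2 analyses, overall two-sided alpha = 0.05.
z_bounds = obf_boundaries(2, 1.977)
p_bounds = [two_sided_p(z) for z in z_bounds]

for k, (z, p) in enumerate(zip(z_bounds, p_bounds), start=1):
    print(f"analysis {k}: reject if |z| > {z:.3f} (nominal p < {p:.4f})")
```

The interim threshold comes out far stricter than 0.025, while the final threshold stays close to 0.05, which is the characteristic shape of this boundary.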

Use cases

Controlling the overall type I error rate when testing the effectiveness of a new drug in a clinical trial with planned interim analyses, where the trial may stop early only if the evidence is very strong.

Notes or pitfalls

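The method is intended to reduce the likelihood of false positives when a hypothesis is tested repeatedly as data accumulate; it controls the type I error rate, not the FDR.
It achieves this by tightening critical values based on the predetermined overall significance level and the number of planned analyses. Early critical values are very stringent, so stopping at an interim analysis requires strong evidence.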

Related terms

False discovery rate (FDR)
False positives
Multiple hypothesis testing
p-value
Null hypothesis
Critical value