Part 1: Group Sequential and Adaptive Designs

GKM

Gernot Wassmer and Friedrich Pahlke

December 12, 2024

Introduction

A Practical Example

  • A clinical trial is planned to compare a new blood pressure lowering drug A with the standard medication B. It is assumed that the blood pressure reduction is normally distributed with standard deviation \(\sigma = 20\).

  • The research team assumes that A lowers the blood pressure on average by 20 mmHg, whereas B lowers it on average by only 10 mmHg.

  • Propose a sample size for this study at one-sided level \(\alpha = 0.025\) and power 80%.

  • The statistician persistently asks whether the standard deviation could be \(\sigma = 25\), and whether the difference in blood pressure lowering might be only 5 mmHg (which would still be clinically relevant). So the team might be overoptimistic.

What can be proposed?

The classical (frequentist) paradigm for testing a hypothesis

  • Fix significance level
  • Fix endpoint and hypothesis
  • Fix type of test and test statistic
  • Fix (compute) sample size under specified effect size, variability, and power
  • Observe patients, compute p-value for specified hypothesis
  • Make test decision.

More general study designs

  • Single fixed sample: Win or lose
  • Group sequential: Perform interim analyses
  • Internal pilot: Blinded sample size reassessment
  • Adaptive group sequential: Adaptive (or flexible) group sequential designs allow the study to be redesigned based on the results observed so far, under control of the overall Type I error rate.

Example: Single fixed sample

  • \(\alpha = 0.025\) (one-sided), \(1-\beta = 0.80\)
  • Relevant (expected) effect \(\delta^* = 10\)
  • Standard deviation \(\sigma = 20\)
  • Sample size per treatment group \(n = 64\).
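
This fixed-sample calculation can be reproduced with rpact; a minimal sketch (the values below simply mirror the assumptions above):

library(rpact)

# Fixed design (kMax = 1): one-sided alpha = 0.025, power 80%
design <- getDesignGroupSequential(kMax = 1, alpha = 0.025, beta = 0.2,
        sided = 1)

# Sample size for comparing two means with delta* = 10 and sigma = 20;
# this should give approximately n = 64 per treatment group, as stated above
getSampleSizeMeans(design, alternative = 10, stDev = 20)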

Caution

No possibility to adjust in case of
- Over- or underestimation of effect size
- Over- or underestimation of variability

Caution

No early stopping of the trial

Group Sequential Design

  • \(\alpha = 0.025\) (one-sided), \(1-\beta = 0.80\), \(\delta^* = 10\), \(\sigma = 20\)
  • Four-stage group sequential design with constant critical boundaries
  • Maximum sample size per treatment group \(4 \cdot 19 = 76\)
  • Average (expected) sample size under \(\delta^* = 10\): 51.4
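
A sketch of how this design could be set up with rpact (typeOfDesign = "P" gives the constant Pocock-type boundaries); the maximum and expected sample sizes stated above should be reproduced up to rounding:

library(rpact)

# Four-stage design with constant (Pocock) critical boundaries
design <- getDesignGroupSequential(kMax = 4, alpha = 0.025, beta = 0.2,
        sided = 1, typeOfDesign = "P")

# Maximum and expected sample sizes for delta* = 10, sigma = 20
getSampleSizeMeans(design, alternative = 10, stDev = 20) |>
    summary()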

Caution

No possibility to adjust in case of
- Over- or underestimation of effect size
- Over- or underestimation of variability

Possible

Interim looks to assess stopping the trial early either for success, futility or harm


Caveat

Don’t choose the subsequent sample sizes in a “data-driven” way. This could lead to a serious inflation of the Type I error rate. The effect of ignoring this is described in, e.g., Proschan, Follmann, and Waclawiw (1992).


Caveat

Furthermore, you have to fix the design parameters (e.g., the shape of the decision boundaries, the test statistic to be used, the hypothesis to be tested) prior to the experiment. These cannot be changed during the course of the trial.

Example: Adaptive Confirmatory Designs

Possible

Possibility to adjust in case of
- Over- or underestimation of effect size
- Over- or underestimation of variability


Possible

Interim looks to assess stopping the trial early either for success, futility or harm


… and even much more

Group Sequential Designs

Group Sequential Designs: Basic Theory

Pocock and O’Brien & Fleming designs
Wang and Tsiatis \(\Delta\)-class: \(u_k = c\, k^{\Delta - 0.5}\), where the constant \(c\) is chosen such that the overall level is \(\alpha\). O’Brien & Fleming: \(\Delta = 0\); Pocock: \(\Delta = 0.5\).

How this is done with rpact

library(rpact)
getDesignGroupSequential(kMax = 5,
            typeOfDesign = "WT", deltaWT = 0.25)  |>
    plot()

getDesignGroupSequential(kMax = 5,
            typeOfDesign = "WT", deltaWT = 0.25)  |>
    summary()

Sequential analysis with a maximum of 5 looks (group sequential design)

Wang & Tsiatis Delta class design (deltaWT = 0.25), one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0718, ASN H1 0.7868, ASN H01 0.9982, ASN H0 1.0651.

Stage                                           1       2       3       4       5
Planned information rate                      20%     40%     60%     80%    100%
Cumulative alpha spent                     0.0007  0.0041  0.0098  0.0170  0.0250
Stage levels (one-sided)                   0.0007  0.0036  0.0076  0.0120  0.0163
Efficacy boundary (z-value scale)           3.194   2.686   2.427   2.259   2.136
Cumulative power                           0.0289  0.2017  0.4447  0.6544  0.8000

Group Sequential Designs: Basic Theory

Many other designs

  • Wang and Tsiatis class
  • Haybittle and Peto: \(u_1 = \ldots = u_{K-1} = 3\), \(u_K\) chosen accordingly
  • One-sided tests
  • Futility stopping (see the sketch below)

Also:

  • \(\alpha\)-spending or use-function approach: Specification of the critical values through the use of a function that determines how the significance level is spent over the interim stages.
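
A sketch of two of the options listed above, namely Haybittle-Peto type boundaries combined with non-binding futility stopping; the futility bounds on the z-scale are illustrative placeholders:

library(rpact)

# Haybittle & Peto type design (typeOfDesign = "HP") with non-binding
# futility stopping at z < 0 at each of the three interim analyses
getDesignGroupSequential(kMax = 4, alpha = 0.025, typeOfDesign = "HP",
        futilityBounds = c(0, 0, 0), bindingFutility = FALSE) |>
    summary()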

The Use Function Approach

Examples of \(\alpha\)-spending functions

Figure: \(\alpha_1^*\) and \(\alpha_2^*\) approximate O’Brien and Fleming’s and Pocock’s design, respectively. \(\alpha_3^*(\varrho)\) is plotted for \(\varrho = 1.0\), \(1.5\), and \(2.0\); \(\alpha = 0.05\).
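
For reference, the standard forms of these spending functions (reconstructed here from the usual definitions, with \(t\) denoting the information rate; Lan and DeMets 1983, Kim and DeMets 1987) are:

\[
\alpha_1^*(t) = 2 - 2\,\Phi\!\left(\frac{\Phi^{-1}(1-\alpha/2)}{\sqrt{t}}\right), \qquad
\alpha_2^*(t) = \alpha \,\ln\!\bigl(1 + (e-1)\,t\bigr), \qquad
\alpha_3^*(t;\varrho) = \alpha\, t^{\varrho}, \qquad 0 < t \le 1.
\]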

The Use Function Approach

  • Computation of critical values does not depend on future information rates
  • Accounting for random under- and overrunning is possible
  • Particularly applicable for survival data
  • Number of interim analyses need not be fixed in advance
  • Planning is usually based on assuming equidistant information rates but can also be performed for suitably chosen information rates
  • Do not change sample size or analysis time in a data-driven way!

With rpact

library(rpact)
getDesignGroupSequential(alpha = 0.05, 
        kMax = 5, typeOfDesign = "asOF") |>
    plot(type = c(1, 4)) 

Confirmatory Adaptive Designs

Confirmatory Adaptive Designs: Basics

“Confirmatory adaptive” means:

Planning of subsequent stages can be based on information observed so far, under control of an overall Type I error rate.

Definition

A confirmatory adaptive design is a multi-stage clinical trial design that uses accumulating data to decide how to modify design aspects without compromising its validity and integrity.

Adaptive Test Procedures: The Beginning

  • A trial is performed in two stages
  • In an interim analysis the trial may be
    • stopped for futility or efficacy or
    • continued and possibly adapted (sample size, test statistics)
  • Adaptation of the design for second stage
    • adaptations depend on all (unblinded) interim data including secondary and safety endpoints
    • the adaptation rule is not (completely) preplanned

How to construct a test that controls the Type I error?

Two Pioneering Proposals

Proposals

Bauer and Köhne (Biometrics, 1994): Combination of \(p\,\)-values with a specific combination function (Bauer, 1989)

Proschan and Hunsberger (Biometrics, 1995): Specification of a conditional error function.

The Combination Test (Bauer ’89, Bauer & Köhne ’94)

Stopping boundaries and combination functions have to be laid down a priori!
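
As a reminder (a sketch in the commonly used notation, not taken from the slide): with stage-wise \(p\)-values \(p_1, p_2\) and prespecified boundaries \(\alpha_1\) (efficacy) and \(\alpha_0\) (futility) for the first stage, the trial stops at stage 1 with rejection if \(p_1 \le \alpha_1\) and without rejection if \(p_1 > \alpha_0\); otherwise \(H_0\) is rejected at the final stage if Fisher's product criterion

\[
p_1\, p_2 \le c
\]

is fulfilled, where (assuming \(c \le \alpha_1\)) the constants satisfy \(\alpha_1 + c\,(\ln \alpha_0 - \ln \alpha_1) = \alpha\), so that the overall Type I error rate is \(\alpha\) for independent, uniformly distributed stage-wise \(p\)-values under \(H_0\).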

Key Idea of the Adaptive Test

  • Do not pool the data of the stages; instead, combine the stage-wise \(p\,\)-values.
  • Then the distribution of the combination function under the null does not depend on design modifications and the adaptive test is still a test at the level \(\alpha\) for the modified design.
  • In the two stages, different hypotheses \(H_{01}\) and \(H_{02}\) can be considered; the resulting global test is a test for \(H_0 = H_{01} \cap H_{02}\).
  • Or there are multiple hypotheses at the beginning of the trial, some of which may be selected at an interim stage.
  • Or hypotheses may even be added at an interim stage (not of practical concern).

Key Idea of the Adaptive Test

  • The rules for adapting the design need not be prespecified!
  • The combination test needs to be prespecified:
    • Bauer and Köhne (1994) proposed Fisher’s combination test but also mentioned other combination tests, e.g., the inverse normal combination test.
    • The inverse normal combination test has the decisive advantage that the adaptive confirmatory design then simply generalizes the group sequential design (Lehmacher and Wassmer 1999); see the sketch below.
    • So (initial) planning of an adaptive design is essentially the same as planning of a classical group sequential design and the same software can be used.
    • An equivalent procedure (based on a weighted \(Z\)-score statistic rather than on \(p\)-values) was proposed by Cui, Hung, and Wang (1999).
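
A sketch of the inverse normal combination test mentioned above: with prespecified weights \(w_1, w_2\) satisfying \(w_1^2 + w_2^2 = 1\), the stage-wise \(p\)-values are combined via

\[
Z = w_1\,\Phi^{-1}(1 - p_1) + w_2\,\Phi^{-1}(1 - p_2).
\]

Under \(H_0\), \(Z\) is standard normally distributed whenever \(p_1\) and \(p_2\) are independent and uniform, irrespective of data-driven changes between the stages, so the usual group sequential boundaries \(u_1, u_2\) can be applied: reject at stage 1 if \(\Phi^{-1}(1 - p_1) \ge u_1\), and at stage 2 if \(Z \ge u_2\). With equal stage sizes, no adaptation, and \(w_1 = w_2 = 1/\sqrt{2}\), this coincides with the classical group sequential test.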

Possible Data-Dependent Changes of Design

  • Sample size recalculation
  • Change of allocation ratio
  • Change of test statistic
  • Flexible number of looks
  • Treatment arm selection (seamless phase II/III)
  • Population selection (population enrichment)
  • Selection of endpoints

For the latter three, in general, multiple hypothesis testing applies, and a closed testing procedure can be used to control the experimentwise error rate in the strong sense.
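
As a brief reminder of the closed testing principle in its standard form: for null hypotheses \(H_1, \ldots, H_m\), every non-empty intersection

\[
H_J = \bigcap_{j \in J} H_j, \qquad J \subseteq \{1, \ldots, m\},
\]

is tested with a level-\(\alpha\) test (in the adaptive setting, with a combination test), and an individual \(H_i\) is rejected if and only if all \(H_J\) with \(i \in J\) are rejected. This controls the familywise error rate at level \(\alpha\) in the strong sense (Marcus, Peritz, and Gabriel 1976).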

Seamless Phase II/III Trials: Treatment Arm Selection

  • Conduct phase II trial as internal part of a combined trial
  • Plan phase III trial based on data from phase II part
  • Conduct phase III trial as internal part of the same trial
  • Demonstrate efficacy with data from phase III + II part

Enrichment: Phase 2/3 Study in HER2- Early Stage BC

  • Stage 1 objective
    • Stop for futility/efficacy
    • To continue with HER2- (Full) population – Broad Label (F) or Enhanced Label (F+S)
    • To confirm greater benefit in TNBC Subpopulation – Restricted Label (S)
    • To adjust the sample size
  • Stage 2 data and the relevant groups from Stage 1 data combined

Sources of Alpha Inflation

  • Interim analysis
  • Sample size reassessment
  • Multiple hypotheses

The proposed adaptive procedure fulfills the regulatory requirements for the analysis of adaptive trials as it strongly controls the prespecified multiple Type I error rate (strong control of familywise error rate).

Strong Type I Error Rate Control

Multiple Type I error rate

Probability to reject at least one true null hypothesis.

(Probability to declare at least one ineffective treatment as effective).


Strong control of multiple Type I error rate

Regardless of the number of true null hypotheses (ineffective treatments): \[\text{Multiple Type I error rate }\le \alpha\]

Methods

Multi-Arm Multi-Stage (MAMS) Designs

  • Methods for predefined selection rules (Stallard & Todd 2003; Magirr et al., 2012; …)
  • Flexible Multi-Stage Closed Adaptive Tests (Bauer & Kieser 1999; Hommel 2001; …)
    • Do not require a predefined treatment and sample size selection rule.
    • Combine two methodology concepts: Combination Tests and Closed Testing Principle.
    • Available in rpact
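
A hedged rpact sketch of such a design (treatment arm selection with a two-stage inverse normal combination test, evaluated by simulation); the number of arms, effect sizes, stage-wise sample sizes, and selection rule below are illustrative placeholders only:

library(rpact)

# Two-stage inverse normal design with O'Brien & Fleming type boundaries
design <- getDesignInverseNormal(kMax = 2, alpha = 0.025, typeOfDesign = "OF")

# Simulate a multi-arm design (3 active arms vs. control): select the "best"
# arm at interim, closed test with Dunnett intersection tests
getSimulationMultiArmMeans(design,
    activeArms = 3, muMaxVector = c(0, 5, 10), stDev = 20,
    typeOfSelection = "best", intersectionTest = "Dunnett",
    plannedSubjects = c(20, 40),
    maxNumberOfIterations = 1000, seed = 123)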

Discussion

The Criticism from the Statistical Perspective

  • Non-correctness of the procedure
  • Reluctance to accept connection between adaptive and group sequential designs
  • Inefficiency as compared to classical group sequential or other designs, violating the sufficiency principle
  • Uncertainty of interim information
  • Construction of confidence intervals and bias-adjusted estimates not possible.

None of these criticisms is sustainable

The Regulatory Perspective

  • Adaptive designs per se seem to be accepted by the regulatory agencies as long as a detailed plan is provided, e.g., there is a requirement for prospectively written standard operating procedures.

  • Do not use too many interim analyses.

  • Do not perform interim analyses for demonstrating efficacy too early.

  • Guidances advise against operational bias, e.g., the treatment effect may be deduced from knowledge of adaptive decisions. The sponsor has to take care of that!

  • Support study design through comprehensive simulation reports.

Careful application, but acceptance in principle

Final Remarks

  • In the meantime, there is some clarification of how and when to use an adaptive design.
  • Some concern is caused by unblinding the study results at interim analyses. This is an inherent problem!
  • Commonly accepted that a group sequential design can be made more flexible through the use of the inverse normal combination test.
  • Increasing use of Bayesian methodology.
  • Software available (ADDPLAN, EaSt, nQuery, gsDesign, rpact, SAS, etc.)

Final Remarks

  • Most important applications
    • Sample size recalculation
    • Treatment arm selection designs
    • Population enrichment designs
  • Many applications have been run so far; overview papers and books exist.
  • Reports in journals often do not mention the use of adaptive designs.
  • There are successful examples of carefully planning and performing adaptive confirmatory designs.

Book reference: Wassmer and Brannath (2016)

See also:

P. Bauer, F. Bretz, V. Dragalin, F. König, and G. Wassmer. Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. Featured Article in Statistics in Medicine 35, 325-347, 2016. http://dx.doi.org/10.1002/sim.6472 (Open Access)

With invited discussion by Hung, Wang and Lawrence; Mehta and Liu; Vollmar; Maurer

The Future