Prof. Dr. Christoph Lippert

Methodology for Innovative Trial Designs


In different projects, we combine statistics, machine learning and causal inference to investigate methods for the design and analysis of N-of-1 trials, adaptive trials, and micro-randomized trials.


  • Congratulations, Juliana, on your accepted talk at the upcoming 44th Annual Conference of the International Society for Clinical Biostatistics!
  • Congratulations, Thomas, on your accepted talk at the upcoming 44th Annual Conference of the International Society for Clinical Biostatistics!

Anytime-valid inference in N-of-1 trials

  • Abstract: App-based N-of-1 trials offer a scalable experimental design for assessing the effects of health interventions at an individual level. Their practical success depends on the strong motivation of participants, which, in turn, translates into high adherence and reduced loss to follow-up. One way to maintain participant engagement is by sharing their interim results. Continuously testing hypotheses during a trial, known as “peeking”, can also lead to shorter, lower-risk trials by detecting strong effects early. Nevertheless, traditionally, results are only presented upon the trial’s conclusion. In this work, we introduce a potential outcomes framework that permits interim peeking of the results and enables statistically valid inferences to be drawn at any point during N-of-1 trials. Our work builds on the growing literature on valid confidence sequences, which enables anytime-valid inference with uniform type-1 error guarantees over time. We propose several causal estimands for treatment effects applicable in an N-of-1 trial and demonstrate, through empirical evaluation, that the proposed approach results in valid confidence sequences over time. We anticipate that incorporating anytime-valid inference into clinical trials can significantly enhance trial participation and empower participants.
  • Reference: Malenica I, Guo Y, Gan K, Konigorski S (2023). Anytime-valid inference in N-of-1 trials. Proceedings of Machine Learning Research 225:307–322. https://proceedings.mlr.press/v225/malenica23a.html
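
To make the idea of a confidence sequence concrete, here is a minimal sketch in Python. It is not the estimator proposed in the paper: it uses a deliberately simple and conservative construction (a Hoeffding bound combined with a union bound over time), and it assumes independent, bounded treatment-control differences with a common mean, which real N-of-1 data with autocorrelation may violate. The function names and the simulated data are hypothetical.

```python
import math
import random


def hoeffding_confidence_sequence(diffs, alpha=0.05, lower=-1.0, upper=1.0):
    """Return a list of (t, running_mean, lo, hi), one entry per observation.

    Assumption: observations are independent, bounded in [lower, upper],
    and share a common mean (the treatment effect of interest).
    """
    width = upper - lower
    total = 0.0
    sequence = []
    for t, d in enumerate(diffs, start=1):
        total += d
        mean = total / t
        # Spend alpha / (t * (t + 1)) of the error budget at look t.
        # These shares sum to alpha over all t, so by a union bound the
        # intervals cover the true mean simultaneously at every look.
        alpha_t = alpha / (t * (t + 1))
        radius = width * math.sqrt(math.log(2.0 / alpha_t) / (2.0 * t))
        sequence.append((t, mean, max(lower, mean - radius), min(upper, mean + radius)))
    return sequence


def simulate_differences(n, true_effect=0.3, seed=0):
    """Hypothetical treatment-minus-control differences, bounded in [-1, 1]."""
    rng = random.Random(seed)
    return [true_effect + rng.uniform(-0.5, 0.5) for _ in range(n)]


if __name__ == "__main__":
    cs = hoeffding_confidence_sequence(simulate_differences(200))
    for t, mean, lo, hi in cs[49::50]:
        print(f"t={t:3d}  mean={mean:+.3f}  interval=[{lo:+.3f}, {hi:+.3f}]")
```

Because the coverage guarantee holds at all time points at once, a participant could look at the interval after every observation without inflating the type-1 error. The price of this simple construction is width: tighter sequences from the literature would allow earlier conclusions.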

Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials

  • Abstract: Personalized adaptive interventions offer the opportunity to increase patient benefits; however, there are challenges in their planning and implementation. Once they are implemented, an important question is whether personalized adaptive interventions are indeed clinically more effective than a fixed gold standard intervention. In this paper, we present an innovative N-of-1 trial study design testing whether implementing a personalized intervention by an online reinforcement learning agent is feasible and effective. Throughout, we use a new study on physical exercise recommendations to reduce pain in endometriosis for illustration. We describe the design of a contextual bandit recommendation agent and evaluate the agent in simulation studies. The results show that, first, implementing a personalized intervention by an online reinforcement learning agent is feasible. Second, such adaptive interventions have the potential to improve patients’ benefits even if only a few observations are available. As one challenge, they add complexity to the design and implementation process. In order to quantify the expected benefit, data from previous interventional studies are required. We expect our approach to be transferable to other interventions and clinical interventions.
  • Reference: Meier D, Ensari I, Konigorski S (2023). Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials. Proceedings of Machine Learning Research 225:340–352. https://proceedings.mlr.press/v225/meier23a.html
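
As an illustration of what a contextual bandit recommendation agent can look like, the following Python sketch uses Thompson sampling with a Beta posterior per context and action. The abstract does not specify the agent's algorithm, so everything here is an assumption made for illustration: the discrete context (for example, a coarse pain level), the binary reward (pain reduced or not), the class and function names, and the simulated success probabilities.

```python
import random


class BetaThompsonAgent:
    """Thompson sampling with an independent Beta posterior per (context, action)."""

    def __init__(self, n_contexts, n_actions, seed=0):
        self.rng = random.Random(seed)
        # Each entry holds [successes + 1, failures + 1], i.e. a Beta(1, 1) prior.
        self.params = [[[1.0, 1.0] for _ in range(n_actions)] for _ in range(n_contexts)]

    def recommend(self, context):
        # Draw one plausible success rate per action and pick the largest draw.
        draws = [self.rng.betavariate(a, b) for a, b in self.params[context]]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, context, action, reward):
        self.params[context][action][0 if reward else 1] += 1.0

    def posterior_means(self, context):
        return [a / (a + b) for a, b in self.params[context]]


def simulate(success_prob, n_rounds=3000, seed=1):
    """Run the agent against hypothetical per-context success probabilities."""
    env = random.Random(seed)
    agent = BetaThompsonAgent(len(success_prob), len(success_prob[0]), seed=seed + 1)
    optimal_choices = []
    for _ in range(n_rounds):
        context = env.randrange(len(success_prob))
        action = agent.recommend(context)
        reward = env.random() < success_prob[context][action]
        agent.update(context, action, reward)
        best = max(range(len(success_prob[context])), key=success_prob[context].__getitem__)
        optimal_choices.append(action == best)
    return agent, optimal_choices


if __name__ == "__main__":
    # Hypothetical example: two contexts, three exercise types.
    probs = [[0.2, 0.5, 0.8], [0.8, 0.5, 0.2]]
    agent, optimal = simulate(probs)
    print("share of optimal recommendations, last 500 rounds:", sum(optimal[-500:]) / 500)
```

A simulation of this kind needs assumed success probabilities as input, which is one reason the abstract notes that data from previous interventional studies are required to quantify the expected benefit.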

Multimodal N-of-1 trials

  • Abstract: N-of-1 trials are randomized multi-crossover trials in single participants with the purpose of investigating the possible effects of one or more treatments. Research in the field of N-of-1 trials has primarily focused on scalar outcomes. However, with the increasing use of digital technologies, we propose to adapt this design to multimodal outcomes, such as audio, video, image, or sensor data, that can easily be collected by the trial participants on their personal mobile devices. We present here a fully automated approach for analyzing multimodal N-of-1 trials by combining unsupervised deep learning models with statistical inference. First, we train an autoencoder on all images across all patients to create a lower-dimensional embedding. In the second step, the embeddings are reduced to a single dimension by projecting on the first principal component, again using all images. Finally, we test on an individual level whether treatment and non-treatment periods differ with respect to the component. We apply our proposed approach to a published series of multimodal N-of-1 trials of 5 participants who tested the effect of creams on acne captured through images over 16 days. We compare several parametric and non-parametric statistical tests, and we also compare the results to an expert analysis that rates the pictures directly with respect to their acne severity and applies a t-test on these scores. The results indicate a treatment effect for one individual in the expert analysis. This effect was replicated with the proposed unsupervised pipeline. In summary, our proposed approach enables the use of novel data types in N-of-1 trials while avoiding the need for manual labels. We anticipate that this can be the basis for further explorations of valid and interpretable approaches and their application in clinical multimodal N-of-1 trials.
  • Reference: Schneider J, Gärtner T, Konigorski S (2023). Multimodal outcomes in N-of-1 trials: combining unsupervised learning and statistical inference. arXiv. http://arxiv.org/abs/2309.06455
  • Abstract: N-of-1 trials aim to estimate treatment effects on the individual level and can be applied to personalize a wide range of physical and digital interventions in mHealth. In this study, we propose and apply a framework for multimodal N-of-1 trials in order to allow the inclusion of health outcomes assessed through images, audio or videos. We illustrate the framework in a series of N-of-1 trials that investigate the effect of acne creams on acne severity assessed through pictures. For the analysis, we compare an expert-based manual labelling approach with different deep learning-based pipelines where, in a first step, we train and fine-tune convolutional neural networks (CNN) on the images. Then, we use a linear mixed model on the scores obtained in the first step in order to test the effectiveness of the treatment. The results show that the CNN-based test on the images reaches a similar conclusion to tests based on manual expert ratings of the images, and identifies a treatment effect in one individual. This illustrates that multimodal N-of-1 trials can provide a powerful way to identify individual treatment effects and can enable large-scale studies of a large variety of health outcomes that can be actively and passively assessed using technological advances in order to personalize health interventions.
  • Reference: Fu J, Liu S, Du S, Ruan S, Guo X, Pan W, Sharma A, Konigorski S (2023). Multimodal N-of-1 trials: a novel personalized healthcare design. arXiv. https://arxiv.org/abs/2302.07547
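
The last two steps of the unsupervised pipeline described in the first abstract above (projection on the first principal component, then an individual-level test) can be sketched in plain Python. This sketch makes several assumptions: the autoencoder step is skipped and the embeddings are synthetic, the first principal component is found by power iteration, and the test is a simple permutation test on daily labels, which ignores the period structure and any autocorrelation of a real N-of-1 trial. All function names are hypothetical and the papers' actual tests may differ.

```python
import random


def first_pc_scores(rows, n_iter=200):
    """Project centered rows onto their first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [1.0 / d ** 0.5] * d
    for _ in range(n_iter):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def permutation_test(scores, treated, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in mean score."""
    rng = random.Random(seed)

    def abs_mean_diff(labels):
        t = [s for s, flag in zip(scores, labels) if flag]
        c = [s for s, flag in zip(scores, labels) if not flag]
        return abs(sum(t) / len(t) - sum(c) / len(c))

    observed = abs_mean_diff(treated)
    labels = list(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs_mean_diff(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def make_synthetic_embeddings(n_days=16, dim=4, shift=2.0, seed=0):
    """Hypothetical embeddings: treatment periods of 4 days shift the first dimension."""
    rng = random.Random(seed)
    treated = [(day // 4) % 2 == 1 for day in range(n_days)]
    rows = [[rng.gauss(0.0, 0.1) + (shift if flag and j == 0 else 0.0) for j in range(dim)]
            for flag in treated]
    return rows, treated


if __name__ == "__main__":
    embeddings, treated = make_synthetic_embeddings()
    scores = first_pc_scores(embeddings)
    print("permutation p-value:", permutation_test(scores, treated))
```

The sign of a principal component is arbitrary, which is why the test statistic uses the absolute difference in means.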

Comparison of Bayesian networks, G-estimation and linear models to estimate causal treatment effects in aggregated N-of-1 trials

  • Abstract: The aggregation of a series of N-of-1 trials presents an innovative and efficient study design, as an alternative to traditional randomized clinical trials. Challenges for the statistical analysis arise when there are carry-over effects or confounding of the treatment effect of interest. In this study, we evaluate and compare methods for the analysis of aggregated N-of-1 trials in different scenarios with carry-over and confounding effects. For this, we simulate data of a series of N-of-1 trials for chronic nonspecific low back pain based on assumed causal relationships parameterized by directed acyclic graphs. In addition to existing statistical methods such as regression models, Bayesian Networks and G-estimation, we introduce a linear model adjusted for time-varying treatment carry-over effects. The results show that all evaluated existing models perform well when there is no carry-over and confounding. When there are carry-over effects, our proposed method yields unbiased estimates while all other methods show some bias in the estimation. When there is known confounding, all approaches that specify the confounding correctly yield unbiased estimates. Finally, the efficiency of all methods decreases slightly when there are missing values, and the bias in the estimates can also increase. This study presents a systematic evaluation of existing and novel approaches for the statistical analysis of a series of N-of-1 trials. We derive practical recommendations on which methods may be best in which scenarios.
  • Reference: Gärtner T, Schneider J, Arnrich B, Konigorski S (2022). Comparison of Bayesian networks, G-estimation and linear models to estimate causal treatment effects in aggregated N-of-1 trials. medRxiv. https://doi.org/10.1101/2022.07.21.22277832
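
Why carry-over biases an unadjusted analysis can be seen in a small simulation. The Python sketch below is a simplified stand-in for the adjusted linear model mentioned in the abstract, not the model from the paper: it assumes a single participant, alternating 7-day treatment periods, and a carry-over effect that lasts exactly one day and enters the model as the previous day's treatment. The function names, effect sizes, and noise level are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simulate_trial(n_blocks=20, block_len=7, effect=-2.0, carry_over=-1.0,
                   noise_sd=0.3, seed=0):
    """Hypothetical daily pain scores with a one-day carry-over of treatment."""
    rng = random.Random(seed)
    treatment = [(t // block_len) % 2 for t in range(n_blocks * block_len)]
    previous = [0] + treatment[:-1]  # treatment on the day before (none before day 0)
    outcome = [5.0 + effect * a + carry_over * p + rng.gauss(0.0, noise_sd)
               for a, p in zip(treatment, previous)]
    return treatment, previous, outcome


if __name__ == "__main__":
    treatment, previous, outcome = simulate_trial()
    naive = ols([[1.0, a] for a in treatment], outcome)
    adjusted = ols([[1.0, a, p] for a, p in zip(treatment, previous)], outcome)
    print("true effect: -2.0")
    print("naive estimate:", round(naive[1], 3))
    print("adjusted estimate:", round(adjusted[1], 3))
```

Because treatments are given in periods, today's treatment and yesterday's treatment are strongly correlated, so the unadjusted model absorbs part of the carry-over into the treatment coefficient. Including the lagged treatment separates the two in this simplified setting.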

Analyzing population-level trials as N-of-1 trials: an application to gait

  • Abstract: Studying individual causal effects of health interventions is important whenever intervention effects are heterogeneous between study participants. Conducting N-of-1 trials, which are single-person randomized controlled trials, is the gold standard for their analysis. As an alternative method, we propose to re-analyze existing population-level studies as N-of-1 trials, and use gait as a use case for illustration. Gait data were collected from 16 young and healthy participants under fatigued and non-fatigued, as well as under single-task (only walking) and dual-task (walking while performing a cognitive task) conditions. As a reference to the N-of-1 trials approach, we first computed standard population-level ANOVA models to evaluate differences in gait parameters (stride length and stride time) across conditions. Then, we estimated the effect of the interventions on gait parameters on the individual level through Bayesian repeated-measures models, viewing each participant as their own trial, and compared the results. The results illustrated that while few overall population-level effects were visible, individual-level analyses revealed differences between participants. Baseline values of the gait parameters varied largely among all participants, and the effects of fatigue and cognitive task were also heterogeneous, with some individuals showing effects in opposite directions. These differences between population-level and individual-level analyses were more pronounced for the fatigue intervention compared to the cognitive task intervention. Following our empirical analysis, we discuss re-analyzing population studies through the lens of N-of-1 trials more generally and highlight important considerations and requirements. Our work encourages future studies to investigate individual effects using population-level data.
  • Reference: Zhou T, Schneider J, Arnrich B, Konigorski S (2024). Analyzing population-level trials as N-of-1 trials: an application to gait. Contemporary Clinical Trials Communications 38:101282. https://doi.org/10.1016/j.conctc.2024.101282

Further ongoing projects

  • Causal inference for N-of-1 trials
  • Adaptive N-of-1 trials
  • Aggregated N-of-1 trials versus randomized controlled trials - a framework for statistical and economic comparisons


  • Stefan Konigorski
  • Marco Piccininni
  • Thomas Gärtner
  • Juliana Schneider
  • Dominik Meier
  • Lasse von der Heydt
  • Lin Zhou