Beta Upper Confidence Bound Policy for the Design of Clinical Trials


  • Andrii Dzhoha
  • Iryna Rozora



The multi-armed bandit problem is a classic example of the exploration-exploitation trade-off and is well suited to modeling sequential resource allocation under uncertainty. One of its typical motivating applications is adaptive design in clinical trials, which modifies the trial's course in accordance with a pre-specified objective by using the results accumulating in the trial. Since the response to a procedure in clinical trials is not immediate, multi-armed bandit policies must be adapted to delays to retain their theoretical guarantees. In this work, we show the importance of such adaptation by evaluating policies on the publicly available dataset of the International Stroke Trial, a randomized trial of aspirin and subcutaneous heparin among 19,435 patients with acute ischaemic stroke. In addition to the adapted policies, we analyze the Upper Confidence Bound policy with beta feedback, which mitigates delays when evidence on the certainty of successful treatment is available within a relatively short period after the procedure.




How to Cite

Dzhoha, A., & Rozora, I. (2023). Beta Upper Confidence Bound Policy for the Design of Clinical Trials. Austrian Journal of Statistics, 52(SI), 26–39.