SA213, Science Building I, NYCU
(Room 213, Science Building I, National Chiao Tung University)
A Mathematical Study on Reproducibility I & II
Lo-Bin Chang (Ohio State University)
Yuan-Chung Sheu
Abstract:
In recent years, “reproducibility” has emerged as a key factor in evaluating applications of statistics to the biomedical sciences, for example, predictors of disease phenotypes learned from high-throughput “omics” data. Among other factors, validating such predictors entails comparing the reported error rates, usually estimated by standard cross-validation, with the error rates observed on additional data collected in new studies. Unfortunately, the originally published error rates are frequently lower than those observed on the new data, and this discrepancy is seen as a barrier to translational research. In the first talk, I will review some statistical theory and discuss recent empirical studies of this inconsistency. In the second talk, I will present a mathematical setup in the large-sample limit for studying reproducibility through the gap between the error rate of cross-study validation (CSV) and that of ordinary randomized cross-validation (RCV).
The theoretical results agree with the trends observed in practice: for any number m of studies, the cross-study error rate exceeds that of ordinary randomized cross-validation; the (averaged) RCV error rate increases with m; and both converge to the optimal rate.
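To make the difference between the two validation schemes concrete, here is a minimal Python sketch under a hypothetical toy model, not the speaker's setup. It assumes that each study s has its own shift b_s ~ N(0, tau^2), standing in for batch or population effects, and that within a study x | y ~ N(y*delta + b_s, 1). The classifier is a simple midpoint-of-class-means threshold. CSV holds out one whole study at a time. RCV pools all samples, shuffles them, and uses k random folds. The function names and the parameters `delta` and `tau` are illustrative assumptions. A single finite-sample run need not show the ordering described above; the sketch only shows how the two error rates are computed.

```python
import random
from statistics import mean

def simulate_studies(m, n_per_study, delta=1.0, tau=0.5, seed=0):
    """Hypothetical model: study s has shift b_s ~ N(0, tau^2); x | y ~ N(y*delta + b_s, 1)."""
    rng = random.Random(seed)
    studies = []
    for _ in range(m):
        b = rng.gauss(0.0, tau)          # study-specific shift (assumed heterogeneity)
        data = []
        for _ in range(n_per_study):
            y = rng.randint(0, 1)        # balanced binary label
            data.append((rng.gauss(y * delta + b, 1.0), y))
        studies.append(data)
    return studies

def fit_threshold(train):
    """Midpoint of the two class means; predict class 1 when x exceeds it."""
    x0 = [x for x, y in train if y == 0]
    x1 = [x for x, y in train if y == 1]
    return (mean(x0) + mean(x1)) / 2.0

def error_rate(threshold, test):
    """Fraction of test points misclassified by the threshold rule."""
    return sum((x > threshold) != bool(y) for x, y in test) / len(test)

def cross_study_error(studies):
    """CSV: train on all studies but one, test on the held-out study, average over studies."""
    errs = []
    for i, held_out in enumerate(studies):
        train = [pt for j, s in enumerate(studies) if j != i for pt in s]
        errs.append(error_rate(fit_threshold(train), held_out))
    return mean(errs)

def randomized_cv_error(studies, k, seed=0):
    """RCV: pool all samples, shuffle, split into k folds, average the held-out fold errors."""
    pooled = [pt for s in studies for pt in s]
    random.Random(seed).shuffle(pooled)
    folds = [pooled[f::k] for f in range(k)]
    errs = []
    for f in range(k):
        train = [pt for g in range(k) if g != f for pt in folds[g]]
        errs.append(error_rate(fit_threshold(train), folds[f]))
    return mean(errs)

if __name__ == "__main__":
    studies = simulate_studies(m=5, n_per_study=200)
    print("CSV error:", cross_study_error(studies))
    print("RCV error:", randomized_cv_error(studies, k=5))
```

Using k = m folds in RCV keeps the training-set sizes of the two schemes comparable, so the remaining difference comes from whether the held-out data share a study with the training data.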