NCTS Data Sciences Forum
15:30 - 16:30, January 2, 2018 (Tuesday)
Room 202, Astronomy-Mathematics Building, NTU
(台灣大學天文數學館 202室)
Analyzing proportions in Medical Sciences
Chin-Chi Kuo (Hospital, China Medical University)


Many important biomarkers used in daily practice are proportions, for example the components of white blood cells, lipid profiles, and markers of metabolism (e.g., arsenic metabolism). Their defining characteristic is that they sum to a constant; in medicine, this constant is usually 1. Most researchers analyze proportions with conventional statistical methods designed for unconstrained data. However, the sample space of compositional vectors differs from the real Euclidean space associated with unconstrained data. In 1897, Karl Pearson pointed out that ignoring this constraint can produce so-called “spurious correlation”. Unfortunately, this warning has not been widely appreciated in medicine. In geology, by contrast, researchers have adopted compositional data analysis (CoDa) for data expressed as proportions. In today’s talk, I will give an overview of CoDa methodology, present an example analyzing arsenic metabolism, and discuss barriers to real-world application.
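As a rough illustration of the sum constraint described above, the sketch below applies the centered log-ratio (clr) transform, one common CoDa tool, to a small set of made-up three-part compositions. The abstract does not say which transforms the talk covers, so the choice of clr, the helper names `closure` and `clr`, and the sample values are all assumptions made for illustration only.

```python
import numpy as np


def closure(x):
    """Rescale each row so its parts sum to 1 (the usual constant in medicine)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio transform of each row.

    Assumes every part is strictly positive; zeros need separate handling.
    """
    log_parts = np.log(closure(x))
    # Subtracting the row mean of the logs is the same as dividing each
    # part by the row's geometric mean before taking the log.
    return log_parts - log_parts.mean(axis=-1, keepdims=True)


# Hypothetical three-part compositions, one row per subject.
raw = np.array([
    [60.0, 30.0, 10.0],
    [50.0, 25.0, 25.0],
    [70.0, 20.0, 10.0],
])

proportions = closure(raw)   # rows now sum to 1
transformed = clr(raw)       # rows now sum to 0 in log-ratio space
```

The transformed values depend only on the ratios between parts, so rescaling a row (for example, reporting percentages instead of proportions) leaves them unchanged. Each transformed row still sums to zero, which is one reason other log-ratio transforms are sometimes preferred for regression.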

