TY  - JOUR
AU  - Gloor, Gregory Brian
AU  - Macklaim, Jean M.
AU  - Vu, Michael
AU  - Fernandes, Andrew D.
PY  - 2016/07/28
Y2  - 2024/03/28
TI  - Compositional uncertainty should not be ignored in high-throughput sequencing data analysis
JF  - Austrian Journal of Statistics
JA  - AJS
VL  - 45
IS  - 4
SE  - Compositional Data Analysis
DO  - 10.17713/ajs.v45i4.122
UR  - https://www.ajs.or.at/index.php/ajs/article/view/vol45-4-5
SP  - 73-87
AB  - High throughput sequencing generates sparse compositional data, yet these datasets are rarely analyzed using a compositional approach. In addition, the variation inherent in these datasets is rarely acknowledged, but ignoring it can result in many false positive inferences. We demonstrate that examination of point estimates of the data can result in false positive results, even with appropriate zero replacement approaches, using an in vitro selection dataset with an outside standard of truth. The variation inherent in real high-throughput sequencing datasets is demonstrated, and we show that this variation can be approximated, and hence accounted for, by Monte-Carlo sampling from the Dirichlet distribution. This approximation when used by itself is itself problematic, but becomes useful when coupled with a log-ratio approach commonly used in compositional data analysis. Thus, the approach illustrated here that merges Bayesian estimation with principles of compositional data analysis should be generally useful for high-dimensional count compositional data of the type generated by high throughput sequencing.
ER  - 