A key step in data engineering is the analysis of large data sets to predict outcomes and automatically detect common patterns. Ideal approaches combine high accuracy with scalability, robustness and interpretability. Statistics and machine learning have long approached this task differently: statistics parameterizes existing models with data, whereas machine learning predicts directly from data. In our work, we draw heavily on both fields and incorporate pre-existing knowledge into our predictions. We tailor analysis methods to specific data characteristics and bundle them into end-to-end pipelines. In this lecture, the analysis of human microbiome data will serve as an application showcase.
The event will take place at the Hasso Plattner Institut, lecture hall 1, at 5 pm.