Hasso-Plattner-Institut
  
Prof. Dr. Emmanuel Müller
  
 

Data Science: An alternative research process focusing on extensive exploitation of available data

Scientific data is ubiquitous in today’s research process. Data is collected as observations of natural phenomena, created from simulation models and experiments, used for modelling and verifying complex processes, and published as empirical evidence of scientific results. Overall, data has a significant influence on the quality of research results. However, the lack of technology and education in advanced data management, data analytics, and data exploration stands in clear contrast to the value of data in the scientific process.

Objectives:

Data Science addresses these open research gaps. It complements the domain expertise of scientists with computer science expertise and proposes an “alternative research process focusing on extensive exploitation of available data”. Our computer science research focuses on the open challenges in scientific data handling: we study new computer science concepts in advanced data management, data analytics, and data exploration, and we develop new scientific processes ranging from data integration to data analysis and data exploration. Our research results are computer science methods that allow for efficient and scalable data processing in big and heterogeneous data repositories, (semi-)automated knowledge discovery in complex databases, and intuitive data exploration through interactive data visualization. In all these research fields we focus on incorporating the scientists’ domain expertise into our data science methods. This allows scientists to steer the data analysis process toward novel data-driven hypotheses and a comprehensive understanding of their data. Thus, data science is a novel computer science research direction with significant impact on a variety of scientific domains such as geoscience.

Contributions:

Establishing data science, and hence computer science research, is a long-term process requiring innovation in three fields: (1) novel computer science methodology on the level of formal algorithm development, (2) novel computer science tools on the practical level of a Data Science Platform, and (3) post-graduate education of scientists in data science. Our group aims to establish all three fields. Our scientific contributions are directed to the following goals:

  • Computer Science Research:
    • Data integration of multiple (heterogeneous) data sources
    • Efficient data access and efficient data processing of big databases
    • Data exploration by interactive data visualization 
    • (Semi-)Automated knowledge discovery in complex databases
    • Integration of prior knowledge into our data science methods
  • Technology Development:
    • Novel data analytics processes including our data science methods and tools
      for data management, analysis, and exploration
    • Availability of research results (i.e. algorithms) to the interested public
    • Extensibility of algorithms by the community
    • Transfer of algorithms to other scientific domains and industry
  • Post-graduate Education in Data Science: