For bachelor's students we offer German-language lectures on database systems in addition to paper- and project-oriented seminars. In a one-year bachelor's project, students complete their studies in cooperation with external partners. For master's students we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master's projects, and we advise master's theses.
Most of our research is conducted in the context of larger research projects, in collaboration among students, groups, and universities. We strive to make most of our datasets and source code publicly available.
Antonio Sala, Information Engineering Department of the University of Modena and Reggio Emilia, Italy
Extraction of Management Concepts from Web Sites for Sentiment Analysis
Arvid Heise, Master's Thesis Results
Duplikaterkennung unter Verwendung unstrukturierter Anteile (Duplicate Detection Using Unstructured Content)
David Sonnabend, Master's Thesis Results
Optimizing Query Execution to Improve the Energy Efficiency of DBMS
Tobias Flach, Master's Thesis Results
Wikipedia Cross-lingual Infobox Alignment and Conflict Detection
Daniel Rinser, Master's Thesis Results
Towards Granular Data Placement Strategies for Cloud Platforms
Johannes Lorey, Practice Talk for Symposium on Cloud Computing and the Web at 2010 IEEE GrC
Finding Unique Column Combinations within a Database
Ziawasch Abedjan, Master's Thesis Results
Sensitivity of Spatiotemporal Patterns based on Integrated Data from Distributed Environmental Sensor Networks
Sören Nils Haubrock, Master's Thesis Results
Antonio Sala: Aggregated Search of Data and Services From a user perspective, data and services provide a complementary vision of an information source: data provide detailed information about specific needs, while services execute processes involving data and also return an informative result. For this reason, users need to perform aggregated searches that identify not only relevant data, but also services able to operate on them. At the current state of the art, such aggregated search can be performed only manually by expert users, who first identify relevant data and then identify existing relevant services. We propose a semantic approach to the aggregated search of data and services. In particular, we developed a technique that, based on an ontological representation of the data and services related to a domain, supports the translation of a data query into a service discovery process. To evaluate our approach, we developed a prototype that extends the existing MOMIS data integration system (http://www.dbgroup.unimore.it/Momis) with a new information retrieval-based web service engine called XIRE.
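The abstract does not disclose XIRE's internal retrieval model, so the following is only an illustrative sketch of the discovery step: the terms of a data query are matched against textual service descriptions and the services are ranked by overlap. The function name and the example services are invented for illustration.

```python
# Hypothetical sketch of IR-based service discovery: rank services by
# how many of the data query's terms appear in their descriptions.
def discover_services(query_terms, services):
    """Return (service_name, score) pairs sorted by descending score."""
    ranked = []
    for name, description in services.items():
        words = set(description.lower().split())
        score = sum(1 for term in query_terms if term.lower() in words)
        if score:  # keep only services matching at least one term
            ranked.append((name, score))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

A data query about hotels in a city would thus surface a hotel-booking service before unrelated ones; real IR engines would of course use weighted term statistics rather than raw overlap.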
Stephan Ewen: Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing Data Intensive Scalable Computing is a much-investigated topic in current research. Next to parallel databases, new flavors of data processors have established themselves - most prominently the map/reduce programming and execution model. These new systems provide key features that current parallel databases lack, such as flexibility in the data model, the ability to parallelize custom functions, and fault tolerance that enables them to scale out to thousands of machines. We present the Nephele/PACTs system - a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with additional higher-order functions and output contracts that give guarantees about the behavior of a function. A PACT program is transformed into a data flow for Nephele, which executes its sequential building blocks in parallel and provides communication, synchronization, and fault tolerance. The PACTs are defined in such a way that this transformation can apply several types of optimizations to the data flow. The system as a whole is as generic as map/reduce systems, while overcoming several of their major weaknesses.
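To make the idea of Parallelization Contracts concrete, here is a minimal sketch (not the actual Nephele/PACT API, which is Java-based): a contract is a second-order function that partitions the input and applies a user function to each partition independently, so partitions can be processed in parallel. Map and reduce are two such contracts; a match contract, which pairs records of two inputs with equal keys, is an example of a higher-order function beyond plain map/reduce.

```python
# Illustrative sequential model of three PACT-style contracts; all
# function names here are invented for this sketch.
from collections import defaultdict

def pact_map(user_fn, records):
    # Map contract: every single record forms its own partition.
    out = []
    for rec in records:
        out.extend(user_fn(rec))
    return out

def pact_reduce(user_fn, records, key):
    # Reduce contract: all records sharing a key form one partition.
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    out = []
    for k, group in groups.items():
        out.extend(user_fn(k, group))
    return out

def pact_match(user_fn, left, right, key_l, key_r):
    # Match contract: one partition per pair of records with equal keys.
    index = defaultdict(list)
    for r in right:
        index[key_r(r)].append(r)
    out = []
    for l in left:
        for r in index[key_l(l)]:
            out.extend(user_fn(l, r))
    return out
```

A word count, for instance, is `pact_map` over lines followed by `pact_reduce` over words; because each contract fixes how partitions are formed, an engine like Nephele can parallelize the user functions without inspecting their code.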
Arvid Heise: Extraction of Management Concepts from Web Sites for Sentiment Analysis Company Web sites are vital information sources for organization theorists to summarize and assess the management concepts that are implemented by the respective companies. However, the enormous amount of data renders manual analyses virtually infeasible. We present the Management Concept Miner, an integrated tool that continuously extracts the texts from several hundred company Web sites into a relational database and that scores the management concepts of the companies. It comprises an incremental Web crawler, PDF and HTML text extractors, dictionary-based annotators for concept-related key phrases, and an automatic assessment of the annotated texts. We apply viewpoint detection and sentiment analysis techniques to the new domain of management concepts to assess the importance of each management concept to the company. The achieved results are comparable to the assessment performance of domain experts. Additionally, the evaluation shows that the Management Concept Miner crawls the huge amount of data resource-efficiently and extracts the texts reliably. The Management Concept Miner demonstrates that Web site data can be exploited to summarize the management concepts of a company. Moreover, the presented design can be applied to further domains beyond management concepts. Since the amount of publicly available opinionated data is steadily increasing, the combination of information extraction techniques with viewpoint detection has huge potential.
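A dictionary-based annotator of the kind mentioned above can be sketched in a few lines. This is a minimal illustration only; the actual miner's dictionaries, concept names, and scoring are not reproduced here, and the example phrases are invented.

```python
# Hypothetical concept dictionary: each management concept maps to key
# phrases whose occurrence in a text is annotated.
import re

CONCEPT_PHRASES = {
    "lean management": ["continuous improvement", "waste reduction"],
    "six sigma": ["defect rate", "process capability"],
}

def annotate(text):
    """Return (concept, phrase, character_offset) triples found in text."""
    annotations = []
    lowered = text.lower()
    for concept, phrases in CONCEPT_PHRASES.items():
        for phrase in [concept] + phrases:
            for match in re.finditer(re.escape(phrase), lowered):
                annotations.append((concept, phrase, match.start()))
    return annotations
```

Downstream, sentiment and viewpoint classifiers would then score the text surrounding each annotation to estimate how strongly the company commits to the concept.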
David Sonnabend: Duplikaterkennung unter Verwendung unstrukturierter Anteile (Duplicate Detection Using Unstructured Content) In many datasets, such as product databases, the structured information about the represented objects is complemented by unstructured data, e.g., detailed entity descriptions. Conventional duplicate detection approaches use only the structured data to identify duplicates within the dataset. This master's thesis presents similarity measures that substantially increase the effectiveness of duplicate detection by incorporating such unstructured data.
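The general idea of combining structured and unstructured evidence can be sketched as a weighted mix of two similarities; the concrete measures and the weighting below are illustrative choices, not those of the thesis.

```python
# Sketch: combine a token-set similarity over a structured field with a
# bag-of-words cosine similarity over an unstructured description.
import math
from collections import Counter

def jaccard(a, b):
    """Token-set similarity, suited to short structured fields."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def cosine(a, b):
    """Bag-of-words cosine similarity for longer description texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def combined_similarity(rec_a, rec_b, alpha=0.5):
    """Weighted mix of structured and unstructured similarity.

    The field names "name" and "description" are assumptions for this
    example, as is the linear weighting via alpha.
    """
    structured = jaccard(rec_a["name"], rec_b["name"])
    unstructured = cosine(rec_a["description"], rec_b["description"])
    return alpha * structured + (1 - alpha) * unstructured
```

Two product records whose names differ slightly but whose descriptions largely agree would thus still score high, which is exactly the signal a purely structured comparison misses.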