Our group includes postdocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.
For bachelor's students, we offer lectures on database systems (taught in German) as well as paper- or project-oriented seminars. In a one-year bachelor's project, students conclude their studies in cooperation with external partners. For master's students, we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master's projects, and we advise master's theses.
Most of our research is conducted in the context of larger research projects, in collaboration across students, groups, and universities. We strive to make most of our datasets and source code publicly available.
In this research seminar, staff members and students present their research in this field. Students and guests are cordially invited.
The research seminar is coordinated by Christoph Böhm.
General
When: tba
Where: tba
Schedule
15.02.2011, 11:00, A.2-2
Bridging the Vocabulary Gap between Questions and Answer Sentences in Question Answering Systems
Saeedeh Momtazi
31.01.2011, 11:30, A.1-1
ETL (Extract-Transform-Load) Process Recommendation
Master's thesis results
Andriy Vedrych
21.01.2011, 13:00, A.1-2
A Flexible Index Structure for Interactive Data Profiling
Master's thesis results
09.11.2010, 13:00, H-E.52
ECIR - a Lightweight Approach for Entity-centric Information Retrieval
practice talk for the TREC 2010 conference
Michael Leben
25.11.2010, 11:00, H-2.57
Self-Service Development of Linked Data Applications with the Information Workbench
Dr. Peter Haase, Senior Architect, Research & Development, fluid Operations AG
Abstracts
Saeedeh Momtazi: Bridging the Vocabulary Gap between Questions and Answer Sentences in Question Answering Systems
Sentence retrieval plays an important role in question answering systems. It aims to find small segments of text that contain an exact answer to the user's question, rather than overwhelming the user with a large number of retrieved documents that must be sorted through to find the desired answer. Because sentence retrieval searches over much smaller units of text than document retrieval, the problems of data sparsity and exact term matching become more critical than in document retrieval. In this talk, we propose two language modeling techniques that overcome the vocabulary mismatch problem by capturing term relationships. The first method, the class-based language model, uses a word clustering algorithm to capture term relationships and thereby address data sparsity and vocabulary mismatch. In this model, we assume that terms belonging to the same cluster are related, so they can be substituted for one another when searching for relevant sentences. The second method, the trained trigger language model, learns pairs of trigger and target words from a large training corpus. If a trigger word appears in the question and a sentence contains the corresponding target word, the model treats the question and the sentence as related. Experimental results show that both models significantly improve sentence retrieval performance.
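The following Python sketch illustrates both scoring ideas under simplifying assumptions; it is not the speaker's implementation. The class-based score uses P(q|S) = P(q|C_q) · P(C_q|S) with a fixed, hand-made word-to-cluster mapping (`word2class`) in place of a learned clustering, and smooths P(C|S) by linear interpolation with an assumed weight `lam`. The trigger score estimates P_T(q|s) from co-occurrence counts in hypothetical question/answer-sentence training pairs and averages it over the words of the sentence. All function and parameter names are made up for illustration.

```python
from collections import Counter, defaultdict
from math import log


# --- Class-based language model (simplified sketch) ---
def class_based_log_prob(query, sentence, word2class, collection, lam=0.5):
    """Return log P(query | sentence) with P(q|S) = P(q|C_q) * P(C_q|S).

    Words missing from `word2class` are treated as singleton classes.
    P(C|S) is linearly interpolated with the collection's class distribution
    (weight `lam`, assumed > 0) so a sentence without class C still gets
    non-zero probability.
    """
    def cls(w):
        return word2class.get(w, w)

    coll_words = Counter(collection)
    coll_classes = Counter(cls(w) for w in collection)
    sent_classes = Counter(cls(w) for w in sentence)
    n_coll, n_sent = len(collection), max(len(sentence), 1)
    total = 0.0
    for q in query:
        if coll_words[q] == 0:
            continue  # query term unseen in the collection: ignored in this sketch
        c = cls(q)
        p_q_given_c = coll_words[q] / coll_classes[c]
        p_c_given_s = (1 - lam) * sent_classes[c] / n_sent + lam * coll_classes[c] / n_coll
        total += log(p_q_given_c * p_c_given_s)
    return total


# --- Trained trigger language model (simplified sketch) ---
def train_trigger_model(pairs):
    """Estimate P_T(q | s) from (question_terms, answer_sentence_terms) pairs.

    counts[s][q] counts how often question term q co-occurs with answer term s;
    each row is then normalized into a probability distribution over q.
    """
    counts = defaultdict(Counter)
    for q_terms, s_terms in pairs:
        for s in set(s_terms):
            for q in set(q_terms):
                counts[s][q] += 1
    return {s: {q: n / sum(row.values()) for q, n in row.items()}
            for s, row in counts.items()}


def trigger_log_prob(query, sentence, model, eps=1e-6):
    """Return log P(query | sentence) with P(q|S) = mean over s in S of P_T(q | s)."""
    total = 0.0
    for q in query:
        p = sum(model.get(s, {}).get(q, 0.0) for s in sentence) / max(len(sentence), 1)
        total += log(p + eps)  # eps keeps unrelated sentences from scoring log(0)
    return total


# Usage with toy data: "car" and "automobile" share a cluster.
word2class = {"car": "VEHICLE", "automobile": "VEHICLE", "truck": "VEHICLE"}
collection = "the car is red the automobile is fast a truck is big".split()
print(class_based_log_prob(["automobile"], "the car is red".split(), word2class, collection))
print(class_based_log_prob(["automobile"], "the sky is blue".split(), word2class, collection))
```

With this toy data, the query "automobile" scores log(1/12) against "the car is red" and log(1/24) against "the sky is blue", although neither sentence contains the query term; the difference comes entirely from the shared VEHICLE cluster, which is the substitution effect the abstract describes.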
Michael Leben: ECIR - a Lightweight Approach for Entity-centric Information Retrieval
This paper describes our system developed for the TREC 2010 Entity track. In particular, we study how advanced features of different Web search engines can be exploited to achieve high-quality answers for the 'related entity finding' task. Our system preprocesses a user query using part-of-speech tagging and synonym dictionaries, and generates an enriched keyword query that employs advanced features of the particular Web search engine. After retrieving a corpus of documents, the system constructs rules for extracting candidate entities. Potentially related entities are deduplicated and scored for each document with respect to their distance to the source entity defined in the query. Finally, these scores are aggregated across the corpus, taking into account the rank position of each document. For homepage retrieval, we again employ advanced features of Web search engines, for instance to retrieve candidate URLs with queries such as 'entity in anchor'. Homepages are ranked by a weighted aggregation of feature vectors, where the weight of each individual feature was determined beforehand using a genetic learning algorithm. We built our system on top of a commercial information extraction system and implemented it for three different Web search engines. We discuss our experiments with the different Web search engines and elaborate on the lessons learned.
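The per-document scoring and rank-based aggregation described in this abstract might look roughly like the following Python sketch. It assumes that candidate entities and source-entity mentions have already been extracted as token positions. The 1/(1 + distance) proximity score, the 1/log2(rank + 1) rank discount, the `normalize` deduplication key, and all function names are hypothetical choices, not details taken from the talk. Feature weights for homepage ranking are simply passed in; in the talk they were learned beforehand with a genetic algorithm, which is not shown here.

```python
from collections import defaultdict
from math import log2


def normalize(name):
    """Hypothetical deduplication key: lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def doc_scores(source_positions, candidates):
    """Score deduplicated candidates in one document by proximity to the source entity.

    source_positions: token positions of the source-entity mentions.
    candidates: list of (token_position, entity_name) mentions.
    Assumed score: 1 / (1 + smallest token distance), max over a candidate's mentions.
    """
    scores = {}
    for pos, name in candidates:
        key = normalize(name)
        dist = min(abs(pos - s) for s in source_positions)
        scores[key] = max(scores.get(key, 0.0), 1.0 / (1 + dist))
    return scores


def aggregate(ranked_docs):
    """Combine per-document scores across the corpus (best-ranked document first).

    Each document's scores are weighted by an assumed rank discount 1 / log2(rank + 1).
    """
    total = defaultdict(float)
    for rank, (source_positions, candidates) in enumerate(ranked_docs, start=1):
        if not source_positions:
            continue  # no source-entity mention: no distance can be computed
        weight = 1.0 / log2(rank + 1)
        for key, score in doc_scores(source_positions, candidates).items():
            total[key] += weight * score
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)


def rank_homepages(candidates, weights):
    """Rank (url, feature_vector) pairs by the weighted sum of their features."""
    def weighted(item):
        _, features = item
        return sum(w * f for w, f in zip(weights, features))
    return sorted(candidates, key=weighted, reverse=True)


# Usage with made-up data: two ranked documents mentioning the source entity.
docs = [
    ([0], [(3, "Bell Labs"), (10, "AT&T")]),
    ([5], [(6, "bell  labs"), (20, "AT&T")]),
]
print(aggregate(docs))  # "bell labs" is ranked first
print(rank_homepages([("http://a.example", [1.0, 0.0]),
                      ("http://b.example", [0.2, 0.9])], weights=[0.3, 0.7]))
```

Deduplication happens before scoring, so "Bell Labs" and "bell  labs" collapse into one candidate whose evidence from both documents is summed; the rank discount lets a mention in the top-ranked document count more than the same mention further down the result list.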
Dr. Peter Haase: Self-Service Development of Linked Data Applications with the Information Workbench
With existing datasets growing and new data being added constantly, the Linked Open Data (LOD) cloud becomes increasingly interesting for enterprises, allowing companies to augment and complement internal knowledge with external information. Developing domain-specific applications that benefit from LOD repositories, however, is often time-consuming and costly. In our talk, we present the Information Workbench, a self-service platform for the fast development of domain-specific linked data applications. Designed to foster the deployment of linked data in the enterprise, the Information Workbench implements concepts and features for integrating internal and external information, general paradigms for investigating and augmenting the integrated knowledge, and support for collaborative interaction. In the presentation, we review the features and the architecture of the Information Workbench in more detail, focusing on the process of building linked data applications in a self-service manner. The presentation concludes with a demo of a sample application built with the Information Workbench.