Research Seminar 2010/2011

New Developments in Information Systems

In this research seminar, staff members and students present their research in this area. Students and guests are cordially invited.

The research seminar is coordinated by Christoph Böhm.

General

When: tba

Where: tba

Schedule

15.02.2011, 11:00, A.2-2
- Bridging the Vocabulary Gap between Questions and Answers Sentences in Question Answering Systems
- Saeedeh Momtazi
31.01.2011, 11:30, A.1-1
- ETL (Extract-Transform-Load) Process Recommendation
- Master's Thesis Results
- Andriy Vedrych
21.01.2011, 13:00, A.1-2
- A Flexible Index Structure for Interactive Data Profiling
- Master's Thesis Results
09.11.2010, 13:00, H-E.52
- ECIR - a Lightweight Approach for Entity-centric Information Retrieval
- practice talk for the TREC 2010 conference
- Michael Leben
25.11.2010, 11:00, H-2.57
- Self-Service Development of Linked Data Applications with the Information Workbench
- Dr. Peter Haase, Senior Architect - Research & Development @ fluid Operations AG

Abstracts

Saeedeh Momtazi: Bridging the Vocabulary Gap between Questions and Answers Sentences in Question Answering Systems
Sentence retrieval plays an important role in question answering systems. It aims to find small segments of text that contain an exact answer to users' questions rather than overwhelming them with a large number of retrieved documents that they must sort through to find the desired answer. Because the search in sentence retrieval is conducted over smaller segments of data than in document retrieval, the problems of data sparsity and exact matching become more critical.
In this talk, we propose two language modeling techniques that overcome the vocabulary mismatch problem by capturing term relationships. The first method, the class-based language model, uses a word clustering algorithm to capture term relationships and thereby address the data sparsity and vocabulary mismatch problems. In this model, we assume there is a relation between terms that belong to the same cluster; as a result, they can be substituted for one another when searching for relevant sentences. The second method, the trained trigger language model, finds pairs of trigger and target words when trained on a large corpus. If a trigger word appears in the question and a sentence contains the corresponding target word, the model considers a relation between the question and the sentence. The experimental results show that both models significantly improve sentence retrieval performance.
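The trigger idea described above can be illustrated with a small sketch. It is not the speaker's implementation: it assumes that trigger-target pairs are counted from co-occurrence within a training sentence, that every word also triggers itself, and that a sentence is scored by the smoothed log-probability of its words being targets of the question words. The function names `train_triggers`, `p_target_given_trigger`, and `score_sentence`, as well as the smoothing constant `eps`, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
import math


def train_triggers(corpus):
    """Count (trigger, target) pairs that co-occur in a training sentence.

    Assumption: co-occurrence inside one sentence defines a pair.
    """
    pair_counts = defaultdict(Counter)
    for sentence in corpus:
        words = set(sentence)
        for trigger, target in permutations(words, 2):
            pair_counts[trigger][target] += 1
        # Assumption: a word triggers itself, so exact matches still count.
        for word in words:
            pair_counts[word][word] += 1
    return pair_counts


def p_target_given_trigger(pair_counts, trigger, target):
    """Relative frequency of `target` among all targets seen with `trigger`."""
    counts = pair_counts.get(trigger)
    if not counts:
        return 0.0
    return counts[target] / sum(counts.values())


def score_sentence(question, sentence, pair_counts, eps=1e-6):
    """Sum of log-probabilities; higher means more related to the question."""
    if not sentence:
        return float("-inf")
    score = 0.0
    for q in question:
        # Average trigger probability from question word q to sentence words.
        mean_p = sum(
            p_target_given_trigger(pair_counts, q, s) for s in sentence
        ) / len(sentence)
        score += math.log(mean_p + eps)  # eps avoids log(0) for unseen pairs
    return score


# Toy training corpus (made up for illustration).
corpus = [
    ["invented", "inventor", "telephone"],
    ["invented", "inventor", "radio"],
    ["capital", "city", "france"],
]
triggers = train_triggers(corpus)
question = ["who", "invented", "telephone"]
candidates = [["bell", "inventor", "telephone"], ["paris", "city", "france"]]
ranked = sorted(
    candidates,
    key=lambda s: score_sentence(question, s, triggers),
    reverse=True,
)
```

In this toy setup, the sentence containing "inventor" can match the question word "invented" even though the two strings differ, which is the kind of vocabulary gap the abstract refers to.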
Michael Leben: ECIR - a Lightweight Approach for Entity-centric Information Retrieval
This paper describes our system developed for the TREC 2010 Entity track. In particular, we study the exploitation of advanced features of different Web search engines to achieve high-quality answers for the 'related entity finding' task. Our system preprocesses a user query using part-of-speech tagging and synonym dictionaries, and generates an enriched keyword query employing advanced features of the particular Web search engine. After retrieving a corpus of documents, the system constructs rules for extracting candidate entities. Potentially related entities are deduplicated and scored for each document with respect to the distance to the source entity that is defined in the query. Finally, these scores are aggregated across the corpus by incorporating the rank position of a document. For homepage retrieval, we further employ advanced features of Web search engines, for instance to retrieve candidate URLs by queries such as entity in anchor. Homepages are ranked by a weighted aggregation of feature vectors. The weight for each individual feature was determined beforehand using a genetic learning algorithm. We employed a commercial information extraction system as a basis and implemented our system for three different Web search engines. We discuss our experiments for the different Web search engines and elaborate on the lessons learned.
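The rank-aware aggregation step mentioned above could look roughly like the following sketch. The abstract does not give the actual formulas, so both weighting functions are assumptions chosen for illustration: a per-document proximity score of 1 / (1 + distance) and a rank weight of 1 / rank. The function name `aggregate_entity_scores` and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_entity_scores(ranked_docs):
    """Combine per-document candidate scores into one corpus-wide ranking.

    ranked_docs: list in search-engine rank order; each item maps a
    candidate entity to its token distance from the source entity.
    """
    totals = defaultdict(float)
    for rank, candidates in enumerate(ranked_docs, start=1):
        rank_weight = 1.0 / rank  # assumption: higher-ranked docs count more
        for entity, distance in candidates.items():
            proximity = 1.0 / (1.0 + distance)  # assumption: closer is better
            totals[entity] += rank_weight * proximity
    # Best-scoring candidate first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up example: three retrieved documents with candidate distances.
docs = [
    {"Boeing": 2, "Airbus": 20},
    {"Boeing": 5},
    {"Airbus": 1},
]
ranking = aggregate_entity_scores(docs)
```

Here a candidate that appears close to the source entity in highly ranked documents ends up ahead of one that appears close only in a low-ranked document.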
Dr. Peter Haase: Self-Service Development of Linked Data Applications with the Information Workbench
With existing datasets growing and new data being added constantly, the Linked Open Data (LOD) cloud becomes increasingly interesting for the Enterprise, allowing companies to augment and complement internal knowledge with external information. The development of domain-specific applications that benefit from LOD repositories, though, is often a time-consuming and costly task. In our talk, we present the Information Workbench, a self-service platform for the fast development of domain-specific linked data applications.
Designed with the goal to leverage linked data deployment in the enterprise, the Information Workbench implements concepts and features for integrating internal and external information, general paradigms for the investigation and augmentation of the integrated knowledge, as well as the collaborative interaction. In the presentation, we will review features as well as the architecture of the Information Workbench in more detail, focusing on the process of building linked data applications in a self-service manner.
The presentation concludes with a demo presenting a sample application built with the Information Workbench.

Past Research Seminars