Hasso-Plattner-Institut
Prof. Dr. Felix Naumann

Neue Entwicklungen im Bereich Informationssysteme

In this research seminar, staff members and students present their research in this area. Students and guests are cordially invited.

The research seminar is coordinated by Christoph Böhm.

General

When: Tuesdays, 15:00-16:00

Where: A-2.2

Termine

  • 18.05.2010

    • Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing
    • Stephan Ewen, TU Berlin

  • 25.05.2010

    • Aggregated Search of Data and Services
    • Antonio Sala, Information Engineering Department of the University of Modena and Reggio Emilia, Italy

  • 15.06.2010

    • Extraction of Management Concepts from Web Sites for Sentiment Analysis
    • Arvid Heise, Master's Thesis Results

  • 06.07.2010

    • Duplikaterkennung unter Verwendung unstrukturierter Anteile
    • David Sonnabend, Master's Thesis Results

  • 13.07.2010

    • Optimizing Query Execution to Improve the Energy Efficiency of DBMS
    • Tobias Flach, Master's Thesis Results

  • 13.07.2010

    • Wikipedia Cross-lingual Infobox Alignment and Conflict Detection
    • Daniel Rinser, Master's Thesis Results

  • 04.08.2010

    • Towards Granular Data Placement Strategies for Cloud Platforms
    • Johannes Lorey, Practice Talk for the Symposium on Cloud Computing and the Web at IEEE GrC 2010

  • 21.09.2010

    • Finding Unique Column Combinations within a Database
    • Ziawasch Abedjan, Master's Thesis Results

  • 21.09.2010

    • Sensitivity of Spatiotemporal Patterns based on Integrated Data from Distributed Environmental Sensor Networks
    • Sören Nils Haubrock, Master's Thesis Results

Abstracts

  • Antonio Sala: Aggregated Search of Data and Services
    From a user perspective, data and services provide complementary views of an information source: data provide detailed information about specific needs, while services execute processes that involve data and also return informative results. For this reason, users need to perform aggregated searches that identify not only relevant data but also services able to operate on them. At the current state of the art, such aggregated search can be performed manually and only by expert users, who first identify relevant data and then identify existing relevant services.
    We propose a semantic approach to perform aggregated search of data and services. In particular, we developed a technique that, on the basis of an ontological representation of data and services related to a domain, supports the translation of a data query into a service discovery process.
    To evaluate our approach, we developed a prototype that extends the existing MOMIS data integration system (http://www.dbgroup.unimore.it/Momis) with a new information retrieval-based web service engine called XIRE.
  • Stephan Ewen: Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing
    Data Intensive Scalable Computing is a much-investigated topic in current research. Next to parallel databases, new flavors of data processors have established themselves - most prominently the map/reduce programming and execution model. These new systems provide key features that current parallel databases lack, such as flexibility in the data models, the ability to parallelize custom functions, and fault tolerance that enables them to scale out to thousands of machines.
    We present the Nephele/PACTs system - a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with additional higher-order functions and output contracts that give guarantees about the behavior of a function. A PACT program is transformed into a data flow for Nephele, which executes its sequential building blocks in parallel and provides communication, synchronization, and fault tolerance. The PACTs are defined in such a way that this transformation can apply several types of optimizations on the data flow. The system as a whole is as generic as map/reduce systems, while overcoming several of their major weaknesses.
  • Arvid Heise: Extraction of Management Concepts from Web Sites for Sentiment Analysis
    Company Web sites are vital information sources for organization theorists to summarize and assess the management concepts that are implemented by the respective companies.  However, the enormous amount of data renders manual analyses virtually infeasible.  We present the Management Concept Miner, an integrated tool that continuously extracts the texts from several hundred company Web sites into a relational database and that scores the management concepts of the companies.  It comprises an incremental Web crawler, PDF and HTML text extractors, dictionary-based annotators for concept-related key phrases, and an automatic assessment of the annotated texts.
    We apply viewpoint detection and sentiment analysis techniques to the new domain of management concepts to assess the importance of a management concept to the company. The achieved results are comparable to the assessment performance of domain experts.
    Additionally, the evaluation shows that the Management Concept Miner crawls the huge amount of data resource-efficiently and extracts the texts reliably.
    The Management Concept Miner demonstrates that the data of Web sites can be exploited to summarize the management concepts of a company. 
    Moreover, the presented design can be applied to domains beyond management concepts. Since the amount of publicly available opinionated data is steadily increasing, the combination of information extraction techniques with viewpoint detection has great potential.
  • David Sonnabend: Duplikaterkennung unter Verwendung unstrukturierter Anteile
    In many datasets, such as product databases, structured information about the represented objects is complemented by unstructured data, e.g., detailed entity descriptions. Conventional duplicate detection approaches use only the structured data to identify duplicates within the dataset. This master's thesis presents similarity measures that significantly improve the effectiveness of duplicate detection by incorporating such unstructured data.