
Open Data Initiatives at HPI

This page gathers open-data-related activities at HPI, in particular in the area of linked open data and the semantic web. Contributing groups are

Research Projects

  • GovWILD
    GovWILD gathers and integrates government data from diverse government data sources, presents them in a Web interface and provides the data for download.
    Received the IBM Scalable Data Analytics Award
  • ProLOD
    Profiler for Linked Open Data (ProLOD) is a Web-based tool to perform data profiling tasks on datasets such as DBpedia. (NTII 2010 paper)
  • Yovisto
    Yovisto is a semantic video search engine specialized in academic content. Its search index combines automated content-based video analysis with user-generated collaborative annotation (collaborative tagging, discussions, and comments) and Linked Open Data resources such as DBpedia. In contrast to traditional video search engines, Yovisto enables pinpoint access within video data by providing fine-grained, time-dependent semantic metadata, which is published as Linked Open Data (SAMT 2010 paper / SemSearch 2010 paper).
  • Mediaglobe
    Cultural memory provides an ever-increasing amount of information, but only a negligibly small portion of this content is already accessible on the World Wide Web. Mediaglobe is funded by the Federal Ministry of Economics and Technology within the context of the THESEUS research program 'New technologies for the internet of services'. It aims to take media archives into the digital future by providing availability and usability for the growing stock of audiovisual documents concerning German contemporary history. Mediaglobe offers state-of-the-art services for multimedia analysis and semantic multimedia search based on mappings to Linked Open Data resources on the web.
  • Stratosphere
    The DFG Research Unit (Forschergruppe) is developing a cloud-based data management system. One of the use cases is to query and cleanse linked open data.
  • iPopulator
    With iPopulator, we have introduced a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values to independently extract value parts. We have published a set of extracted facts from Wikipedia. (CIKM 2010 paper / technical report)
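
As a rough illustration of the kind of data profiling task mentioned for ProLOD above, the following sketch computes simple per-predicate statistics over a handful of RDF-style triples. It is not ProLOD's implementation: the sample triples, the `ex:` prefix, and the function name `profile_predicates` are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sample triples (subject, predicate, object); not taken from DBpedia.
TRIPLES = [
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Berlin", "ex:population", "3500000"),
    ("ex:Potsdam", "ex:country", "ex:Germany"),
    ("ex:Potsdam", "ex:population", "180000"),
    ("ex:Potsdam", "ex:label", "Potsdam"),
]


def profile_predicates(triples):
    """Return per-predicate triple count, distinct subjects, and distinct objects."""
    usage = Counter()
    subjects = defaultdict(set)
    objects = defaultdict(set)
    for s, p, o in triples:
        usage[p] += 1
        subjects[p].add(s)
        objects[p].add(o)
    return {
        p: {
            "triples": usage[p],
            "distinct_subjects": len(subjects[p]),
            "distinct_objects": len(objects[p]),
        }
        for p in usage
    }


if __name__ == "__main__":
    # Print one line of statistics per predicate, sorted by predicate name.
    for predicate, stats in sorted(profile_predicates(TRIPLES).items()):
        print(predicate, stats)
```

Statistics of this kind (how often a predicate is used, and how many distinct values it takes) are a common starting point for profiling; a real tool would read triples from a dataset dump or endpoint rather than from an in-memory list.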

Publications

  • Langer, P. et al., 2014. Assigning Global Relevance Scores to DBpedia Facts. In International Workshop on Data Engineering meets the Semantic Web (DESWeb). Chicago, IL.
     
  • Abedjan, Z. & Naumann, F., 2014. Amending RDF Entities with New Facts. In Know@LOD Workshop in conjunction with ESWC. Crete, Greece.
     
  • Abedjan, Z. et al., 2014. Profiling and Mining RDF Data with ProLOD++. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), Demo. Chicago, IL.
     
  • Lorey, J. & Naumann, F., 2013. Detecting SPARQL Query Templates for Data Prefetching. In Proceedings of the 10th Extended Semantic Web Conference (ESWC). Montpellier, France.
     
  • Abedjan, Z. & Naumann, F., 2013. Synonym Analysis for Predicate Expansion. In Proceedings of the Extended Semantic Web Conference (ESWC). Montpellier, France.
     
  • Lorey, J., 2013. SPARQL Endpoint Metrics for Quality-Aware Linked Data Consumption. In Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services (iiWAS '13). Vienna, Austria.
     
  • Abedjan, Z. & Naumann, F., 2013. Improving RDF Data through Association Rule Mining. Datenbank-Spektrum (Special Issue on RDF Data Management), 13(2), pp. 111-120.
     
  • Lorey, J. & Naumann, F., 2013. Caching and Prefetching Strategies for SPARQL Queries. In Proceedings of the 3rd International Workshop on Usage Analysis and the Web of Data (USEWOD). Montpellier, France.
     
  • Lorey, J., 2013. Storing and Provisioning Linked Data as a Service. In Proceedings of the 10th Extended Semantic Web Conference (ESWC). Montpellier, France.
     
  • Lorey, J. & Naumann, F., 2013. Caching and Prefetching Strategies for SPARQL Queries. In ESWC 2013 Satellite Events -- Revised Selected Papers. Montpellier, France.
     
  • Abedjan, Z., Lorey, J. & Naumann, F., 2012. Reconciling Ontologies and the Web of Data. In Proceedings of the 21st International Conference on Information and Knowledge Management (CIKM). Maui, Hawaii, USA, pp. 1532-1536.
     
  • Böhm, C., Kasneci, G. & Naumann, F., 2012. Latent Topics in Graph-Structured Data. In Proceedings of the Conference on Information and Knowledge Management (CIKM).
     
  • Böhm, C. et al., 2012. GovWILD: Integrating Open Government Data for Transparency (demo). In Proceedings of the International World Wide Web Conference (WWW). Lyon, France.
     
  • Böhm, C. et al., 2012. LINDA: Distributed Web-of-Data-Scale Entity Matching. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). Maui, Hawaii.
     
  • Gruetze, T., Böhm, C. & Naumann, F., 2012. Holistic and Scalable Ontology Alignment for Linked Open Data. In Proceedings of the 5th Linked Data on the Web (LDOW) Workshop at the 21st International World Wide Web Conference (WWW). Lyon, France.
     
  • Abedjan, Z. & Naumann, F., 2011. Context and Target Configurations for Mining RDF Data. In International Workshop on Search & Mining Entity-Relationship Data (SMER). Glasgow, UK.
     
  • Lorey, J. et al., 2011. RDF Ontology (Re-)Engineering through Large-scale Data Mining. In Billion Triples Challenge (BTC) at the 10th International Semantic Web Conference (ISWC). Koblenz, Germany.
     
  • Böhm, C., Lorey, J. & Naumann, F., 2011. Creating voiD Descriptions for Web-scale Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 9(3), pp. 339-345.
     
  • Böhm, C. et al., 2010. Profiling linked open data with ProLOD. In Workshops Proceedings of the 26th International Conference on Data Engineering (ICDE). Long Beach, CA, pp. 175-178.
     
  • Lange, D., Böhm, C. & Naumann, F., 2010. Extracting structured information from Wikipedia articles to populate infoboxes, Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam.
     
  • Böhm, C. et al., 2010. Linking open government data: what journalists wish they had known. In Proceedings of the 6th International Conference on Semantic Systems (I-SEMANTICS). Graz, Austria. Available at: http://portal.acm.org/citation.cfm?id=1839751.
     
  • Böhm, C. et al., 2010. Creating voiD Descriptions for Web-Scale Data. In Billion Triples Challenge (BTC) at the 9th International Semantic Web Conference (ISWC). Shanghai, China.
     
  • Lange, D., Böhm, C. & Naumann, F., 2010. Extracting structured information from Wikipedia articles to populate infoboxes. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM). Toronto, Canada, pp. 1661-1664.
     
  • Böhm, C., Groth, P. & Leser, U., 2009. Graph-Based Ontology Construction from Heterogeneous Evidences. In Proceedings of the International Semantic Web Conference (ISWC), pp. 91-96.
     

Courses

Master's Theses

  • Learning to Extract Structured Information from Wikipedia Articles to Populate Infoboxes (Dustin Lange, 2009) (see also CIKM 2010 paper and tech report)
  • Wikipedia cross-lingual Concept Identification and Infobox Alignment (Daniel Rinser, 2010)
  • Strategies for structure-based rewriting of SPARQL queries for data prefetching (Armin Zamani, 2013)
