Hasso Plattner Institut

Ralf Krestel

Workshop paper, ENRICH 13


"Gute Arbeit": Topic Exploration and Analysis Challenges for the Corpora of German Qualitative Studies


Owing to their long-standing research traditions, the social sciences have collected a tremendous body of data by observing or interviewing people about their behavior, attitudes, beliefs, etc. "Sofi", a sociological institute in Göttingen (Germany), carried out a number of studies observing the working situation in the German automobile and shipyard industries after the rapid economic growth of post-World War II Germany, the so-called German "economic miracle". Qualitative data in the form of worker interviews was collected over a period of more than 40 years, starting in the early 1960s (e.g., the Volkswagen and German dockyard studies), and the findings of these studies had a significant impact on the working situation in German industry. Intelligent access to this heritage of qualitative data would turn the collection into a valuable source for secondary research, e.g., for longitudinal (meta-)analysis or historical investigations. Using modern information technologies, the project "Gute Arbeit" aims to provide intelligent access to qualitative social science data on the subject of "good work". Topic modeling has become a popular means of identifying and describing the topical structure of individual documents and whole corpora. However, when applied directly to our corpora, topic modeling produces poor-quality topic models because our dataset contains only a limited number of sociological surveys. In previous work, we proposed "topic cropping", a fully automated process for selecting and incorporating additional domain-specific documents with similar topical content, which can expand a dataset and significantly improve the quality of the inferred topic models. We tested our approach on thematically close English and German document corpora and found that the results for the German corpora slightly outperformed those for the English dataset.
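To make the general idea of expanding a small corpus with topically similar documents concrete, here is a minimal sketch. It is not the topic cropping algorithm from the paper: it swaps in a simple bag-of-words cosine similarity between each candidate document and the centroid of the core corpus, then keeps the `k` best-scoring candidates. The functions `bag_of_words`, `cosine`, and `expand_corpus` and the cutoff `k` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens; a real pipeline would add stemming and stopword removal."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_corpus(core_docs: list[str], candidate_docs: list[str], k: int) -> list[str]:
    """Hypothetical expansion step (not the paper's method): append the k candidates
    most similar to the summed term counts of the core corpus."""
    centroid = Counter()
    for doc in core_docs:
        centroid.update(bag_of_words(doc))
    # Cosine similarity is scale-invariant, so summed counts work as well as a mean centroid.
    ranked = sorted(
        candidate_docs,
        key=lambda doc: cosine(bag_of_words(doc), centroid),
        reverse=True,
    )
    return list(core_docs) + ranked[:k]


if __name__ == "__main__":
    core = ["arbeit schicht werk lohn", "werk arbeit band"]
    candidates = ["fussball tor spiel", "arbeit werk lohn band", "wetter regen"]
    print(expand_corpus(core, candidates, k=1))
```

A fixed `k` keeps the sketch simple; a threshold on the similarity score would be an equally plausible design choice, and a practical system would compare documents in topic space rather than on raw word counts.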




