
Anne Radunski

Research Associate, Doctoral Candidate

Chair of Data Analytics and Computational Statistics
Hasso Plattner Institute

Office: Campus I, Room K-E.09/10
Tel.: +49 (0)331 5509 4933
Email: Anne.Radunski(at)hpi.de

Links: Webpage | LinkedIn | Twitter

Supervisor: Prof. Dr. Katharina Hölzle, MBA

Research Interests

My research focuses on analyzing entrepreneurs' mental health using textual data and, more generally, on applying data science methodologies in management research.

> Natural Language Processing

> Emotion Analysis

> Management Research

> Mental Health


Umland, J., Hölzle, K., Maul, V., Rose, R., Radunski, A. (2022). It's Twitter TIME – Mapping the Structure of the Scholarly Twitter Network in the Field of Technology and Innovation Management & Entrepreneurship. Accepted paper at Journal of Product Innovation Management (JPIM) Research Forum, Orlando, United States.

Petzolt, S., Radunski, A., Fox, D., Hölzle, K. (2022). Using Twitter Data as a Proxy for Trend Detection and Analysis of Small and Medium-sized Enterprises in the Digital Transformation. Accepted paper at 25. Interdisziplinäre Jahreskonferenz Zu Entrepreneurship, Innovation Und Mittelstand (G-Forum), Dresden, Germany.

Hölzle, K., Petzolt, S., Radunski, A., Fox, D., Kulik, O. (2022). Technologie-Trendreport – Identifikation von Trends fuer kleine und mittlere Unternehmen im digitalen Wandel – eine Analyse auf Basis von Twitterdaten. Mittelstand-Digital Zentrum Berlin. [Report]



If you are curious about my research interests and would like to write your thesis with me, please feel free to contact me.


  • Master's thesis of Jonas Umland (#WhoFollowsWhom? Mapping the Diversity of the Entrepreneurship Scholar Network on Social Media)



  • DAKI-FWS - Responsible for developing the business model for the AI-based early warning system focusing on German SMEs.

Current Research Project


In an unstable time marked by the COVID-19 pandemic, the Russian war against Ukraine, and the climate crisis, imminent changes and constraints are inevitable. Entrepreneurs in particular develop psychological issues out of fear of rising inflation and the unstable economic situation resulting from the ongoing crises [1]. Yet entrepreneurs play an essential role in the economy: they drive innovation for society and create the conditions for future growth. In this research project, we address how the current global crises influence the state of mind of entrepreneurs who have just started their business (novices) and of those who have already founded a business (experts). Our longitudinal study takes a qualitative approach with multiple interviews to explore perceptions of the current global crises. We annotate the interview transcripts to identify the emotions mentioned according to Plutchik's eight basic emotions (fear, anger, joy, sadness, acceptance, disgust, anticipation, and surprise), to determine their intensity and underlying causes, and to distinguish whether these emotions are positive or negative.

Research Design


[1] Torrès, O., Benzari, A., Fisch, C., Mukerjee, J., Swalhi, A., & Thurik, R. (2022). Risk of burnout in French entrepreneurs during the COVID-19 crisis. Small Business Economics, 58, 1–23. https://doi.org/10.1007/s11187-021-00516-2

Former Research Project


Technologies constantly change, and identifying digital trends is a major challenge for organizations in their digital transformation. 
Digital trends are indicators of an organization's strategic future but are often overlooked or misinterpreted, posing a major challenge for SMEs. Being aware of digital trends is a crucial pillar to coping with the challenges of digital transformation and staying competitive. 
This study examined apparent trends for SMEs based on a comprehensive Twitter dataset covering five years (2016-2021). We used tweets from the German "Mittelstand-Digital" initiative, which supports SMEs in their digital transformation. Initially, we used topic modeling to identify general trends for German SMEs. In this first analysis, we found that the digitalization of SMEs, digital business transformation, online qualification, knowledge transfer, and digital foresight are the most important drivers of digital growth. In a second analysis, we investigated which predefined technologies are particularly relevant for the digital transformation of SMEs and derived five SME-relevant technology clusters (artificial intelligence, sustainability, production, cryptography, and autonomous systems) with their associated technologies. Our work provides recommendations for managers, practitioners, and policymakers and contributes to the question of how SMEs can manage their digital transformation successfully.

Process for identifying general trends for German SMEs.

Data Preprocessing

After collecting the data, we performed several preprocessing steps to remove unnecessary information and noise from the text:
1) We applied stemming, which reduces words to their root form by stripping affixes.
For instance, the words "connected", "connecting", and "connection" are all reduced to the root word "connect".
2) In addition, we applied lemmatization, which, like stemming, reduces words to a base form but does so through vocabulary and morphological analysis. Lemmatization can thus recognize that a word like "better" has "good" as its lemma.
3) We removed URLs, superfluous white space, and special characters, and converted the text to lower case.
4) We removed German and English stop words and used a tokenizer to split the text into individual words for the feature extraction process.
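
The preprocessing steps listed above can be sketched in Python roughly as follows. This is a minimal, standard-library-only illustration and not the pipeline used in the study: the stop word set, the lemma lookup table, and the suffix-stripping `toy_stem` function are hypothetical stand-ins for complete German and English stop word lists, a real lemmatizer, and a real stemmer. The order of the steps is also an assumption.

```python
import re

# Hypothetical, heavily shortened stop word set (German and English).
STOP_WORDS = {"the", "and", "is", "for", "a", "of",
              "der", "die", "das", "und", "ist", "für"}

# Hypothetical lemma lookup; a real lemmatizer uses vocabulary and morphology.
LEMMAS = {"better": "good"}

# Toy suffix list, tried in this order; a stand-in for a real stemmer.
SUFFIXES = ("ing", "ion", "ed", "s")

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_CHARS = re.compile(r"[^\w\s]")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean one tweet and return its list of normalized tokens."""
    text = URL_PATTERN.sub(" ", text)      # remove URLs
    text = text.lower()                    # convert to lower case
    text = SPECIAL_CHARS.sub(" ", text)    # remove special characters
    tokens = text.split()                  # tokenize, dropping extra white space
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    # Use the lemma if one is known, otherwise fall back to the toy stemmer.
    return [LEMMAS.get(t, toy_stem(t)) for t in tokens]


if __name__ == "__main__":
    print(preprocess("Die Digitalisierung is connecting SMEs! https://example.org #KI"))
```

The toy stemmer only covers the "connect" example from the list; in practice a library stemmer and lemmatizer would replace `toy_stem` and `LEMMAS`.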

For feature extraction, we used two methods: Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). To use the annotated text for topic modeling and classification, we had to convert it into numeric vectors. For this conversion, we used the BoW model, which counts the occurrences of word features in a document [1]. Additionally, we used n-gram modeling to convert the text from an unstructured to a structured format. An n-gram is a sequence of n successive items in a text document; typical cases are unigrams (1-grams), bigrams (2-grams), and trigrams (3-grams) [2].
Because word sequences are highly relevant for our purpose, we combined the n-gram model with the bag-of-words model to obtain more informative features. However, the BoW model has a key limitation: raw counts are dominated by the most frequent words in the corpus, so the representations of different documents end up looking almost the same. Therefore, to enhance the performance of the topic modeling, we created a TF-IDF matrix from our bag-of-words model. The TF-IDF weight of a term is highest when the term has a high term frequency (TF) in a given document and a low document frequency (DF) across the entire dataset. The result is a term-by-document matrix X whose columns contain the TF-IDF values for each document in the corpus [3]. Thus, a term t that occurs in many documents of the corpus D is ranked low, even if it occurs very often, because it has no particular relevance to any single document d. Conversely, if a term t occurs very often in one document d but rarely in the other documents of D, it is probably very relevant to d.
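
To make the feature extraction concrete, the following sketch builds n-gram bag-of-words counts and a term-by-document TF-IDF matrix using only the Python standard library. The text does not state which TF-IDF variant was used, so the weighting here (raw term count times log(N / df)) is an assumption, as are the function names and the toy documents.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams with n_min <= n <= n_max as space-joined strings."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_words(docs, n_min=1, n_max=2):
    """Count n-gram occurrences per document (the BoW representation)."""
    counts = [Counter(ngrams(doc, n_min, n_max)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, counts


def tf_idf(docs, n_min=1, n_max=2):
    """Return (vocabulary, X) with X as a term-by-document TF-IDF matrix.

    Rows of X correspond to terms, columns to documents.
    Assumed weighting: tf(t, d) * log(N / df(t)).
    """
    vocabulary, counts = bag_of_words(docs, n_min, n_max)
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for c in counts for term in c)
    matrix = []
    for term in vocabulary:
        idf = math.log(n_docs / doc_freq[term])
        matrix.append([c[term] * idf for c in counts])
    return vocabulary, matrix


if __name__ == "__main__":
    # Hypothetical, already tokenized toy documents.
    docs = [["digital", "trend", "sme"], ["digital", "sme"], ["digital", "ai", "ai"]]
    vocabulary, X = tf_idf(docs, n_min=1, n_max=1)
    for term, row in zip(vocabulary, X):
        print(term, [round(value, 3) for value in row])
```

In the toy data, "digital" occurs in every document and therefore receives a weight of zero everywhere, while "ai" occurs twice in a single document and receives the highest weight there.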


To uncover common trends and the underlying narrative in the collected tweets, we turned to topic models: Latent Dirichlet Allocation (LDA) [4] and Non-Negative Matrix Factorization (NMF) [5] have proven to be powerful unsupervised techniques [6]. Since our dataset consisted of relatively short tweets, we used the NMF model, which reduces the dimensionality of the dataset and can be viewed as a statistical analysis of multivariate data. According to [5], NMF applies to high-dimensional data in which every element is non-negative, and it provides a lower-rank approximation formed by factors whose elements are also non-negative. Given a set of m multivariate n-dimensional data vectors, the vectors are placed in the columns of an n x m matrix V, where m is the number of data points. To reduce the n original dimensions, V is approximately factorized into an n x r matrix W and an r x m matrix H, with r typically chosen smaller than n and m. The product WH is a compressed version of the original data matrix V.
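
The factorization of V into W and H can be illustrated with a small pure-Python sketch. The text does not say which NMF algorithm was used; the code below assumes the well-known multiplicative update rules for the squared Euclidean (Frobenius) error, and the function names, iteration count, and toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, r, n_iter=200, seed=0, eps=1e-9):
    """Approximate the n x m matrix V by W (n x r) times H (r x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(r)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(r)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Hypothetical 4 x 3 non-negative matrix (4 terms, 3 documents) of rank 2.
    V = [[1, 0, 2], [0, 1, 1], [2, 0, 4], [0, 3, 3]]
    W, H = nmf(V, r=2)
    print("reconstruction error:", frobenius_error(V, W, H))
```

When V is a term-by-document TF-IDF matrix, each column of W can be read as a topic (a weighting over terms) and each column of H as the topic mixture of one document.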


Common trends and the correlated words from the collected Twitter dataset.


We contribute to the debate on the impact that computer science methodologies can have on entrepreneurship research [7]. While many entrepreneurship researchers recognize the promise of computer science methodologies such as artificial intelligence, only a few use them purposefully. Yet these methodologies can contribute significantly to original, rigorous theoretical and empirical research on all aspects of entrepreneurship. We have shown how natural language processing can provide beneficial insights into the field of entrepreneurship by detecting emerging trends on Twitter. Communication platforms of this kind provide publicly available Big Data for analyzing and understanding individuals' opinions, and gaining meaningful insights from such unstructured textual data requires computer science methodologies. Our study highlights that applying these methodologies enables entrepreneurship researchers to test theoretical knowledge empirically on a different data basis and to conduct innovative research projects.


Petzolt, S., Radunski, A., Fox, D., Hölzle, K. (2022). Using Twitter Data as a Proxy for Trend Detection and Analysis of Small and Medium-sized Enterprises in the Digital Transformation. Accepted paper at 25. Interdisziplinäre Jahreskonferenz Zu Entrepreneurship, Innovation Und Mittelstand (G-Forum), Dresden, Germany.

Hölzle, K., Petzolt, S., Radunski, A., Fox, D., Kulik, O. (2022). Technologie-Trendreport – Identifikation von Trends fuer kleine und mittlere Unternehmen im digitalen Wandel – eine Analyse auf Basis von Twitterdaten. Mittelstand-Digital Zentrum Berlin. [Report]


[1] Wallach, H.M.: Topic modeling: Beyond bag-of-words, 977–984 (2006). https://doi.org/10.1145/1143844.1143967

[2] Robertson, A.M., Willett, P.: Applications of n-grams in textual information systems. Journal of Documentation 54(1), 48–67 (1998). https://doi.org/10.1108/EUM0000000007161

[3] Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing & Management 24(5), 513–523 (1988). https://doi.org/10.1016/0306-4573(88)90021-0

[4] Blei, D., Ng, A., Jordan, M.: Latent dirichlet allocation, vol. 3, pp. 601–608 (2001)

[5] Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation 23(9), 2421–2456 (2011). https://doi.org/10.1162/NECO_a_00168

[6] Grootendorst, M.: Bertopic: Neural topic modeling with a class-based tf-idf procedure. arXiv preprint arXiv:2203.05794 (2022)

[7] Lévesque, M., Obschonka, M., Nambisan, S.: Pursuing impactful entrepreneurship research using artificial intelligence. Entrepreneurship Theory and Practice 46(4), 803–832 (2022). https://doi.org/10.1177/1042258720927369