1.
Borchert, F., Llorca, I., Roller, R., Arnrich, B., Schapranow, M.-P.: xMEN: A Modular Toolkit for Cross-Lingual Medical Entity Normalization. arXiv preprint arXiv:2310.11275 (2023).
2.
Borchert, F., Llorca, I., Schapranow, M.-P.: Cross-Lingual Candidate Retrieval and Re-ranking for Biomedical Entity Linking. In: Arampatzis, A., Kanoulas, E., Tsikrika, T., Vrochidis, S., Giachanou, A., Li, D., Aliannejadi, M., Vlachos, M., Faggioli, G., and Ferro, N. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. pp. 135–147. Springer Nature Switzerland, Cham (2023).
3.
Borchert, F., Llorca, I., Schapranow, M.-P.: HPI-DHC @ BC8 SympTEMIST Track: Detection and Normalization of Symptom Mentions with SpanMarker and xMEN. In: Islamaj, R., Arighi, C., Campbell, I., Gonzalez-Hernandez, G., Hirschman, L., Krallinger, M., Lima-López, S., Weissenbacher, D., and Lu, Z. (eds.) Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models. New Orleans, LA (2023).
4.
Fox, S., Preiß, M., Borchert, F., Rasheed, A., Schapranow, M.-P.: HPIDHC at NTCIR-17 MedNLP-SC: Data Augmentation and Ensemble Learning for Multilingual Adverse Drug Event Detection. NTCIR 17 Conference: Proceedings of the 17th NTCIR Conference on Evaluation of Information Access Technologies. pp. 185–192. Tokyo, Japan (2023).
5.
Hugo, J., Ibing, S., Borchert, F., Sachs, J.P., Cho, J., Ungaro, R.C., Böttinger, E.P.: Machine Learning Based Prediction of Incident Cases of Crohn’s Disease Using Electronic Health Records from a Large Integrated Health System. In: Juarez, J.M., Marcos, M., Stiglic, G., and Tucker, A. (eds.) Artificial Intelligence in Medicine. pp. 293–302. Springer Nature Switzerland, Cham (2023).
6.
Kämmer, N., Borchert, F., Winkler, S., de Melo, G., Schapranow, M.-P.: Resolving Elliptical Compounds in German Medical Text. The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. pp. 292–305. Association for Computational Linguistics, Toronto, Canada (2023).
7.
Ladas, N., Borchert, F., Franz, S., Rehberg, A., Strauch, N., Sommer, K.K., Marschollek, M., Gietzelt, M.: Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts. Health Informatics Journal. 29, 14604582231164696 (2023).
8.
Llorca, I., Borchert, F., Schapranow, M.-P.: A Meta-dataset of German Medical Corpora: Harmonization of Annotations and Cross-corpus NER Evaluation. Proceedings of the 5th Clinical Natural Language Processing Workshop. pp. 171–181. Association for Computational Linguistics, Toronto, Canada (2023).
In recent years, an increasing number of publicly available, semantically annotated medical corpora have been released for the German language. While their annotations cover comparable semantic classes, the synergies of such efforts have not yet been explored. This is due to substantial differences in the data schemas (syntax) and annotated entities (semantics), which hinder the creation of common meta-datasets. For instance, it is unclear whether named entity recognition (NER) taggers trained on one or more of such datasets are useful for detecting entities in any of the other datasets. In this work, we create harmonized versions of German medical corpora using the BigBIO framework, and make them available to the community. Using these as a meta-dataset, we perform a series of cross-corpus evaluation experiments on two settings of aligned labels. These consist of fine-tuning various pre-trained Transformers on different combinations of training sets, and testing them against each dataset separately. We find that a) trained NER models generalize poorly, with F1 scores dropping by approximately 20 percentage points on unseen test data, and b) current pre-trained Transformer models for the German language do not systematically alleviate this issue. However, our results suggest that models benefit from additional training corpora in most cases, even if these belong to different medical fields or text genres.
9.
Richter-Pechanski, P., Wiesenbach, P., Schwab, D.M., Kiriakou, C., He, M., Allers, M.M., Tiefenbacher, A.S., Kunz, N., Martynova, A., Spiller, N., Mierisch, J., Borchert, F., Schwind, C., Frey, N., Dieterich, C., Geis, N.A.: A Distributable German Clinical Corpus Containing Cardiovascular Clinical Routine Doctor’s Letters. Scientific Data. 10, 207 (2023).
We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 clinical routine German doctor's letters from Heidelberg University Hospital, which were manually annotated. Our prospective study design complies well with current data protection regulations and allows us to keep the original structure of clinical documents consistent. In order to ease access to our corpus, we manually de-identified all letters. To enable various information extraction tasks, the temporal information in the documents was preserved. We added two high-quality manual annotation layers to CARDIO:DE: (1) medication information and (2) CDA-compliant section classes. To the best of our knowledge, CARDIO:DE is the first freely available and distributable German clinical corpus in the cardiovascular domain. In summary, our corpus offers unique opportunities for collaborative and reproducible research on natural language processing models for German clinical texts.
10.
Schapranow, M.-P., Borchert, F., Bougatf, N., Hund, H., Eils, R.: Software-Tool Support for Collaborative, Virtual, Multi-Site Molecular Tumor Boards. SN Computer Science. 4, 358 (2023).
11.
Schmidt, L., Ibing, S., Borchert, F., Hugo, J., Marshall, A., Peraza, J., Cho, J.H., Böttinger, E.P., Ungaro, R.C.: Extraction of Crohn’s Disease Clinical Phenotypes from Clinical Text Using Natural Language Processing. medRxiv (2023).
12.
Steckhan, N., Ring, R., Borchert, F., Koppold, D.A.: Triangulation of Questionnaires, Qualitative Data and Natural Language Processing: A Differential Approach to Religious Bahá’í Fasting in Germany. Journal of Religion and Health (2023).
Approaches to integrating mixed methods into medical research are gaining popularity. To get a holistic understanding of the effects of behavioural interventions, we investigated religious fasting using a triangulation of quantitative, qualitative, and natural language analysis. We analysed an observational study of Bahá'í fasting in Germany using a between-method triangulation that is based on links between qualitative and quantitative analyses. Individual interviews show an increase in the mindfulness and well-being categories. Sentiment scores, extracted from the interviews through natural language processing, positively correlate with questionnaire outcomes on quality of life (WHO-5: Spearman correlation r = 0.486, p = 0.048). Five questionnaires contribute to the first principal component capturing the spectrum of mood states (50.1% explained variance). Integrating the findings of the between-method triangulation enabled us to converge on the underlying effects of this kind of intermittent fasting.
13.
Steinwand, S., Borchert, F., Winkler, S., Schapranow, M.-P.: GGTWEAK: Gene Tagging with Weak Supervision for German Clinical Text. In: Juarez, J.M., Marcos, M., Stiglic, G., and Tucker, A. (eds.) Artificial Intelligence in Medicine. pp. 183–192. Springer Nature Switzerland, Cham (2023).