1.
Borchert, F., Llorca, I., Schapranow, M.-P.: Cross-Lingual Candidate Retrieval and Re-ranking for Biomedical Entity Linking. In: Arampatzis, A., Kanoulas, E., Tsikrika, T., Vrochidis, S., Giachanou, A., Li, D., Aliannejadi, M., Vlachos, M., Faggioli, G., and Ferro, N. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. pp. 135–147. Springer Nature Switzerland, Cham (2023).
2.
Borchert, F., Llorca, I., Schapranow, M.-P.: HPI-DHC @ BC8 SympTEMIST Track: Detection and Normalization of Symptom Mentions with SpanMarker and xMEN. In: Islamaj, R., Arighi, C., Campbell, I., Gonzalez-Hernandez, G., Hirschman, L., Krallinger, M., Lima-López, S., Weissenbacher, D., and Lu, Z. (eds.) Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models. New Orleans, LA (2023).
3.
Fox, S., Preiß, M., Borchert, F., Rasheed, A., Schapranow, M.-P.: HPIDHC at NTCIR-17 MedNLP-SC: Data Augmentation and Ensemble Learning for Multilingual Adverse Drug Event Detection. NTCIR 17 Conference: Proceedings of the 17th NTCIR Conference on Evaluation of Information Access Technologies. pp. 185–192. Tokyo, Japan (2023).
4.
Hugo, J., Ibing, S., Borchert, F., Sachs, J.P., Cho, J., Ungaro, R.C., Böttinger, E.P.: Machine Learning Based Prediction of Incident Cases of Crohn’s Disease Using Electronic Health Records from a Large Integrated Health System. In: Juarez, J.M., Marcos, M., Stiglic, G., and Tucker, A. (eds.) Artificial Intelligence in Medicine. pp. 293–302. Springer Nature Switzerland, Cham (2023).
5.
Kämmer, N., Borchert, F., Winkler, S., de Melo, G., Schapranow, M.-P.: Resolving Elliptical Compounds in German Medical Text. The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. pp. 292–305. Association for Computational Linguistics, Toronto, Canada (2023).
6.
Ladas, N., Borchert, F., Franz, S., Rehberg, A., Strauch, N., Sommer, K.K., Marschollek, M., Gietzelt, M.: Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts. Health Informatics Journal. 29, 14604582231164696 (2023).
7.
Llorca, I., Borchert, F., Schapranow, M.-P.: A Meta-dataset of German Medical Corpora: Harmonization of Annotations and Cross-corpus NER Evaluation. Proceedings of the 5th Clinical Natural Language Processing Workshop. pp. 171–181. Association for Computational Linguistics, Toronto, Canada (2023).
Over the last few years, an increasing number of publicly available, semantically annotated medical corpora have been released for the German language. While their annotations cover comparable semantic classes, the synergies of such efforts have not yet been explored. This is due to substantial differences in the data schemas (syntax) and annotated entities (semantics), which hinder the creation of common meta-datasets. For instance, it is unclear whether named entity recognition (NER) taggers trained on one or more of these datasets are useful for detecting entities in any of the other datasets. In this work, we create harmonized versions of German medical corpora using the BigBIO framework and make them available to the community. Using these as a meta-dataset, we perform a series of cross-corpus evaluation experiments in two settings of aligned labels. These consist of fine-tuning various pre-trained Transformers on different combinations of training sets and testing them against each dataset separately. We find that a) trained NER models generalize poorly, with F1 scores dropping by approximately 20 percentage points on unseen test data, and b) current pre-trained Transformer models for the German language do not systematically alleviate this issue. However, our results suggest that models benefit from additional training corpora in most cases, even if these belong to different medical fields or text genres.
8.
Richter-Pechanski, P., Wiesenbach, P., Schwab, D.M., Kiriakou, C., He, M., Allers, M.M., Tiefenbacher, A.S., Kunz, N., Martynova, A., Spiller, N., Mierisch, J., Borchert, F., Schwind, C., Frey, N., Dieterich, C., Geis, N.A.: A Distributable German Clinical Corpus Containing Cardiovascular Clinical Routine Doctor’s Letters. Scientific Data. 10, 207 (2023).
We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 manually annotated clinical routine German doctor's letters from Heidelberg University Hospital. Our prospective study design complies with current data protection regulations and allows us to keep the original structure of clinical documents consistent. To ease access to our corpus, we manually de-identified all letters. To enable various information extraction tasks, the temporal information in the documents was preserved. We added two high-quality manual annotation layers to CARDIO:DE: (1) medication information and (2) CDA-compliant section classes. To the best of our knowledge, CARDIO:DE is the first freely available and distributable German clinical corpus in the cardiovascular domain. In summary, our corpus offers unique opportunities for collaborative and reproducible research on natural language processing models for German clinical texts.
9.
Schapranow, M.-P., Borchert, F., Bougatf, N., Hund, H., Eils, R.: Software-Tool Support for Collaborative, Virtual, Multi-Site Molecular Tumor Boards. SN Computer Science. 4, 358 (2023).
10.
Schmidt, L., Ibing, S., Borchert, F., Hugo, J., Marshall, A., Peraza, J., Cho, J.H., Böttinger, E.P., Ungaro, R.C.: Extraction of Crohn’s Disease Clinical Phenotypes from Clinical Text Using Natural Language Processing. medRxiv. (2023).
11.
Steinwand, S., Borchert, F., Winkler, S., Schapranow, M.-P.: GGTWEAK: Gene Tagging with Weak Supervision for German Clinical Text. In: Juarez, J.M., Marcos, M., Stiglic, G., and Tucker, A. (eds.) Artificial Intelligence in Medicine. pp. 183–192. Springer Nature Switzerland, Cham (2023).