Florian Borchert, M.Sc.

Research Assistant, PhD Candidate

Phone:	+49 (331) 5509-4839
Fax:	+49 (331) 5509-163
Mail:	florian.borchert(at)hpi.de
Room:	G-2.2.16 (Campus III)
Personal Website:	florianborchert.de

Research Interests

Clinical NLP
Entity Linking
Medical Evidence Synthesis & Clinical Guidelines

Projects

Teaching

Data Management for Digital Health (Lecture)
Hands-on Artificial Intelligence for Digital Health (Seminar)
Trends and Concepts in Digital Health (Seminar)

Awards & Competitions

SympTEMIST Shared Task @ BioCreative VIII (2023): 1st Place in Entity Linking Track
DisTEMIST Shared Task @ BioASQ 10 (2022): 1st Place in Entity Linking Track
Distinguished Paper Award - AMIA 2021 Annual Symposium
1st Place - SMS Industry Data Challenge (2017)
Best Master's Thesis - Department of CS at HU Berlin (2017)

Publications

2024

Borchert, F., Llorca, I., Schapranow, M.-P.: Improving biomedical entity linking for complex entity mentions with LLM-based text simplification. Database. 2024, baae067 (2024).

@article{10.1093/database/baae067,
  abstract = {{Large amounts of important medical information are captured in free-text documents in biomedical research and within healthcare systems, which can be made accessible through natural language processing (NLP). A key component in most biomedical NLP pipelines is entity linking, i.e. grounding textual mentions of named entities to a reference of medical concepts, usually derived from a terminology system, such as the Systematized Nomenclature of Medicine Clinical Terms. However, complex entity mentions, spanning multiple tokens, are notoriously hard to normalize due to the difficulty of finding appropriate candidate concepts. In this work, we propose an approach to preprocess such mentions for candidate generation, building upon recent advances in text simplification with generative large language models. We evaluate the feasibility of our method in the context of the entity linking track of the BioCreative VIII SympTEMIST shared task. We find that instructing the latest Generative Pre-trained Transformer model with a few-shot prompt for text simplification results in mention spans that are easier to normalize. Thus, we can improve recall during candidate generation by 2.9 percentage points compared to our baseline system, which achieved the best score in the original shared task evaluation. Furthermore, we show that this improvement in recall can be fully translated into top-1 accuracy through careful initialization of a subsequent reranking model. Our best system achieves an accuracy of 63.6\% on the SympTEMIST test set. The proposed approach has been integrated into the open-source xMEN toolkit, which is available online via https://github.com/hpi-dhc/xmen.}},
  author = {Borchert, Florian and Llorca, Ignacio and Schapranow, Matthieu-P},
  journal = {Database},
  keywords = {nlp myown sys:relevantfor:dhc gemtex highmed},
  month = {07},
  pages = {baae067},
  title = {Improving biomedical entity linking for complex entity mentions with {LLM}-based text simplification},
  volume = 2024,
  year = 2024
}

Bressem, K.K., Papaioannou, J.-M., Grundmann, P., Borchert, F., Adams, L.C., Liu, L., Busch, F., Xu, L., Loyen, J.P., Niehues, S.M., Augustin, M., Grosser, L., Makowski, M.R., Aerts, H.J., Löser, A.: medBERT.de: A Comprehensive German BERT Model for the Medical Domain. Expert Systems with Applications. 121598 (2024).

2023

Borchert, F., Llorca, I., Roller, R., Arnrich, B., Schapranow, M.-P.: xMEN: A Modular Toolkit for Cross-Lingual Medical Entity Normalization. arXiv preprint arXiv:2310.11275. (2023).

Borchert, F., Llorca, I., Schapranow, M.-P.: Cross-Lingual Candidate Retrieval and Re-ranking for Biomedical Entity Linking. In: Arampatzis, A., Kanoulas, E., Tsikrika, T., Vrochidis, S., Giachanou, A., Li, D., Aliannejadi, M., Vlachos, M., Faggioli, G., and Ferro, N. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. pp. 135–147. Springer Nature Switzerland, Cham (2023).

Borchert, F., Llorca, I., Schapranow, M.-P.: HPI-DHC @ BC8 SympTEMIST Track: Detection and Normalization of Symptom Mentions with SpanMarker and xMEN. In: Islamaj, R., Arighi, C., Campbell, I., Gonzalez-Hernandez, G., Hirschman, L., Krallinger, M., Lima-López, S., Weissenbacher, D., and Lu, Z. (eds.) Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models. New Orleans, LA (2023).

Fox, S., Preiß, M., Borchert, F., Rasheed, A., Schapranow, M.-P.: HPIDHC at NTCIR-17 MedNLP-SC: Data Augmentation and Ensemble Learning for Multilingual Adverse Drug Event Detection. NTCIR 17 Conference: Proceedings of the 17th NTCIR Conference on Evaluation of Information Access Technologies. pp. 185–192. Tokyo, Japan (2023).

Hugo, J., Ibing, S., Borchert, F., Sachs, J.P., Cho, J., Ungaro, R.C., Böttinger, E.P.: Machine Learning Based Prediction of Incident Cases of Crohn’s Disease Using Electronic Health Records from a Large Integrated Health System. In: Juarez, J.M., Marcos, M., Stiglic, G., and Tucker, A. (eds.) Artificial Intelligence in Medicine. pp. 293–302. Springer Nature Switzerland, Cham (2023).

Kämmer, N., Borchert, F., Winkler, S., de Melo, G., Schapranow, M.-P.: Resolving Elliptical Compounds in German Medical Text. The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. pp. 292–305. Association for Computational Linguistics, Toronto, Canada (2023).

Ladas, N., Borchert, F., Franz, S., Rehberg, A., Strauch, N., Sommer, K.K., Marschollek, M., Gietzelt, M.: Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts. Health Informatics Journal. 29, 14604582231164696 (2023).

Llorca, I., Borchert, F., Schapranow, M.-P.: A Meta-dataset of German Medical Corpora: Harmonization of Annotations and Cross-corpus NER Evaluation. Proceedings of the 5th Clinical Natural Language Processing Workshop. pp. 171–181. Association for Computational Linguistics, Toronto, Canada (2023).

Richter-Pechanski, P., Wiesenbach, P., Schwab, D.M., Kiriakou, C., He, M., Allers, M.M., Tiefenbacher, A.S., Kunz, N., Martynova, A., Spiller, N., Mierisch, J., Borchert, F., Schwind, C., Frey, N., Dieterich, C., Geis, N.A.: A Distributable German Clinical Corpus Containing Cardiovascular Clinical Routine Doctor’s Letters. Scientific Data. 10, 207 (2023).

Schapranow, M.-P., Borchert, F., Bougatf, N., Hund, H., Eils, R.: Software-Tool Support for Collaborative, Virtual, Multi-Site Molecular Tumor Boards. SN Computer Science. 4, 358 (2023).

Schmidt, L., Ibing, S., Borchert, F., Hugo, J., Marshall, A., Peraza, J., Cho, J.H., Böttinger, E.P., Ungaro, R.C.: Extraction of Crohn’s Disease Clinical Phenotypes from Clinical Text Using Natural Language Processing. medRxiv. (2023).

Steckhan, N., Ring, R., Borchert, F., Koppold, D.A.: Triangulation of Questionnaires, Qualitative Data and Natural Language Processing: A Differential Approach to Religious Bahá’í Fasting in Germany. Journal of Religion and Health. (2023).

Steinwand, S., Borchert, F., Winkler, S., Schapranow, M.-P.: GGTWEAK: Gene Tagging with Weak Supervision for German Clinical Text. In: Juarez, J.M., Marcos, M., Stiglic, G., and Tucker, A. (eds.) Artificial Intelligence in Medicine. pp. 183–192. Springer Nature Switzerland, Cham (2023).

2022

Borchert, F., Lohr, C., Modersohn, L., Witt, J., Langer, T., Follmann, M., Gietzelt, M., Arnrich, B., Hahn, U., Schapranow, M.-P.: GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers. Proceedings of the Language Resources and Evaluation Conference. pp. 3650–3660. European Language Resources Association, Marseille, France (2022).

Borchert, F., Schapranow, M.-P.: HPI-DHC @ BioASQ DisTEMIST: Spanish Biomedical Entity Linking with Pre-trained Transformers and Cross-lingual Candidate Retrieval. Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum. pp. 244–258. Bologna, Italy (2022).

Henkenjohann, R., Bergner, B., Borchert, F., Bougatf, N., Hund, H., Eils, R., Schapranow, M.-P.: An Engineering Approach towards Multi-Site Virtual Molecular Tumor Board Software Support. In: Pissaloux, E., Papadopoulos, G., Achilleos, A., and Velázquez, R. (eds.) ICT for Health, Accessibility and Wellbeing. IHAW 2021. pp. 156–170. Springer, Cham (2022).

2021

Borchert, F., Meister, L., Langer, T., Follmann, M., Arnrich, B., Schapranow, M.-P.: Controversial Trials First: Identifying Disagreement Between Clinical Guidelines and New Evidence. AMIA Annual Symposium Proceedings. pp. 237–246. American Medical Informatics Association (2021).

Borchert, F., Mock, A., Tomczak, A., Hügel, J., Alkarkoukly, S., Knurr, A., Volckmar, A.-L., Stenzinger, A., Schirmacher, P., Debus, J., Jäger, D., Longerich, T., Fröhling, S., Eils, R., Bougatf, N., Sax, U., Schapranow, M.-P.: Knowledge bases and software support for variant interpretation in precision oncology. Briefings in Bioinformatics. 22, bbab134 (2021).

@article{10.1093/bib/bbab134,
  abstract = {Precision oncology is a rapidly evolving interdisciplinary medical specialty. Comprehensive cancer panels are becoming increasingly available at pathology departments worldwide, creating the urgent need for scalable cancer variant annotation and molecularly informed treatment recommendations. A wealth of mainly academia-driven knowledge bases calls for software tools supporting the multi-step diagnostic process. We derive a comprehensive list of knowledge bases relevant for variant interpretation by a review of existing literature followed by a survey among medical experts from university hospitals in Germany. In addition, we review cancer variant interpretation tools, which integrate multiple knowledge bases. We categorize the knowledge bases along the diagnostic process in precision oncology and analyze programmatic access options as well as the integration of knowledge bases into software tools. The most commonly used knowledge bases provide good programmatic access options and have been integrated into a range of software tools. For the wider set of knowledge bases, access options vary across different parts of the diagnostic process. Programmatic access is limited for information regarding clinical classifications of variants and for therapy recommendations. The main issue for databases used for biological classification of pathogenic variants and pathway context information is the lack of standardized interfaces. There is no single cancer variant interpretation tool that integrates all identified knowledge bases. Specialized tools are available and need to be further developed for different steps in the diagnostic process.},
  author = {Borchert, Florian and Mock, Andreas and Tomczak, Aurelie and Hügel, Jonas and Alkarkoukly, Samer and Knurr, Alexander and Volckmar, Anna-Lena and Stenzinger, Albrecht and Schirmacher, Peter and Debus, Jürgen and Jäger, Dirk and Longerich, Thomas and Fröhling, Stefan and Eils, Roland and Bougatf, Nina and Sax, Ulrich and Schapranow, Matthieu-P},
  journal = {Briefings in Bioinformatics},
  keywords = {HiGHmed myown sys:relevantfor:dhc fb-boettinger},
  month = {05},
  note = {bbab134},
  number = 6,
  title = {Knowledge bases and software support for variant interpretation in precision oncology},
  volume = 22,
  year = 2021
}

Rasheed, A., Borchert, F., Kohlmeyer, L., Henkenjohann, R., Schapranow, M.-P.: A Comparison of Concept Embeddings for German Clinical Corpora. 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 2314–2321 (2021).

2020

Borchert, F., Lohr, C., Modersohn, L., Hahn, U., Langer, T., Wenzel, G., Follmann, M., Schapranow, M.-P.: „Herr Doktor, verstehen Sie mich?“: Wie lernende Systeme helfen medizinische Fachsprache zu verstehen und welche Rolle klinische Leitlinien dabei spielen. gesundhyte.de: Das Magazin für Digitale Gesundheit in Deutschland. 13, 19–22 (2020).

Borchert, F., Lohr, C., Modersohn, L., Langer, T., Follmann, M., Sachs, J.P., Hahn, U., Schapranow, M.-P.: GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines. Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis. pp. 38–48. Association for Computational Linguistics, Online (2020).
