Hasso-Plattner-Institut25 Jahre HPI
 

Integrating knowledge and graph-based strategies for text

Margarita Bugueño Pérez

Artificial Intelligence and Intelligent Systems 
Main supervisor: Prof. Dr. Gerard de Melo
Secondary supervisor: Prof. Dr. Bert Arnrich


Office: G-3.2.08
Tel.: +49 331 5509 3454   
Email: margarita.bugueno(at)hpi.de 
Languages: Spanish (mother tongue) | English (fluent) | Chinese (beginner)   
Links: Google Scholar | Github | ResearchGate
Research interests: Learning on Graphs | Explainable Artificial Intelligence (XAI) | Unbalanced data | Astroinformatics

 

Overview

Natural Language Processing (NLP) involves applying algorithms to handle natural language in a rule-based manner. To extract the meaning of each sentence, NLP transforms language data into a structure that computers can understand. This is challenging not only because of the nature of language itself but also because each language has its own morphological and syntactic rules and colloquial terms that can affect communication.

Understanding human language requires understanding both the words and the way concepts are connected. For this reason, it is common to combine different techniques to handle the various challenges, each imposing its own learning restrictions.

 

    Research Problem

    Natural language processing models are increasingly deployed in real-world applications for a variety of tasks. Because Graph Neural Networks (GNNs) have proven to handle complex structures well and to preserve global information, several researchers have recently explored applying them to text classification as an alternative to traditional feature representation models, such as vector representations, which often fail to capture the full richness of the text.

    To date, a number of graph-based models for text representation, document summarization, and question answering have been proposed and have provided a considerable boost in those tasks. However, most of these strategies were proposed for a specific domain and validated on very specific data collections with particular textual features, making it difficult to compare them and extend them to new scenarios. Furthermore, the models proposed to date generally base their graph construction on the co-occurrence of terms, leaving aside critical factors such as syntax and co-reference.
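As an illustration of the co-occurrence-based construction mentioned above, the following is a minimal sketch, not the method of any particular model. The function name `cooccurrence_graph`, the whitespace tokenization, and the `window` parameter are assumptions made for this example only.

```python
from collections import Counter


def cooccurrence_graph(tokens, window=3):
    """Count how often two distinct terms appear within the same sliding window.

    Each token is paired with the next (window - 1) tokens. Edges are
    undirected, so every pair is stored in sorted order.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:  # skip self-loops
                edges[tuple(sorted((left, right)))] += 1
    return edges


# Hypothetical toy input; a real pipeline would use a proper tokenizer.
tokens = "graphs capture relations and graphs capture structure".split()
graph = cooccurrence_graph(tokens, window=2)
for (term_a, term_b), weight in sorted(graph.items()):
    print(term_a, term_b, weight)
```

Such a graph only records which terms appear near each other. It carries no syntactic or co-reference information, which is the limitation noted above.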

    All this reflects that the graph-based text representation area requires further study and in-depth exploration.

     

      Research Projects


      Document Graphs for Explainable AI - Master Project Summer Semester 2022 


      • As Natural Language Processing models become increasingly accurate, they are employed in multiple use cases across many fields. Meanwhile, the complexity of models is increasing, making it harder for humans to comprehend the model’s decisions. As many tasks require a deeper understanding of why a specific prediction was made, advances in explainability are needed, since current approaches mainly focus on measuring the impact of individual parts of the text. Given that graphs naturally capture complex relations in the input data, the project objective is to study how document graphs can be used to explain models through global and local explanations, obtained by analyzing the model’s learning and single predictions, respectively. 
      • Additionally, the application of document graphs for a more traceable and explainable classification using a GNN model is studied by implementing two different graph construction approaches: TextGCN, a heterogeneous word-document graph based on TF-IDF and word co-occurrence, and a KNN graph based on node embedding distance in vector space. 
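The two construction ideas named above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: `tfidf_edges` covers only the document-word edges of a TextGCN-style graph (the word-word co-occurrence edges are omitted), `knn_edges` uses Euclidean distance as one possible distance measure, and all function names and toy inputs are hypothetical.

```python
import math
from collections import Counter


def tfidf_edges(docs):
    """Document-word edges weighted by TF-IDF, computed as tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    edges = {}
    for doc_id, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            edges[(f"doc{doc_id}", word)] = tf * idf
    return edges


def knn_edges(embeddings, k=1):
    """Directed edges from each node to its k nearest neighbours in vector space."""
    edges = set()
    for node, vec in embeddings.items():
        distances = [
            (math.dist(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != node
        ]
        for _, neighbour in sorted(distances)[:k]:
            edges.add((node, neighbour))
    return edges


# Hypothetical toy inputs.
docs = [["graph", "text", "graph"], ["text", "model"]]
doc_word = tfidf_edges(docs)

embeddings = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (5.0, 5.0)}
neighbours = knn_edges(embeddings, k=1)
```

In the first graph, edge weights come from corpus statistics. In the second, edges depend entirely on the embedding space, so the choice of embedding model and distance measure determines the structure.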

       


      Graph-based text representation for document summarization - Master thesis co-supervision


      • Graph-based document summarization is a growing research area, as classical sequential models often fail when faced with long documents. By contrast, graph models do not have this limitation and have demonstrated state-of-the-art performance in specific settings. Such strategies usually follow an intuitive construction method that incorporates multiple semantic units (for example, sentences, words, n-grams) as nodes in order to represent the complex associations among them and thus improve performance. It is then possible to apply well-known graph explainer models to identify the most and least influential graph components for the generated summary.
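A graph with more than one kind of semantic unit as nodes might look like the sketch below. It assumes only sentence and word nodes, whitespace tokenization, and plain containment edges. The helper names are hypothetical, and the sketch does not reproduce the thesis approach or any explainer model.

```python
from collections import defaultdict


def sentence_word_graph(sentences):
    """Bipartite adjacency: each sentence node links to the word nodes it contains."""
    adjacency = defaultdict(set)
    for idx, sentence in enumerate(sentences):
        sentence_node = ("sentence", idx)
        for word in sentence.lower().split():
            word_node = ("word", word)
            adjacency[sentence_node].add(word_node)
            adjacency[word_node].add(sentence_node)
    return adjacency


def sentences_sharing_words(adjacency, idx):
    """Indices of other sentences reachable from sentence idx via a shared word node."""
    reached = set()
    for word_node in adjacency[("sentence", idx)]:
        for _, other_idx in adjacency[word_node]:
            reached.add(other_idx)
    reached.discard(idx)
    return reached


# Hypothetical toy document.
sentences = [
    "Graphs model documents",
    "Documents contain sentences",
    "Summaries are short",
]
graph = sentence_word_graph(sentences)
print(sentences_sharing_words(graph, 0))
```

Because sentences are connected through shared word nodes rather than by position, distant sentences in a long document can be directly related, which is one reason such graphs suit long inputs.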

       

      Academic CV

      2022 - present: Ph.D. student at the chair for Artificial Intelligence and Intelligent Systems at Hasso Plattner Institute in Potsdam, Germany
      2018 - 2020: Master of Science in Informatics Engineering at Federico Santa María Technical University in Valparaíso, Chile
      2013 - 2020: Informatics Engineering at Federico Santa María Technical University in Santiago, Chile
      2013 - 2017: Bachelor of Science in Informatics Engineering at Federico Santa María Technical University in Santiago, Chile

       

      Peer-reviewed Publications


      Journal articles


      • Margarita Bugueño, Gabriel Molina, Francisco Mena, Patricio Olivares, Mauricio Araya (2021). Harnessing the power of CNNs for unevenly-sampled light-curves using Markov Transition Field. Astronomy and Computing. 10.1016/j.ascom.2021.100461
      • Francisco Mena, Patricio Olivares, Margarita Bugueño, Gabriel Molina, Mauricio Araya (2021). On the Quality of Deep Representations for Kepler Light Curves Using Variational Auto-Encoders. Signals, MDPI. 10.3390/signals2040042
      • Margarita Bugueño, Marcelo Mendoza (2020). Learning to combine classifiers outputs with the transformer for text classification. Intelligent Data Analysis, IOS Press. 10.3233/IDA-200007
      • Margarita Bugueño & Francisco Mena, Mauricio Araya (2019). Classical machine learning techniques in the search of extrasolar planets. CLEI Electronic Journal. 10.19153/cleiej.22.3.3

      Conference Proceedings


      • Gabriel Molina, Francisco Mena, Margarita Bugueño, Mauricio Solar (2020). Can we interpret machine learning? An analysis of exoplanet detection problem. Astronomical Data Analysis Software and Systems (ADASS), ASPCS
      • Margarita Bugueño, Marcelo Mendoza (2019). Learning to Detect Online Harassment on Twitter with the Transformer. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), Springer. 10.1007/978-3-030-43887-6_23
      • Margarita Bugueño, Marcelo Mendoza (2019). Applying Self-attention for Stance Classification. Ibero-American Congress on Pattern Recognition (CIARP), Springer. 10.1007/978-3-030-33904-3_5
      • Margarita Bugueño, Gabriel Sepúlveda, Marcelo Mendoza (2019). An empirical analysis of rumor detection on microblogs with recurrent neural networks. International Conference on Human-Computer Interaction (HCII), Springer. 10.1007/978-3-030-21902-4_21
      • Margarita Bugueño, Francisco Mena, Mauricio Araya (2018). Refining exoplanet detection using supervised learning and feature engineering. Latin American Computer Conference (CLEI), IEEE. 10.1109/CLEI.2018.00041
      • Humberto Farias, Daniel Ortiz, Camilo Ñuñez, Mauricio Solar, Margarita Bugueño (2018). ChiVOLabs: cloud service that offer interactive environment for reprocessing astronomical data. Software and Cyber infrastructure for Astronomy. 10.1117/12.2313304