Gerardo Vitagliano

Former PhD Student

Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam

Phone: +49 331 5509 427
Room: F-2.05

Email: Gerardo Vitagliano

Personal website

Research: ResearchGate, GitHub

As a PhD student at the Information Systems Group and member of the HPI Research School, my research interests lie in data preparation and information extraction.

I am most active in the research projects of the Data Preparation group, currently focusing on benchmarking data loading and embedding the structure of tabular data files.

Feel free to contact me about collaborations, thesis proposals, or anything closely or loosely related to the following research interests.

Research Interests

  • Structural Data Preparation
  • Data Pollution
  • Representation Learning
  • Layout Inference in Multiregion Files

My research projects are all named after famous painters:

  • Pollock: A benchmark for CSV data loading
  • MaGRiTTE: Learning structural embeddings of data files
  • Mondrian: An approach for automatic recognition of layout templates in multiregion files

Publications

  • G. Vitagliano, M. Hameed, A. Sierra-Múnera, F. Naumann: Embedding File Structure for Data Preparation. Under revision.
  • G. Vitagliano, L. Reisener, M. Hameed, L. Jiang, E. Wu, F. Naumann: Pollock: A Data Loading Benchmark. PVLDB 16(8):1870-1882, 2023. doi: 10.14778/3594512.3594518
  • G. Vitagliano, M. Hameed, F. Naumann: Structural Embedding of Data Files with MaGRiTTE. Table Representation Learning Workshop at NeurIPS (TRL@NeurIPS), 2022.
  • G. Vitagliano, L. Reisener, L. Jiang, M. Hameed, F. Naumann: Mondrian: Spreadsheet Layout Detection. Proceedings of the International Conference on Management of Data (SIGMOD), 2022.
  • M. Hameed, G. Vitagliano, L. Jiang, F. Naumann: SURAGH: Syntactic Pattern Matching to Identify Ill-Formed Records. Proceedings of the International Conference on Extending Database Technology (EDBT), 2022.
  • L. Jiang, G. Vitagliano, M. Hameed, F. Naumann: Aggregation Detection in CSV Files. Proceedings of the International Conference on Extending Database Technology (EDBT), 2022.
  • G. Vitagliano, L. Jiang, F. Naumann: Detecting Layout Templates in Complex Multiregion Files. PVLDB 15(3):646-658, 2021. doi: 10.14778/3494124.3494145
  • L. Jiang, G. Vitagliano, F. Naumann: Structure Detection in Verbose CSV Files. Proceedings of the International Conference on Extending Database Technology (EDBT), 2021.
  • L. Jiang, G. Vitagliano, F. Naumann: A Scoring-based Approach for Data Preparator Suggestion. Lernen, Wissen, Daten, Analysen (LWDA), 2019.