Papers for Presentation
[1] Banda Ramadan, Peter Christen, Huizhi Liang, and Ross W. Gayler. 2015. Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution. J. Data and Information Quality 6, 4, Article 15 (October 2015), 29 pages. DOI=http://dx.doi.org/10.1145/2816821
[2] Shouheng Li, Huizhi (Elly) Liang, and Banda Ramadan. 2013. Two Stage Similarity-aware Indexing for Large-scale Real-time Entity Resolution, 146.
[3] Indrajit Bhattacharya, Lise Getoor, and Louis Licamele. 2006. Query-time entity resolution. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA, 529-534. DOI=http://dx.doi.org/10.1145/1150402.1150463
[4] Anja Gruenheid, Xin Luna Dong, and Divesh Srivastava. 2014. Incremental record linkage. Proc. VLDB Endow. 7, 9 (May 2014), 697-708. DOI=http://dx.doi.org/10.14778/2732939.2732943
[5] Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang, and Jennifer Widom. 2009. Swoosh: a generic approach to entity resolution. The VLDB Journal 18, 1 (January 2009), 255-276. DOI=http://dx.doi.org/10.1007/s00778-008-0098-x
[6] Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, and Rajeev Motwani. 2003. Robust and efficient fuzzy match for online data cleaning. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data (SIGMOD '03). ACM, New York, NY, USA, 313-324. DOI=http://dx.doi.org/10.1145/872757.872796
[7] Parag Singla and Pedro Domingos. 2006. Entity Resolution with Markov Logic. In Proceedings of the Sixth International Conference on Data Mining (ICDM '06). IEEE Computer Society, Washington, DC, USA, 572-582. DOI=http://dx.doi.org/10.1109/ICDM.2006.65
Further reading
[8] Peter Christen. 2012. A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication. IEEE Transactions on Knowledge and Data Engineering 24, 9 (September 2012), 1537-1555. DOI=http://dx.doi.org/10.1109/TKDE.2011.127 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5887335&isnumber=6247403
Relevant courses
Data Profiling and Data Cleansing (Lecture, Master)