CORA Dataset
Description
This dataset includes bibliographical information about scientific papers. It provides 1878 objects.
Special thanks to Andrew McCallum, who permits us to use and modify the data.
Corresponding Data
- Dataset
- Cora dataset in XML format.
- Also available in tab separated value (TSV) format. (1,879 objects - TSV format)
- Same version, with lower-cased values and with special characters removed. (1,879 objects - TSV format)
- Duplicates
- A list of all provided duplicates. (64,578 objects - TSV format)
- Non-duplicates
- We generate non-duplicate pairs by following a systematic approach. (179,125 objects - TSV format)
- Non-duplicate pairs generated using an updated, further simplified approach across datasets. (268,082 objects - TSV format)
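
The TSV files listed above can be read with standard tooling. The following is a minimal sketch in Python, assuming each file has a header row and that the pair files hold two record identifiers per line. The column names (`id`, `title`, `id1`, `id2`) and the sample rows are made up for illustration and are not taken from the actual files. The helper names `read_tsv`, `read_pairs` and `is_duplicate` are likewise hypothetical.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    # QUOTE_NONE treats quote characters as plain text (an assumption that
    # the files do not use CSV-style quoting).
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def read_pairs(handle):
    """Read a two-column TSV of record ids as a set of unordered pairs."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row (assumed to be present)
    return {frozenset(row[:2]) for row in reader if len(row) >= 2}


# Toy stand-ins for the real files; ids, column names and values are made up.
records_tsv = io.StringIO("id\ttitle\n1\tpaper a\n2\tpaper a.\n3\tpaper b\n")
duplicates_tsv = io.StringIO("id1\tid2\n1\t2\n")

records = read_tsv(records_tsv)
duplicates = read_pairs(duplicates_tsv)


def is_duplicate(a, b):
    """Return True if the two record ids are listed as a duplicate pair."""
    return frozenset((a, b)) in duplicates
```

For the real files, the `io.StringIO` objects would be replaced by something like `open(path, newline="", encoding="utf-8")`, where both the path and the encoding are assumptions to check against the downloaded data. Storing each pair as a `frozenset` makes the lookup independent of the order in which the two ids appear.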