Grading formula

where i is the current subtask, t_max,i is the maximum runtime (over all teams) for this subtask, t_i is the individual team's runtime for this subtask, and w_i is the weight of this subtask. If your algorithm solves n subtasks in one run, the total runtime is split evenly into n equal values t_i. This may result in a longer runtime for one subtask compared to other teams, but it should yield better results for the other subtasks and a higher overall score. If it doesn't, your code should be revised...
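Here is a minimal Python sketch of the runtime bookkeeping. It only derives each t_i and t_max,i as described above and does not compute the score itself. The function names `per_subtask_runtimes` and `max_runtimes` are hypothetical. So is the input layout: each team name maps to a list of `(total_seconds, subtasks)` runs.

```python
from collections import defaultdict


def per_subtask_runtimes(runs):
    """t_i per team: each run's total runtime split evenly over the n subtasks it solved.

    `runs` maps a team name to a list of (total_seconds, subtasks) tuples.
    Assumes each subtask is solved in exactly one run per team.
    """
    result = {}
    for team, team_runs in runs.items():
        times = {}
        for total_seconds, subtasks in team_runs:
            share = total_seconds / len(subtasks)  # n subtasks in one run -> total / n each
            for subtask in subtasks:
                times[subtask] = share
        result[team] = times
    return result


def max_runtimes(per_team):
    """t_max,i: the maximum runtime over all teams for each subtask i."""
    t_max = defaultdict(float)
    for times in per_team.values():
        for subtask, t in times.items():
            t_max[subtask] = max(t_max[subtask], t)
    return dict(t_max)
```

Consider `runs = {"team_a": [(120.0, ["count", "stats", "vocab"])], "team_b": [(30.0, ["count"]), (90.0, ["stats"])]}`. Here team_a gets t_i = 40 s for each of its three subtasks. That is slower than team_b on `count` (30 s) but faster on `stats` (90 s), which is the trade-off described above.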
| Requirement | Description | Weight (the lower, the easier) | Comment |
|---|---|---|---|
| Count Triples | Count the number of triples (or rather quadruples, as the data is in N-Quads format) | 1 | easy peasy |
| Cluster Datasets | Identify datasets by URIs; per dataset: identify the dataset location (e.g. "http://dbpedia.org/"); identify a good sample resource (e.g. "http://dbpedia.org/Berlin"); identify URI regular-expression patterns; suggest a textual description | 3 | clustering + text extraction + regexp detection |
| Cluster Datasets II | Identify datasets by means other than the URI alone; per dataset: identify a good sample resource (e.g. "http://dbpedia.org/Berlin"); identify URI regular-expression patterns; suggest a textual description | up to 4, depending on the "wow" factor | clustering + text extraction + regexp detection |
| Identify vocabularies | Identify the RDF namespace of all predicates (e.g. "http://xmlns.com/foaf/0.1") | 2 | clustering of namespaces (similar to numberOfDistinctPredicates?) + regexp detection ("http://xmlns.com/foaf/0.1/name" => "http://xmlns.com/foaf/0.1", etc.) |
| Compute RDF statistics | Compute numberOfDistinctSubjects, numberOfDistinctObjects, numberOfDistinctPredicates, numberOfDistinctContexts and numberOfResources | 2 | all similar to WordCount (except when blank nodes occur) |
| Detect Linksets | Identify linksets among datasets; count the number of links within each linkset | 2 | once you have the datasets, this is easy (the same principle applied to precomputed sets) |
| Detect similar subjects/contexts (subjects and contexts are mostly identical) | For two subject/context combinations (s1, c1) and (s2, c2), find all quadruples (s1, p, o, c1) and (s2, p, o, c2) where the predicate p and the object o are identical but s1 ≠ s2 and c1 ≠ c2; count the number k of identical (p, o) pairs (i.e. the number of such quadruple combinations) and derive the k-similarity of the subjects/contexts (the higher k, the more similar they are); detect subjects/contexts that are at least 1-similar but are not directly referenced, e.g. via a quadruple (s1, p, s2, c): for k > 0, are there any k-similar (s1, c1) and (s2, c2) without a quadruple (s1, p, s2, c) or (s2, p, s1, c)? | 5 | note that contexts are identical to subjects in most of the quadruples, but not all (cf. last subtask) |
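The Count Triples and Compute RDF statistics rows both reduce to WordCount-style counting over parsed quadruples. Below is a minimal single-machine Python sketch under these assumptions:

- **Tokenizer.** The regex tokenizer `TERM` is a simplification that does not validate N-Quads. Comments are only recognized at the start of a line.
- **Blank nodes.** Blank node labels are qualified by the source file they come from. N-Quads blank node labels are local to one document, which is one plausible reading of the blank-node exception in the table.
- **Names.** The helper names and the `numberOfQuads` key are hypothetical.
- **Scope.** numberOfResources is left out because the table does not define it.

```python
import re

# One RDF term: an IRI, a blank node label, or a literal with an optional
# language tag or datatype IRI. A simplification, not a validating parser.
TERM = re.compile(
    r'<[^>]*>'                                        # IRI
    r'|_:[^\s<>"]*[^\s<>".]'                          # blank node label (never ends with '.')
    r'|"(?:[^"\\]|\\.)*"'                             # literal with backslash escapes
    r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'  # optional language tag / datatype
)


def parse_nquad(line):
    """Split one N-Quads line into (subject, predicate, object, context).

    Returns None for empty or comment lines; context is None when the
    statement carries no graph label (a plain N-Triples statement).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    terms = TERM.findall(line)
    if len(terms) == 3:
        return (*terms, None)
    if len(terms) == 4:
        return tuple(terms)
    raise ValueError(f"cannot parse line: {line!r}")


def scope_blank_node(term, source):
    """Blank node labels are local to one document, so qualify them by source."""
    if term is not None and term.startswith("_:"):
        return f"{term}@{source}"
    return term


def rdf_statistics(sources):
    """Count quadruples and distinct terms per position.

    `sources` maps a source name (e.g. a file name) to an iterable of lines.
    Every statement line counts as one quadruple, duplicates included.
    """
    quads = 0
    subjects, predicates, objects, contexts = set(), set(), set(), set()
    for source, lines in sources.items():
        for line in lines:
            quad = parse_nquad(line)
            if quad is None:
                continue
            s, p, o, c = (scope_blank_node(t, source) for t in quad)
            quads += 1
            subjects.add(s)
            predicates.add(p)
            objects.add(o)
            if c is not None:
                contexts.add(c)
    return {
        "numberOfQuads": quads,
        "numberOfDistinctSubjects": len(subjects),
        "numberOfDistinctPredicates": len(predicates),
        "numberOfDistinctObjects": len(objects),
        "numberOfDistinctContexts": len(contexts),
    }
```

A distributed version would likely follow the same WordCount pattern: emit one key per (position, term) pair and count the distinct keys per position.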
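For Identify vocabularies, a simple baseline is to cut each predicate IRI at its last `#` or `/`. This reproduces the table's example ("http://xmlns.com/foaf/0.1/name" => "http://xmlns.com/foaf/0.1"). The Python sketch below assumes predicates arrive as N-Quads IRI terms in angle brackets. `namespace_of` and `vocabulary_usage` are hypothetical names. The cut is a heuristic and will misplace the boundary for vocabularies whose namespace does not end at the last separator.

```python
from collections import Counter


def namespace_of(predicate):
    """Heuristic namespace: everything before the last '#' or '/' of the IRI.

    "<http://xmlns.com/foaf/0.1/name>" -> "http://xmlns.com/foaf/0.1"
    """
    iri = predicate.strip("<>")
    cut = max(iri.rfind("#"), iri.rfind("/"))
    scheme_end = iri.find("://")
    # Never cut inside "scheme://host"; such IRIs are returned unchanged.
    if scheme_end >= 0 and cut < scheme_end + 3:
        return iri
    return iri[:cut] if cut > 0 else iri


def vocabulary_usage(predicates):
    """Count how many predicate occurrences fall into each namespace."""
    return Counter(namespace_of(p) for p in predicates)
```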
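Cluster Datasets (location, sample resource and URI pattern only) and Detect Linksets can be sketched together. The sketch makes a hypothetical assumption: a dataset is identified by the scheme and host of its IRIs, e.g. "http://dbpedia.org/".

- **Sample resource.** Per cluster, it is the most frequent subject.
- **URI pattern.** It is derived from the longest common prefix of the cluster's subject IRIs.
- **Link.** A link is a quad whose subject and object IRIs fall into different datasets.

The textual description is not attempted, and all function names are hypothetical.

```python
import os
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def dataset_location(term):
    """Assumed dataset key: scheme + host of an IRI term, e.g. "http://dbpedia.org/".

    Literals and blank nodes are not assigned to any dataset.
    """
    if not term.startswith("<"):
        return None
    parts = urlsplit(term.strip("<>"))
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}/"


def cluster_datasets(subjects):
    """Group subject IRIs by location and describe each group."""
    groups = defaultdict(Counter)
    for s in subjects:
        location = dataset_location(s)
        if location is not None:
            groups[location][s.strip("<>")] += 1
    clusters = {}
    for location, counts in groups.items():
        # Sample resource: the most frequently used subject (a simple heuristic).
        sample = counts.most_common(1)[0][0]
        # URI pattern: longest common prefix up to its last '/', followed by anything.
        prefix = os.path.commonprefix(list(counts))
        prefix = prefix[: prefix.rfind("/") + 1]
        clusters[location] = {"sample": sample, "pattern": "^" + re.escape(prefix) + ".*$"}
    return clusters


def count_linksets(quads):
    """Count links per (source dataset, target dataset) pair.

    A link is a quad whose subject and object lie in different datasets.
    """
    linksets = Counter()
    for s, _p, o, _c in quads:
        source, target = dataset_location(s), dataset_location(o)
        if source and target and source != target:
            linksets[(source, target)] += 1
    return linksets
```

Host-based grouping is only a baseline. It merges distinct datasets served from one host and splits datasets spread over several hosts, so it does not address the other means asked for in Cluster Datasets II.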
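This Python sketch follows one reading of the similarity requirement:

- **k.** k counts the (predicate, object) pairs shared by two subject/context combinations (s1, c1) and (s2, c2) with s1 ≠ s2 and c1 ≠ c2.
- **Not directly referenced.** No quad has s1 as subject and s2 as object, or vice versa.

The input is a list of `(s, p, o, c)` string tuples. `k_similarity` and `unreferenced_similar` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations


def k_similarity(quads):
    """k for every pair of (subject, context) combinations: the number of
    (predicate, object) pairs both combinations share.

    Contexts must be strings so that the combinations can be sorted.
    """
    by_po = defaultdict(set)  # inverted index: (p, o) -> {(s, c), ...}
    for s, p, o, c in quads:
        by_po[(p, o)].add((s, c))
    k = Counter()
    for members in by_po.values():
        # Sorting gives every unordered pair one canonical key.
        for a, b in combinations(sorted(members), 2):
            if a[0] != b[0] and a[1] != b[1]:  # different subjects and contexts
                k[(a, b)] += 1
    return k


def unreferenced_similar(quads, min_k=1):
    """k-similar pairs (k >= min_k) whose subjects are not linked directly,
    i.e. there is no quad (s1, p, s2, c) or (s2, p, s1, c)."""
    referenced = set()
    for s, _p, o, _c in quads:
        referenced.add((s, o))
        referenced.add((o, s))
    return {
        pair: n
        for pair, n in k_similarity(quads).items()
        if n >= min_k and (pair[0][0], pair[1][0]) not in referenced
    }
```

The inverted index on (p, o) avoids comparing every pair of subjects, but each group still costs time quadratic in its size. Very common pairs, such as one rdf:type value shared by millions of subjects, would probably have to be filtered or sampled in practice.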