This page describes how the scores of reconciliation candidates are computed.
The scoring mechanism used in this reconciliation service can change, so users should not rely on the specifics of its computation. We recommend using the individual scoring features instead.
Global matching formula
The score of each candidate is obtained as a weighted sum of the scores of individual features. It ranges from 0 to 100. When no candidate matching the target type can be found, candidates with a wrong type or no type are returned as well, with their scores divided by two.
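The combination rule above can be sketched as follows. This is only an illustration: the function name, the feature names and the weights are hypothetical, and dividing by the total weight is an assumption made here so that the result stays between 0 and 100.

```python
def candidate_score(feature_scores, weights, type_matches=True):
    """Combine per-feature scores (each between 0 and 100) into one score.

    feature_scores and weights are dicts keyed by feature name.
    Normalising by the total weight is an assumption of this sketch.
    """
    total_weight = sum(weights[name] for name in feature_scores)
    if total_weight == 0:
        return 0.0
    score = sum(weights[name] * value
                for name, value in feature_scores.items()) / total_weight
    # Candidates of a wrong type or of no type have their score halved.
    return score if type_matches else score / 2
```

With these made-up weights, `candidate_score({"name": 100, "P569": 50}, {"name": 3, "P569": 1})` gives 87.5, and the same call with `type_matches=False` gives 43.75.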
For each supplied property, all query values are matched against reference values and the maximum matching score of all pairs is used as the similarity score for this property.
Two names (such as an item label and a query) are matched by token-based fuzzy matching.
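To give an idea of what token-based fuzzy matching can look like, here is a minimal sketch built on Python's standard `difflib`. It is not necessarily the algorithm used by the service: the tokenisation, the sorting of tokens and the function name `token_sort_score` are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def token_sort_score(a, b):
    """Score two names from 0 to 100, ignoring case, punctuation and token order."""
    def normalize(text):
        # Split into lower-cased word tokens and sort them, so that
        # "Curie, Marie" and "Marie Curie" normalise to the same string.
        return " ".join(sorted(re.findall(r"\w+", text.lower())))

    return 100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

Under these assumptions, names that contain the same tokens in a different order score 100, and names that share no characters score 0.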
Values of properties which hold identifiers are matched to the queries using exact string equality (100 score if the strings are equal, 0 otherwise).
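For an identifier property, the exact-equality rule and the maximum over all pairs of query and reference values could be combined as in the sketch below. The function names are hypothetical, and returning 0 when either side has no value is an assumption.

```python
def identifier_score(query, reference):
    """Exact string equality: 100 if the strings are equal, 0 otherwise."""
    return 100 if query == reference else 0


def property_score(query_values, reference_values, match=identifier_score):
    """Best score over all (query value, reference value) pairs."""
    return max(
        (match(q, r) for q in query_values for r in reference_values),
        default=0,  # assumption: no values on either side means no match
    )
```

Any other matcher with the same two-argument shape can be passed as `match` to score a different kind of property in the same way.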
Geographical coordinate matching
Geographical coordinates are expected to be supplied in lat,long format (such as 53.3175,-4.6204). The matching score peaks at 100 when the position is exactly the same and decreases linearly as the distance between the points increases, reaching 0 when the points are 1 km apart.
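The linear falloff can be illustrated with the sketch below. The way the distance is computed is not specified here, so the haversine (great-circle) formula with a mean Earth radius of 6371 km is an assumption, as are all function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch
MAX_DISTANCE_KM = 1.0     # the score reaches 0 at this distance


def parse_coords(text):
    """Parse a 'lat,long' string such as '53.3175,-4.6204'."""
    lat, lng = (float(part) for part in text.split(","))
    return lat, lng


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, long) pairs in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def coordinate_score(query, reference):
    """100 at zero distance, decreasing linearly to 0 at MAX_DISTANCE_KM."""
    distance = haversine_km(parse_coords(query), parse_coords(reference))
    return max(0.0, 100.0 * (1 - distance / MAX_DISTANCE_KM))
```

For example, two points about 500 m apart would score roughly 50, and any pair more than 1 km apart scores 0.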
The precision of Wikibase dates is taken into account when matching them against strings. Query dates are expected to be supplied in ISO format (YYYY-MM-DD) and will match the Wikibase date perfectly if they fall into the range described by the precision. It is also possible to supply query dates in YYYY-MM or YYYY format.
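A simplified sketch of precision-aware date matching is shown below. It uses the Wikibase precision codes 9 (year), 10 (month) and 11 (day). Several points are assumptions: the function name, the score of 0 for a mismatch, the handling of query dates that are coarser than the reference (only the components present on both sides are compared), and the restriction to positive years.

```python
def date_score(query, ref_date, ref_precision):
    """Match a query date string against a Wikibase date.

    query: 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' (positive years only here).
    ref_date: (year, month, day) tuple of the Wikibase value.
    ref_precision: 9 for year, 10 for month, 11 for day.
    """
    query_parts = [int(part) for part in query.split("-")]
    ref_components = {9: 1, 10: 2, 11: 3}[ref_precision]
    # Compare only the components that both the query and the
    # reference precision provide (an assumption of this sketch).
    n = min(len(query_parts), ref_components)
    return 100 if tuple(query_parts[:n]) == tuple(ref_date[:n]) else 0
```

With a reference value of year precision for 1867, the query `1867-11-07` falls inside the range and scores 100, while `1868-01-01` scores 0.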
Integer quantities score 100 if they are equal and 0 otherwise. For floating-point numbers, the score peaks at 100 for exact equality and otherwise follows this formula:
URLs are canonicalized before being matched. Differences in scheme (HTTPS vs HTTP) are ignored.
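The sketch below shows one possible canonicalization, using Python's standard `urllib.parse`. Only the scheme-insensitivity comes from the description above. The other steps (lower-casing the host, dropping the port and fragment, stripping trailing slashes) and the 100 or 0 scoring are assumptions, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def canonical_url(url):
    """Reduce a URL to a scheme-independent form for comparison."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""      # urlsplit lower-cases the hostname
    path = parts.path.rstrip("/")    # assumption: trailing slashes are ignored
    query = f"?{parts.query}" if parts.query else ""
    # The scheme, port and fragment are deliberately left out.
    return f"{host}{path}{query}"


def url_score(query, reference):
    """100 if both URLs share the same canonical form, 0 otherwise."""
    return 100 if canonical_url(query) == canonical_url(reference) else 0
```

With this sketch, `http://Example.org/page/` and `https://example.org/page` are treated as the same URL.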