Phase 1: multiple identifiers for a single record
(Note that this builds on work done in Phase 0)
Often, you'll have at your disposal several supposedly unique identifiers; with the Phase 1 code, you can send them all and get a set of scored records in the response.
Phase 1 Input Example
We start with a query that could easily have come from an OPAC web page (see below for all possible input parameters).
http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd?id=1&oclc=6861637&lccn=80024367&isbn=0060404531&isbn=9780060404536
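As a rough sketch of how a client might assemble such a query, the Python below builds the same URL from a set of identifiers. The function name build_query_url and the dictionary layout are illustrative assumptions, not part of the service; only the base URL and the parameter names come from the example above.

```python
from urllib.parse import urlencode

# Base URL taken from the example query above.
BASE_URL = "http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd"


def build_query_url(request_id, identifiers):
    """Build a query URL (hypothetical helper, not part of the service).

    identifiers maps an index name (oclc, lccn, isbn, ...) to a list of
    values; an index with several values is repeated in the query string.
    """
    params = [("id", request_id)]
    for index, values in identifiers.items():
        for value in values:
            params.append((index, value))
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("1", {
    "oclc": ["6861637"],
    "lccn": ["80024367"],
    "isbn": ["0060404531", "9780060404536"],
})
print(url)
```

Repeating the isbn parameter, rather than joining the values, mirrors what the example query does with the 10- and 13-character ISBNs.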
This query throws everything we know about this record (oclc, lccn, and both the 10- and 13-character ISBNs) at sdrsmd. What we get is a record with a score:
{
  "error" : null,
  "id" : "1",
  "result" : {
    "1" : [
      {
        "oclc" : [ "6861637" ],
        "lccn" : [ "80024367" ],
        "sdr" : {
          "rights" : "searchonly",
          "handle" : "mdp.39015000000482",
          "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"
        },
        "isbn" : [ "0060404531", "9780060404536" ],
        "score" : 225,
        "matchPercentage" : 100,
        "matchedItems" : 4
      }
    ]
  }
}
In addition to all the other information we know and love from Phase 0, we get three more items:
- score is the total score, as explained in the "Input values and Scoring" section below.
- matchedItems is the total number of items matched (in this case, one OCLC number, one LCCN, and two ISBNs).
- matchPercentage is the percentage of the identifiers you sent that match this record; in this case, a perfect 4/4, for a percentage of 100.
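To make the three fields concrete, here is a minimal Python sketch that parses the example response and pulls them out. It assumes, based only on the examples on this page, that the records sit under result keyed by the request id; the helper name summarize is hypothetical. Values are passed through int() because matchPercentage appears both as a number and as a string in the sample responses.

```python
import json

# The example response from above, embedded so the sketch is self-contained.
RESPONSE_TEXT = """
{ "error" : null, "id" : "1", "result" : { "1" : [ {
    "oclc" : [ "6861637" ], "lccn" : [ "80024367" ],
    "sdr" : { "rights" : "searchonly",
              "handle" : "mdp.39015000000482",
              "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482" },
    "isbn" : [ "0060404531", "9780060404536" ],
    "score" : 225, "matchPercentage" : 100, "matchedItems" : 4 } ] } }
"""


def summarize(response_text):
    """Return (handle, score, matchedItems, matchPercentage) per record."""
    response = json.loads(response_text)
    if response["error"] is not None:
        raise RuntimeError("server reported an error: %r" % response["error"])
    # Assumption: records are listed under result[<request id>].
    records = response["result"][response["id"]]
    return [
        (
            record["sdr"]["handle"],
            int(record["score"]),
            int(record["matchedItems"]),
            int(record["matchPercentage"]),
        )
        for record in records
    ]


for handle, score, matched, percentage in summarize(RESPONSE_TEXT):
    print(handle, score, matched, percentage)
```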
Phase 1 Input -- contradictory input data
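What if we sent the wrong lccn and, by some really bad luck, it's actually a valid lccn in the system? The response below shows what comes back when the lccn we send matches a different record (77906307) instead of ours: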
{
  "error" : null,
  "id" : "1",
  "result" : {
    "1" : [
      {
        "oclc" : [ "6861637" ],
        "lccn" : [ "80024367" ],
        "sdr" : {
          "rights" : "searchonly",
          "handle" : "mdp.39015000000482",
          "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"
        },
        "isbn" : [ "0060404531", "9780060404536" ],
        "score" : 150,
        "matchedItems" : 3,
        "matchPercentage" : "75"
      },
      {
        "oclc" : [ "4667523" ],
        "lccn" : [ "77906307" ],
        "sdr" : {
          "rights" : "searchonly",
          "handle" : "mdp.39015000000490",
          "mburl" : "http://hdl.handle.net/2027/mdp.39015000000490"
        },
        "score" : 75,
        "matchedItems" : 1,
        "matchPercentage" : "25"
      }
    ]
  }
}
Here we get two records, pre-sorted by the server based on score, then (if necessary) by matchedItems.
The first has both a higher score and a higher number (and thus percentage) of matched items, and is therefore considered by the server to be the "best" match. The second has one good match (lccn) and is included just in case.
It's up to the client to determine what threshold should represent the "worst still-usable" data. The server will always return all matches.
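One way a client might apply such a threshold is sketched below in Python. The cutoffs (a score of at least 100 and a matchPercentage of at least 50) are arbitrary values picked for illustration, and usable_matches is a hypothetical helper; nothing here is prescribed by the server.

```python
def usable_matches(records, min_score=100, min_percentage=50):
    """Keep records that clear both client-chosen cutoffs.

    The default cutoffs are illustrative assumptions, not recommendations.
    The server's ordering (best match first) is preserved.
    """
    kept = []
    for record in records:
        # int() tolerates values sent either as numbers or as strings.
        score = int(record["score"])
        percentage = int(record["matchPercentage"])
        if score >= min_score and percentage >= min_percentage:
            kept.append(record)
    return kept


# The two records from the contradictory-input example, trimmed to the
# fields this sketch uses.
records = [
    {"sdr": {"handle": "mdp.39015000000482"},
     "score": 150, "matchedItems": 3, "matchPercentage": "75"},
    {"sdr": {"handle": "mdp.39015000000490"},
     "score": 75, "matchedItems": 1, "matchPercentage": "25"},
]

best = usable_matches(records)
print([record["sdr"]["handle"] for record in best])
```

With these cutoffs only the first record survives; a more permissive client could lower either value to keep the lccn-only match as well.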
Input values and Scoring
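The score values are essentially arbitrary at this point; any feedback would be much appreciated. In the examples above, a record's score works out to the sum of the per-index scores below, counted once for each identifier that matched (100 + 75 + 25 + 25 = 225 in the first example).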
Index | Score | Example | Description |
handle | 100 | mdp.39015000000482 | MDP Handle |
oclc | 100 | 4667523 | OCLC Number |
sdr | 100 | wu1000063 | SDR Member organization submitted code |
lccn | 75 | 80024367 | Library of Congress Control Number |
isbn | 25 | 0060404531 | 10- or 13-character ISBN |
issn | 25 | 10000453 | ISSN |
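The scores in the examples on this page are consistent with simply adding up the table value for each matched identifier. The Python sketch below works through that arithmetic; it is an assumption inferred from the sample responses, not a description of the server's actual code, and the names WEIGHTS, score_match, and match_percentage are hypothetical.

```python
# Per-index scores copied from the table above.
WEIGHTS = {
    "handle": 100,
    "oclc": 100,
    "sdr": 100,
    "lccn": 75,
    "isbn": 25,
    "issn": 25,
}


def score_match(matched):
    """Sum the table score for each matched (index, value) pair."""
    return sum(WEIGHTS[index] for index, _value in matched)


def match_percentage(matched_count, sent_count):
    """Percentage of the identifiers sent that matched the record."""
    return round(100 * matched_count / sent_count)


# First example: all four identifiers match the same record.
full_match = [
    ("oclc", "6861637"),
    ("lccn", "80024367"),
    ("isbn", "0060404531"),
    ("isbn", "9780060404536"),
]
print(score_match(full_match), match_percentage(len(full_match), 4))

# Contradictory example: the lccn matches a different record.
first_record = [("oclc", "6861637"), ("isbn", "0060404531"),
                ("isbn", "9780060404536")]
second_record = [("lccn", "77906307")]
print(score_match(first_record), match_percentage(len(first_record), 4))
print(score_match(second_record), match_percentage(len(second_record), 4))
```

The three results (225 at 100 percent, 150 at 75 percent, 75 at 25 percent) line up with the score and matchPercentage values in the sample responses.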