Available Indexes


Thanks for the feedback David,

Sorry I didn't explain the algorithm very well. I probably should have included a code snippet in my previous response. We aren't splitting on punctuation; we are constructing tokens in which whitespace replaces punctuation.
Examples:
"l'art"=>"l art"
"can't"=>"can t"

Our problem was that the WDF was splitting on punctuation and therefore turning "l'art" into two tokens. That resulted in a phrase query for the token "l" followed by the token "art". Our filter would instead produce a single query (boolean clause) for the token "l art" rather than for the token "l'art".
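
Here is a rough sketch of the difference in Python. This is not our actual filter code: the function names and the regex are made up for illustration, and I'm assuming "punctuation" means anything that is neither a word character nor whitespace.

```python
import re

# Assumption: a run of characters that are neither word characters
# nor whitespace counts as punctuation (so "_" is NOT punctuation here).
PUNCT = re.compile(r"[^\w\s]+")


def split_on_punctuation(token):
    """Roughly the behavior that caused our problem: one token in, several out."""
    return [part for part in PUNCT.split(token) if part]


def punctuation_to_space(token):
    """Replace punctuation with a single space, keeping one token."""
    return PUNCT.sub(" ", token).strip()


if __name__ == "__main__":
    for token in ["l'art", "can't"]:
        print(token, split_on_punctuation(token), repr(punctuation_to_space(token)))
```

With the first function, "l'art" becomes the two tokens "l" and "art", which is what ends up as a phrase query. With the second, it stays a single token "l art", which becomes a single clause.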

Tom
