Available Indexes


Hi Tom,

As for the Simplified/Traditional Chinese issue, this is what I am thinking:

Solr provides ICU transform mappings in both directions, Simplified to Traditional and Traditional to Simplified, through solr.ICUTransformFilterFactory (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTransformFilterFactory). Since the transform is likely to be lossy and/or inaccurate, I think I will probably need 3 fields:

  1. Input data without a transform
  2. Data mapped from Simplified to Traditional
  3. Data mapped from Traditional to Simplified

Then I can query across all three fields and give a much higher boost to matches in the field without the transformation/mapping.
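
To make the idea concrete, here is a rough sketch of how the query side might look, assuming a dismax-style parser (edismax here) where the `qf` parameter takes `field^boost` pairs. The field names and boost values are made up for illustration, since I have not settled on either; the point is only that the untransformed field gets a much larger weight than the two mapped fields.

```python
from urllib.parse import urlencode

# Hypothetical field names and boosts (assumptions, not an actual schema).
FIELD_BOOSTS = {
    "text_orig": 10.0,          # input data without a transform
    "text_simp_to_trad": 1.0,   # data mapped from Simplified to Traditional
    "text_trad_to_simp": 1.0,   # data mapped from Traditional to Simplified
}


def build_params(user_query, boosts=FIELD_BOOSTS):
    """Build Solr request parameters that search all three fields,
    weighting each field by its boost via the qf parameter."""
    qf = " ".join(f"{field}^{boost:g}" for field, boost in boosts.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}


def build_query_string(user_query, boosts=FIELD_BOOSTS):
    """URL-encode the parameters for use in a Solr select request."""
    return urlencode(build_params(user_query, boosts))


if __name__ == "__main__":
    print(build_params("中国")["qf"])
    print(build_query_string("中国"))
```

The actual boost ratio would have to come out of relevance testing; 10 to 1 is just a placeholder.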

I'll have to do some testing to see how much this increases the index size. I'll also want to see how searching across those 3 fields affects performance.

Tom
