Re: Hanzi variants? and Simplified/Traditional Chinese
Hi Tom,
As for the Simplified/Traditional Chinese issue, this is what I am thinking:
Solr provides the ICUTransform mappings in both directions, Simplified to Traditional and Traditional to Simplified, through solr.ICUTransformFilterFactory (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTransformFilterFactory). Since the transform is likely to be lossy and/or inaccurate, I think I will probably need 3 fields:
Then I can query across all three fields and give a much higher boost to matches without the transformation/mapping.
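Roughly, the query side might look like the sketch below. It assumes the edismax query parser, and the field names (text_orig, text_s2t, text_t2s) and boost values are made up for illustration; I haven't tested any of these numbers.

```python
from urllib.parse import urlencode

# Hypothetical field names and placeholder boosts (assumptions, not tested values):
# one untransformed field plus two copies run through an ICU transform.
FIELD_BOOSTS = {
    "text_orig": 10.0,  # text as indexed, no transform: boosted much higher
    "text_s2t": 1.0,    # assumed Simplified-to-Traditional copy
    "text_t2s": 1.0,    # assumed Traditional-to-Simplified copy
}


def build_params(user_query, boosts=FIELD_BOOSTS):
    """Return Solr request parameters for an edismax query over all fields."""
    # edismax takes per-field boosts in qf as "field^boost", space separated.
    qf = " ".join(f"{field}^{boost:g}" for field, boost in boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


params = build_params("中国")
# URL-encoded form, ready to append to a /select request.
query_string = urlencode(params)
```

With the boosts in one place it should be easy to try different ratios while testing relevance and performance.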
I'll have to do some testing to see how much this increases the index size. I'll also want to see how searching across those 3 fields affects performance.
Tom