Submitted by rcmuir (not verified) on September 27, 2009
If you use this technique, I think you should try calling DefaultSimilarity.setDiscountOverlaps(true).
I ran some tests which showed that using commongrams hurts relevance somewhat, because the injected tokens inflate the field's token count and so adversely influence lengthNorm.
If you set that parameter, tokens with positionIncrement=0 are no longer counted toward the field length, and the problem goes away.
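To make the effect concrete, here is a rough sketch of the arithmetic, not Lucene code. It assumes the length norm is 1/sqrt(number of tokens in the field) and ignores field boosts and the precision lost when Lucene stores the norm. The function name and the token counts (16 ordinary tokens, 9 injected ones) are made up for illustration.

```python
import math

def length_norm(num_tokens, num_overlaps=0, discount_overlaps=False):
    """Toy model of a 1/sqrt(length) norm (hypothetical helper, not a Lucene API).

    num_tokens   -- all tokens indexed for the field, including injected ones
    num_overlaps -- tokens emitted with positionIncrement=0
    """
    # With discounting on, overlap tokens do not count toward field length.
    length = num_tokens - num_overlaps if discount_overlaps else num_tokens
    return 1.0 / math.sqrt(length) if length > 0 else 0.0

# Hypothetical field: 16 ordinary tokens, plus 9 injected commongrams tokens.
plain = length_norm(16)                                  # no commongrams at all
inflated = length_norm(25, num_overlaps=9)               # overlaps counted
discounted = length_norm(25, num_overlaps=9, discount_overlaps=True)

print(plain, inflated, discounted)  # 0.25 0.2 0.25
```

In this toy case, counting the injected tokens drops the norm from 0.25 to 0.2, so the same document scores lower for the same match. Discounting the overlaps restores the original value.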
commongrams versus stopwords