Submitted by Tom Burton-West on January 21, 2010
Hi Gabrial,
The relevant portion of our schema.xml is here: http://www.hathitrust.org/node/181. I added a link to the list of 400 words to the post, but it's also available here: http://www.hathitrust.org/node/180. Please note that the list is tailored to our content, as explained in the post.
After we initially ported the CommonGrams filter from Nutch to Solr, several people contributed extensive improvements, and it was committed as an official part of Solr in September 2009. It now ships with Solr 1.4. If you are planning to use CommonGrams, you should consider the recent patch by Robert Muir, one of the Solr committers, who did extensive work to make it compatible with the new Lucene TokenStream API and to make the underlying code more efficient. The patch is available at https://issues.apache.org/jira/browse/SOLR-1657
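For anyone new to CommonGrams, here is a rough sketch of the basic idea at index time. Every word is kept as a single token, and wherever either word in an adjacent pair is on the common-words list (such as the 400-word list above), a bigram of the pair is also emitted. This is a simplified illustration, not the Solr or Lucene code. The function name `common_grams` and the `_` separator are my own choices, and it ignores token positions, offsets, and case handling, all of which the real filter takes care of.

```python
from typing import List, Set


def common_grams(tokens: List[str], common_words: Set[str], sep: str = "_") -> List[str]:
    """Hypothetical helper: return every unigram, plus a bigram for each
    adjacent pair in which either word is a common word."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # always keep the single word
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Pair a common word with its neighbor so phrase queries on
            # common words can match one bigram instead of a huge posting list.
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}{sep}{nxt}")
    return out


if __name__ == "__main__":
    print(common_grams(["the", "rain", "in", "spain"], {"the", "in"}))
```

With "the" and "in" treated as common words, "the rain in spain" is indexed as `the`, `the_rain`, `rain`, `rain_in`, `in`, `in_spain`, `spain`. A phrase query can then use the bigrams and avoid the very long posting lists for the common words on their own.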
Tom