Available Indexes


Hi Jonathan, it should reduce memory use, but it might make search performance worse. You would probably want to run some performance comparison tests to be sure the trade-off is worth it. Let me know what you find out.

Our case is probably an extreme one. Even though we have a large index, we have an exceptionally large number of unique terms in proportion to the index size. This is due partially to having 400+ languages, but primarily to dirty OCR and how it interacts with our filtering/analysis process. (More details in a blog post to follow.)

In our case, the performance bottleneck is disk I/O because of the size of our indexes. So if we increase the index divisor by a factor of 16, the potentially 16x longer scan of the tis file may have an impact, but it's not observable because disk I/O dominates our performance. (We suspect that even 128 * 16, or 2048, entries from the tis file take only one disk seek, and once the data is in memory, the in-memory linear scan is extremely fast compared to the disk I/O required to get the *frq and *prx data into memory.)
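The arithmetic behind that trade-off can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not anything taken from Lucene or Solr: the function name `term_index_tradeoff` and the unique-term count are hypothetical, and the model simply assumes that one index entry is kept in memory for every (interval * divisor) terms and that a lookup scans at most that many tis entries.

```python
def term_index_tradeoff(unique_terms, index_interval=128, divisor=1):
    """Rough model of the memory vs. scan-length trade-off.

    Returns (entries held in memory, worst-case tis entries scanned per lookup).
    Assumes one in-memory index entry per (index_interval * divisor) terms.
    """
    effective_interval = index_interval * divisor
    # Ceiling division: a partial final block still needs an index entry.
    entries_in_memory = -(-unique_terms // effective_interval)
    return entries_in_memory, effective_interval


# Hypothetical unique-term count, for illustration only.
UNIQUE_TERMS = 2_000_000_000

baseline = term_index_tradeoff(UNIQUE_TERMS)                 # divisor of 1
with_divisor = term_index_tradeoff(UNIQUE_TERMS, divisor=16)

print("in-memory entries:", baseline[0], "->", with_divisor[0])
print("worst-case scan length:", baseline[1], "->", with_divisor[1])
```

Under these assumptions the in-memory entry count drops by roughly 16x while the worst-case scan grows from 128 to 2048 entries. Whether that longer scan matters depends on whether you are CPU bound or I/O bound, which only a test on your own index can tell you.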

If you have a small enough index and/or high enough request rate (qps) so that you are CPU bound rather than I/O bound, you may notice a performance impact. Best way to find out is to run some tests.

BTW: there is also an index-time setting (termIndexInterval) that sets the ratio of the tii to the tis file, but I really like the search-time setting since we can tweak it without re-indexing. Also, the flex branch (trunk/4.x) is much more efficient in handling the tii in-memory data structures. There is some more detailed discussion of the trade-offs and various options here: http://lucene.472066.n3.nabble.com/Solr-memory-use-jmap-and-TermInfos-tii-tc1455421.html#a1455421

Tom
