Thanks for this; I really enjoyed it!
If someone on the team has a little extra time sometime to scratch another language-related curiosity itch of mine: how does one tokenize and/or stem words in agglutinative languages such as Finnish, Hungarian, and Quechua?
Fascinating!