Solr Slovenian lemmatizer updated with easier installation

Jernej Virag

December 15, 2013

I've just uploaded the 1.1 update of the Lemmagen lemmatizer for Solr. It is now a pure Java JAR library and does not require installing any additional files on your server. The new version also changes the package name and the configuration attribute to be more consistent.

Installation

1. Download library

Download the library JAR from BitBucket: Lemmatizer

2. Add library to Solr's Java path

Copy the library JAR either to your application server's lib dir or to your core's lib dir. For example, if your core is located in /var/solr/core, create a lib folder next to the core's conf and data folders and copy lemmatizer_solr_1.1.jar there.
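
If neither of those locations suits your setup, Solr can also load JARs from another directory through a lib directive in the core's solrconfig.xml. This is only a sketch of that alternative: the directory /opt/solr-libs is a made-up example path, not something the lemmatizer requires.

   <!-- solrconfig.xml: load the lemmatizer JAR from a custom directory -->
   <!-- /opt/solr-libs is a hypothetical path; point it at wherever you keep the JAR -->
   <lib dir="/opt/solr-libs" regex="lemmatizer_solr_.*\.jar" />

With the core's own lib folder described above, no such directive should be needed.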

3. Add lemmatizer to schema.xml

Add the lemmatizer filter to an analyzer in your Solr schema and pass the desired language to it:

   <filter class="si.virag.solr.LemmagenLemmatizerFactory" language="slovenian" />

That's it. Supported languages are: english, french, estonian, bulgarian, czech, slovakian, slovenian, serbian, russian, romanian, hungarian, macedonian and polish.
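
For context, the filter line goes inside the analyzer chain of a field type. The following is a minimal sketch of what that might look like. The field type name text_sl and the field name content are made up for illustration, and the choice of tokenizer and the position of the lowercase filter are assumptions you should adapt to your own schema.

   <!-- schema.xml: hypothetical field type for Slovenian text -->
   <fieldType name="text_sl" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <!-- split text into tokens with Solr's standard tokenizer -->
       <tokenizer class="solr.StandardTokenizerFactory" />
       <!-- lowercase before lemmatizing (assumed order, adjust if needed) -->
       <filter class="solr.LowerCaseFilterFactory" />
       <!-- the lemmatizer filter from this post -->
       <filter class="si.virag.solr.LemmagenLemmatizerFactory" language="slovenian" />
     </analyzer>
   </fieldType>

   <!-- hypothetical field that uses the type above -->
   <field name="content" type="text_sl" indexed="true" stored="true" />

Because a single analyzer element applies to both indexing and querying, documents and search terms are reduced to the same lemmas. Existing documents need to be reindexed after the schema change.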

This version is based on Michal Hlaváč's excellent jLemmaGen, a Java port of the Lemmagen library. It was tested with Solr 4.3 and newer.