= uea-stemmer
Ruby implementation of the UEA-Lite stemmer for conservative stemming in
search and indexing workloads.
UEA-Lite[https://web.archive.org/web/20120728132949/http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming]
uses a rule set to normalize suffixes while avoiding aggressive stemming.
== Behavior Notes
The stemmer operates on a single token at a time and returns a stemmed token.
Notable behavior of this implementation:
- possessive apostrophes are removed
- contractions are expanded by default (for example, don't becomes do not)
- tokens beginning with uppercase letters are preserved, and pluralized
  acronyms ending in a lowercase s are singularized
- pure numbers, and tokens containing hyphens or underscores, are passed
  through unchanged
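Because the stemmer works on one token at a time, stemming a phrase means splitting it first. The following is a minimal sketch of that step; +stem_tokens+ is a hypothetical helper that is not part of this gem, and it assumes only that the object passed in responds to +stem+.

```ruby
# Hypothetical helper, not part of uea-stemmer.
# Splits +text+ on whitespace and stems each token with +stemmer+,
# which can be any object that responds to #stem.
def stem_tokens(text, stemmer)
  text.split.map { |token| stemmer.stem(token) }
end

# Assumed usage with the gem installed:
#   require "uea-stemmer"
#   stem_tokens("helpers dying", UEAStemmer.new)
```

With contraction expansion enabled, a single input token such as don't may come back as more than one word, so callers that need strictly one word per entry may want to split the results again.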
This is a port to Ruby from the Java port of the original Perl script by
Marie-Claire Jenkins and Dr. Dan J. Smith at the University of East Anglia.
== Installation
Install the gem:
gem install uea-stemmer
Install from source:
git clone https://github.com/ealdent/uea-stemmer.git
cd uea-stemmer
bundle install
bundle exec rake test
bundle exec rake install
== Example Usage
Basic usage:
require "uea-stemmer"
stemmer = UEAStemmer.new
stemmer.stem("helpers") # => "helper"
stemmer.stem("dying") # => "die"
stemmer.stem("scarred") # => "scar"
You can extract the matching rule with +stem_with_rule+:
result = stemmer.stem_with_rule("invited")
result.word # => "invite"
result.rule_num # => 22.3
result.rule # => #<UEAStemmer::Rule ...>
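One possible use of +stem_with_rule+ is to see which rules fire most often on a word list. The sketch below is illustrative only: +rule_histogram+ is a hypothetical helper that is not part of this gem, and it assumes only that +stem_with_rule+ returns an object responding to +rule_num+, as in the example above.

```ruby
# Hypothetical helper, not part of uea-stemmer.
# Counts how many words in +words+ were handled by each rule number.
# +stemmer+ is any object whose #stem_with_rule result responds to #rule_num.
def rule_histogram(words, stemmer)
  words.each_with_object(Hash.new(0)) do |word, counts|
    counts[stemmer.stem_with_rule(word).rule_num] += 1
  end
end

# Assumed usage with the gem installed:
#   require "uea-stemmer"
#   rule_histogram(%w[helpers dying invited], UEAStemmer.new)
```

The result is a hash keyed by rule number, which can be sorted by count to find the rules that dominate a given corpus.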
Disable contraction expansion:
UEAStemmer.new(nil, nil, skip_contractions: true).stem("don't") # => "don't"
Use the singleton instance:
DefaultUEAStemmer.instance.stem("running") # => "run"
== Contributing
- Fork the project.
- Make your feature addition or bug fix.
- Add or update tests.
- Run +bundle exec rake test+.
- Send me a pull request. Bonus points for topic branches.
== Relevant Web Pages
- {UEA-Lite word stemming page (archived)}[https://web.archive.org/web/20120728132949/http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming]
- Stemming[https://en.wikipedia.org/wiki/Stemming]
== Copyright
Copyright (c) 2005 by the University of East Anglia. The original stemmer was authored by Marie-Claire Jenkins and Dr. Dan J. Smith. This Ruby port was written by Jason Adams, based on the Java port by Richard Churchill.
This project is distributed under the Apache 2.0
License[https://www.apache.org/licenses/LICENSE-2.0]. See LICENSE for details.