
= uea-stemmer

Ruby implementation of the UEA-Lite stemmer for conservative stemming in
search and indexing workloads.

UEA-Lite[https://web.archive.org/web/20120728132949/http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming]
uses a rule set to normalize suffixes while avoiding aggressive stemming.

== Behavior Notes

The stemmer operates on a single token at a time and returns a stemmed token.

Notable behavior of this implementation:

* possessive apostrophes are removed
* contractions are expanded by default (for example, don't becomes
  do not)
* tokens beginning with uppercase letters are preserved, and pluralized
  acronyms ending in a lowercase s are singularized
* pure numbers, and tokens containing hyphens/underscores, are passed through
  unchanged

This is a port to Ruby from the Java port of the original Perl script by
Marie-Claire Jenkins and Dr. Dan J. Smith at the University of East Anglia.

== Installation

Install the gem:

  gem install uea-stemmer

Install from source:

  git clone https://github.com/ealdent/uea-stemmer.git
  cd uea-stemmer
  bundle install
  bundle exec rake test
  bundle exec rake install

== Example Usage

Basic usage:

  require "uea-stemmer"
  stemmer = UEAStemmer.new

  stemmer.stem("helpers")  # => "helper"
  stemmer.stem("dying")    # => "die"
  stemmer.stem("scarred")  # => "scar"
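
Because +stem+ takes one token at a time, stemming a phrase means tokenizing it first. The helper below is a minimal sketch of that step. The name +stem_tokens+ and the tokenizer regex are assumptions made for illustration and are not part of the gem; it works with any object that responds to +stem+.

```ruby
# Sketch: stem every token in a phrase.
# stem_tokens and the regex are illustrative, not part of uea-stemmer.
def stem_tokens(text, stemmer)
  # Keep apostrophes, hyphens and underscores inside a token so the
  # stemmer itself decides how to treat contractions and compound tokens.
  tokens = text.scan(/[[:alnum:]'_-]+/)
  tokens.map { |token| stemmer.stem(token) }
end
```

Going by the results shown above, <tt>stem_tokens("helpers dying", UEAStemmer.new)</tt> should return <tt>["helper", "die"]</tt>.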

You can extract the matching rule with +stem_with_rule+:

  result = stemmer.stem_with_rule("invited")
  result.word      # => "invite"
  result.rule_num  # => 22.3
  result.rule      # => #<UEAStemmer::Rule ...>
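
One possible use of +stem_with_rule+ is checking which rules fire most often on your own vocabulary. The sketch below relies only on what is shown above: a result that responds to +word+ and +rule_num+. The helper name +rule_histogram+ is hypothetical and not part of the gem.

```ruby
# Sketch: count how often each rule number fires over a word list.
# rule_histogram is a made-up helper; it only assumes that
# stemmer.stem_with_rule(word) returns an object with #rule_num.
def rule_histogram(words, stemmer)
  counts = Hash.new(0)
  words.each do |word|
    result = stemmer.stem_with_rule(word)
    counts[result.rule_num] += 1
  end
  # Most frequently applied rules first.
  counts.sort_by { |_rule_num, count| -count }.to_h
end
```

Calling <tt>rule_histogram(%w[invited helpers dying], UEAStemmer.new)</tt> would return a hash keyed by rule number, which can help when deciding whether the rule set is conservative enough for a given corpus.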

Disable contraction expansion:

  UEAStemmer.new(nil, nil, skip_contractions: true).stem("don't")  # => "don't"

Use the singleton instance:

  DefaultUEAStemmer.instance.stem("running")  # => "run"

== Contributing

* Fork the project.
* Make your feature addition or bug fix.
* Add or update tests.
* Run +bundle exec rake test+.
* Send me a pull request. Bonus points for topic branches.

== Relevant Web Pages

== Copyright

Copyright (c) 2005 by the University of East Anglia and authored by
Marie-Claire Jenkins and Dr. Dan J. Smith. This port to Ruby was done by
Jason Adams using the port to Java by Richard Churchill.

This project is distributed under the Apache 2.0
License[https://www.apache.org/licenses/LICENSE-2.0]. See LICENSE for details.


Apache License 2.0
Created July 15, 2009
Updated February 18, 2026