BMI Local Implementation

Installation

git clone https://github.com/HTAustin/CAL.git
Install the Sofia-ML package: https://code.google.com/archive/p/sofia-ml/
Build the kisssdb indexer: cd CAL && make
Set the Sofia-ML path in doAll_Baseline:

SOFIA="/the/path/to/sofia-ml-read-only/src/sofia-ml"
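Before a long run, a small check like the following can confirm that the configured path actually points at an executable. It is a hypothetical helper, not part of the repository. It takes the path as an argument and otherwise falls back to SOFIA or the default ./sofia-ml/src/sofia-ml.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): confirm that the
# Sofia-ML binary exists and is executable before starting a run.
check_sofia() {
    # Use the argument if given, else $SOFIA, else the documented default
    local sofia="${1:-${SOFIA:-./sofia-ml/src/sofia-ml}}"
    if [ ! -x "$sofia" ]; then
        echo "Sofia-ML binary not found or not executable: $sofia" >&2
        return 1
    fi
    echo "Using Sofia-ML at $sofia"
}
```

For example, a wrapper script could call check_sofia "$SOFIA" || exit 1 before invoking doAll_Baseline.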

Usage

Run CAL AutoTAR: bash doAll_Baseline
Configure its behaviour through environment variables:

MODE            - Feature representation. Valid values: 4gram, tfidf. Default: tfidf
MAXTHREADS      - Number of threads. Default: 4
SOFIA           - Path to the Sofia-ML binary. Default: ./sofia-ml/src/sofia-ml
CORP            - Corpus to use. Default: oldreut
CACHE           - If set, enables caching of corpus-specific precomputations. Default: not set
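To illustrate how these variables and their defaults fit together, a driver script might read them as sketched below. This is an assumption about the mechanism, not code taken from doAll_Baseline; load_config and the USE_CACHE flag are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustrative sketch (not taken from doAll_Baseline): read the configuration
# variables, apply the documented defaults, and reject invalid values.
load_config() {
    MODE="${MODE:-tfidf}"
    MAXTHREADS="${MAXTHREADS:-4}"
    SOFIA="${SOFIA:-./sofia-ml/src/sofia-ml}"
    CORP="${CORP:-oldreut}"

    case "$MODE" in
        4gram|tfidf) ;;
        *) echo "Invalid MODE '$MODE' (expected 4gram or tfidf)" >&2; return 1 ;;
    esac

    # Reject empty or non-numeric values first, then zero
    case "$MAXTHREADS" in
        ''|*[!0-9]*) echo "MAXTHREADS must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$MAXTHREADS" -lt 1 ]; then
        echo "MAXTHREADS must be a positive integer" >&2; return 1
    fi

    # CACHE has no default: caching is on whenever the variable is set at all.
    # USE_CACHE is a hypothetical flag name used only in this sketch.
    if [ -n "${CACHE+x}" ]; then USE_CACHE=1; else USE_CACHE=0; fi
}
```

The ${VAR:-default} expansion keeps any value supplied on the command line, as in the example invocation, and substitutes the default only when the variable is unset or empty.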

Example:
$ MODE=4gram MAXTHREADS=16 SOFIA=/home/nghelani/sofia-ml/sofia-ml CORP=aquaint bash doAll_Baseline

Files expected by the script

Corpus/<CORP>.tgz                       - Corpus
judgement/<CORP>.topic.stemming.txt     - Topics separated by newline (each line is "<topic_id>:<query>")
judgement/qrels.<JUDGECLASS>.list       - Relevance judgements for topics (each line is "<topic> 0 <doc> <score>")
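Since the script relies on these formats, a quick sanity check on the judgement files can catch problems before a run. The functions below are a hypothetical awk-based sketch; they only test the line shapes described above, not the content.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks: report lines that do not match the formats
#   topics: "<topic_id>:<query>"
#   qrels:  "<topic> 0 <doc> <score>"
# Each function returns non-zero if any line is malformed.
check_topics() {
    # Every line needs a non-empty topic id, a colon, and a non-empty query
    awk '!/^[^:]+:.+$/ { printf "%s:%d: bad topic line\n", FILENAME, FNR; bad = 1 }
         END { exit bad + 0 }' "$1"
}

check_qrels() {
    # Exactly four whitespace-separated fields, the second 0, the last an integer
    awk '!(NF == 4 && $2 == "0" && $4 ~ /^-?[0-9]+$/) {
             printf "%s:%d: bad qrels line\n", FILENAME, FNR; bad = 1
         }
         END { exit bad + 0 }' "$1"
}
```

For example: check_topics "judgement/$CORP.topic.stemming.txt" && check_qrels "judgement/qrels.$JUDGECLASS.list".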

BMI output is stored in the result/ folder.
The gain curve for a topic can be computed from result/baseline/<CORP>/<topic>/<topic>.record.list.
Plot gain curves with gainCurve.py (run python2 gainCurve.py -h for usage).
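A gain curve plots the number of relevant documents found against the number of documents reviewed. As a rough sketch of that computation, the function below assumes (unverified) that the first field of each record.list line is a document id in review order, and that qrels lines follow the format given above. gain_curve is a hypothetical name, and gainCurve.py may compute the curve differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print gain-curve points "<reviewed> <relevant found>".
# ASSUMES the first field of each record.list line is a document id, listed
# in review order; check the real file format before relying on this.
gain_curve() {
    local topic="$1" qrels="$2" record="$3"
    awk -v topic="$topic" '
        # First file (qrels): collect documents judged relevant for this topic
        NR == FNR { if ($1 == topic && $4 > 0) rel[$3] = 1; next }
        # Second file (record list): walk the review order,
        # counting each relevant document only once
        {
            reviewed++
            if (($1 in rel) && !($1 in seen)) { found++; seen[$1] = 1 }
            print reviewed, found + 0
        }' "$qrels" "$record"
}
```

For example: gain_curve "$TOPIC" "judgement/qrels.$JUDGECLASS.list" "result/baseline/$CORP/$TOPIC/$TOPIC.record.list" > gain.txt. The NR == FNR idiom relies on the qrels file being non-empty.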

Speedup Tips

Comment out the ./dofast line if it already completed successfully on a previous run.
If you use qrels for assessment, consider stopping the iterations once you have found the desired number of relevant documents, as in this sample snippet:

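    # Distinct relevant documents found so far for this topic
    NUM_REL=$(sort -u "rel.$TOPIC.fil" | wc -l)
    # Distinct documents judged relevant (score 1-9) for this topic in the qrels;
    # the space after $TOPIC keeps topic 1 from also matching topics 10, 11, ...
    TOT_REL=$(grep "^$TOPIC .*[1-9]$" "../judgement/qrels.$JUDGECLASS.list" | cut -d' ' -f3 | sort -u | wc -l)
    if [ "$NUM_REL" -ge "$TOT_REL" ]; then
        break
    fi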

Lower the number of iterations: the default of 100 may be more than you need.
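The early-stopping snippet and a lower iteration cap can be combined in one driver loop. The sketch below is hypothetical: MAX_ITERS, QRELS and run_iteration (a stand-in for whatever one CAL round does in doAll_Baseline) are illustrative names, with QRELS meant to point at ../judgement/qrels.$JUDGECLASS.list.

```shell
#!/usr/bin/env bash
# Hypothetical driver loop (names are illustrative, not from doAll_Baseline):
# run at most MAX_ITERS rounds, stopping early once every document judged
# relevant in the qrels has been found. Expects TOPIC and QRELS to be set
# and run_iteration to be defined by the caller.
count_found() {   # distinct relevant documents found so far
    sort -u "rel.$TOPIC.fil" 2>/dev/null | wc -l
}

count_total() {   # distinct documents judged relevant (score 1-9) in the qrels
    grep "^$TOPIC .*[1-9]$" "$QRELS" | cut -d' ' -f3 | sort -u | wc -l
}

review_loop() {
    local max_iters="${MAX_ITERS:-100}" tot_rel iter
    tot_rel=$(count_total)
    for ((iter = 1; iter <= max_iters; iter++)); do
        run_iteration "$iter" || return 1   # placeholder for one CAL round
        if [ "$(count_found)" -ge "$tot_rel" ]; then
            echo "stopped after $iter iterations"
            return 0
        fi
    done
    echo "reached iteration cap ($max_iters)"
}
```

A topic with no relevant documents in the qrels makes tot_rel zero, so the loop stops after the first round, matching the behaviour of the snippet above.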

Contribute

Please feel free to open issues and report bugs.
