alexeyev/apertium2ud
tag parser and converter between the two tagsets: Apertium (enhanced Leipzig?) and the one used in UD
apertium2ud
Builds a mapping between the two tagsets based
on the information from the Apertium Wiki.
Loosely based on this code,
hence the GPLv3 license.
To install, run
python -m pip install apertium2ud
The latest uploaded version is 0.0.8.
NB!
- The tool is far from perfect.
- It was originally developed for working with apertium-kir, i.e. with the Kyrgyz language.
- The latest version from PyPI is equipped with the apertium-kir .udx file rules. For other languages, you may need to make some updates.
To build the machine-readable mapping, run
python apertium_wiki_parser.py
Apertium to Universal tags
>>> from apertium2ud.convert import a2ud
>>> tags = ["n", "pl", "acc"]
>>> a2ud(tags)
(['NOUN'], ['Number=Plur', 'Case=Acc'])
>>> tags_sophisticated = ["v", "tv", "ger", "nom", "cop", "aor", "p3", "pl"]
>>> a2ud(tags_sophisticated)
(['VERB', 'AUX'], ['Subcat=Tran', 'VerbForm=Vnoun', 'Case=Nom', 'Tense=Past', 'Person=3', 'Number=Plur'])
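The `a2ud` call above takes a plain list of tag names and returns a pair of lists. A minimal sketch of the glue code around it, assuming the analyses arrive as strings with angle-bracketed tags and that the result should go into CoNLL-U columns. The helpers `split_apertium_tags` and `format_conllu` are hypothetical and not part of apertium2ud; picking the first POS when several are returned is also an assumption.

```python
import re

# Matches one angle-bracketed tag such as <n> or <acc>.
TAG_RE = re.compile(r"<([^<>]+)>")


def split_apertium_tags(analysis):
    """Extract tag names from an analysis string such as 'кат<n><pl><acc>'."""
    return TAG_RE.findall(analysis)


def format_conllu(upos, feats):
    """Turn a2ud-style output (two lists) into CoNLL-U UPOS and FEATS values.

    Assumption: when several POS tags are returned, the first one is used.
    FEATS are sorted case-insensitively and joined with '|', as CoNLL-U expects.
    """
    upos_col = upos[0] if upos else "_"
    feats_col = "|".join(sorted(feats, key=str.lower)) if feats else "_"
    return upos_col, feats_col


tags = split_apertium_tags("кат<n><pl><acc>")
print(tags)  # ['n', 'pl', 'acc']

# With the package installed, the tags could be passed on like this:
#   from apertium2ud.convert import a2ud
#   upos, feats = a2ud(tags)
upos, feats = ["NOUN"], ["Number=Plur", "Case=Acc"]  # output shown in the README
print(format_conllu(upos, feats))
```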
Universal tags to Apertium
So far, the conversion is far from perfect:
Кыз NOUN {'Number[psor]=Sing', 'Number=Sing', 'Case=Nom', 'Person[psor]=3', 'Person=3'} ->
<px3sg><n><subj?nom?><sg><p3><px3sp>
досуна NOUN {'Number[psor]=Sing', 'Number=Sing', 'Person[psor]=3', 'Case=Dat', 'Person=3'} ->
<px3sg><n><sg><dat><p3><px3sp>
кат NOUN {'Case=Nom', 'Person=3', 'Number=Sing'} ->
<n><subj?nom?><sg><p3>
жазган VERB {'Aspect=Perf', 'Polarity=Pos', 'Number=Sing', 'Tense=Past', 'Person=3', 'Evident=Fh'} ->
<past3p><vblex?v?vbmod?><sg><aff><aor?past?pret?><perf><p3>
. PUNCT set() ->
<sent?apos?percent?clb?punct?>
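The output above contains slots such as `<subj?nom?>`. A minimal sketch of how such output could be post-processed, assuming that `?` separates alternative candidate tags within one slot (the README does not state this explicitly). The function names `parse_candidates` and `expand` are hypothetical and not part of apertium2ud.

```python
import re
from itertools import product

# Matches one angle-bracketed slot such as <n> or <subj?nom?>.
SLOT_RE = re.compile(r"<([^<>]+)>")


def parse_candidates(output):
    """Split each slot into its candidate tags (assumed to be '?'-separated)."""
    slots = []
    for slot in SLOT_RE.findall(output):
        if "?" in slot:
            slots.append([tag for tag in slot.split("?") if tag])
        else:
            slots.append([slot])
    return slots


def expand(output):
    """List every unambiguous tag string consistent with the candidates."""
    combos = product(*parse_candidates(output))
    return ["".join(f"<{tag}>" for tag in combo) for combo in combos]


for variant in expand("<n><subj?nom?><sg><p3>"):
    print(variant)
```

Enumerating all combinations grows quickly when many slots are ambiguous, so in practice the candidates would likely need filtering against a language-specific tag inventory.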
TODO
- Should sections, chunks and XML tags be added? No.
- Tests: Apertium -> UD -> Apertium, UD -> Apertium -> UD (sometimes losses are inevitable)
- Add the possibility to add the rules based on a .udx file, which usually describes custom tags
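The round-trip tests from the list above could start from a small helper that reports which tags do not survive a conversion there and back. This is only a sketch: the two converters are passed in as parameters, and the toy dictionaries below stand in for the real mapping, so none of these names belong to apertium2ud.

```python
def round_trip_report(tags, forward, backward):
    """Convert tags forward and back, then report what was lost or gained."""
    restored = set(backward(forward(tags)))
    original = set(tags)
    return {
        "lost": sorted(original - restored),
        "gained": sorted(restored - original),
    }


# Toy stand-ins for the real converters (assumption, for illustration only).
TOY_A2UD = {"n": "NOUN", "pl": "Number=Plur", "acc": "Case=Acc"}
TOY_UD2A = {ud: ap for ap, ud in TOY_A2UD.items()}


def toy_forward(tags):
    # Tags without a mapping are silently dropped, which causes the loss.
    return [TOY_A2UD[t] for t in tags if t in TOY_A2UD]


def toy_backward(tags):
    return [TOY_UD2A[t] for t in tags if t in TOY_UD2A]


report = round_trip_report(["n", "pl", "acc", "px3sp"], toy_forward, toy_backward)
print(report)  # {'lost': ['px3sp'], 'gained': []}
```

Comparing sets rather than lists ignores tag order, which seems reasonable here since the converted feature lists are not guaranteed to preserve it.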
How to cite
A citation is greatly appreciated if you use this work.
@misc{apertium2ud2023alekseev,
title = {{alexeyev/apertium2ud: mapping tagsets}},
year = {2023},
url = {https://github.com/alexeyev/apertium2ud}
}
GNU General Public License v3.0
Created May 19, 2023
Updated March 10, 2025