Phuntsok Drak-pa 1cf39c38f5 added cache for CI 3 years ago
src removed debug println 3 years ago
.gitignore neat output for now 3 years ago
.gitlab-ci.yml added cache for CI 3 years ago
Cargo.lock neat output for now 3 years ago
Cargo.toml neat output for now 3 years ago
LICENSE added licence 3 years ago
README.org Added README 3 years ago
matter-dict.csv Nouns now fully handled 3 years ago
matter.txt Nouns now fully handled 3 years ago


Simple Mattér Parser


Simple Mattér Parser, or SMP for short, is exactly what you think it is: a simple parser for Mattér. You might be wondering what Mattér is, though. It is a constructed language I am working on, inspired by the Nordic languages, especially Old Norse and Icelandic.

What it does

SMP first loads a gloss dictionary from a CSV file named matter-dict.csv. This file contains two columns:

  1. the clitics of the language or, if you prefer, the roots of its words

  2. the linguistic gloss of these roots
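A minimal sketch of the dictionary-loading step follows. It assumes (this is not taken from SMP's source) that matter-dict.csv holds one `root,gloss` pair per line, separated by a comma, with no header row and no quoted fields. The name `parse_dictionary` is hypothetical; roots are lowercased so they match lowercased input words.

```rust
use std::collections::HashMap;

/// Parses a two-column gloss dictionary into a root -> gloss map.
/// Assumed format (hypothetical): one `root,gloss` pair per line,
/// no header row, no quoted fields.
fn parse_dictionary(csv: &str) -> HashMap<String, String> {
    let mut dict = HashMap::new();
    for line in csv.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // ignore blank lines
        }
        // Split on the first comma only, so a gloss may itself contain commas.
        let mut columns = line.splitn(2, ',');
        // Lines without any comma yield no second column and are skipped.
        if let (Some(root), Some(gloss)) = (columns.next(), columns.next()) {
            // Lowercase roots so they match the lowercased words of the input.
            dict.insert(root.trim().to_lowercase(), gloss.trim().to_string());
        }
    }
    dict
}
```

The file itself could be read with `std::fs::read_to_string("matter-dict.csv")` and the resulting string passed to this function.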

Based on this, the program then tries to detect which words appear in an input text, passed to the program as its first (and only) argument. If several clitics can be detected in a single word, the program uses the longest one. For each word, it then separates the root from its suffixes, which are analyzed to detect the word’s number suffix, possessive suffix and declension. Any of these that is detected is added to the word’s gloss.
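The longest-match rule and the suffix analysis can be sketched as follows. Everything here is an assumption for illustration: the function names are hypothetical, and the suffix tables (-ant ACC, -ac ABL, -oċ LOC, -dyn POSS.2sg, -(e)þ plural) were inferred by hand from the sample output further down, not taken from the program. Suffixes are assumed to follow the order root + number + possessive + case, so they are peeled off right to left and glossed left to right. `str::strip_suffix` needs Rust 1.45 or later.

```rust
use std::collections::HashMap;

// Hypothetical suffix tables, inferred only from the sample output;
// SMP's real tables may contain more (or different) entries.
const CASES: &[(&str, &str)] = &[("ant", "ACC"), ("ac", "ABL"), ("oċ", "LOC")];
const POSSESSIVES: &[(&str, &str)] = &[("dyn", "POSS.2sg")];
// "eþ" is listed before "þ" so the longer plural ending wins.
const NUMBERS: &[(&str, &str)] = &[("eþ", "pl"), ("þ", "pl")];

/// Longest dictionary root that is a prefix of `word`, plus the leftover suffixes.
fn longest_root<'a>(word: &'a str, dict: &HashMap<String, String>) -> Option<(&'a str, &'a str)> {
    (1..=word.len())
        .rev()
        // Never split inside a multi-byte UTF-8 character such as þ or ċ.
        .filter(|&end| word.is_char_boundary(end))
        .map(|end| word.split_at(end))
        .find(|(root, _)| dict.contains_key(*root))
}

/// Removes the first matching suffix from the end of `rest` and returns its tag.
fn strip(rest: &mut &str, table: &[(&'static str, &'static str)]) -> Option<&'static str> {
    for &(suffix, tag) in table {
        if let Some(shorter) = rest.strip_suffix(suffix) {
            *rest = shorter;
            return Some(tag);
        }
    }
    None
}

/// Glosses a noun-like word as (morpheme, gloss), e.g. "eppeleþant" -> ("eppel", "n-pl-ACC").
fn gloss_noun(word: &str, dict: &HashMap<String, String>) -> Option<(String, String)> {
    let (root, mut rest) = longest_root(word, dict)?;
    // Peel suffixes from the right: case, then possessive, then number.
    let case = strip(&mut rest, CASES);
    let possessive = strip(&mut rest, POSSESSIVES);
    let number = strip(&mut rest, NUMBERS);
    // Any leftover in `rest` is simply left unglossed in this sketch.
    let mut parts = vec![dict[root].clone()];
    for tag in [number, possessive, case].iter().flatten() {
        parts.push(tag.to_string());
    }
    Some((root.to_string(), parts.join("-")))
}
```

Scanning prefixes from longest to shortest makes the first dictionary hit the longest one, which is the rule stated above.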

Verbs can also be analyzed to a certain extent: prefixes are analyzed separately, and irregular words are marked as unknown to the program.
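Verb endings can be handled in a similar spirit. This hypothetical `gloss_verb` sketch looks up the longest root, then compares the whole remainder against a small table of endings inferred from the sample output (for instance -ei for 2sg.imperf, as in caupei); the table is an assumption, and prefix handling is left out. A word with no known root is reported as unknown, which matches what the sample output shows for beþt and haþt, while an unrecognized ending keeps the bare stem gloss, as etano does.

```rust
use std::collections::HashMap;

// Hypothetical verb endings, inferred only from the sample output.
// Each ending is compared with the whole remainder after the root.
const VERB_ENDINGS: &[(&str, &str)] = &[
    ("age", "sg.IMPER"),
    ("and", "part.prog"),
    ("ea", "3sg.imperf"),
    ("ei", "2sg.imperf"),
    ("on", "3pl.perf"),
    ("e", "1sg.imperf"),
    ("o", "1+3sg.perf"),
];

/// Longest dictionary root that is a prefix of `word`, plus the leftover ending.
fn longest_root<'a>(word: &'a str, dict: &HashMap<String, String>) -> Option<(&'a str, &'a str)> {
    (1..=word.len())
        .rev()
        .filter(|&end| word.is_char_boundary(end)) // stay on UTF-8 boundaries
        .map(|end| word.split_at(end))
        .find(|(root, _)| dict.contains_key(*root))
}

/// Glosses a verb form as (morpheme, gloss), e.g. "caupei" -> ("caup", "vt-2sg.imperf").
fn gloss_verb(word: &str, dict: &HashMap<String, String>) -> (String, String) {
    let (root, rest) = match longest_root(word, dict) {
        Some(pair) => pair,
        // No known root at all: report the whole word as unknown.
        None => return (word.to_string(), "unknown".to_string()),
    };
    let stem_gloss = &dict[root];
    let gloss = match VERB_ENDINGS.iter().find(|(ending, _)| *ending == rest) {
        Some((_, tag)) => format!("{}-{}", stem_gloss, tag),
        // Empty or unrecognized remainder: keep only the stem gloss.
        None => stem_gloss.clone(),
    };
    (root.to_string(), gloss)
}
```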

Usage example

To compile this program, you will need Rust >= 1.30 with Cargo. To compile and run it, execute the following command in your shell:

  cargo run --release -- matter.txt

Here, matter.txt can be any text file containing Mattér text. The command generates a file, output.xml, which contains the parse of the text and its gloss.
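The output side can be sketched as follows, assuming (from the sample output, not from SMP's source) that sentences end at `.`, `?` or `!`, that words are split on whitespace and commas, and that every word is lowercased before analysis. `to_xml`, `escape_attr` and the `analyze` callback, which stands in for the root and suffix analysis described above, are hypothetical names. A `main` could read the input path from `std::env::args().nth(1)` and write the returned string to output.xml with `std::fs::write`.

```rust
/// Escapes characters that may not appear verbatim in an XML attribute value.
fn escape_attr(s: &str) -> String {
    // '&' must be replaced first so the other entities are not double-escaped.
    s.replace('&', "&amp;").replace('<', "&lt;").replace('"', "&quot;")
}

/// Builds the <text>/<sentence>/<word> layout of output.xml.
/// `analyze` is a hypothetical callback returning (morpheme, gloss) for a word.
fn to_xml<F>(text: &str, analyze: F) -> String
where
    F: Fn(&str) -> (String, String),
{
    let mut out = String::from("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<text>\n");
    for sentence in text.split(|c: char| c == '.' || c == '?' || c == '!') {
        let words: Vec<String> = sentence
            .split(|c: char| c.is_whitespace() || c == ',')
            .filter(|w| !w.is_empty())
            .map(|w| w.to_lowercase()) // Unicode-aware: "Éþtrið" -> "éþtrið"
            .collect();
        if words.is_empty() {
            continue; // skip the empty piece after the final punctuation mark
        }
        out.push_str("  <sentence>\n");
        for word in &words {
            let (morpheme, gloss) = analyze(word.as_str());
            out.push_str(&format!(
                "    <word text=\"{}\" morpheme=\"{}\" gloss=\"{}\" />\n",
                escape_attr(word),
                escape_attr(&morpheme),
                escape_attr(&gloss)
            ));
        }
        out.push_str("  </sentence>\n");
    }
    out.push_str("</text>\n");
    out
}
```

Escaping attribute values keeps the file well-formed even if the input text contains a stray quote or ampersand.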

Example

As an example, the following text will be fed to the program:

Em meþ Gunnarac annéðant þynea. An ænant caupage, ar annéð caupe. Fe en eppelant etano Éþtrið fent etano? Þror eppelant feþ geffo? Du feċ gei? Hint fec fém gér? Fon landytoċ beþt bƿand? Feren Mattérant frégei? Ferve Mattérant frégei? Eppeleþant eða cirþabérant, fertið y caupei? Ferden urbyþ gon? Fertið bryðdegdynant haþt? Fertiðoċ Mattérant frégei? Fertiðac y ċilde?

And here is the result of parsing this text:

  <?xml version="1.0" encoding="utf-8"?>
  <text>
    <sentence>
      <word text="em" morpheme="em" gloss="art.dem.sg" />
      <word text="meþ" morpheme="meþ" gloss="n" />
      <word text="gunnarac" morpheme="gunnar" gloss="np-ABL" />
      <word text="annéðant" morpheme="annéð" gloss="adj-ACC" />
      <word text="þynea" morpheme="þyn" gloss="vt-3sg.imperf" />
    </sentence>
    <sentence>
      <word text="an" morpheme="an" gloss="art.dem.sg.near" />
      <word text="ænant" morpheme="æn" gloss="nbr-ACC" />
      <word text="caupage" morpheme="caup" gloss="vt-sg.IMPER" />
      <word text="ar" morpheme="ar" gloss="conj" />
      <word text="annéð" morpheme="annéð" gloss="adj" />
      <word text="caupe" morpheme="caup" gloss="vt-1sg.imperf" />
    </sentence>
    <sentence>
      <word text="fe" morpheme="fe" gloss="pron.q.nom" />
      <word text="en" morpheme="en" gloss="art.def.sg.nhum" />
      <word text="eppelant" morpheme="eppel" gloss="n-ACC" />
      <word text="etano" morpheme="et" gloss="vt" />
      <word text="éþtrið" morpheme="éþtrið" gloss="np" />
      <word text="fent" morpheme="fent" gloss="pron.q.acc" />
      <word text="etano" morpheme="et" gloss="vt" />
    </sentence>
    <sentence>
      <word text="þror" morpheme="þror" gloss="np" />
      <word text="eppelant" morpheme="eppel" gloss="n-ACC" />
      <word text="feþ" morpheme="feþ" gloss="pron.q.dat" />
      <word text="geffo" morpheme="geff" gloss="vt-1+3sg.perf" />
    </sentence>
    <sentence>
      <word text="du" morpheme="du" gloss="pron.2sg.nom" />
      <word text="feċ" morpheme="feċ" gloss="pron.q.loc" />
      <word text="gei" morpheme="g" gloss="vi-2sg.imperf" />
    </sentence>
    <sentence>
      <word text="hint" morpheme="hint" gloss="pron.3sg.n.acc" />
      <word text="fec" morpheme="fec" gloss="pron.q.abl" />
      <word text="fém" morpheme="fém" gloss="pron.q.limit" />
      <word text="gér" morpheme="gér" gloss="vt" />
    </sentence>
    <sentence>
      <word text="fon" morpheme="fon" gloss="pron.q.gen" />
      <word text="landytoċ" morpheme="landyt" gloss="n-LOC" />
      <word text="beþt" morpheme="beþt" gloss="unknown" />
      <word text="bƿand" morpheme="bƿ" gloss="vi-part.prog" />
    </sentence>
    <sentence>
      <word text="feren" morpheme="feren" gloss="pron.q.goal" />
      <word text="mattérant" morpheme="mattér" gloss="np-ACC" />
      <word text="frégei" morpheme="frég" gloss="vt-2sg.imperf" />
    </sentence>
    <sentence>
      <word text="ferve" morpheme="ferve" gloss="pron.q.motivation" />
      <word text="mattérant" morpheme="mattér" gloss="np-ACC" />
      <word text="frégei" morpheme="frég" gloss="vt-2sg.imperf" />
    </sentence>
    <sentence>
      <word text="eppeleþant" morpheme="eppel" gloss="n-pl-ACC" />
      <word text="eða" morpheme="eða" gloss="adv" />
      <word text="cirþabérant" morpheme="cirþabér" gloss="n-ACC" />
      <word text="fertið" morpheme="fertið" gloss="pron.q.loc.temp" />
      <word text="y" morpheme="y" gloss="aux.fut" />
      <word text="caupei" morpheme="caup" gloss="vt-2sg.imperf" />
    </sentence>
    <sentence>
      <word text="ferden" morpheme="ferden" gloss="pron.q.instr" />
      <word text="urbyþ" morpheme="urby" gloss="n-pl" />
      <word text="gon" morpheme="g" gloss="vi-3pl.perf" />
    </sentence>
    <sentence>
      <word text="fertið" morpheme="fertið" gloss="pron.q.loc.temp" />
      <word text="bryðdegdynant" morpheme="bryðdeg" gloss="n-POSS.2sg-ACC" />
      <word text="haþt" morpheme="haþt" gloss="unknown" />
    </sentence>
    <sentence>
      <word text="fertiðoċ" morpheme="fertiðoċ" gloss="pron.q.abl.temp" />
      <word text="mattérant" morpheme="mattér" gloss="np-ACC" />
      <word text="frégei" morpheme="frég" gloss="vt-2sg.imperf" />
    </sentence>
    <sentence>
      <word text="fertiðac" morpheme="fertiðac" gloss="pron.q.limit.temp" />
      <word text="y" morpheme="y" gloss="aux.fut" />
      <word text="ċilde" morpheme="ċild" gloss="vi-1sg.imperf" />
    </sentence>
  </text>