
korp_config_templates

This module holds templates for various files in the korp-backend corpus config folder. The folder looks like this:

config/
  attributes/
    structural/                one yaml file per structural attribute
      text_date.yaml           things that are common to a text: date,
      text_title.yaml          lang, author, title, etc.
      ...
    positional/                one yaml file per positional attribute
      lemma.yaml               properties of each word.
      pos.yaml
      ...
  corpora/
    LANG_CATEGORY_DATE.yaml       one file per CATEGORY
    ...
  modes/                         the modes, shown in the top right on
    default.yaml                 the korp frontend webpage
    ...maybe-other.yaml
    ...
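
As a rough illustration of the layout above, the following sketch checks that a config folder contains the four subfolders shown in the tree. The names `EXPECTED_DIRS` and `missing_config_dirs` are hypothetical and are not part of this module or of korp-backend.

```python
from pathlib import Path

# Subfolders taken from the tree above. The constant and the function
# below are illustrative names only, not part of this module.
EXPECTED_DIRS = (
    "attributes/structural",
    "attributes/positional",
    "corpora",
    "modes",
)


def missing_config_dirs(config_root):
    """Return the expected subfolders that do not exist under config_root."""
    root = Path(config_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty list means every expected subfolder is present. The check says nothing about the yaml files inside them.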

For most of our corpora, the structural and positional attributes are the same, so we keep a template of this file tree in

korp_config_template/

The compile_cwb_mono.py script will copy this template tree to

<workdir>/korp_configs/LANG/

and then fill in as much as it can. All corpora/ files should be filled in, as well as modes/default.yaml. Anything else must be adjusted manually, and the contents should always be inspected manually before going to production.
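
The copy step could look roughly like the sketch below. It is not the actual code in compile_cwb_mono.py: the function name `copy_config_template` and its parameters are assumptions made for illustration, and the fill-in step is left out. It relies on `shutil.copytree` with `dirs_exist_ok`, which requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_config_template(template_dir, workdir, lang):
    """Copy the template tree to <workdir>/korp_configs/<lang>/.

    Hypothetical helper: the name and signature are assumptions, not
    taken from compile_cwb_mono.py. Returns the destination path.
    """
    dest = Path(workdir) / "korp_configs" / lang
    # dirs_exist_ok=True lets a rerun copy into an existing destination;
    # files with the same name are overwritten by the template versions.
    shutil.copytree(template_dir, dest, dirs_exist_ok=True)
    return dest
```

Because files with the same name are overwritten on a rerun, manual adjustments made in the destination would be lost. A real script may want to guard against that.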