add_files_to_corpus

Entry point: add_files_to_corpus = corpustools.adder:main

The complete help text from the program is as follows:

usage: add_files_to_corpus [-h] [-v] [-p PARALLEL_FILE] [-l LANG]
                           [-d DIRECTORY]
                           origs [origs ...]

Add file(s) to a corpus directory. The filenames are converted to ascii only
names. Metadata files containing the original name, the main language, the
genre and possibly parallel files are also made. The files are added to the
working copy.

positional arguments:
  origs                 The original files, urls or directories where the
                        original files reside (not the corpus repo)

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

parallel:
  -p PARALLEL_FILE, --parallel PARALLEL_FILE
                        Path to an existing file in the corpus that will be
                        parallel to the orig that is about to be added
  -l LANG, --lang LANG  Language of the file to be added

no_parallel:
  -d DIRECTORY, --directory DIRECTORY
                        The directory where the origs should be placed
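The interface described by the help text can be sketched with Python's argparse module. This is only an illustrative reconstruction from the usage message above, not the actual code in corpustools.adder; the function name build_parser and the version string are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser resembling the help text above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="add_files_to_corpus",
        description="Add file(s) to a corpus directory.",
    )
    # The version string is a placeholder, not the real program version.
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s (sketch)"
    )
    parser.add_argument(
        "origs",
        nargs="+",
        help="The original files, urls or directories where the "
        "original files reside (not the corpus repo)",
    )

    # Options used when the new file is parallel to an existing corpus file.
    parallel = parser.add_argument_group("parallel")
    parallel.add_argument(
        "-p",
        "--parallel",
        dest="parallel_file",
        help="Path to an existing file in the corpus that will be "
        "parallel to the orig that is about to be added",
    )
    parallel.add_argument(
        "-l", "--lang", dest="lang", help="Language of the file to be added"
    )

    # Option used when the new file has no parallel file yet.
    no_parallel = parser.add_argument_group("no_parallel")
    no_parallel.add_argument(
        "-d",
        "--directory",
        dest="directory",
        help="The directory where the origs should be placed",
    )
    return parser
```

With this sketch, the options given with -d end up in args.directory, those given with -p and -l in args.parallel_file and args.lang, and all remaining arguments in the list args.origs.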

Example: download parallel files from the net and add them to the corpus. First change to the corpus directory:

cd $GTFREE

Adding the first file

The command

add_files_to_corpus -d orig/sme/admin/sd/other_files http://www.samediggi.no/content/download/5407/50892/version/2/file/Sametingets+%C3%A5rsmelding+2013+-+nordsamisk.pdf

gives the message:

Added orig/sme/admin/sd/other_files/sametingets_ay-rsmelding_2013_-_nordsamisk.pdf

Adding the parallel file

The command

add_files_to_corpus -p orig/sme/admin/sd/other_files/sametingets_ay-rsmelding_2013_-_nordsamisk.pdf -l nob http://www.samediggi.no/content/download/5406/50888/version/2/file/Sametingets+%C3%A5rsmelding+2013+-+norsk.pdf

gives the message:

Added orig/nob/admin/sd/other_files/sametingets_ay-rsmelding_2013_-_norsk.pdf
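In the example above, the file given with -p lives under orig/sme/admin/sd/other_files, and the new file ends up in the same subdirectory under orig/nob. A minimal sketch of that mapping follows, assuming the language is always the path component directly after orig. That assumption is an inference from this one example, not a documented rule, and parallel_directory is a hypothetical helper, not part of corpustools.

```python
from pathlib import PurePosixPath


def parallel_directory(parallel_file, lang):
    """Return the directory for a file in `lang` that is parallel to `parallel_file`.

    Assumes the layout orig/<language>/<subdirectories>/<filename>.
    """
    parts = list(PurePosixPath(parallel_file).parent.parts)
    if "orig" not in parts or parts.index("orig") + 1 >= len(parts):
        raise ValueError(f"no language component after 'orig' in {parallel_file!r}")
    # Swap the language component that follows "orig".
    parts[parts.index("orig") + 1] = lang
    return str(PurePosixPath(*parts))
```

Called with the sme file from the example and the language nob, this returns orig/nob/admin/sd/other_files, which matches the directory in the message above.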

After this is done, the files have only been added to the working copy. Commit them to the repository like this:

git commit