pick_titles
Program to pick out documents to be saved to the corpus from samediggi.se.
The documents have been fetched using wget.
DocumentPicker
Pick documents from samediggi.se to be added to the corpus.
Source code in /home/anders/projects/CorpusTools/corpustools/pick_titles.py
classify_file(file_)
Identify the language of the file.
Source code in /home/anders/projects/CorpusTools/corpustools/pick_titles.py
classify_files()
Iterate through all files and classify each one by language.
Source code in /home/anders/projects/CorpusTools/corpustools/pick_titles.py
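The original source listing is not reproduced on this page, so the following is only a minimal sketch of the iterate-and-classify pattern described above, not the actual code in pick_titles.py. The names `classify_files_sketch` and `toy_classifier`, the `*.html` file pattern, and the marker-word heuristic are all assumptions made for illustration; a real setup would plug in a proper language identifier.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, List


def classify_files_sketch(
    root: Path, classify: Callable[[str], str]
) -> Dict[str, List[Path]]:
    """Group every *.html file under root by the language code classify returns.

    Hypothetical helper: the file pattern and return shape are assumptions.
    """
    by_language: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*.html")):
        # Files fetched with wget may contain stray bytes; replace them
        # instead of aborting the whole run.
        text = path.read_text(encoding="utf-8", errors="replace")
        by_language[classify(text)].append(path)
    return dict(by_language)


def toy_classifier(text: str) -> str:
    """Placeholder classifier (assumption): treat the word 'och' as Swedish.

    Everything else is labelled 'sme'. This is not real language
    identification, only a stand-in so the sketch can run.
    """
    words = text.lower().split()
    return "swe" if "och" in words else "sme"
```

Passing the classifier in as a parameter keeps the directory walk separate from the language decision, so the placeholder can be swapped for a real classifier without touching the loop.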