corpusxmlfile
Classes and functions to sentence-align two files.
CorpusXMLFile
A class to handle all the info of a corpus xml file.
Source code in /home/anders/projects/CorpusTools/corpustools/corpusxmlfile.py
lang
property
Get the lang of the file.
ocr
property
Check if the ocr element exists.
Returns: the ocr element, or None if it does not exist.
translated_from
property
Get the translated_from element from the orig doc.
word_count
property
Return the word count of the file.
__init__(name)
Initialise the CorpusXMLFile class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Path to the xml file. | required |
move_later()
Move the later elements to the end of the body element.
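As a rough illustration of what moving elements to the end of the body involves, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. It is not the CorpusTools implementation: the helper name `move_later_to_end` and the sample document are hypothetical, and it assumes the elements in question are named `later` and live somewhere under a `body` element.

```python
import xml.etree.ElementTree as ET


def move_later_to_end(root):
    """Move every <later> element to the end of <body> (hypothetical helper)."""
    body = root.find(".//body")
    # ElementTree has no getparent(), so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Take a snapshot of the matches, because the tree is modified in the loop.
    for later in list(body.iter("later")):
        parent_of[later].remove(later)
        body.append(later)
    return root


# Assumed sample document: one <later> element placed before a paragraph.
doc = ET.fromstring(
    "<document><body><later><p>b</p></later><p>a</p></body></document>"
)
move_later_to_end(doc)
order = [child.tag for child in doc.find("body")]
```

After the call, the `later` element follows the paragraph that originally came after it.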
remove_skip()
Remove the skip element.
The skip element holds text that is not wanted in, for example, sentence alignment.
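Removing an element by name can be sketched as follows with the standard library's `xml.etree.ElementTree`. This is only an illustration under assumptions, not the library's own code: `remove_elements` is a hypothetical helper, and the sample document assumes `skip` elements sit inside `body`.

```python
import xml.etree.ElementTree as ET


def remove_elements(root, tag):
    """Remove every element named `tag` below root (hypothetical helper)."""
    # ElementTree has no getparent(), so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Take a snapshot of the matches, because the tree is modified in the loop.
    for element in list(root.iter(tag)):
        parent = parent_of.get(element)
        if parent is not None:  # the root itself has no parent
            parent.remove(element)
    return root


# Assumed sample document: one wanted paragraph and one <skip> block.
doc = ET.fromstring(
    "<document><body><p>keep</p><skip><p>drop</p></skip></body></document>"
)
remove_elements(doc, "skip")
remaining = [element.tag for element in doc.iter()]
```

The same helper would cover removing a `version` element by passing a different tag name.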
remove_version()
Remove the version element.
The version element is often the only difference between otherwise identical files in converted and prestable/converted.
sanity_check()
Check if the file really is a corpus xml file.
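The page does not say which property is checked. One plausible form of such a check, shown here purely as an assumption, is to verify the name of the root element. In the sketch below, `looks_like_corpus_xml` is a hypothetical function and the expected root tag `document` is an assumed value, not something this page confirms.

```python
import xml.etree.ElementTree as ET


def looks_like_corpus_xml(xml_text, expected_root="document"):
    """Return True if xml_text parses and has the expected root tag.

    Both the function and the default root tag are assumptions made
    for illustration.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    return root.tag == expected_root


is_corpus = looks_like_corpus_xml("<document><body/></document>")
is_html = looks_like_corpus_xml("<html/>")
is_garbage = looks_like_corpus_xml("not xml")
```

A real check might raise an exception instead of returning a boolean, so that passing the wrong file fails loudly.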
set_body(new_body)
Replace the body element with new_body element.
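Swapping one element for another in place can be sketched with the standard library's `xml.etree.ElementTree` as below. This is an illustration under assumptions rather than the actual method: `replace_body` is a hypothetical helper, and the sample document assumes a `header` followed by a `body`.

```python
import xml.etree.ElementTree as ET


def replace_body(root, new_body):
    """Replace the first <body> below root with new_body (hypothetical helper)."""
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == "body":
                # Assigning by index keeps the position among the siblings.
                parent[index] = new_body
                return root
    raise ValueError("no <body> element found")


# Assumed sample document and replacement body.
doc = ET.fromstring("<document><header/><body><p>old</p></body></document>")
new_body = ET.fromstring("<body><p>new</p></body>")
replace_body(doc, new_body)
children = [child.tag for child in doc]
```

Replacing by index, rather than removing and appending, keeps the body where it was relative to its siblings.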
write(file_name=None)
Write self.etree.