util
Utility functions and classes used by other modules in CorpusTools.
ArgumentError
Bases: Exception
This exception is raised when argument errors occur.
Source code in /home/anders/projects/CorpusTools/corpustools/util.py
ConversionError
Bases: Exception
This exception is raised when conversion errors occur.
ExecutableMissingError
Bases: Exception
This exception is raised when required executables are missing.
ExternalCommandRunner
Class to run an external command through subprocess.
Attributes:
Name | Type | Description |
---|---|---|
stdout | | The stdout of the command is saved here. |
stderr | | The stderr of the command is saved here. |
returncode | | The return code of the command is saved here. |
__init__()
Initialise the ExternalCommandRunner class.
run(command, cwd=None, to_stdin=None)
Run the command and save the result.
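The documentation does not show how `run` is implemented. As an illustration only, the sketch below assumes the runner wraps `subprocess.Popen` with all three streams piped and collects the results with `communicate()`. That `command` is a list of strings and `to_stdin` is bytes are assumptions of this sketch; the real class may differ, for example in how it handles errors.

```python
import subprocess
import sys


class ExternalCommandRunner:
    """Sketch: run an external command and keep its output and exit status."""

    def __init__(self):
        self.stdout = None
        self.stderr = None
        self.returncode = None

    def run(self, command, cwd=None, to_stdin=None):
        """Run command (a list of strings), optionally feeding bytes to stdin."""
        process = subprocess.Popen(
            command,
            cwd=cwd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        # communicate() sends to_stdin (if any), closes stdin and waits for
        # the process to exit; stdout and stderr come back as bytes.
        self.stdout, self.stderr = process.communicate(input=to_stdin)
        self.returncode = process.returncode


runner = ExternalCommandRunner()
runner.run([sys.executable, "-c", "print('hello')"])
print(runner.returncode, runner.stdout)
```

Keeping the three results as attributes, rather than returning them, lets callers inspect stderr and the return code after the fact, which matches the attribute list above.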
SetupError
Bases: Exception
This exception is raised when setup is faulty.
basename_noext(fname, ext)
Get the basename without the extension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | path to the file. | required |
ext | str | the extension that should be removed. | required |
Returns:
Type | Description |
---|---|
str | fname without the extension. |
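A minimal sketch of the behaviour described above, assuming `ext` includes the leading dot (e.g. `.xml`) and that only a matching suffix is removed. It is an illustration, not the module's code.

```python
import os


def basename_noext(fname, ext):
    """Return the file name of fname with the trailing ext removed (sketch)."""
    base = os.path.basename(fname)
    # Strip ext only when it really is the suffix, so "doc.xml.bak" is untouched.
    if ext and base.endswith(ext):
        return base[: -len(ext)]
    return base
```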
executable_in_path(program)
Check if program is in path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
program | str | name of the program. | required |
Returns:
Type | Description |
---|---|
bool | True if program is found, False otherwise. |
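The module may build this check from `path_possibilities` and `is_executable`. The sketch below instead swaps in the standard library's `shutil.which`, which performs a comparable `$PATH` search and executable check; treat it as an equivalent illustration, not the module's implementation.

```python
import shutil
import sys


def executable_in_path(program):
    """Return True if program resolves to an executable file."""
    # shutil.which searches $PATH for a bare name; if program contains a
    # directory component, only that exact path is checked.
    return shutil.which(program) is not None


print(executable_in_path(sys.executable))
```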
get_lang_resource(lang, resource, fallback=None)
Get a language resource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lang | str | the language of the resource. | required |
resource | str | the resource that is needed. | required |
fallback | str or None | the fallback resource. Default is None. | None |
Returns:
Type | Description |
---|---|
str | path to the resource or fallback. |
get_preprocess_command(lang)
Get the complete preprocess command for lang.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lang | str | the language. | required |
Returns:
Type | Description |
---|---|
list[str] | the complete preprocess command. |
human_readable_filesize(num, suffix='B')
Return a human-readable file size.
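A sketch of one common way to produce such a string, assuming 1024-based units with binary prefixes (KiB, MiB, ...) and one decimal place. The unit labels and rounding the module actually uses are not documented here.

```python
def human_readable_filesize(num, suffix="B"):
    """Format a byte count using binary (1024-based) prefixes (sketch)."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        if abs(num) < 1024.0:
            return f"{num:3.1f}{unit}{suffix}"
        num /= 1024.0
    # Anything still >= 1024 Zi is reported in Yi.
    return f"{num:.1f}Yi{suffix}"
```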
ignored(*exceptions)
Ignore exceptions.
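The signature suggests the common recipe of a context manager that swallows the listed exception types; the sketch below assumes that design. The standard library's `contextlib.suppress` behaves the same way.

```python
from contextlib import contextmanager


@contextmanager
def ignored(*exceptions):
    """Context manager that silently swallows the given exception types."""
    try:
        yield
    except exceptions:  # a tuple of exception classes is a valid except target
        pass
```

Exceptions not listed still propagate, so `with ignored(KeyError):` hides a missing key but not, say, a `ValueError`.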
is_executable(fullpath)
Check if the program in fullpath is executable.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fullpath | str | the path to the program or script. | required |
Returns:
Type | Description |
---|---|
bool | True if fullpath contains an executable, False otherwise. |
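A minimal sketch, assuming "executable" means a regular file that the current process has execute permission for; directories are rejected even though they carry the execute bit.

```python
import os
import sys


def is_executable(fullpath):
    """Return True if fullpath is a regular file this process may execute."""
    # os.path.isfile rules out directories (which also carry the x bit);
    # os.access checks execute permission for the process's real uid/gid.
    return os.path.isfile(fullpath) and os.access(fullpath, os.X_OK)


print(is_executable(sys.executable))
```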
lineno()
Return the current line number in our program.
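A sketch of the usual technique, assuming CPython frame introspection via `inspect.currentframe()`: the caller's frame is one step back from the function's own frame. `inspect.currentframe()` may return `None` on Python implementations without stack frame support.

```python
import inspect


def lineno():
    """Return the line number from which lineno() was called."""
    # f_back is the caller's frame; f_lineno is the line it is executing.
    return inspect.currentframe().f_back.f_lineno
```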
name_to_unicode(filename)
Turn a filename into a Unicode string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | name of the file. | required |
Returns:
Type | Description |
---|---|
str | a Unicode string. |
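How the module performs the conversion is not shown. On Python 3 the standard library's `os.fsdecode` does this job, so the sketch swaps it in as an assumption: it returns `str` input unchanged and decodes `bytes` with the filesystem encoding and error handler.

```python
import os


def name_to_unicode(filename):
    """Return filename as str, decoding bytes with the filesystem encoding."""
    # os.fsdecode passes str through unchanged and decodes bytes using
    # sys.getfilesystemencoding() with the filesystem error handler.
    return os.fsdecode(filename)
```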
note(msg)
Print msg to stderr.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
msg | str | the message. | required |
path_possibilities(program)
Check if program is found in $PATH.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
program | str | name of the program or script. | required |
Yields:
Type | Description |
---|---|
str | possible full path to the program. |
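A minimal generator sketch, assuming candidates are formed by joining `program` to each `$PATH` entry in order. The candidates are not checked for existence, which matches the "possible full path" wording above.

```python
import os


def path_possibilities(program):
    """Yield program joined to every directory listed in $PATH, in order."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        # Skip empty entries left behind by doubled separators.
        if directory:
            yield os.path.join(directory, program)
```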
print_element(element, level, indent, out)
Format an HTML document.
This function formats HTML documents for readability, to show the structure of the given document. It does not preserve whitespace in text parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
element | etree._Element | the element to format. | required |
level | int | the level at which this element sits. | required |
indent | int | how many spaces this element should be indented. | required |
out | stream | a buffer where the formatted element is written. | required |
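A recursive sketch of this kind of formatter. It uses the standard library's `xml.etree.ElementTree` instead of lxml (which the `etree._Element` type suggests) so it runs without extra dependencies; both expose `tag`, `text`, `tail` and `attrib`. The output layout (one tag or text run per line, text stripped and re-indented) is an assumption that illustrates why whitespace in text is lost. Attribute escaping, comments and processing instructions are not handled.

```python
import io
import xml.etree.ElementTree as ET


def print_element(element, level, indent, out):
    """Write element and its subtree to out, one tag or text run per line."""
    pad = " " * (level * indent)
    attrs = "".join(f' {name}="{value}"' for name, value in element.attrib.items())
    out.write(f"{pad}<{element.tag}{attrs}>\n")
    # Text is stripped and re-indented, so the original whitespace is lost.
    if element.text and element.text.strip():
        out.write(f"{pad}{' ' * indent}{element.text.strip()}\n")
    for child in element:
        print_element(child, level + 1, indent, out)
    out.write(f"{pad}</{element.tag}>\n")
    # The tail (text after the closing tag) belongs to the parent's content.
    if element.tail and element.tail.strip():
        out.write(f"{pad}{element.tail.strip()}\n")


root = ET.fromstring("<html><body><p>Hi <b>you</b> there</p></body></html>")
buffer = io.StringIO()
print_element(root, 0, 2, buffer)
print(buffer.getvalue())
```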
print_frame(debug='', *args)
Print debug output.
replace_all(replacements, string)
Replace unwanted strings with wanted strings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
replacements | list of tuple | (unwanted, wanted) string pairs. | required |
string | str | the string where replacements should be done. | required |
Returns:
Type | Description |
---|---|
str | string with the replacements applied. |
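A minimal sketch, assuming the pairs are applied one after another in list order. Because each replacement sees the output of the previous one, the order of the pairs can change the result.

```python
def replace_all(replacements, string):
    """Apply each (unwanted, wanted) pair to string, in list order (sketch)."""
    for unwanted, wanted in replacements:
        string = string.replace(unwanted, wanted)
    return string
```

For example, with the pairs `("a", "b")` then `("b", "c")`, the input `"ab"` becomes `"cc"`, since the second pair also rewrites the `b` produced by the first.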
run_in_parallel(function, max_workers, file_list, msg_format=_PARA_DEFAULT_MSG_FORMAT, *args, **kwargs)
Run `function` as many times as there are files in the `file_list`, in parallel. Each invocation gets one element of the `file_list`.
Conceptually, it's like `function(file) for file in file_list`, but in parallel. Uses a ProcessPoolExecutor with `max_workers`.
Any additional arguments (positional or keyword) given to `run_in_parallel` will be passed along to the `function`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function | Callable | The function to call. The first argument to the function is the file path. | required |
max_workers | int | How many worker processes to use. | required |
file_list | list[str] | The list of files (full paths). | required |
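A sketch of the pattern described above, with three labelled simplifications. It uses `ThreadPoolExecutor` instead of `ProcessPoolExecutor` so it runs from any entry point (a process pool needs picklable functions and, on spawn-based platforms, an `if __name__ == "__main__":` guard). It accepts `msg_format` but prints no progress messages, and its `None` default is a placeholder. Returning a dict of results keyed by file is an assumption, since the return value is not documented. Both executors share the `concurrent.futures.Executor` interface, so swapping in `ProcessPoolExecutor` is a one-line change.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(function, max_workers, file_list, msg_format=None, *args, **kwargs):
    """Call function(path, *args, **kwargs) for each path, concurrently.

    Sketch only: thread pool instead of process pool, msg_format accepted
    but unused, and the dict return value is an assumption.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # One task per file; extra positional/keyword arguments are forwarded.
        futures = {
            executor.submit(function, path, *args, **kwargs): path
            for path in file_list
        }
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results
```

Because `msg_format` precedes `*args` in the signature, extra arguments for `function` are easiest to pass by keyword, e.g. `run_in_parallel(convert, 4, paths, suffix=".xml")` (where `convert` is a hypothetical worker function).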
sanity_check(program_list)
Look for programs and files that are needed to do the analysis.
If they don't exist, raise an exception.
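A sketch under two assumptions: that a missing program is reported with this module's `ExecutableMissingError` (redefined locally here as a stand-in), and that programs are looked up on `$PATH` with `shutil.which`. The documented function also checks files, which this sketch omits; the error message text is hypothetical.

```python
import shutil


class ExecutableMissingError(Exception):
    """Stand-in for the module's exception of the same name."""


def sanity_check(program_list):
    """Raise ExecutableMissingError naming every program not found on $PATH."""
    # Collect all missing programs so the user sees them in one message.
    missing = [program for program in program_list if shutil.which(program) is None]
    if missing:
        raise ExecutableMissingError("Please install: " + ", ".join(missing))
```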
sort_by_value(table, reverse=False)
Sort the table by value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | dict | the dictionary that should be sorted. | required |
reverse | bool | whether or not to sort in reverse. | False |
Returns:
Type | Description |
---|---|
dict | the table sorted by value. |
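A one-line sketch, assuming the result is a new `dict` whose iteration order follows the values. This relies on dicts preserving insertion order, which Python guarantees from 3.7.

```python
def sort_by_value(table, reverse=False):
    """Return a new dict with table's items ordered by value (sketch)."""
    # Rebuilding a dict from the sorted (key, value) pairs keeps that order.
    return dict(sorted(table.items(), key=lambda item: item[1], reverse=reverse))
```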