html_cleaner

Script that writes a nicely indented HTML document.

Mainly used to debug the input to converter.HTMLContentConverter.

main()

Convert an html file, and print the result to outfile.

Source code in /home/anders/projects/CorpusTools/corpustools/html_cleaner.py
def main():
    """Convert an html file, and print the result to outfile."""
    args = parse_args()

    c = htmlconverter.to_html_elt(args.inhtml)
    with open(args.outhtml, "w") as outfile:
        util.print_element(c, 0, 4, outfile)
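
To illustrate the kind of output main() is meant to produce, here is a minimal sketch of indented printing of an element tree. It uses only the standard library: `print_indented` is a hypothetical stand-in written for this example, not the implementation of `util.print_element`, and `xml.etree.ElementTree` stands in for whatever `htmlconverter.to_html_elt` returns. The argument order (element, level, indent, output file) is assumed from the call shown above.

```python
import io
import xml.etree.ElementTree as ET


def print_indented(element, level, indent, out):
    """Write element and its children to out, one item per line.

    Hypothetical stand-in for illustration only. It does not escape
    text or attribute values, so it is a debugging aid, not a serializer.
    """
    pad = " " * (level * indent)
    inner = pad + " " * indent
    attrs = "".join(f' {key}="{value}"' for key, value in element.attrib.items())
    out.write(f"{pad}<{element.tag}{attrs}>\n")

    # Text that precedes the first child.
    text = (element.text or "").strip()
    if text:
        out.write(f"{inner}{text}\n")

    for child in element:
        print_indented(child, level + 1, indent, out)
        # Text that follows the child but still belongs to this element.
        tail = (child.tail or "").strip()
        if tail:
            out.write(f"{inner}{tail}\n")

    out.write(f"{pad}</{element.tag}>\n")


root = ET.fromstring("<html><body><p>Hello <b>world</b>!</p></body></html>")
buffer = io.StringIO()
print_indented(root, 0, 4, buffer)
print(buffer.getvalue())
```

With a starting level of 0 and an indent of 4, each nesting level moves four spaces to the right, which makes the structure of the document easy to scan.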

parse_args()

Parse the commandline options.

Returns:

Type                Description
argparse.Namespace  the parsed commandline arguments

Source code in /home/anders/projects/CorpusTools/corpustools/html_cleaner.py
def parse_args():
    """Parse the commandline options.

    Returns:
        (argparse.Namespace): the parsed commandline arguments
    """
    parser = argparse.ArgumentParser(
        parents=[argparse_version.parser],
        description="Program to print out a nicely indented html document. "
        "This makes it easier to see the structure of it. This eases "
        "debugging the conversion of html documents.",
    )

    parser.add_argument("inhtml", help="The path of the html to indent.")
    parser.add_argument(
        "outhtml", help="The place where the indented html doc is written"
    )

    return parser.parse_args()
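
As a rough illustration of what the returned namespace looks like, the sketch below builds a comparable parser with the same two positional arguments. It leaves out `parents=[argparse_version.parser]`, which is project-specific, and the file names passed in are made up for the example.

```python
import argparse

# A simplified parser with the same positional arguments as parse_args().
parser = argparse.ArgumentParser(
    description="Program to print out a nicely indented html document."
)
parser.add_argument("inhtml", help="The path of the html to indent.")
parser.add_argument(
    "outhtml", help="The place where the indented html doc is written"
)

# Passing an explicit list avoids reading sys.argv, so the sketch can be
# run anywhere. The file names are hypothetical.
args = parser.parse_args(["page.html", "page_indented.html"])
print(args.inhtml)
print(args.outhtml)
```

Both arguments are positional and therefore required; argparse exits with a usage message if either is missing.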