convert_using_pandoc

Convert files supported by pandoc to the html format.

to_html_elt(filename)

Convert the content of the given file to an lxml element.

Parameters:

Name      Type  Description           Default
filename  str   path to the document  required

Returns:

Type                Description
lxml.etree.Element  An lxml element containing the html version of the given file.

Source code in /home/anders/projects/CorpusTools/corpustools/convert_using_pandoc.py
import subprocess

from lxml import html


def to_html_elt(filename):
    """Convert the content of the given file to an lxml element.

    Args:
        filename (str): path to the document

    Returns:
        (lxml.etree.Element): An lxml element containing the html
            version of the given file.
    """
    # With no output format given, pandoc writes an HTML fragment to stdout.
    html_body = subprocess.run(
        ["pandoc", filename], encoding="utf-8", capture_output=True
    ).stdout

    # Wrap the fragment in a full document so lxml returns an <html> root.
    return html.document_fromstring(f"<html><body>{html_body}</body></html>")
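
The function above reads pandoc's stdout without looking at the exit status, so a failed conversion would yield an element with an empty body. The following is a minimal sketch of a stricter variant of the pandoc call, not part of CorpusTools: the name `pandoc_html_body` and the `run` parameter are hypothetical, and `run` exists only so the function can be tried without pandoc installed.

```python
import subprocess


def pandoc_html_body(filename, run=subprocess.run):
    """Return pandoc's HTML output for filename, raising if pandoc fails.

    Hypothetical helper: `run` defaults to subprocess.run and can be
    replaced by a stand-in callable for experimentation.
    """
    result = run(
        # --to html makes the output format explicit instead of relying
        # on pandoc's default.
        ["pandoc", "--to", "html", filename],
        encoding="utf-8",
        capture_output=True,
    )
    if result.returncode != 0:
        # Surface pandoc's own error message instead of returning "".
        raise RuntimeError(
            f"pandoc failed on {filename}: {result.stderr.strip()}"
        )
    return result.stdout
```

The returned string could then be wrapped and parsed with `html.document_fromstring` exactly as `to_html_elt` does.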