convert_using_pandoc

Convert files supported by pandoc to the html format.

to_html_elt(filename)

Convert the content of the given file to an lxml element.

Parameters:

Name      Type  Description           Default
filename  Path  path to the document  required

Returns:

Type     Description
Element  An lxml element containing the html version of the given file.

Source code in corpustools/convert_using_pandoc.py
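# Imports the excerpt relies on; the exact import lines in the module are assumed.
import subprocess
from pathlib import Path

from lxml import etree, html


def to_html_elt(filename: Path) -> etree.Element:
    """Convert the content of the given file to an lxml element.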

    Args:
        filename: path to the document

    Returns:
        An lxml element containing the html version of the given file.
    """
    html_body = subprocess.run(
        ["pandoc", filename.as_posix()], encoding="utf-8", capture_output=True, check=False
    ).stdout

    return html.document_fromstring(f"<html><body>{html_body}</body></html>")
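
The source above runs pandoc with check=False and reads only stdout, so a non-zero exit from pandoc is not reported to the caller. The following is a minimal sketch of a stricter variant, not part of corpustools: the names pandoc_command, pandoc_html_fragment and wrap_fragment are hypothetical, and the explicit "--to html" flag is an addition (the original relies on pandoc's default output format). It returns the wrapped HTML as a string, which could then be passed to lxml's html.document_fromstring as in the original.

```python
import subprocess
from pathlib import Path


def pandoc_command(filename: Path, pandoc: str = "pandoc") -> list[str]:
    """Build the argument list for converting filename to HTML on stdout."""
    # "--to html" makes the output format explicit (an assumption added here;
    # the original call passes only the file name).
    return [pandoc, "--to", "html", filename.as_posix()]


def wrap_fragment(fragment: str) -> str:
    """Wrap an HTML fragment in the same skeleton the original function uses."""
    return f"<html><body>{fragment}</body></html>"


def pandoc_html_fragment(filename: Path, pandoc: str = "pandoc") -> str:
    """Run pandoc and return the wrapped HTML, raising if pandoc reports failure."""
    result = subprocess.run(
        pandoc_command(filename, pandoc),
        encoding="utf-8",
        capture_output=True,
        check=False,
    )
    # Unlike the original, surface a failed conversion instead of ignoring it.
    if result.returncode != 0:
        raise RuntimeError(f"pandoc failed on {filename}: {result.stderr.strip()}")
    return wrap_fragment(result.stdout)
```

Keeping the command construction and the wrapping step in separate helpers lets both be checked without pandoc installed; only pandoc_html_fragment needs the executable.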