How do I parse/extract information from an HWPX file in Python without Hancom Office?


As the title says, I have .hwpx files (Hancom Office documents) that I need to process in Python. I want to extract the text, tables, and metadata for a search-indexing pipeline.

The problem:

I don't have Hancom Office installed on my computer.

Is there a pure-Python library that can read .hwpx files directly and give me the text content, or export to Markdown or JSON?

For context: HWPX is Hancom's XML-based document format, widely used for Korean academic and workplace documents.
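As far as I know, an .hwpx file is a ZIP package of XML parts (the OWPML format, standardized as KS X 6101), so a standard-library-only sketch is possible. The sketch below makes several assumptions you should check by unzipping one of your files: body text sits in `Contents/section0.xml`, `section1.xml`, ...; paragraphs are `<hp:p>` elements whose text is in `<hp:t>` children; tables are `<hp:tbl>` with `<hp:tr>` rows and `<hp:tc>` cells; and metadata lives in a `<metadata>` block of `Contents/content.hpf`. It matches tags by local name so it does not depend on exact namespace URIs. The helper names (`extract_hwpx`, `to_json`) are hypothetical, not a library API.

```python
import json
import re
import xml.etree.ElementTree as ET
import zipfile

# Assumption: body sections are stored as Contents/section0.xml, section1.xml, ...
SECTION_RE = re.compile(r"^Contents/section(\d+)\.xml$")
# Assumption: package metadata lives in this OPF-style manifest.
MANIFEST = "Contents/content.hpf"


def _local(tag):
    """Drop the '{namespace}' prefix ElementTree adds, e.g. '{...}t' -> 't'."""
    return tag.rsplit("}", 1)[-1]


def _collect_text(elem):
    """Concatenate text from <t> elements under elem, skipping nested tables."""
    parts = []
    for child in elem:
        name = _local(child.tag)
        if name == "tbl":
            continue  # tables are extracted separately
        if name == "t":
            # itertext() also picks up text that follows inline children (e.g. tabs)
            parts.append("".join(child.itertext()))
        else:
            parts.append(_collect_text(child))
    return "".join(parts)


def _read_metadata(zf):
    """Read key/value pairs from the <metadata> block of the manifest, if present."""
    meta = {}
    if MANIFEST not in zf.namelist():
        return meta
    root = ET.fromstring(zf.read(MANIFEST))
    for block in root.iter():
        if _local(block.tag) != "metadata":
            continue
        for item in block:
            # <meta name="creator">Kim</meta> -> "creator"; <title>..</title> -> "title"
            key = item.get("name") or _local(item.tag)
            value = (item.text or "").strip()
            if value:
                meta[key] = value
    return meta


def extract_hwpx(path):
    """Return {"metadata": {...}, "paragraphs": [...], "tables": [rows, ...]}."""
    result = {"metadata": {}, "paragraphs": [], "tables": []}
    with zipfile.ZipFile(path) as zf:
        result["metadata"] = _read_metadata(zf)
        # Visit sections in numeric order (section2 before section10).
        sections = []
        for name in zf.namelist():
            m = SECTION_RE.match(name)
            if m:
                sections.append((int(m.group(1)), name))
        sections.sort()
        for _, name in sections:
            root = ET.fromstring(zf.read(name))
            # Assumption: top-level paragraphs are direct children of the section root.
            for p in root:
                if _local(p.tag) == "p":
                    text = _collect_text(p).strip()
                    if text:
                        result["paragraphs"].append(text)
            # Every table (including tables nested in cells) as rows of cell strings.
            for tbl in root.iter():
                if _local(tbl.tag) == "tbl":
                    rows = [
                        [_collect_text(tc).strip() for tc in tr if _local(tc.tag) == "tc"]
                        for tr in tbl
                        if _local(tr.tag) == "tr"
                    ]
                    result["tables"].append(rows)
    return result


def to_json(path):
    """Serialize the extraction result for an indexer, keeping Hangul unescaped."""
    return json.dumps(extract_hwpx(path), ensure_ascii=False, indent=2)
```

Calling `to_json("example.hwpx")` returns a JSON string with `metadata`, `paragraphs`, and `tables` keys that could be fed to an indexer. Paragraph text deliberately excludes table contents so cells are not indexed twice. If the files come from untrusted sources, `defusedxml.ElementTree.fromstring` can be swapped in for `ET.fromstring`.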
