OutOfMemoryError when converting docx to PDF with Docx4J


I have a .docx file that I need to convert to PDF. I use Docx4J for the conversion, with these dependencies:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>11.5.7</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>11.5.9</version>
</dependency>

(Version 11.5.7 of docx4j-JAXB-ReferenceImpl is imposed by another framework that I am forced to use, while 11.5.9 is the latest version I am allowed to use for the supporting dependency docx4j-export-fo. The version mismatch between the two artifacts of the same group is not an issue.)

Constraints:

- I am open to alternatives, but cannot use any solution that relies on iTextPdf.
- When developing I can use any amount of memory, but when deploying the application I have a hard limit of only 512 MB as -Xmx for the JVM.

    try (final var docxInputStream = new ByteArrayInputStream(fileAsDocx);
         final var baos = new ByteArrayOutputStream()) {
        final var wordMlPackage = WordprocessingMLPackage.load(docxInputStream);
        Docx4J.toPDF(wordMlPackage, baos); // <-- issue happening right here
        return baos.toByteArray();
    }

Docx4J.toPDF(...) consumes a lot of heap memory, which leads to an OutOfMemoryError in the deployment scenario once the file to convert becomes too big.

So far, the maximum amount of allocated heap memory I have recorded with jProfiler (run from IntelliJ IDEA) is around 2 GB, when testing the application locally.

[Image: analysis of memory consumed by the application, run locally, with jProfiler]
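To reproduce the measurement without a profiler, and under the same limit as in deployment, heap usage can also be read from the JVM's own management beans. The following is only a minimal sketch using the standard java.lang.management API. The class name HeapProbe and its methods are hypothetical helpers, not part of Docx4J, and the 16 MB allocation in main is a placeholder for the real conversion call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.function.Supplier;

/** Hypothetical helper: reports heap figures around a single step. */
public final class HeapProbe {

    private HeapProbe() {}

    /** Upper bound the heap may grow to (the value controlled by -Xmx), in bytes. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently in use, in bytes, as reported by the JVM. */
    public static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Resets the recorded peak of every heap memory pool to its current usage. */
    public static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
    }

    /**
     * Sum of the per-pool peaks since the last reset. Pools may peak at
     * different moments, so this is an upper estimate, not an exact peak.
     */
    public static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) {
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    /** Runs the step and prints the heap figures observed around it. */
    public static <T> T measure(String label, Supplier<T> step) {
        resetPeaks();
        T result = step.get();
        System.out.printf("%s: used %d MB, peak estimate %d MB, max %d MB%n",
                label, usedHeapBytes() >> 20, peakHeapBytes() >> 20, maxHeapBytes() >> 20);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder workload; the real conversion call would go here instead.
        byte[] block = measure("allocate 16 MB", () -> new byte[16 * 1024 * 1024]);
        System.out.println("allocated " + block.length + " bytes");
    }
}
```

Starting the JVM with -Xmx512m while running the test locally should make the OutOfMemoryError reproducible outside the deployment environment.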

The line Docx4J.toPDF(...) is the bottleneck of the block of code; in the same test case that consumed 2 GB, the execution time of that single line.

Additionally:

- The inputs and outputs of the block already use streams, so as to avoid loading and writing entire files: this part is therefore already optimised.
- I cannot use xDocReport as a solution, because xDocReport internally uses iTextPdf; this question is therefore not a duplicate of "Converting Docx to image using Docx4j and PdfBox causes OutOfMemoryError".
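One caveat on the stream point: ByteArrayInputStream and ByteArrayOutputStream still keep the whole document and the whole resulting PDF on the heap. A file-backed variant could look roughly like the sketch below. It assumes a hypothetical Converter interface standing in for the load-and-convert step, and the converter used in main merely copies bytes. This would only remove the buffers around the call, not whatever Docx4J.toPDF(...) allocates internally.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public final class FileBackedConversion {

    private FileBackedConversion() {}

    /** Hypothetical stand-in for the conversion step (load the docx, write the PDF). */
    @FunctionalInterface
    public interface Converter {
        void convert(InputStream in, OutputStream out) throws Exception;
    }

    /**
     * Streams the source file through the converter into a temporary file,
     * so neither the input nor the output is held in a byte array.
     */
    public static Path convertToTempFile(Path source, Converter converter) throws Exception {
        Path target = Files.createTempFile("converted-", ".pdf");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            converter.convert(in, out);
        } catch (Exception e) {
            // Both streams are already closed here; drop the partial output.
            Files.deleteIfExists(target);
            throw e;
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Path source = Files.createTempFile("input-", ".bin");
        Files.write(source, new byte[] {1, 2, 3});
        // Placeholder converter that copies bytes unchanged; the real one
        // would load the package from `in` and write the PDF to `out`.
        Path result = convertToTempFile(source, (in, out) -> in.transferTo(out));
        System.out.println("wrote " + Files.size(result) + " bytes to " + result);
        Files.deleteIfExists(source);
        Files.deleteIfExists(result);
    }
}
```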