DOCX to PDF: SaveAs2 vs ExportAsFixedFormat vs PrintOut

て烟熏妆下的殇ゞ 提交于 2021-02-08 10:23:22

问题


I have the tiny goal to convert a huge load of docx files to pdfs using C# and .NET without opening Word (visibly) and without using any third party library (less components to manage and less money to spend). At the moment, I am trying to correctly convert a single document, which has to be as efficient as possible in order to quickly convert the huge load mentioned before (several thousand docx files, each between 100 and 300 kb).

I am

using Word = Microsoft.Office.Interop.Word;

and I am creating an instance of the Word application as follows
(explicitly without ScreenUpdating = false due to this option causing custom borders not to be converted into the result):

private static Word.Application word = new Word.Application()
{
    Visible = false
};

as well as opening the document to be converted the following way

Word.Document docxFile = word.Documents.Open(sourcePath, Visible: false);

There are 3 ways I have successfully used to convert a docx file of 100 kb to a pdf:

Microsoft.Office.Interop.Word.Document.SaveAs2

docxFile.SaveAs2(FileName: outputPath,
                FileFormat: Word.WdSaveFormat.wdFormatPDF);
Duration     |  pdf size
————————————————————————
757,5307 ms  |  277 kb

Microsoft.Office.Interop.Word.Document.ExportAsFixedFormat

docxFile.ExportAsFixedFormat(OutputFileName: outputPath,
                            ExportFormat: Word.WdExportFormat.wdExportFormatPDF,
                            OptimizeFor: Word.WdExportOptimizeFor.wdExportOptimizeForOnScreen);
Duration      |  pdf size
————————————————————————
783,51333 ms  |  285 kb

Microsoft.Office.Interop.Word.Document.PrintOut

docxFile.Activate();
docxFile.PrintOut(
    OutputFileName: outputPath,
    PrintToFile: true
);
Duration     |  pdf size
————————————————————————
998,5403 ms  |  290 kb

This last option has two additional downsides:

  • it opens a small dialog or popup that shows the printing progress, I guess that causes the slightly extended run time
  • the document has to be activated before by docxFile.Activate(), otherwise a COMException is thrown

Time measurement / estimation:
I simply took a DateTime.Now before starting the conversion on an already opened document and took another DateTime.Now after having closed that document. Then I subtracted the first from the second one:

DateTime conversionBegin = DateTime.Now;
// conversion followed by closing the document
...
DateTime conversionEnd = DateTime.Now;

TimeSpan conversionTime = conversionEnd.Subtract(conversionBegin);
Console.WriteLine("Conversion time: " + conversionTime.TotalMilliseconds + " ms");

I am aware of this not being a very reliable time measurement, but I don't really need one. I just wanted to see if there are significant differences.

That all leads to the (one) question...

Why does the content of each of the resulting pdfs look exactly the same but the conversion time is different and the resulting files have different sizes?

Maybe I remove the following text to avoid close votes, but for now:
This is just a request for additional information, the question to be answered is the one above
:
I would love to additionally read opinions, hints, recommendations or advices targeting the questions

What is the preferred way to convert a docx to a pdf when it comes to several thousand conversions in one run?
and
What parameters or parameter values of the methods might improve the conversion time?

来源:https://stackoverflow.com/questions/60991409/docx-to-pdf-saveas2-vs-exportasfixedformat-vs-printout

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!