Convert DOC / DOCX to PNG [closed]

Posted by 拥有回忆 on 2019-12-18 18:53:47

Question


I am trying to create a web service that will convert a doc/docx to png format.

The problem is that I can't find any library, or anything close to one, that does what I need: I am looking for something free and not dependent on Office (the server where the app will run does not have Office installed).

Is there anything that can help me achieve this? Or must I choose between something Office-dependent (like Interop, which I've read is a really bad idea on a server) and something that isn't free?

Thanks


Answer 1:


I know this is most likely not what you want, since it is not free.

But Aspose can do what you need.

Spire.doc too. Again, not free.

Aspose:

using System;
using System.IO;
using System.Reflection;
using Aspose.Words;
using Aspose.Words.Saving;

// Resolve the sample data directory relative to the executing assembly.
string exeDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) + Path.DirectorySeparatorChar;
string dataDir = new Uri(new Uri(exeDir), @"../../Data/").LocalPath;

// Open the document.
Document doc = new Document(dataDir + "SaveAsPNG.doc");

// Create an ImageSaveOptions object to pass to the Save method.
ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Png);
options.Resolution = 160;

// Save each page of the document as a separate PNG file.
for (int i = 0; i < doc.PageCount; i++)
{
    options.PageIndex = i;
    doc.Save(string.Format("{0}{1}SaveAsPNG out.Png", dataDir, i), options);
}

Spire.doc (WPF):

using System.Drawing;
using System.IO;
using System.Windows;
using System.Windows.Media.Imaging;
using Spire.Doc;
using Spire.Doc.Documents;

namespace Word2Image
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {
            Document doc = new Document("sample.docx", FileFormat.Docx2010);
            BitmapSource[] bss = doc.SaveToImages(ImageType.Bitmap);
            for (int i = 0; i < bss.Length; i++)
            {
                SourceToBitmap(bss[i]).Save(string.Format("img-{0}.png", i));
            }
        }

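        private Bitmap SourceToBitmap(BitmapSource source)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                // Encode the WPF bitmap as PNG into memory.
                PngBitmapEncoder encoder = new PngBitmapEncoder();
                encoder.Frames.Add(BitmapFrame.Create(source));
                encoder.Save(ms);
                ms.Position = 0;

                // GDI+ needs the stream to stay open for the lifetime of a Bitmap
                // built from it, so return a copy that does not depend on the stream.
                using (Bitmap fromStream = new Bitmap(ms))
                {
                    return new Bitmap(fromStream);
                }
            }
        }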
    }
}



Answer 2:


Yes, complex file-format conversions like this are usually best handled by specialized third-party libraries such as the ones mentioned above or, for example, DevExpress Document Automation:

using System;
using System.Drawing.Imaging;
using System.IO;
using DevExpress.XtraPrinting;
using DevExpress.XtraRichEdit;

using(MemoryStream streamWithWordFileContent = new MemoryStream()) {
    // Populate streamWithWordFileContent with your DOC / DOCX file content,
    // then rewind it (Position = 0) before loading.

    RichEditDocumentServer richContentConverter = new RichEditDocumentServer();
    // DocumentFormat.Doc is for .doc input; use DocumentFormat.OpenXml for .docx.
    richContentConverter.LoadDocument(streamWithWordFileContent, DocumentFormat.Doc);

    //Save
    PrintableComponentLink pcl = new PrintableComponentLink(new PrintingSystem());
    pcl.Component = richContentConverter;
    pcl.CreateDocument();

    ImageExportOptions options = new ImageExportOptions(ImageFormat.Png);

    //Paging
    //options.ExportMode = ImageExportMode.SingleFilePageByPage;
    //options.PageRange = "1";

    pcl.ExportToImage(MapPath(@"~/DocumentAsImageOnDisk.png"), options);
}



Answer 3:


Install LibreOffice on your server. Recent versions of LibreOffice have a command-line interface that can save your document as a PDF. (libreoffice --headless --convert-to pdf filename.doc[x])

Then use, for example, ImageMagick or LibreOffice Draw's conversion options to convert the PDF to an image.
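
The two steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: `libreoffice` and ImageMagick's `convert` are assumed to be on the PATH (ImageMagick 7 prefers the `magick` command, and reading PDFs normally requires Ghostscript to be installed). The function name `doc_to_png`, the `run` helper and the `DRY_RUN` switch are hypothetical names introduced here, not part of either tool.

```shell
# Sketch: DOC/DOCX -> PDF -> one PNG per page.
# Assumes `libreoffice` and ImageMagick's `convert` are on PATH.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: doc_to_png <input.doc|input.docx> [output-dir]
# The body runs in a subshell, so its variables stay local to the call.
doc_to_png() (
    input=$1
    outdir=${2:-.}
    base=$(basename "$input")
    base=${base%.*}   # strip the .doc / .docx extension

    # Step 1: headless LibreOffice writes <outdir>/<base>.pdf.
    # Step 2: ImageMagick rasterizes each page at 150 DPI to <base>-0.png, <base>-1.png, ...
    run libreoffice --headless --convert-to pdf --outdir "$outdir" "$input" &&
    run convert -density 150 "$outdir/$base.pdf" "$outdir/$base-%d.png"
)
```

Calling `doc_to_png report.docx /tmp/out` would then leave the PDF and the per-page PNG files in /tmp/out. The 150 DPI value is an arbitrary choice; raise it for sharper output.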




Answer 4:


I think the best way to do it for free and without an Office client is a 3-step process: convert doc/docx to HTML, convert HTML to PDF, then convert PDF to PNG.

Open XML will get you past the first post. This does not require any installed Office clients and there is a really good resource that can help you put together the code to solve this first step (http://openxmldeveloper.org/). However I don't think it can solve the PDF/PNG problem. Hence,

iTextSharp will do the free PDF conversion for you. But it can't go from PDF to PNG. So lastly,

GhostScript.NET will get you over the finish line.
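
For reference, the last step (PDF to PNG) can also be done with the plain Ghostscript command-line tool, which Ghostscript.NET wraps. The sketch below is an illustration, not the Ghostscript.NET API: it assumes the `gs` binary is on the PATH (on Windows the console binary is usually `gswin64c`), and `pdf_to_png`, `run` and `DRY_RUN` are hypothetical names used only for this example.

```shell
# Sketch: render every page of a PDF to PNG with the Ghostscript CLI.
# Assumes `gs` is on PATH.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: pdf_to_png <input.pdf> [output-prefix] [dpi]
pdf_to_png() (
    pdf=$1
    prefix=${2:-page}
    dpi=${3:-150}

    # png16m is Ghostscript's 24-bit RGB PNG device; %03d in the output
    # name is replaced by the page number (page-001.png, page-002.png, ...).
    run gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m "-r$dpi" \
        "-sOutputFile=$prefix-%03d.png" "$pdf"
)
```

For example, `pdf_to_png report.pdf out/page 200` would render report.pdf at 200 DPI into out/page-001.png and so on.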

These are the links I collated which seem to be the most useful:

  • Half-working way of converting docx to html: How to convert docx to html file using open xml with formatting
  • Off-topic question with an example of how to use Ghostscript to convert to PNG: Convert PDF to JPG / Images without using a specific C# Library
  • Another link that uses Ghostscript: Is it possible to convert PDF page to Image using itextSharp?

I get the feeling no one has ever done this using free tools. If you succeed, please share your code on GitHub :)




Answer 5:


If installing a virtual PNG printer on your system is an option, you could consider software such as PDFCreator (which can also print to PNG) or something similar.




Answer 6:


Consider converting docx to HTML dynamically using PowerTools (or even Office VSTO, which will be fast) and then using wkhtmltopdf (directly, or through Pechkin or a similar wrapper) to render a PNG from the HTML. I've written about why wkhtmltopdf is better than, for example, iTextSharp here. By the way, I think the best commercial library for working with doc/docx is TxText; it's really awesome, and you can do anything you want with it.
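
As a rough sketch of the HTML-to-PNG step: the wkhtmltopdf distribution ships a sibling tool, `wkhtmltoimage`, that renders HTML to an image. The example assumes `wkhtmltoimage` is on the PATH and that the docx has already been converted to an HTML file. `html_to_png`, `run` and `DRY_RUN` are hypothetical names used only here. Note that the tool renders the whole document as one tall image rather than one image per page.

```shell
# Sketch: render an HTML file to a single PNG with wkhtmltoimage.
# Assumes `wkhtmltoimage` (shipped with wkhtmltopdf) is on PATH.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: html_to_png <input.html> <output.png> [viewport-width-in-pixels]
html_to_png() (
    html=$1
    png=$2
    width=${3:-1024}

    # --format selects the output image type; --width sets the virtual
    # browser width used for layout.
    run wkhtmltoimage --format png --width "$width" "$html" "$png"
)
```

For example, `html_to_png report.html report.png 800` would lay the page out at 800 pixels wide and write report.png.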



Source: https://stackoverflow.com/questions/33217591/convert-doc-docx-to-png
