When decoding an image within a PDF as FlateDecode via iTextSharp, the image is distorted and I can't seem to figure out why.
The recognized bpp is
If you're able to use the latest version (5.1.3), the API to extract FlateDecode and other image types has been simplified using the iTextSharp.text.pdf.parser namespace. Basically you use a PdfReaderContentParser to help you parse the PDF document, then you implement the IRenderListener interface specifically (in this case) to deal with images. Here's a working example HTTP handler:
<%@ WebHandler Language="C#" Class="bmpExtract" %>
using System;
using System.Collections.Generic;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class bmpExtract : IHttpHandler {
    public void ProcessRequest (HttpContext context) {
        HttpServerUtility Server = context.Server;
        HttpResponse Response = context.Response;
        PdfReader reader = new PdfReader(Server.MapPath("./bmp.pdf"));
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        MyImageRenderListener listener = new MyImageRenderListener();
        // Walk every page; the listener collects each image it encounters.
        for (int i = 1; i <= reader.NumberOfPages; i++) {
            parser.ProcessContent(i, listener);
        }
        // Write each collected image to disk under its generated name.
        for (int i = 0; i < listener.Images.Count; ++i) {
            string path = Server.MapPath("./" + listener.ImageNames[i]);
            using (FileStream fs = new FileStream(
                path, FileMode.Create, FileAccess.Write
            ))
            {
                fs.Write(listener.Images[i], 0, listener.Images[i].Length);
            }
        }
    }

    public bool IsReusable { get { return false; } }

    public class MyImageRenderListener : IRenderListener {
        public void RenderText(TextRenderInfo renderInfo) { }
        public void BeginTextBlock() { }
        public void EndTextBlock() { }

        public List<byte[]> Images = new List<byte[]>();
        public List<string> ImageNames = new List<string>();

        public void RenderImage(ImageRenderInfo renderInfo) {
            PdfImageObject image = null;
            try {
                image = renderInfo.GetImage();
                if (image == null) return;

                ImageNames.Add(string.Format(
                    "Image{0}.{1}", renderInfo.GetRef().Number, image.GetFileType()
                ));
                using (MemoryStream ms = new MemoryStream(image.GetImageAsBytes())) {
                    Images.Add(ms.ToArray());
                }
            }
            catch (IOException ie) {
                /*
                 * pass-through; image type not supported by iText[Sharp]; e.g. jbig2
                 */
            }
        }
    }
}
The iText[Sharp] development team is still working on the implementation, so I can't say for sure whether it will work in your case. It does work on the simple example PDF used above, and on a couple of other PDFs with bitmap images that I tried.
EDIT: I've been experimenting with the new API too and made a mistake in the original code example above: I should have initialized the PdfImageObject to null outside the try..catch block. The correction has been made above.
Also, when I use the above code on an unsupported image type (e.g. jbig2), I get a different exception: "The color depth XX is not supported", where "XX" is a number. iTextSharp does support FlateDecode in all the examples I've tried (though I know that's not helping you in this case).
Is the PDF produced by third-party (non-Adobe) software? From what I've read in the book, some third-party vendors produce PDFs that aren't completely up to spec, and iText[Sharp] can't deal with some of these PDFs, while Adobe products can. IIRC, I've seen cases on the iText mailing list where PDFs generated by Crystal Reports caused problems; here's one thread.
Is there any way you can generate a test PDF with the software you're using that contains some non-sensitive FlateDecode image(s)? Then maybe someone here could help a little better.
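Try copying your data row by row; that may solve the problem. A Bitmap pads each scan line to a multiple of 4 bytes (its Stride), whereas each row in the decoded PDF stream is padded only to a whole byte, so copying the whole buffer in one call can skew every row after the first.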
// Requires System, System.Drawing, System.Drawing.Imaging,
// System.Runtime.InteropServices and iTextSharp.text.pdf.
int w = imgObj.GetAsNumber(PdfName.WIDTH).IntValue;
int h = imgObj.GetAsNumber(PdfName.HEIGHT).IntValue;
int bpp = imgObj.GetAsNumber(PdfName.BITSPERCOMPONENT).IntValue;
// Hard-coded for 1-bit images; pick a format that matches bpp otherwise.
var pixelFormat = PixelFormat.Format1bppIndexed;
byte[] rawBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObj);
byte[] decodedBytes = PdfReader.FlateDecode(rawBytes);
byte[] streamBytes = PdfReader.DecodePredictor(decodedBytes, imgObj.GetAsDict(PdfName.DECODEPARMS));
// byte[] streamBytes = PdfReader.GetStreamBytes((PRStream)imgObj); // same result as above 3 lines of code.
using (Bitmap bmp = new Bitmap(w, h, pixelFormat))
{
    var bmpData = bmp.LockBits(new Rectangle(0, 0, w, h), ImageLockMode.WriteOnly, pixelFormat);
    // Bytes per row in the PDF stream (rows are padded to a whole byte).
    int length = (int)Math.Ceiling(w * bpp / 8.0);
    for (int i = 0; i < h; i++)
    {
        int offset = i * length;              // start of row i in the stream
        int scanOffset = i * bmpData.Stride;  // start of row i in the bitmap
        // IntPtr.Add works for both 32-bit and 64-bit pointers.
        Marshal.Copy(streamBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
    }
    bmp.UnlockBits(bmpData);
    bmp.Save(fileName);
}
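To see why the per-row copy matters, here is a minimal, self-contained sketch of the row arithmetic without any iTextSharp or System.Drawing dependency. The class and method names (RowPadding, PackedRowLength, AlignedStride, CopyRows) are made up for illustration, and it assumes a single-component image, so bits per pixel equals bits per component, and a bitmap whose rows are padded to a multiple of 4 bytes.

```csharp
using System;

public static class RowPadding
{
    // Bytes per row in the decoded PDF stream: each row is padded to a whole byte.
    public static int PackedRowLength(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 7) / 8;
    }

    // Bytes per row in the bitmap: each row is padded to a multiple of 4 bytes.
    public static int AlignedStride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Copies tightly packed rows into a buffer whose rows start every `stride` bytes.
    public static byte[] CopyRows(byte[] packed, int width, int height, int bitsPerPixel)
    {
        int rowLength = PackedRowLength(width, bitsPerPixel);
        int stride = AlignedStride(width, bitsPerPixel);
        byte[] padded = new byte[stride * height];
        for (int row = 0; row < height; row++)
        {
            Buffer.BlockCopy(packed, row * rowLength, padded, row * stride, rowLength);
        }
        return padded;
    }

    public static void Main()
    {
        // Hypothetical 10x2 image at 1 bit per pixel: 2 bytes per packed row.
        byte[] packed = { 0xAA, 0x80, 0x55, 0x40 };
        byte[] padded = CopyRows(packed, 10, 2, 1);
        Console.WriteLine("{0} -> {1} bytes per row",
            PackedRowLength(10, 1), AlignedStride(10, 1));
        Console.WriteLine(BitConverter.ToString(padded));
    }
}
```

For this 10-pixel-wide, 1-bit example the packed row is 2 bytes and the aligned stride is 4, so the second row has to start at byte 4 of the destination, not byte 2. A single bulk copy would put it at byte 2, which is the kind of skew that shows up as a distorted image.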