pdfbox

convert pdf to svg

五迷三道 submitted on 2019-11-27 10:23:46
I want to convert PDF to SVG. Please suggest some libraries or executables that can do this efficiently. I have written my own Java program using the Apache PDFBox and Batik libraries:
PDDocument document = PDDocument.load( pdfFile );
DOMImplementation domImpl = GenericDOMImplementation.getDOMImplementation();
// Create an instance of org.w3c.dom.Document.
String svgNS = "http://www.w3.org/2000/svg";
Document svgDocument = domImpl.createDocument(svgNS, "svg", null);
SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument);
ctx.setEmbeddedFontsOn(true);
// Ask the test

Converting PDF to multipage tiff (Group 4)

跟風遠走 submitted on 2019-11-27 09:31:38
I'm trying to convert PDFs, represented by the org.apache.pdfbox.pdmodel.PDDocument class, to a multipage TIFF with Group 4 compression at 300 dpi, using the icafe library ( https://github.com/dragon66/icafe/ ). The sample code works for me at 288 dpi but, strangely, NOT at 300 dpi: the exported TIFF is entirely white. Does anybody have an idea what the issue is here? The sample PDF I use in the example is located here: http://www.bergophil.ch/a.pdf
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
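
For reference when comparing the two resolutions: PDF user space defaults to 72 units per inch, so a target DPI corresponds to a render scale of dpi / 72. The sketch below only does that arithmetic. The class name RenderScale and the US Letter page size are assumptions for illustration, and it does not claim to explain the white output.

```java
final class RenderScale {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Scale factor to apply to a page so that it renders at the given DPI. */
    static double scaleFor(double dpi) {
        return dpi / POINTS_PER_INCH;
    }

    /** Pixel length of a page dimension given in points, rendered at the given DPI. */
    static int pixelsFor(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // US Letter (612 x 792 pt) is assumed here; the sample PDF may differ.
        double widthPt = 612;
        double heightPt = 792;
        for (double dpi : new double[] {288, 300}) {
            System.out.printf("%.0f dpi: scale %.4f, %d x %d px%n",
                    dpi, scaleFor(dpi), pixelsFor(widthPt, dpi), pixelsFor(heightPt, dpi));
        }
    }
}
```

288 dpi gives an integer scale of exactly 4, while 300 dpi gives a fractional one, which may be worth keeping in mind when checking how the image dimensions are passed on to the TIFF writer.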

Writing Arabic with PDFBOX with correct characters presentation form without being separated

扶醉桌前 submitted on 2019-11-27 09:09:38
I'm trying to generate a PDF that contains Arabic text using Apache PDFBox, but the text comes out as separated characters, because the given Arabic string is parsed into a sequence of general 'official' Unicode characters, which are equivalent to the isolated forms of the Arabic letters. Here is an example:
Target text to write to the PDF (the expected output in the PDF file) -> جملة بالعربي
What I get in the PDF file ->
I tried several methods without success. Here are some of them:
1. Converting String to Stream of bits and trying to extract right values
2. Treating String a sequence of bytes with UTF-8 &&
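
To make the distinction between the general Arabic letters and their contextual presentation forms concrete, here is a small inspection sketch that uses only the JDK. The class name ArabicForms is hypothetical. It does not perform shaping (choosing initial, medial, or final forms); it only reports which Unicode block each code point belongs to and shows that NFKC normalization folds a presentation form back to its base letter.

```java
import java.text.Normalizer;

final class ArabicForms {
    /** True if the code point lies in one of the Arabic Presentation Forms blocks. */
    static boolean isPresentationForm(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    /** Folds presentation forms back to the general letters via compatibility normalization. */
    static String toBaseLetters(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    /** Lists each code point of the text with a note on whether it is a presentation form. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X %s%n",
                cp, isPresentationForm(cp) ? "presentation form" : "other")));
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+062C is the general letter JEEM; U+FE9F is its initial presentation form.
        System.out.print(describe("\u062C\uFE9F"));
    }
}
```

Running describe on the string that is about to be written shows whether it still contains only general letters, which is the situation described in the question.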

PDFBox Pdf to Image losing QR Code “ColorSpace Pattern doesn't provide a non-stroking color”

落花浮王杯 submitted on 2019-11-27 08:43:22
Question: Similar to this SO question: PDFBox - PDF to Image losing barcode. The PDF in question: https://drive.google.com/file/d/0B13zTPQR9uxscXRMWjhsZ0doa00/view?usp=sharing There is minimal text and a medium-sized QR code. I have tried many different solutions to convert this PDF page to an image using PDFBox/ImageIO, but so far the QR code is always missing from the result. When I use PDFBox's PDFImageWriter I get this log: ColorSpace Pattern doesn't provide a non-stroking color, using white instead! I'm

Taking a screenshot of a scene or a portion of a scene in JavaFx 2.2

守給你的承諾、 submitted on 2019-11-27 08:14:08
Question: I've managed to create a WritableImage using
WritableImage snapshot = obj.getScene().snapshot(null);
Now I would like to output this screenshot to a PDF file. I've already managed to output text to a PDF with the Apache PDFBox library, using the following code:
PDDocument doc = null;
PDPage page = null;
try {
    doc = new PDDocument();
    page = new PDPage();
    doc.addPage(page);
    PDFont font = PDType1Font.HELVETICA_BOLD;
    PDPageContentStream content = new PDPageContentStream(doc, page);
    content.beginText();

PDFBox: Problem with converting pdf page into image

杀马特。学长 韩版系。学妹 submitted on 2019-11-27 07:54:41
My task is pretty simple: convert every page of a PDF file into an image. I tried the open source version of ICEpdf to generate the images, but it does not render them with the correct font, so I started using PDFBox instead. The code is the following:
PDDocument document = PDDocument.load(new File("testing.pdf"));
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
for (int i = 0; i < pages.size(); i++) {
    PDPage singlePage = pages.get(i);
    BufferedImage buffImage = convertToImage(singlePage, 8, 12);
    ImageIO.write(buffImage, "png", new File(PdfUtil.DATA_OUTPUT_DIR+
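
The per-page loop above can be sketched independently of the rendering call. In this sketch the renderer is passed in as a plain function from page index to BufferedImage, since the question's convertToImage helper is not shown. The class name PageExporter, the baseName parameter, and the zero-padded file naming are assumptions for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import javax.imageio.ImageIO;

final class PageExporter {
    /**
     * Renders pages 0..pageCount-1 with the supplied renderer and writes each
     * one as a PNG named baseName-001.png, baseName-002.png, and so on.
     */
    static List<File> exportPages(int pageCount, IntFunction<BufferedImage> renderer,
                                  File outputDir, String baseName) {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            BufferedImage image = renderer.apply(i);
            File target = new File(outputDir, String.format("%s-%03d.png", baseName, i + 1));
            try {
                // ImageIO.write returns false when no writer supports the format.
                if (!ImageIO.write(image, "png", target)) {
                    throw new IOException("No PNG writer available for " + target);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            written.add(target);
        }
        return written;
    }
}
```

With PDFBox, the renderer argument would wrap whatever page-to-image call the installed version provides. Checking the boolean returned by ImageIO.write avoids silently skipping a page.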

PDFBox : Maintaining PDF structure when extracting text

耗尽温柔 submitted on 2019-11-27 06:23:13
Question: I'm trying to extract text from a PDF that is full of tables. In some cases, a column is empty. When I extract the text from the PDF, the empty columns are skipped and replaced by whitespace, so my regular expressions can't tell that there was an empty column at that spot. Image for a better understanding: we can see that the columns aren't respected in the extracted text. Sample of my code that extracts the text from the PDF:
PDFTextStripper reader = new
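
One way to keep empty columns visible is to assign text to columns by horizontal position instead of parsing the flattened line. The following is a minimal sketch, assuming the x-coordinate of each text fragment is available (PDFBox exposes glyph positions through TextPosition) and that the column boundaries are known in advance. The ColumnAssigner and Fragment classes are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnAssigner {
    /** Hypothetical holder for one extracted text fragment and its left x-coordinate. */
    static final class Fragment {
        final double x;
        final String text;

        Fragment(double x, String text) {
            this.x = x;
            this.text = text;
        }
    }

    /**
     * Places each fragment into the column whose [left, right) range contains its x.
     * boundaries has one more entry than there are columns; columns that receive
     * no fragment stay as empty strings, so they remain detectable.
     */
    static List<String> toRow(List<Fragment> fragments, double[] boundaries) {
        String[] cells = new String[boundaries.length - 1];
        Arrays.fill(cells, "");
        for (Fragment f : fragments) {
            for (int col = 0; col < cells.length; col++) {
                if (f.x >= boundaries[col] && f.x < boundaries[col + 1]) {
                    cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
                    break;
                }
            }
        }
        return Arrays.asList(cells);
    }
}
```

For example, fragments at x = 50 and x = 300 with boundaries {0, 100, 200, 400} yield a three-cell row whose middle cell is empty, which a later parsing step can test for directly.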

PDFBox : PDPageContentStream's append mode misbehaving

十年热恋 submitted on 2019-11-27 05:21:29
I am drawing an image on one of the pages of a PDF. When I use
PDPageContentStream stream = new PDPageContentStream(doc, page);
to draw the image, everything works fine (see the image below). But when I use the constructor
PDPageContentStream(doc, page, true, true);
to create the PDPageContentStream and draw the image, the newly added image is flipped upside down. I don't understand what is going wrong here. PS: I am using the PdfBox-Android library.

Use the constructor that has a fifth parameter, which resets the graphics context.
public PDPageContentStream(PDDocument document, PDPage sourcePage, boolean appendContent, boolean
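
Why resetting the graphics context matters can be illustrated without PDFBox. In a PDF content stream, the "cm" operator modifies the current transformation matrix (CTM), and "q" / "Q" save and restore the graphics state. The sketch below models this with java.awt.geom.AffineTransform. The class GraphicsStateDemo is hypothetical, and the flipped y-axis left behind by the existing page content is an assumption about this particular PDF.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.util.ArrayDeque;
import java.util.Deque;

final class GraphicsStateDemo {
    private final Deque<AffineTransform> saved = new ArrayDeque<>();
    private AffineTransform ctm = new AffineTransform(); // starts as identity

    void save() { saved.push(new AffineTransform(ctm)); }   // models "q"
    void restore() { ctm = saved.pop(); }                   // models "Q"
    void concat(AffineTransform m) { ctm.concatenate(m); }  // models "cm"

    /** Where a point drawn now would end up on the page. */
    Point2D place(double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    /**
     * Simulates appending to a page whose existing content flips the y-axis
     * (assumed). If the existing content is wrapped in save/restore, the
     * appended drawing sees the identity CTM again.
     */
    static double yAfterAppend(boolean wrapExisting, double pageHeight, double y) {
        GraphicsStateDemo gs = new GraphicsStateDemo();
        if (wrapExisting) gs.save();
        gs.concat(new AffineTransform(1, 0, 0, -1, 0, pageHeight)); // existing content
        if (wrapExisting) gs.restore();
        return gs.place(100, y).getY(); // appended content
    }

    public static void main(String[] args) {
        System.out.println("without reset: y = " + yAfterAppend(false, 792, 100));
        System.out.println("with reset:    y = " + yAfterAppend(true, 792, 100));
    }
}
```

Without the save/restore pair, appended drawing inherits whatever transformation the earlier content left active, which is consistent with the image appearing upside down in append mode.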

remove invisible text from pdf using pdfbox

筅森魡賤 submitted on 2019-11-27 03:49:52
Question: Link to pdf. When I try to extract the text from the PDF above, I get a mixture of text that was invisible in the Evince viewer and text that was visible. In addition, some of the desired text is missing characters that were not missing in the viewer, such as the 'S' in 'FALCONS' and the many missing '½' characters. I believe this is due to interference from the invisible text, because when highlighting the PDF in the viewer, the invisible text can be seen overlapping the visible text. Is

How to add PDFBox to an Android project or suggest alternative

跟風遠走 submitted on 2019-11-27 03:45:55
I'm attempting to open an existing PDF file and then add another page to the document from within an Android application. On the added page, I need to add some text and an image. I want to give PDFBox a try. Other solutions such as iTextPDF aren't suitable for our company because of the licencing terms/price. I have a library project with the main code base, and also full and lite projects that reference the library project. I have downloaded the jar from http://pdfbox.apache.org/download.html and copied it into the library project's lib folder and added the pdfbox-app-1.6.0.jar file