pdfbox

PDFBox 2.0 RC3 — Find and replace text

£可爱£侵袭症+ posted on 2019-11-27 19:31:42
Question: How can one find and replace text inside a PDF document using PDFBox 2.0? The old example was pulled and its syntax no longer works, so I am wondering whether it is still possible and, if so, what the best way to go about it is. Thanks! Answer 1: You can try it like this: public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException { if (Strings.isEmpty(searchString) || Strings.isEmpty(replacement)) { return document; } PDPageTree pages = document

Add page as layer from separate pdf(different page size) using pdfbox

杀马特。学长 韩版系。学妹 posted on 2019-11-27 18:43:52
Question: How can I add a page from an external PDF document to a destination PDF if the pages have different sizes? Here is what I'd like to accomplish: I tried to use LayerUtility (as in the example "PDFBox LayerUtility - Importing layers into existing PDF"), but once I import the page from the external PDF the process hangs: PDDocument destinationPdfDoc = PDDocument.load(fileInputStream); PDDocument externalPdf = PDDocument.load(EXTERNAL PDF); List<PDPage> destinationPages = destinationPdfDoc.getDocumentCatalog()

How to get raw text from pdf file using java

雨燕双飞 posted on 2019-11-27 17:37:46
I have some PDF files. Using PDFBox, I converted them to text and stored the result in text files. From those text files I now want to remove hyperlinks, all special characters, blank lines, the headers and footers of the PDF files, and list markers such as "1)", "2)", "a)", bullets, etc. I want to get valid text line by line, like this: We propose OntoGain, a method for ontology learning from multi-word concept terms extracted from plain text. OntoGain follows an ontology learning process defined by distinct processing layers. Building upon plain term extraction a concept hierarchy is formed by clustering the extracted concepts. The
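Most of the cleanup steps listed in this question can be done with plain JDK regular expressions once the text has been extracted. The following is a minimal sketch, not a definitive implementation: the class name `ExtractedTextCleaner`, the three patterns, and the choice of which punctuation to keep are all assumptions made for illustration. Header and footer removal is not covered, since it depends on the layout of the specific PDFs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: cleans text that was already extracted from a PDF.
public class ExtractedTextCleaner {
    // URLs that start with http://, https:// or www. (assumed definition of "hyperlink")
    private static final Pattern URL =
            Pattern.compile("(?i)\\b(?:https?://|www\\.)\\S+");
    // Leading list markers such as "1)", "a)" or a bullet character
    private static final Pattern LIST_MARKER =
            Pattern.compile("^\\s*(?:\\d+\\)|[A-Za-z]\\)|[\\u2022\\u25AA\\u25CF*-])\\s+");
    // Everything except letters, digits, whitespace and basic punctuation (assumed whitelist)
    private static final Pattern SPECIAL =
            Pattern.compile("[^\\p{L}\\p{N}\\s.,;:!?'\"()-]");

    /** Returns the non-empty cleaned lines of the given raw text. */
    public static List<String> clean(String rawText) {
        List<String> result = new ArrayList<>();
        for (String line : rawText.split("\\R")) {
            String s = URL.matcher(line).replaceAll("");      // drop hyperlinks
            s = LIST_MARKER.matcher(s).replaceFirst("");      // drop "1)", "a)", bullets
            s = SPECIAL.matcher(s).replaceAll("");            // drop special characters
            s = s.replaceAll("\\s+", " ").trim();             // collapse leftover whitespace
            if (!s.isEmpty()) {                               // skip blank lines
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "1) See https://example.com for details\n\n\u2022 OntoGain follows a process.\n";
        for (String line : clean(raw)) {
            System.out.println(line);
        }
    }
}
```

The order of the steps matters: URLs are removed before the special-character filter, because that filter would otherwise strip the slashes and leave URL fragments behind.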

How to down scale content of a pdf?

岁酱吖の posted on 2019-11-27 16:32:15
I have a PDF which I need to downscale. The PDF is in A4 portrait mode; what I need is to shrink the content of the PDF to 5 % and put it into a new PDF, also A4 portrait. Converting the PDF to images, scaling them, and putting them back into a PDF is not an option. I am looking for a way to solve this in Java. Is there a way to solve this with PDFBox or iText? If you use iText 7, then this is an option: public void manipulatePdf(String src, String dest) throws IOException { PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest)); int n = pdfDoc.getNumberOfPages
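Whichever library does the actual page manipulation, the placement geometry is the same: scale the content about the lower-left origin, then translate it so that it sits centered on the page. The sketch below works this out with the JDK's `java.awt.geom.AffineTransform` only. The class name `ScaleOntoPage`, the rounded A4 dimensions, centering the result, and the example factor of 0.5 are assumptions for illustration; the factor is a parameter because the intended meaning of "5 %" is not stated in the question.

```java
import java.awt.geom.AffineTransform;
import java.util.Arrays;

// Hypothetical helper: computes a "shrink and center" matrix for a page.
public class ScaleOntoPage {
    // Approximate A4 size in PDF points (1/72 inch); rounded values assumed for illustration
    static final double A4_WIDTH = 595.0;
    static final double A4_HEIGHT = 842.0;

    /** Builds a transform that scales content by factor and centers it on the page. */
    public static AffineTransform centeredScale(double pageWidth, double pageHeight, double factor) {
        double tx = (pageWidth - pageWidth * factor) / 2.0;   // horizontal margin after shrinking
        double ty = (pageHeight - pageHeight * factor) / 2.0; // vertical margin after shrinking
        AffineTransform t = new AffineTransform();
        t.translate(tx, ty);      // applied to points second: move shrunken content to the center
        t.scale(factor, factor);  // applied to points first: shrink about the origin
        return t;
    }

    public static void main(String[] args) {
        AffineTransform t = centeredScale(A4_WIDTH, A4_HEIGHT, 0.5);
        double[] m = new double[6];
        t.getMatrix(m); // order a b c d e f, the same operand order as the PDF "cm" operator
        System.out.println(Arrays.toString(m));
    }
}
```

The six values returned by `getMatrix` can be passed to whatever matrix or transformation call the chosen PDF library offers when the original page is drawn onto the new A4 page.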

How to sign pdf in Java using pdfbox

半腔热情 posted on 2019-11-27 16:12:23
Question: I am trying to sign a PDF using the PDFBox libraries. I am stuck now and really need help. This is my code: private static void signPdf(PDDocument document) throws Exception { PDSignature sig = new PDSignature(); sig.setFilter(COSName.ADOBE_PPKLITE); sig.setSubFilter(COSName.ADBE_PKCS7_DETACHED); sig.setByteRange(new int[] {'a','a','a','a'}); sig.setContents(new byte[]{(byte) 23, (byte) 23, (byte) 23, (byte) 23}); SignatureOptions options = new SignatureOptions(); document.addSignature(sig, new

Identifying the text based on the output in PDF using PDFBOX

走远了吗. posted on 2019-11-27 15:54:49
I am using PDFBox to get the color information of the text in a PDF. I was able to get the output using the following code, but I am unsure what StrokingColor represents and what non-stroking color represents. Based on this, how do I decide which text has which color? Can anyone advise? My current output is like this: DeviceRGB DeviceCMYK java.awt.Color[r=63,g=240,b=0] java.awt.Color[r=35,g=31,b=32] 34.934998 31.11 31.875 PDDocument doc = null; try { doc = PDDocument.load(strFilepath); PDFStreamEngine engine = new PDFStreamEngine(ResourceLoader.loadProperties("org/apache/pdfbox

Extract Image from PDF using Java

徘徊边缘 posted on 2019-11-27 14:24:39
I need to extract only a barcode from a PDF (using a rectangle), without converting the whole PDF into an image. The image format can be JPG or PNG. zawhtut: You can use PDFBox: List pages = document.getDocumentCatalog().getAllPages(); Iterator iter = pages.iterator(); while( iter.hasNext() ) { PDPage page = (PDPage)iter.next(); PDResources resources = page.getResources(); Map images = resources.getImages(); if( images != null ) { Iterator imageIter = images.keySet().iterator(); while( imageIter.hasNext() ) { String key = (String)imageIter.next(); PDXObjectImage image = (PDXObjectImage)images.get( key );

Create pkcs7 signature from file digest

纵饮孤独 posted on 2019-11-27 14:15:13
Question: Currently I have a client-server application that, given a PDF file, signs it (with the server certificate), attaches the signature to the original file, and returns the output to the client (all of this is achieved with PDFBox). I have a signature handler, which is my external signing support (where content is the PDF file): public byte[] sign(InputStream content) throws IOException { try { System.out.println("Generating CMS signed data"); CMSSignedDataGenerator generator = new

Combining XFA with PDFBox

℡╲_俬逩灬. posted on 2019-11-27 13:21:17
Question: I would like to fill a PDF form with the PDFBox Java library. The PDF form was created with Adobe LiveCycle Designer, so it uses the XFA format. I tried to find resources about filling XFA PDF forms with PDFBox, but I haven't had any luck so far. I saw that a PDAcroForm.setXFA method is available in the API, but I don't see how to use it. Do you know whether it is possible to fill a PDF form with PDFBox? If yes, is there a code sample or tutorial anywhere that shows how to achieve this? If not, what are the best

How to extract text from a PDF file with Apache PDFBox

跟風遠走 posted on 2019-11-27 10:51:13
Question: I would like to extract text from a given PDF file with Apache PDFBox. I wrote this code: PDFTextStripper pdfStripper = null; PDDocument pdDoc = null; COSDocument cosDoc = null; File file = new File(filepath); PDFParser parser = new PDFParser(new FileInputStream(file)); parser.parse(); cosDoc = parser.getDocument(); pdfStripper = new PDFTextStripper(); pdDoc = new PDDocument(cosDoc); pdfStripper.setStartPage(1); pdfStripper.setEndPage(5); String parsedText = pdfStripper.getText(pdDoc); System