I am using PdfBox in Java to extract text from PDF files. Some of the input files provided are not valid and PDFTextStripper halts on these files. Is there a clean way to ch
You have to try this....
public boolean isPDF(File file){ file = new File("Demo.pdf"); Scanner input = new Scanner(new FileReader(file)); while (input.hasNextLine()) { final String checkline = input.nextLine(); if(checkline.contains("%PDF-")) { // a match! return true; } } return false; }