How to split a “TJ/Tj” operator in a PDF content stream without messing up the remaining text matrices?

↘锁芯ラ submitted on 2020-06-13 09:36:33

Question


I want to split the COSString operand of a TJ/Tj operator using PDFBox.

The current content stream of my PDF looks like this:

Desired output

or

What I tried:

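 // Imports use the PDFBox 2.x package names.
 import java.io.IOException;
 import java.io.OutputStream;
 import java.util.List;
 import org.apache.pdfbox.contentstream.operator.Operator;
 import org.apache.pdfbox.cos.COSArray;
 import org.apache.pdfbox.cos.COSFloat;
 import org.apache.pdfbox.cos.COSName;
 import org.apache.pdfbox.pdfparser.PDFStreamParser;
 import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDPage;
 import org.apache.pdfbox.pdmodel.common.PDStream;

 public static void SplitTj_TJ(int tj_ind, PDDocument document) throws IOException {
      PDPage page = document.getPage(0);
      PDFStreamParser parser = new PDFStreamParser(page);
      parser.parse();
      List<Object> tokens = parser.getTokens();
      // Check the token type before casting; casting first throws on a non-operator token.
      Object token = tokens.get(tj_ind);
      if (!(token instanceof Operator)) {
          return;
      }
      String name = ((Operator) token).getName();
      // Only an array operand is handled here; the operand of Tj is a COSString, not a COSArray.
      if (!(name.equals("TJ") || name.equals("Tj")) || !(tokens.get(tj_ind - 1) instanceof COSArray)) {
          return;
      }
      // Offsets are hard-coded for this particular stream.
      COSFloat dest_x = new COSFloat((float) 90.81199646);
      COSFloat dest_y = new COSFloat((float) 0);
      COSArray tj_array = (COSArray) tokens.get(tj_ind - 1);
      // Replace "[...] TJ" with "(first) Tj  dest_x dest_y Td  (second) Tj".
      tokens.remove(tj_ind);
      tokens.remove(tj_ind - 1);
      tokens.add(tj_ind - 1, tj_array.get(0));
      tokens.add(tj_ind, Operator.getOperator("Tj"));
      tj_array.remove(0);
      tokens.add(tj_ind + 1, dest_x);
      tokens.add(tj_ind + 2, dest_y);
      tokens.add(tj_ind + 3, Operator.getOperator("Td"));
      // After remove(0), index 1 holds the array's original third element.
      tokens.add(tj_ind + 4, tj_array.get(1));
      tokens.add(tj_ind + 5, Operator.getOperator("Tj"));
      // Overwrite the token at tj_ind + 9 with the negated offset
      // (presumably the x operand of a later Td; this index fits only this stream).
      tokens.remove(tj_ind + 9);
      tokens.add(tj_ind + 9, new COSFloat((float) -90.81199646));

      // Write the modified tokens into a new Flate-compressed content stream.
      PDStream newContents = new PDStream(document);
      OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
      ContentStreamWriter writer = new ContentStreamWriter(out);
      writer.writeTokens(tokens);
      System.out.println("Count at end :::::" + tokens.size());
      out.close();
      page.setContents(newContents);

      // Save the modified page as a separate document.
      PDDocument pdf = new PDDocument();
      pdf.addPage(page);
      pdf.save("D:/Testfiles/brigs11.pdf");
      pdf.close();
  }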

I am not sure this will work in all cases. What would generic code for this look like?

How can I achieve this using PDFBox, so that I can split any TJ/Tj operator, whatever text-positioning operators surround it, without messing up the rest of the stream?
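
One possible direction, which differs from the Td-based attempt above: in the PDF text model, Tj and TJ advance the text matrix but leave the text line matrix untouched, while Td, TD, T*, ' and " position relative to the text line matrix. Under that reading, a show-text operator can be split into two consecutive show-text operators with no Td in between, and later positioning operators should not need any compensation. The sketch below only models the token list with plain Java objects so that it runs without PDFBox; the class ShowTextSplitSketch, its Op type and the split method are hypothetical names, not part of the PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, PDFBox-free model of splitting one text-showing operator. */
public class ShowTextSplitSketch {

    /** Stand-in for a content stream operator such as Tj, TJ or Td. */
    public static final class Op {
        final String name;
        public Op(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /**
     * Returns a copy of tokens in which the Tj/TJ operator at opIndex is
     * replaced by two operators of the same kind. No Td is inserted.
     * For TJ, splitAt is an index into the operand array; for Tj it is a
     * character index into the operand string (an assumption of this model).
     */
    public static List<Object> split(List<Object> tokens, int opIndex, int splitAt) {
        Object candidate = tokens.get(opIndex);
        if (!(candidate instanceof Op)) {
            throw new IllegalArgumentException("token " + opIndex + " is not an operator");
        }
        String name = ((Op) candidate).name;
        Object operand = tokens.get(opIndex - 1);
        Object head;
        Object tail;
        if (name.equals("TJ") && operand instanceof List) {
            // Split the array of strings and kerning numbers at an element boundary.
            List<?> array = (List<?>) operand;
            head = new ArrayList<Object>(array.subList(0, splitAt));
            tail = new ArrayList<Object>(array.subList(splitAt, array.size()));
        } else if (name.equals("Tj") && operand instanceof String) {
            // Split the single string operand.
            String text = (String) operand;
            head = text.substring(0, splitAt);
            tail = text.substring(splitAt);
        } else {
            throw new IllegalArgumentException("expected Tj or TJ with a matching operand");
        }
        // Rebuild the list: tokens before the operand, the two halves, then the rest.
        List<Object> result = new ArrayList<Object>(tokens.subList(0, opIndex - 1));
        result.add(head);
        result.add(new Op(name));
        result.add(tail);
        result.add(new Op(name));
        result.addAll(tokens.subList(opIndex + 1, tokens.size()));
        return result;
    }

    public static void main(String[] args) {
        List<Object> array = new ArrayList<Object>();
        array.add("Hello");
        array.add(-120f);
        array.add("World");
        List<Object> tokens = new ArrayList<Object>();
        tokens.add(array);
        tokens.add(new Op("TJ"));
        tokens.add(0f);
        tokens.add(-14f);
        tokens.add(new Op("Td"));
        // The trailing "0 -14 Td" is copied through unchanged.
        System.out.println(split(tokens, 1, 2));
    }
}
```

With PDFBox the same idea would presumably operate on COSArray and COSString tokens from PDFStreamParser instead of List and String. A real COSString holds bytes rather than characters, so with multi-byte font encodings the split point would have to fall on a character-code boundary, which this sketch does not check.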

Source: https://stackoverflow.com/questions/61927174/pdf-content-stream-tj-tj-split-without-messing-the-remaining-text-matrices
