Edit an existing PDF file using iTextSharp

死守一世寂寞 2021-01-17 06:59

I have a PDF file which I am processing by converting it into text using the following code:

ITextExtractionStrategy strategy = new SimpleTextExtractionSt


      
      
        
2 answers

自闭症患者 (original poster) 2021-01-17 07:33

Too long to be a comment; added as answer.

My good fellow and peer Adi, it depends a lot on your PDF contents, and it is hard to write a generic solution for something like this. What does currentText contain? Can you give an example of it? Also, if you have a lot of these PDFs to check, extract currentText from a few of them first to make sure your PDF-to-string conversion produces the same kind of result every time. If the output is consistent across different PDFs, then you can start to automate.
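One rough way to run that consistency check is to compare the "shape" of the text extracted from a few sample PDFs. The sketch below is only an illustration: the `ExtractionShape` class and its `Fingerprint` and `SameShape` methods are hypothetical names, and it assumes that two extractions count as consistent when they match after every run of digits is replaced by `#` and blank lines are dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp.
public static class ExtractionShape
{
    // Reduce extracted text to its structure: trim each line,
    // replace every run of digits with '#', and drop blank lines.
    public static string Fingerprint(string text)
    {
        var lines = text.Split('\n')
            .Select(l => Regex.Replace(l.Trim(), @"\d+", "#"))
            .Where(l => l.Length > 0);
        return string.Join("\n", lines);
    }

    // True when all texts reduce to the same fingerprint
    // (also true for an empty collection).
    public static bool SameShape(IEnumerable<string> texts)
    {
        return texts.Select(t => Fingerprint(t)).Distinct().Count() <= 1;
    }
}
```

Calling `ExtractionShape.SameShape(...)` on the currentText strings taken from a handful of PDFs gives a quick yes/no answer. If it returns false, comparing the individual fingerprints shows where the layouts diverge.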

The automation also depends a lot on your content. For example, if currentText is something like this: Value: 10\nValue: 11\nValue: 9\nValue: 15 then I recommend going through every line, extracting the value, and checking it against what you need it to be. This is untested semi-pseudocode that gives you an idea of what I mean:

using System.Collections.Generic;

var lines = new List<string>(currentText.Split('\n'));
var newLines = new List<string>();
foreach (var line in lines) {
    if (line != "Value: 10") {
        newLines.Add(line); // This line is correct, no marking needed
    } else {
        newLines.Add("THIS IS WRONG: " + line); // Mark as incorrect; use whatever you need here
    }
}

// Next, return newLines to the user, showing them which lines are bad so they can edit the PDF
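The loop above only compares whole lines against a fixed string. A variant that actually extracts the number and checks it might look like the sketch below. Everything in it is an assumption made for illustration: the `LineChecker` class, the `MarkLines` method, the `Value: <integer>` line format, and the idea that a value is valid when it falls within a `min`..`max` range.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; adjust the pattern and the rule to your own data.
public static class LineChecker
{
    // Assumed line format: "Value: <integer>"
    private static readonly Regex ValueLine = new Regex(@"^Value:\s*(-?\d+)\s*$");

    // Returns one output line per input line, prefixing the ones that fail the check.
    public static List<string> MarkLines(string currentText, int min, int max)
    {
        var result = new List<string>();
        foreach (var raw in currentText.Split('\n'))
        {
            var line = raw.TrimEnd('\r');
            if (line.Trim().Length == 0)
            {
                result.Add(line); // Blank lines pass through unchanged
                continue;
            }

            var match = ValueLine.Match(line);
            int value;
            if (!match.Success ||
                !int.TryParse(match.Groups[1].Value, NumberStyles.Integer,
                              CultureInfo.InvariantCulture, out value))
            {
                result.Add("UNRECOGNIZED: " + line); // Does not look like a value line
            }
            else if (value < min || value > max)
            {
                result.Add("THIS IS WRONG: " + line); // Value outside the allowed range
            }
            else
            {
                result.Add(line); // Value is fine, no marking needed
            }
        }
        return result;
    }
}
```

With the sample text from above, `LineChecker.MarkLines(currentText, 9, 11)` would leave the first three lines untouched and prefix only `Value: 15`. Keeping a separate UNRECOGNIZED marker makes it easier to notice when a PDF extracts differently from what the pattern expects.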


If you need to automatically edit the existing PDF, that will be very hard. I think it's beyond the scope of my answer: I was answering how to identify the wrong lines, not how to mark them in the PDF. Sorry! Someone else, please add that answer.

By the way, PDF is NOT a good format for this kind of processing. If you have access to any other source of the same information, it will most likely be easier to work with.
    
             
                                                        
            
            
              
                