How to read the text from an image (CAPTCHA) using Selenium WebDriver with Java

囚心锁ツ 2020-12-15 02:52

I have a registration page, and a CAPTCHA is displayed at the end of it.

I am not able to read the text from the image. The code and its output are below.

7 answers
  • 2020-12-15 03:14

    You cannot read the text from a CAPTCHA programmatically. If you could, there would be no point in using a CAPTCHA.

  • 2020-12-15 03:19

    The whole purpose of a CAPTCHA is to prevent automation through the UI! You may want to use internal APIs to verify the action instead.

  • 2020-12-15 03:21

    Just to elaborate on the previous answers: CAPTCHA is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart". So if a machine can solve it, it's not really doing its job.

    To solve it, there is something you can do: use the API of an external service such as http://www.deathbycaptcha.com. You implement their API, pass them the CAPTCHA, and get the text in return. The average solving time I have observed is around 10-15 seconds.

    Example implementation (taken from here):

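    import com.DeathByCaptcha.AccessDeniedException;
    import com.DeathByCaptcha.Captcha;
    import com.DeathByCaptcha.Client;
    import com.DeathByCaptcha.SocketClient;
    import com.DeathByCaptcha.HttpClient;
    
    public class DeathByCaptchaExample {
        public static void main(String[] args) throws Exception {
            /* Placeholders: put your DeathByCaptcha account username and password here. */
            String username = "your-username";
            String password = "your-password";
            /* Placeholders: your CAPTCHA file name and the solving timeout (in seconds). */
            String captchaFileName = "captcha.jpg";
            int timeout = 120;
    
            /* Use HttpClient instead of SocketClient for the HTTP API. */
            Client client = new SocketClient(username, password);
            try {
                double balance = client.getBalance();
                System.out.println("Balance: " + balance);
    
                /* decode also accepts a file object, an arbitrary input stream,
                   or an array of bytes instead of a file name. */
                Captcha captcha = client.decode(captchaFileName, timeout);
                if (null != captcha) {
                    /* The CAPTCHA was solved; captcha.id property holds its numeric ID,
                       and captcha.text holds its text. */
                    System.out.println("CAPTCHA " + captcha.id + " solved: " + captcha.text);
    
                    /* Placeholder: set to true if the CAPTCHA was incorrectly solved,
                       for example when the target site rejects captcha.text. */
                    boolean solvedIncorrectly = false;
                    if (solvedIncorrectly) {
                        client.report(captcha);
                    }
                }
            } catch (AccessDeniedException e) {
                /* Access to DBC API denied, check your credentials and/or balance */
                System.err.println("Access denied: " + e.getMessage());
            }
        }
    }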
    
  • 2020-12-15 03:28

    Here is sample code that uses OCR to read the text from an image on a web page:

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.net.URL;
    import javax.imageio.ImageIO;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.testng.annotations.BeforeTest;
    import org.testng.annotations.Test;
    import com.asprise.util.ocr.OCR;
    
    public class ExtractImage {
    
        WebDriver driver;
    
        @BeforeTest
        public void setUpDriver() {
            driver = new FirefoxDriver();
        }
    
        @Test
        public void start() throws IOException {
            // Navigate to the page and get the image's src attribute.
            driver.get("http://www.mythoughts.co.in/2013/10/extract-and-verify-text-from-image.html");
            String imageUrl = driver.findElement(By.xpath("//*[@id='post-body-5614451749129773593']/div[1]/div[1]/div/a/img")).getAttribute("src");
            System.out.println("Image source path : \n" + imageUrl);
    
            // ImageIO.read returns a BufferedImage, which is already a RenderedImage,
            // so no cast is needed before passing it to the OCR engine.
            BufferedImage image = ImageIO.read(new URL(imageUrl));
            String s = new OCR().recognizeCharacters(image);
            System.out.println("Text From Image : \n" + s);
            System.out.println("Length of total text : \n" + s.length());
            driver.quit();
    
            /* Use the code below if you want to read the image from your hard disk:
             *
             * BufferedImage image = ImageIO.read(new File("Image location"));
             * String imageText = new OCR().recognizeCharacters(image);
             * System.out.println("Text From Image : \n" + imageText);
             * System.out.println("Length of total text : \n" + imageText.length());
             */
        }
    }
    

    Here is the output of the above program:

    Image source path : http://2.bp.blogspot.com/-42SgMHAeF8U/Uk8QlYCoy-I/AAAAAAAADSA/TTAVAAgDhio/s1600/love.jpg

    Never M2suse the O, ne Who Likes You Never Say Busy To Th,e One Who Needs You Never cheat The One Who ReaZZy Trust You, Never foJnget The One Who Zways Remember You.

    Length of total text : 175

  • 2020-12-15 03:30

    Two problems.

    1. You have the wrong XPath, so you are getting a NoSuchElementException.

    2. Even if you had the right XPath, you would not be able to extract the text, as that would defeat the point of a CAPTCHA.

  • 2020-12-15 03:32

    I have a solution that works for a specific website. Take a snapshot of the whole page and crop out the captcha image. Then divide the full width of the captcha image by the number of characters (for a given captcha this is usually constant) to get the individual character images. Collect all the possible characters by reloading the page repeatedly.

    Once you have all the possible characters, you can take any captcha image, compare each of its characters with the collected images, and decide which letter or number it is.
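The equal-width split described above could be sketched roughly as follows. This is a minimal illustration that assumes the captcha has already been cropped out of the page snapshot and loaded as a `BufferedImage`, and that every character occupies the same width. The class name `CaptchaSplitter` and the method name `splitIntoCharacters` are hypothetical, not part of any library.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a captcha image into equal-width character slices.
final class CaptchaSplitter {

    /**
     * Splits the image into charCount vertical slices of equal width.
     * Leftover pixel columns on the right edge (when the width is not an
     * exact multiple of charCount) are ignored.
     */
    static List<BufferedImage> splitIntoCharacters(BufferedImage captcha, int charCount) {
        if (charCount <= 0 || charCount > captcha.getWidth()) {
            throw new IllegalArgumentException("charCount must be between 1 and the image width");
        }
        int sliceWidth = captcha.getWidth() / charCount;
        List<BufferedImage> slices = new ArrayList<>();
        for (int i = 0; i < charCount; i++) {
            // getSubimage returns a view that shares pixel data with the original.
            slices.add(captcha.getSubimage(i * sliceWidth, 0, sliceWidth, captcha.getHeight()));
        }
        return slices;
    }
}
```

Captchas with variable character spacing or overlapping glyphs would not split cleanly this way, which is one reason the approach only suits simple, fixed-layout captchas.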

    Steps to follow:

    1. Collect the captcha image and divide it into individual characters.

          private static BufferedImage cropImage(File filePath, int x, int y, int w, int h) {
              try {
                  BufferedImage originalImage = ImageIO.read(filePath);
                  // Cut out the w-by-h region whose top-left corner is at (x, y).
                  return originalImage.getSubimage(x, y, w, h);
              } catch (IOException e) {
                  e.printStackTrace();
                  return null;
              }
          }

    2. Keep all the possible character images in a folder.

    3. Read each character image of the captcha and compare it with every image in that folder. You can compare two images by their pixel values:

          public static float getDiff(File f1, File f2, int width, int height) throws IOException {
              // Both images must be at least width x height pixels.
              BufferedImage bi1 = ImageIO.read(f1);
              BufferedImage bi2 = ImageIO.read(f2);
              float diff = 0;
              for (int i = 0; i < width; i++) {
                  for (int j = 0; j < height; j++) {
                      int rgb1 = bi1.getRGB(i, j);
                      int rgb2 = bi2.getRGB(i, j);

                      // Extract the blue, green and red channels of each pixel.
                      int b1 = rgb1 & 0xff;
                      int g1 = (rgb1 & 0xff00) >> 8;
                      int r1 = (rgb1 & 0xff0000) >> 16;

                      int b2 = rgb2 & 0xff;
                      int g2 = (rgb2 & 0xff00) >> 8;
                      int r2 = (rgb2 & 0xff0000) >> 16;

                      // Accumulate the absolute difference per channel.
                      diff += Math.abs(b1 - b2);
                      diff += Math.abs(g1 - g2);
                      diff += Math.abs(r1 - r2);
                  }
              }
              return diff;
          }

    4. The image with the smallest diff value is the match. Append its name to a string.

    5. After reading all the character images of the captcha, return the string.

    Image 1: https://i.stack.imgur.com/FYPhd.png

    In the picture above, each image's name specifies the digit or character it shows.

    This works only for simple captchas like the one in image 1.
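Steps 3 to 5 could be tied together along the following lines. This is only a sketch under stated assumptions: the character slices and the labelled reference images are already in memory as `BufferedImage` objects, and the map key plays the role of the image file name from the folder. The class name `TemplateMatcher` and its methods are hypothetical names chosen for illustration.

```java
import java.awt.image.BufferedImage;
import java.util.List;
import java.util.Map;

// Hypothetical helper: labels each captcha slice with its closest reference image.
final class TemplateMatcher {

    /** Sum of absolute per-channel RGB differences over the area both images share. */
    static long pixelDiff(BufferedImage a, BufferedImage b) {
        int width = Math.min(a.getWidth(), b.getWidth());
        int height = Math.min(a.getHeight(), b.getHeight());
        long diff = 0;
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                diff += Math.abs(((p >> 16) & 0xff) - ((q >> 16) & 0xff)); // red
                diff += Math.abs(((p >> 8) & 0xff) - ((q >> 8) & 0xff));   // green
                diff += Math.abs((p & 0xff) - (q & 0xff));                 // blue
            }
        }
        return diff;
    }

    /** Returns the label of the reference image with the smallest diff to the slice. */
    static String bestLabel(BufferedImage slice, Map<String, BufferedImage> templates) {
        if (templates.isEmpty()) {
            throw new IllegalArgumentException("at least one reference image is required");
        }
        String best = null;
        long bestDiff = Long.MAX_VALUE;
        for (Map.Entry<String, BufferedImage> entry : templates.entrySet()) {
            long d = pixelDiff(slice, entry.getValue());
            if (d < bestDiff) {
                bestDiff = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Appends the best label for every slice, left to right. */
    static String decode(List<BufferedImage> slices, Map<String, BufferedImage> templates) {
        StringBuilder text = new StringBuilder();
        for (BufferedImage slice : slices) {
            text.append(bestLabel(slice, templates));
        }
        return text.toString();
    }
}
```

Because the comparison is a raw pixel difference, it tolerates small noise but not rotation, scaling, or distortion, which matches the answer's caveat that the approach works only for simple captchas.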
