Entering an input, clicking it and retrieving particular information with Selenium



• Question

I'm new to web scraping with Python. My intent is to retrieve the verb definitions for a word of interest. For example, dictionary.com has definitions for the different parts of speech of a word; I would like to enter a word of interest, hit the search icon, and then, on the resulting page, extract the information under the header 'verb'.

Answer 1:


To extract the information under the verb header, induce WebDriverWait with presence_of_all_elements_located().

Here is the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.dictionary.com/")

# Dismiss the cookie banner before interacting with the page.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(.,'Accept Cookies')]"))).click()

# Type the word into the search box and submit.
elementsearch = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[title='Search']")))
elementsearch.send_keys("interest")
elementsearch.submit()

# Definitions shown by default under the 'verb' header.
results = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='luna-pos'][contains(.,'verb')]/following::div[1]//div[@class='default-content']//div")))
for item in results:
    print(item.text)

# Definitions hidden behind the expandable 'See more' section.
results1 = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='luna-pos'][contains(.,'verb')]/following::div[1]//div[@class='expandable-content']//div")))
for item in results1:
    print(item.get_attribute("textContent"))
    
    

    Output on console:

    to engage or excite the attention or curiosity of:
    Mystery stories interested him greatly.
    to concern (a person, nation, etc.) in something; involve:
    The fight for peace interests all nations.
    to cause to take a personal concern or share; induce to participate: to interest a person in an enterprise.
    to cause to be concerned; affect.
    
    

Answered by KunduK

• Why should the 'waiting' happen? – Schlator


• The browser takes time to load the page properly; it is not the code. The explicit wait waits until Selenium finds the required web element. – KunduK




Latest content

• I have a div "box" which fades gradually, using ".fp-viewing" as an anchor to start the transition effect when a user scrolls to the next page. The problem is that the page starts scrolling as soon as .fp-viewing is triggered, which moves the box out of view before the animation finishes.

How can I delay the start of the scrolling, when .fp-viewing is triggered, until the box has finished its 4s animation?

.box {
    transition: all 4s ease-out;
    -webkit-transition: all 4s ease-out;
}
.fp-viewing-2 .box {
    opacity: 0;
}

You can play with the options fullpage.js provides to cancel a movement before it takes place.

    Reproduction online

var delay = 2000; // milliseconds
var timeoutId;
var animationIsFinished = false;

new fullpage('#fullpage', {
    sectionsColor: ['yellow', 'orange', '#C0C0C0', '#ADD8E6'],
    onLeave: function(origin, destination, direction){
        var curTime = new Date().getTime();
        // animating my element
        $('#element').addClass('animate');
        clearTimeout(timeoutId);
        timeoutId = setTimeout(function(){
            animationIsFinished = true;
            fullpage_api.moveTo(destination.index + 1);
        }, delay);
        return animationIsFinished;
    },
});

And the CSS:

#fullpage {
    transition-delay: 1s !important;
}

Or modify the function addAnimation in fullpage.js.

Source: https://stackoverflow.com/questions/36176677/fullpage-js-adding-a-scroll-delay

• So why is the copy constructor not being invoked in the const Integer operator+(const Integer &rv) function? Is it because of RVO? If yes, what do I need to do to prevent it?

#include <iostream>
using namespace std;

class Integer {
    int i;
public:
    Integer(int ii = 0) : i(ii) { cout << "Integer()" << endl; }
    Integer(const Integer &I) : i(I.i) { cout << "Integer(const Integer &)" << endl; }
    ~Integer() { cout << "~Integer()" << endl; }
    const Integer operator+(const Integer &rv) const {
        cout << "operator+" << endl;
        Integer I(i + rv.i);
        I.print();
        return I;
    }
    Integer &operator+=(const Integer &rv) {
        cout << "operator+=" << endl;
        i += rv.i;
        return *this;
    }
    void print() { cout << "i: " << i << endl; }
};

int main() {
    cout << "built-in types:" << endl;
    int i = 1, j = 2, k = 3;
    k += i + j;
    cout << "user-defined types:" << endl;
    Integer ii(1), jj(2), kk(3);
    kk += ii + jj;
}

I do get an error if I comment out the copy constructor. I'm expecting the copy constructor to be called when operator+ returns. The following is the output of the program:

built-in types:
user-defined types:
Integer()
Integer()
Integer()
operator+
Integer()
i: 3   // EXPECTING the copy constructor to be called after this
operator+=
~Integer()
~Integer()
~Integer()
~Integer()

Is it because of RVO? If yes, what do I need to do to prevent it?

Yes. The copy constructor didn't get called because of Return Value Optimization (RVO) performed by the compiler.

    If you're using GCC, then use -fno-elide-constructors option to avoid it.

    GCC 4.6.1 manual says,

    -fno-elide-constructors

    The C++ standard allows an implementation to omit creating a temporary which is only used to initialize another object of the same type. Specifying this option disables that optimization, and forces G++ to call the copy constructor in all cases.

    (N)RVO is one of the easiest to implement optimizations. In most calling conventions for return by value the caller reserves the space for the returned object and then passes a hidden pointer to the function. The function then constructs the object in the address that is given. That is, kk += ii + jj; is translated into something like:

Integer __tmp;
//                  __rtn    this  arg
Integer::operator+( &__tmp,  &ii,  jj );
kk += __tmp;

The function (in this case Integer::operator+) takes a first hidden argument __rtn, a pointer to an uninitialized block of memory of sizeof(Integer) bytes where the object is to be constructed; a second hidden argument this; and then the arguments of the function as written in the code.

    Then the implementation of the function is translated into:

Integer::operator+( Integer* __rtn, Integer const * this, const Integer &rv ) {
    cout << "operator+" << endl;
    new (__rtn) Integer(i + rv.i);
    __rtn->print();
}

Because the calling convention passes the pointer, the function does not need to reserve extra space for a local Integer that would then be copied; it can construct the I in your code directly into the received memory, avoiding the copy.

Note that the compiler cannot perform NRVO in all circumstances. In particular, if you have two local objects in the function and you return either one depending on a condition that is not inferable from the code (say, the value of an argument to the function), NRVO is inhibited. While you could do that to avoid RVO, it would make your code more complex, less efficient, and harder to maintain.

Source: https://stackoverflow.com/questions/7779827/why-is-copy-constructor-not-being-called-in-this-code

  • This question relates to a machine learning feature selection procedure.

I have a large matrix of features, where the columns are the features and the rows are the subjects:

set.seed(1)
features.mat <- matrix(rnorm(10*100), ncol = 100)
colnames(features.mat) <- paste("F", 1:100, sep = "")
rownames(features.mat) <- paste("S", 1:10, sep = "")

    The response was measured for each subject (S) under different conditions (C) and therefore looks like this:

response.df <- data.frame(
  S = c(sapply(1:10, function(x) rep(paste("S", x, sep = ""), 100))),
  C = rep(paste("C", 1:100, sep = ""), 10),
  response = rnorm(1000),
  stringsAsFactors = F
)

    So I match the subjects in response.df:

    match.idx <- match(response.df$S, rownames(features.mat))

    I'm looking for a fast way to compute the univariate regression of each feature and the response.

    Anything faster than this?:

fun <- function(f) {
  fit <- lm(response.df$response ~ features.mat[match.idx, f])
  beta <- coef(summary(fit))
  data.frame(feature = colnames(features.mat)[f],
             effect = beta[2, 1],
             p.val = beta[2, 4],
             stringsAsFactors = F)
}
res <- do.call(rbind, lapply(1:ncol(features.mat), fun))

I am interested in a marginal speedup, i.e., in methods other than parallel computing via mclapply or mclapply2.

I will provide a lightweight toy routine for estimating a simple regression model y ~ x, i.e., a regression line with only an intercept and a slope. As will be seen, this is 36 times faster than lm + summary.lm.

## toy data
set.seed(0)
x <- runif(50)
y <- 0.3 * x + 0.1 + rnorm(50, sd = 0.05)

## fast estimation of simple linear regression: y ~ x
simplelm <- function (x, y) {
  ## number of data
  n <- length(x)
  ## centring
  y0 <- sum(y) / length(y); yc <- y - y0
  x0 <- sum(x) / length(x); xc <- x - x0
  ## fitting an intercept-free model: yc ~ xc + 0
  xty <- c(crossprod(xc, yc))
  xtx <- c(crossprod(xc))
  slope <- xty / xtx
  rc <- yc - xc * slope
  ## Pearson estimate of residual standard error
  sigma2 <- c(crossprod(rc)) / (n - 2)
  ## standard error for slope
  slope_se <- sqrt(sigma2 / xtx)
  ## t-score and p-value for slope
  tscore <- slope / slope_se
  pvalue <- 2 * pt(abs(tscore), n - 2, lower.tail = FALSE)
  ## return estimation summary for slope
  c("Estimate" = slope, "Std. Error" = slope_se,
    "t value" = tscore, "Pr(>|t|)" = pvalue)
}

    Let's have a test:

simplelm(x, y)
#    Estimate   Std. Error      t value     Pr(>|t|)
#2.656737e-01 2.279663e-02 1.165408e+01 1.337380e-15

    On the other hand, lm + summary.lm gives:

coef(summary(lm(y ~ x)))
#             Estimate Std. Error   t value     Pr(>|t|)
#(Intercept) 0.1154549 0.01373051  8.408633 5.350248e-11
#x           0.2656737 0.02279663 11.654079 1.337380e-15

So the results match. If you require R-squared and adjusted R-squared, they can easily be computed, too.
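For readers outside R, the same closed-form arithmetic can be sketched in plain Python; this stand-alone version uses only the standard library, and the p-value step is omitted because it would need a t-distribution CDF (e.g. from scipy.stats), which is not assumed here:

```python
import math

def simplelm(x, y):
    """Closed-form simple regression y ~ x: returns the slope, its
    standard error, and the t-score (p-value omitted; it needs a
    t-distribution CDF such as scipy.stats.t.sf)."""
    n = len(x)
    # centring
    x0 = sum(x) / n
    y0 = sum(y) / n
    xc = [v - x0 for v in x]
    yc = [v - y0 for v in y]
    # intercept-free fit on centred data: slope = x'y / x'x
    xtx = sum(v * v for v in xc)
    xty = sum(a * b for a, b in zip(xc, yc))
    slope = xty / xtx
    # Pearson estimate of residual variance and slope standard error
    rc = [b - slope * a for a, b in zip(xc, yc)]
    sigma2 = sum(r * r for r in rc) / (n - 2)
    slope_se = math.sqrt(sigma2 / xtx)
    return slope, slope_se, slope / slope_se

slope, se, t = simplelm([1, 2, 3, 4], [1, 2, 2, 4])
print(slope)  # slope is 0.9 for this toy data
```

The arithmetic mirrors the R routine line by line, so any numerical discrepancy between the two is down to floating-point accumulation order, not the method.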

    Let's have a benchmark:

set.seed(0)
x <- runif(10000)
y <- 0.3 * x + 0.1 + rnorm(10000, sd = 0.05)

library(microbenchmark)
microbenchmark(coef(summary(lm(y ~ x))), simplelm(x, y))
#Unit: microseconds
#                     expr      min       lq       mean   median       uq      max neval cld
# coef(summary(lm(y ~ x))) 14158.28 14305.28 17545.1544 14444.34 17089.00 114662.2   100   b
#           simplelm(x, y)   235.08   265.72   485.4076   288.20   319.46   3409.6   100  a

Holy!!! We have a 36-fold speedup!

    Remark-1 (solving normal equation)

The simplelm is based on solving the normal equation via Cholesky factorization. But since the model is so simple, no actual matrix computation is involved. If we need regression with multiple covariates, we can use the lm.chol function defined in this answer of mine.

The normal equation can also be solved with LU factorization. I will not touch on this, but if you are interested, see: Solving normal equation gives different coefficients from using lm?.

    Remark-2 (alternative via cor.test)

The simplelm is an extension of the fastsim function in my answer Monte Carlo simulation of correlation between two Brownian motions (continuous random walk). An alternative approach is based on cor.test. It is also much faster than lm + summary.lm but, as shown in that answer, still slower than the proposal above.

    Remark-3 (alternative via QR method)

A QR-based method is also possible, in which case we use .lm.fit, a lightweight wrapper for qr.default, qr.coef, qr.fitted and qr.resid at the C level. Here is how we can add this option to simplelm:

## fast estimation of simple linear regression: y ~ x
simplelm <- function (x, y, QR = FALSE) {
  ## number of data
  n <- length(x)
  ## centring
  y0 <- sum(y) / length(y); yc <- y - y0
  x0 <- sum(x) / length(x); xc <- x - x0
  ## fitting intercept-free model: yc ~ xc + 0
  if (QR) {
    fit <- .lm.fit(matrix(xc), yc)
    slope <- fit$coefficients
    rc <- fit$residuals
  } else {
    xty <- c(crossprod(xc, yc))
    xtx <- c(crossprod(xc))
    slope <- xty / xtx
    rc <- yc - xc * slope
  }
  ## Pearson estimate of residual standard error
  sigma2 <- c(crossprod(rc)) / (n - 2)
  ## standard error for slope
  if (QR) {
    slope_se <- sqrt(sigma2) / abs(fit$qr[1])
  } else {
    slope_se <- sqrt(sigma2 / xtx)
  }
  ## t-score and p-value for slope
  tscore <- slope / slope_se
  pvalue <- 2 * pt(abs(tscore), n - 2, lower.tail = FALSE)
  ## return estimation summary for slope
  c("Estimate" = slope, "Std. Error" = slope_se,
    "t value" = tscore, "Pr(>|t|)" = pvalue)
}

    For our toy data, both QR method and Cholesky method give the same result:

set.seed(0)
x <- runif(50)
y <- 0.3 * x + 0.1 + rnorm(50, sd = 0.05)

simplelm(x, y, TRUE)
#    Estimate   Std. Error      t value     Pr(>|t|)
#2.656737e-01 2.279663e-02 1.165408e+01 1.337380e-15

simplelm(x, y, FALSE)
#    Estimate   Std. Error      t value     Pr(>|t|)
#2.656737e-01 2.279663e-02 1.165408e+01 1.337380e-15

The QR method is known to be 2 ~ 3 times slower than the Cholesky method (read my answer Why the built-in lm function is so slow in R? for a detailed explanation). Here is a quick check:

set.seed(0)
x <- runif(10000)
y <- 0.3 * x + 0.1 + rnorm(10000, sd = 0.05)

library(microbenchmark)
microbenchmark(simplelm(x, y, TRUE), simplelm(x, y))
#Unit: microseconds
#                 expr    min     lq      mean median     uq     max neval cld
# simplelm(x, y, TRUE) 776.88 873.26 1073.1944 908.72 933.82 3420.92   100   b
#       simplelm(x, y) 238.32 292.02  441.9292 310.44 319.32 3515.08   100  a

    So indeed, 908 / 310 = 2.93.

    Remark-4 (simple regression for GLM)

If we move on to GLM, there is also a fast, lightweight version based on glm.fit. You can read my answer R loop help: leave out one observation and run glm one variable at a time and use the function f defined there. At the moment f is customized to logistic regression, but it can be generalized to other responses easily.

Source: https://stackoverflow.com/questions/40141738/is-there-a-fast-estimation-of-simple-regression-a-regression-line-with-only-int

