wget

Web crawling and robots.txt

送分小仙女 submitted on 2019-12-25 12:26:12
Question: I used wget to 'download' a site: wget -r http://www.xyz.com i) It returns a .css file, a .js file, index.php, and an image, img1.jpg. ii) However, more images exist under xyz.com; I typed www.xyz.com/Img2.jpg and got an image. iii) But index.php refers to a single image, i.e. img1.jpg. iv) A robots file accompanies it that contains Disallow: What change should be made to the command line to return everything under xyz.com that is not referenced in index.php but is static
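
Since the question is about widening the crawl, a minimal sketch of the usual wget knobs (the flags are standard; the URL is the question's placeholder): -e robots=off makes wget ignore the Disallow rules, and --accept broadens which file types are kept. Note that recursion can only follow links it finds, so files nothing references (such as Img2.jpg) still cannot be discovered this way.

# Sketch only: ignore robots.txt and accept more file types during recursion.
wget -r --no-parent -e robots=off \
     --accept jpg,jpeg,png,gif,css,js,php \
     http://www.xyz.com/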

How do I transfer wget output to a file or DB?

北战南征 submitted on 2019-12-25 11:26:56
Question: I'm trying to use a small script to download a field from multiple pages. For one thing, I'm only able to get it from one page..., but the real problem is that I don't know how to hand the output off to a database table. How can I take the output from curl/lynx|grep (which is going to be all the list items) and move it, list item by list item, to a table in my DB, or to a CSV where it will be ready for import into the DB? #!/bin/bash lynx --source "http://www.thewebsite.com"|cut -d\"
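
A rough sketch of one way to go from the pipeline to a CSV and then into a table, assuming sqlite3 is available and the items of interest are plain <li> elements (the URL, file names, and table layout are placeholders, not from the question):

#!/bin/bash
# Extract one field per line from the page and write it to a CSV.
url="http://www.thewebsite.com"
lynx --source "$url" | grep -o '<li>[^<]*</li>' | sed 's/<[^>]*>//g' > items.csv

# Load the CSV into an SQLite table in one shot.
sqlite3 scrape.db <<'EOF'
CREATE TABLE IF NOT EXISTS items (value TEXT);
.mode csv
.import items.csv items
EOF

For MySQL or PostgreSQL the same CSV can be loaded with LOAD DATA INFILE or \copy respectively.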

wget not working properly. [closed]

痴心易碎 submitted on 2019-12-25 08:59:20
Question: [Closed as off-topic for Stack Overflow, 7 years ago.] I have some doubts about the wget command. Here is what I want to achieve: I want to download a tar package from this link: "http://snapshots.linaro.org/oneiric/lt-origen-oneiric/20120321/0/images/hwpack/hwpack_linaro-lt-origen_20120321-0_armel_supported.tar.gz". This link works fine when I am using it in
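
Since the excerpt is cut off before the actual failure, only a generic sketch fits here: quote the URL and let wget resume and name the file explicitly (-c resumes a partial download, -O fixes the output filename):

wget -c -O hwpack_linaro-lt-origen_20120321-0_armel_supported.tar.gz \
     "http://snapshots.linaro.org/oneiric/lt-origen-oneiric/20120321/0/images/hwpack/hwpack_linaro-lt-origen_20120321-0_armel_supported.tar.gz"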

Download file from Google Drive via HTTP GET request

好久不见. submitted on 2019-12-25 08:57:21
Question: Please help me. How can I get a file from Google Drive via an HTTP GET request (using wget/curl/Postman/etc.)? What do I need? The file ID, an access token, and what else? How do I write this HTTP request correctly? Thanks! Answer 1: Thank you, @pinoyyid! Instructions are at developers.google.com/drive/v3/reference/files/get. You can use the form on the right to try it, and use the Chrome Dev Console to capture the HTTP request as a curl command. Source: https://stackoverflow.com/questions/46510380/download-file-from-google
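
A sketch of the request itself, assuming the Drive v3 API and an OAuth access token already in hand; FILE_ID and ACCESS_TOKEN are placeholders, and alt=media asks for the file's content rather than its metadata:

# curl
curl -L -H "Authorization: Bearer ACCESS_TOKEN" \
     -o myfile.bin \
     "https://www.googleapis.com/drive/v3/files/FILE_ID?alt=media"

# wget equivalent
wget --header="Authorization: Bearer ACCESS_TOKEN" \
     -O myfile.bin \
     "https://www.googleapis.com/drive/v3/files/FILE_ID?alt=media"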

Curl and wget return error 500 for helloworld.php on new install but browser is fine

ぐ巨炮叔叔 submitted on 2019-12-25 08:17:16
Question: I have no .htaccess file. I have index.php with the following content, which works beautifully in a browser like Chrome or Safari: <?php print "hello world"; ?> When I load it in a browser I get: hello world. When I try any of the following I get ERROR 500: Internal Server Error. /usr/bin/wget http://example.com/index.php /usr/bin/wget -nv -t 5 --connect-timeout=4 -w 4 --connect-timeout=20 -nd --no-cache --no-cookies http://example.com/index.php /usr/bin/wget --content-on-error http:/
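
A sketch for narrowing this down (host name as in the question): dump the full exchange and send a browser-like User-Agent, since some servers return 500 only to unfamiliar agents, and keep the error body so any PHP error text is visible:

curl -v -A "Mozilla/5.0" http://example.com/index.php

wget -d --content-on-error --user-agent="Mozilla/5.0" \
     http://example.com/index.php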

Is there any reason for a successful yum update to cause a subsequent wget to fail?

好久不见. submitted on 2019-12-25 06:05:29
Question: I'm working on a bash setup script for CentOS 6.4 machines. On a brand-new install I'm running into an issue that seems to be reproducible, but the scenario is unusual. The setup script is run as root. The first step is to run yum update with no options: yum update This completes successfully with a zero exit code. The next step is to retrieve the EPEL rpm using wget: wget http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm However, this is consistently failing when
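
Since the excerpt stops before the error, only a speculative sketch of the same step with more diagnostics and a fallback fits here (URL as in the question; curl is used purely as a cross-check to separate a wget problem from a network one):

url="http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm"
wget -v --tries=3 --timeout=20 "$url" || curl -fLO "$url"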

curl not available by default, so what can my script use instead to GET/POST/PUT?

ぐ巨炮叔叔 submitted on 2019-12-25 03:44:15
Question: I am writing a Nautilus script that currently uses curl to GET/POST/PUT to a REST service. Installing my script should be as simple as dropping a single file into ~/.gnome2/nautilus-scripts. My concern is that many computers do not have curl installed. I need to use a tool that is available nearly everywhere. What would be a more widespread (installed by default on most distros) yet usable alternative to curl? wget does not allow PUT. Maybe telnet? Source: https://stackoverflow.com
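
One bash-only option worth sketching is to speak HTTP directly over bash's built-in /dev/tcp, which needs no external tool at all; the host, port, path, and body below are placeholders, and this covers plain HTTP only, not HTTPS:

#!/bin/bash
# Sketch: raw HTTP PUT over bash's /dev/tcp -- no curl or wget required.
host="rest.example.com"; port=80; path="/api/items/1"
body='{"name":"demo"}'

exec 3<>"/dev/tcp/${host}/${port}"
printf 'PUT %s HTTP/1.1\r\nHost: %s\r\nContent-Type: application/json\r\nContent-Length: %d\r\nConnection: close\r\n\r\n%s' \
       "$path" "$host" "${#body}" "$body" >&3
cat <&3   # print the raw HTTP response
exec 3>&-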

wget to download image, link gets truncated

妖精的绣舞 submitted on 2019-12-25 02:22:34
Question: I have an IP camera, and the link below takes a snapshot and shows you the picture in your browser: http://192.168.5.10:81/snapshot.cgi?user=admin&pwd=888888 I am trying to write a script, using wget, to download a snapshot to my local machine repeatedly at a certain interval. However, when I use wget -m -p -k http://192.168.5.10:81/snapshot.cgi?user=admin&pwd=888888 I get the following response: => `192.168.5.10:81/snapshot.cgi?user=admin' Connecting to 192.168.5.10:81... connected. HTTP request sent
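
The unquoted & is the likely culprit here: the shell treats it as a background operator, so &pwd=888888 never reaches wget, which is why the logged request stops at user=admin. A minimal sketch that quotes the URL and saves a timestamped snapshot in a loop (the 60-second interval is just an example):

#!/bin/bash
while true; do
    wget -q -O "snapshot_$(date +%Y%m%d_%H%M%S).jpg" \
         "http://192.168.5.10:81/snapshot.cgi?user=admin&pwd=888888"
    sleep 60
done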