wget

How to download links in the HTML of a URL? [closed]

Submitted by 杀马特。学长 韩版系。学妹 on 2020-01-03 05:38:11
Question: For example, when I open https://stackoverflow.com/ in a browser, the browser downloads not only the main page but also the images, JS, and CSS. But when I run curl https://stackoverflow.com/, only the main-page HTML is downloaded. Is there an option for curl or wget that downloads the images/JS/CSS as well? Or any…
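
A minimal sketch of one common approach with GNU Wget (the -H flag is my assumption, for requisites hosted on other domains such as CDNs):

    # Download the page plus the images, CSS, and JS it references,
    # rewriting links so the local copy renders offline.
    # -p  (--page-requisites)  fetch images, stylesheets, scripts
    # -k  (--convert-links)    rewrite links for local viewing
    # -E  (--adjust-extension) save text/html responses as .html
    # -H  (--span-hosts)       allow requisites served from other hosts
    wget -p -k -E -H https://stackoverflow.com/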

Unescape the ampersand (&) via XMLStarlet - Bugging &

Submitted by 橙三吉。 on 2020-01-02 13:45:05
Question: This is a quite annoying but rather simple task. Following a guide, I wrote this:

    #!/bin/bash
    content=$(wget "https://example.com/" -O -)
    ampersand=$(echo '\&')
    xmllint --html --xpath '//*[@id="table"]/tbody' - <<<"$content" 2>/dev/null \
      | xmlstarlet sel -t \
          -m "/tbody/tr/td" \
          -o "https://example.com" \
          -v "a//@href" \
          -o "/?A=1" \
          -o "$ampersand" \
          -o "B=2" -n

I successfully extract each link from the table and everything gets concatenated correctly; however, instead of…
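
Given the title, the truncated complaint is presumably that xmlstarlet escapes the & in its output as &amp;. A minimal sketch of one workaround, assuming the same table layout as above: write the & literally in the template and pipe the result through xmlstarlet's unesc subcommand, which turns escaped entities back into plain characters.

    # Build each URL with a literal "&" in the -o template; xmlstarlet
    # escapes it to "&amp;" on output, so unesc restores the bare "&".
    xmllint --html --xpath '//*[@id="table"]/tbody' - <<<"$content" 2>/dev/null \
      | xmlstarlet sel -t \
          -m "/tbody/tr/td" \
          -o "https://example.com" \
          -v "a//@href" \
          -o "/?A=1&B=2" -n \
      | xmlstarlet unesc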

Is there a way to download part of a webpage, rather than the whole HTML body, programmatically?

Submitted by 半腔热情 on 2020-01-02 08:24:24
Question: We only want a particular element from the HTML document at nytimes.com/technology. The page contains many articles, but we only want each article's title, which sits inside a specific HTML element. If we use wget, cURL, or any other tool, or a package like requests in Python, the whole HTML document is returned. Can we limit the returned data to a specific element, such as the title elements?

Answer 1: The HTTP protocol knows nothing about HTML or the DOM. Using HTTP you can fetch partial documents from supporting web servers using the…
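
A minimal sketch of the byte-range technique the answer is pointing at, assuming the server honors Range requests (it replies 206 Partial Content when it does, and just sends the full body with 200 OK when it does not):

    # Request only the first 4 KB of the document instead of the whole page,
    # dumping response headers to stdout so the 206/200 status is visible.
    curl -s -D - \
         -H "Range: bytes=0-4095" \
         -o partial.html \
         https://www.nytimes.com/technology

Note that ranges select bytes, not DOM nodes, so pulling out the titles still means parsing whatever HTML comes back.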

Downloading a tarball from github without curl

Submitted by 戏子无情 on 2020-01-02 03:30:31
Question: I have an embedded system where I cannot install anything, and the only tool I could potentially use to fetch something is wget. It turns out you cannot do all the same things with wget that you can with curl. I cannot cross-compile for this system either, so I need to resort to Python or shell scripts. The pure-Python implementation of git, called Dulwich, actually has some C code that I'd need to cross-compile... so I even resorted to looking into that, FYI. What I need is to get code from GitHub…
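
For the GitHub part, a minimal sketch using wget alone (someuser/somerepo/master are hypothetical placeholders; GitHub's /archive/ URLs redirect to codeload.github.com, and an old embedded wget without current CA certificates may need --no-check-certificate):

    # Fetch a tarball of one branch and unpack it.
    wget --no-check-certificate \
         -O repo.tar.gz \
         https://github.com/someuser/somerepo/archive/master.tar.gz
    tar xzf repo.tar.gz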

How to wait for wget to finish before fetching more resources

Submitted by 落爺英雄遲暮 on 2020-01-01 17:08:20
Question: I am new to bash. I want to wget some resources in parallel. What is the problem in the following code?

    for item in $list
    do
        if [ $i -le 10 ]; then
            wget -b $item
            let "i++"
        else
            wait
            i=1
        fi
    done

When I execute this script, an error is thrown: fork: Resource temporarily unavailable. My question is how to use wget the right way. Edit: My problem is that there are about four thousand URLs to download, and if I let all of these jobs run in parallel, fork: Resource temporarily unavailable is thrown. I don't know how to…
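
A minimal sketch of one fix, under the assumption that the root cause is wget -b, which detaches wget into its own daemon-like process so the shell's wait has nothing to wait for; backgrounding with the shell's own & keeps the jobs visible to wait, and batching caps the concurrency:

    #!/bin/bash
    i=0
    for item in $list
    do
        wget -q "$item" &        # background via the shell, not wget -b
        i=$((i + 1))
        if [ "$i" -ge 10 ]; then
            wait                 # block until the current batch of 10 finishes
            i=0
        fi
    done
    wait                         # catch the final partial batch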

How to specify the download location with wget?

Submitted by 丶灬走出姿态 on 2020-01-01 14:33:01
Question: I need to download files to /tmp/cron_test/. My wget command is:

    wget --random-wait -r -p -nd -e robots=off -A".pdf" -U mozilla http://math.stanford.edu/undergrad/

Is there a parameter to specify the target directory?

Answer 1: From the man page:

    -P prefix
    --directory-prefix=prefix
        Set directory prefix to prefix.  The directory prefix is the
        directory where all other files and sub-directories will be
        saved to, i.e. the top of the retrieval tree.  The default
        is . (the current directory).

So you need to add -P /tmp/cron_test/ (short form) or --directory-prefix=/tmp/cron_test/ (long form) to your command. Also note that the directory will be created if it does not already exist.

Answer 2: Try this approach:

    import os
    path = raw_input("enter the url:")
    fold = raw_input("enter the…
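
Putting it together, the resulting command from the question with the prefix added:

    # Recursive PDF grab, saving everything under /tmp/cron_test/.
    wget --random-wait -r -p -nd -e robots=off \
         -A ".pdf" -U mozilla \
         -P /tmp/cron_test/ \
         http://math.stanford.edu/undergrad/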

Using wget via Ruby on Rails

Submitted by 强颜欢笑 on 2020-01-01 05:14:05
Question: I want to build a simple website that downloads a webpage (www.example.com/index.html) and stores a snapshot of it on the server when a client requests it. I'm thinking about using the wget command to download the page. Would Ruby on Rails be able to handle this task?

Answer 1: Yes. You can run shell commands in Ruby via backticks, exec, and system. Note that each one returns something slightly different:

    backticks: `wget http://www.yahoo.com`
    exec:      exec('wget http://www.yahoo.com')
    system:    …
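
On the wget side, a minimal sketch of a snapshot-style invocation the Rails app could shell out to (the snapshots/ output directory is a hypothetical choice):

    # Save a self-contained snapshot of one page: -p fetches images/CSS/JS,
    # -k rewrites links for local viewing, -E adds .html extensions,
    # -P selects the output directory.
    wget -p -k -E -P snapshots/ http://www.example.com/index.html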

wget reject still downloads file

Submitted by 瘦欲@ on 2020-01-01 04:07:09
Question: I only want the folder structure, but I couldn't figure out how to get it with wget. Instead I am using this:

    wget -R pdf,css,gif,txt,png -np -r http://example.com

This should reject all the file types listed after -R, but it seems to me that wget still downloads each file and then deletes it. Is there a better way to just get the folder structure?

    HTTP request sent, awaiting response... 200 OK
    Length: 136796 (134K) [application/x-download]
    Saving to: "example.com/file.pdf"
    100%[=====================================>] 136…
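
A minimal sketch of one alternative: wget's --spider mode traverses the site without saving files, so the directory structure can be recovered from the log (the grep/awk post-processing assumes GNU Wget's usual "--timestamp--  URL" log lines):

    # Crawl without downloading; list every URL wget visits,
    # then keep only the directory portion of each and dedupe.
    wget --spider -r -np http://example.com 2>&1 \
      | grep '^--' \
      | awk '{print $3}' \
      | grep -o 'http://example\.com.*/' \
      | sort -u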

Script to Mobile-Friendly test

Submitted by 喜夏-厌秋 on 2019-12-31 17:25:14
Question: I want to write a shell/Python script that checks whether a website is mobile friendly or not. In a browser this is easily done by visiting:

https://www.google.com/webmasters/tools/mobile-friendly/?url=<website_addr>

For example: https://www.google.com/webmasters/tools/mobile-friendly/?url=http://facebook.com

I tried fetching the content with the curl, wget, and lynx commands, but it did not work. How can I do this?

Answer 1: Sanchit, I suggest you look at the requests library for retrieving the URL…
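
A minimal sketch of one scriptable alternative, assuming access to Google's Mobile-Friendly Test API (part of the Search Console API; it requires an API key, and the endpoint and field names here should be checked against Google's current documentation):

    # POST the URL to the Mobile-Friendly Test API and grep out the verdict.
    API_KEY="YOUR_API_KEY"   # hypothetical placeholder
    curl -s -X POST \
         "https://searchconsole.googleapis.com/v1/urlTestingTools/mobileFriendlyTest:run?key=${API_KEY}" \
         -H "Content-Type: application/json" \
         -d '{"url": "http://facebook.com"}' \
      | grep -o '"mobileFriendliness": *"[A-Z_]*"'

Plain curl/wget/lynx fetches likely fail here because the browser page renders its verdict with JavaScript, which those tools do not execute.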
