How to wget the more recent file of a directory

Submitted by 让人想犯罪 on 2019-12-22 11:04:10

Question


I would like to write a bash script that downloads and installs the latest daily build of a program (RStudio). Is it possible to make wget download only the most recent file in the directory http://www.rstudio.org/download/daily/desktop/ ?


Answer 1:


The files seem to be sorted by release date, and each new release is a new entry whose name reflects the version number, so checking the timestamp of a particular file seems unnecessary.

Also, the link you provided points to a "directory", which is essentially a web page. AFAIK, there is no such thing as a directory in HTTP (a communication protocol that serves you data at a given address). What you see is a listing generated by the server that resembles Windows folders for ease of use, but it is still a web page.

That said, you can scrape that web page. The following code downloads the file in the first position of the listing (assuming the first one is the most recent):

#!/bin/bash

wget -q -O tmp.html http://www.rstudio.org/download/daily/desktop/ubuntu64/
# Take the first https link ending in amd64.deb (assumed to be the newest build).
RELEASE_URL=$(grep -m 1 -o -E 'https[^<>]*amd64\.deb' tmp.html | head -n 1)
rm tmp.html

# TODO Check if the old package name is the same as in RELEASE_URL.

# If not, then get the new version.
wget -q "$RELEASE_URL"

Now you can check it against your local most-recent version, and install if necessary.

EDIT: Updated version, which does simple version checking and installs the package.

#!/bin/bash

MY_PATH=$(dirname "$0")
RES_DIR="$MY_PATH/res"

# Piping from stdout suggested by Chirlo.
# head -n 1 keeps a single URL even if the first matching line holds several links.
RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | grep -m 1 -o "https[^\']*" | head -n 1)

if [ -z "$RELEASE_URL" ]; then
    echo "Package index not found. Maybe the server is down?"
    exit 1
fi

mkdir -p "$RES_DIR"
NEW_PACKAGE=${RELEASE_URL##*/}   # file name = everything after the last slash
OLD_PACKAGE=$(ls "$RES_DIR")

if [ "$OLD_PACKAGE" == "" ] || [ "$OLD_PACKAGE" != "$NEW_PACKAGE" ]; then

    cd "$RES_DIR" || exit 1
    # Left unquoted on purpose: ls may have listed more than one stale file.
    rm -f -- $OLD_PACKAGE

    echo "New version found. Downloading..."
    wget -q "$RELEASE_URL"

    if [ ! -e "$NEW_PACKAGE" ]; then
        echo "Package not found."
        exit 1
    fi

    echo "Installing..."
    sudo dpkg -i "$NEW_PACKAGE"

else
    echo "rstudio up to date."
fi

And a couple of comments:

  • The script keeps a local res/ dir with the latest version (exactly one file) and compares its name with the newly scraped package name. This is dirty (having the file doesn't mean that it was successfully installed in the past). It would be better to parse the output of dpkg -l, but the name of the package might differ slightly from the scraped one.
  • You will still need to enter the password for sudo, so it won't be 100% automatic. There are a few ways around this, though without supervision you might encounter the previously stated problem.
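
The dpkg-based check mentioned in the first comment could be sketched roughly as follows. This is only a sketch under assumptions that are not confirmed by the listing: that the file names look like rstudio-<version>-amd64.deb, and that the package is registered with dpkg under the name rstudio. The function names are made up for illustration.

```shell
#!/bin/bash

# Extract the version from a package file name or URL.
# ASSUMPTION: names look like "rstudio-<version>-amd64.deb".
version_from_filename() {
    local name=${1##*/}                  # strip any leading path or URL
    name=${name#rstudio-}                # drop the assumed "rstudio-" prefix
    printf '%s\n' "${name%-amd64.deb}"   # drop the assumed "-amd64.deb" suffix
}

# Print the installed version of a package, or nothing if it is not installed.
# ASSUMPTION: the package is registered with dpkg under the name "rstudio".
installed_version() {
    dpkg-query -W -f='${Version}' "${1:-rstudio}" 2>/dev/null
}

# Succeed (exit status 0) when the version in the scraped file name differs
# from the installed one, i.e. when an install looks necessary.
needs_install() {
    [ "$(version_from_filename "$1")" != "$(installed_version)" ]
}
```

In the script above, the file-name comparison could then become something like `if needs_install "$NEW_PACKAGE"; then ...`. As noted, the version string reported by dpkg may not match the one in the file name exactly, so the comparison may need adjusting.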



Answer 2:


A slightly cleaner variation of @Richard Pumps's answer:

RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64 | grep -o -m 1 "https[^\']*")

# check version from name ...

wget "${RELEASE_URL}"

This avoids creating a temporary file by writing the HTML to stdout and filtering it.




Answer 3:


The -N option (timestamping) tells wget to download a file only if the remote copy is newer than the local one. However, using wget alone, you cannot do something as broad as downloading the newest of all the files in some remote directory. You'll need to write a bash script or something similar that does the checking and then calls wget to grab the file.
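
A minimal sketch of such a checking step, assuming the listing contains absolute https links ending in .deb and that GNU sort (for the -V version sort) is available. The function name newest_deb_link is made up for illustration.

```shell
#!/bin/bash

# Read an HTML listing on stdin and print the .deb link with the highest version.
# ASSUMPTION: links are absolute https URLs ending in ".deb".
newest_deb_link() {
    grep -o -E 'https://[^"'\''<> ]+\.deb' |  # pull every .deb URL out of the HTML
        sort -V |                              # version sort, so 0.97.10 sorts after 0.97.9
        tail -n 1                              # keep the highest one
}
```

It could be combined with wget along the lines of `url=$(wget -q -O - "$LISTING_URL" | newest_deb_link) && wget -N "$url"`, where LISTING_URL is the address of the listing page. Unlike taking the first link, this does not rely on the server listing the newest build first.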



Source: https://stackoverflow.com/questions/15040132/how-to-wget-the-more-recent-file-of-a-directory
