Why can't I play the MIDI files I have downloaded programmatically, but I can play them when I download them manually?

試著忘記壹切 submitted on 2019-12-11 04:27:55

Question


I want to download the MIDI files from this website for a project. I have written the following code to download the files:

from bs4 import BeautifulSoup
import requests
import re, os
import urllib.request
import string

base_url = "http://www.midiworld.com/files/"

base_path = 'path/where/I/will/save/the/downloaded/MIDI/files'
os.chdir(base_path + '/MIDI Files')

for i in range(1,2386):
    page = requests.get(base_url + str(i))
    soup = BeautifulSoup(page.text, "html.parser")

    li_box = soup.select("div ul li a")
    urllib.request.urlretrieve(base_url+str(i), str(i)+'.mid')

This downloads the files, but when I click on them to play, they don't play and I get an error instead.

But if I download the files manually (I checked a couple of them), I can play them. In case it's relevant, those files also have descriptive names rather than numbers like the ones I am saving. Could that be the cause? The files are not empty either, as can be seen from the screenshot below.

EDIT: When I tried loading a programmatically downloaded MIDI file into this website to compare it with the corresponding manually downloaded one, I got this error:

Failed to load data=error

But there was no such error when loading the manually downloaded one.

EDIT 2: These are the first 50 bytes of each file as a hex dump:

For the programmatically downloaded file:

file name: 1.mid
mime type: 

0000-0010:  3c 21 44 4f-43 54 59 50-45 20 68 74-6d 6c 20 50  <!DOCTYP E.html.P
0000-0020:  55 42 4c 49-43 20 22 2d-2f 2f 57 33-43 2f 2f 44  UBLIC."- //W3C//D
0000-0030:  54 44 20 58-48 54 4d 4c-20 31 2e 30-20 53 74 72  TD.XHTML .1.0.Str
0000-0032:  69 63

For the corresponding manually downloaded file:

file name: Adson_John_-_Courtly_Masquing_Ayres.mid
mime type: 

0000-0010:  4d 54 68 64-00 00 00 06-00 01 00 0b-00 f0 4d 54  MThd.... ......MT
0000-0020:  72 6b 00 00-00 7b 00 ff-58 04 04 02-18 08 00 ff  rk...{.. X.......
0000-0030:  59 02 00 00-00 ff 51 03-07 a1 20 f0-40 ff 51 03  Y.....Q. ....@.Q.
0000-0032:  09 27
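The two dumps already point at the problem: a Standard MIDI File starts with the four ASCII bytes "MThd" (4d 54 68 64), while 1.mid starts with "<!DOCTYPE html", so it is an HTML page saved with a .mid extension. A minimal sketch of a check along those lines; the helper names `looks_like_midi` and `check_file` are hypothetical, not from the question or any library:

```python
# Every Standard MIDI File begins with the 4-byte header chunk ID "MThd".
MIDI_MAGIC = b"MThd"


def looks_like_midi(data: bytes) -> bool:
    """Return True if the bytes start with the MIDI header chunk ID."""
    return data[:4] == MIDI_MAGIC


def check_file(path: str) -> bool:
    """Read only the first 4 bytes of a file on disk and test them."""
    with open(path, "rb") as f:
        return looks_like_midi(f.read(4))


if __name__ == "__main__":
    # First bytes taken from the two hex dumps above.
    html_start = bytes.fromhex("3c21444f43545950452068746d6c")  # "<!DOCTYPE html"
    midi_start = bytes.fromhex("4d546864000000060001000b00f0")  # "MThd" + header
    print(looks_like_midi(html_start))  # False
    print(looks_like_midi(midi_start))  # True
```

Running `check_file("1.mid")` on the programmatically downloaded file would return False, which is a quicker signal than opening it in a player.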

Answer 1:


Your code works fine; just change base_url to:

base_url = "http://www.midiworld.com/download/"

Right now, "1.mid" (for example) contains the HTML of this page: http://www.midiworld.com/files/1 (you can open it with a text editor to see this).

The MIDI files themselves can be downloaded from the URL http://www.midiworld.com/download/{insert number}
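A sketch of the corrected download step that also refuses to return anything not starting with "MThd", so an HTML page cannot end up in a .mid file again. It assumes, as stated above, that http://www.midiworld.com/download/{number} returns the raw file. `fetch_midi`, `save_midi` and the `opener` parameter are hypothetical names; `opener` exists only so the function can be tried without network access.

```python
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://www.midiworld.com/download/"


def fetch_midi(
    number: int,
    opener: Callable = urllib.request.urlopen,
) -> Optional[bytes]:
    """Download one file by number and return its bytes only if it is MIDI.

    Returns None for an empty body or for anything that is not a
    Standard MIDI File (for example an HTML page).
    """
    with opener(BASE_URL + str(number)) as response:
        data = response.read()
    # A real MIDI file starts with the header chunk ID "MThd".
    if not data.startswith(b"MThd"):
        return None
    return data


def save_midi(number: int, data: bytes) -> None:
    """Write the bytes in binary mode, named by number as in the question."""
    with open(f"{number}.mid", "wb") as f:
        f.write(data)
```

For example, call `data = fetch_midi(1)` and then `save_midi(1, data)` when `data` is not None. Checking the bytes rather than trusting the URL means a future change on the site shows up as a skipped file instead of a silently broken one.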

I downloaded the first 100, but there currently seem to be 4992 downloadable MIDI files, so if you want more of them, just change the loop to:

for i in range(1, 4993):  # range() excludes its end value, so this covers 1 to 4992

As a side note, if the requested .mid doesn't exist, the site gives you a download named "_-_.mid" that is 0 bytes. So if you are going to repeat the download later and want every file they have, consider setting the range to something large, for example 100000, and breaking out of the loop when a downloaded file is 0 bytes:

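for i in range(1, 100000):
    data = urllib.request.urlopen(base_url + str(i)).read()
    if len(data) == 0:  # empty "_-_.mid" means this ID doesn't exist
        break
    with open(str(i) + '.mid', 'wb') as f:
        f.write(data)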
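On the naming question: a player only cares about the bytes, not the file name, so the numeric names were not the cause. If you still want the same names a manual download gives, servers normally suggest them in a Content-Disposition header; that midiworld.com does so is an assumption based on the "_-_.mid" name mentioned above. The headers of a `urlopen` response are an `email.message.Message` subclass, whose `get_filename()` reads that header. `filename_for` and `download_with_server_name` are hypothetical helpers, sketched here:

```python
import os
import urllib.request
from email.message import Message


def filename_for(number: int, headers: Message) -> str:
    """Pick a local file name for download `number`.

    Uses the server-suggested name from Content-Disposition when present,
    otherwise falls back to "<number>.mid" as in the question's code.
    """
    suggested = headers.get_filename()
    if suggested:
        # basename() drops any directory part a server might include.
        name = os.path.basename(suggested)
        if name:
            return name
    return f"{number}.mid"


def download_with_server_name(number: int, base_url: str) -> str:
    """Download one file, save it under the chosen name, return that name."""
    with urllib.request.urlopen(base_url + str(number)) as response:
        data = response.read()
        name = filename_for(number, response.headers)
    with open(name, "wb") as f:
        f.write(data)
    return name
```

Combine this with the empty-file check, since every missing ID would otherwise be saved over the same "_-_.mid".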


Source: https://stackoverflow.com/questions/50636524/why-cant-i-play-the-midi-files-i-have-downloaded-programmatically-but-i-can-pl
