How do I download a file over HTTP using Python?
I have a small utility that I use to download an MP3 file from a website on a schedule and then builds/updates a podcast XML file which I've added to iTunes.
The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows
.bat file to download the actual MP3 file. I would prefer to have the entire utility written in Python.
I struggled to find a way to actually download the file in Python, thus why I resorted to using
So, how do I download the file using Python?
import urllib.request with urllib.request.urlopen('http://www.example.com/') as f: html = f.read().decode('utf-8')
This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.
On Python 2, the method is in
import urllib2 response = urllib2.urlopen('http://www.example.com/') html = response.read()
One more, using
import urllib urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
(for Python 3+ use
import urllib.request and
Yet another one, with a "progressbar"
import urllib2 url = "http://download.thinkbroadband.com/10MB.zip" file_name = url.split('/')[-1] u = urllib2.urlopen(url) f = open(file_name, 'wb') meta = u.info() file_size = int(meta.getheaders("Content-Length")) print "Downloading: %s Bytes: %s" % (file_name, file_size) file_size_dl = 0 block_sz = 8192 while True: buffer = u.read(block_sz) if not buffer: break file_size_dl += len(buffer) f.write(buffer) status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size) status = status + chr(8)*(len(status)+1) print status, f.close()
Read more... Read less...
In 2012, use the python requests library
>>> import requests >>> >>> url = "http://download.thinkbroadband.com/10MB.zip" >>> r = requests.get(url) >>> print len(r.content) 10485760
You can run
pip install requests to get it.
Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.
People have expressed admiration for the progress bar. It's cool, sure. There are several off-the-shelf solutions now, including
from tqdm import tqdm import requests url = "http://download.thinkbroadband.com/10MB.zip" response = requests.get(url, stream=True) with open("10MB", "wb") as handle: for data in tqdm(response.iter_content()): handle.write(data)
This is essentially the implementation @kvance described 30 months ago.
import urllib2 mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3") with open('test.mp3','wb') as output: output.write(mp3file.read())
open('test.mp3','wb') opens a file (and erases any existing file) in binary mode so you can save data with it instead of just text.
import urllib.request response = urllib.request.urlopen('http://www.example.com/') html = response.read()
import urllib.request urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
Note: According to the documentation,
urllib.request.urlretrieveis a "legacy interface" and "might become deprecated in the future" (thanks gerrit)
import os,requests def download(url): get_response = requests.get(url,stream=True) file_name = url.split("/")[-1] with open(file_name, 'wb') as f: for chunk in get_response.iter_content(chunk_size=1024): if chunk: # filter out keep-alive new chunks f.write(chunk) download("https://example.com/example.jpg")