Download file from web in Python 3
I am creating a program that will download a .jar (java) file from a web server, by reading the URL that is specified in the .jad file of the same game/application. I'm using Python 3.2.1
I've managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to the JAR file), but as you may imagine, the extracted value is type() string.
Here's the relevant function:
def downloadFile(URL=None): import httplib2 h = httplib2.Http(".cache") resp, content = h.request(URL, "GET") return content downloadFile(URL_from_file)
However I always get an error saying that the type in the function above has to be bytes, and not string. I've tried using the URL.encode('utf-8'), and also bytes(URL,encoding='utf-8'), but I'd always get the same or similar error.
So basically my question is how to download a file from a server when the URL is stored in a string type?
If you want to obtain the contents of a web page into a variable, just
read the response of
import urllib.request ... url = 'http://example.com/' response = urllib.request.urlopen(url) data = response.read() # a `bytes` object text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
The easiest way to download and save a file is to use the
import urllib.request ... # Download the file from `url` and save it locally under `file_name`: urllib.request.urlretrieve(url, file_name)
import urllib.request ... # Download the file from `url`, save it in a temporary directory and get the # path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable: file_name, headers = urllib.request.urlretrieve(url)
But keep in mind that
urlretrieve is considered legacy and might become deprecated (not sure why, though).
So the most correct way to do this would be to use the
urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using
import urllib.request import shutil ... # Download the file from `url` and save it locally under `file_name`: with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: shutil.copyfileobj(response, out_file)
If this seems too complicated, you may want to go simpler and store the whole download in a
bytes object and then write it to a file. But this works well only for small files.
import urllib.request ... # Download the file from `url` and save it locally under `file_name`: with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: data = response.read() # a `bytes` object out_file.write(data)
It is possible to extract
.gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.
import urllib.request import gzip ... # Read the first 64 bytes of the file inside the .gz archive located at `url` url = 'http://example.com/something.gz' with urllib.request.urlopen(url) as response: with gzip.GzipFile(fileobj=response) as uncompressed: file_header = uncompressed.read(64) # a `bytes` object # Or do anything shown above using `uncompressed` instead of `response`.
Read more... Read less...
requests package whenever I want something related to HTTP requests because its API is very easy to start with:
$ pip install requests
then the code:
from requests import get # to make GET request def download(url, file_name): # open in binary mode with open(file_name, "wb") as file: # get request response = get(url) # write to file file.write(response.content)
I hope I understood the question right, which is: how to download a file from a server when the URL is stored in a string type?
I download files and save it locally using the below code:
import requests url = 'https://www.python.org/static/img/python-logo.png' fileName = 'D:\Python\dwnldPythonLogo.png' req = requests.get(url) file = open(fileName, 'wb') for chunk in req.iter_content(100000): file.write(chunk) file.close()
You can use wget which is popular downloading shell tool for that. https://pypi.python.org/pypi/wget This will be the simplest method since it does not need to open up the destination file. Here is an example.
import wget url = 'https://i1.wp.com/python3.codes/wp-content/uploads/2015/06/Python3-powered.png?fit=650%2C350' wget.download(url, '/Users/scott/Downloads/cat4.jpg')
Here we can use urllib's Legacy interface in Python3:
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.
Example (2 lines code):
import urllib.request url = 'https://www.python.org/static/img/python-logo.png' urllib.request.urlretrieve(url, "logo.png")
Yes, definietly requests is great package to use in something related to HTTP requests. but we need to be careful with the encoding type of the incoming data as well below is an example which explains the difference
from requests import get # case when the response is byte array url = 'some_image_url' response = get(url) with open('output', 'wb') as file: file.write(response.content) # case when the response is text # Here unlikely if the reponse content is of type **iso-8859-1** we will have to override the response encoding url = 'some_page_url' response = get(url) # override encoding by real educated guess as provided by chardet r.encoding = r.apparent_encoding with open('output', 'w', encoding='utf-8') as file: file.write(response.content)