Fetching Data from the Web¶
workflow.web provides a simple API for retrieving data from the Web, modelled on the excellent requests library.

The purpose of workflow.web is to cover trivial cases at just 0.5% of the size of requests.
Features¶
- JSON requests and responses
- Form data submission
- File uploads
- Redirection support
The main API consists of the get() and post() functions and the Response instances they return.
Warning
As workflow.web is based on Python 2's standard HTTP libraries, it does not verify SSL certificates when establishing HTTPS connections.

As a result, you must not use this module for sensitive connections.

If you require certificate verification for HTTPS connections (which you really should), use the excellent requests library (upon which the workflow.web API is based) or the command-line tool cURL, which is installed by default on OS X, instead.
Examples¶
There are some examples of using workflow.web in other parts of the documentation:
API¶
get() and post() are wrappers around request(). They all return Response objects.
workflow.web.get(url, params=None, headers=None, cookies=None, auth=None, timeout=60, allow_redirects=True)

    Initiate a GET request. Arguments as for request().

    Returns: Response instance
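As a rough illustration of a typical GET call, the sketch below fetches a URL and parses the body as JSON. The helper name fetch_json is hypothetical, and the web argument is assumed to be the workflow.web module (for example, obtained with "from workflow import web"); passing it in keeps the helper easy to exercise without a network connection.

```python
def fetch_json(web, url, params=None, timeout=60):
    """GET `url` and return the body parsed as JSON.

    `web` is assumed to be the `workflow.web` module; `fetch_json`
    itself is a hypothetical helper, not part of the library.
    """
    # params is a dict of URL parameters, as described for request()
    response = web.get(url, params=params, timeout=timeout)
    # Re-raise any error stored on the response before using the data
    response.raise_for_status()
    return response.json()
```

In a workflow this might be called as fetch_json(web, url, {'q': query}), where url and query are whatever your workflow needs.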
workflow.web.post(url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)

    Initiate a POST request. Arguments as for request().

    Returns: Response instance
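A form submission might look something like the following sketch. Note from the signature that post() does not follow redirections by default, so the example passes allow_redirects explicitly. The helper name submit_form is hypothetical, and web is again assumed to be the workflow.web module.

```python
def submit_form(web, url, fields, follow_redirects=True):
    """POST `fields` as form data and return the response body as text.

    `web` is assumed to be the `workflow.web` module; `submit_form`
    is a hypothetical helper used only for illustration.
    """
    # `fields` is a mapping of form data: {'field_name': 'value'}
    response = web.post(url, data=fields, allow_redirects=follow_redirects)
    response.raise_for_status()
    return response.text
```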
workflow.web.request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)

    Initiate an HTTP(S) request. Returns Response object.

    Parameters:
    - method (unicode) – 'GET' or 'POST'
    - url (unicode) – URL to open
    - params (dict) – mapping of URL parameters
    - data (dict or str) – mapping of form data {'field_name': 'value'} or str
    - headers (dict) – HTTP headers
    - cookies (dict) – cookies to send to server
    - files (dict) – files to upload (see below)
    - auth (tuple) – username, password
    - timeout (int) – connection timeout limit in seconds
    - allow_redirects (Boolean) – follow redirections

    Returns: Response object

    The files argument is a dictionary:

    {'fieldname' : { 'filename': 'blah.txt', 'content': '<binary data>', 'mimetype': 'text/plain'} }

    fieldname is the name of the field in the HTML form.

    mimetype is optional. If not provided, mimetypes will be used to guess the mimetype, or application/octet-stream will be used.
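The files dictionary described above can be assembled with a small helper. This is only a sketch: build_files_arg is a hypothetical name, and the mimetype key is left out unless you supply one, relying on the documented behaviour that it is optional.

```python
def build_files_arg(fieldname, filename, content, mimetype=None):
    """Return a dict shaped like the `files` argument of request()/post().

    `build_files_arg` is a hypothetical helper, not part of workflow.web.
    `content` should be the file's binary data.
    """
    entry = {'filename': filename, 'content': content}
    if mimetype is not None:
        # Optional: when omitted, the library is documented to guess it
        entry['mimetype'] = mimetype
    return {fieldname: entry}
```

The result could then be passed as web.post(url, files=build_files_arg('upload', 'blah.txt', data)), where 'upload' stands for the field name in your HTML form.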
The Response object¶
class workflow.web.Response(request)

    Returned by request() / get() / post() functions.

    A simplified version of the Response object in the requests library.

    >>> r = request('GET', 'http://www.google.com')
    >>> r.status_code
    200
    >>> r.encoding
    ISO-8859-1
    >>> r.content  # bytes
    <html> ...
    >>> r.text  # unicode, decoded according to charset in HTTP header/meta tag
    u'<html> ...'
    >>> r.json()  # content parsed as JSON
    iter_content(chunk_size=4096, decode_unicode=False)

        Iterate over response data.

        New in version 1.6.

        Parameters:
        - chunk_size (int) – Number of bytes to read into memory
        - decode_unicode (Boolean) – Decode to Unicode using detected encoding

        Returns: iterator
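One possible use of iter_content() is processing a large response piece by piece instead of holding the whole body in memory. The sketch below hashes a response incrementally; sha256_of_response is a hypothetical helper, and it assumes the chunks are bytes (that is, decode_unicode is left at its default of False).

```python
import hashlib


def sha256_of_response(response, chunk_size=4096):
    """Return the SHA-256 hex digest of a response body, read in chunks.

    `response` is assumed to be a `workflow.web.Response`;
    `sha256_of_response` is a hypothetical helper for illustration.
    """
    digest = hashlib.sha256()
    # Each iteration reads at most `chunk_size` bytes into memory
    for chunk in response.iter_content(chunk_size=chunk_size):
        digest.update(chunk)
    return digest.hexdigest()
```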
    raise_for_status()

        Raise stored error if one occurred.

        The error will be an instance of urllib2.HTTPError.
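Since the error raised is a urllib2.HTTPError, a caller can catch it and inspect its code attribute. A minimal sketch follows; http_error_code is a hypothetical helper, and the fallback import is only there so the snippet also runs on Python 3, where the same class lives in urllib.error.

```python
try:
    from urllib2 import HTTPError  # Python 2, as used by workflow.web
except ImportError:
    from urllib.error import HTTPError  # Python 3 location of the same class


def http_error_code(response):
    """Return the HTTP error code for a failed response, else None.

    `response` is assumed to be a `workflow.web.Response`;
    `http_error_code` is a hypothetical helper for illustration.
    """
    try:
        response.raise_for_status()
    except HTTPError as err:
        return err.code
    return None
```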
    save_to_path(filepath)

        Save retrieved data to file at filepath.

        Parameters: filepath – Path to save retrieved data.