Fetching Data from the Web

workflow.web provides a simple API for retrieving data from the Web, modelled on the excellent requests library.

The purpose of workflow.web is to cover trivial cases at just 0.5% of the size of requests.

Features

  • JSON requests and responses
  • Form data submission
  • File uploads
  • Redirection support

The main API consists of the get() and post() functions and the Response instances they return.

Warning

As workflow.web is based on Python 2’s standard HTTP libraries, it does not verify SSL certificates when establishing HTTPS connections.

As a result, you must not use this module for sensitive connections.

If you require certificate verification for HTTPS connections (and you really should), use the requests library (on which the workflow.web API is modelled) or the command-line tool cURL, which is installed by default on OS X.
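For connections that need certificate verification, one of the options named above is to shell out to cURL. The following is a minimal sketch and not part of workflow.web: the helper names curl_command and fetch_with_curl are hypothetical, and it assumes a curl binary is on the PATH and is left at its default of verifying certificates.

```python
import subprocess


def curl_command(url, timeout=60):
    """Build a curl command line (hypothetical helper, not part of workflow.web)."""
    return [
        'curl',
        '--silent',       # no progress meter
        '--show-error',   # but still report errors on stderr
        '--fail',         # non-zero exit status on HTTP errors (>= 400)
        '--location',     # follow redirects
        '--max-time', str(timeout),
        url,
    ]


def fetch_with_curl(url, timeout=60):
    """Return the response body as bytes.

    Raises subprocess.CalledProcessError if curl exits with an error.
    """
    return subprocess.check_output(curl_command(url, timeout))
```

Keeping the command builder separate from the call that runs it makes the flags easy to inspect without touching the network.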

Examples

There are some examples of using workflow.web in other parts of the documentation:

API

get() and post() are wrappers around request(). They all return Response objects.

workflow.web.get(url, params=None, headers=None, cookies=None, auth=None, timeout=60, allow_redirects=True)

Initiate a GET request. Arguments as for request().

Returns: Response instance

workflow.web.post(url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)

Initiate a POST request. Arguments as for request().

Returns: Response instance

workflow.web.request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)

Initiate an HTTP(S) request. Returns Response object.

Parameters:
  • method (unicode) – ‘GET’ or ‘POST’
  • url (unicode) – URL to open
  • params (dict) – mapping of URL parameters
  • data (dict or str) – mapping of form data {'field_name': 'value'} or str
  • headers (dict) – HTTP headers
  • cookies (dict) – cookies to send to server
  • files (dict) – files to upload (see below)
  • auth (tuple) – (username, password)
  • timeout (int) – connection timeout limit in seconds
  • allow_redirects (Boolean) – follow redirections
Returns:

Response object
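As an illustration of the parameters listed above, here is a minimal usage sketch. The endpoint URL and the function name search_items are hypothetical, and the web argument is assumed to behave like the workflow.web module; it is passed in so that the sketch does not depend on Alfred-Workflow being importable.

```python
def search_items(web, query):
    """Return decoded JSON for a search query.

    `web` is expected to behave like the workflow.web module.
    """
    response = web.get(
        'https://api.example.com/search',   # hypothetical endpoint
        params={'q': query, 'limit': 10},   # sent as URL parameters
        headers={'Accept': 'application/json'},
        timeout=10,                         # seconds
    )
    response.raise_for_status()             # re-raise any stored HTTP error
    return response.json()
```

In a workflow this might be called as search_items(web, u'alfred') after from workflow import web.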

The files argument is a dictionary:

{'fieldname': {'filename': 'blah.txt',
               'content': '<binary data>',
               'mimetype': 'text/plain'}}

  • fieldname is the name of the field in the HTML form.
  • mimetype is optional. If it is not provided, the mimetypes module is used to guess it; if no guess can be made, application/octet-stream is used.
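The structure described above can be assembled from a file on disk. The following is a minimal sketch under stated assumptions: build_files_arg is a hypothetical helper, not part of workflow.web, and it fills in mimetype explicitly using the same fallback the text describes rather than relying on the library to guess.

```python
import mimetypes
import os


def build_files_arg(fieldname, path):
    """Build a `files` mapping for one upload (hypothetical helper)."""
    with open(path, 'rb') as fp:
        content = fp.read()
    filename = os.path.basename(path)
    # Guess the type from the file extension; fall back to a generic
    # binary type when no guess is possible.
    mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    return {
        fieldname: {
            'filename': filename,
            'content': content,
            'mimetype': mimetype,
        }
    }
```

The returned dictionary has the shape expected by the files argument, so it could be passed as files=build_files_arg('attachment', path) in a call to post().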

The Response object

class workflow.web.Response(request)

Returned by request() / get() / post() functions.

A simplified version of the Response object in the requests library.

>>> r = request('GET', 'http://www.google.com')
>>> r.status_code
200
>>> r.encoding
'ISO-8859-1'
>>> r.content  # bytes
'<html> ...'
>>> r.text  # unicode, decoded according to charset in HTTP header/meta tag
u'<html> ...'
>>> r.json()  # content parsed as JSON

content

Raw content of response (i.e. bytes)

Returns: Body of HTTP response
Return type: str

encoding

Text encoding of document or None

Returns: str or None

iter_content(chunk_size=4096, decode_unicode=False)

Iterate over response data.

New in version 1.6.

Parameters:
  • chunk_size (int) – Number of bytes to read into memory
  • decode_unicode (Boolean) – Decode to Unicode using detected encoding
Returns:

iterator
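A typical use of iter_content() is to write a large response to disk without holding the whole body in memory. The sketch below assumes only that the response object exposes iter_content(chunk_size=...) as documented above; the function name save_in_chunks is hypothetical.

```python
def save_in_chunks(response, filepath, chunk_size=4096):
    """Write a response body to `filepath` chunk by chunk.

    `response` is any object with an iter_content() method like the one
    documented above. Returns the number of bytes written.
    """
    written = 0
    with open(filepath, 'wb') as fp:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fp.write(chunk)
            written += len(chunk)
    return written
```

decode_unicode is left at its default of False here because the file is opened in binary mode and should receive raw bytes.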

json()

Decode response contents as JSON.

Returns: object decoded from JSON
Return type: list / dict

raise_for_status()

Raise stored error if one occurred.

The error will be an instance of urllib2.HTTPError.
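One way to use raise_for_status() is to turn an HTTP error into a fallback value. This is a minimal sketch, not the library's own code: fetch_text is a hypothetical helper, web is assumed to behave like the workflow.web module, and the import fallback exists only so the sketch also runs outside Python 2.

```python
try:
    from urllib2 import HTTPError       # Python 2, which workflow.web targets
except ImportError:
    from urllib.error import HTTPError  # lets the sketch also run on Python 3


def fetch_text(web, url, default=None):
    """Return the response text, or `default` if the server reported an error."""
    response = web.get(url, timeout=10)
    try:
        response.raise_for_status()
    except HTTPError:
        # The request completed but the status indicated failure (e.g. 404).
        return default
    return response.text
```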

save_to_path(filepath)

Save retrieved data to the file at filepath.

Parameters: filepath – Path to save retrieved data to.

text

Unicode-decoded content of response body.

If no encoding can be determined from HTTP headers or the content itself, the encoded response body will be returned instead.

Returns: Body of HTTP response
Return type: unicode or str
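
The fallback described for text means callers may receive either decoded text or the undecoded body. The following is a minimal sketch of that rule only, under the assumption that decoding is a plain decode with the detected encoding; decode_body is a hypothetical name and this is not the library's implementation.

```python
def decode_body(content, encoding):
    """Decode `content` if an encoding is known, else return it unchanged."""
    if encoding:
        return content.decode(encoding)
    # No encoding could be determined: hand back the raw bytes.
    return content
```

Code that consumes text should therefore be prepared for both return types.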