Fetching Data from the Web
workflow.web provides a simple API for retrieving data from the Web
modelled on the excellent requests library.
The purpose of workflow.web is to cover trivial cases at just 0.5% of
the size of requests.
Features
- JSON requests and responses
- Form data submission
- File uploads
- Redirection support
The main API consists of the get() and post() functions and
the Response instances they return.
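As a rough illustration of how these pieces fit together, the sketch below fetches a URL with get(), checks for a stored error, and decodes the body as JSON. The helper name fetch_json and its getter parameter are hypothetical (the latter only exists so the function can be exercised without network access); get(), raise_for_status() and text are the only names taken from this page.

```python
import json


def fetch_json(url, params=None, getter=None):
    """Fetch ``url`` and decode the response body as JSON.

    ``getter`` is a hypothetical hook for testing; when omitted,
    ``workflow.web.get`` is used.
    """
    if getter is None:
        # Imported lazily so the sketch loads even without the library.
        from workflow import web
        getter = web.get

    response = getter(url, params=params)
    # Raise the stored error if the request failed.
    response.raise_for_status()
    # ``text`` is the response body decoded to Unicode.
    return json.loads(response.text)
```

In a workflow this might be called as fetch_json('https://example.com/api', {'q': 'query'}), where the URL and parameters are placeholders.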
Warning
As workflow.web is based on Python 2’s standard HTTP libraries, it
does not verify SSL certificates when establishing HTTPS
connections.
As a result, you must not use this module for sensitive connections.
If you require certificate verification for HTTPS connections (which you
really should), you should use the excellent requests library
(upon which the workflow.web API is based) or the command-line tool
cURL, which is installed by default on OS X, instead.
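One possible way to follow this advice without bundling requests is to shell out to cURL, which verifies certificates by default. The following is only a sketch: the function names curl_command and fetch_with_curl are made up for illustration, and the chosen flags are one reasonable selection rather than a recommendation from this documentation.

```python
import subprocess


def curl_command(url, timeout=60):
    """Build a cURL command line that fetches ``url``."""
    return [
        'curl',
        '--silent',       # no progress meter
        '--show-error',   # but still print error messages
        '--fail',         # non-zero exit status on HTTP errors
        '--location',     # follow redirects
        '--max-time', str(timeout),
        url,
    ]


def fetch_with_curl(url, timeout=60):
    """Return the response body as bytes.

    Raises ``subprocess.CalledProcessError`` if cURL exits with an
    error, including a failed certificate check.
    """
    return subprocess.check_output(curl_command(url, timeout))
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact command before running it.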
Examples
There are some examples of using workflow.web in other parts of the
documentation.
API
get() and post() are wrappers around request(). They all
return Response objects.
- workflow.web.get(url, params=None, headers=None, cookies=None, auth=None, timeout=60, allow_redirects=True)
  Initiate a GET request. Arguments as for request().
  Returns: Response instance
- workflow.web.post(url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)
  Initiate a POST request. Arguments as for request().
  Returns: Response instance
- workflow.web.request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)
  Initiate an HTTP(S) request. Returns Response object.
  Parameters:
  - method (unicode) – 'GET' or 'POST'
  - url (unicode) – URL to open
  - params (dict) – mapping of URL parameters
  - data (dict or str) – mapping of form data {'field_name': 'value'} or str
  - headers (dict) – HTTP headers
  - cookies (dict) – cookies to send to server
  - files (dict) – files to upload (see below)
  - auth (tuple) – username, password
  - timeout (int) – connection timeout limit in seconds
  - allow_redirects (Boolean) – follow redirections
  Returns: Response object
  The files argument is a dictionary:
      {'fieldname': {'filename': 'blah.txt',
                     'content': '<binary data>',
                     'mimetype': 'text/plain'}}
  fieldname is the name of the field in the HTML form.
  mimetype is optional. If not provided, mimetypes will be used to guess the mimetype, or application/octet-stream will be used.
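The files dictionary described above can be assembled from a file on disk along these lines. This is a minimal sketch: build_files is a hypothetical helper, not part of workflow.web, and it leaves out mimetype unless one is given so that the library's own guessing applies.

```python
import os


def build_files(fieldname, path, mimetype=None):
    """Return a ``files`` mapping for uploading the file at ``path``.

    ``fieldname`` is the name of the field in the HTML form.
    """
    with open(path, 'rb') as fp:
        content = fp.read()

    upload = {
        'filename': os.path.basename(path),
        'content': content,
    }
    if mimetype is not None:
        # Optional: without it the mimetype is guessed from the filename.
        upload['mimetype'] = mimetype
    return {fieldname: upload}
```

The result could then be passed as the files argument, for example post(url, files=build_files('attachment', '/path/to/blah.txt')), where the field name and path are placeholders.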
The Response object
- class workflow.web.Response(request)
  Returned by request() / get() / post() functions.
  A simplified version of the Response object in the requests library.
      >>> r = request('GET', 'https://github.com/')
      >>> r.status_code
      200
      >>> r.encoding
      'utf-8'
      >>> r.content  # bytes
      '<!DOCTYPE ...'
      >>> r.text  # unicode, decoded according to charset in HTTP header/meta tag
      u'<!DOCTYPE ...'
- is_stream
  True if this response has been accessed as a stream.
- iter_content(chunk_size=4096, decode_unicode=False)
  Iterate over response data.
  New in version 1.6.
  Parameters:
  - chunk_size (int) – Number of bytes to read into memory
  - decode_unicode (Boolean) – Decode to Unicode using detected encoding
  Returns: iterator
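For large downloads, iter_content() makes it possible to process the body without holding it all in memory. The sketch below writes a response to disk chunk by chunk and reports how many bytes were written; save_stream is a hypothetical helper name, and the code assumes each chunk is a byte string (that is, decode_unicode is left at False).

```python
def save_stream(response, filepath, chunk_size=4096):
    """Write the response body to ``filepath`` in chunks.

    Returns the total number of bytes written.
    """
    written = 0
    with open(filepath, 'wb') as fp:
        # Each iteration reads at most ``chunk_size`` bytes into memory.
        for chunk in response.iter_content(chunk_size=chunk_size):
            fp.write(chunk)
            written += len(chunk)
    return written
```

When no per-chunk processing is needed, save_to_path() covers the simple case of saving the retrieved data to a file.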
- raise_for_status()
  Raise stored error if one occurred.
  The error will be an instance of urllib2.HTTPError.
- save_to_path(filepath)
  Save retrieved data to file at filepath.
  Parameters: filepath – Path to save retrieved data.