`tornado.httpclient` — 异步 HTTP 客户端¶

阻塞和非阻塞的 HTTP 客户端接口.

这个模块定义了一个被两种实现方式 simple_httpclient 和 curl_httpclient 共享的通用接口 . 应用程序可以选择直接实例化相对应的实现类, 或使用本模块提供的 AsyncHTTPClient 类, 通过复写 AsyncHTTPClient.configure 方法来选择一种实现 .

默认的实现是 simple_httpclient, 这可以能满足大多数用户的需要 . 然而, 一些应用程序可能会因为以下原因想切换到 curl_httpclient :

curl_httpclient 有一些 simple_httpclient 不具有的功能特性, 包括对 HTTP 代理和使用指定网络接口能力的支持.
curl_httpclient 更有可能与不完全符合 HTTP 规范的网站兼容, 或者与使用很少使用 HTTP 特性的网站兼容.
curl_httpclient 更快.
curl_httpclient 是 Tornado 2.0 之前的默认值.

注意, 如果你正在使用 curl_httpclient, 强力建议你使用最新版本的 libcurl 和 pycurl. 当前 libcurl 能被支持的最小版本是 7.21.1, pycurl 能被支持的最小版本是 7.18.2. 强烈建议你所安装的 libcurl 是和异步 DNS 解析器 (threaded 或 c-ares) 一起构建的, 否则你可能会遇到各种请求超时的问题 (更多信息请查看 http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTCONNECTTIMEOUTMS 和 curl_httpclient.py 里面的注释).

为了选择 curl_httpclient, 只需要在启动的时候调用 AsyncHTTPClient.configure

AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")

HTTP 客户端接口¶

class tornado.httpclient.HTTPClient(async_client_class=None, **kwargs)[源代码]¶

一个阻塞的 HTTP 客户端.

提供这个接口是为了方便使用和测试; 大多数运行于 IOLoop 的应用程序会使用 AsyncHTTPClient 来替代它. 一般的用法就像这样

http_client = httpclient.HTTPClient()
try:
    response = http_client.fetch("http://www.google.com/")
    print response.body
except httpclient.HTTPError as e:
    # HTTPError is raised for non-200 responses; the response
    # can be found in e.response.
    print("Error: " + str(e))
except Exception as e:
    # Other errors are possible, such as IOError.
    print("Error: " + str(e))
http_client.close()

close()[源代码]¶: 关闭该 HTTPClient, 释放所有使用的资源.

fetch(request, **kwargs)[源代码]¶

执行一个请求, 返回一个 HTTPResponse 对象.

该请求可以是一个 URL 字符串或是一个 HTTPRequest 对象. 如果它是一个字符串, 我们会使用任意关键字参数构造一个 HTTPRequest : HTTPRequest(request, **kwargs)

如果在 fetch 过程中发生错误, 我们将抛出一个 HTTPError 除非 raise_error 关键字参数被设置为 False.

class tornado.httpclient.AsyncHTTPClient[源代码]¶

一个非阻塞 HTTP 客户端.

使用示例:

def handle_request(response):
    if response.error:
        print "Error:", response.error
    else:
        print response.body

http_client = AsyncHTTPClient()
http_client.fetch("http://www.google.com/", handle_request)

这个类的构造器有几个比较神奇的考虑: 它实际创建了一个基于特定实现的子类的实例, 并且该实例被作为一种伪单例重用 (每一个 IOLoop ). 使用关键字参数 force_instance=True 可以用来限制这种单例行为. 只有使用了 force_instance=True 时候, 才可以传递 io_loop 以外其他的参数给 AsyncHTTPClient 构造器. 实现的子类以及它的构造器的参数可以通过静态方法 configure() 设置.

所有 AsyncHTTPClient 实现都支持一个 defaults 关键字参数, 可以被用来设置默认 HTTPRequest 属性的值. 例如:

AsyncHTTPClient.configure(
    None, defaults=dict(user_agent="MyUserAgent"))
# or with force_instance:
client = AsyncHTTPClient(force_instance=True,
    defaults=dict(user_agent="MyUserAgent"))

在 4.1 版更改: io_loop 参数被废弃.

close()[源代码]¶

销毁该 HTTP 客户端, 释放所有被使用的文件描述符.

因为 AsyncHTTPClient 对象透明重用的方式, 该方法 在正常使用时并不需要 . close() 一般只有在 IOLoop 也被关闭, 或在创建 AsyncHTTPClient 的时候使用了 force_instance=True 参数才需要.

在 AsyncHTTPClient 调用 close() 方法后, 其他方法就不能被调用了.

fetch(request, callback=None, raise_error=True, **kwargs)[源代码]¶

执行一个请求, 并且异步的返回 HTTPResponse.

request 参数可以是一个 URL 字符串也可以是一个 HTTPRequest 对象. 如果是一个字符串, 我们将使用全部的关键字参数一起构造一个 HTTPRequest 对象: HTTPRequest(request, **kwargs)

这个方法返回一个结果为 HTTPResponse 的 Future 对象. 默认情况下, 如果该请求返回一个非 200 的响应码, 这个 Future 将会抛出一个 HTTPError 错误. 相反, 如果 raise_error 设置为 False, 则无论响应码如何, 都将返回该 response (响应).

如果给定了 callback , 它将被 HTTPResponse 调用. 在回调接口中, HTTPError 不会自动抛出. 相反你必须检查该响应的 error 属性或者调用它的 rethrow 方法.

classmethod configure(impl, **kwargs)[源代码]¶

配置要使用的 AsyncHTTPClient 子类.

AsyncHTTPClient() 实际上是创建一个子类的实例. 此方法可以使用一个类对象或此类的完全限定名称(或为 None 则使用默认的, SimpleAsyncHTTPClient) 调用.

如果给定了额外的关键字参数, 它们将会被传递给创建的每个子类实例的构造函数. 关键字参数 max_clients 确定了可以在每个 IOLoop 上并行执行的 fetch() 操作的最大数量. 根据使用的实现类不同, 可能支持其他参数.

例如:

AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")

Request 对象¶

class tornado.httpclient.HTTPRequest(url, method='GET', headers=None, body=None, auth_username=None, auth_password=None, auth_mode=None, connect_timeout=None, request_timeout=None, if_modified_since=None, follow_redirects=None, max_redirects=None, user_agent=None, use_gzip=None, network_interface=None, streaming_callback=None, header_callback=None, prepare_curl_callback=None, proxy_host=None, proxy_port=None, proxy_username=None, proxy_password=None, allow_nonstandard_methods=None, validate_cert=None, ca_certs=None, allow_ipv6=None, client_key=None, client_cert=None, body_producer=None, expect_100_continue=False, decompress_response=None, ssl_options=None)[源代码]¶

HTTP 客户端请求对象.

除了 url 以外所有参数都是可选的.

参数:

url (string) – fetch 的 URL
method (string) – HTTP 方法, e.g. “GET” or “POST”
headers (HTTPHeaders 或 dict) – 额外的 HTTP 请求头
body – HTTP 请求体字符串 (byte 或 unicode; 如果是 unicode 则使用 utf-8 编码)
body_producer – 可以被用于延迟/异步请求体调用. 它可以被调用, 带有一个参数, 一个 write 函数, 并应该返回一个 Future 对象. 它应该在新的数据可用时调用 write 函数. write 函数返回一个可用于流程控制的 Future 对象. 只能指定 body 和 body_producer 其中之一. body_producer 不被 curl_httpclient 支持. 当使用 body_producer 时, 建议传递一个 Content-Length 头, 否则将使用其他的分块编码, 并且很多服务断不支持请求的分块编码. Tornado 4.0 新增
auth_username (string) – HTTP 认证的用户名
auth_password (string) – HTTP 认证的密码
auth_mode (string) – 认证模式; 默认是 “basic”. 所允许的值是根据实现方式定义的; curl_httpclient 支持 “basic” 和 “digest”; simple_httpclient 只支持 “basic”
connect_timeout (float) – 初始化连接的超时时间
request_timeout (float) – 整个请求的超时时间
if_modified_since (datetime 或 float) – If-Modified-Since 头的时间戳
follow_redirects (bool) – 是否应该自动跟随重定向还是返回 3xx 响应?
max_redirects (int) – follow_redirects 的最大次数限制
user_agent (string) – User-Agent 头
decompress_response (bool) – 从服务器请求一个压缩过的响应, 在下载后对其解压缩. 默认是 True. Tornado 4.0 新增.
use_gzip (bool) – decompress_response 的别名从 Tornado 4.0 已弃用.
network_interface (string) – 请求所使用的网络接口. 只有 curl_httpclient ; 请看下面的备注.
streaming_callback (callable) – 如果设置了, streaming_callback 将用它接收到的数据块执行, 并且 HTTPResponse.body 和 HTTPResponse.buffer 在最后的响应中将为空.
header_callback (callable) – 如果设置了, header_callback 将在接收到每行头信息时运行(包括第一行, e.g. HTTP/1.0 200 OK\r\n, 最后一行只包含 \r\n. 所有行都包含结尾的换行符). HTTPResponse.headers 在最终响应中将为空. 这与 streaming_callback 结合是最有用的, 因为它是在请求正在进行时访问头信息唯一的方法.
prepare_curl_callback (callable) – 如果设置, 将使用 pycurl.Curl 对象调用, 以允许应用程序进行额外的 setopt 调用.
proxy_host (string) – HTTP 代理主机名. 如果想要使用代理, proxy_host 和 proxy_port 必须设置; proxy_username 和 proxy_pass 是可选项. 目前只有 curl_httpclient 支持代理.
proxy_port (int) – HTTP 代理端口
proxy_username (string) – HTTP 代理用户名
proxy_password (string) – HTTP 代理密码
allow_nonstandard_methods (bool) – 允许 method 参数使用未知值?
validate_cert (bool) – 对于 HTTPS 请求, 是否验证服务器的证书?
ca_certs (string) – PEM 格式的 CA 证书的文件名, 或者默认为 None. 当与 curl_httpclient 一起使用时参阅下面的注释.
client_key (string) – 客户端 SSL key 文件名(如果有). 当与 curl_httpclient 一起使用时参阅下面的注释.
client_cert (string) – 客户端 SSL 证书的文件名(如果有). 当与 curl_httpclient 一起使用时参阅下面的注释.
ssl_options (ssl.SSLContext) – 用在 simple_httpclient (curl_httpclient 不支持) 的 ssl.SSLContext 对象. 覆写 validate_cert, ca_certs, client_key, 和 client_cert.
allow_ipv6 (bool) – 当 IPv6 可用时是否使用? 默认是 true.
expect_100_continue (bool) – 如果为 true, 发送 Expect: 100-continue 头并在发送请求体前等待继续响应. 只被 simple_httpclient 支持.

3.1 新版功能: auth_mode 参数.

4.0 新版功能: body_producer 和 expect_100_continue 参数.

4.2 新版功能: ssl_options 参数.

Response 对象¶

class tornado.httpclient.HTTPResponse(request, code, headers=None, buffer=None, effective_url=None, error=None, request_time=None, time_info=None, reason=None)[源代码]¶

HTTP 响应对象.

属性:

request: HTTPRequest 对象
code: HTTP 状态码数值, e.g. 200 或 404
reason: 人类可读的, 对状态码原因的简短描述
headers: tornado.httputil.HTTPHeaders 对象
effective_url: 跟随重定向后资源的最后位置
buffer: 响应体的 cStringIO 对象
body: string 化的响应体 (从 self.buffer 的需求创建)
error: 任何异常对象
request_time: 请求开始到结束的时间(秒)
time_info: 来自请求的诊断时间信息的字典. 可用数据可能会更改, 不过当前在用的时间信息是 http://curl.haxx.se/libcurl/c/curl_easy_getinfo.html, 加上 queue, 这是通过等待在 AsyncHTTPClient 的 max_clients 设置下的插槽引入的延迟(如果有的话).

rethrow()[源代码]¶: 如果请求中有错误发生, 将抛出一个 HTTPError.

异常¶

exception tornado.httpclient.HTTPError(code, message=None, response=None)[源代码]¶

一个 HTTP 请求失败后抛出的异常.

属性:

code - 整数的 HTTP 错误码, e.g. 404. 当没有接收到 HTTP 响应时将会使用 599 错误码, e.g. 超时.
response - 全部的 HTTPResponse 对象.

注意如果 follow_redirects 为 False, 重定向将导致 HTTPErrors, 并且你可以通过 error.response.headers['Location'] 查看重定向的描述.

Command-line 接口¶

This module provides a simple command-line interface to fetch a url using Tornado’s HTTP client. Example usage:

# Fetch the url and print its body
python -m tornado.httpclient http://www.google.com

# Just print the headers
python -m tornado.httpclient --print_headers --print_body=false http://www.google.com

Implementations¶

class tornado.curl_httpclient.CurlAsyncHTTPClient(io_loop, max_clients=10, defaults=None)¶: libcurl-based HTTP client.

tornado.httpclient — 异步 HTTP 客户端¶