Some extensions of requests

Internal structure of requests

The Session object and the mount method

import requests

s = requests.Session()
s.get('https://www.google.com')

Here is what the Session source code does during initialization:

class Session(SessionRedirectMixin):
    """A Requests session.

    Provides cookie persistence, connection-pooling, and configuration.

    Basic Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> s.get('https://httpbin.org/get')
      <Response [200]>

    Or as a context manager::

      >>> with requests.Session() as s:
      ...     s.get('https://httpbin.org/get')
      <Response [200]>
    """

    __attrs__ = [
        'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
        'cert', 'adapters', 'stream', 'trust_env',
        'max_redirects',
    ]

    def __init__(self):
        ...
        # Default connection adapters.
        self.adapters = OrderedDict()
        self.mount('https://', HTTPAdapter()) # class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False)
        self.mount('http://', HTTPAdapter())  # class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False)

        # All an HTTP adapter does is provide different configuration for different requests, based on the target URL

HTTP Adapter

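An HTTP adapter provides different configuration for different requests, based on the target URL. The Session picks the adapter whose mounted prefix matches the request URL.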
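As a small illustration of per-URL configuration, the sketch below mounts a second adapter on a more specific prefix and asks the Session which adapter it would use. The host names and the values max_retries=3 and pool_maxsize=20 are assumptions chosen for the example, and nothing is sent over the network.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# Hypothetical prefix: requests to this host get up to 3 retries and a larger pool.
api_adapter = HTTPAdapter(max_retries=3, pool_maxsize=20)
s.mount("https://api.example.com/", api_adapter)

# get_adapter() returns the adapter mounted on the longest matching prefix.
chosen_for_api = s.get_adapter("https://api.example.com/v1/users")
chosen_for_other = s.get_adapter("https://www.example.org/")

print(chosen_for_api is api_adapter)    # the specific adapter is selected
print(chosen_for_other is api_adapter)  # other hosts fall back to the default 'https://' adapter
print(chosen_for_api.max_retries.total, chosen_for_other.max_retries.total)
```

Mounting on a narrower prefix leaves the default 'https://' and 'http://' adapters in place, so only URLs under the chosen prefix pick up the custom settings.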

pool_connections in the HTTP adapter: one connection pool per host

pool_connections

– The number of urllib3 connection pools to cache.

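One connection pool corresponds to one host.

HTTP is built on TCP. An HTTP connection is also a TCP connection, identified by a tuple of five values: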

(<protocol>, <src addr>, <src port>, <dest addr>, <dest port>)

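Suppose an HTTP/TCP connection to www.example.com has already been established and the server supports Keep-Alive. Because none of the five values above change, the next request you send to www.example.com/a or www.example.com/b can reuse the same connection.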
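The reuse condition can be sketched with the standard library alone. The helper pool_key below is a hypothetical, simplified stand-in for the way a destination is identified (scheme, host and port only); the real key used by urllib3 carries more fields, so treat this as a model, not the library's implementation.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def pool_key(url):
    """Return the (scheme, host, port) triple that identifies the destination."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


key_a = pool_key("https://www.example.com/a")
key_b = pool_key("https://www.example.com/b")
key_other = pool_key("https://www.example.org/a")

# Different paths on the same host share a key, so a kept-alive connection can be reused.
print(key_a == key_b)
# A different host yields a different key, so a different connection is needed.
print(key_a == key_other)
```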

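Example:

HTTPAdapter(pool_connections=1) is mounted on 'https://', which means only one connection pool can be kept at a time. After calling s.get('https://www.baidu.com'), the cached connection pool is connectionpool('https://www.baidu.com'). When s.get('https://www.zhihu.com') is requested, the session finds that it cannot use the previously cached connection, because the host is different (one connection pool per host). The session therefore has to create a new connection pool, and a new connection if needed. Because pool_connections=1, the session cannot hold two connection pools at once, so it discards the old pool (connectionpool('https://www.baidu.com')) and keeps the new one (connectionpool('https://www.zhihu.com')).

As a result, "Starting new HTTPS connection" appears three times in the log.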


(1)

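import logging

import requests
from requests.adapters import HTTPAdapter
from http.client import HTTPConnection  # py3

# The connection-pool messages shown below are emitted through the logging module.
logging.basicConfig(level=logging.DEBUG)
# Also print the raw HTTP exchange from http.client.
HTTPConnection.debuglevel = 1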

s = requests.Session()

s.mount('https://', HTTPAdapter(pool_connections=1))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')


# INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
# DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
# INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
# DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2621
# INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
# DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None

(2)

# Only two connections are created, which saves one connection setup

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=2))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')



# INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
# DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
# INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
# DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
# DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
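The two runs above can be reproduced offline with a toy model. The function count_new_pools is a hypothetical helper, not part of requests or urllib3; it assumes that pools are cached per host and that the least recently used pool is evicted once more than pool_connections pools exist, which is consistent with the logs shown.

```python
from collections import OrderedDict


def count_new_pools(hosts, pool_connections):
    """Count how many pools must be created for a sequence of visited hosts."""
    cache = OrderedDict()  # host -> pool placeholder, oldest entry first
    created = 0
    for host in hosts:
        if host in cache:
            cache.move_to_end(host)  # cache hit: mark as most recently used
            continue
        created += 1                 # cache miss: a new pool (and connection) is needed
        cache[host] = object()
        if len(cache) > pool_connections:
            cache.popitem(last=False)  # evict the least recently used pool
    return created


visits = ["www.baidu.com", "www.zhihu.com", "www.baidu.com"]
print(count_new_pools(visits, pool_connections=1))  # 3
print(count_new_pools(visits, pool_connections=2))  # 2
```

In this single-threaded case every new pool opens exactly one connection, which is why the counts line up with the number of "Starting new HTTPS connection" entries.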



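pool_maxsize in the HTTP adapter: the number of reusable connections to keep (for multithreaded use)

You only need to care about pool_maxsize when a Session is used in a multithreaded environment, for example when several threads issue concurrent requests through the same Session.

pool_maxsize is the argument used to initialize urllib3's HTTPConnectionPool, i.e. the connection pool mentioned above.

HTTPConnectionPool is a container that collects connections to one specific host; pool_maxsize is the number of reusable connections it keeps.

If the code runs in a single thread, it is neither possible nor necessary to open multiple connections to the same host, because the requests library is blocking, so HTTP requests are always sent one after another.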


from threading import Thread

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=1))

def thread_get(url):
    # Assumed helper, not shown in the original: fetch url through the shared Session.
    s.get(url)

t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start(); t2.start()
t1.join(); t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start(); t4.start()
t3.join(); t4.join()

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606

WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
# The pool may keep only one connection, so the second one is discarded

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (3): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com

# Same experiment with pool_maxsize=2 (same Thread / thread_get setup as the previous run)
s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
s.mount('https://baidu.com', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com

DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623

DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57669
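Both threaded runs can be approximated without threads or network access. TinyPool below is a hypothetical toy class, not urllib3's HTTPConnectionPool; it assumes a non-blocking pool (pool_block=False) that hands out an extra connection when empty and discards a returned connection when it is already full. Each round borrows two connections before returning either, which stands in for two overlapping requests.

```python
import queue


class TinyPool:
    """Toy model of a per-host connection pool with a fixed number of slots."""

    def __init__(self, maxsize):
        # LIFO queue pre-filled with empty slots (None means "no connection yet").
        self._slots = queue.LifoQueue(maxsize)
        for _ in range(maxsize):
            self._slots.put(None)
        self.created = 0
        self.discarded = 0

    def acquire(self):
        try:
            conn = self._slots.get(block=False)
        except queue.Empty:
            conn = None  # pool exhausted: non-blocking mode opens an extra connection
        if conn is None:
            self.created += 1
            conn = f"conn-{self.created}"
        return conn

    def release(self, conn):
        try:
            self._slots.put(conn, block=False)
        except queue.Full:
            self.discarded += 1  # corresponds to "Connection pool is full, discarding connection"


def run_two_rounds(maxsize):
    """Run two rounds of two overlapping requests; return (created, discarded)."""
    pool = TinyPool(maxsize)
    for _ in range(2):
        first = pool.acquire()
        second = pool.acquire()
        pool.release(first)
        pool.release(second)
    return pool.created, pool.discarded


print(run_two_rounds(maxsize=1))  # (3, 2)
print(run_two_rounds(maxsize=2))  # (2, 0)
```

With maxsize=1 the model opens three connections and discards two, matching the three "Starting new HTTPS connection" lines and two warnings in the first log; with maxsize=2 both connections are kept and reused.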

References: laike9m, "Requests' secret: pool_connections and pool_maxsize"; urllib3.connectionpool.HTTPConnectionPool
