urllib3 Basics (Summary)

I. Quick Start

1. Making requests

# Import the urllib3 module
import urllib3
# Create a PoolManager instance; it handles all the details of connection pooling and thread safety
http = urllib3.PoolManager()
# Use request() to make a request
r = http.request('GET', 'http://httpbin.org/robots.txt')
print(r.data) # b'User-agent: *\nDisallow: /deny\n'

A POST request with request()

import urllib3
import json
http = urllib3.PoolManager()
r = http.request('POST', 'https://httpbin.org/post', fields={'hello': 'world'})
print(json.loads(r.data.decode('utf-8')))

2. Response content

The status, data, and headers attributes

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/ip')
# Response status
print(r.status) # 200
# Response data
print(r.data) # b'{\n  "origin": "120.239.165.180"\n}\n'
# Response headers
print(r.headers) # HTTPHeaderDict({'Date': 'Fri, 22 Apr 2022 02:34:03 GMT', 'Content-Type': 'application/json', 'Content-Length': '34', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'})

JSON content

JSON content can be loaded by decoding and deserializing the data attribute of the response:

import json
import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'https://httpbin.org/ip')
json_data = json.loads(r.data.decode('utf-8'))
print(json_data) # {'origin': '120.239.165.180'}

Binary content

The data attribute of the response is always a byte string representing the response content.

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/bytes/8')
print(r.data) # b'\xa6\xe6\xb6\x7f\xc1\xd7\xbb9'

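Using IO wrappers with response content

Sometimes you want to use io.TextIOWrapper or a similar object directly with HTTPResponse data. For the two interfaces to work well together, set the response's auto_close attribute to False. By default the HTTP response is closed once all bytes have been read; setting auto_close to False disables that behavior:

import io
import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'https://www.qq.com', preload_content=False)
r.auto_close = False
for line in io.TextIOWrapper(r):
    print(line)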
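
The same pattern can be tried without a network connection. The following is a minimal sketch, assuming that an urllib3.HTTPResponse built by hand over an io.BytesIO is an acceptable stand-in for a real response; the payload and the variable names are made up for illustration.

```python
import io

import urllib3

# Made-up payload standing in for a real response body.
raw = io.BytesIO(b"first line\nsecond line\n")

# Build a response by hand; preload_content=False leaves the body unread.
resp = urllib3.HTTPResponse(body=raw, preload_content=False)
# Keep the response open after the last byte so the wrapper can finish reading.
resp.auto_close = False

lines = [line.rstrip("\n") for line in io.TextIOWrapper(resp, encoding="utf-8")]
print(lines)
```

Passing an explicit encoding to io.TextIOWrapper avoids depending on the platform default.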

3. Request data

3.1 Headers

import urllib3
import json
http = urllib3.PoolManager()
r = http.request('GET', 'https://httpbin.org/headers', headers={'X-Something': 'value'})
json_data_headers = json.loads(r.data.decode('utf-8'))['headers']
print(json_data_headers)

3.2 Query parameters

a. For GET, HEAD, and DELETE requests, you can simply pass the parameters as a dictionary in the fields argument of request().

import urllib3
import json
http = urllib3.PoolManager()
r = http.request('GET', 'https://httpbin.org/get', fields={'get_key': 'get_value'})
json_data_args = json.loads(r.data.decode('utf-8'))['args']
print(json_data_args) # {'get_key': 'get_value'}

POST requests are a special case: the fields are sent in the request body rather than in the query string, so args comes back empty:

import urllib3
import json
http = urllib3.PoolManager()
r = http.request('POST', 'https://httpbin.org/post', fields={'post_key': 'post_value'})
json_data_form = json.loads(r.data.decode('utf-8'))['args']
print(json_data_form)  # {}

b. For POST and PUT requests, you need to encode the query parameters into the URL manually.

import urllib3
import json
from urllib.parse import urlencode
http = urllib3.PoolManager()
encoded_args = urlencode({'arg': 'value'})
# Append the encoded query string to the URL
url = 'https://httpbin.org/post?' + encoded_args
r = http.request('POST', url)
print(json.loads(r.data.decode('utf-8'))['args'])  # {'arg': 'value'}
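
The encoding step matters most when values contain spaces or reserved characters. The following sketch uses only the standard library to show what urlencode produces; build_url is a hypothetical helper and the second parameter is invented for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base_url, params):
    """Append percent-encoded query parameters to base_url (hypothetical helper)."""
    return base_url + "?" + urlencode(params)


url = build_url("https://httpbin.org/post", {"arg": "value", "q": "a b&c"})
print(url)  # https://httpbin.org/post?arg=value&q=a+b%26c

# Parsing the query string back recovers the original values.
print(parse_qs(urlsplit(url).query))  # {'arg': ['value'], 'q': ['a b&c']}
```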

3.3 Form data

For PUT and POST requests, urllib3 automatically form-encodes the dictionary passed in the fields argument of request():

import urllib3
import json
http = urllib3.PoolManager()
r = http.request('POST', 'https://httpbin.org/post', fields={'post_key': 'post_value'})
json_data_form = json.loads(r.data.decode('utf-8'))['form']
print(json_data_form) # {'post_key': 'post_value'}
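
To see what the encoded form looks like on the wire, the encoder can be called directly. This is a sketch, assuming the default multipart/form-data encoding that request() applies to fields on POST and PUT; the fixed boundary string is an assumption made only to keep the output reproducible.

```python
import urllib3

# Encode the same fields that request() would receive.
# The boundary value is made up; urllib3 normally generates a random one.
body, content_type = urllib3.encode_multipart_formdata(
    {"post_key": "post_value"}, boundary="example-boundary"
)

print(content_type)
print(body.decode("utf-8"))
```

The returned content type is what ends up in the Content-Type header of the request, and the body carries one part per field.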

3.4 JSON

import json
import urllib3
http = urllib3.PoolManager()
data = {'attribute': 'value'}
headers = {'Content-Type': 'application/json'}
# Serialize the dictionary and encode it to bytes for the request body
encoded_data = json.dumps(data).encode('utf-8')
print(encoded_data) # b'{"attribute": "value"}'
r = http.request('POST', 'https://httpbin.org/post', body=encoded_data, headers=headers)
print(json.loads(r.data.decode('utf-8'))['json']) # {'attribute': 'value'}

3.5 Files and binary data

(1) The file is a text document:

import urllib3
import json
http = urllib3.PoolManager()
with open('example.txt', mode='r', encoding='utf-8') as fp:
    file_data = fp.read()
r = http.request('POST', 'https://httpbin.org/post',
                 fields={
                     # Specify the file field as (file_name, file_data)
                     'filefield': ('example.txt', file_data),
                 })
json_data_files = json.loads(r.data.decode('utf-8'))['files']
print(json_data_files) # {'filefield': '..contents of the text file, with line breaks shown as "\n"..'}

Explicitly specifying the file's MIME type (pass a third item in the tuple):

import urllib3
import json
http = urllib3.PoolManager()
with open('example.txt', mode='r', encoding='utf-8') as fp:
    file_data = fp.read()
r = http.request('POST', 'https://httpbin.org/post',
                 fields={
                     # The third item of the tuple specifies the file's MIME type
                     'filefield': ('example.txt', file_data, 'text/plain'),
                 })
json_data_files = json.loads(r.data.decode('utf-8'))['files']
print(json_data_files) # {'filefield': '..contents of the text file, with line breaks shown as "\n"..'}

(2) The file is an image:

import urllib3
import json
http = urllib3.PoolManager()
with open('img001.jpg', mode='rb') as fp:
    binary_data = fp.read()
r = http.request('POST', 'https://httpbin.org/post',
                 # To send a binary file, just pass the body argument
                 body=binary_data,
                 headers={
                     # Set Content-Type to the type of the file
                     'Content-Type': 'image/jpeg',
                 })
json_data = json.loads(r.data.decode('utf-8'))['data']
print(json_data)

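4. Using timeouts

4.1 Controlling how long (in seconds) a request may run before it is aborted

import urllib3
http = urllib3.PoolManager()
# The request would be aborted with an error after 4.0 seconds; the 3-second delay finishes in time
r = http.request('GET', 'https://httpbin.org/delay/3', timeout=4.0)
print(r)  # an HTTPResponse object (the request succeeds)
# The 3-second delay exceeds the 2.5-second timeout, so this call raises
# MaxRetryError caused by ReadTimeoutError
r1 = http.request('GET', 'https://httpbin.org/delay/3', timeout=2.5)
print(r1)  # not reached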

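4.2 Fine-grained control: separate connect and read timeouts

# Python version: 3.6
# -*- coding:utf-8 -*-
import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/delay/3', timeout=urllib3.Timeout(connect=1.0))
print(r)  # an HTTPResponse object (no read timeout is set)
# The 2.0-second read timeout is shorter than the 3-second delay, so this call
# raises MaxRetryError caused by ReadTimeoutError
r1 = http.request('GET', 'http://httpbin.org/delay/3',
                  timeout=urllib3.Timeout(connect=1.0, read=2.0))
print(r1)  # not reached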
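
A Timeout object can also be inspected without sending anything. The sketch below assumes the connect_timeout and read_timeout properties of urllib3.Timeout; the variable names are illustrative only.

```python
import urllib3

# Separate limits: 1.0 s to establish the connection, 2.0 s waiting for data.
split = urllib3.Timeout(connect=1.0, read=2.0)
print(split.connect_timeout, split.read_timeout)  # 1.0 2.0

# With only total set, the connect phase may use up to the whole budget.
overall = urllib3.Timeout(total=3.0)
print(overall.connect_timeout)  # 3.0
```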

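4.3 Setting a timeout on the PoolManager

import urllib3
# Every request made through this PoolManager is subject to the same timeout
http = urllib3.PoolManager(timeout=3.0)
# The same, with separate connect and read timeouts
http = urllib3.PoolManager(timeout=urllib3.Timeout(connect=1.0, read=2.0))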

5. Retrying requests and redirects

5.1 Retrying requests

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/ip', retries=10) # retry up to 10 times

5.2 Disabling all retry and redirect logic

import urllib3
http = urllib3.PoolManager()
# With retries=False a failed connection is not retried; NewConnectionError is raised directly
# r = http.request('GET', 'http://nxdomain.example.com', retries=False)
# print(r.status) # raises NewConnectionError
r1 = http.request('GET', 'https://httpbin.org/redirect/1', retries=False)
print(r1.status)  # 302

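5.3 Fine-grained control: setting retry and redirect counts (multiple redirects)

An error is raised if the redirect chain cannot be followed to the end

For example, in the code below redirect=2 allows 2 redirects, but the URL needs 1 more, so an error is raised

# Python version: 3.6
# -*- coding:utf-8 -*-
import urllib3
http = urllib3.PoolManager()
# Retry(5, redirect=2): 5 retries in total, but at most 2 redirects, so the chain is not followed to the end
r = http.request('GET', 'http://httpbin.org/redirect/3', retries=urllib3.Retry(5, redirect=2))
print(r.status) # not reached: the request raises MaxRetryError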

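Following the redirects to the end

For example, in the code below redirect=3 allows 3 redirects, and the URL needs exactly 3, so no error is raised

# Python version: 3.6
# -*- coding:utf-8 -*-
import urllib3
http = urllib3.PoolManager()
# Retry(5, redirect=3): 5 retries in total, at most 3 redirects, enough to follow the chain to the end
r = http.request('GET', 'http://httpbin.org/redirect/3', retries=urllib3.Retry(5, redirect=3))
print(r.status) # 200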

Analysis of the redirects above

Retry(5, redirect=3): 5 retries in total, but at most 3 redirects
http://httpbin.org/redirect/3 is redirected to http://httpbin.org/get in 3 steps; allowing fewer redirects raises an error
-------------------------------------
Step    Redirected to
1       http://httpbin.org/redirect/2
2       http://httpbin.org/redirect/1
3       http://httpbin.org/get
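
The redirect budget can be checked offline by feeding hand-built 302 responses to a Retry object. This is a sketch, not how PoolManager is meant to be driven: redirects_fit is a hypothetical helper, the "/next" and "/start" paths are made up, and it assumes that Retry.increment() raises MaxRetryError once a counter drops below zero.

```python
import urllib3
from urllib3.exceptions import MaxRetryError


def redirects_fit(retry, hops):
    """Return True if `hops` redirects fit within the Retry budget (hypothetical helper)."""
    for _ in range(hops):
        # A hand-built 302 response stands in for a real redirect.
        response = urllib3.HTTPResponse(status=302, headers={"Location": "/next"})
        try:
            # increment() returns a new Retry with reduced counters,
            # or raises MaxRetryError once the budget is exhausted.
            retry = retry.increment(method="GET", url="/start", response=response)
        except MaxRetryError:
            return False
    return True


print(redirects_fit(urllib3.Retry(5, redirect=2), hops=3))  # False
print(redirects_fit(urllib3.Retry(5, redirect=3), hops=3))  # True
```

Each redirect also consumes one unit of the total count, which is why the total (5 here) has to be at least as large as the redirect limit.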

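5.4 Disabling the exception for too many redirects

import urllib3
http = urllib3.PoolManager()
# raise_on_redirect=False: when the redirect limit is exceeded, return the 302 response instead of raising
r = http.request('GET', 'http://httpbin.org/redirect/3',
                 retries=urllib3.Retry(redirect=2, raise_on_redirect=False))
print(r.status) # 302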

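5.5 Setting a retry policy: use the same retry policy for all requests

Disabling retries for all requests

import urllib3
# Disable retries
http = urllib3.PoolManager(retries=False)

Setting the total retry count and the redirect limit for all requests

import urllib3
# Fine-grained control: total retry count and redirect limit
http = urllib3.PoolManager(retries=urllib3.Retry(5, redirect=2))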

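6. Errors and exceptions

import urllib3
http = urllib3.PoolManager()
try:
    # With retries=False the underlying connection error is raised directly
    http.request('GET', 'http://nx.example.com', retries=False)
except urllib3.exceptions.NewConnectionError as e:
    print('Connection failed!', e)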
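
When retries are left enabled, a failed connection surfaces as MaxRetryError rather than NewConnectionError. The sketch below, which makes no network calls, shows that both derive from urllib3's HTTPError base class, so one handler can cover either case; describe is a hypothetical helper and its labels are made up.

```python
from urllib3.exceptions import HTTPError, MaxRetryError, NewConnectionError

# Both exception types share the HTTPError base class.
print(issubclass(NewConnectionError, HTTPError))  # True
print(issubclass(MaxRetryError, HTTPError))       # True


def describe(error):
    """Map an exception instance to a short label (hypothetical helper)."""
    if isinstance(error, MaxRetryError):
        return "gave up after retries"
    if isinstance(error, HTTPError):
        return "request failed"
    return "unrelated error"


print(describe(MaxRetryError(None, "/example")))  # gave up after retries
print(describe(ValueError("not from urllib3")))   # unrelated error
```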

7. Logging

Changing the log level of the urllib3 logger

import logging
logging.getLogger("urllib3").setLevel(logging.WARNING)
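
The effect of the level change can be verified with the standard logging API alone. A small sketch; the child logger name "urllib3.connectionpool" is used here only as an example of a logger under the "urllib3" namespace.

```python
import logging

logger = logging.getLogger("urllib3")
logger.setLevel(logging.WARNING)

# DEBUG records are now dropped; WARNING and above still pass.
print(logger.isEnabledFor(logging.DEBUG))    # False
print(logger.isEnabledFor(logging.WARNING))  # True

# Child loggers without their own level inherit it from "urllib3".
child = logging.getLogger("urllib3.connectionpool")
print(child.getEffectiveLevel() == logging.WARNING)  # True
```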

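II. Advanced Usage

1. Customizing pool behavior

import urllib3
# If you make requests to many different hosts, increasing this number may improve performance, at the cost of more memory and sockets
http = urllib3.PoolManager(num_pools=50)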

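Maximum connections: if you make many requests to the same host at once, increasing this number may improve performance

import urllib3
# Use either of the two
http = urllib3.PoolManager(maxsize=10)
# A connection pool takes a host name, not a URL; use HTTPSConnectionPool for HTTPS hosts
pool = urllib3.HTTPSConnectionPool('cn.bing.com', maxsize=10)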

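2. Streaming and I/O

When dealing with large responses, it is often better to stream the response content

import urllib3
http = urllib3.PoolManager()
r = http.request(
    'GET',
    'https://httpbin.org/bytes/1024',
    preload_content=False)  # do not read the body up front, so it can be streamed
for chunk in r.stream(32): # stream() lets you iterate over chunks of the response content
    print(chunk)
# Release the HTTP connection back to the pool so it can be reused
r.release_conn()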
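
Chunked iteration can be tried offline as well. This sketch assumes a hand-built urllib3.HTTPResponse over an io.BytesIO behaves like a streamed response; the 100-byte payload is made up.

```python
import io

import urllib3

# 100 bytes of made-up data stand in for a large response body.
payload = bytes(range(100))
resp = urllib3.HTTPResponse(body=io.BytesIO(payload), preload_content=False)

# stream(32) yields the body in pieces of at most 32 bytes.
chunks = list(resp.stream(32))
print([len(chunk) for chunk in chunks])

# Joining the chunks gives back the original body.
print(b"".join(chunks) == payload)
```

Only one chunk needs to be held in memory at a time when the loop processes each piece as it arrives, which is the point of streaming large responses.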

The response can be treated as a file-like object, so data can be read directly with read()

import urllib3
http = urllib3.PoolManager()
response = http.request(
    'GET',
    'https://httpbin.org/bytes/1024',
    preload_content=False)
print(response.read(4)) # b'\xae\x95\n\xc2'

Calls to read() block until more response data is available

import urllib3
import io
http = urllib3.PoolManager()
r = http.request(
    'GET',
    'https://httpbin.org/bytes/1024',
    preload_content=False)  # do not read the body up front; it is read on demand
reader = io.BufferedReader(r, 8)
print(reader.read(4)) # b'\xcec\x1f\r'
# Release the connection back to the pool so it can be reused
r.release_conn()

This file-like object can also be used for things like decoding the content with codecs

import urllib3
import json
import codecs
http = urllib3.PoolManager()
reader = codecs.getreader('utf-8')
r = http.request(
    'GET',
    'http://httpbin.org/ip',
    preload_content=False)
print(json.load(reader(r)))  # {'origin': '120.239.165.180'}
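
The decoding step itself does not depend on urllib3. In the standard-library sketch below, an io.BytesIO with made-up JSON bytes stands in for the response object.

```python
import codecs
import io
import json

# Made-up JSON bytes stand in for the response body.
raw = io.BytesIO('{"origin": "203.0.113.7", "city": "Zürich"}'.encode("utf-8"))

# codecs.getreader returns a StreamReader class; calling it wraps the byte stream.
reader = codecs.getreader("utf-8")(raw)

# json.load reads text from the wrapper, which decodes the bytes as it goes.
parsed = json.load(reader)
print(parsed)
```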