1. Environment Setup
1.1 Installing requests
# Install with pip (recommended)
pip install requests
# Verify the installation
python -c "import requests; print(requests.__version__)"
# Should print something like: 2.31.0
1.2 Development Environment Recommendations
Use Python 3.8 or later
Recommended IDEs (pick one; VS Code is preferred):
VS Code (with the Python extension)
PyCharm (the Community Edition is enough)
Create a virtual environment (optional):
python -m venv myenv
source myenv/bin/activate # Linux/Mac
myenv\Scripts\activate # Windows
2. HTTP Protocol Basics
2.1 Key Concepts
(The short sketch after this list shows where each of these appears on a requests response.)
Request methods: GET (retrieve data), POST (submit data)
Status codes:
200: success
404: not found
500: server error
Request headers:
User-Agent: identifies the client
Content-Type: the format of the data being sent
Cookie: keeps the session alive
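A minimal sketch, using httpbin.org purely as an echo service, of where these concepts surface on a requests response (the User-Agent value is just a placeholder):
import requests

resp = requests.get('https://httpbin.org/get',
                    headers={'User-Agent': 'docs-example/1.0'})  # client identification
print(resp.request.method)            # request method: GET
print(resp.status_code)               # status code, e.g. 200 on success
print(resp.headers['Content-Type'])   # format of the response body
print(resp.cookies.get_dict())        # any cookies the server set (empty here)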
2.2 Request Flow
(The sketch below walks through these steps explicitly.)
The client builds the request
The request is sent to the server
The server processes the request
The server returns a response
The client processes the response
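To make the five steps concrete, here is a small sketch using requests' Session/PreparedRequest API, which separates building the request from sending it (httpbin.org is again just an echo service):
import requests

with requests.Session() as s:
    # 1. The client builds the request
    req = requests.Request('GET', 'https://httpbin.org/get', params={'q': 'demo'})
    prepared = s.prepare_request(req)
    # 2-4. The request is sent; the server processes it and returns a response
    resp = s.send(prepared)
    # 5. The client processes the response
    print(resp.status_code, resp.json()['args'])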
3. Basic Usage
3.1 GET Requests
import requests
# A basic GET request
response = requests.get('https://www.example.com')
# Inspect the response
print(response.text)          # body as text
print(response.status_code)   # status code
print(response.headers)       # response headers
3.2 GET Requests with Parameters
# Option 1: build the query string into the URL by hand
response = requests.get('https://httpbin.org/get?name=Alice&age=25')
# Option 2: use the params argument (recommended)
params = {
    'page': 1,
    'per_page': 20,
    'search': 'python'
}
response = requests.get('https://httpbin.org/get', params=params)
3.3 Handling Responses
if response.status_code == 200:
    # Handle different content types differently
    content_type = response.headers.get('Content-Type', '')
    if 'application/json' in content_type:
        data = response.json()
        print(data['args'])          # access the parsed JSON
    elif 'text/html' in content_type:
        print(response.text[:500])   # print the first 500 characters
    else:
        print(response.content)      # raw bytes
else:
    print(f"Request failed with status code {response.status_code}")
4. Advanced Features
4.1 Setting Request Headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Referer': 'https://www.google.com/'
}
response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json()['headers'])
4.2 POST Requests
# Form submission
data = {
    'username': 'admin',
    'password': 'secret'
}
response = requests.post('https://httpbin.org/post', data=data)
# Submitting JSON
json_data = {
    'title': 'Hello World',
    'content': 'This is a test post'
}
response = requests.post('https://httpbin.org/post', json=json_data)
4.3 File Uploads
# Open the file in binary mode; the with-block closes the handle after the upload
with open('report.xlsx', 'rb') as f:
    files = {'file': f}
    response = requests.post('https://httpbin.org/post', files=files)
print(response.json()['files'])
4.4 Working with Cookies
# Reading cookies from a response
response = requests.get('https://www.example.com')
print(response.cookies.get_dict())
# Sending cookies with a request
cookies = {'session_id': 'abc123'}
response = requests.get('https://httpbin.org/cookies', cookies=cookies)
4.5 Timeouts
try:
    # 3-second connect timeout, 5-second read timeout
    response = requests.get('https://httpbin.org/delay/10', timeout=(3, 5))
except requests.exceptions.Timeout:
    print("Request timed out!")
4.6 Proxies
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
5. Exception Handling
try:
    response = requests.get('https://invalid-url', timeout=5)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
except requests.exceptions.HTTPError as errh:
    print(f"HTTP error: {errh}")
except requests.exceptions.ConnectionError as errc:
    print(f"Connection error: {errc}")
except requests.exceptions.Timeout as errt:
    print(f"Timeout error: {errt}")
except requests.exceptions.RequestException as err:
    print(f"Other request error: {err}")
6. Practical Examples
Example 1: Scraping Web Content
from bs4 import BeautifulSoup

url = 'https://books.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
book_titles = [h3.a['title'] for h3 in soup.find_all('h3')]
print(f"Found {len(book_titles)} books")
Example 2: API Interaction
# Fetch a GitHub user's public profile
username = 'torvalds'
url = f'https://api.github.com/users/{username}'
response = requests.get(url)
data = response.json()
print(f"""
Login: {data['login']}
Name: {data.get('name', 'N/A')}
Company: {data.get('company', 'N/A')}
Followers: {data['followers']}
Public repos: {data['public_repos']}
""")
Example 3: Downloading a File
def download_file(url, save_path):
    response = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in response.iter_content(chunk_size=128):
            fd.write(chunk)
    print(f"File saved to: {save_path}")

# Example: download the Python logo
download_file(
    'https://www.python.org/static/community_logos/python-logo.png',
    'python_logo.png'
)
7. Best Practices and Caveats
Follow the rules in the site's robots.txt
Throttle your requests (an interval of at least 2 seconds between requests is a common guideline); a combined sketch of these two points follows
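A minimal sketch of both points, assuming a small placeholder list of pages on one site (urllib.robotparser is in the standard library; books.toscrape.com is the demo site from Example 1):
import time
import urllib.robotparser
import requests

rp = urllib.robotparser.RobotFileParser()
rp.set_url('https://books.toscrape.com/robots.txt')
rp.read()

pages = ['https://books.toscrape.com/', 'https://books.toscrape.com/index.html']  # placeholder list
for page in pages:
    if not rp.can_fetch('My Crawler 1.0', page):
        continue                      # skip anything robots.txt disallows
    resp = requests.get(page, headers={'User-Agent': 'My Crawler 1.0'}, timeout=10)
    print(page, resp.status_code)
    time.sleep(2)                     # wait at least 2 seconds between requests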
Use a Session to reuse connections (well suited to making many requests)
with requests.Session() as s:
    s.headers.update({'User-Agent': 'My Crawler 1.0'})
    s.get('https://www.example.com/login', auth=('user', 'pass'))
    # Later requests on this session automatically carry the cookies it has collected
Handle redirects (followed automatically by default; this can be disabled)
response = requests.get(url, allow_redirects=False)
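A short sketch of the default behaviour, using httpbin's redirect endpoint; response.history records the intermediate responses:
resp = requests.get('https://httpbin.org/redirect/2')
print(resp.status_code)                        # 200, the final response
print([r.status_code for r in resp.history])   # the redirects that were followed (302s here)
print(resp.url)                                # the final URL after redirects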
Configure a retry strategy
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # requests.packages.urllib3 is a deprecated alias

session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=0.3,
    status_forcelist=[500, 502, 503, 504]
)
session.mount('https://', HTTPAdapter(max_retries=retries))
8. Debugging Tips
Inspect the URL that was actually requested:
print(response.request.url)
Inspect the request headers that were actually sent:
print(response.request.headers)
Use an online echo service (httpbin.org, used throughout the examples above, reflects back exactly what you send)
Use a traffic-capture tool (or the pure-Python logging sketch after this list):
Chrome开发者工具(F12)
Fiddler
Wireshark
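If you prefer to stay inside Python, a minimal sketch that turns on low-level HTTP logging; this is generic standard-library http.client/logging configuration, not a requests-specific API:
import logging
import http.client
import requests

http.client.HTTPConnection.debuglevel = 1   # print raw request/response header lines
logging.basicConfig(level=logging.DEBUG)    # surface urllib3's connection-pool logs

requests.get('https://httpbin.org/get')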
9. Recommended Learning Resources
HTTP standards: RFC 7230-7235
More advanced crawling frameworks:
Scrapy
Selenium (for JavaScript-rendered pages)
Parsing libraries:
BeautifulSoup
lxml
parsel
10. FAQ
Q: What should I do about SSL certificate errors?
A: Verification can be disabled temporarily (not really recommended):
requests.get(url, verify=False)
Q: How do I fix garbled Chinese text?
A: Set the encoding manually:
response.encoding = 'gbk'  # or 'utf-8'
Q: How do I stay logged in across requests?
A: Use a Session object:
session = requests.Session()
session.post(login_url, data=credentials)
session.get(protected_page_url)
Q: ModuleNotFoundError: the module cannot be found?
A: Check the following (the sketch after this list helps with the first two points):
Have you selected a Python interpreter for this project? PyCharm usually requires you to configure one explicitly; VS Code falls back to the interpreter found on your system PATH.
Is the module actually installed into the interpreter you selected?
Did you install the right package name? Some packages are installed under one name but imported under another (e.g. pip install beautifulsoup4, then import bs4).
Import names are case-sensitive, and the name you install from PyPI may not match the import name's casing (e.g. pip install Pillow, then import PIL), so check the package's documentation for the exact import name rather than renaming folders inside Python's lib directory.
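A tiny sketch for checking the first two points from inside the script that fails to import:
import sys
print(sys.executable)       # which Python interpreter is actually running this script

import requests             # replace with the module that cannot be found
print(requests.__file__)    # where the interpreter loaded it from (raises ModuleNotFoundError if absent)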