Published: 2023-09-16 09:30
Add a pool of usable proxy IPs to settings.py:
PROXIES = [
    'http://182.149.82.74:9999',
    'http://121.237.25.238:3000',
    'http://61.183.176.122:57210',
    'http://175.43.84.29:9999',
]
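Before handing the pool to Scrapy, it can help to drop entries that are not well-formed proxy URLs. The following is a minimal sketch, not part of the original setup: the helper name is_well_formed_proxy is hypothetical, and it only checks the scheme://host:port shape, not whether the proxy is actually reachable.

```python
from urllib.parse import urlsplit


def is_well_formed_proxy(url):
    """Return True if url looks like http(s)://host:port (hypothetical helper)."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parts.scheme in ("http", "https")
        and bool(parts.hostname)
        and port is not None
    )


PROXIES = [
    'http://182.149.82.74:9999',
    'http://121.237.25.238:3000',
    'http://61.183.176.122:57210',
    'http://175.43.84.29:9999',
]

# Keep only entries with the expected shape; reachability is not tested here.
well_formed = [p for p in PROXIES if is_well_formed_proxy(p)]
```

A format check like this only catches typos in the list. Free proxies go offline often, so a reachability test would still be needed separately.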
Add the following class to the middleware file middlewares.py:
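import scrapy
from scrapy import signals
import random


class ProxyMiddleware(object):
    def __init__(self, ip):
        # ip is the list of proxy URLs read from the PROXIES setting
        self.ip = ip

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware from the project settings
        return cls(ip=crawler.settings.get('PROXIES'))

    def process_request(self, request, spider):
        # Pick one proxy at random for every outgoing request
        ip = random.choice(self.ip)
        # Key used in my final version; see the note on 'proxy' vs 'http_proxy' below
        request.meta['http_proxy'] = ip
        print("Current IP: " + ip)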
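The selection logic in process_request can be tried outside Scrapy. The sketch below is only an illustration under stated assumptions: FakeRequest is a hypothetical stand-in that models nothing but the meta dict of a Scrapy request, and assign_random_proxy is a hypothetical helper that mirrors the method body above.

```python
import random


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: only the meta dict is modeled."""

    def __init__(self):
        self.meta = {}


def assign_random_proxy(request, proxies, rng=random):
    """Pick one proxy at random and record it on the request, as process_request does."""
    proxy = rng.choice(proxies)
    request.meta['http_proxy'] = proxy
    return proxy


pool = ['http://182.149.82.74:9999', 'http://121.237.25.238:3000']
req = FakeRequest()
# A seeded Random instance makes the choice repeatable while experimenting.
chosen = assign_random_proxy(req, pool, rng=random.Random(0))
print("Current IP: " + chosen)
```

Passing the random source as a parameter is a design choice for experimentation only; the middleware itself can keep calling random.choice directly.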
Register the middleware in the DOWNLOADER_MIDDLEWARES setting in settings.py:
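DOWNLOADER_MIDDLEWARES = {
    # Disable the built-in user-agent middleware (note the plural "downloadermiddlewares")
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'myproject.middlewares.MyUserAgentMiddleware': 400,
    # The proxy middleware defined above must be listed too, or it never runs.
    # 543 is an illustrative priority; values below 750 run before the built-in HttpProxyMiddleware.
    'myproject.middlewares.ProxyMiddleware': 543,
}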
At first, the line in my middleware that sets the proxy IP was:
request.meta['proxy'] = ip
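My Python version is 3.7 and my Scrapy version is 1.6.0. Possibly because of a version issue, setting the proxy this way never worked, so I changed the line to:
request.meta['http_proxy'] = ip
After that, the failures went away. One caveat: Scrapy's built-in HttpProxyMiddleware reads the proxy from request.meta['proxy'], so with the 'http_proxy' key the requests may simply be going out without any proxy. It is worth confirming the exit IP before relying on this change.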
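One way to confirm that a proxy is really in use is to request an IP echo endpoint through it and compare the reported address with the proxy's host. The sketch below covers only the comparison step. It assumes the echo service returns JSON of the form {"origin": "1.2.3.4"} (the shape httpbin.org/ip uses), and the function name exit_ip_matches_proxy is hypothetical.

```python
import json
from urllib.parse import urlsplit


def exit_ip_matches_proxy(echo_body, proxy_url):
    """Return True if the IP reported by the echo service is the proxy's host.

    echo_body is assumed to be JSON text like '{"origin": "1.2.3.4"}'.
    """
    reported = json.loads(echo_body)["origin"]
    # Some services report several comma-separated addresses.
    addresses = [a.strip() for a in reported.split(",")]
    return urlsplit(proxy_url).hostname in addresses
```

In a spider, the response text of the echo request would be passed in as echo_body. A mismatch does not always mean the proxy was skipped, since some proxies exit through a different address than the one they listen on, but a match with your own public IP is a clear sign the request went out directly.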