Scraping 9,000 Meme Images with Python in 3 Minutes

Published: 2023-04-20 13:00

First, a look at what I scraped:

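Steps demonstrated in this video:

  1. Use requests to fetch 200 list pages
  2. Use BeautifulSoup to parse the title and URL of each image
  3. Download the images to a local directory

For detailed usage of these two libraries, see my other video courses.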
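Step 3 in the list above writes each image to disk, which means turning an image title into a valid file name first. The following is a minimal sketch of that step, not necessarily the code used in the video. The helper names `safe_filename` and `save_image`, the `.jpg` fallback extension, and the output directory are all assumptions made for illustration.

```python
import os
import re
from urllib.parse import urlparse


def safe_filename(title, url, default_ext=".jpg"):
    """Build a file name from an image title and its URL (hypothetical helper)."""
    # \w matches Unicode letters, digits and underscore, so Chinese titles survive;
    # every other run of characters collapses into a single underscore.
    stem = re.sub(r"[^\w]+", "_", title).strip("_") or "image"
    # Take the extension from the URL path; fall back to default_ext if missing.
    ext = os.path.splitext(urlparse(url).path)[1] or default_ext
    return stem + ext


def save_image(content, title, url, out_dir):
    """Write raw image bytes into out_dir and return the path that was written."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_filename(title, url))
    with open(path, "wb") as f:
        f.write(content)
    return path
```

In a real run, `content` would typically be the `.content` bytes of a `requests.get(image_url)` response. Two different titles can map to the same file name here, so a later image would overwrite an earlier one; adding a counter or hash to the name is one way around that.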

import requests
from bs4 import BeautifulSoup
import re

1. Download the HTML of all 200 pages

def download_all_htmls():
    """
    Download the HTML of every list page for later parsing.
    """
    htmls = []
    for idx in range(200):
        # List pages are numbered from 1 to 200
        url = f"https://fabiaoqing.com/biaoqing/lists/page/{idx+1}.html"
        print("craw html:", url)
        r = requests.get(url)
        if r.status_code != 200:
            raise Exception(f"error: status {r.status_code} for {url}")
        htmls.append(r.text)
    print("success")
    return htmls
# Run the crawl
htmls = download_all_htmls()
craw html: https://fabiaoqing.com/biaoqing/lists/page/1.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/2.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/3.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/4.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/188.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/189.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/190.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/191.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/192.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/193.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/194.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/195.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/196.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/197.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/198.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/199.html
craw html: https://fabiaoqing.com/biaoqing/lists/page/200.html
success
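The loop above issues 200 back-to-back requests without a timeout and aborts on the first non-200 response. As a hedged alternative, the sketch below adds a timeout, a short pause between requests, and a few retries per page. The function name `download_pages`, its parameters, and the injectable `get` callable are assumptions for illustration and are not part of the original code.

```python
import time


def download_pages(urls, get=None, retries=3, delay=0.5, timeout=10):
    """Fetch each URL and return the list of response bodies, in order.

    `get` is any callable shaped like requests.get(url, timeout=...).
    It defaults to requests.get and is injectable so the loop can be tested offline.
    """
    if get is None:
        import requests  # imported lazily so a fake `get` works without it
        get = requests.get

    htmls = []
    for url in urls:
        last_error = None
        for attempt in range(retries):
            try:
                r = get(url, timeout=timeout)
                if r.status_code == 200:
                    htmls.append(r.text)
                    break
                last_error = f"status {r.status_code}"
            except Exception as exc:  # e.g. connection errors or timeouts
                last_error = exc
            time.sleep(delay)  # pause before retrying
        else:
            # Runs only if no attempt succeeded (the inner loop never hit break).
            raise RuntimeError(f"failed to download {url}: {last_error}")
        time.sleep(delay)  # be polite to the server between pages
    return htmls
```

With the same URL pattern as above, it could be called as `download_pages([f"https://fabiaoqing.com/biaoqing/lists/page/{i}.html" for i in range(1, 201)])`. The half-second delay is an arbitrary choice; it trades a longer total run time for a lighter load on the site.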
htmls[0][:1000]
'\n\n\n    \n    \n    \n    \n    热门表情_发表情,表情包大全fabiaoqing.com\n    \n    \n    \n    \n    \n    \n