code_tin

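Scrapy Web Scraping Tutorial - 1. Environment Setup
Scraping website data is the first step of data analysis: the data collection stage. This post introduces Scrapy, a web scraping framework for Python.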
2018/10/18


Environment Setup

We use Ubuntu 16.04 LTS as the base environment.
First, make sure a working Python environment is available:

# Refresh the package index, then install Python together with pip
apt-get update
apt-get install python python-pip

Then install Scrapy with pip:

pip install scrapy
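
To confirm that the install worked, a quick check from Python can help. The sketch below is only an illustration: `installed_version` is a hypothetical helper written for this tutorial, not part of Scrapy. It assumes the package exposes a `__version__` attribute (Scrapy does) and returns None when the package cannot be imported.

```python
import importlib


def installed_version(package_name):
    """Return the package's version string, or None if it cannot be imported.

    Hypothetical helper for this tutorial; not part of Scrapy or pip.
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_version("scrapy")
    if version is None:
        print("scrapy is not installed")
    else:
        print("scrapy version: {}".format(version))
```

If the script reports that scrapy is not installed, the usual suspects are a pip that belongs to a different Python interpreter or a failed build of one of Scrapy's dependencies.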

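Practical Tips

1. Optimize the APT sources
Switching the default APT source to the Alibaba Cloud mirror significantly speeds up apt downloads.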

vi /etc/apt/sources.list.d/sources-aliyun-0.list

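Change the contents to the following. Note that mirrors.cloud.aliyuncs.com is Alibaba Cloud's internal mirror address and is normally reachable only from Alibaba Cloud servers; on other machines, the public mirrors.aliyun.com host would be the equivalent choice.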

deb http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial main
deb-src http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial main

deb http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-updates main
deb-src http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-updates main

deb http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial universe
deb-src http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial universe
deb http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-updates universe
deb-src http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-updates universe
deb http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-security main
deb-src http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-security main
deb http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-security universe
deb-src http://mirrors.cloud.aliyuncs.com/ubuntu/ xenial-security universe

After that, refresh the apt package index:

apt-get update
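
The source list above follows a regular pattern: every combination of suite (xenial, xenial-updates, xenial-security) and component (main, universe) gets one `deb` and one `deb-src` line. As a rough sketch, the same entries could be generated rather than typed by hand. `build_sources` is a hypothetical helper name, and the default suites and components simply mirror the file shown above. The output order differs from the hand-written file, which should not matter to APT.

```python
def build_sources(mirror, release,
                  pockets=("", "-updates", "-security"),
                  components=("main", "universe")):
    """Build sources.list lines for one mirror and one Ubuntu release.

    Hypothetical helper for this tutorial. Each suite/component pair
    produces a binary (deb) line and a source (deb-src) line.
    """
    lines = []
    for pocket in pockets:
        suite = release + pocket
        for component in components:
            for kind in ("deb", "deb-src"):
                lines.append("{} {} {} {}".format(kind, mirror, suite, component))
    return lines


if __name__ == "__main__":
    # Mirror URL and release name are taken from the file shown above.
    entries = build_sources("http://mirrors.cloud.aliyuncs.com/ubuntu/", "xenial")
    print("\n".join(entries))
```

Redirecting the output into a file under /etc/apt/sources.list.d/ would reproduce the manual edit; changing the `mirror` argument is enough to target a different mirror host.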


2. Optimize the pip source

# Create the directory first; vi cannot save the file if ~/.pip is missing
mkdir -p ~/.pip
vi ~/.pip/pip.conf

Change the contents to the following:

[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
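
pip.conf is a plain INI file, so it can also be written and checked programmatically. The following is a minimal sketch assuming Python 3; `write_pip_conf` and `read_pip_conf` are hypothetical helper names, and the demo writes to a temporary directory instead of the real ~/.pip/pip.conf so that nothing on the machine is overwritten.

```python
import configparser
import os
import tempfile


def write_pip_conf(path, index_url, trusted_host):
    """Write a [global] section with the given index URL and trusted host.

    Hypothetical helper for this tutorial; assumes Python 3.
    """
    config = configparser.ConfigParser()
    config["global"] = {
        "trusted-host": trusted_host,
        "index-url": index_url,
    }
    directory = os.path.dirname(path)
    if directory:
        # Create the parent directory (for example ~/.pip) if it is missing.
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as handle:
        # Write "key=value" without spaces, matching the file shown above.
        config.write(handle, space_around_delimiters=False)


def read_pip_conf(path):
    """Return the [global] section of a pip.conf file as a dict."""
    config = configparser.ConfigParser()
    config.read(path)
    return dict(config["global"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, ".pip", "pip.conf")
        write_pip_conf(demo_path,
                       "https://mirrors.aliyun.com/pypi/simple/",
                       "mirrors.aliyun.com")
        print(read_pip_conf(demo_path))
```

Pointing `path` at `os.path.expanduser("~/.pip/pip.conf")` would apply the setting for real; reading the file back is a cheap way to catch typos in the URL before running pip.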

 

Last modification:November 26th, 2018 at 04:16 pm