
Scraping Second-Hand Housing Images from 58同城 (58.com) with Python

Python and Web Scraping | GA小站 | 2016-04-08 | 2484 views | 0 comments


from bs4 import BeautifulSoup
import requests
import os
import urllib.request
import random
import time
import re

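# Pool of browser User-Agent strings; one is picked at random for each request.
# Adjacent string literals are joined by Python, so each entry keeps its spaces
# (the backslash continuations used before dropped the space at every line break).
user_agent = [
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) '
    'Chrome/23.0.1271.64 Safari/537.11',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/47.0.2526.106 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) '
    'Chrome/24.0.1312.56 Safari/537.17',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
]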

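# Build the list of listing pages: the first page has no suffix,
# every later page is addressed as /pn<N>/.
url=[]
for i in range(1000):
	if i==0:
		url.append('http://gz.58.com/ershoufang/')
	else:
		url.append('http://gz.58.com/ershoufang/pn{0}/'.format(i))
print("url is done!")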
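b=0  # number of images downloaded so far
# Limit the run to the first page while testing; remove this line to crawl every page built above.
url=['http://gz.58.com/ershoufang/']
# The original session ran `cd week7` and `cd douban` here. Those are IPython shell
# commands, not Python; in a plain script, switch to the download directory like this:
save_dir=os.path.join('week7','douban')
os.makedirs(save_dir,exist_ok=True)
os.chdir(save_dir)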
for i in url:
	time.sleep(1)  # pause one second between page requests
	agent = random.choice(user_agent)
	header = {
		'Connection': 'Keep-Alive',
		'Accept': 'text/html, application/xhtml+xml, */*',
		'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
		'User-Agent': agent}

	soup=BeautifulSoup(requests.get(i,headers = header).text,"html.parser")
	# Each listing is a <tr> whose "logr" attribute starts with "j".
	items=soup('tr',logr=re.compile('^j'))
	if len(items)==0:
		break  # stop at the first page that has no listings
	for item in items:
		# The image URL is in the "lazy_src" attribute; the listing title becomes the file name.
		urllib.request.urlretrieve(item.find('div','img_list').img.get('lazy_src'),
			os.path.basename(item.find('p','bthead').a.get_text()+'.jpg'))
		# print(item.find('div','img_list').img.get('lazy_src'))
		b+=1
		print("Downloaded %d images"%b)
print("Finished downloading %d pictures" %b)
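
The script names each image after its listing title and passes it through os.path.basename, which only drops whatever precedes the last path separator. Titles can still contain characters that are not valid in file names on some systems. A minimal sketch of a stricter helper follows; the name safe_filename and its ext and max_len parameters are assumptions made for illustration and are not part of the original script.

```python
import re

# Characters that Windows forbids in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, ext=".jpg", max_len=80):
    """Turn a listing title into a file name that is safe to write to disk.

    Hypothetical helper: not part of the original script.
    """
    # Replace every forbidden character with an underscore.
    name = _INVALID_CHARS.sub("_", title)
    # Collapse runs of whitespace, cap the length, then trim spaces and dots at both ends.
    name = re.sub(r"\s+", " ", name)[:max_len].strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    if not name:
        name = "untitled"
    return name + ext


print(safe_filename("天河区 3室2厅/南北通透"))  # 天河区 3室2厅_南北通透.jpg
```

In the download loop, the second argument to urlretrieve could then be safe_filename(item.find('p','bthead').a.get_text()). Two listings with the same title would still overwrite each other, so appending the counter b to the name is one possible way to keep the files apart.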