A Simple WeChat Crawler in Python 3

2017-12-07

The crawler uses ghost.py and relies on Sogou's WeChat search to scrape information about WeChat official accounts. The crawler code is as follows:

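# -*- coding: utf-8 -*-
# Python 3 version: strings are Unicode by default, so the Python 2
# reload(sys) / sys.setdefaultencoding("utf-8") workaround is not needed.
from bs4 import BeautifulSoup
from ghost import Ghost

# NOTE: the calls below follow the ghost.py API in which the Ghost object
# itself exposes open/click/evaluate; newer releases move these methods
# onto a session object, so adjust to the version you have installed.
ghost = Ghost(wait_timeout=20)

# Sogou WeChat search page for one official account (identified by openid)
url = "http://weixin.sogou.com/gzh?openid=oIWsFt8JDv7xubXz5E3U41T0eFbk"
page, resources = ghost.open(url)
# Wait until the "load more" link has been rendered
result, resources = ghost.wait_for_selector("#wxmore a")

MAX_PAGES = 30  # stop after 30 batches

for _ in range(MAX_PAGES):
    # Parse the currently rendered page
    soup = BeautifulSoup(ghost.content, "html.parser")

    # Print each <h4> element found on the page
    for wx in soup.find_all("h4"):
        print(wx)

    # Empty the #wxbox container so the next pass only sees new entries
    result, resources = ghost.evaluate(
        """
        var div1 = document.getElementById("wxbox");
        div1.innerHTML = '';
        """)
    # Click "load more" and wait for the next batch to appear
    ghost.click("#wxmore a")
    result, resources = ghost.wait_for_selector(".wx-rb3")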
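
The loop above prints each h4 element as raw markup. If only the visible text is wanted, the extraction step can be sketched on its own, without a browser. The sketch below is a minimal illustration, not the original author's code: it swaps BeautifulSoup for the standard library's html.parser so it runs with no third-party packages, and the names H4TitleParser, extract_titles and the sample HTML are made up for this example.

```python
from html.parser import HTMLParser


class H4TitleParser(HTMLParser):
    """Collect the visible text of every <h4> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.titles = []    # finished titles, in document order
        self._depth = 0     # greater than 0 while inside an <h4>
        self._chunks = []   # text fragments of the current <h4>

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "h4" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join nested fragments (e.g. text inside <a> or <em>)
                title = "".join(self._chunks).strip()
                if title:
                    self.titles.append(title)
                self._chunks = []

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_titles(html):
    """Return the text of all <h4> elements in an HTML string."""
    parser = H4TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up sample markup, standing in for the rendered page content
sample = (
    '<div id="wxbox">'
    '<h4><a href="#">First <em>post</em></a></h4>'
    '<h4> Second post </h4>'
    '<p>not a title</p>'
    '</div>'
)
print(extract_titles(sample))  # ['First post', 'Second post']
```

In the crawler itself, the same function could be fed the rendered page content in place of the sample string. With BeautifulSoup, calling get_text(strip=True) on each h4 result gives a comparable outcome.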

I hope this helps everyone who is learning Python.

Kiwi

