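Today's demo: fetching the list of proxy IPs from IP海.
Without further ado, here is the code. It is commented in detail, so reading through it should be enough.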
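'' Sleep API declaration, used to pause between page requests
Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

'' Fetch the HTML source of one listing page (取得网页源码 = "get page source")
Function 取得网页源码(Optional ByVal pages As Integer = 1) As String
    On Error GoTo er
    Dim iurl As String: iurl = "https://www.kuaidaili.com/free/inha/" & pages
    Dim strText As String
    '' Read the page source
    With CreateObject("WinHttp.WinHttpRequest.5.1") '' request object
        .Open "GET", iurl, False '' synchronous GET request
        .send '' send the request
        '' Get the source
        strText = .responseText
    End With
    取得网页源码 = strText
    Exit Function
er:
    '' On failure, return a short error message ("查询出错啦" = "query failed")
    取得网页源码 = "查询出错啦:" & Err.Description
End Function

'' Parse the page source into the worksheet (解析网页源码 = "parse page source")
Sub 解析网页源码()
    Dim sht As Worksheet: Set sht = Worksheets("IP地址池")
    Dim HTMLDoc As Object, TDElements As Object, infotb As Object
    Dim heads As Object, Contents As Object, Content As Object
    Dim xmldocstr As String
    Dim p As Long, i As Long, j As Long, k As Long
    Dim rw As Long '' Long, because row numbers can exceed the Integer range

    sht.Range("A1:AA65536").ClearContents
    '' Fetch 5 pages as a test
    For p = 1 To 5
        '' Parse the HTML
        xmldocstr = 取得网页源码(p)
        Set HTMLDoc = CreateObject("htmlfile")
        '' Rough sanity check: a very short response is an error message, not a page
        If Len(xmldocstr) < 100 Then Exit Sub
        HTMLDoc.body.innerhtml = xmldocstr
        '' Locate the HTML table
        Set TDElements = HTMLDoc.getElementById("list")
        '' Stop if the expected container is missing (for example, the page layout changed)
        If TDElements Is Nothing Then Exit Sub
        Set infotb = TDElements.Children(1)
        '' Read the header row
        Set heads = infotb.Children(0).Children(0)
        For j = 0 To heads.Cells.Length - 1
            '' Write the column headers to the sheet
            sht.Cells(1, j + 1) = heads.Children(j).innertext
            DoEvents
        Next
        '' Read the table body
        Set Contents = infotb.Children(1)
        For i = 0 To Contents.Rows.Length - 1
            Set Content = Contents.Children(i)
            '' Find the last used row
            rw = sht.Range("A65536").End(xlUp).Row
            DoEvents
            For k = 0 To Content.Cells.Length - 1
                '' Write the cell contents to the sheet
                sht.Cells(rw + 1, k + 1) = Content.Children(k).innertext
                DoEvents
            Next
            DoEvents
        Next
        Sleep 800 '' If the second page cannot be fetched, increase this delay (in milliseconds)
        DoEvents
    Next
End Sub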
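The fetch function above returns whatever the server sends back, and the caller only checks the length of the text. As a rough sketch of a stricter variant, the function below sets timeouts, sends a browser-like User-Agent header, and returns the body only on HTTP 200. The name FetchPageChecked, the timeout values, and the User-Agent string are illustrative assumptions and not part of the original code.

```vba
'' Hypothetical variant of the fetch step. Returns "" on any failure.
Function FetchPageChecked(ByVal url As String) As String
    On Error GoTo failed
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        '' Resolve, connect, send and receive timeouts in milliseconds
        '' (assumed values; timeouts have to be set before Open)
        .SetTimeouts 5000, 5000, 10000, 10000
        .Open "GET", url, False
        '' Assumed header value; headers have to be set after Open
        .SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        .send
        '' Only hand back the body when the server answered 200 OK
        If .Status = 200 Then FetchPageChecked = .responseText
    End With
    Exit Function
failed:
    FetchPageChecked = ""
End Function
```

With this variant the caller could test for an empty string instead of relying on the length of an error message, for example `If FetchPageChecked(iurl) = "" Then Exit Sub`.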
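Note: make sure your crawler never touches private data, and it is best to follow the site's robots.txt (the Robots Exclusion Protocol).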
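As a minimal sketch of what following the Robots protocol could look like in VBA, the function below takes the text of a robots.txt file and reports whether a path falls under a Disallow rule in the "User-agent: *" group. It is deliberately simplified: it only does prefix matching, ignores Allow rules and wildcards, and treats each User-agent line as starting a new group. The name IsDisallowed is a hypothetical helper, not something from the original article.

```vba
'' Hypothetical helper: True when path matches a Disallow prefix
'' in the "User-agent: *" group of robotsText (simplified rules).
Function IsDisallowed(ByVal robotsText As String, ByVal path As String) As Boolean
    Dim robotsLines() As String, ln As String, rule As String
    Dim inStarGroup As Boolean, n As Long
    '' Normalise line endings, then split into lines
    robotsLines = Split(Replace(robotsText, vbCr, ""), vbLf)
    For n = LBound(robotsLines) To UBound(robotsLines)
        ln = Trim$(robotsLines(n))
        If LCase$(Left$(ln, 11)) = "user-agent:" Then
            '' Track whether the current group applies to every crawler
            inStarGroup = (Trim$(Mid$(ln, 12)) = "*")
        ElseIf inStarGroup And LCase$(Left$(ln, 9)) = "disallow:" Then
            rule = Trim$(Mid$(ln, 10))
            '' An empty Disallow value means nothing is blocked
            If Len(rule) > 0 And Left$(path, Len(rule)) = rule Then
                IsDisallowed = True
                Exit Function
            End If
        End If
    Next n
End Function
```

Assuming the robots.txt text has been downloaded from the root of the site being crawled, the scraping loop could call something like `If IsDisallowed(robotsText, "/free/inha/" & p) Then Exit Sub` before requesting each page.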
Source: https://mp.weixin.qq.com/s/ZMborUHj6p4hkNFt3LR10w