Scrapy AJAX problem

Posted 2016-11-05 17:48

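I am crawling an ASP.NET page that contains an AJAX call which submits a form via POST. When I simulate that POST, the response body differs from the one the browser receives.
This is the text returned by the simulated POST: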

0|hiddenField|__EVENTTARGET||0|hiddenField|__EVENTARGUMENT||0|hiddenField|__LASTFOCUS||1204|hiddenField|__VIEWSTATE|/wEPDwUIMTI1NjYzOTMPZBYCAgMPZBYMAgUPEGQQFQUM6K+36YCJ5oupLi4uBueUsue6pwbkuZnnuqcG5LiZ57qnBuS4gee6pxUFAAExATIBMwE0FCsDBWdnZ2dnZGQCCQ8QFgYeDURhdGFUZXh0RmllbGQFCEFyZWFOYW1lHg5EYXRhVmFsdWVGaWVsZAUCSUQeC18hRGF0YUJvdW5kZxAVAg0tLeivt+mAieaLqS0tCeW5v+S4nOecgRUCAAQyMTQ2FCsDAmdnZGQCCw8QZGQUKwEBZmQCDQ8QZGQUKwEBZmQCFQ9kFgJmD2QWBAIBDxYCHgtfIUl0ZW1Db3VudAIBFgJmD2QWAmYPFQcBMRLlub/kuJznnIHmuIXov5zluIIDNTExATMq5bm/5Lic55yB5pyJ6Imy6YeR5bGe5Zyw6LSo5bGA5Lmd5Zub4peL6ZifBuS4mee6pwnmnY7mm7TlsJRkAgMPZBYEAgEPFgYeBWNsYXNzBQ5NZXNzYWdlQmFySW5mbx4JaW5uZXJodG1sBRjor7fpgInmi6nmn6Xor6LmnaHku7bvvIEeB1Zpc2libGVoZAIDDxYCHgVzdHlsZQW0AWRpc3BsYXk6bm9uZTttYXJnaW46IDVweCAwcHggMHB4IDBweDtwYWRkaW5nOjNweCAwcHggMHB4IDBweDt3aWR0aDoxMDAlO3doaXRlLXNwYWNlOm5vd3JhcDtvdmVyZmxvdzpoaWRkZW47Ym9yZGVyLXRvcDojMDAwMDAwIDFweCBzb2xpZDtib3JkZXItYm90dG9tOiMwMDAwMDAgMXB4IHNvbGlkO2hlaWdodDozMHB4OxYCZg9kFgRmDxYCHwcFDWRpc3BsYXk6bm9uZTsWAgIBDxBkZBYBAgFkAgEPFgIfBwUNZGlzcGxheTpub25lOxYCAgEPDxYGHg5DdXN0b21JbmZvVGV4dGUeCFBhZ2VTaXplAg8eC1JlY29yZGNvdW50AgFkZAIXD2QWAmYPZBYCAgMPZBYEAgEPFgQfBAUOTWVzc2FnZUJhckluZm8fBQUY6K+36YCJ5oup5p+l6K+i5p2h5Lu277yBZAIDD2QWAmYPZBYEZg8WAh8HBQ1kaXNwbGF5Om5vbmU7FgICAQ8QZGQWAQIBZAIBD2QWAgIBDw8WAh8IZWRkZE7/qFJ/lZUXHG/3+KW81s12taYs|300|hiddenField|__EVENTVALIDATION|/wEWIgL66/n1DwLTvNCHCQLd/pb/DALSkbwRAtORvBEC0JG8EQLRkbwRArPa2IkIAvPV8OEHArPflNsEAq6b7dwBAvW/o+IHAsWd9e4KAsHXwpEEAqmkwZANAqbLq/0BAqbLl/0BAqTLq/0BAqLLq/0BAoCKzKoGAsjsgfcEAoG75PEDAtmO7KMFAp7F97oNAqbT3oYIAqm8tOsEAqm8iOsEAqu8tOsEAq28tOsEAo/907wDAuWroYgGAvq1h7kMAtDqo+sLAoX62sAGRfPLI+SQuFVqoE9J4Gr5FRTO0CU=|19|asyncPostBackControlIDs||BtnSearch,BtnAnnual|0|postBackControlIDs|||27|updatePanelIDs||tUpdatePanel5,tUpdatePanel1|0|childUpdatePanelIDs|||0|panelsToRefreshIDs|||2|asyncPostBackTimeout||90|26|formAction||QueryList.aspx?Areald=2147|4|pageTitle||在线查询|
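Reading the text above, each piece appears to follow the pattern `length|type|id|content|`, which matches the partial-postback format used by ASP.NET AJAX (for example `19|asyncPostBackControlIDs||BtnSearch,BtnAnnual|`, where the content is 19 characters long). Below is a minimal sketch of a parser for that layout; the function name `parse_delta` is made up for illustration, and it assumes the length counts characters, which is consistent with the sample. In a response that actually refreshes a panel, one would expect additional segments of type `updatePanel` carrying the rendered HTML, and none are present in the text above.

```python
def parse_delta(text):
    """Split a 'length|type|id|content|' body into (type, id, content) tuples.

    Hypothetical helper: it relies on the length prefix rather than on
    splitting at '|', because the content itself may contain '|'.
    """
    segments = []
    pos = 0
    while pos < len(text):
        bar = text.index("|", pos)            # end of the length field
        length = int(text[pos:bar])
        type_end = text.index("|", bar + 1)   # end of the type field
        id_end = text.index("|", type_end + 1)  # end of the id field
        seg_type = text[bar + 1:type_end]
        seg_id = text[type_end + 1:id_end]
        content = text[id_end + 1:id_end + 1 + length]
        segments.append((seg_type, seg_id, content))
        pos = id_end + 1 + length + 1         # skip content and trailing '|'
    return segments


if __name__ == "__main__":
    sample = ("0|hiddenField|__EVENTTARGET||"
              "19|asyncPostBackControlIDs||BtnSearch,BtnAnnual|"
              "4|pageTitle||在线查询|")
    for seg_type, seg_id, content in parse_delta(sample):
        print(seg_type, seg_id, repr(content))
```

Filtering the result for `seg_type == "updatePanel"` is a quick way to check whether a response contains the panel HTML at all.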

This is a screenshot of part of the response body that I captured with a packet-capture tool:

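The text returned by my simulated request turns out to be only the last line of the captured response, whereas what I want is everything except that last line.
Could someone help me figure out why? Thanks.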

This is the code I use to simulate the POST:

1 answer

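This kind of thing can only be analyzed page by page. That said, with this many parameters, if efficiency is not a top priority, it is easier to use PhantomJS + selenium.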
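For anyone who wants to analyze the page rather than drive a browser, here is a rough sketch of how an ASP.NET AJAX async postback is usually assembled. It is not a confirmed fix for this question, since the original request code is not shown. The usual ingredients are the page's hidden fields (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on), a ScriptManager field whose value is `panel|control`, the clicked button's own name/value pair, `__ASYNCPOST=true`, and the `X-MicrosoftAjax: Delta=true` header. The helper name `build_async_postback` and the field name `ScriptManager1` are assumptions; the real ScriptManager name has to be read from the captured browser request. `tUpdatePanel1` and `BtnSearch` are taken from the response text in the question, but whether they pair up this way is also an assumption.

```python
def build_async_postback(hidden_fields, script_manager_name,
                         panel_id, button_name, button_value=""):
    """Return (formdata, headers) for an UpdatePanel-style async postback.

    Hypothetical helper; all names passed in must come from the real page.
    """
    formdata = dict(hidden_fields)               # __VIEWSTATE, __EVENTVALIDATION, ...
    # Tells the server which panel and control triggered the postback.
    formdata[script_manager_name] = f"{panel_id}|{button_name}"
    # A submit button is only treated as clicked if its own field is posted.
    formdata[button_name] = button_value
    formdata["__ASYNCPOST"] = "true"
    headers = {
        "X-MicrosoftAjax": "Delta=true",         # asks for the delta format
        "X-Requested-With": "XMLHttpRequest",
    }
    return formdata, headers


if __name__ == "__main__":
    # Placeholder values; in a spider they would be scraped from the form.
    hidden = {"__VIEWSTATE": "<from page>", "__EVENTVALIDATION": "<from page>"}
    formdata, headers = build_async_postback(
        hidden, "ScriptManager1", "tUpdatePanel1", "BtnSearch", "Search")
    print(formdata)
    print(headers)
```

In a Scrapy spider the two results could be passed as `scrapy.FormRequest(url, formdata=formdata, headers=headers, callback=...)`. Comparing the resulting request field by field with the one captured from the browser is the most reliable way to find what is missing.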