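Sometimes we want to see the response from before a redirect is followed, but Python's urllib2 follows redirects automatically. After reading the library source, I found that a small subclass of its redirect handler is enough to change this.
The three commented-out lines make it possible to get the status code of the redirect, such as 301 or 302: they let the default handler follow the redirect and then store the original code on the result as result.status. If the code is all you need, uncomment those three lines and comment out return response instead.
"debug_handler = urllib2.HTTPHandler(debuglevel = 1)" turns on debug tracing of the HTTP exchange; change it to 0 if you do not want the trace.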
import urllib2
import socket


class SimpleRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, response, code, msg, headers):
        # Alternative: follow the redirect, but record the original code.
        # result = urllib2.HTTPRedirectHandler.http_error_301(self, req, response, code, msg, headers)
        # result.status = code
        # return result
        # Default here: hand back the 3xx response itself, unfollowed.
        return response

    # Treat the other redirect codes the same way.
    http_error_302 = http_error_303 = http_error_307 = http_error_301


def unRedirectUrl(url):
    socket.setdefaulttimeout(90)
    req = urllib2.Request(url)
    # debuglevel = 1 prints the HTTP exchange; use 0 to silence it.
    debug_handler = urllib2.HTTPHandler(debuglevel = 1)
    opener = urllib2.build_opener(debug_handler, SimpleRedirectHandler)

    content = ''
    try:
        response = opener.open(req)
        content = response.read()
        response.close()
    except socket.timeout, e:
        print 'socket.timeout:', e
    except IOError, e:
        print 'IOError:', e
    except:
        print 'unknown error'
    return content


print unRedirectUrl('http://localhost/test')
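The code above is Python 2 only, since urllib2 no longer exists in Python 3 (it was split into urllib.request and urllib.error). For readers on Python 3, the same idea can be sketched roughly as follows. This is a minimal sketch and not part of the original post: the names NoRedirectHandler and fetch_without_redirect are hypothetical, the extra 308 handler is my own addition, and returning the status and Location header alongside the body is an assumption about what a caller would find useful.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical Python 3 counterpart of SimpleRedirectHandler."""

    def http_error_302(self, req, fp, code, msg, headers):
        # fp is the response object for the redirect itself. Returning it
        # stops the opener from issuing a second request to the new URL.
        return fp

    # Handle the other redirect codes the same way (308 is an addition
    # that the original Python 2 snippet did not cover).
    http_error_301 = http_error_303 = http_error_307 = http_error_308 = http_error_302


def fetch_without_redirect(url, timeout=90, debuglevel=0):
    """Return (status, Location header or None, body) without following redirects."""
    opener = urllib.request.build_opener(
        # debuglevel=1 prints the HTTP exchange, as in the original post.
        urllib.request.HTTPHandler(debuglevel=debuglevel),
        # Passing the subclass replaces the default redirect handler.
        NoRedirectHandler,
    )
    with opener.open(url, timeout=timeout) as response:
        return response.status, response.headers.get("Location"), response.read()
```

Calling fetch_without_redirect('http://localhost/test') against the test URL from the post should then give back the redirect's own status code, its Location header, and whatever body the server sent with it, rather than the page the redirect points to. Passing the timeout to opener.open keeps it local to this call, instead of changing the process-wide default as socket.setdefaulttimeout does.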
Thanks, I learned something! Haha!