requests.exceptions.ConnectionError: HTTPSConnectionPool(host='publicsuffix.org', port=443): Max retries exceeded with url: /list/public_suffix_list.dat (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f06384d29b0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
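This error shows up because, on first use, tldextract tries to download the latest public suffix list from publicsuffix.org; with no network access the request fails. As a minimal sketch of one way to avoid the live fetch, assuming the tldextract 2.x API used in this article (the suffix_list_urls parameter and its accepted values may differ in other versions):

# Skip the HTTP fetch from publicsuffix.org and fall back to the
# suffix-list snapshot bundled with the package.
from tldextract import TLDExtract

no_fetch_extract = TLDExtract(suffix_list_urls=None,
                              cache_file='/root/tldextract.cache')
no_fetch_extract('http://forums.bbc.co.uk')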
In [1]: from tldextract import TLDExtract

In [2]: extract = TLDExtract(cache_file='/root/tldextract.cache')

In [3]: extract('http://forums.bbc.co.uk')  # parse the URL
Out[3]: ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')

In [4]: after = extract('http://forums.bbc.co.uk')

In [5]: after.registered_domain  # get the registered domain
Out[5]: 'bbc.co.uk'
With this we can extract the subdomain parts from the URLs our crawler collects, as in the sketch below.
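As a hedged sketch of that post-processing step (the crawled_urls list is purely illustrative; only cache_file and the ExtractResult fields come from the session above):

from tldextract import TLDExtract

extract = TLDExtract(cache_file='/root/tldextract.cache')

# Hypothetical URLs gathered by the crawler.
crawled_urls = [
    'http://forums.bbc.co.uk/path',
    'http://news.bbc.co.uk',
    'http://example.com/login',
]

subdomains = set()
registered_domains = set()
for url in crawled_urls:
    result = extract(url)
    registered_domains.add(result.registered_domain)
    if result.subdomain:                 # empty for bare registered domains
        subdomains.add(result.subdomain)

print(subdomains)          # e.g. {'forums', 'news'}
print(registered_domains)  # e.g. {'bbc.co.uk', 'example.com'}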
What happens if the host is an IP address?
In [6]: after = extract('http://127.0.0.1:8080/deployed/')

In [7]: after
Out[7]: ExtractResult(subdomain='', domain='127.0.0.1', suffix='')

In [8]: after.registered_domain
Out[8]: ''
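For an IP address the suffix and registered_domain come back empty and the address ends up in the domain field, so we can detect this case before treating the result as a real domain. A minimal sketch using the standard-library ipaddress module (is_ip_host is a hypothetical helper, not part of tldextract):

import ipaddress
from tldextract import TLDExtract

extract = TLDExtract(cache_file='/root/tldextract.cache')

def is_ip_host(result):
    """Return True when the extracted 'domain' is actually an IP literal."""
    if result.suffix:          # real domains always carry a public suffix
        return False
    try:
        ipaddress.ip_address(result.domain)
        return True
    except ValueError:
        return False

after = extract('http://127.0.0.1:8080/deployed/')
is_ip_host(after)   # True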