python - How to speed up extracting domains from URLs?


I have the following script. It opens a file that contains two columns, an IP and a domain, like

  108.170.206.91 || com.invitemedia.product2pixel  

It reverses the domain name (which is stored in reversed form) and then extracts the registered (second-level) domain with the publicsuffix module,

  like invitemedia.com  
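The reversal step described above can be sketched with just the standard library (the publicsuffix lookup is omitted here, and the sample value is illustrative):

```python
# Reversed-form input, as stored in the file (illustrative sample)
reversed_form = "com.invitemedia.product2pixel"

# Split into labels, reverse their order, and re-join with dots
labels = reversed_form.strip(".").split(".")
domain = ".".join(reversed(labels))  # "product2pixel.invitemedia.com"
```

The real script would then pass `domain` through the public suffix list to reduce it to `invitemedia.com`.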

It works well, but it is a bit slow. Can anyone help me make it faster?

Here's my script:

  from publicsuffix import PublicSuffixList

  psl = PublicSuffixList()
  d = {}
  f = open(file, 'r')
  for n, line in enumerate(f):
      ip, reversed_domain_1 = line.split('|')
      try:
          # Split into labels, reverse them, and re-join into a normal domain
          reversed_domain_2 = reversed_domain_1.split('.')
          reversed_domain_3 = list(reversed(reversed_domain_2))
          domain = ('.'.join(reversed_domain_3)).strip('.')
          domain = psl.get_public_suffix(domain)
          # Collect the set of domains seen for each IP
          if ip in d:
              d[ip].add(domain)
          else:
              d[ip] = set([domain])
      except:
          print(domain)
  for ip, domain in d.iteritems():
      print("%s|%s" % (ip, domain), file=output)  

You can use a defaultdict for the d variable that you are building up. Doing the reversal with a slice instead of reversed() and join may give better performance too.


  from collections import defaultdict

  d = defaultdict(set)
  # You can now treat it as if every key is always present
  ...
  domain = '.'.join(input.split('.')[::-1])  

The defaultdict means you do not need to check whether a key exists before accessing it:

  d = defaultdict(set)
  d[1].add(2)  
