python - BeautifulSoup 'href' list that is giving ambiguous TypeErrors?


I'm using BeautifulSoup to scrape URLs from a webpage. Everything was going well until some of the URLs had non-ASCII characters in them.

import requests
from bs4 import BeautifulSoup

req = requests.get('http://www.reddit.com')
soup = BeautifulSoup(req.content)
urls = [i.get('href') for i in soup.find_all('a')
        if 'keyword' in str(i.get('href'))]

This list comprehension returns a UnicodeError.
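I can reproduce the error outside of BeautifulSoup; a minimal Python 2 sketch, assuming one of the hrefs contains a non-ASCII character (the href value below is hypothetical): str() implicitly encodes the unicode string to ASCII, which is what raises.

href = u'/r/caf\xe9'    # hypothetical href with a non-ASCII character
'keyword' in str(href)  # UnicodeEncodeError: 'ascii' codec can't encode character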
So I thought I'd separate the list comprehension into two parts instead:

urls = [i.get('href') for i in soup.find_all('a')]
urls = [i.encode('utf-8') for i in urls]

This is when I got the AttributeError, saying the items are NoneType.

I checked the type:

print [type(i) for i in urls]

which showed unicode types. It seems like they are None and unicode at the same time.

You must have missed a None value. I checked www.reddit.com and, sure enough, there's:

<a name="content"></a> 

Its href is None. Instead of printing the values and searching for None manually, do:

urls = [(i, i.get('href')) for i in soup.find_all('a')]
print [u for u in urls if u[1] is None]
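As a follow-up, here is a sketch of how the original keyword filter could skip those anchors entirely, assuming BeautifulSoup 4 and requests ('keyword' is just the placeholder from the question): find_all('a', href=True) only returns tags that actually have an href, so get('href') never comes back as None, and comparing against the unicode value avoids the str() call that raised the UnicodeError.

import requests
from bs4 import BeautifulSoup

req = requests.get('http://www.reddit.com')
soup = BeautifulSoup(req.content)

# href=True skips anchors like <a name="content"></a> that have no href,
# and keeping the value as unicode avoids the implicit ASCII encode.
urls = [a.get('href') for a in soup.find_all('a', href=True)
        if 'keyword' in a.get('href')]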
