python - BeautifulSoup 'href' list that is giving ambiguous TypeErrors? -


I'm using BeautifulSoup to scrape URLs from a webpage. It was going well until some of the URLs had non-ASCII characters in them.

import requests
from bs4 import BeautifulSoup

req = requests.get('http://www.reddit.com')
soup = BeautifulSoup(req.content)
urls = [i.get('href') for i in soup.findAll('a') if 'keyword' in str(i.get('href'))]

The list comprehension raises a UnicodeError.
I thought I would split the list comprehension into two parts instead:

urls = [i.get('href') for i in soup.findAll('a')]
urls = [i.encode('utf-8') for i in urls]

This is when I got an AttributeError, saying the items are NoneType.

I checked the types:

print [type(i) for i in urls]

which showed unicode types. It seems the items are None and unicode at the same time.

You must have missed a None value. I checked www.reddit.com and, sure enough, there's:

<a name="content"></a> 

Its href is None. Instead of printing all the values and searching for None manually, do:

urls = [(i, i.get('href')) for i in soup.findAll('a')]
print [u for u in urls if u[1] is None]
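Once the None entries are located, one way to avoid both errors is to skip them before doing any string work. The sketch below is only an illustration, written for Python 3 (an assumption; the code above is Python 2). The hrefs list is a made-up stand-in for the values that i.get('href') returns, and filter_hrefs is a hypothetical helper name, not part of BeautifulSoup. It also shows why the original comprehension never complained about None: str(None) is simply the string 'None'.

```python
# Made-up stand-in for [a.get('href') for a in soup.findAll('a')].
hrefs = [
    "http://example.com/keyword/page",
    None,  # an anchor such as <a name="content"></a> has no href
    "http://example.com/caf\u00e9/keyword",  # non-ASCII character in the URL
    "http://example.com/other",
]


def filter_hrefs(hrefs, keyword):
    """Return the hrefs that are not None and contain keyword (hypothetical helper)."""
    return [h for h in hrefs if h is not None and keyword in h]


# str(None) is the string 'None', so wrapping in str() hides the missing value
# instead of raising an error.
hidden = str(None)

# Positions of the anchors that have no href at all.
missing = [index for index, h in enumerate(hrefs) if h is None]

# Filter first, then encode: None never reaches .encode().
matches = filter_hrefs(hrefs, "keyword")
encoded = [h.encode("utf-8") for h in matches]

print(missing)
print(matches)
```

With BeautifulSoup 4 itself, passing href=True to find_all('a', href=True) should restrict the search to anchors that actually carry an href attribute, which makes the None check unnecessary.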
