python - Beautiful Soup .find Chinese Characters


a_string = soup.find(text='围')
soup.find_all('title', limit=1)
# [<title>the dormouse's story</title>]
soup.find('title')
# <title>the dormouse's story</title>

Is there any way to find Chinese characters when using BeautifulSoup?

I have tried for a while, but it can't seem to detect the character. English characters work fine.

Source of the website I'm working with:

<!doctype html>
<html lang="zh-cn">
  <head>
    <meta charset="gbk" />

When you use find(text='something'), it searches for text nodes that contain exactly the text 'something' and nothing else.

If you want to find text that contains a particular character, or that matches some other regular expression, you must pass a compiled regular expression pattern instead (as @yannis said):

soup.find(text=re.compile(u'定')) 
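To make the difference between exact matching and pattern matching concrete, here is a minimal sketch. The HTML is a made-up stand-in for the GBK-encoded page in the question, and it assumes Beautiful Soup 4.4 or later, where the text argument is also available under the name string. Passing from_encoding="gbk" is an assumption about how the bytes were fetched, not something the question confirms.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page content, encoded as GBK like the site in the question.
html_bytes = (
    '<html><head><meta charset="gbk" /><title>范围测试</title></head>'
    '<body><p>围</p><p>周围的人</p></body></html>'
).encode("gbk")

# Tell Beautiful Soup which encoding to use when decoding the raw bytes.
soup = BeautifulSoup(html_bytes, "html.parser", from_encoding="gbk")

# A plain string matches only a text node that is exactly '围'.
exact = soup.find(string="围")

# A compiled pattern matches every text node that contains '围'.
partial = soup.find_all(string=re.compile("围"))

print(exact)
print([str(s) for s in partial])
```

The exact search returns only the node whose whole text is the single character, while the pattern search also picks up the title and the longer paragraph.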

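Note that the re.U flag is not required for a pattern like u'定', because the pattern does not rely on the behavior of special sequences such as \s or \w. If yours does, you might need to provide the flag. See the Python documentation of the re module for more on regular expressions.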
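As a rough illustration of when the flag matters, the sketch below assumes Python 3, where str patterns are Unicode-aware by default (so re.U is effectively always on) and re.ASCII switches that behavior off. The sample text is made up.

```python
import re

text = "定 abc"

# With the default Unicode behavior, \w also matches Chinese characters.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w only matches [a-zA-Z0-9_], so 定 is skipped.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# A literal character is matched the same way regardless of the flag.
literal_default = re.findall("定", text)
literal_ascii = re.findall("定", text, flags=re.ASCII)

print(unicode_words, ascii_words, literal_default, literal_ascii)
```

On Python 2, which the u'' prefix in the answer suggests, the default is the other way around and re.U has to be passed explicitly to get the Unicode behavior for \w and \s.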

