Python - Beautiful Soup .find with Chinese characters


a_string = soup.find(text='围')

soup.find_all('title', limit=1)
# [<title>the dormouse's story</title>]

soup.find('title')
# <title>the dormouse's story</title>

Is there any way to find Chinese characters while using BeautifulSoup?

I have tried for a while and can't seem to detect the character. English characters work fine.

Source of the website I'm working with:

<!doctype html>
<html lang="zh-cn">
  <head>
    <meta charset="gbk" />

When you use find(text='something'), you search for text nodes whose content is exactly 'something' and nothing else.

If you want to find text that contains a particular letter, or that matches some other regular expression, you must use a regular expression pattern instead (like @yannis said):

import re
soup.find(text=re.compile(u'定'))

Note that the re.U flag is not required as long as you are not changing the behavior of special characters such as \s or \w. If that is the case, you might need to provide it. See the Python re module documentation for more on regular expressions.
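To make the difference between the exact and the pattern-based search concrete, here is a minimal sketch. The HTML snippet and its contents are made up for illustration (only the doctype and the gbk meta tag come from the question), and passing from_encoding='gbk' is an assumption about how the raw bytes of such a page would be decoded. It uses the built-in 'html.parser' so that no extra parser is needed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page, encoded as GBK like the site in the question.
html = (
    '<!doctype html><html lang="zh-cn"><head><meta charset="gbk" />'
    '<title>围棋</title></head>'
    '<body><p>围</p><p>规定</p></body></html>'
).encode('gbk')

# from_encoding tells Beautiful Soup how to decode the raw bytes.
soup = BeautifulSoup(html, 'html.parser', from_encoding='gbk')

# Plain string: matches only a text node that is exactly '围',
# so the <title> text '围棋' is skipped and the first <p> is found.
exact = soup.find(text='围')

# Compiled pattern: matches any text node that contains '定'.
partial = soup.find(text=re.compile('定'))

# The same pattern search with find_all returns every node containing '围'.
all_wei = soup.find_all(text=re.compile('围'))

print(repr(exact), exact.parent.name)
print(repr(partial))
print([str(s) for s in all_wei])
```

Newer Beautiful Soup releases prefer the name string= for this argument; text= is kept here to match the code in the question.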

