python - Beautiful Soup .find Chinese Characters
a_string = soup.find(text='围')
soup.find_all('title', limit=1)
# [<title>the dormouse's story</title>]
soup.find('title')
# <title>the dormouse's story</title>
Is there any way to find Chinese characters when using BeautifulSoup?
I have tried for a while, but it can't seem to detect the character. English characters work fine.
This is the source of the website I'm working with:
<!doctype html> <html lang="zh-cn"> <head> <meta charset="gbk" />
When you use find(text='something'), you search for text nodes whose content is exactly 'something' and nothing else.
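To make the exact-match behaviour concrete, here is a minimal sketch. The HTML snippet and variable names are made up for illustration, and it uses `string=`, which is the newer spelling of `text=` in Beautiful Soup 4.4 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one node is exactly '围', the other only contains it.
html = "<html><body><p>围</p><p>包围圈</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# A plain string is compared against the whole text node.
exact = soup.find(string="围")
all_exact = soup.find_all(string="围")

# A substring of a longer text node is not a match.
partial = soup.find(string="包围")

print(repr(exact), len(all_exact), partial)
```

Only the first paragraph is found, so a Chinese character embedded in a longer run of text will be missed by a plain string search.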
If you want to find text that contains a particular character, or that matches some other pattern, you must use a regular expression instead (as @yannis said):
soup.find(text=re.compile(u'定'))
Note that the re.U flag is not required, since you are not changing the behaviour of the special sequences \s or \w. If you were, you might need to provide it. See the regular expression documentation for more.
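Putting the pieces together, the following is a rough sketch of the regular expression search against a GBK-encoded page like the one in the question. The page content here is invented, and passing `from_encoding="gbk"` is an assumption that the bytes really are GBK, as the meta tag claims. In Python 3, `str` patterns are Unicode-aware by default, so no flag is passed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page, encoded as GBK bytes the way a server might send it.
page = (
    '<!doctype html><html lang="zh-cn"><head><meta charset="gbk" />'
    "<title>测试</title></head><body><p>一定可以</p></body></html>"
)
raw = page.encode("gbk")

# Tell Beautiful Soup which encoding to decode the bytes with.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

# A compiled pattern is applied with search(), so it matches anywhere in the node.
node = soup.find(string=re.compile("定"))

print(node, node.parent.name)
```

If the search still returns nothing on a real page, checking `soup.original_encoding` is a reasonable first step, since a wrongly guessed encoding would turn the Chinese text into different characters before the search runs.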