Python - LookupError: Resource 'corpora/stopwords' not found
I am trying to run a webapp on Heroku using Flask. The webapp is programmed in Python with the NLTK (Natural Language Toolkit) library.
One of the files has the following header:
import nltk, json, operator
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
When the webpage with the stopwords code is called, it produces the following error:
LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/app/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
The exact code used:
# remove punctuation
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)
# remove stop words and digits
stopword = stopwords.words('english')
data = [w for w in data if w not in stopword and not w.isdigit()]
The webapp on Heroku doesn't produce the LookupError when the line stopword = stopwords.words('english') is commented out.
The code runs without a glitch on my local computer. I have installed the required libraries on my computer using
pip install -r requirements.txt
The virtual environment provided by Heroku was running when I tested the code on my computer.
I have also tried NLTK from 2 different sources, but the LookupError is still there. The 2 sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git
The problem is that the corpus ('stopwords' in this case) doesn't get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.
- Create a new directory in your project (let's call it 'nltk_data').
- Download the NLTK corpus into that directory. You will have to configure that during the download.
- Tell NLTK to look in this particular path. Just add
nltk.data.path.append('path_to_nltk_data')
to the Python file that's using NLTK.
- Now push the app to Heroku.
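The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming the 'nltk_data' folder sits next to the Python file that uses NLTK; the helper names (project_nltk_data_dir, register_data_dir, setup_stopwords) are made up for illustration and are not part of NLTK. The only NLTK calls used are nltk.download() with its download_dir argument, nltk.data.path, and nltk.data.find().

```python
import os


def project_nltk_data_dir(base_file):
    """Return the absolute path of an 'nltk_data' folder next to base_file.

    Hypothetical helper: assumes the corpus folder lives beside the module.
    """
    return os.path.join(os.path.dirname(os.path.abspath(base_file)), "nltk_data")


def register_data_dir(search_path, data_dir):
    """Append data_dir to a search-path list (e.g. nltk.data.path) only once."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path


def setup_stopwords(base_file, download=False):
    """Point NLTK at the project-local folder and report whether stopwords is found.

    Pass download=True once on the local machine to fetch the corpus into
    the project folder, then commit that folder before pushing to Heroku.
    """
    import nltk  # imported here so the path helpers above work without NLTK

    data_dir = project_nltk_data_dir(base_file)
    if download:
        nltk.download("stopwords", download_dir=data_dir)
    register_data_dir(nltk.data.path, data_dir)
    try:
        nltk.data.find("corpora/stopwords")
        return True
    except LookupError:
        return False
```

Locally you would call setup_stopwords(__file__, download=True) once, commit the resulting 'nltk_data' folder, and keep a plain setup_stopwords(__file__) call at the top of the module before stopwords.words('english') is used.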
Hope that solves the problem. It worked for me!