python - LookupError: Resource 'corpora/stopwords' not found

- May 15, 2014

I am trying to run a webapp on Heroku using Flask. The webapp is programmed in Python with NLTK (the Natural Language Toolkit library).

One of the files has the following header:

import nltk, json, operator
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

When the webpage with the stopwords code is called, it produces the following error:

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/app/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

The exact code used:

# Remove punctuation
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)

# Remove stop words and digits
stopword = stopwords.words('english')
data = [w for w in data if w not in stopword and not w.isdigit()]

The webapp on Heroku doesn't produce the LookupError when stopword = stopwords.words('english') is commented out.

The code runs without a glitch on my local computer. I have installed the required libraries on my computer using

pip install -r requirements.txt

The virtual environment provided by Heroku was running when I tested the code on my computer.

I have also tried NLTK from 2 different sources, but the LookupError is still there. The 2 sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git

The problem is that the corpus ('stopwords' in this case) doesn't get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.

Create a new directory in your project (let's call it 'nltk_data').
Download the NLTK corpus into that directory. You will have to configure that during the download.
Tell NLTK to look in this particular path. Just add nltk.data.path.append('path_to_nltk_data') to the Python file that's actually using NLTK.
Now push the app to Heroku.
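
A minimal sketch of the download and path steps, assuming the directory is named 'nltk_data' and sits next to the Python file that uses NLTK. The helper names below are made up for illustration; nltk.download (with its download_dir argument) and nltk.data.path are the actual NLTK pieces involved. The idea would be to run download_stopwords once on your local machine, commit the directory, and call load_english_stopwords from the app.

```python
import os


def project_nltk_data_dir(base_dir):
    """Return the absolute path of an 'nltk_data' folder inside base_dir."""
    return os.path.join(os.path.abspath(base_dir), "nltk_data")


def download_stopwords(data_dir):
    """Run once locally: fetch the stopwords corpus into data_dir."""
    import nltk  # imported lazily so the path helper works without NLTK

    os.makedirs(data_dir, exist_ok=True)
    nltk.download("stopwords", download_dir=data_dir)


def load_english_stopwords(data_dir):
    """Add data_dir to NLTK's search path, then load the English stop words."""
    import nltk

    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    from nltk.corpus import stopwords

    return stopwords.words("english")
```

In the app this might be called as load_english_stopwords(project_nltk_data_dir(os.path.dirname(os.path.abspath(__file__)))), so the path does not depend on the working directory Heroku starts the process in.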

Hope that solves the problem. Worked for me!
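
If the error still shows up after pushing, a quick check like the one below can tell you whether the corpus actually made it into any of the directories NLTK searches. This is only a sketch: find_stopwords_dirs is a hypothetical helper, and it assumes the corpus lives under 'corpora/stopwords' either as an unpacked folder or as a 'stopwords.zip' archive.

```python
import os


def find_stopwords_dirs(search_paths):
    """Return the search paths that contain the stopwords corpus."""
    hits = []
    for root in search_paths:
        corpus = os.path.join(root, "corpora", "stopwords")
        # Accept either the unpacked folder or the zip archive.
        if os.path.isdir(corpus) or os.path.isfile(corpus + ".zip"):
            hits.append(root)
    return hits
```

Calling find_stopwords_dirs(nltk.data.path) on Heroku and getting an empty list back would mean the directory was not pushed or the appended path is wrong.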
