python - Why is HttpCacheMiddleware disabled in scrapyd?
Why does HttpCacheMiddleware need scrapy.cfg, and how can I work around this issue?
I use scrapyd-deploy to build an egg and deploy the project to scrapyd. When the job runs, I see in the log output that HttpCacheMiddleware is disabled because scrapy.cfg was not found:
    2014-06-08 18:55:51-0700 [scrapy] WARNING: Disabled HttpCacheMiddleware: Unable to find scrapy.cfg file to infer project data dir
I checked the egg file and scrapy.cfg is indeed not there, because the egg only contains the files inside the project directory. I may be wrong, but I think the egg is built correctly:
    foo/
    |- project/
    |  |- __init__.py
    |  |- settings.py
    |  |- spiders/
    |  |- ...
    |- scrapy.cfg
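To double-check, a quick listing of the egg's contents confirms that scrapy.cfg was not packaged. This is just a minimal sketch; the egg filename is a placeholder for whatever scrapyd-deploy produced:

    # Minimal sketch: list what actually got packaged into the egg.
    # The egg filename is hypothetical; substitute the one scrapyd-deploy built.
    import zipfile

    with zipfile.ZipFile('project-1.0-py2.7.egg') as egg:
        for name in egg.namelist():
            print(name)   # scrapy.cfg does not show up in this listing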
Digging into the code some more, I think one of the three if-conditions is somehow failing in MiddlewareManager:
    try:
        mwcls = load_object(clspath)
        if crawler and hasattr(mwcls, 'from_crawler'):
            mw = mwcls.from_crawler(crawler)
        elif hasattr(mwcls, 'from_settings'):
            mw = mwcls.from_settings(settings)
        else:
            mw = mwcls()
        middlewares.append(mw)
    except NotConfigured, e:
        if e.args:
            clsname = clspath.split('.')[-1]
            log.msg(format="Disabled %(clsname)s: %(eargs)s",
                    level=log.WARNING, clsname=clsname, eargs=e.args[0])
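My best guess is that the warning actually comes from the except NotConfigured branch rather than from a failing if-condition, and that the exception is raised while inferring the project data dir. A rough paraphrase of that lookup (not the exact Scrapy source) would be: walk upward from the current working directory looking for the closest scrapy.cfg, and give up with NotConfigured at the filesystem root.

    # Rough paraphrase (not the exact Scrapy source) of the project data dir lookup
    # that raises the NotConfigured caught in the block above.
    import os
    from scrapy.exceptions import NotConfigured

    def closest_scrapy_cfg(path='.'):
        path = os.path.abspath(path)
        cfgfile = os.path.join(path, 'scrapy.cfg')
        if os.path.exists(cfgfile):
            return cfgfile
        parent = os.path.dirname(path)
        if parent == path:           # reached the filesystem root, nothing found
            return ''
        return closest_scrapy_cfg(parent)

    def project_data_dir():
        if not closest_scrapy_cfg():
            raise NotConfigured("Unable to find scrapy.cfg file to infer project data dir")
        # ... the real implementation derives a .scrapy/ data dir next to scrapy.cfg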
Place an empty scrapy.cfg under the working directory that scrapyd runs in. As the source code shows, project_data_dir tries to find the closest scrapy.cfg and uses it to infer the project data dir.
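A minimal sketch of that workaround, assuming a hypothetical scrapyd working directory; point it at wherever your scrapyd daemon actually runs:

    # Create an empty scrapy.cfg in scrapyd's working directory so the
    # closest-scrapy.cfg lookup succeeds.
    import os

    scrapyd_workdir = '/var/lib/scrapyd'            # assumption, not a scrapyd default
    cfg_path = os.path.join(scrapyd_workdir, 'scrapy.cfg')
    if not os.path.exists(cfg_path):
        open(cfg_path, 'a').close()                 # an empty file satisfies the lookup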