java - How to handle cookies in a crawler


I am building a web crawler (a tiny one). A few sites, such as nytimes, check cookies whenever I visit them. It seems they first check for a cookie and, if it is not available, set it. If something goes wrong, they redirect me to a login page.

Now, how can this behavior be handled programmatically, so that the redirection doesn't happen?

General answer: cookies are used for a number of different purposes, so there's no one-size-fits-all solution you can use in your crawler. Some sites use cookies as a key component of their user identification schemes, and if you tamper with those, the site may not be able to tell who your crawler is, which is generally undesirable. If you want more info on sending cookie data, though, you can read here: http://en.wikipedia.org/wiki/http_cookie#setting_a_cookie. The important line in the HTTP request is:

Cookie: name=value; name2=value2
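One common way to get the "check for cookie, set it if missing" flow to work is for the crawler to keep a cookie jar, the way a browser does: store whatever `Set-Cookie` headers a site sends and replay them in the `Cookie` header on later requests to that site. The sketch below assumes the JDK's `java.net.CookieManager` as the jar; the class and method names (`CookieJarSketch`, `storeResponseCookies`, `cookieHeaderFor`) and the `www.example.com` URLs are hypothetical, and it does nothing special about authentication cookies, so the caveat above still applies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical helper: uses java.net.CookieManager as the crawler's cookie jar.
final class CookieJarSketch {

    // Store the cookies a server set via "Set-Cookie" response headers for uri.
    static void storeResponseCookies(CookieManager jar, URI uri, List<String> setCookieValues) {
        try {
            jar.put(uri, Map.of("Set-Cookie", setCookieValues));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build the value of the "Cookie" request header for uri, e.g. "name=value; name2=value2".
    // Returns "" when the jar holds no cookies matching uri's host and path.
    static String cookieHeaderFor(CookieManager jar, URI uri) {
        try {
            Map<String, List<String>> headers = jar.get(uri, Map.of());
            return String.join("; ", headers.getOrDefault("Cookie", List.of()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default policy is CookiePolicy.ACCEPT_ORIGINAL_SERVER.
        CookieManager jar = new CookieManager();
        URI home = URI.create("https://www.example.com/");

        // First visit: the site sets its cookies.
        storeResponseCookies(jar, home, List.of("name=value", "name2=value2"));

        // Later request to the same site: replay them.
        String header = cookieHeaderFor(jar, URI.create("https://www.example.com/articles"));
        System.out.println("Cookie: " + header);
    }
}
```

In a real crawler the same `CookieManager` can be handed to `HttpClient.newBuilder().cookieHandler(...)` (Java 11+), which then stores and sends cookies automatically; building the header by hand is mainly useful for seeing what actually goes over the wire.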

Specific answer: as far as I know, the NY Times site requires a subscription in order to read its material, so its cookies are required for authentication and should not be spoofed by a crawler.
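Since authentication cookies like these should not be spoofed, a more conservative option is for the crawler to notice when a site bounces it to a login page and simply skip that URL. A minimal sketch, assuming Java 11+ `java.net.http.HttpClient`: redirects are not followed automatically, so the 3xx response stays visible to the caller, and a hypothetical helper `looksLikeLoginRedirect` applies a keyword check to the `Location` header. The keywords "login" and "signin" are assumptions; real sites name their login pages in many ways.

```java
import java.net.CookieManager;
import java.net.http.HttpClient;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for a crawler that backs off from login-protected pages.
final class LoginRedirectCheck {

    // A client that stores and replays cookies like a browser would,
    // but leaves 3xx responses for the caller to inspect.
    static HttpClient newCrawlerClient(CookieManager jar) {
        return HttpClient.newBuilder()
                .cookieHandler(jar)
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
    }

    // Heuristic only: a 3xx status whose Location mentions "login" or "signin"
    // is treated as the site asking for authentication.
    static boolean looksLikeLoginRedirect(int status, Optional<String> location) {
        boolean isRedirect = status >= 300 && status < 400;
        return isRedirect && location
                .map(loc -> loc.toLowerCase(Locale.ROOT))
                .filter(loc -> loc.contains("login") || loc.contains("signin"))
                .isPresent();
    }
}
```

With a real `HttpResponse` named `r`, the check would be `looksLikeLoginRedirect(r.statusCode(), r.headers().firstValue("Location"))`; when it returns true, the crawler can log the URL and move on instead of trying to fake the session.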

