Implementing Crawler4j with Selenium in Java doesn't work


I'm trying to use crawler4j together with Selenium for website testing. After a web page has been crawled, Selenium should start a test using the parameters it got from the crawler, for example the URL it should open or the IDs of search fields. If I use crawler4j alone, it works fine and I can extract the information I need. If I start a Selenium test with predefined parameters (URL and IDs), it also works as it should. But when I put the same Selenium code into the crawler code, I get an exception. I guess it's maybe a thread problem? It would be nice if someone could give me a hint or help me:

Exception in thread "Crawler 1" java.lang.NoClassDefFoundError: com/google/common/base/Function
    at mypackage.com.Selenium.init(Selenium.java:21)
    at mypackage.com.MyCrawler.visit(MyCrawler.java:57)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:351)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:220)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Function
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 5 more

Here is the code of the crawler:

    import java.util.List;
    import java.util.regex.Pattern;

    import org.apache.http.Header;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.parser.HtmlParseData;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class MyCrawler extends WebCrawler {

        // Skip URLs that point to static or binary resources.
        private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
                + "|png|tiff?|mid|mp2|mp3|mp4"
                + "|wav|avi|mov|mpeg|ram|m4v|pdf"
                + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

        /**
         * You should implement this function to specify whether the given url
         * should be crawled or not (based on your crawling logic).
         */
        @Override
        public boolean shouldVisit(WebURL url) {
            String href = url.getURL().toLowerCase();
            return !FILTERS.matcher(href).matches() && href.startsWith("http://");
        }

        /**
         * This function is called when a page is fetched and ready to be
         * processed by your program.
         */
        @Override
        public void visit(Page page) {
            int docid = page.getWebURL().getDocid();
            String url = page.getWebURL().getURL();
            String domain = page.getWebURL().getDomain();
            String path = page.getWebURL().getPath();
            String subDomain = page.getWebURL().getSubDomain();
            String parentUrl = page.getWebURL().getParentUrl();
            String anchor = page.getWebURL().getAnchor();

            System.out.println("Docid: " + docid);
            System.out.println("URL: " + url);
            System.out.println("Domain: '" + domain + "'");
            System.out.println("Sub-domain: '" + subDomain + "'");
            System.out.println("Path: '" + path + "'");
            System.out.println("Parent page: " + parentUrl);
            System.out.println("Anchor text: " + anchor);

            // Here the web browser should open, but it doesn't work
            WebDriver driver = new FirefoxDriver();
            driver.get(url);

            if (page.getParseData() instanceof HtmlParseData) {
                HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
                String text = htmlParseData.getText();
                String html = htmlParseData.getHtml();
                List<WebURL> links = htmlParseData.getOutgoingUrls();

                System.out.println("Text length: " + text.length());
                System.out.println("Html length: " + html.length());
                System.out.println("Number of outgoing links: " + links.size());
            }

            Header[] responseHeaders = page.getFetchResponseHeaders();
            if (responseHeaders != null) {
                System.out.println("Response headers:");
                for (Header header : responseHeaders) {
                    System.out.println("\t" + header.getName() + ": " + header.getValue());
                }
            }

            System.out.println("=============");
        }
    }
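On the thread question: crawler4j calls visit(Page) from its own crawler threads (the trace shows "Crawler 1"), while a WebDriver instance is generally not meant to be shared between threads. One possible arrangement, sketched here only as an assumption and not as what the asker ended up doing, is to hand each crawled URL to a single worker thread that owns the browser. The class UrlHandOff and its methods are hypothetical, and the Consumer<String> stands in for the real Selenium call (for example url -> driver.get(url)), so the sketch runs without crawler4j or Selenium on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hands URLs from many crawler threads to one browser thread. All names are hypothetical. */
class UrlHandOff {
    private static final String STOP = "\u0000STOP"; // sentinel that ends the worker loop
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    /** browserAction stands in for the Selenium call, e.g. url -> driver.get(url). */
    UrlHandOff(Consumer<String> browserAction) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String url = queue.take();   // blocks until a URL is submitted
                    if (STOP.equals(url)) {
                        return;
                    }
                    browserAction.accept(url);   // always runs on this one thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "selenium-worker");
        worker.start();
    }

    /** Called from a crawler thread, for example inside visit(Page). */
    void submit(String url) {
        queue.add(url);
    }

    /** Lets the worker finish everything already queued, then stops it. */
    void shutdown() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> opened = new ArrayList<>();
        // Replace opened::add with a real WebDriver call in an actual test setup.
        UrlHandOff handOff = new UrlHandOff(opened::add);
        handOff.submit("http://example.com/a");
        handOff.submit("http://example.com/b");
        handOff.shutdown();
        System.out.println(opened);
    }
}
```

With this shape, visit(Page) would only call submit(url), and the browser would be created and used by the worker alone. It does not address the NoClassDefFoundError itself, which is a classpath matter rather than a threading one.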

I was using httpcore-4.2.2 and httpclient-4.2.3. There must have been a problem with them when used together with Selenium. After I updated them to the latest release, it worked as it should. I lost one week finding this.
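The class named in the stack trace, com.google.common.base.Function, belongs to Google Guava, so a NoClassDefFoundError like this usually points to a jar that is missing from, or mismatched on, the runtime classpath. A quick way to narrow that down is to ask the JVM whether a class can be loaded and which jar it came from. The following is a minimal sketch; the class ClasspathCheck and its method names are made up for illustration, and the three class names in main are simply the libraries mentioned in this post.

```java
import java.security.CodeSource;

/** Small classpath diagnostic; class and method names are illustrative only. */
class ClasspathCheck {

    /** Returns true if the named class can be found by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes where a class was loaded from (a jar or directory URL, if known). */
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes typically report no code source
            return src == null ? "bootstrap/platform class loader" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.google.common.base.Function",   // class named in the stack trace (Guava)
            "org.openqa.selenium.WebDriver",     // Selenium
            "org.apache.http.client.HttpClient"  // httpclient
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Running this with the same classpath as the crawler shows which of the libraries resolve and from which jar file, which makes duplicate or outdated versions easier to spot.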

