Implementing Crawler4j with Selenium in Java doesn't work
I'm trying to use crawler4j together with Selenium for website testing. After a webpage is crawled, Selenium should start a test using parameters obtained from the crawler — for example, the URL to open or the IDs of search fields. If I use crawler4j alone, it works fine and I can extract the information I need. If I start a Selenium test with predefined parameters (URL and IDs), it also works as it should. But when I put the same Selenium code into the crawler code, I get an exception. I guess it's maybe a thread problem? It would be nice if you could give me a hint or help me:
Exception in thread "Crawler 1" java.lang.NoClassDefFoundError: com/google/common/base/Function
    at mypackage.com.Selenium.init(Selenium.java:21)
    at mypackage.com.MyCrawler.visit(MyCrawler.java:57)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:351)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:220)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Function
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 5 more
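The root cause line, `ClassNotFoundException: com.google.common.base.Function`, points at a missing or mismatched Guava jar, which Selenium depends on at runtime. A quick way to confirm a classpath gap before the crawler threads start is a plain-JDK probe; the class and method names below (`ClasspathProbe`, `isLoadable`) are hypothetical helpers, not part of crawler4j or Selenium:

```java
// Minimal sketch, no external dependencies: check whether a class can be
// loaded by the application class loader before starting the crawl.
public class ClasspathProbe {

    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always loadable; this is the sanity check.
        System.out.println("java.lang.String loadable: "
                + isLoadable("java.lang.String"));
        // Selenium needs Guava; if this prints false, guava.jar is missing
        // from the runtime classpath and FirefoxDriver will fail as above.
        System.out.println("com.google.common.base.Function loadable: "
                + isLoadable("com.google.common.base.Function"));
    }
}
```

If the second line prints `false`, the fix is a classpath/dependency issue, not a bug in the crawler code itself.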
Here is the code of the crawler:
import java.util.List;
import java.util.regex.Pattern;

import org.apache.http.Header;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    private final static Pattern FILTERS = Pattern.compile(
            ".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    /**
     * You should implement this function to specify whether the given URL
     * should be crawled or not (based on your crawling logic).
     */
    @Override
    public boolean shouldVisit(WebURL url) {
        String href = url.getURL().toLowerCase();
        return !FILTERS.matcher(href).matches() && href.startsWith("http://");
    }

    /**
     * This function is called when a page is fetched and ready to be
     * processed by your program.
     */
    @Override
    public void visit(Page page) {
        int docid = page.getWebURL().getDocid();
        String url = page.getWebURL().getURL();
        String domain = page.getWebURL().getDomain();
        String path = page.getWebURL().getPath();
        String subDomain = page.getWebURL().getSubDomain();
        String parentUrl = page.getWebURL().getParentUrl();
        String anchor = page.getWebURL().getAnchor();
        System.out.println("Docid: " + docid);
        System.out.println("URL: " + url);
        System.out.println("Domain: '" + domain + "'");
        System.out.println("Sub-domain: '" + subDomain + "'");
        System.out.println("Path: '" + path + "'");
        System.out.println("Parent page: " + parentUrl);
        System.out.println("Anchor text: " + anchor);

        // Here the web browser should open, but it doesn't work
        WebDriver driver = new FirefoxDriver();
        driver.get(url);

        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String text = htmlParseData.getText();
            String html = htmlParseData.getHtml();
            List<WebURL> links = htmlParseData.getOutgoingUrls();
            System.out.println("Text length: " + text.length());
            System.out.println("Html length: " + html.length());
            System.out.println("Number of outgoing links: " + links.size());
        }

        Header[] responseHeaders = page.getFetchResponseHeaders();
        if (responseHeaders != null) {
            System.out.println("Response headers:");
            for (Header header : responseHeaders) {
                System.out.println("\t" + header.getName() + ": " + header.getValue());
            }
        }
        System.out.println("=============");
    }
}
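Since the question suspects a thread problem: crawler4j runs `visit()` on multiple crawler threads, and a Selenium `WebDriver` instance is not thread-safe, so a common pattern is one driver per crawler thread rather than a new browser per page. Here is a runnable sketch of that pattern; `FakeDriver` stands in for `FirefoxDriver` so the example works without Selenium on the classpath, and all names here are hypothetical:

```java
// Sketch of the one-driver-per-crawler-thread pattern via ThreadLocal.
public class PerThreadDriver {

    // Stand-in for a real WebDriver such as FirefoxDriver.
    static class FakeDriver {
        final String owner = Thread.currentThread().getName();
    }

    // Lazily creates exactly one driver per thread on first use.
    static final ThreadLocal<FakeDriver> DRIVER =
            ThreadLocal.withInitial(FakeDriver::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable crawl = () -> {
            FakeDriver d1 = DRIVER.get();
            FakeDriver d2 = DRIVER.get();
            // Same instance within a thread, distinct across threads.
            System.out.println(Thread.currentThread().getName()
                    + " reuses driver: " + (d1 == d2));
        };
        Thread a = new Thread(crawl, "Crawler 1");
        Thread b = new Thread(crawl, "Crawler 2");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

In the real crawler, `DRIVER.get()` inside `visit()` would return the thread's own browser, and each driver should be quit when its crawler thread finishes.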
I was using httpcore-4.2.2 and httpclient-4.2.3. There must have been a compatibility problem with Selenium. After I updated both to the latest release, it worked as it should. I lost a week finding this.
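One way to avoid this kind of stale-jar conflict is to let a build tool resolve the crawler4j, Selenium, and HttpComponents versions together instead of managing jars by hand. A sketch of a Maven fragment follows; the version numbers are illustrative examples only, not the ones from the original post:

```xml
<!-- Illustrative dependency fragment; pick versions current for your setup. -->
<dependencies>
    <dependency>
        <groupId>edu.uci.ics</groupId>
        <artifactId>crawler4j</artifactId>
        <version>3.5</version>
    </dependency>
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>2.44.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.3.6</version>
    </dependency>
</dependencies>
```

Running `mvn dependency:tree` then shows which httpclient/httpcore (and Guava) versions actually end up on the classpath.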