Official demo throws an error when a local 127.0.0.1:1080 proxy is added #37

Open
yejunyu opened this issue Dec 14, 2018 · 5 comments

Comments

yejunyu commented Dec 14, 2018

@Crawler(name = "basic",proxy = "http://127.0.0.1:1080")
public class Basic extends BaseSeimiCrawler {

    @Override
    public String[] startUrls() {
        // Two identical URLs, to test de-duplication
        return new String[]{"http://www.cnblogs.com/","http://www.cnblogs.com/"};
    }

    @Override
    public void start(Response response) {
        JXDocument doc = response.document();
        try {
            List<Object> urls = doc.sel("//a[@class='titlelnk']/@href");
            logger.info("{}", urls.size());
            for (Object s:urls){
                push(Request.build(s.toString(),Basic::getTitle));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public void getTitle(Response response){
        JXDocument doc = response.document();
        try {
            logger.info("url:{} {}", response.getUrl(), doc.sel("//h1[@class='postTitle']/a/text()|//a[@id='cb_post_title_url']/text()"));
            //do something
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

2018-12-14 17:35:00.209 ERROR 12784 --- [ool-2-thread-17] c.wanghaomiao.seimi.core.SeimiProcessor  : org.apache.http.message.BasicHttpRequest cannot be cast to org.apache.http.client.methods.HttpUriRequest

java.lang.ClassCastException: org.apache.http.message.BasicHttpRequest cannot be cast to org.apache.http.client.methods.HttpUriRequest
	at cn.wanghaomiao.seimi.http.hc.HcDownloader.getRealUrl(HcDownloader.java:180) ~[SeimiCrawler-2.0.jar:na]
	at cn.wanghaomiao.seimi.http.hc.HcDownloader.renderResponse(HcDownloader.java:117) ~[SeimiCrawler-2.0.jar:na]
	at cn.wanghaomiao.seimi.http.hc.HcDownloader.process(HcDownloader.java:79) ~[SeimiCrawler-2.0.jar:na]
	at cn.wanghaomiao.seimi.core.SeimiProcessor.run(SeimiProcessor.java:101) ~[SeimiCrawler-2.0.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
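
The trace points at a downcast in `HcDownloader.getRealUrl`: a `BasicHttpRequest` is being treated as an `HttpUriRequest`. The SeimiCrawler source is not shown in this thread, so the following is only a sketch of that failure mode using hypothetical stand-in types (`PlainRequest`, `UriRequest`, and both method names are invented for illustration), together with an `instanceof` guard that would avoid the exception.

```java
final class CastGuardDemo {

    // Hypothetical stand-ins for the two request types named in the trace.
    interface PlainRequest {
        String requestLine();
    }

    interface UriRequest extends PlainRequest {
        String uri();
    }

    // A request that carries only a request line (plays the role of BasicHttpRequest).
    static final class BasicRequest implements PlainRequest {
        public String requestLine() {
            return "GET / HTTP/1.1";
        }
    }

    // A request that also exposes its URI (plays the role of HttpUriRequest).
    static final class GetRequest implements UriRequest {
        private final String uri;

        GetRequest(String uri) {
            this.uri = uri;
        }

        public String requestLine() {
            return "GET " + uri + " HTTP/1.1";
        }

        public String uri() {
            return uri;
        }
    }

    // Unchecked downcast: throws ClassCastException when given a BasicRequest.
    static String realUrlUnchecked(PlainRequest request) {
        return ((UriRequest) request).uri();
    }

    // Guarded version: falls back to a caller-supplied URL when the cast is not possible.
    static String realUrlGuarded(PlainRequest request, String fallbackUrl) {
        if (request instanceof UriRequest) {
            return ((UriRequest) request).uri();
        }
        return fallbackUrl;
    }

    public static void main(String[] args) {
        PlainRequest plain = new BasicRequest();
        System.out.println(realUrlGuarded(plain, "http://www.cnblogs.com/"));
        try {
            realUrlUnchecked(plain);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }
    }
}
```

Whether the real downloader can safely fall back to the originally requested URL is an assumption of this sketch, not something the thread confirms.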

cookou commented Dec 14, 2018

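I ran into this too. Switching to the OkHttp implementation fixed it, so give that a try. The default Apache HttpClient implementation has quite a few problems.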

yejunyu commented Dec 15, 2018

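> I ran into this too. Switching to the OkHttp implementation fixed it, so give that a try. The default Apache HttpClient implementation has quite a few problems.

How do I do that? Could you share a link?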

yejunyu commented Dec 15, 2018

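> I ran into this too. Switching to the OkHttp implementation fixed it, so give that a try. The default Apache HttpClient implementation has quite a few problems.

Found it in the documentation. Thanks!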

@liuyu-struggle

How did you solve this problem?

@liuyu-struggle

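> > I ran into this too. Switching to the OkHttp implementation fixed it, so give that a try. The default Apache HttpClient implementation has quite a few problems.
>
> Found it in the documentation. Thanks!

Hi, switching to OkHttp3 does resolve this error, but now the response content comes back as garbled characters. Has anyone else run into this?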
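
Garbled text after changing HTTP clients is often a charset mismatch: the body bytes are decoded with a different charset than the one the server used to encode them. Whether that is the cause here is not confirmed in this thread. The sketch below only illustrates the effect using the JDK; `CharsetDemo` and `decode` are hypothetical names, not SeimiCrawler or OkHttp APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    // Decode a raw response body with an explicitly chosen charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the
        // source file encoding does not matter.
        String original = "\u4e2d\u6587";

        // Pretend this is the body a server sent, encoded as UTF-8.
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the wrong charset turns each byte into its own character.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        // Decoding with the charset the server actually used restores the text.
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```

If a mismatch like this is the cause, the usual remedy is to decode the body with the charset declared in the response's `Content-Type` header or in the page's `<meta charset>` tag.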
