line 1: b'ID C_5074630370399522 already defined' (line 1) #602

Open
cinyearchan opened this issue Sep 4, 2024 · 1 comment
Labels
failed (program run error)

Comments

@cinyearchan

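To help us resolve the problem, please answer the questions below carefully. Once the problem is solved, please close this issue promptly.

  • Q: Which version fails (GitHub version / PyPI version / both)?

A: PyPI version

  • Q: Are you using the latest version of the program (yes/no)?

A: Yes

  • Q: Does the error occur when crawling any user (yes/no)?

A: No

  • Q: If the error occurs only for specific posts, can you provide the weibo_id or URL of the failing post (optional)?

A:

  • Q: If you have already provided the weibo_id or URL of the failing post, skip this question. Otherwise, can you provide the user_id of the failing account and the since_date you configured, so we can locate the failing post (optional)?

A: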
user_id 2492465520
since_date 2009-08-28
end_date now
user_id_list.txt 2492465520 刘晓光_恶魔奶爸 2024-08-22 10:21

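  • Q: If convenient, please describe the error in detail, ideally with the error message.

A: The following message appears multiple times during a single crawl: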

line 1: b'ID C_5074630370399522 already defined' (line 1)
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.9.7/lib/python3.9/site-packages/weibo_spider/parser/util.py", line 42, in handle_html
    selector = etree.HTML(resp.content)
  File "src/lxml/etree.pyx", line 3170, in lxml.etree.HTML
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 649, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: line 1: b'ID C_5074630370399522 already defined'
cinyearchan added the failed (program run error) label Sep 4, 2024
@dataabc
Owner

dataabc commented Sep 4, 2024

I can't debug this right now. You can refer to https://www.mail-archive.com/[email protected]/msg00213.html
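
One possible workaround, offered as a sketch rather than a confirmed fix: the traceback shows `etree.HTML(resp.content)` failing because the page repeats an `id` value. lxml's `HTMLParser` accepts `recover` and `collect_ids` options, and passing `collect_ids=False` asks the parser not to build its table of IDs, which may avoid the "already defined" error. Whether it does depends on the installed lxml/libxml2 versions, and this has not been tested against weibo_spider. The helper name `parse_html_lenient` and the sample markup are hypothetical.

```python
from lxml import etree


def parse_html_lenient(content: bytes):
    """Hypothetical helper: parse HTML without collecting element IDs.

    Assumption: skipping ID collection sidesteps the duplicate-ID error
    reported above; this is not confirmed for every lxml/libxml2 build.
    """
    parser = etree.HTMLParser(recover=True, collect_ids=False)
    return etree.HTML(content, parser=parser)


# Made-up markup with a repeated id, standing in for the failing page.
sample = (
    b'<html><body>'
    b'<div id="C_1">a</div><div id="C_1">b</div>'
    b'</body></html>'
)

selector = parse_html_lenient(sample)
# XPath attribute lookups still work, since they do not rely on the ID table.
texts = selector.xpath('//div[@id="C_1"]/text()')
```

If this approach were adopted, the call at `weibo_spider/parser/util.py` line 42 would be the place to pass the custom parser. That is a suggestion for experimentation, not a change the maintainer has endorsed.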
