Bug: judge does not sync strategies #6

Open
missuzhang opened this issue Apr 18, 2016 · 2 comments

Comments

missuzhang commented Apr 18, 2016

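This happened once: one of the judge instances was running, its healthcheck was OK, and API calls worked normally, but it had stopped fetching strategies from hbs.
How the problem was found: after some strategies were modified, the alarms still showed content from long ago. I then checked the strategies through the judge API:
curl http://judge-001ip:6081/strategy/hostname-eg/net.port.listen returned the old strategy from before the change.
curl http://judge-002ip:6081/strategy/hostname-eg/net.port.listen returned the new strategy from after the change.
However, checking history showed that the historical data was still being sent to judge-001, so the alarm content and trigger conditions were wrong.
After restarting judge-001, everything returned to normal.
This problem is hard to reproduce, but it is real and its consequences are serious. The cause has not been located yet. It is best to enable the debugHost configuration option and keep a close eye on the running state of your judge instances.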

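The problem has been located: the call method in g/rpc.go deadlocked.
this.rpcClient.Call(method, args, reply) has no timeout mechanism. If the network is interrupted unexpectedly while this call is in progress, the call waits forever. The lock taken by the call method in rpc.go is then never released, so judge can never fetch strategies from hbs again.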

Contributor

laiwei commented May 3, 2016

judge indeed has no timeout when it fetches the strategy list.
We will fix this.

@bbaobelief

strategy.go:41: [ERROR] Hbs.GetStrategies: call hbs timeout
