After 20-30 Repeated Runs, Jobs Stay Running #177

Open
mayvn10 opened this issue Apr 19, 2016 · 2 comments

mayvn10 commented Apr 19, 2016

@vsivsi Issuing a question here in reply to my Node DDP Client issue. Also, thanks for your helpful insight and solutions thus far!

This refers to the same project. The problem is not the connection, since the connection is sustained; the problem is that the last job run stays in the "running" state far longer than needed. The job's work finished successfully, but the job never changed to the "completed" state.

According to the docs for this repo, the best option may be to use jc.shutdownJobServer([options], [callback]), but if there is another way, please explain.

If we need to use shutdownJobServer, where is the best place to use it? The Meteor app or the Node app?

Also, what's a good approach to detecting when a job has been running too long (e.g. 10-15 minutes), executing shutdownJobServer, and then restarting the job server right away? Does this package automatically restart the server after a shutdown?

vsivsi (Owner) commented Apr 19, 2016

This is almost certainly a problem with your code, and not with the job-collection package. Every job must eventually call either job.done() or job.fail(). If it doesn't, then that "zombie" job will continue to show up as "running" even though no worker is actively working on it. Because servers can crash, network connections can drop, etc., job-collection contains functionality to "auto-fail" jobs that appear to be zombies because the worker hasn't reported any progress (or logged any events) on the job within a specified time window. See the workTimeout option to jc.processJobs().
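
A minimal sketch of what that looks like (the collection name, job type, and timeout value here are just placeholders):

```js
// Sketch only: 'myJobs' and 'myJobType' are placeholders for your own names.
var jc = new JobCollection('myJobs');

jc.processJobs('myJobType',
  {
    // If no progress is reported on a running job for 5 minutes,
    // the job server will automatically fail it (and retry if retries remain).
    workTimeout: 5 * 60 * 1000
  },
  function (job, cb) {
    // Report progress (or job.log(...)) periodically during long-running work
    // so the job isn't treated as a zombie and auto-failed.
    job.progress(50, 100);

    // ... do the actual work here ...

    job.done();
    cb(); // always tell processJobs this worker is ready for more work
  }
);
```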

jc.shutdownJobServer() probably shouldn't be used for this purpose. The issue here is that you appear to have a path of execution out of your worker function (perhaps that "catch" you mention) where job.done() or job.fail() isn't called even though work on that job has effectively ended because the worker code hit some kind of exception. You need to handle all exceptions and other errors in your worker function, then call either job.done() or job.fail(), and finally always call the callback function provided by processJobs().
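
Something along these lines (doTheWork is just a stand-in for your own worker logic):

```js
// Sketch of the pattern described above; doTheWork() is a placeholder.
jc.processJobs('myJobType', function (job, cb) {
  try {
    doTheWork(job.data);   // whatever your worker actually does
    job.done();            // success: mark the job completed
  } catch (err) {
    job.fail('' + err);    // any exception must still end the job
  } finally {
    cb();                  // always free the worker, success or failure
  }
});
```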

I obviously can't help you debug your program unless you share the code as a complete Meteor application in its own repo. Debugging code via messages on a GitHub issue is not productive.

mayvn10 (Author) commented Apr 19, 2016

Thanks for the prompt reply.

Agreed, debugging code via messages is not productive.

We call job.done() in several places and have already reviewed every exception path, but you're right that we may have missed something, so we'll do another thorough pass through the app.

I'll update this after we find what we're looking for.
