After 20-30 Repeated Runs, Jobs Stay Running #177

Open
mayvn10 opened this issue Apr 19, 2016 · 2 comments

mayvn10 commented Apr 19, 2016

@vsivsi Issuing a question here in reply to my Node DDP Client issue. Also, thanks for your helpful insight and solutions thus far!

This refers to the same project. The problem is not the connection, since the connection is sustained; the problem is that the last job run stays in the "running" state far longer than needed. The job's work finished successfully, but the job never changed to the "completed" state.

According to the docs for this repo, the best option may be to use jc.shutdownJobServer([options], [callback]), but if there is another way, please explain.

If we need to use shutdownJobServer, where is the best place to use it? The Meteor app or the Node app?

Also, what's a good approach to detecting when a job has been running too long (e.g. 10-15 minutes), executing shutdownJobServer, and then restarting the job server right away? Does this package automatically restart the server after a shutdown?

vsivsi (Owner) commented Apr 19, 2016

This is almost certainly a problem with your code, and not with the job-collection package. Every job must eventually call either job.done() or job.fail(). If it doesn't, then that "zombie" job will continue to show up as "running" even though no worker is actively working on it. Because servers can crash, network connections can drop, etc., job-collection contains functionality to "auto-fail" jobs that appear to be zombies because the worker hasn't reported any progress (or logged any events) on the job within a specified time window. See the workTimeout option to jc.processJobs().
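
A minimal sketch of what that looks like (the collection name, job type, and timeout value here are just placeholders):

```js
// Sketch only: 'myJobs' and 'myJobType' are placeholders for your own names.
var jc = new JobCollection('myJobs');

jc.processJobs('myJobType',
  {
    // If no progress is reported on a running job for 5 minutes,
    // the job server will automatically fail it (and retry if retries remain).
    workTimeout: 5 * 60 * 1000
  },
  function (job, cb) {
    // Report progress (or job.log(...)) periodically during long-running work
    // so the job isn't treated as a zombie and auto-failed.
    job.progress(50, 100);

    // ... do the actual work here ...

    job.done();
    cb(); // always tell processJobs this worker is ready for more work
  }
);
```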

jc.shutdownJobServer() probably shouldn't be used for this purpose. The issue here is that you appear to have a path of execution out of your worker function (perhaps that "catch" you mention) where job.done() or job.fail() isn't called even though work on that job has effectively ended because the worker code hit some kind of exception. You need to handle all exceptions and other errors in your worker function, then call either job.done() or job.fail(), and finally always call the callback function provided by processJobs().
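
Something along these lines (doTheWork is just a stand-in for your own worker logic):

```js
// Sketch of the pattern described above; doTheWork() is a placeholder.
jc.processJobs('myJobType', function (job, cb) {
  try {
    doTheWork(job.data);   // whatever your worker actually does
    job.done();            // success: mark the job completed
  } catch (err) {
    job.fail('' + err);    // any exception must still end the job
  } finally {
    cb();                  // always free the worker, success or failure
  }
});
```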

I obviously can't help you debug your program unless you share the code as a complete Meteor application in its own repo. Debugging code via messages on a GitHub issue is not productive.

mayvn10 (Author) commented Apr 19, 2016

Thanks for the prompt reply.

Agreed, debugging code via messages is not productive.

We call job.done() in several places and have already reviewed every exception path, but you're right that we may have missed something, so we'll do another thorough pass through the app.

I'll update this after we find what we're looking for.
