Issues with downloading vboxwrapper* and vm_isocontext* files #46

Open
km-git-acc opened this issue Aug 12, 2018 · 3 comments


km-git-acc commented Aug 12, 2018

I have been setting up a BOINC server using the boinc-server-docker framework. After editing a few files and doing a custom build with the --build option, I was able to launch the server and submit a few jobs, which completed and reported back successfully.
To test things further on the client side, I removed the project and added it back using the existing credentials. After I submitted some jobs and asked the client to request an update, a new job started downloading along with all the required files (which had been deleted from my local machine when I removed the project).
However, the Transfers tab shows that only the layer and similar files download successfully. Two files, vboxwrapper* and vm_isocontext*, get stuck at 0% and the BOINC Manager keeps retrying periodically (this has happened both to me on my Windows machine and to a collaborator on his Mac).
On the server side, I tried, e.g., the following from within the project folder:

docker-compose down
URL_BASE=http://anthgrid.com PROJECT=dbnupperbound docker-compose up -d --build

This apparently resolved the issue on the Windows machine, although the problem persisted on the Mac. Later I again tried 1) restarting the physical server, 2) restarting the project as above, and 3) detaching/attaching the project on the client side, and the problem has now recurred on the Windows machine as well.

I looked at the client event log; the entries following the update request are posted below.

8/12/2018 10:53:38 AM | dbnupperbound | update requested by user
8/12/2018 10:53:41 AM | dbnupperbound | Sending scheduler request: Requested by user.
8/12/2018 10:53:41 AM | dbnupperbound | Requesting new tasks for CPU
8/12/2018 10:53:46 AM | dbnupperbound | Scheduler request completed: got 1 new tasks
8/12/2018 10:53:48 AM | dbnupperbound | Started download of vboxwrapper_26200_windows_x86_64.exe
8/12/2018 10:53:48 AM | dbnupperbound | Started download of vm_isocontext_v1.0.0.iso
8/12/2018 10:54:10 AM | dbnupperbound | Temporarily failed download of vboxwrapper_26200_windows_x86_64.exe: connect() failed
8/12/2018 10:54:10 AM | dbnupperbound | Backing off 00:02:07 on download of vboxwrapper_26200_windows_x86_64.exe
8/12/2018 10:54:10 AM | dbnupperbound | Temporarily failed download of vm_isocontext_v1.0.0.iso: connect() failed
8/12/2018 10:54:10 AM | dbnupperbound | Backing off 00:02:51 on download of vm_isocontext_v1.0.0.iso
8/12/2018 10:54:10 AM | dbnupperbound | Started download of vbox_job_4f89c783ce0c4f20b908bc4a14059af3.xml
8/12/2018 10:54:10 AM | dbnupperbound | Started download of boinc_app_6aedaa339bb34abe8bf6cd4c457edc43
8/12/2018 10:54:11 AM |  | Project communication failed: attempting access to reference site
8/12/2018 10:54:13 AM |  | Internet access OK - project servers may be temporarily down.
8/12/2018 10:54:15 AM | dbnupperbound | Finished download of vbox_job_4f89c783ce0c4f20b908bc4a14059af3.xml
8/12/2018 10:54:15 AM | dbnupperbound | Finished download of boinc_app_6aedaa339bb34abe8bf6cd4c457edc43
8/12/2018 10:54:15 AM | dbnupperbound | Started download of layer_b4aaa2a9b1c803b2d5e2be58eb77fc258749f3e67bfd9b06677e0c4ec551c580.tar.manual.gz
8/12/2018 10:54:15 AM | dbnupperbound | Started download of layer_a3c77c9bff4b76ebb10c03fdb4412507c07e2229429c322c7e2c505c4391602f.tar.manual.gz
8/12/2018 10:54:16 AM | dbnupperbound | Finished download of layer_a3c77c9bff4b76ebb10c03fdb4412507c07e2229429c322c7e2c505c4391602f.tar.manual.gz
8/12/2018 10:54:16 AM | dbnupperbound | Started download of layer_870d3f402f75021060706f2cbe98030c77db022fd0435e78ba32c9318ce9001c.tar.manual.gz
8/12/2018 10:54:17 AM | dbnupperbound | Finished download of layer_870d3f402f75021060706f2cbe98030c77db022fd0435e78ba32c9318ce9001c.tar.manual.gz
8/12/2018 10:54:17 AM | dbnupperbound | Started download of layer_6a79bf36f57324bd393870cb8e38b5b76814a079f04d31327c131959d17bcbb1.tar.manual.gz
8/12/2018 10:54:18 AM | dbnupperbound | Finished download of layer_6a79bf36f57324bd393870cb8e38b5b76814a079f04d31327c131959d17bcbb1.tar.manual.gz
8/12/2018 10:54:18 AM | dbnupperbound | Started download of layer_b40fae1c32824b69017ffeb2ddb45afaeddec0ef5e0f5b51cc8ecffa1d8c54aa.tar.manual.gz
8/12/2018 10:54:19 AM | dbnupperbound | Finished download of layer_b40fae1c32824b69017ffeb2ddb45afaeddec0ef5e0f5b51cc8ecffa1d8c54aa.tar.manual.gz
8/12/2018 10:54:19 AM | dbnupperbound | Started download of layer_9909df9a616a8748a1bdfc03c1b313b72e53301775b8debd2c6286ffda4be1b8.tar.manual.gz
8/12/2018 10:54:20 AM | dbnupperbound | Finished download of layer_9909df9a616a8748a1bdfc03c1b313b72e53301775b8debd2c6286ffda4be1b8.tar.manual.gz
8/12/2018 10:54:20 AM | dbnupperbound | Started download of layer_8dbde8905f7ed067e8e2995167ccfbc558bf7e9ce77df8e9ce36aab2615f4007.tar.manual.gz
8/12/2018 10:54:43 AM | dbnupperbound | Finished download of layer_b4aaa2a9b1c803b2d5e2be58eb77fc258749f3e67bfd9b06677e0c4ec551c580.tar.manual.gz
8/12/2018 10:54:43 AM | dbnupperbound | Started download of layer_2af2c0deefccd67415d414df5f933d6f0d852637908a9d19c894c1dc26b0ef09.tar.manual.gz
8/12/2018 10:54:44 AM | dbnupperbound | Finished download of layer_2af2c0deefccd67415d414df5f933d6f0d852637908a9d19c894c1dc26b0ef09.tar.manual.gz
8/12/2018 10:54:44 AM | dbnupperbound | Started download of layer_01217b14ec2f229cd6c54e5656e5e002189d08d6252353547bd402ceeba727c8.tar.manual.gz
8/12/2018 10:55:18 AM | dbnupperbound | Finished download of layer_01217b14ec2f229cd6c54e5656e5e002189d08d6252353547bd402ceeba727c8.tar.manual.gz
8/12/2018 10:55:18 AM | dbnupperbound | Started download of layer_34321bb08c5f29ab0ac24fb1770e8fc2f5bc56e1a4d93816814b535b206ffdfa.tar.manual.gz
8/12/2018 10:55:22 AM | dbnupperbound | Finished download of layer_8dbde8905f7ed067e8e2995167ccfbc558bf7e9ce77df8e9ce36aab2615f4007.tar.manual.gz
8/12/2018 10:55:22 AM | dbnupperbound | Finished download of layer_34321bb08c5f29ab0ac24fb1770e8fc2f5bc56e1a4d93816814b535b206ffdfa.tar.manual.gz
8/12/2018 10:55:22 AM | dbnupperbound | Started download of image_04d415dc1215f3d7814c90573df9d0f77aaeec1624b6e2087ffeb8bd5877a2ad.tar.manual.gz
8/12/2018 10:55:23 AM | dbnupperbound | Finished download of image_04d415dc1215f3d7814c90573df9d0f77aaeec1624b6e2087ffeb8bd5877a2ad.tar.manual.gz
8/12/2018 10:56:17 AM | dbnupperbound | Started download of vboxwrapper_26200_windows_x86_64.exe
8/12/2018 10:56:39 AM | dbnupperbound | Temporarily failed download of vboxwrapper_26200_windows_x86_64.exe: connect() failed
8/12/2018 10:56:39 AM | dbnupperbound | Backing off 00:05:53 on download of vboxwrapper_26200_windows_x86_64.exe
8/12/2018 10:56:40 AM |  | Project communication failed: attempting access to reference site
8/12/2018 10:56:42 AM |  | Internet access OK - project servers may be temporarily down.
8/12/2018 10:57:01 AM | dbnupperbound | Started download of vm_isocontext_v1.0.0.iso
8/12/2018 10:57:24 AM | dbnupperbound | Temporarily failed download of vm_isocontext_v1.0.0.iso: connect() failed
8/12/2018 10:57:24 AM | dbnupperbound | Backing off 00:07:19 on download of vm_isocontext_v1.0.0.iso
8/12/2018 10:57:25 AM |  | Project communication failed: attempting access to reference site
8/12/2018 10:57:26 AM |  | Internet access OK - project servers may be temporarily down.
8/12/2018 11:02:33 AM | dbnupperbound | Started download of vboxwrapper_26200_windows_x86_64.exe
8/12/2018 11:02:55 AM |  | Project communication failed: attempting access to reference site
8/12/2018 11:02:55 AM | dbnupperbound | Temporarily failed download of vboxwrapper_26200_windows_x86_64.exe: connect() failed
8/12/2018 11:02:55 AM | dbnupperbound | Backing off 00:12:18 on download of vboxwrapper_26200_windows_x86_64.exe
8/12/2018 11:02:57 AM |  | Internet access OK - project servers may be temporarily down.
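
To see at a glance which files keep failing, the event log can be summarised. This is a minimal sketch, assuming the log has been saved to a text file in the `date | project | message` layout shown above; the function name `failed_downloads` is made up for illustration.

```shell
# Count "Temporarily failed download" entries per file in a saved BOINC
# client event log. Assumes the "date | project | message" layout above.
failed_downloads() {
  # $1: path to a text file containing the event log
  awk -F' [|] ' '
    $3 ~ /^Temporarily failed download of / {
      msg = $3
      sub(/^Temporarily failed download of /, "", msg)  # keep "file: reason"
      split(msg, parts, ": ")                           # parts[1] is the file name
      count[parts[1]]++
    }
    END { for (f in count) printf "%d %s\n", count[f], f }
  ' "$1" | sort -k2
}

# Usage (event_log.txt is a placeholder file name):
#   failed_downloads event_log.txt
```

Each output line is a failure count followed by the file name, so files that never fail (such as the layer files here) simply do not appear.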

I tried to check the Apache logs on the server, but they point to /dev/stdout and /dev/stderr, so I haven't been able to read them there.
When I check the logs from the host environment instead, e.g. with docker logs dbnupperbound_apache_1, I get several reap messages at the end:

...
...
2018-08-12 05:00:04,014 CRIT reaped unknown pid 1348)
2018-08-12 05:00:04,112 CRIT reaped unknown pid 1339)
2018-08-12 05:00:04,113 CRIT reaped unknown pid 1341)
2018-08-12 05:00:04,113 CRIT reaped unknown pid 1343)
2018-08-12 05:00:04,114 CRIT reaped unknown pid 1345)
2018-08-12 05:00:04,114 CRIT reaped unknown pid 1347)
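
Because the Apache logs go to stdout/stderr, `docker logs` is where they end up, mixed in with the reap messages (which look like supervisord process-reaping notices; that reading is an assumption). A small filter, sketched here under the hypothetical name `drop_reap_noise`, makes the remaining lines easier to read:

```shell
# Drop "reaped unknown pid" lines from container log output read on stdin.
drop_reap_noise() {
  # grep exits non-zero when nothing is left; "|| true" keeps pipelines alive
  grep -v 'reaped unknown pid' || true
}

# Usage (container name taken from the report above):
#   docker logs dbnupperbound_apache_1 2>&1 | drop_reap_noise | tail -n 50
```

If nothing about the vboxwrapper or vm_isocontext requests shows up even after filtering, that would be consistent with the client never reaching this server for those two files, which matches the `connect() failed` entries in the client log.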

Given that most task files are getting downloaded, it seems I am missing something obvious, but I haven't been able to figure it out. It would be great if you could point me in the right direction.

EDIT:
There is also an approach which had worked in the past but which I haven't been able to replicate this time. I tried a hard reset by first backing up the mysql database, then running docker-compose down -v, then the normal up -d --build command, and then restoring the mysql database, followed by a project detach/attach on the client side. All the user and workunit information is retained, and the Transfers tab again shows the layer files getting downloaded, but the two v* files stay at 0%.

EDIT2:
On a replica VM, I followed all the steps except importing the mysql backup, and it worked! Of course, this is not a feasible option once a project has run for some amount of time, so I have been looking for a series of steps to use in a crisis scenario before scaling things up.
On the main server, there have been some abandoned tasks over the past 2 days, and some tasks which are still in progress but pending the v* file transfer. Is it possible these are creating some kind of bottleneck?

@km-git-acc km-git-acc changed the title from "Issues with downloading vboxwrapper* and vm_isocontext files" to "Issues with downloading vboxwrapper* and vm_isocontext* files" Aug 12, 2018
@km-git-acc

For now I have partially solved the problem as follows.
Within the mysql backup, I deleted all the .ibd files except those of the user, host, forum and similar 'user data' tables. So no workunit or result information is retained in the modified backup (the historical amount of work done, as reflected in the credits, is stored in the user table itself, so that is retained). Also, results are saved outside the server from time to time anyway.

Since everything works on a fresh start, the overall flow if things freeze is to 1) create a mysql backup, 2) issue a down -v command, 3) issue an up -d --build command, 4) issue a down command, 5) restore the modified backup, 6) issue an up -d command.
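
The six steps can be sketched as a script. This is only an outline under stated assumptions: `run`, `backup_db`, `restore_db` and `reset_project` are hypothetical names, and the two database hooks are deliberately left as placeholders because the thread does not say how the backup is taken, pruned, or restored. With `DRY_RUN=1` (the default) the docker-compose commands are printed rather than executed.

```shell
# Print a command in dry-run mode (default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Placeholder hooks: replace with your own backup/restore commands.
# Outside dry-run mode they fail on purpose, so that "down -v" is never
# reached without a real backup in place.
backup_db() {
  echo "step 1: back up the mysql data (hook not implemented)"
  [ "${DRY_RUN:-1}" = "1" ]
}
restore_db() {
  echo "step 5: restore the modified backup (hook not implemented)"
  [ "${DRY_RUN:-1}" = "1" ]
}

reset_project() {
  backup_db || return 1                  # 1) create a mysql backup
  run docker-compose down -v             # 2) remove containers AND volumes
  run docker-compose up -d --build       # 3) fresh build and start
  run docker-compose down                # 4) stop again, volumes are kept
  restore_db || return 1                 # 5) restore the modified backup
  run docker-compose up -d               # 6) start with the restored data
}

# Usage: export URL_BASE and PROJECT as in the original command, then
#   DRY_RUN=1 reset_project    # review the plan first
```

Note that `down -v` deletes the project's volumes, which is why the sketch refuses to run for real until the backup hook has been filled in.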

Also, it's possible the original problem will occur only rarely. On revisiting everything that was done, I realized I had first started the BOINC server with a numeric IP; then, while some tasks were running on the clients, I purchased a domain name and linked it to a new IP, and restarted the BOINC server using the domain name. This may have affected some of the result and workunit tables, though I'm not really sure about that.
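
One way to probe the IP-versus-domain suspicion is to check which host the download URLs for the stuck files actually point to. The sketch below assumes you have already collected those URLs, one per line (where they are stored is not covered in this thread), and simply flags any whose host differs from the expected one. `stale_urls` is a hypothetical helper name.

```shell
# Print every URL on stdin whose host differs from the expected one.
# $1: expected host, e.g. anthgrid.com (from the URL_BASE used above)
stale_urls() {
  awk -v want="$1" '
    NF {
      host = $0
      sub(/^[a-zA-Z]+:\/\//, "", host)   # strip the scheme
      sub(/[\/:].*$/, "", host)          # strip port and path
      if (host != want) print $0
    }'
}

# Usage (urls.txt is a placeholder file name):
#   stale_urls anthgrid.com < urls.txt
```

Any URL printed would still point at a different address, which would be consistent with a `connect() failed` error on the client; an empty result would argue against this explanation.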


km-git-acc commented Aug 19, 2018

@marius311
By the way, is there a Linux disk image available on which the BOINC client and VirtualBox work well together?
I tried different OS and VirtualBox combinations on the client side, but the BOINC server sent no tasks (unlike on Windows/Mac, where the tasks download and execute without any issues). E.g., with a test account at Cosmology@home, the server sent some camb_legacy tasks but no camb_boinc2docker tasks. When I edited the preferences to receive only boinc2docker tasks, the client received no tasks of that type.
I believe a disk image would get around these issues. The interest in Linux images is that such machines are quite cheap in the cloud and hence allow a cost-effective scale-up.
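
One possible reason for a Linux host receiving no VirtualBox-based tasks, not confirmed anywhere in this thread, is that the machine does not expose hardware virtualization, which is common on cloud VMs. A quick check, sketched with the hypothetical helper `has_hw_virt`, looks for the Intel `vmx` or AMD `svm` CPU flag:

```shell
# Succeed if the CPU flags advertise hardware virtualization
# (Intel "vmx" or AMD "svm").
# $1: cpuinfo file to read; defaults to /proc/cpuinfo (Linux only).
has_hw_virt() {
  grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Usage:
#   if has_hw_virt; then echo "vmx/svm flag present"; else echo "no vmx/svm flag"; fi
#   command -v VBoxManage >/dev/null && VBoxManage --version
```

If the flag is missing, or `VBoxManage` is not on the PATH, changing the OS image alone is unlikely to help; the instance type or the VirtualBox installation would need attention first.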


cminnoy commented Feb 4, 2021

Experiencing the same issue. Download stuck on vboxwrapper_26198_x86_64-pc-linux-gnu and vm_isocontext_v1.0.0.iso.
