Bacalhau project report 20220321

lukemarsden edited this page Mar 21, 2022 · 12 revisions

Mostly a tech debt and planning week this week.

Tech debt – test flakes

Dave's logging PR got us into test flakiness hell (not Dave's fault!). The cause was an obscure Golang deadlock: grandchild processes were not closing their pipes, and it only affected our Docker runtime under go test. The solution was this obscure fix. 😱 Golang is a really good language, but dealing with subprocesses and concurrency reliably seems to be hard everywhere.

Tracking this down burned a lot of cycles: we suspected several different factors were causing the flakiness, so we had to test lots of different combinations.

We also discovered some exciting flakiness due to the way IPFS aggressively peers with other IPFS nodes on the network. Failed tests were leaving behind Docker containers that were still running IPFS. Those stale IPFS nodes then peered with the IPFS nodes in new test Docker containers. This occasionally caused IPFS to hang, as the old IPFS nodes thought a CID was hosted on a now-gone peer. We also saw IPFS peering between containers on my laptop and containers on Kai's laptop when we were on the same WLAN! We fixed this simply by disabling MDNS peer discovery for the devstack IPFS instances.

Phew. Tests are reliably green now.

Getting Bacalhau to production!

Kai and I spent the rest of the time this week thinking big picture, and in detail, about how to get Bacalhau to production in the most rapid, reasonable and secure way possible.

We have now published this plan for how to get a sensibly scoped Bacalhau into production, targeting October. We'll go over this plan and the design that it entails in detail in Paris. Please leave comments on the doc!

The document is now with the Powers That Be (TM) for approval.
