Replies: 16 comments
-
Ideally the CI server that does the auditing should run on an infrastructure that is independent of the infrastructure that builds the image. For instance we could use bitbucket + circle ci for the auditing CI service while we keep using github + travis ci to build the images. |
Beta Was this translation helpful? Give feedback.
-
A few thoughts: I'm not seeing how this hashing idea would work -- we don't have deterministic builds, so two identical builds of the same docker image will generally have different hashes for most binaries, due to things like embedded timestamps. (Deterministic builds are really hard -- ref 1, ref 2. Not really worth trying given the archaic toolchains we're stuck with, IMO.) Maybe I'm not understanding exactly what's being proposed? I'm also not sure what we should worry about exactly (or in security jargon, "what's our threat model"). Empirically, compromises of software distribution sites are very rare (not sure why), and practically speaking it doesn't make sense for us to worry about being more secure than, say, pip or pypi. We definitely should have the conversation about security, because it's a bit of a kick to realize how large the exposure is in things like this, but I think it's better to start by thinking about what specifically are the worst risks and what kind of practical things we can do to mitigate them. Here's a quick-and-dirty attempt to enumerate our trusted base (i.e., list of things where compromising them would let someone trojan all manylinux binaries):
A lot of this is stuff for me falls into "not worth worrying about". All else being equal it'd be nice if this list were shorter, but realistically if someone compromises github or quay.io or dockerhub or travis-ci or python.org or pip or pypi or the global certificate authority infrastructure, then really the manylinux docker images are the least of our concerns. The things that jump out at me as perhaps worth worrying about are:
Regarding github: enabling 2FA is probably a good idea (I just did :-)), but hardly sufficient -- I know for me, if someone got access to my laptop or phone then they could cause all kinds of havoc with my logged-in browsers and ssh keys. In particular they could silently push changes directly to the Regarding quay.io: it turned out I had a stray credential stored in /root/.docker/config.json, which I deleted... but in general this is rather annoying -- they don't even offer 2FA. Fortunately, unlike github, I basically never actually need to log into the site now that things are set up, so I guess I'll make sure that the only copies of the password are stored securely (e.g. not in my browser password store), and also disable their github-based login mechanism, and then make sure that I stay logged out on my browser... ugh. Maybe we should put up a little wiki page or a note in the README about this? (notes on securing accounts that get access, notes on reviewing changes for their effect on the trusted base) Trying to think of folks to CC who have security background and are interested in the manylinux stuff... maybe @dstufft @alex? |
Beta Was this translation helpful? Give feedback.
-
Actually, I missed a piece in the list above: it looks like there's a bit of a mess around the CentOS version of the devtools 2 release. Apparently the toolchain that everyone's using to build generic linux binaries for distribution (not just us, but also the super-popular holy build box, and probably others as well) is a bunch of unsigned RPMs fetched over insecure-http from someone's personal account at (Notice in the readme: "Known issues: (0) unsigned packages.") AFAICT though this is currently the only available version of this toolchain that doesn't require a RH subscription. This is kinda suboptimal from an internet public health standpoint. Maybe someone at Redhat can/should take an interest? @ncoghlan might know who to ping? |
Beta Was this translation helpful? Give feedback.
-
Ouch - I'd forgotten that one of the downsides of using CentOS 5 as the baseline was not being able to use the softwarecollections.org infrastructure (since that only supports CentOS 6+). @lhawthorn, @kbsingh, any ideas? Context is https://www.python.org/dev/peps/pep-0513/ which relies on CentOS 5 and Developer Toolset 2 as a "lowest common denominator" build environment for cross-distro Linux binaries. |
Beta Was this translation helpful? Give feedback.
-
A possible alternative approach would be to use CERN's devtoolset 2 binaries for Scientific Linux rather than the people.centos.org ones: http://linuxsoft.cern.ch/cern/devtoolset/slc5-devtoolset.repo |
Beta Was this translation helpful? Give feedback.
-
More info on CERN's setup: http://linux.web.cern.ch/linux/devtoolset/#install |
Beta Was this translation helpful? Give feedback.
-
Tru's stack should be the best devtools-2 setup for now, I can work with him to make sure its revalidated and put onto the mirror/cdn instead. However, its worth keeping in mind that EL5 overall is now well into its wind-down days and we are working with folks still running it to move off ( EOL date is Q1 2017 ). w.r.t the SLC devtools-2, that was also only ever a test release, never meant to go into prod for any role, and was never maintained. Within those 2 limitations, if you feel its still a route worth adopting, I'll work with Tru and get the devtools-2 stack in a better home, revalidated and signed. |
Beta Was this translation helpful? Give feedback.
-
@ncoghlan: Interesting, I failed to find that. Looking at http://linuxsoft.cern.ch/cern/devtoolset/slc5-devtoolset.repo , it looks like they probably do provide signed packages, so if we can figure out how to use them + load the relevant key + make sure that rpm is configured to reject unsigned packages, then it would close the main threat vectors. Unfortunately I am a Debian guy and have no idea how to do that :-) |
Beta Was this translation helpful? Give feedback.
-
@kbsingh: I'm hoping manylinux2 will be able to use CentOS 6 + devtoolset-3 from softwarecollections.org as a cross-distro binary baseline, but at the moment EL5 is still too widespread in academia to ignore (plus it's the established baseline that folks like Enthought, Continuum Analytics and Phusion have demonstrated works well in practice) |
Beta Was this translation helpful? Give feedback.
-
@kbsingh: oh, thanks for the update. never mind about the SLC devtools then :-) I know EL5 is running down, but unfortunately it's still the de facto baseline that everyone seems to be using for "I need to build a binary that will run on ~all systems". (Fortunately this doesn't require actually using it for anything besides running So, if there's a reasonable way to make the devtools-2 available in a more robust way, that would be much appreciated. (Worst case it would probably be fine to just provide a tarball that could be dumped into /opt/rh somewhere along with its hash... we all know that there will be no more devtools-2 releases :-)) CC'ing @FooBarWidget too, since the HBB probably would probably also benefit from having a secure source for devtoolset-2, and they might have comments. |
Beta Was this translation helpful? Give feedback.
-
Thanks for CC'ing me. Yeah having devtools-2 available in a more robust/secure way would be great. I don't mind that CentOS 5 is being deprecated as long as existing stuff keeps working in the future. |
Beta Was this translation helpful? Give feedback.
-
Hi, I published the devtools for CentOS-5 because I am using it. At that time, there was little/none feedback/interest from the community, and CentOS-6 was getting most of the traction. As @kbsingh said, we can definitely work out a solution. |
Beta Was this translation helpful? Give feedback.
-
@truatpasteurdotfr: they're certainly very much appreciated! The python wheels built with this repo have already been downloaded ~120,000 times, and that number will probably go up by an order of magnitude within the next month or two as packages like numpy and scipy start publishing builds. I also know that both Continuum and Enthought have been using them for parts of their Python distributions, plus there are lots of folks using HBB for I have no idea what. So there's definitely lots of interest, it turns out -- I guess just it took a while :-). So thank you! |
Beta Was this translation helpful? Give feedback.
-
@njsmith about reproducible builds, it's not as bad as one might have thought: The hashes for a locally built patchelf (and gcc from devtools):
match with the image built by our CI:
But the global hashes for the docker images themselves do not match:
probably build timestamps. https://reproducible-builds.org/ is a very interesting resource though. In particular the tools they provide might be useful if we want to guarantee reproducible builds for manylinux1 images: |
Beta Was this translation helpful? Give feedback.
-
The hashes for the python binaries for instance do not match. |
Beta Was this translation helpful? Give feedback.
-
If/when you switch to CentOS 6, seriously consider just using devtoolset-4 if it is an option. It has a very recent C++ compiler with full C++14 support. Just something to thing about. |
Beta Was this translation helpful? Give feedback.
-
I am opening this issue to keep track of the open questions raised in the discussion at #44 (comment).
An attacker might find ways to silently install a rootkit in the binaries (especially the gcc and patchelf commands) of our quay.io hosted docker images. The attack could happen on quay.io, on github.com, on the travis build machine or on one of the third party resources we fetch software from in our build scripts (centos repositories, patchelf source repository and others). At the moment we have no easy way to detect such attacks.
One way we could at least detect that something is wrong would be to compute the sha256sum of all the binaries of our docker images and store that list of hashes offline and maybe a also hash the hash list could be pushed to an independent append-only time-stamped public log (for instance a dedicated twitter account).
We should also probably setup some automated CI bot to periodically recompute the sha256sum list of all the files in the public quay.io hosted images and compare them to the matching entry of the append-only time-stamped public log.
Beta Was this translation helpful? Give feedback.
All reactions