- Setup
- Code quality
- Release management
- APIs
- Switching device mode
- Accessing system directories in a container
- Repository elements which are generated or created separately
- Edit, build, and deploy the Read the Docs site
- Use `make build-images` to produce Docker container images.
- Use `make push-images` to push Docker container images to a Docker image registry. The default is to push to a local Docker registry. Some other registry can be configured by setting the variables described in the test-config.sh file, see the configuration options section below. Alternatively, the registry can also be set with a make variable: `make push-images REGISTRY_NAME=my-registry:5000`

See the Makefile for additional make targets and possible make variables.
The source code gets developed and tested using the version of Go that is set with `GO_VERSION` in the Dockerfile. Some other version may or may not work. In particular, `test_fmt` and `test_vendor` are known to be sensitive to the version of Go.
The normal Go style guide applies. It is enforced by `make test`, which calls `gofmt`.
In most cases, input comes from a trusted source because network communication is protected by mutual TLS and the `kubectl` binary runs with the same privileges as the user invoking it.
Nonetheless, input needs to be validated to catch mistakes:
- detect incorrect parameters for `kubectl`
- ensure that messages passed to gRPC API implementations in registry, controller and driver have all required fields
- the gRPC implementation rejects incoming messages that are too large (https://godoc.org/google.golang.org/grpc#MaxRecvMsgSize) and refuses to send messages that are larger (https://godoc.org/google.golang.org/grpc#MaxSendMsgSize)
- webhook and metrics SDK code does input validation before invoking PMEM-CSI
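The required-field check from the list above could look roughly like the following. This is only a sketch: `RegisterRequest`, its fields, and `validate` are hypothetical names chosen for illustration and are not the actual PMEM-CSI message types.

```go
package main

import (
	"errors"
	"fmt"
)

// RegisterRequest is a hypothetical message used only for illustration;
// it is not the actual PMEM-CSI registry message type.
type RegisterRequest struct {
	NodeID   string
	Endpoint string
}

// validate returns an error naming the first required field that is missing.
func validate(req *RegisterRequest) error {
	if req == nil {
		return errors.New("missing request")
	}
	if req.NodeID == "" {
		return errors.New("missing node id")
	}
	if req.Endpoint == "" {
		return errors.New("missing endpoint")
	}
	return nil
}

func main() {
	// An incomplete request is rejected before any work is done.
	err := validate(&RegisterRequest{NodeID: "node-1"})
	fmt.Println(err)
}
```

Running the check at the start of each handler keeps mistakes from trusted callers visible as clear errors instead of failures deeper in the call chain.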
The `master` branch is the main branch. It is guaranteed to have passed full CI testing. However, the Dockerfile uses whatever is the latest upstream content for the base distribution and therefore test results are not perfectly reproducible.
The `devel` branch contains additional commits on top of `master` which might not have been tested in that combination yet. Therefore it may be a bit less stable than `master`. The `master` branch gets advanced via a fast-forward merge after successful testing by the CI job that rebuilds and tests the `devel` branch.
Code changes are made via pull requests against `devel`. Each of them will get tested separately by the CI system before merging, but only a subset of the tests can be run due to time constraints.
Beware that after merging one PR, the existing pre-merge test results for other PRs become stale because they were based on the old `devel` branch. Because `devel` is allowed to be less stable than `master`, it is okay to merge two PRs quickly after one another without retesting. If two merged PRs have no code conflicts (which would get detected by GitHub) but nonetheless don't work together, the combined testing in the `devel` branch will find that. This will block updating `master` and thus needs to be dealt with quickly.
Releases are created by branching `release-x.y` from `master` or some older, stable revision. The actual `vx.y.z` release tags are set on revisions in the corresponding `release-x.y` branch.
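The naming relation between a `vx.y.z` tag and its `release-x.y` branch can be expressed as a small helper. This is a sketch only: `releaseBranch` is a hypothetical function, the tag `v0.7.1` is an illustrative value, and the assumption that all three version parts are purely numeric is not stated in this document.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern matches release tags of the form vX.Y.Z with numeric parts
// (an assumption made for this sketch).
var tagPattern = regexp.MustCompile(`^v(\d+)\.(\d+)\.(\d+)$`)

// releaseBranch returns the release-x.y branch that a vx.y.z tag belongs to.
func releaseBranch(tag string) (string, error) {
	m := tagPattern.FindStringSubmatch(tag)
	if m == nil {
		return "", fmt.Errorf("not a release tag: %q", tag)
	}
	return "release-" + m[1] + "." + m[2], nil
}

func main() {
	branch, err := releaseBranch("v0.7.1")
	fmt.Println(branch, err)
}
```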
Releases and the corresponding images are never changed. If something goes wrong after setting a tag (like detecting a bug while testing the release images), a new release is created.
Container images reference a fixed base image. To ensure that the base image remains secure, it gets scanned for known vulnerabilities regularly and a new release is prepared manually if needed. The new release then uses a newer base image.
The `devel` and `master` branches build and use the `canary` version of the PMEM-CSI driver images. Before tagging a release, all of those version strings need to be replaced by the upcoming version. All tagged releases then use the image that corresponds to that release.
The `hack/set-version.sh` script can be used to set these versions. The modified files then need to be committed. Merging such a commit triggers a rebuild of the `devel` branch, but does not yet produce a release: the actual image only gets pushed when there is a tag that corresponds to the version embedded in the source code. The Jenkinsfile ensures that.
- Create a new `release-x.y` branch.
- Run `hack/set-version.sh vx.y.z` and commit the modified files.
- Push to `origin`.
- Create a draft release for that new branch, including a change log gathered from new commits.
- Review the change log.
- Tag `vx.y.z` manually and push to `origin`.
- Wait for a successful CI build for that tag and promotion of the resulting images to Docker Hub.
- Publish the GitHub release.
Follow the steps below to publish a new operator release to OperatorHub:
- Generate the OLM catalog for the new release:
$ make operator-generate-catalog VERSION=<X.Y.Z> #semantic version number
Running the above command generates the OLM catalog files under deploy/olm-catalog/<X.Y.Z>.
- Clone the operator-framework/community-operators repository:
$ git clone https://github.com/operator-framework/community-operators.git
- Copy the generated catalog files, commit the changes, and submit a pull request to the community-operators repository:
$ cp -r <PMEM-CSI_ROOT>/deploy/olm-catalog/ <COMMUNITY-OPERATORS_ROOT>/upstream-community-operators/pmem-csi-operator/
$ cd <COMMUNITY-OPERATORS_ROOT>
$ git add upstream-community-operators/pmem-csi-operator/
$ git commit -s -m "Updating PMEM-CSI Operator to version <X.Y.Z>"
The Kubernetes CSI API is exposed over a Unix domain socket. CSI operations are executed as gRPC calls. Input data is allowed as permitted by the CSI specification. Output data is formatted as a gRPC response.
The following CSI operations are supported, with arguments as specified by the CSI specification: CreateVolume, DeleteVolume, StageVolume, UnstageVolume, PublishVolume, UnpublishVolume, ListVolumes, GetCapacity, GetCapabilities, GetPluginInfo, GetPluginCapabilities.
Network ports are opened as configured in manifest files:
- registry endpoint: typical port value 10000, used for PMEM-CSI internal communication
- controller endpoint: typical port value 10001, used by the nodes to serve their CSI API to the PMEM-CSI controller
- metrics endpoint: typical port values 10010 (PMEM-CSI) and 10011 (external-provisioner)
- webhook endpoint: disabled by default, port chosen when enabling the scheduler extensions
Except for the metrics and webhook endpoints, all ports are protected via mutual TLS. The metrics and webhook endpoints are supposed to be easily usable and expose no confidential data, therefore TLS is not used for them.
The Kubernetes CSI API is used over local sockets on the same host:
- unix:///var/lib/kubelet/plugins/pmem-csi-reg.sock
- unix:///var/lib/kubelet/plugins/pmem-csi/csi.sock
- unix:///var/lib/kubelet/plugins/pmem-csi/csi-controller.sock
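Endpoints in the `unix://` form shown above have to be split into a network type and an address before they can be handed to `net.Listen` or `net.Dial`. The following is a minimal sketch of such a split; `parseEndpoint` is a hypothetical helper, and whether the driver parses endpoints exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEndpoint splits an endpoint such as unix:///path/to/socket or
// tcp://host:port into the network and address used by net.Listen/net.Dial.
func parseEndpoint(ep string) (network, address string, err error) {
	parts := strings.SplitN(ep, "://", 2)
	if len(parts) != 2 || parts[1] == "" {
		return "", "", fmt.Errorf("invalid endpoint: %q", ep)
	}
	switch parts[0] {
	case "unix", "tcp":
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("unsupported protocol: %q", parts[0])
	}
}

func main() {
	network, address, err := parseEndpoint("unix:///var/lib/kubelet/plugins/pmem-csi/csi.sock")
	fmt.Println(network, address, err)
}
```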
argument name | meaning | type | range |
---|---|---|---|
-alsologtostderr | log to standard error as well as files | ||
-log_backtrace_at value | when logging hits line file:N, emit a stack trace | ||
-log_dir string | If non-empty, write log files in this directory | string | |
-log_file string | If non-empty, use this log file | string | |
-logtostderr | log to standard error instead of files | ||
-skip_headers | avoid header prefixes in the log messages | ||
-stderrthreshold value | logs at or above this threshold go to stderr (default 2) | ||
-v value | log level for V logs | int | |
-vmodule value | comma-separated list of pattern=N settings for file-filtered logging | string | |
-caFile string | Root CA certificate file to use for verifying connections | string | |
-certFile string | SSL certificate file to use for authenticating client connections(RegistryServer/NodeControllerServer) | string | |
-clientCertFile string | Client SSL certificate file to use for authenticating peer connections | string | |
-clientKeyFile string | Client private key associated to client certificate | string | |
-controllerEndpoint string | internal node controller endpoint | string | |
-deviceManager string | device mode to use. ndctl selects mode which is described as direct mode in documentation. | string | lvm or ndctl |
-drivername string | name of the driver | string | |
-endpoint string | PMEM CSI endpoint | string | |
-keyFile string | Private key file associated to certificate | string | |
-mode string | driver run mode | string | controller, node |
-nodeid string | node id | string | |
-registryEndpoint string | endpoint to connect/listen registry server | string | |
-statePath | Directory path where to persist the state of the driver running on a node | string | absolute directory path on node |
-schedulerListen | listen address for scheduler extender and mutating webhook | address string | controller |
-pmemPercentage value | represents the percentage of space to be used by the driver in each PMEM region (currently only supported by the driver in LVM mode) | int | 0-100 |
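The ranges in the table can be enforced right after flag parsing. The sketch below uses the standard `flag` package for a small subset of the options; the `config` struct, the `parseArgs` helper, and all default values are assumptions made for illustration and do not necessarily match the driver.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds a small, illustrative subset of the options in the table.
type config struct {
	deviceManager  string
	mode           string
	pmemPercentage int
}

// parseArgs parses args with a private FlagSet and checks the documented
// ranges. The defaults used here are assumptions, not the driver's defaults.
func parseArgs(args []string) (*config, error) {
	cfg := &config{}
	fs := flag.NewFlagSet("pmem-csi-driver", flag.ContinueOnError)
	fs.StringVar(&cfg.deviceManager, "deviceManager", "lvm", "device mode to use: lvm or ndctl")
	fs.StringVar(&cfg.mode, "mode", "controller", "driver run mode: controller or node")
	fs.IntVar(&cfg.pmemPercentage, "pmemPercentage", 100, "percentage of each PMEM region to use")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	switch cfg.deviceManager {
	case "lvm", "ndctl":
	default:
		return nil, fmt.Errorf("invalid -deviceManager %q: must be lvm or ndctl", cfg.deviceManager)
	}
	switch cfg.mode {
	case "controller", "node":
	default:
		return nil, fmt.Errorf("invalid -mode %q: must be controller or node", cfg.mode)
	}
	if cfg.pmemPercentage < 0 || cfg.pmemPercentage > 100 {
		return nil, fmt.Errorf("invalid -pmemPercentage %d: must be in 0-100", cfg.pmemPercentage)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs([]string{"-deviceManager=ndctl", "-mode=node", "-pmemPercentage=50"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.deviceManager, cfg.mode, cfg.pmemPercentage)
}
```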
`TEST_WORK` is used by the registry server unit-test code to specify the path to certificates in the test system. Note: THIS IS NOT USED IN PRODUCTION.
The klog.Info statements are used via the verbosity checker using the following levels:
- klog.V(3) - Generic information. Level 3 is the default Info log level in pmem-csi, and example deployment files set this level for production configuration.
- klog.V(4) - Elevated verbosity messages.
- klog.V(5) - Even more verbose messages, useful for debugging and issue resolving. This level is used in the deployment examples intended for testing.
There are also messages using klog.Warning, klog.Error and klog.Fatal, and their formatted counterparts.
If the device mode is switched between LVM and direct (also known as ndctl), keep in mind that the PMEM-CSI driver does not clean up or reclaim namespaces. Namespaces and other related context (LVM state) created in the previous mode therefore remain stored on the device and will most likely cause trouble in the other device mode.
When switching from LVM mode to direct mode:
- examine LV groups state on a node: `vgs`
- examine LV physical volumes state on a node: `pvs`
- delete LV groups before deleting namespaces to avoid orphaned volume groups: `vgremove VGNAME`

NOTE: The following WILL DELETE ALL NAMESPACES so be careful!
- Delete namespaces on a node using CLI: `ndctl destroy-namespace all --force`
When switching from direct mode to LVM mode, no special steps are needed to clean up namespaces state.
If the PMEM-CSI driver has been operating correctly, there should be no existing namespaces, because the CSI volume lifecycle should have deleted them at the end of life of each volume. If there are, you can either keep them (LVM device mode honors "foreign" namespaces and leaves them alone) if you have enough space, or you can delete them using `ndctl` on the node.
The PMEM-CSI driver runs as a container, but it needs access to the system directories /sys and /dev. Two related potential problems have been diagnosed so far.
In some deployment schemes /sys remains mounted read-only in the container running pmem-csi-driver. This creates a problem for the driver, which needs write access to /sys for namespace management operations. The code checks at start time that /sys is mounted read-write. The error `pmem-driver: Failed to run driver: FATAL: /sys mounted read-only, can not operate` in the pod log is the sign of this state.
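One way to detect a read-only /sys is to look at the mount options in a table in `/proc/mounts` format. The sketch below shows that idea on a sample string; `isReadOnly` is a hypothetical helper, and the driver's actual start-time check may be implemented differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// isReadOnly scans text in /proc/mounts format and reports whether the
// last entry for mountPoint carries the "ro" option.
func isReadOnly(mounts, mountPoint string) (bool, error) {
	found, ro := false, false
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		// Fields: device, mount point, filesystem type, options, dump, pass.
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 || fields[1] != mountPoint {
			continue
		}
		// Later entries shadow earlier ones, so reset for each match.
		found, ro = true, false
		for _, opt := range strings.Split(fields[3], ",") {
			if opt == "ro" {
				ro = true
			}
		}
	}
	if err := sc.Err(); err != nil {
		return false, err
	}
	if !found {
		return false, fmt.Errorf("%s not found in mount table", mountPoint)
	}
	return ro, nil
}

func main() {
	sample := "sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0\n"
	ro, err := isReadOnly(sample, "/sys")
	fmt.Println(ro, err)
}
```

In a real check the first argument would be the content of `/proc/mounts` read inside the container.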
The container runtime may not pass /dev from the host into the container. If the /dev of the host is not accessible in the PMEM-CSI container, accessing a newly created block device /dev/pmemX.Y fails because the device is not visible inside the container. The driver does not detect the root cause of that problem during start-up, but only when a volume creation has failed. This problem can be avoided by specifying an explicit mount of /dev in the PMEM-CSI manifest.
Here are creation and update notes for the elements in the repository which are not hand-edited.
Two diagrams are created with dia drawing program. The single source file has layers: {common, lvm, direct} so that two diagram variants can be produced from single source. Image files are produced by saving in PNG format with correct set of layers visible. The PNG files are committed as repository elements in docs/images/devicemodes/.
This diagram was created with the dia drawing program using source file.
Image file is produced by saving in PNG format. The PNG file is committed as a repository element.
Two diagrams are generated using plantuml program. Source files:
The PNG files are committed as repository elements in docs/images/sequence/.
pkg/pmem-registry/pmem-registry.pb.go is generated from pkg/pmem-registry/pmem-registry.proto
protoc comes from package protobuf-compiler on Ubuntu 18.04
- get protobuf for Go:
$ git clone https://github.com/golang/protobuf.git && cd protobuf
$ make # installs needed binary in $GOPATH/bin/protoc-gen-go
- generate by running in ~/go/src/github.com/intel/pmem-csi/pkg/pmem-registry:
$ protoc --plugin=protoc-gen-go=$GOPATH/bin/protoc-gen-go --go_out=plugins=grpc:./ pmem-registry.proto
Table of Contents can be generated using multiple methods.
- One possibility is to use pandoc
$ pandoc -s -t markdown_github --toc README.md -o /tmp/temp.md
Then check and hand-pick the generated TOC part(s) from /tmp/temp.md and insert them in the desired location. Note that pandoc is known to produce incorrect TOC entries if headers contain special characters, so TOC generation is more reliable if headers avoid non-letter-or-number characters.
- Another method is to use emacs command markdown-toc-generate-toc and manually check and edit the generated part: we do not show generated 3rd-level headings in README.md.
The PMEM-CSI documentation is available as in-repo READMEs and as a GitHub* hosted website. The website is created using the Sphinx documentation generator and the well-known Read the Docs theme.
Building the documentation requires Python 3.x and venv.
$ make vhtml
Sphinx uses reStructuredText (reST) as the primary document source type but can be extended to use Markdown by adding the `recommonmark` and `sphinx_markdown_tables` extensions (see conf.json).
Change the navigation tree or add documents by updating the `toctree`. The main `toctree` is in `index.rst`:
.. toctree::
   :maxdepth: 2

   README.md
   docs/design.md
   docs/install.md
   docs/DEVELOPMENT.md
   docs/autotest.md
   examples/readme.rst
   Project GitHub repository <https://github.com/intel/pmem-csi>
reST files, Markdown files, and URLs can be added to a `toctree`. The `:maxdepth:` argument dictates the number of header levels that will be displayed on that page. This website replaces the `index.html` output of this project with a redirect to `README.html` (the conversion of the top level README) to more closely match the in-repo documentation.
Any reST or Markdown file not referenced by a `toctree` will generate a warning in the build. This document has a `toctree` in:
index.rst
examples/readme.rst
Files or directories that are intentionally not referenced can be excluded in `conf.json`.
NOTE: Though GitHub can parse reST files, the `toctree` directive is Sphinx specific, so it is not understood by GitHub. `examples/readme.rst` is a good example. Adding the `:hidden:` argument to the `toctree` directive means that the `toctree` is not displayed in the Sphinx built version of the page.
This project has some custom capabilities added to the conf.py to fix or improve how Sphinx generates the HTML site.
- Markdown files: Converts references to Markdown files that include anchors.
  [configuration options](autotest.md#configuration-options)
- reST files: Fixes explicit links to Markdown files.
  `Google Cloud Engine <gce.md>`__
- Markdown files: Fixes references to reST files.
  [Application examples](examples/readme.rst)
- Markdown files: Fixes links to files and directories within the GitHub repo.
  [Makefile](/Makefile) [deploy/kustomize](/deploy/kustomize)
Links to files can be fixed in one of two ways, which can be set in the conf.py.
baseBranch = "devel"
useGitHubURL = True
commitSHA = getenv('GITHUB_SHA')
githubBaseURL = "https://github.com/intelkevinputnam/pmem-csi/"
If `useGitHubURL` is set to True, it will try to create links based on your `githubBaseURL` and the SHA for the commit to the GitHub repo (determined by the GitHub workflow on merge). If there is no SHA available, it will use the value of `baseBranch`.
If `useGitHubURL` is set to False, it will copy the files to the HTML output directory and provide links to that location.
NOTE: Links to files and directories should use absolute paths relative to the repo (see Makefile and deploy/kustomize above). This will work both for the Sphinx build and when viewing in the GitHub repo.
Links to directories are always converted to links to the GitHub repository.
The publish workflow is run each time a commit is made to the designated branch and pushes the rendered HTML to the gh-pages branch. Other rules can be created for other branches.
on:
  push:
    branches:
      - devel
NOTE: Create a secret called `ACCESS_TOKEN` in repo>settings>secrets with a token generated by a user with write privileges to enable the automated push to the gh-pages branch.