Hashicorp's Vault, mlock and LXD
October 14, 2018. 746 words.
When using the LXD OS container, whether for testing purposes or as a regular means of environment isolation, some requirements have to be met in non-standard ways.
Finding out how to satisfy the mlock requirement when deploying Hashicorp’s Vault turned out to be such a case: under-documented, barely hinted at, and difficult to find.
When dealing with systems that encrypt data, it is imperative that the keys used to en- and decrypt remain securely guarded. Processes using encryption usually do not store their keys on disk, where they are vulnerable to leakage. Instead, keys must either be entered at run-time, or the keys themselves are encrypted and require a secret to unlock. In advanced systems, such an unlocking or unsealing secret may be split into a number of shares, of which a minimal subset must be supplied. Shamir’s Secret Sharing (see “Shamir’s Secret Sharing” on Wikipedia: for any polynomial of degree k, at least k + 1 points must be known to reconstruct the polynomial) is one of the better-known algorithms of this class.
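To make the threshold idea concrete, here is a minimal sketch of Shamir’s Secret Sharing in C over the prime field GF(2^31 - 1). It is not Vault’s implementation; the modulus, the `share_t` type and the function names are assumptions chosen for illustration. The secret is the constant term of a polynomial with k coefficients (degree k - 1), each share is one point on that polynomial, and any k shares recover the secret by Lagrange interpolation at x = 0. The caller supplies the coefficients so the example stays deterministic; a real deployment would draw them from a cryptographically secure random source and would not use a field this small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed prime modulus: 2^31 - 1 keeps every product below 2^62,
 * so all arithmetic fits in uint64_t. Illustration only. */
#define P 2147483647ULL

typedef struct {
    uint64_t x; /* evaluation point, never 0 (x = 0 would reveal the secret) */
    uint64_t y; /* f(x) mod P */
} share_t;

/* Square-and-multiply modular exponentiation. */
static uint64_t mod_pow(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    base %= P;
    while (exp > 0) {
        if (exp & 1) result = result * base % P;
        base = base * base % P;
        exp >>= 1;
    }
    return result;
}

/* Inverse via Fermat's little theorem: a^(P-2) = a^-1 mod P for prime P, a != 0. */
static uint64_t mod_inv(uint64_t a) {
    return mod_pow(a, P - 2);
}

/* coeffs[0] is the secret; coeffs[1..k-1] must be random in real use.
 * Writes n shares, evaluated at x = 1..n; any k of them suffice. */
void shamir_split(const uint64_t *coeffs, size_t k, share_t *shares, size_t n) {
    assert(k >= 1 && n >= k && n < P);
    for (size_t i = 0; i < n; i++) {
        uint64_t x = (uint64_t)(i + 1);
        uint64_t y = 0;
        /* Horner's rule, from the highest coefficient down to the secret. */
        for (size_t j = k; j-- > 0;) {
            y = (y * x + coeffs[j] % P) % P;
        }
        shares[i].x = x;
        shares[i].y = y;
    }
}

/* Lagrange interpolation at x = 0 over k shares with distinct x values. */
uint64_t shamir_combine(const share_t *shares, size_t k) {
    uint64_t secret = 0;
    for (size_t i = 0; i < k; i++) {
        uint64_t num = 1, den = 1;
        for (size_t j = 0; j < k; j++) {
            if (j == i) continue;
            /* basis factor (0 - x_j) / (x_i - x_j), all mod P */
            num = num * ((P - shares[j].x) % P) % P;
            den = den * ((shares[i].x + P - shares[j].x) % P) % P;
        }
        uint64_t term = shares[i].y * num % P * mod_inv(den) % P;
        secret = (secret + term) % P;
    }
    return secret;
}
```

With k = 3 and five shares, any three shares reconstruct the secret; with random coefficients, two shares reveal nothing about it.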
The secret will then only ever be readable from memory and never be committed to disk in clear text. For running processes, the operating system usually ensures that no process can read another process’s memory (Meltdown and Spectre aside), which secures the confidentiality of the secret. When memory is committed to disk, as in the event of swapping, there is no such guarantee: at a time when the operating system is down, a third party may access the swap volume and extract memory contents, and with them the secret.
Unix kernels provide a system call, mlock(2) (Linux man page: mlock, munlock, mlockall, munlockall - lock and unlock memory), which, as Evan Klitzke takes great care to explain (Evan Klitzke: Misunderstanding mlock(2) and mlockall(2), 2015), is meant to prevent certain pages from being swapped to disk.
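A minimal sketch of how a process might keep a secret in locked memory, assuming Linux and a hypothetical `locked_secret_t` helper that is not part of any library. On Linux, `mlock` rounds the address down to a page boundary and locks whole pages; it fails with `ENOMEM` or `EPERM` when the memlock limit does not allow the lock and the process lacks the privilege to exceed it. The buffer is wiped before `munlock`, so the clear text is gone before its pages become swappable again.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper type: a buffer whose pages are locked in RAM. */
typedef struct {
    unsigned char *buf;
    size_t len;
} locked_secret_t;

/* Locks buf[0..len) so the kernel will not swap those pages out.
 * Returns 0 on success, -1 on failure (errno is set by mlock). */
int secret_lock(locked_secret_t *s, unsigned char *buf, size_t len) {
    assert(s != NULL && buf != NULL);
    if (mlock(buf, len) != 0) return -1;
    s->buf = buf;
    s->len = len;
    return 0;
}

/* Overwrites the secret, then unlocks the pages. */
void secret_release(locked_secret_t *s) {
    /* Writing through a volatile pointer discourages the compiler
     * from optimising the wipe away as a dead store. */
    volatile unsigned char *p = s->buf;
    for (size_t i = 0; i < s->len; i++) p[i] = 0;
    munlock(s->buf, s->len);
    s->buf = NULL;
    s->len = 0;
}
```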
Because preventing swapping may hurt a system’s performance when processes monopolize physical memory at the expense of other processes (which they just might be inclined to do to increase their own performance), kernel limits regulate how many bytes a userland process may lock in memory.
ulimit -l yields a size of 16kB on my system.
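`ulimit -l` is the shell’s view of the `RLIMIT_MEMLOCK` resource limit. A program can query the same value with `getrlimit(2)`; the sketch below (the function name is hypothetical) does so and converts it the way bash displays it: the kernel reports bytes, while `ulimit -l` prints 1024-byte units.

```c
#include <assert.h>
#include <sys/resource.h>

/* Returns the soft RLIMIT_MEMLOCK in 1024-byte units, i.e. the number
 * `ulimit -l` prints; -1 means unlimited, -2 means getrlimit failed. */
long memlock_soft_limit_kib(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) return -2;
    /* The kernel never reports a soft limit above the hard limit. */
    assert(rl.rlim_cur <= rl.rlim_max);
    if (rl.rlim_cur == RLIM_INFINITY) return -1;
    return (long)(rl.rlim_cur / 1024); /* bytes -> KiB, as ulimit shows it */
}
```

A `ulimit -l` value of 16 therefore corresponds to a limit of 16384 bytes as seen by `getrlimit`.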
A binary with the IPC_LOCK capability (CAP_IPC_LOCK in Linux terms) may then, when called with a user’s privileges, lock memory according to the limits set, which is the amount of …
In a systemd unit, the memory-lock limit can be lifted by including LimitMEMLOCK=infinity in the unit, as the Hashicorp documentation (Hashicorp: Vault Configuration) suggests.
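Roughly speaking, `LimitMEMLOCK=` asks systemd to set the soft and hard `RLIMIT_MEMLOCK` limits with `setrlimit(2)` before it starts the service. An unprivileged process may only move its own soft limit up to the hard limit; raising the hard limit requires `CAP_SYS_RESOURCE`, which is why the service manager does it on the service’s behalf. A minimal sketch of the unprivileged half, with a hypothetical function name:

```c
#include <assert.h>
#include <sys/resource.h>

/* Raises the soft RLIMIT_MEMLOCK to the current hard limit, which any
 * process may do. Returns 0 on success, -1 on failure. */
int memlock_raise_soft_to_hard(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) return -1;
    rl.rlim_cur = rl.rlim_max; /* hard limit unchanged: no privilege needed */
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) return -1;

    /* Sanity check: the new soft limit now equals the hard limit. */
    struct rlimit check;
    if (getrlimit(RLIMIT_MEMLOCK, &check) != 0) return -1;
    assert(check.rlim_cur == check.rlim_max);
    return 0;
}
```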
When packaging the vault binary into an OS container like LXD, this is not sufficient.
Bisecting manually in powers of 2, I found the vault binary to require a staggering lock limit of 256kB from the “inside of the container” perspective, while in 2016 credible sources reported 32kB (Fajar A. Nugraha: Capabilities (mlock) in unprivileged containers, 2016).
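The manual search can be written down as a doubling search: try 16kB, 32kB, 64kB and so on, and stop at the first limit that works. In the sketch below, `starts_with_limit_fn` stands in for “restart the container with this limit and check whether vault comes up”, and `vault_starts_mock` is a hypothetical stand-in that merely reproduces the 256kB observation; both are assumptions. A doubling search only brackets the true minimum between the last failing and the first working step; a proper bisection between those two values would narrow it further.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: does the service start with this limit (KiB)? */
typedef bool (*starts_with_limit_fn)(long limit_kib);

/* Hypothetical stand-in mirroring the observed 256kB requirement. */
bool vault_starts_mock(long limit_kib) {
    return limit_kib >= 256;
}

/* Doubles the limit from start_kib until the predicate succeeds.
 * Returns the first working power-of-two step, or -1 if none <= max_kib. */
long find_pow2_limit(long start_kib, long max_kib, starts_with_limit_fn starts) {
    assert(start_kib > 0);
    long kib = start_kib;
    while (kib <= max_kib) {
        if (starts(kib)) return kib;
        if (kib > max_kib / 2) break; /* next doubling would exceed max_kib */
        kib *= 2;
    }
    return -1;
}
```

Starting at 16kB, the search tries 16, 32, 64 and 128 before 256 succeeds.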
To the inherent skeptic, this may once again prove the CS variation of Parkinson’s Law: “Programs expand to fill the memory available to hold them.”
However, consider that not only the secret to unseal the vault but also the secrets stored by the vault may be held in memory. Then, 256kB may easily be consumed by a mere fifty to a hundred certificate keys, and I believe that when certificates are stored as key-value pairs, it is not possible to distinguish between certificate and key.
Seen that way, 256kB does not seem excessive.
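Spelled out, the estimate budgets roughly 2.5kB to 5kB of locked memory per key, which is in the range of a PEM-encoded RSA private key (on the order of 1.7kB at 2048 bits and 3.2kB at 4096 bits; approximate figures, not measured for this post):

$$
\frac{256\,\mathrm{kB}}{100\ \text{keys}} = 2.56\,\mathrm{kB}\ \text{per key}, \qquad \frac{256\,\mathrm{kB}}{50\ \text{keys}} = 5.12\,\mathrm{kB}\ \text{per key}
$$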
From the “outside of the container” perspective, I have yet to make sense of the results of my experiments. The LXD unit needs to be amended with …
Apparently, the limit seen inside (ulimit -l) is a thousandth of the limit set on the outside, which I find difficult to grasp.
I have yet to find any documentation explaining this observation and will probably try to reach out to the LXD people for some clarification.
I will amend my post accordingly should that quest bear any fruit.