RFD: Modular Dynamicism in NixOS
Definitions
Applying a NixOS configuration goes through multiple stages. For our purposes, these are:
- Evaluation (eval), during which the Nix interpreter evaluates Nix language source code, turning a web of NixOS modules into derivations for the system
  - This is what you can observe in the REPL
  - nixos-rebuild dry-build, nixos-rebuild repl, nix eval ".#nixosConfigurations.$HOSTNAME.config.system.build.toplevel.outPath"
- Build, during which the Nix daemon runs derivation builders, which output collections of symlinks, generated config files, activation script files, etc.
  - This is when files and directories are organized into the tree for the system configuration
  - nixos-rebuild build, nix build ".#nixosConfigurations.$HOSTNAME.config.system.build.toplevel"
- Activation, during which a built configuration is applied via activation scripts
  - This is when /run/current-system changes, when systemd units are reloaded, when agenix sets up /run/agenix, etc.
  - nixos-rebuild switch, nixos-rebuild test, ./result/activate
- Runtime, when the NixOS configuration has been applied and the system is running normally
In this document these will be respectively called eval time, build time, activation time, and runtime.
Motivation
A NixOS system configuration is a monolith. This is kind of good! It's how we get such reproducibility, and it means you can introspect a lot about the configuration just by looking at the config.system.build.toplevel derivation. But it also means eval times are long (9.9 seconds for my desktop's system closure on a Ryzen 7950X), and the smallest change must rebuild the entire system-path (/run/current-system/sw). Tweaking a single value in a service's config explodes into rebuilding the entire systemd service tree, the entire /etc/static tree, etc.
There are only very limited facilities to allow mutation in some places but not others, or to keep certain data out of the Nix store. Existing solutions are either highly domain-specific (agenix), or blanket toggles without programmatic changes (system.etc.overlay.mutable). It's not currently possible to, for example, reactively change the number of threads configured for a web service in response to current load.
This RFD proposes a modular solution to dynamically and even programmatically modify the effective NixOS configuration of a running system.
Rationale
The driving goal is to break up the system configuration. The system configuration, config.system.build.toplevel, requires resolving all used NixOS options, but this is not the case for most other options. NixOS options are used to model service and application settings, and these options can be evaluated without pulling in the rest of the system. This makes all the difference: 10 seconds for a system closure, but 500 milliseconds for config.nix.settings. The NixOS module system is not inherently slow; propagating options is slow, and Nixlang-implemented file format serializers (lib.generators/pkgs.formats) are slow. This means we can cheaply re-evaluate specified parts of a system configuration, especially if we keep to the most structured, most independent options. For example:
- config.boot.kernel.sysctl is fast, but config.environment.etc."sysctl.d/60-nixos.conf".text is slower
- config.fileSystems is fast, but config.environment.etc.fstab.text is slower
- config.services.openssh.settings is fast, config.systemd.services.sshd.runner.text is slower, and config.systemd.units."sshd.service".text is slowest
By breaking up the system configuration, we unfortunately lose the ability to seamlessly roll back these dynamic values with NixOS generations. But we can recreate our own form of generations and rollback with another NixOS module system feature: priority. lib.mkForce and lib.mkDefault are the most commonly used priorities, but arbitrary integer priorities are available with lib.mkOverride, and this is extremely cheap. Hundreds of overrides have a sub-measurable impact on evaluation time, and even millions of overrides multiply evaluation time by only a single order of magnitude. This means we can change NixOS option values in an "append-only" manner, keeping a record of all previous changes inside the module system itself.
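To make the append-only idea concrete, here is a minimal Python sketch that mimics how a value is picked by priority. It is a simulation, not the module system's implementation: the `OverrideLog` class and its method names are hypothetical. It relies on the NixOS convention that a lower priority number wins (`lib.mkDefault` is 1000, a plain definition is 100, `lib.mkForce` is 50), and it assumes each new dynamic change is appended with a number one lower than the last.

```python
from dataclasses import dataclass, field

# Priorities used by the NixOS module system: a lower number wins.
MK_DEFAULT = 1000  # lib.mkDefault
PLAIN = 100        # a definition with no explicit priority
MK_FORCE = 50      # lib.mkForce


@dataclass
class OverrideLog:
    """Hypothetical append-only record of (priority, value) definitions."""

    definitions: list[tuple[int, object]] = field(default_factory=list)

    def append(self, priority: int, value: object) -> None:
        # Definitions are only ever added, never edited in place.
        self.definitions.append((priority, value))

    def next_priority(self) -> int:
        # Assumption: each new change outranks every earlier one by 1.
        lowest = min((p for p, _ in self.definitions), default=MK_DEFAULT + 1)
        return lowest - 1

    def effective(self, skip: int = 0) -> object:
        # The definition with the lowest priority number is in effect.
        # `skip` ignores the N strongest overrides, which is one way a
        # rollback could be modeled. (The real module system merges ties
        # by option type; this sketch never produces ties.)
        ranked = sorted(self.definitions, key=lambda d: d[0])
        return ranked[skip][1]


log = OverrideLog()
log.append(MK_DEFAULT, 16)           # initial value
log.append(log.next_priority(), 32)  # first dynamic change, priority 999
log.append(log.next_priority(), 64)  # second dynamic change, priority 998

print(log.effective())        # 64
print(log.effective(skip=1))  # 32
```

Every earlier value stays in the log, so the history of changes can be read back from the definitions themselves.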
Benchmarks using NixOS Tests
Nixpkgs's NixOS tests themselves use NixOS configurations, which we can query like any other NixOS configuration to get a minimal but reasonable example of usage of these services. The benchmarks below are performed with hyperfine --shell=none, on an AMD Ryzen 9 7950X, and using nix-instantiate --quiet --eval --no-eval-cache --json --strict.
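As a rough illustration of how such a benchmark invocation could be assembled, the Python sketch below builds the hyperfine command line from the flags quoted above. How the option attribute is selected is an assumption: the sketch uses `-A` against `<nixpkgs>`, which may differ from the benchmark scripts in this repository, and the helper names are hypothetical.

```python
import shlex

# Flags quoted from the benchmark description above.
NIX_FLAGS = ["--quiet", "--eval", "--no-eval-cache", "--json", "--strict"]


def eval_command(base_attr: str, option: str) -> list[str]:
    """argv for evaluating one option of a NixOS test node.

    Assumption: the attribute is selected with -A against <nixpkgs>.
    """
    return ["nix-instantiate", *NIX_FLAGS, "<nixpkgs>", "-A", f"{base_attr}.{option}"]


def hyperfine_command(base_attr: str, options: list[str]) -> list[str]:
    """argv for one hyperfine run that benchmarks each option in turn."""
    # With --shell=none hyperfine parses each command string itself,
    # so every benchmarked command is passed as one quoted string.
    commands = [shlex.join(eval_command(base_attr, o)) for o in options]
    return ["hyperfine", "--shell=none", *commands]


argv = hyperfine_command(
    "nixosTests.jenkins.nodes.master",
    ["system.build.toplevel.outPath", "services.jenkins.jobBuilder"],
)
print(shlex.join(argv))
```

Building the argument list rather than a shell string keeps quoting explicit, which matters for option paths that themselves contain quotes, such as systemd.units."jenkins.service".text.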
Jenkins
Options are relative to nixpkgs#nixosTests.jenkins.nodes.master.
| NixOS Option | Mean Eval Time |
|---|---|
| system.build.toplevel.outPath | 3.257 s |
| systemd.units."jenkins.service".text | 773.3 ms |
| services.jenkins.jobBuilder | 401.0 ms |
Grafana
Options are relative to nixpkgs#nixosTests.grafana.provision.nodes.provisionNix. I've split Grafana's settings options into two benchmarks: one with all the option values, and one with all the option values except exactly one: services.grafana.settings.paths.provisioning. The default value for that option is a derivation that depends on multiple other values in services.grafana.settings.*, and uses pkgs.formats.yaml. That single option adds 512 ms, a 71% increase over the 718.0 ms baseline!
| NixOS Option | Mean Eval Time |
|---|---|
| system.build.toplevel.outPath | 2.224 s |
| systemd.units."grafana.service".text | 1.320 s |
| services.grafana.settings w/o drv | 718.0 ms |
| services.grafana.settings w/ drv | 1.230 s |
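The cost of the provisioning derivation can be checked directly from the two settings rows. The short Python calculation below uses only the numbers in the table; the variable names are illustrative.

```python
# Mean eval times from the Grafana table above, in milliseconds.
settings_without_drv_ms = 718.0
settings_with_drv_ms = 1230.0

added_ms = settings_with_drv_ms - settings_without_drv_ms
increase_over_baseline = added_ms / settings_without_drv_ms
share_of_total = added_ms / settings_with_drv_ms

print(f"added: {added_ms:.0f} ms")
print(f"increase over the run without the derivation: {increase_over_baseline:.0%}")
print(f"share of the run with the derivation: {share_of_total:.0%}")
```

The two percentages answer different questions: the first is how much slower the evaluation gets, the second is how much of the slower evaluation is spent on that one option.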
GotoSocial
Options are relative to nixpkgs#nixosTests.gotosocial.nodes.machine.
| NixOS Option | Mean Eval Time |
|---|---|
| system.build.toplevel.outPath | 1.704 s |
| systemd.units."gotosocial.service".text | 880.7 ms |
| services.gotosocial.settings | 445.3 ms |
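To compare the three services on one scale, the sketch below divides each toplevel time by the time for the narrowest service option benchmarked. All figures are copied from the tables above; for Grafana the row without the provisioning derivation is used.

```python
# (toplevel, narrowest service option) mean eval times in seconds,
# copied from the three tables above.
BENCHMARKS = {
    "jenkins": (3.257, 0.4010),     # services.jenkins.jobBuilder
    "grafana": (2.224, 0.7180),     # services.grafana.settings w/o drv
    "gotosocial": (1.704, 0.4453),  # services.gotosocial.settings
}


def speedup(toplevel_s: float, option_s: float) -> float:
    """How many times faster the single option evaluates than the toplevel."""
    return toplevel_s / option_s


for name, (toplevel_s, option_s) in BENCHMARKS.items():
    print(f"{name}: {speedup(toplevel_s, option_s):.1f}x faster than the toplevel")
```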
Proposal
I propose to explore the space of using the NixOS module system itself to declaratively defer dynamic configurations to activation-time or runtime. I will explore the API design of specifying dynamic configurations, with one possibility being something as simple as services.foobar.settings.jobs = lib.mkDynamic 16. I will also explore the design options for stubbing out the static values and applying the dynamic ones. Some possibilities include:
- Making the eval time systemd unit file irrespective of services.foobar.settings, pointing instead to a mutable path (e.g., /run/nixos/foobar.conf) which will be generated at activation time.
  - This is what agenix does for secrets: the generated unit files do not change even when the secret contents change.
- Generating a stub systemd unit file at eval time, and generating and applying systemd unit drop-ins at activation time.
  - With the unit file contents generation at activation time, we could use much faster implementations than those in Nixpkgs.
  - New drop-ins can consecutively override prior generated drop-ins.
- Entirely deferring unit file generation to activation time, and using entirely runtime unit files (/run/systemd).
The final approach may be some mix of these, or something else entirely if exploration brings new discoveries to light, but by encoding this information in the module system, we can stub-out or swap out implementations freely.
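As one possible shape for the drop-in approach, the Python sketch below renders a drop-in that replaces a unit's ExecStart and names it so that later generations sort after earlier ones. The directory /run/systemd/system, the 50-dynamic- filename prefix, the function names, and the example command are all assumptions made for illustration, not part of the proposal.

```python
from pathlib import PurePosixPath

# Assumption: runtime drop-ins live under /run/systemd/system/<unit>.d/.
RUNTIME_UNIT_DIR = PurePosixPath("/run/systemd/system")


def dropin_path(unit: str, generation: int) -> PurePosixPath:
    # systemd applies drop-ins in lexicographic filename order, so the
    # generation number is zero-padded to keep later generations last.
    return RUNTIME_UNIT_DIR / f"{unit}.d" / f"50-dynamic-{generation:06d}.conf"


def render_dropin(exec_start: str) -> str:
    # An empty ExecStart= clears the command inherited from the stub unit
    # before the new command is set.
    return "\n".join(["[Service]", "ExecStart=", f"ExecStart={exec_start}", ""])


# Hypothetical command line for the foobar service.
print(dropin_path("foobar.service", 20))
print(render_dropin("/run/current-system/sw/bin/foobar --jobs 32"))
```

After a drop-in is written, systemd still needs a `systemctl daemon-reload` before the change is picked up.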
Pseudocode
User configuration to make an option dynamic, with an initial value:
{ lib, ... }:
{
  services.some-web-service.settings.jobs = lib.mkDynamic 16;
}
Service unit:
[Service]
ExecStart=/nix/store/…some-web-service/bin/some-web-serviced --config /run/nixos/some-web-service/some-web-service.toml
Load balancer:
if vm.is_overloaded():
    # lib.mkOverride takes an integer priority; a lower number wins.
    append_to_nix_config(f"services.some-web-service.settings.jobs = lib.mkOverride {self.next_priority()} {self.jobs * 2}")
    subprocess.check_call(["/nix/var/nix/profiles/system/activate"])
Activation script (config.system.activationScripts."dynamic-webserviced"):
currentGeneration=$(readlink /run/nixos/some-web-service) # e.g. /run/nixos/some-web-service.d/generation-19
nextGeneration=$(incrementGeneration "$currentGeneration") # e.g. /run/nixos/some-web-service.d/generation-20
mkdir -p "$nextGeneration"
nix eval --json -f '/etc/configuration.nix' 'config.services.some-web-service.settings' | json2toml > "$nextGeneration/some-web-service.toml"
# -n replaces the existing symlink itself instead of creating a link inside the directory it points to
ln -sfn "$nextGeneration" "/run/nixos/some-web-service"
systemctl restart some-web-service.service
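The generation bookkeeping in that script can be sketched in Python as follows. This is an illustration under assumptions rather than the proposed implementation: `next_generation` stands in for the undefined incrementGeneration helper, `switch_generation` is a hypothetical name, and the demonstration runs in a scratch directory instead of /run/nixos.

```python
import os
import re
import tempfile
from pathlib import Path


def next_generation(current: Path) -> Path:
    """generation-19 -> generation-20, in the same parent directory."""
    match = re.fullmatch(r"generation-(\d+)", current.name)
    if match is None:
        raise ValueError(f"not a generation directory: {current}")
    return current.with_name(f"generation-{int(match.group(1)) + 1}")


def switch_generation(link: Path, target: Path) -> None:
    """Atomically repoint the symlink `link` at `target`."""
    # Create the new symlink under a temporary name, then rename it over
    # the old one; the rename replaces the old symlink in a single step.
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(target)
    os.replace(tmp, link)


# Demonstration in a scratch directory instead of /run/nixos.
root = Path(tempfile.mkdtemp())
current = root / "some-web-service.d" / "generation-19"
current.mkdir(parents=True)
link = root / "some-web-service"
link.symlink_to(current)

new = next_generation(Path(os.readlink(link)))
new.mkdir()
switch_generation(link, new)
print(os.readlink(link))
```

Renaming a temporary symlink over the old one means the service never observes a moment where the path is missing, and older generation directories stay on disk for rollback.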
Alternatives
If later findings determine the above approach is not ideal, we may investigate alternatives, which might include:
- Writing a CLI tool with a Nix language parser to manipulate option values in source text directly.
- Moving all dynamicism to runtime, taking activation time out of the equation.
- Using classic monolithic NixOS configurations and simply re-deploying continuously.
- Using other Nix features such as impure derivations.