meta/meeting-notes/2024-11-20-architecture-meetup/2024-11-21-roberth-comments-post-meeting.md

Architecture diagram

  • Split out the central database and NetBox; they are separate databases. "Operator" accounts and high-level configuration => NixPanel database
  • NetBox will be accessed as a resource; perhaps move it down in the diagram
    • Netbox IP resource (IP allocation resource)
  • Split the diagram into nodes for each database so that it's clear which parts talk to which state.
  • Nix configurator: NixOS is just a resource to NixOps, as Valentin says
  • (PlantUML diagram) Is this a package diagram? Package diagrams are meant to describe application-level structure (e.g. within NixPanel) rather than system-level structure (e.g. between NixPanel and NixOps). The arrows are supposed to be code dependencies, but many aren't; they seem to be some sort of interactions.

Misc Notes

Valentin is on point about the NixOps decoupling

Ansible could be invoked by NixOps remotely over ssh, but only if there's a need. 100% agreed on restricting to NixOS for now.

Platform definitions

Running some "fixed" infrastructure on non-NixOS, like Proxmox, Garage, etc is fine. We do have an ambition to deploy everything from hardware using NixOps and NixOS, but this is not a blocker. The way to think of this is what is the platform onto which a Fediversity is deployed:

  • current platform: Proxmox + Garage + deSEC + NetBox
  • future alternative platform: bare metal
  • other future alternative platform: Google Cloud, OVH, AWS, Scaleway, etc.
    • no ambition to do this as part of Fediversity, but someone else may want to develop this
    • Nix language is capable of such abstractions

Example: Object Storage

Depending on the platform, the NixOps configuration will activate different resources. For example, object storage could be provided by one of the following (a configuration sketch follows the list):

  1. A provided Garage instance. The Fediversity NixOps configuration will need admin credentials to be entered. This is our first and, for now, only target.
  2. Bare metal, in which case the Fediversity NixOps configuration may allocate VMs that run Garage, and configure Garage to give NixOps admin credentials.
  3. A cloud-native, S3-compatible service.
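
As a rough illustration of how targets 1 and 2 might differ, a deployment could set options along the following lines. The names are hypothetical and anticipate the pattern sketched under "Abstractions in Nix" below; this is not a real NixOps4 API.

```nix
# Hypothetical per-deployment settings; all names are placeholders.
{
  # Target 1: an existing Garage instance; the endpoint and admin
  # credentials are entered by whoever sets up the deployment.
  objectStorage.backend.s3.endpoint = "https://s3.garage.example.org";

  # Target 2 would instead let this deployment own Garage:
  # objectStorage.backend.garage.enable = true;
}
```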

Abstractions in Nix

In NixOps4 we use the module system to build abstractions. A possible pattern (sketched in Nix after the list):

  • Option objectStorage.backend.s3: submodule with options to access an S3 compatible service

    • configured by hand when deploying onto an existing S3 service, such as Garage on Debian, or a cloud provider
  • Option objectStorage.backend.garage.enable: whether to deploy a Garage cluster

  • Option objectStorage.buckets.<name>: submodule with information about the bucket

  • Config resources."bucket-${name}": the various bucket resources, derived from the objectStorage.buckets option set

  • Config resources.garage.resources."${...}", conditional on objectStorage.backend.garage.enable: the resources to deploy Garage onto Proxmox or bare metal
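
A sketch of that pattern in the module system follows. Everything here is a hypothetical illustration: `resources` stands in for however NixOps4 ends up exposing resources to modules, and the option bodies are placeholders.

```nix
# Hypothetical module sketch of the pattern above; not a real NixOps4 API.
{ config, lib, ... }:
let
  cfg = config.objectStorage;
in
{
  options.objectStorage = {
    backend.s3 = lib.mkOption {
      description = "How to access an existing S3-compatible service.";
      type = lib.types.submodule {
        options.endpoint = lib.mkOption { type = lib.types.str; };
      };
    };

    backend.garage.enable =
      lib.mkEnableOption "deploying and managing a Garage cluster";

    buckets = lib.mkOption {
      description = "Buckets to create, keyed by name.";
      default = { };
      type = lib.types.attrsOf (lib.types.submodule {
        options.public = lib.mkOption {
          type = lib.types.bool;
          default = false;
        };
      });
    };
  };

  # `resources` would be declared by NixOps4, not by this module.
  config.resources = lib.mkMerge [
    # One bucket resource per entry in objectStorage.buckets.
    (lib.mapAttrs'
      (name: bucket: lib.nameValuePair "bucket-${name}" {
        # ...inputs derived from `bucket` and cfg.backend.s3...
      })
      cfg.buckets)

    # Resources that deploy Garage itself onto Proxmox or bare metal,
    # only when this deployment owns the backend.
    (lib.mkIf cfg.backend.garage.enable {
      # garage = ...;
    })
  ];
}
```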

The infra layer might be best modeled as a separate NixOps deployment. It does not need to run as part of the NixPanel user interface, but can instead be run by those who run the service provider.

Secrets

  • Sources
    • NixOps user level (workstation, not copied anywhere)
      • the user's ssh private key is reused by the NixOS resource
      • credentials to access the NixOps state (e.g. in a bucket object or over ssh)
      • (anything that bootstraps access to the NixOps state)
    • Outputs of resources will contain credentials (stored in NixOps state)
      • e.g. the admin password for anything created through NixOps resources
    • Sensitive infrastructure parameters inserted into the NixOps state initially
      • e.g. using a pinentry resource
      • or a dummy resource that fails, but whose output attributes can be overridden via the NixOps CLI
  • Destinations
    • Files on NixOS machines
      • copied to a very private (root-only) directory over ssh

      • copied to their final place by NixOS systemd units

        • work with NixOS community to standardize this
        • e.g. each service specifies with options where it expects its secrets, and waits for secret-specific systemd units using some convention (see the first sketch after this list)
        • nixops4-nixos, agenix, sops-nix, etc. can all implement this interface
      • handled in such a way that they don't end up in the Nix store (use a protective wrapper when they are passed around in the Nix language; see the second sketch after this list)

    • NixOps resources
      • Passing one resource's output to another resource's input: the purpose of NixOps
      • NixOps state
        • Encrypted at rest
  • Process
    • Simplest is to pass credentials around through NixOps, but indirections are possible. Benefits:
      • Implement key rotation without redeployment => vault-style secret management with short-lived secrets so that credential leaks are less damaging and less valuable
      • Fewer places with secrets => brings more structure to the organization's credentials management
        • Only if all secrets from the NixOps state can be moved over to the secret management system.
        • This will require more complexity in NixOps; would not recommend for now
        • Single point of failure, or "use one hinge and oil it well" -- paraphrasing the age authors or their reference, out of context
    • vaultwarden can be written to by NixOps as needed, so that it can grant access declaratively
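
The systemd convention mentioned under "Files on NixOS machines" could look roughly like the following NixOS module fragment. Unit names, paths, and the staging directory are invented for illustration; agenix and sops-nix implement similar but distinct interfaces today.

```nix
# Hypothetical convention: a service waits on a per-secret unit.
{
  # The consuming service declares where it expects its secret and
  # waits for the per-secret unit that installs it.
  systemd.services.mastodon = {
    after = [ "secret-mastodon-s3-key.service" ];
    requires = [ "secret-mastodon-s3-key.service" ];
    serviceConfig.EnvironmentFile = "/run/secrets/mastodon-s3-key";
  };

  # Provided by the secrets implementation (nixops4-nixos, agenix,
  # sops-nix, ...): move the secret from the root-only staging area,
  # populated over ssh, to its final location with correct ownership
  # and permissions.
  systemd.services."secret-mastodon-s3-key" = {
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      install -D -m 0400 -o mastodon \
        /root/secret-staging/mastodon-s3-key /run/secrets/mastodon-s3-key
    '';
  };
}
```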
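
On keeping secrets out of the Nix store: a toy sketch of a protective wrapper in plain Nix, assuming nothing about NixOps4 (the `mkSecret` helper is invented here). Coercing the wrapped value to a string fails loudly instead of silently leaking into a derivation.

```nix
# Toy protective wrapper; `mkSecret` is a hypothetical helper.
let
  mkSecret = name: {
    _type = "secret";
    inherit name;
    # String interpolation and toString go through __toString;
    # make that a loud failure instead of a silent leak.
    __toString = _self:
      throw "refusing to coerce secret '${name}' to a string";
  };
  dbPassword = mkSecret "db-password";
in
# "${dbPassword}" or builtins.toString dbPassword would now throw,
# so the value cannot accidentally end up in the Nix store.
dbPassword.name
```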

Security: compromise of a NixOps user compromises the whole deployment, and the same goes for the NixOps state. This probably includes equivalents of the aforementioned "NixOps user level" secrets.

  • State is likely compromised by compromising a NixOps user
  • NixPanel credentials are probably in the state, and they will need capabilities similar to those of the NixOps user credentials. Configuration management is expected to be able to do anything. (No need to freak out, but do use caution.)

Terminology

  • operator: I'm not super happy about "operator" instead of e.g. "customer". You're ignoring the Ops in NixOps. I can accept that, but it's not great communication.

    For NixOps it makes sense to use "operator" and "author" roles to describe its users. I don't see a better alternative for NixOps.

  • target platform: Core services, infrastructure services, etc. may be useful terms to distinguish the roles in terms of behavior and functionality. I think the missing term is "target platform", which captures the boundary of the scope of the Fediversity project. As described, this scope can be variable.

    For example, infrastructure services like Garage or email can each individually be part of the target platform, as they are right now, or they can become part of Fediversity's infrastructure services.

    Being managed by Fediversity code is a less intrinsic property than being "infrastructure", although arguably "infrastructure" is still a bit fuzzy; S3 not so much, but email is.

Meta notes

  • My contract is to develop NixOps, not NixPanel. I give advice.
  • Usually "foo is done by NixOps" does not mean that it is implemented in the NixOps project, but rather that some action is initiated by calling a NixOps command, and the details of the operation are declared in a combination of
    • NixPanel output of some sort
    • A NixOps resource provider implementation
    • A set of Nix expressions developed by the Fediversity project
    • A Nix expression that's specific to a Fediversity deployment, which refers to the generic Fediversity expressions (kept to a minimum: things like whether Fediversity deploys a Garage or it is provided by the underlying platform)