Merge pull request 'data migration notes' (#29) from data-model into main

Reviewed-on: Fediversity/meta#29
Date: 2025-05-31 18:48:25 +02:00
commit bc59794685
parent 5803f0bd1e 4234ff4b33
1 changed file with 21 additions and 119 deletions
--- a/architecture-docs/data-model-requirements.md
+++ b/architecture-docs/data-model-requirements.md
@@ -1,126 +1,28 @@
 # migration data model requirements
-To transfer between two providers, the target provider must be able to import the sending provider's versions (e.g., a deployment may have the latest Fediversity, the latest Pixelfed, but a previous Mastodon). Thus each "release" of the data model needs to be versioned, as do applications and APIs.
+Given:
 * (May need a way to show on the front-end which versions are in place, and which migrations are supported. However, for application versions which are completely controlled by the installation and setup, this is "solved".)
-for release version 0, focus on known current needs
+- no change in control of domains;
- * to be expanded later as each new application is added and can be transferred between providers
+- two Fediversity set-ups (to be provided by ProcoliX) with a run-time environment such as Proxmox, for an initial test using the same version;
- * review migration guides for the known apps with an eye to odd/unusual details that influence design choices (task for Niols? others?)
+- an operator's configuration, including:
  - DNS automation hooks for the desired domain (RFC 2136, optionally authenticated by TSIG (RFC 2845) or GSS-TSIG (RFC 3645));
  - a Fediversity configuration of at least a single application (to start).
-Specifically, this suggests scoping to migrating:
+Our data model must describe a migration:
- managed infrastructure (rather than managed applications)
+- specifying [entity relations](https://mermaid.js.org/syntax/entityRelationshipDiagram.html#relationship-syntax) e.g. many-to-many;
- between servers owned by procolix
+- migrating both deployed and staged configurations;
- same proxmox version
+- deploying applications using the same versions;
- NixOS VMs set up by us so we can guarantee identical application versions
+- retaining relevant application state;
- hosting limited to a single application (to start)
+- handling application-specific migration logic, such as rewriting URLs as needed.
-First, a rough inventory (unstructured for now; later we will create a structured form/schema with, e.g., many-to-many links, useful for the migration code):
+Tests:
 * clearly mark items that will not be in the first migration as eventual or speculative
 * or remove them if they would be too far in the future
 * once we understand what is useful for the migration code, we can extract it and transform it into a format suitable for data model documentation
-Hosting Provider provides:
+1.
-* proxmox, git
+    A Fediversity user may wish to migrate their Fediversity set-up between monolithic and distributed configurations.
-* hardware
+    In an admin screen they can get their configuration and data for transfer.
-* filesystem storage
+    Using this they may migrate to the desired configuration.
-* DNS automation hooks?
+1.
-* central/shared garage storage or only hardware+diskspace for the garage VMs to create storage?
+    At any time a Fediversity user may wish to migrate their Fediversity set-up.
-  * with central: more efficient but less isolated
+    They can go to an admin screen where they can get their configuration and data for transfer.
-
+    This data can be provided to a new service provider where they will be up-and-running again, with minimal downtime.
 FooUniversity (Operator)
 * invoice info
  * is all info expected to be transferred from provider A to provider B?
    * May not want to transfer e.g. bank details, because they are already set up at B
    * May also depend on regulation (which information are you allowed to hand out?)
 * Admins:
  * credentials
 * persistent identifiers
  * mappings between them (also need to travel across providers)
    * e.g. if we can't change content URLs, we may need to create (and from then on carry around) a redirects mapping
    * those mappings are likely application-specific, but they all belong to the same type class
 * domain(s)
    * what is needed for DNS management?
    * users
      * display name
      * email(s)
      * login id
        * oauth2 (eventually)
        * 2fa
        * password
        * passkeys (eventually)
        * LDAP? (eventually?)
    * all applications
      * subdomain (`social.example.org` vs. `example.org/social`)
      * info for the Proxmox setup, such as to provision VMs (to reproduce the Proxmox configuration)
        * mem
        * cpus
        * storage mounts
        * IPs likely not the same in the target network
      * storage
        * filesystem
           * very well specified per application
        * blob storage config (garage, s3-like)
           * index
           * Can we make it a requirement that Garage is behind a predictable URL, e.g. `<application>.garage.<customer domain>`, as opposed to something vendor-specific, e.g. `pixelfed-university.garage.procolix.com/<customer domain>/<application>`?
           * may need to rewrite URLs to blobs automatically, depending on the underlying URL scheme, which may be per setup or application
        * limits? per application? per user? where are these used/set/enforced?
        * TODO: what does e.g. borgmatic need to back up storage?
        * out of scope?: focus on actual state, disregarding reconstructable stuff
      * SQL database
        * dump/snapshot
        * TODO: what does e.g. borgmatic need to back up databases?
 * application specifics
  * postfix? (is email in version 0?)
  * pixelfed
    * where is blob storage
      * in the specific case of Pixelfed, if blob storage changed URL, we might need to rewrite the pictures URLs in the database (try to avoid this)
    * redis (in the case of pixelfed, it is not just a cache)
      * misc config: theme, name of instance, email of sysadmin
    * database
    * on-disk files
      * Daniel Supernault is currently making it so everything can be stored remotely in Garage or an SQL database
    * users (login id) (in database? in redis?)
      * user preferences/settings
  * peertube
  * mastodon
  * matrix? (is it in version 0?)
 * logos
 Other considerations:
 - Draw a boundary between what is
  - operator-configurable
  - fixed, but at the implementation level
  - configurable dynamically per environment
 - Most importantly we need to preserve persistent identifiers
 - When transforming the data-model code into a deliverable version of the data model as part of the technical architecture document, document user-data storage with respect to security and GDPR
 See also:
 - possible overlap/inspiration: Stalw.art [configuration docs](https://stalw.art/docs/server/general)
 ## MVP scoping ideas
 User story 1: New customer
 When a new customer visits the Fediversity website, we want to show them what Fediversity is about and what it can offer. From there, the customer is pointed to a signup form where they can enter all the details needed to get started, such as the user/admin login, name, address, bank details, domain, other users, and applications. Here they can also choose which applications to use (at first, no more than three). When the customer hits the install/provision/go button, everything is installed automatically, after which the customer is presented with (some) URLs to log in to.
 User story 2: Take out / move to other instance
 At any time a customer may wish to change service providers. They can easily go to an admin screen where they can get their configuration and data packaged for transfer. This packaged data can be provided to a new service provider where they will be up-and-running again easily, with minimal downtime.
 proposed MVP scope:
 - block storage
 - blob storage (garage)
 - physical servers
 - proxmox vm management
 - nixops service
 - nixops scripts
 - 1 to 3 applications packaged in Nix (Mastodon, Peertube, Pixelfed)
 - frontend / website
 - working DNS (can be external, but automated)
 - takeout area
 - import area
 - 2 Fediversity environments to transfer between
 - demonstration of User story 1
 - demonstration of User story 2
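As a minimal sketch, assuming a hypothetical takeout format rather than anything in the Fediversity codebase, the requirements in this diff might be encoded roughly as follows: each application carries its version so the target provider can check that it can import the sender's versions, users relate to applications many-to-many, and blob URLs are rewritten from a vendor-specific layout to the proposed `<application>.garage.<customer domain>` scheme. `MigrationBundle`, `unsupported`, `rewrite_blob_url`, all field names, and the version strings are illustrative assumptions.

```python
# Hypothetical sketch: class names, fields, and URL layouts are illustrative
# assumptions, not the actual Fediversity data model.
from dataclasses import dataclass, field
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Application:
    name: str     # e.g. "pixelfed"
    version: str  # version deployed at the sending provider


@dataclass
class User:
    login_id: str
    display_name: str
    emails: list[str] = field(default_factory=list)


@dataclass
class MigrationBundle:
    """What an operator takes out of one provider and imports at another."""
    data_model_version: int
    domain: str
    applications: list[Application]
    users: list[User]
    # Many-to-many relation between users and applications: (login_id, app name).
    memberships: set[tuple[str, str]] = field(default_factory=set)


def unsupported(bundle: MigrationBundle,
                supported_apps: dict[str, set[str]],
                supported_model_versions: set[int]) -> list[str]:
    """Return what the target provider cannot import; empty means it may proceed."""
    problems = []
    if bundle.data_model_version not in supported_model_versions:
        problems.append(f"data model v{bundle.data_model_version}")
    for app in bundle.applications:
        if app.version not in supported_apps.get(app.name, set()):
            problems.append(f"{app.name} {app.version}")
    return problems


def rewrite_blob_url(url: str, old_host: str, app: str, customer_domain: str) -> str:
    """Map https://<old_host>/<customer domain>/<app>/<key> to
    https://<app>.garage.<customer domain>/<key>; other URLs are returned unchanged."""
    parts = urlsplit(url)
    prefix = f"/{customer_domain}/{app}/"
    if parts.hostname != old_host or not parts.path.startswith(prefix):
        return url
    key = parts.path[len(prefix):]
    return urlunsplit((parts.scheme, f"{app}.garage.{customer_domain}",
                       "/" + key, parts.query, parts.fragment))


if __name__ == "__main__":
    bundle = MigrationBundle(
        data_model_version=0,
        domain="example.org",
        applications=[Application("pixelfed", "0.12.4"), Application("mastodon", "4.2.0")],
        users=[User("alice", "Alice", ["alice@example.org"])],
        memberships={("alice", "pixelfed")},
    )
    target = {"pixelfed": {"0.12.4"}, "mastodon": {"4.3.0"}}
    print(unsupported(bundle, target, {0}))  # ['mastodon 4.2.0']
    print(rewrite_blob_url(
        "https://pixelfed-university.garage.procolix.com/example.org/pixelfed/img/1.jpg",
        "pixelfed-university.garage.procolix.com", "pixelfed", "example.org"))
    # https://pixelfed.garage.example.org/img/1.jpg
```

Rejecting a migration up front whenever `unsupported` returns anything mirrors the requirement that the target provider be able to import the sender's versions. URL rewriting is kept per application because the notes expect the blob URL scheme to vary per setup or application.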