add notes on storage architecture discussion

Valentin Gagarin 2024-12-09 13:35:17 +01:00
parent cd4d59942a
commit 3500bc34a7

@@ -0,0 +1,32 @@
# Storage architecture with Garage
Attendees: Koen, Valentin, Nicolas
* (some historical/technical background discussion on object storage and file systems)
* Koen: About 1.5 years ago, looked into available FOSS solutions and found only MinIO and Garage
* Garage was NGI-funded and located in France and responsive
* Targeted at self-hosting, minimal hardware requirements
* Ideally would have used a W3C standard protocol, but "S3-compatible" is the closest thing
* Should have a few big, replicated, well-monitored storage clusters instead of many small ones
* Currently have three 4U storage boxes with 32 drives each; would replicate them over 3 locations
* (discussion of ZFS performance characteristics)
* Koen presented a storage architecture based on ZFS
* Main feature (not clear on the details) is to add an NVMe-based primary write target to Garage
* e.g. 45 TB NVMe + (2x?) 560 TB (1 PB raw) HDD
* According to independent reviews[citation needed], Garage handles this sort of setup better than MinIO
* Requirement estimates:
* mastodon.nl has 50k users, 5k active users, a 200 GB database (ever-growing, launched Nov 2022), and 1.5 TB of files
* Hot data is less than about 5 days old; most of the rest is never looked up
* Most of the information on usage profiles will have to come from our university partners
* Will scale the physical storage to the actual write capacity requirements at the spindle layer (add more vdevs as needed)
* Additional object storage replication can be configured at the S3 (Garage) layer by setting another target at a different provider
* We expect this to happen rarely; replication will likely take a lot of performance out of the application
* By default we already have 3-way replication internally (i.e. 3 RAIDZ2 machines per Garage instance)
* There are already three sites in NL to rack the hardware
* This is why we need to connect to NetherLight, so NORDUnet universities have a low-latency connection
* Koen: Ideally at the end of next year you'd have three sites set up with machines to run Proxmox on and the storage machines, and run a NixOS ISO to set everything up and then control it via NixPanel
* Nicolas: We should talk to Garage developers, because they run it on NixOS but not sure how exactly
* Koen: Ideally before FOSDEM 2025
* Valentin: We need reference documentation for the storage architecture discussed, so we can point to something when making implementation decisions. We also need a slide deck that tells the story of how it came into being (as a sales device for the project), and eventually we should derive business plans for hosting providers and operators from it
Hand-drawn sketches posted to the internal private repo
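
The capacity and hot-data figures above can be sanity-checked with a rough back-of-the-envelope sketch. This is not part of what was presented. It assumes one storage machine per site with 2 x 560 TB of raw HDD (the "2x" was uncertain in the discussion), a hypothetical RAIDZ2 vdev width of 8 drives, uniform growth of the mastodon.nl file store since an assumed launch date of 2022-11-01, and it ignores ZFS and Garage metadata overhead. All function and constant names are made up for illustration.

```python
from datetime import date


def raidz2_usable_tb(raw_tb: float, vdev_width: int) -> float:
    """Approximate usable capacity of RAIDZ2 vdevs (two parity drives per vdev)."""
    if vdev_width <= 2:
        raise ValueError("RAIDZ2 needs more than two drives per vdev")
    return raw_tb * (vdev_width - 2) / vdev_width


def logical_capacity_tb(usable_per_site_tb: float, sites: int, replication_factor: int) -> float:
    """Logical object capacity when each object is stored replication_factor times."""
    return usable_per_site_tb * sites / replication_factor


def hot_set_gb(total_gb: float, start: date, end: date, hot_days: int) -> float:
    """Size of the data written in the last hot_days, assuming uniform growth."""
    days = (end - start).days
    return total_gb / days * hot_days


# Figures from the notes; the factor 2 is the uncertain "(2x?)".
RAW_HDD_TB_PER_SITE = 2 * 560
SITES = 3
REPLICATION_FACTOR = 3

# Assumptions that are NOT from the notes.
VDEV_WIDTH = 8                  # hypothetical RAIDZ2 vdev width
LAUNCH = date(2022, 11, 1)      # notes only say "Nov 2022"
NOTES_DATE = date(2024, 12, 9)  # date of the commit

usable = raidz2_usable_tb(RAW_HDD_TB_PER_SITE, VDEV_WIDTH)
logical = logical_capacity_tb(usable, SITES, REPLICATION_FACTOR)
hot = hot_set_gb(1500, LAUNCH, NOTES_DATE, hot_days=5)

print(f"usable per site: {usable:.0f} TB")
print(f"logical capacity with 3-way replication: {logical:.0f} TB")
print(f"estimated mastodon.nl hot set: {hot:.1f} GB")
```

Under these assumptions the hot set of a mastodon.nl-sized instance is several orders of magnitude smaller than the 45 TB NVMe write target. Real usage profiles from the university partners would replace the uniform-growth guess.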