diff --git a/matrix/README.md b/matrix/README.md index 02ac5089..780f5015 100644 --- a/matrix/README.md +++ b/matrix/README.md @@ -5,10 +5,13 @@ include_toc: true # A complete Matrix installation -This is going to be a Matrix installation with all bells and whistles. Not -just the server, but every other bit that you need or want. +This documentation describes how to build a complete Matrix environment with +all bells and whistles. Not just the Synapse server, but (almost) every bit +you want. + +The main focus will be on the server itself, Synapse, but there's a lot more +than just that. -We're building it with workers, so it will scale. ## Overview @@ -24,24 +27,65 @@ conferencing * [Consent tracking](https://element-hq.github.io/synapse/latest/consent_tracking.html) * Authentication via -[OpenID](https://element-hq.github.io/synapse/latest/openid.html) -* Several [bridges](https://matrix.org/ecosystem/bridges/) +[OpenID](https://element-hq.github.io/synapse/latest/openid.html) (later) +* Several [bridges](https://matrix.org/ecosystem/bridges/) (later) -# Synapse +# Overview -This is the core component: the Matrix server itself. +This documentation aims to describe the installation of a complete Matrix +platform, with all bells and whistles. Several components are involved and +finishing the installation of one can be necessary for the installation of the +next. -Installation and configuration is documented under [synapse](synapse). +Before you start, make sure you take a look at the [checklist](checklist.md). + +These are the components we're going to use: -# nginx +## Synapse + +This is the core component: the Matrix server itself, you should probably +install this first. + +Because not every usecase is the same, we'll describe two different +architectures: + +** [Monolithic](synapse) + +This is the default way of installing Synapse, this is suitable for scenarios +with not too many users, and, importantly, users do not join many very crowded +rooms. + +** [Worker-based](synapse/workers) + +For servers that get a bigger load, for example those that host users that use +many big rooms, we'll describe how to process that higher load by distributing +it over workers. + + +## PostgreSQL + +This is the database Synapse uses. This should be the first thing you install +after Synapse, and once you're done, reconfigure the default Synapse install +to use PostgreSQL. + +If you have already added stuff to the SQLite database that Synapse installs +by default that you don't want to lose: [here's how to migrate from SQLite to +PostgreSQL](https://element-hq.github.io/synapse/latest/postgres.html#porting-from-sqlite). + + +## nginx We need a webserver for several things, see how to [configure nginx](nginx) here. +If you install this, make sure to check which certificates you need, fix the +DNS entries and probably keep TTL for for those entries very low until after +the installation, when you know everything's working. -# Element Call + +## Element Call Element Call is the new way to have audio and video conferences, both one-on-one and with groups. This does not use Jitsi and keeps E2EE intact. See @@ -51,7 +95,7 @@ how to [setup and configure it](element-call). # Element Web This is the fully-fledged web client, which is very [easy to set -up](element-call). +up](element-web). # TURN @@ -60,8 +104,8 @@ We may need a TURN server, and we'll use [coturn](coturn) for that. It's apparently also possible to use the built-in TURN server in Livekit, -which we'll use if we use [Element Call](call). 
It's either/or, so make sure -you pick the right approach. +which we'll use if we use [Element Call](element-call). It's either/or, so make +sure you pick the right approach. You could possibly use both coturn and LiveKit, if you insist on being able to use both legacy and Element Call functionality. This is not documented here @@ -72,3 +116,4 @@ yet. With Draupnir you can do moderation. It requires a few changes to both Synapse and nginx, here's how to [install and configure Draupnir](draupnir). + diff --git a/matrix/checklist.md b/matrix/checklist.md new file mode 100644 index 00000000..da10d48f --- /dev/null +++ b/matrix/checklist.md @@ -0,0 +1,97 @@ +# Checklist + +Before you dive in and start installing, you should do a little planning +ahead. Ask yourself what you expect from your server. + +Is it a small server, just for yourself and some friends and family, or for +your hundreds of colleagues at work? Is it for private use, or do you need +decent moderation tools? Do you need audio and videoconferencing or not? + + +# Requirements + +It's difficult to specify hardware requirements upfront, because they don't +really depend on the number of users you have, but on their behaviour. A +server with users who don't engage in busy rooms like +[#matrix:matrix.org](https://matrix.to/#/#matrix:matrix.org) doesn't need more +than 2 CPU cores, 8GB of RAM and 50GB of diskspace. + +A server with users who do join very busy rooms, can easily eat 4 cores and +16GB of RAM. Or more. Or even much more. If you have a public server, where +unknown people can register new accounts, you'll probably need a bit more +oompf (and [moderation](draupnir)). + +During its life, the server may need more resources, if users change +their behaviour. Or less. There's no one-size-fits-all approach. + +If you have no idea, you should probably start with 2 cores, 8GB RAM and some +50GB diskspace, and follow the [monolithic setup](synapse). + +If you expect a higher load (you might get there sooner than you think), you +should probably follow the [worker-based setup](synapse/workers), because +changing the architecture from monolithic to worker-based once the server is +already in use, is a tricky task. + +Here's a ballpark figure. Remember, your mileage will probably vary. And +remember, just adding RAM and CPU doesn't automatically scale: you'll need to +tune [PostgreSQL](postgresql/README.md#tuning) and your workers as well so +that your hardware is optimally used. + +| Scenario | Architecture | CPU | RAM | Diskspace (GB) | +| :------------------------------------ | :-----------------------------: | :----: | :----: | :------------: | +| Personal, not many very busy rooms | [monolithic](synapse) | 2 | 8GB | 50 | +| Private, users join very busy rooms | [worker-based](synapse/workers) | 4 | 16GB | 100 | +| Public, many users in very busy rooms | [worker-based](synapse/workers) | 8 | 32GB | 250 | + + +# DNS and certificates + +You'll need to configure several things in DNS, and you're going to need a +couple of TLS-certificates. Best to configure those DNS entries first, so that +you can quickly generate the certificates once you're there. + +It's usually a good idea to keep the TTL of all these records very low while +installing and configuring, so that you can quickly change records without +having to wait for the TTL to expire. Setting a TTL of 300 (5 minutes) should +be fine. Once everything is in place and working, you should probably increase +it to a more production ready value, like 3600 (1 hour) or more. 
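+
+As a sketch of what such low-TTL records might look like in a BIND-style zone
+file (the exact names you need are listed below; the addresses here are
+placeholders, and the `CNAME` targets can point at whatever host actually
+serves those names):
+
+```
+$TTL 300
+; matrix.example.com must be an A/AAAA record, not a CNAME
+matrix   IN  A      111.222.111.222
+matrix   IN  AAAA   2a10:1234:abcd::1
+turn     IN  A      111.222.111.222
+element  IN  CNAME  matrix.example.com.
+call     IN  CNAME  matrix.example.com.
+livekit  IN  CNAME  matrix.example.com.
+```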
+ +What do you need? Well, first of all you need a domain. In this documentation +we'll use `example.com`, you'll need to substitute that with your own domain. + +Under the top of that domain, you'll need to host 2 files under +`/.well-known`, so you'll need a webserver there, using a valid +TLS-certificate. This doesn't have to be the same machine as the one you're +installing Synapse on. In fact, it usually isn't. + +Assuming you're hosting Matrix on the machine `matrix.example.com`, you need +at least an `A` record in DNS, and -if you have IPv6 support, which you +should- an `AAAA` record too. **YOU CAN NOT USE A CNAME FOR THIS RECORD!** +You'll need a valid TLS-certificate for `matrix.example.com` too. + +You'll probably want the webclient too, so that users aren't forced to use an +app on their phone or install the desktop client on their PC. You should never +run the web client on the same name as the server, that opens you up for all +kinds of Cross-Site-Scripting attack. We'll assume you use +`element.example.com` for the web client. You need a DNS entry for that. This +can be a CNAME, but make sure you have a TLS-certificate with the correct name +on it. + +If you install a [TURN-server](coturn), either for legacy calls or for [Element +Call](element-call) (or both), you need a DNS entry for that too, and -again- a +TLS-certificate. We'll use `turn.example.com` for this. + +If you install Element Call (and why shouldn't you?), you need a DNS entry plus +certificate for that, let's assume you use `call.example.com` for that. This +can be a CNAME again. Element Call uses [LiveKit](element-call#livekit) for the +actual processing of audio and video, and that needs its own DNS entry and certificate +too. We'll use `livekit.example.com`. + +| FQDN | Use | Comment | +| :-------------------- | :--------------------- | :--------------------------------------- | +| `example.com` | Hosting `.well-known` | This is the `server_name` | +| `matrix.example.com` | Synapse server | This is the `base_url`, can't be `CNAME` | +| `element.example.com` | Webclient | | +| `turn.example.com` | TURN / Element Call | Highly recommended | +| `call.example.com` | Element Call | Optional | +| `livekit.example.com` | LiveKit SFU | Optional, needed for Element Call | diff --git a/matrix/coturn/README.md b/matrix/coturn/README.md index bedb8821..4468f0da 100644 --- a/matrix/coturn/README.md +++ b/matrix/coturn/README.md @@ -5,16 +5,22 @@ include_toc: true # TURN server -You need an TURN server to connect participants that are behind a NAT firewall. +You need a TURN server to connect participants that are behind a NAT firewall. Because IPv6 doesn't really need TURN, and Chrome can get confused if it has to use TURN over IPv6, we'll stick to a strict IPv4-only configuration. Also, because VoIP traffic is only UDP, we won't do TCP. -IMPORTANT! TURN can also be offered by [LiveKit](../element-call#livekit), in -which case you should probably not run coturn (unless you don't use LiveKit's -built-in TURN server, or want to run both to support legacy calls too). +TURN-functionality can be offered by coturn and LiveKit alike: coturn is used +for legacy calls (only one-on-one, supported in Element Android), whereas +Element Call (supported by ElementX, Desktop and Web) uses LiveKit. +In our documentation we'll enable both, which is probably not the optimal +solution, but at least it results in a system that supports old and new +clients. 
+ +Here we'll describe coturn, the dedicated ICE/STUN/TURN server that needs to +be configured in Synapse, [LiveKit](../element-call#livekit) has its own page. # Installation @@ -72,24 +78,24 @@ certbot certonly --nginx -d turn.example.com This assumes you've already setup and started nginx (see [nginx](../nginx)). -The certificate files reside under `/etc/letsencrypt/live`, but coturn doesn't -run as root, and can't read them. Therefore we create the directory +{#fixssl} +The certificate files reside under `/etc/letsencrypt/live`, but coturn and +LiveKit don't run as root, and can't read them. Therefore we create the directory `/etc/coturn/ssl` where we copy the files to. This script should be run after each certificate renewal: ``` #!/bin/bash -# This script is hooked after a renewal of the certificate, so -# that it's copied and chowned and made readable by coturn: +# This script is hooked after a renewal of the certificate, so that the +# certificate files are copied and chowned, and made readable by coturn: cd /etc/coturn/ssl cp /etc/letsencrypt/live/turn.example.com/{fullchain,privkey}.pem . chown turnserver:turnserver *.pem -# We should restart either coturn or LiveKit, they cannot run both! -systemctl restart coturn -#systemctl restart livekit-server +# Make sure you only start/restart the servers that you need! +systemctl try-reload-or-restart coturn livekit-server ``` @@ -101,7 +107,8 @@ renew_hook = /etc/coturn/fixssl ``` Yes, it's a bit primitive and could (should?) be polished. But for now: it -works. +works. This will copy and chown the certificate files and restart coturn +and/or LiveKit, depending on if they're running or not. # Configuration {#configuration} @@ -120,9 +127,13 @@ Now that we have this, we can configure our configuration file under `/etc/coturn/turnserver.conf`. ``` +# We don't use the default ports, because LiveKit uses those +listening-port=3480 +tls-listening-port=5351 + # We don't need more than 10000 connections: -min-port=50000 -max-port=60000 +min-port=40000 +max-port=49999 use-auth-secret static-auth-secret= @@ -132,7 +143,7 @@ user-quota=12 total-quota=1200 # Of course: substitute correct IPv4 address: -listening-ip=185.206.232.60 +listening-ip=111.222.111.222 # VoIP traffic is only UDP no-tcp-relay diff --git a/matrix/coturn/turnserver.conf b/matrix/coturn/turnserver.conf index 8e1c6d7f..3b99ef7b 100644 --- a/matrix/coturn/turnserver.conf +++ b/matrix/coturn/turnserver.conf @@ -3,11 +3,17 @@ # Only IPv4, IPv6 can confuse some software listening-ip=111.222.111.222 +# Listening port for TURN (UDP and TCP): +listening-port=3480 + +# Listening port for TURN TLS (UDP and TCP): +tls-listening-port=5351 + # Lower and upper bounds of the UDP relay endpoints: # (default values are 49152 and 65535) # -min-port=50000 -max-port=60000 +min-port=40000 +max-port=49999 use-auth-secret static-auth-secret= diff --git a/matrix/draupnir/README.md b/matrix/draupnir/README.md index b7bbd17f..94fa8f35 100644 --- a/matrix/draupnir/README.md +++ b/matrix/draupnir/README.md @@ -53,9 +53,10 @@ Copy it to `production.yaml` and change what you must. 
| Option | Value | Meaning | | :---- | :---- | :---- | -| `homeserverUrl` | `http://localhost:8008` | Where to communicate with Synapse | +| `homeserverUrl` | `http://localhost:8008` | Where to communicate with Synapse when using network port| +| `homeserverUrl` | `http://unix:/run/matrix-synapse/incoming_main.sock` | Where to communicate with Synapse when using UNIX sockets (see [Workers](../synapse/workers.md)) | | `rawHomeserverUrl` | `https://matrix.example.com` | Same as `server_name` | -| `accessToken` | access token | Copy from login session | +| `accessToken` | access token | Copy from login session or create in [Synapse Admin](../synapse-admin)) | | `password` | password | Password for the account | | `dataPath` | `/opt/Draupnir/datastorage` | Storage | | `managementRoom` | room ID | Room where moderators command Draupnir | diff --git a/matrix/element-call/README.md b/matrix/element-call/README.md index 0dbd793e..43d0e5f5 100644 --- a/matrix/element-call/README.md +++ b/matrix/element-call/README.md @@ -3,153 +3,38 @@ gitea: none include_toc: true --- -# Element Call +# Overview -This bit needs to be updated: Go compiler and the whole Node.js/yarn/npm stuff -needs to be cleaned up and standardized. For now the procedure below will -probably work. +Element Call consists of a few parts, you don't have to host all of them +yourself. In this document, we're going to host everything ourselves, so +here's what you need. -Element Call enables users to have audio and videocalls with groups, while -maintaining full E2E encryption. +* **lk-jwt**. This authenticates Synapse users to LiveKit. +* **LiveKit**. This is the "SFU", which actually handles the audio and video, and does TURN. +* **Element Call widget**. This is basically the webapplication, the user interface. -It requires several bits of software and entries in .well-known/matrix/client +As mentioned in the [checklist](../checklist.md) you need to define these +three entries in DNS and get certificates for them: -This bit is for later, but here's a nice bit of documentation to start: +* `turn.example.com` +* `livekit.example.com` +* `call.example.com` -https://sspaeth.de/2024/11/sfu/ +You may already have DNS and TLS for `turn.example.com`, as it is also used +for [coturn](../coturn). +For more inspiraten, check https://sspaeth.de/2024/11/sfu/ -# Install prerequisites - -Define an entry in DNS for Livekit and Call, e.g. `livekit.example.com` -and `call.example.com`. Get certificates for them and make sure to -[automatically renew them](../nginx/README.md#certrenew). - -Expand `.well-known/matrix/client` to contain the pointer to the SFU: - -``` -"org.matrix.msc4143.rtc_foci": [ - { - "type": "livekit", - "livekit_service_url": "https://livekit.example.com" - } - ] -``` - -Create `.well-known/element/element.json`, which is opened by Element-web and -ElementX to find the Element Call widget. It should contain something like -this: - -``` -{ - "call": { - "widget_url": "https://call.example.com" - } -} -``` - -Make sure it is served as `application/json`, just like the other .well-known -files. - - -lk-jwt-service is a small Go program that handles authorization tokens. 
You'll need a -Go compiler, so install that: - -``` -apt install golang -``` - - -# lk-jwt-service {#lkjwt} - -Get the latest source code and comile it (preferably *NOT* as root): - -``` -git clone https://github.com/element-hq/lk-jwt-service.git -cd lk-jwt-service -go build -o lk-jwt-service -``` - -You'll then notice that you need a newer compiler, so we'll download that and add it to -our PATH (again not as root): - -``` -wget https://go.dev/dl/go1.23.3.linux-amd64.tar.gz -tar xvfz go1.23.3.linux-amd64.tar.gz -cd go/bin -export PATH=`pwd`:$PATH -cd -``` - -Now, compile: - -``` -cd lk-jwt-service -go build -o lk-jwt-service -``` - -Copy and chown the binary to `/usr/local/sbin` (yes: as root): - -``` -cp ~user/lk-jwt-service/lk-jwt-service /usr/local/sbin -chown root:root /usr/local/sbin/lk-jwt-service -``` - -Create a service file for systemd, something like this: - -``` -# This thing does authorization for Element Call - -[Unit] -Description=LiveKit JWT Service -After=network.target - -[Service] -Restart=always -User=www-data -Group=www-data -WorkingDirectory=/etc/lk-jwt-service -EnvironmentFile=/etc/lk-jwt-service/config -ExecStart=/usr/local/sbin/lk-jwt-service - -[Install] -WantedBy=multi-user.target -``` - -We read the options from `/etc/lk-jwt-service/config`, -which we make read-only for group `www-data` and non-accessible by anyone -else. - -``` -mkdir /etc/lk-jwt-service -vi /etc/lk-jwt-service/config -chgrp -R www-data /etc/lk-jwt-service -chmod -R o-rwx /etc/lk-jwt-service -``` - -The contents of `/etc/lk-jwt-service/config` are not fully known yet (see -further, installation of the actual LiveKit, the SFU), but for now it's enough -to fill it with this: - -``` -LIVEKIT_URL=wss://livekit.example.com -LIVEKIT_SECRET=xxx -LIVEKIT_KEY=xxx -LK_JWT_PORT=8080 -``` - -Now enable and start this thing: - -``` -systemctl enable --now lk-jwt-service -``` # LiveKit {#livekit} -The actual SFU, Selective Forwarding Unit, is LiveKit. Downloading and -installing is easy: download the [binary from Github](https://github.com/livekit/livekit/releases/download/v1.8.0/livekit_1.8.0_linux_amd64.tar.gz) - to /usr/local/bin, chown -it to root:root and you're done. +The actual SFU, Selective Forwarding Unit, is LiveKit; this is the part that +handles the audio and video feeds and also does TURN (this TURN-functionality +does not support the legacy calls, you'll need [coturn](coturn) for that). + +Downloading and installing is easy: download the [binary from +Github](https://github.com/livekit/livekit/releases/download/v1.8.0/livekit_1.8.0_linux_amd64.tar.gz) + to /usr/local/bin, chown it to root:root and you're done. The quickest way to do precisely that, is to run the script: @@ -159,17 +44,42 @@ curl -sSL https://get.livekit.io | bash You can do this as a normal user, it will use sudo to do its job. -Configuring this thing is [documented -here](https://docs.livekit.io/home/self-hosting/deployment/). +While you're at it, you might consider installing the cli tool as well, you +can use it -for example- to generate tokens so you can [test LiveKit's +connectivity](https://livekit.io/connection-test): -Create a key and secret: +``` +curl -sSL https://get.livekit.io/cli | bash +``` + +Configuring LiveKit is [documented +here](https://docs.livekit.io/home/self-hosting/deployment/). We're going to +run LiveKit under authorization of user `turnserver`, the same users we use +for [coturn](coturn). 
This user is created when installing coturn, so if you +haven't installed that, you should create the user yourself: + +``` +adduser --system turnserver +``` + +## Configure {#keysecret} + +Start by creating a key and secret: ``` livekit-server generate-keys ``` -This key/secret has to be fed to lk-jwt-service, of course. Create a -configuration file for livekit, `/etc/livekit/livekit.yaml`: +This key and secret have to be fed to lk-jwt-service too, [see here](#jwtconfig). +Create the directory for LiveKit's configuration: + +``` +mkdir /etc/livekit +chown root:turnserver /etc/livekit +chmod 750 /etc/livekit +``` + +Create a configuration file for livekit, `/etc/livekit/livekit.yaml`: ``` port: 7880 @@ -190,24 +100,53 @@ turn: udp_port: 3478 external_tls: true keys: - # KEY: secret were autogenerated by livekit/generate - # in the lk-jwt-service environment variables - xxxxxxxxxxxxxxx: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx + # KEY: SECRET were generated by "livekit-server generate-keys" + : ``` -The LiveKit API listens on localhost, IPv6, port 7880. Traffic to this port is -forwarded from port 443by nginx, which handles TLS, so it shouldn't be reachable -from the outside world. +Being a bit paranoid: make sure LiveKit can only read this file, not write it: -The certificate files are not in the usual place under +``` +chown root:turnserver /etc/livekit/livekit.yaml +chmod 640 /etc/livekit/livekit.yaml +``` + +Port `7880` is forwarded by nginx: authentication is also done there, and that +bit has to be forwarded to `lk-jwt-service` on port `8080`. Therefore, we +listen only on localhost. + +The TURN ports are the normal, default ones. If you also use coturn, make sure +it doesn't use the same ports as LiveKit. Also, make sure you open the correct +ports in the [firewall](../firewall). + + +## TLS certificate + +The TLS-certificate files are not in the usual place under `/etc/letsencrypt/live`, see [DNS and -certificate (coturn)](../coturn/README.md#dnscert) why that is. +certificate](../coturn/README.md#dnscert) under coturn why that is. -The `xxx: xxxx` is the key and secret as generated before. +As stated before, we use the same user as for coturn. Because this user does +not have the permission to read private keys under `/etc/letsencrypt`, we copy +those files to a place where it can read them. For coturn we copy them to +`/etc/coturn/ssl`, and if you use coturn and have this directory, LiveKit can +read them there too. + +If you don't have coturn installed, you should create a directory under +`/etc/livekit` and copy the files to there. Modify the `livekit.yaml` file and +the [script to copy the files](../coturn/README.md#fixssl) to use that +directory. Don't forget to update the `renew_hook` in Letsencrypt if you do. + +The LiveKit API listens on localhost, IPv6, port 7880. Traffic to this port is +forwarded from port 443 by nginx, which handles TLS, so it shouldn't be reachable +from the outside world. See [LiveKit's config documentation](https://github.com/livekit/livekit/blob/master/config-sample.yaml) for more options. + +## Systemd + Now define a systemd servicefile, like this: ``` @@ -230,11 +169,125 @@ WantedBy=multi-user.target Enable and start it. -IMPORTANT! +Clients don't know about LiveKit yet, you'll have to give them the information +via the `.well-known/matrix/client`: add this bit to it to point them at the +SFU: -LiveKit is configured to use its built-in TURN server, using the same ports as -[coturn](../coturn). 
Obviously, LiveKit and coturn are mutually exclusive in -this setup. Shutdown and disable coturn if you use LiveKit's TURN server. +``` +"org.matrix.msc4143.rtc_foci": [ + { + "type": "livekit", + "livekit_service_url": "https://livekit.example.com" + } + ] +``` + +Make sure it is served as `application/json`, just like the other .well-known +files. + + +# lk-jwt-service {#lkjwt} + +lk-jwt-service is a small Go program that handles authorization tokens for use with LiveKit. +You'll need a Go compiler, but the one Debian provides is too old (at the time +of writing this, at least), so we'll install the latest one manually. Check +[the Go website](https://go.dev/dl/) to see which version is the latest, at +the time of writing it's 1.23.3, so we'll install that: + +``` +wget https://go.dev/dl/go1.23.3.linux-amd64.tar.gz +tar xvfz go1.23.3.linux-amd64.tar.gz +cd go/bin +export PATH=`pwd`:$PATH +cd +``` + +This means you now have the latest Go compiler in your path, but it's not +installed system-wide. If you want that, copy the whole `go` directory to +`/usr/local` and add `/usr/local/go/bin` to everybody's $PATH. + +Get the latest lk-jwt-service source code and comile it (preferably *NOT* as root): + +``` +git clone https://github.com/element-hq/lk-jwt-service.git +cd lk-jwt-service +go build -o lk-jwt-service +``` + +Now, compile: + +``` +cd lk-jwt-service +go build -o lk-jwt-service +``` + +Copy and chown the binary to `/usr/local/sbin` (yes: as root): + +``` +cp ~user/lk-jwt-service/lk-jwt-service /usr/local/sbin +chown root:root /usr/local/sbin/lk-jwt-service +``` + + +## Systemd + +Create a service file for systemd, something like this: + +``` +# This thing does authorization for Element Call + +[Unit] +Description=LiveKit JWT Service +After=network.target + +[Service] +Restart=always +User=www-data +Group=www-data +WorkingDirectory=/etc/lk-jwt-service +EnvironmentFile=/etc/lk-jwt-service/config +ExecStart=/usr/local/sbin/lk-jwt-service + +[Install] +WantedBy=multi-user.target +``` + +## Configuration {#jwtconfig} + +We read the options from `/etc/lk-jwt-service/config`, +which we make read-only for group `www-data` and non-accessible by anyone +else. + +``` +mkdir /etc/lk-jwt-service +vi /etc/lk-jwt-service/config +chgrp -R root:www-data /etc/lk-jwt-service +chmod 750 /etc/lk-jwt-service +``` + +This is what you should put into that config file, +`/etc/lk-jwt-service/config`. The `LIVEKIT_SECRET` and `LIVEKIT_KEY` are the +ones you created while [configuring LiveKit](#keysecret). + +``` +LIVEKIT_URL=wss://livekit.example.com +LIVEKIT_SECRET=xxx +LIVEKIT_KEY=xxx +LK_JWT_PORT=8080 +``` + +Change the permission accordingly: + +``` +chown root:www-data /etc/lk-jwt-service/config +chmod 640 /etc/lk-jwt-service/config +``` + +Now enable and start this thing: + +``` +systemctl enable --now lk-jwt-service +``` # Element Call widget {#widget} @@ -263,6 +316,9 @@ sudo apt install yarnpkg /usr/share/nodejs/yarn/bin/yarn install ``` +Yes, this whole Node.js, yarn and npm thing is a mess. Better documentation +could be written, but for now this will have to do. + Now clone the Element Call repository and "compile" stuff (again: not as root): @@ -273,8 +329,12 @@ cd element-call /usr/share/nodejs/yarn/bin/yarn build ``` -After that, you can find the whole shebang under "dist". Copy that to -`/var/www/element-call` and point nginx to it ([see nginx](../nginx#callwidget)). +If it successfully compiles (warnings are more or less ok, errors aren't), you will +find the whole shebang under "dist". 
Copy that to `/var/www/element-call` and point +nginx to it ([see nginx](../nginx#callwidget)). + + +## Configuring It needs a tiny bit of configuring. The default configuration under `config/config.sample.json` is a good place to start, copy it to `/etc/element-call` and change where @@ -300,3 +360,16 @@ necessary: "eula": "https://www.example.com/online-EULA.pdf" } ``` + +Now tell the clients about this widget. Create +`.well-known/element/element.json`, which is opened by Element Web, Element Desktop +and ElementX to find the Element Call widget. It should look this: + +``` +{ + "call": { + "widget_url": "https://call.example.com" + } +} +``` + diff --git a/matrix/element-call/element.json b/matrix/element-call/element.json new file mode 100644 index 00000000..78857259 --- /dev/null +++ b/matrix/element-call/element.json @@ -0,0 +1,6 @@ +{ + "call": + { + "widget_url": "https://call.example.com" + } +} diff --git a/matrix/firewall/README.md b/matrix/firewall/README.md index 461c9579..9e1ba33b 100644 --- a/matrix/firewall/README.md +++ b/matrix/firewall/README.md @@ -1,21 +1,25 @@ # Firewall -This page is mostly a placeholder for now, but configuration of the firewall -is -of course- very important. +Several ports need to be opened in the firewall, this is a list of all ports +that are needed by the components we describe in this document. -First idea: the ports that need to be opened are: +Those for nginx are necessary for Synapse to work, the ones for coturn and +LiveKit only need to be opened if you run those servers. | Port(s) / range | IP version | Protocol | Application | | :-------------: | :--------: | :------: | :--------------------- | | 80, 443 | IPv4/IPv6 | TCP | nginx, reverse proxy | | 8443 | IPv4/IPv6 | TCP | nginx, federation | -| 7881 | IPv4/IPv6 | TCP/UDP | coturn/LiveKit TURN | -| 3478 | IPv4 | UDP | coturn/LiveKit TURN | -| 5349 | IPv4 | TCP | coturn/LiveKit TURN | -| 50000-60000 | IPv4 | TCP/UDP | coturn/LiveKit TURN | +| 3478 | IPv4 | UDP | LiveKit TURN | +| 5349 | IPv4 | TCP | LiveKit TURN TLS | +| 7881 | IPv4/IPv6 | TCP | LiveKit RTC | +| 50000-60000 | IPv4/IPv6 | TCP/UDP | LiveKit RTC | +| 3480 | IPv4 | TCP/UDP | coturn TURN | +| 5351 | IPv4 | TCP/UDP | coturn TURN TLS | +| 40000-49999 | IPv4 | TCP/UDP | coturn RTC | -The ports necessary for TURN depend very much on the specific configuration -[coturn](../coturn#configuration) or [LiveKit](../element-call#livekit). +The ports necessary for TURN depend very much on the specific configuration of +[coturn](../coturn#configuration) and/or [LiveKit](../element-call#livekit). diff --git a/matrix/nginx/README.md b/matrix/nginx/README.md index 63afd7f2..18b55381 100644 --- a/matrix/nginx/README.md +++ b/matrix/nginx/README.md @@ -49,7 +49,7 @@ list-timers` lists `certbot.timer`. However, renewing the certificate means you'll have to restart the software that's using it. We have 2 or 3 pieces of software that use certificates: -[coturn](../cotorun) and/or [LiveKit](../livekit), and [nginx](../nginx). +[coturn](../coturn) and/or [LiveKit](../element-call#livekit), and [nginx](../nginx). Coturn/LiveKit are special with regards to the certificate, see their respective pages. For nginx it's pretty easy: tell Letsencrypt to restart it @@ -167,6 +167,54 @@ This is a very, very basic configuration; just enough to give us a working service. See this [complete example](revproxy.conf) which also includes [Draupnir](../draupnir) and a protected admin endpoint. 
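+
+To give an idea of what such a protected admin endpoint can look like, here is
+a minimal sketch. It assumes a monolithic Synapse listening on
+`localhost:8008` and uses placeholder addresses; the linked example is more
+complete:
+
+```
+# Only allow the Synapse admin API from trusted addresses
+location /_synapse/admin {
+    allow 127.0.0.1;
+    allow ::1;
+    allow 12.23.45.78;       # substitute your own trusted address(es)
+    deny all;
+
+    proxy_pass http://localhost:8008;
+    proxy_set_header Host $host;
+    proxy_set_header X-Forwarded-For $remote_addr;
+    proxy_set_header X-Forwarded-Proto $scheme;
+}
+```
+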
+# Element Web + +You can host the webclient on a different machine, but we'll run it on the +same one in this documentation. You do need a different FQDN however, you +can't host it under the same name as Synapse, such as: +``` +https://matrix.example.com/element-web +``` +So you'll need to create an entry in DNS and get a TLS-certificate for it (as +mentioned in the [checklist](../checklist.md)). + +Other than that, configuration is quite simple. We'll listen on both http and +https, and redirect http to https: + +``` +server { + listen 80; + listen [::]:80; + listen 443 ssl http2; + listen [::]:443 ssl http2; + + ssl_certificate /etc/letsencrypt/live/element.example.com/fullchain.pem; + ssl_certificate_key /etc/letsencrypt/live/element.example.com/privkey.pem; + include /etc/letsencrypt/options-ssl-nginx.conf; + ssl_dhparam /etc/ssl/dhparams.pem; + + server_name element.example.com; + + location / { + if ($scheme = http) { + return 301 https://$host$request_uri; + } + add_header X-Frame-Options SAMEORIGIN; + add_header X-Content-Type-Options nosniff; + add_header X-XSS-Protection "1; mode=block"; + add_header Content-Security-Policy "frame-ancestors 'self'"; + } + + root /usr/share/element-web; + index index.html; + + access_log /var/log/nginx/elementweb-access.log; + error_log /var/log/nginx/elementweb-error.log; +} +``` + +This assumes Element Web is installed under `/usr/share/element-web`, as done +by the Debian package provided by Element.io. # Synapse-admin {#synapse-admin} diff --git a/matrix/nginx/call.conf b/matrix/nginx/conf/call.conf similarity index 100% rename from matrix/nginx/call.conf rename to matrix/nginx/conf/call.conf diff --git a/matrix/nginx/domain.conf b/matrix/nginx/conf/domain.conf similarity index 100% rename from matrix/nginx/domain.conf rename to matrix/nginx/conf/domain.conf diff --git a/matrix/nginx/elementweb.conf b/matrix/nginx/conf/elementweb.conf similarity index 75% rename from matrix/nginx/elementweb.conf rename to matrix/nginx/conf/elementweb.conf index 79181e3d..9784ffe9 100644 --- a/matrix/nginx/elementweb.conf +++ b/matrix/nginx/conf/elementweb.conf @@ -1,8 +1,8 @@ server { listen 80; listen [::]:80; - listen 443 ssl; - listen [::]:443 ssl; + listen 443 ssl http2; + listen [::]:443 ssl http2; ssl_certificate /etc/letsencrypt/live/element.example.com/fullchain.pem; ssl_certificate_key /etc/letsencrypt/live/element.example.com/privkey.pem; @@ -14,9 +14,9 @@ server { location / { if ($scheme = http) { return 301 https://$host$request_uri; - } + } add_header X-Frame-Options SAMEORIGIN; - add_header X-Content-Type-Options nosniff; + add_header X-Content-Type-Options nosniff; add_header X-XSS-Protection "1; mode=block"; add_header Content-Security-Policy "frame-ancestors 'self'"; } @@ -24,6 +24,6 @@ server { root /usr/share/element-web; index index.html; - access_log /var/log/nginx/element-access.log; - error_log /var/log/nginx/element-error.log; + access_log /var/log/nginx/elementweb-access.log; + error_log /var/log/nginx/elementweb-error.log; } diff --git a/matrix/nginx/livekit.conf b/matrix/nginx/conf/livekit.conf similarity index 100% rename from matrix/nginx/livekit.conf rename to matrix/nginx/conf/livekit.conf diff --git a/matrix/nginx/revproxy.conf b/matrix/nginx/conf/revproxy.conf similarity index 100% rename from matrix/nginx/revproxy.conf rename to matrix/nginx/conf/revproxy.conf diff --git a/matrix/nginx/synapse-admin.conf b/matrix/nginx/conf/synapse-admin.conf similarity index 100% rename from matrix/nginx/synapse-admin.conf rename to 
matrix/nginx/conf/synapse-admin.conf diff --git a/matrix/nginx/workers/README.md b/matrix/nginx/workers/README.md new file mode 100644 index 00000000..e01edb0d --- /dev/null +++ b/matrix/nginx/workers/README.md @@ -0,0 +1,397 @@ +--- +gitea: none +include_toc: true +--- + +# Reverse proxy for Synapse with workers + +Changing nginx's configuration from a reverse proxy for a normal, monolithic +Synapse to one for a Synapse that uses workers, is a big thing: quite a lot has to +be changed. + +As mentioned in [Synapse with workers](../../synapse/workers/README.md#synapse), +we're changing the "backend" from network sockets to UNIX sockets. + +Because we're going to have to forward a lot of specific requests to all kinds +of workers, we'll split the configuration into a few bits: + +* all `proxy_forward` settings +* all `location` definitions +* maps that define variables +* upstreams that point to the correct socket(s) with the correct settings +* settings for private access +* connection optimizations + +Some of these go into `/etc/nginx/conf.d` because they are part of the +configuration of nginx itself, others go into `/etc/nginx/snippets` because we +need to include them several times in different places. + +**Important consideration** + +This part isn't a quick "put these files in place and you're done": a +worker-based Synapse is tailor-made, there's no one-size-fits-all. This +documentation gives hints and examples, but in the end it's you who has to +decide what types of workers to use and how many, all depending on your +specific use case and the available hardware. + + + + +# Optimizations + +In the quest for speed, we are going to tweak several settings in nginx. To +keep things manageable, most of those tweaks go into separate configuration +files that are either automatically included (those under `/etc/nginx/conf.d`) +or explicitly where we need them (those under `/etc/nginx/snippets`). + +Let's start with a few settings that affect nginx as a whole. Edit these +options in `/etc/nginx/nginx.conf`: + +``` +pcre_jit on; +worker_rlimit_nofile 8192; +worker_connections 4096; +multi_accept off; +gzip_comp_level 2; +gzip_types application/javascript application/json application/x-javascript application/xml application/xml+rss image/svg+xml text/css text/javascript text/plain text/xml; +gzip_min_length 1000; +gzip_disable "MSIE [1-6]\."; +``` + +We're going to use lots of regular expressions in our config, `pcre_jit on` +speeds those up considerably. Workers get 8K open files, and we want 4096 +workers instead of the default 768. Workers can only accept one connection, +which is (in almost every case) proxy_forwarded, so we set `multi_accept off`. + +We change `gzip_comp_level` from 6 to 2, we expand the list of content that is +to be gzipped, and don't zip anything shorter than 1000 characters, instead of +the default 20. MSIE can take a hike... + +These are tweaks for the connection, save this in `/etc/ngnix/conf.d/conn_optimize.conf`. 
+ +``` +client_body_buffer_size 32m; +client_header_buffer_size 32k; +client_max_body_size 1g; +http2_max_concurrent_streams 128; +keepalive_timeout 65; +keepalive_requests 100; +large_client_header_buffers 4 16k; +server_names_hash_bucket_size 128; +tcp_nodelay on; +server_tokens off; +``` + +We set a few proxy settings that we use in proxy_forwards other than to our +workers, save this to `conf.d/proxy_optimize.conf`: + +``` +proxy_buffer_size 128k; +proxy_buffers 4 256k; +proxy_busy_buffers_size 256k; +``` + +For every `proxy_forward` to our workers, we want to configure several settings, +and because we don't want to include the same list of settings every time, we put +all of them in one snippet of code, that we can include every time we need it. + +Create `/etc/nginx/snippets/proxy.conf` and put this in it: + +``` +proxy_connect_timeout 2s; +proxy_buffering off; +proxy_http_version 1.1; +proxy_read_timeout 3600s; +proxy_redirect off; +proxy_send_timeout 120s; +proxy_socket_keepalive on; +proxy_ssl_verify off; + +proxy_set_header Accept-Encoding ""; +proxy_set_header Host $host; +proxy_set_header X-Forwarded-For $remote_addr; +proxy_set_header X-Forwarded-Proto $scheme; +proxy_set_header Connection $connection_upgrade; +proxy_set_header Upgrade $http_upgrade; + +client_max_body_size 50M; +``` + +Every time we use a `proxy_forward`, we include this snippet. There are 2 more +things we might set: trusted locations that can use the admin endpoints, and a +dedicated DNS-recursor. We include the `snippets/private.conf` in the +forwards to admin endpoints, so that not the entire Internet can play with it. +The dedicated nameserver is something you really want, because synchronising a +large room can easily result in 100.000+ DNS requests. You'll hit flood +protection on most servers if you do that. + +List the addresses from which you want to allow admin access in +`snippets/private.conf`: + +``` +allow 127.0.0.1; +allow ::1; +allow 12.23.45.78; +allow 87.65.43.21; +allow dead:beef::/48; +allow 2a10:1234:abcd::1; +deny all; +satisfy all; +``` + +Of course, subsitute these random addresses for the ones you trust. The +dedicated nameserver (if you have one, which is strongly recommended) should +be configured in `conf.d/resolver.conf`: + +``` +resolver [::1] 127.0.0.1 valid=60; +resolver_timeout 10s; +``` + + +# Maps {#maps} + +A map sets a variable based on, usually, another variable. One case we use this +is in determining the type of sync a client is doing. A normal sync, simply +updating an existing session, is a rather lightweight operation. An initial sync, +meaning a full sync because the session is brand new, is not so lightweight. + +A normal sync can be recognised by the `since` bit in the request: it tells +the server when its last sync was. If there is no `since`, we're dealing with +an initial sync. + +We want to forward requests for normal syncs to the `normal_sync` workers, and +the initial syncs to the `initial_sync` workers. + +We decide to which type of worker to forward the sync request to by looking at +the presence or absence of `since`: if it's there, it's a normal sync and we +set the variable `$sync` to `normal_sync`. If it's not there, we set `$sync` to +`initial_sync`. The content of `since` is irrelevant for nginx. + +This is what the map looks like: + +``` +map $arg_since $sync { + default normal_sync; + '' initial_sync; +} +``` + +We evaluate `$arg_since` to set `$sync`: `$arg_since` is nginx's variable `$arg_` +followed by `since`, the argument we want. 
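+For example, a request for `/_matrix/client/v3/sync?since=s72594_4483_1934`
+has a non-empty `$arg_since`, so it is forwarded to `normal_sync`; a plain
+`/_matrix/client/v3/sync` without a `since` argument ends up at `initial_sync`.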
See [the index of +variables in nginx](https://nginx.org/en/docs/varindex.html) for more +variables we can use in nginx. + +By default we set `$sync` to `normal_sync`, unless the argument `since` is +empty (absent); then we set it to `initial_sync`. + +After this mapping, we forward the request to the correct worker like this: + +``` +proxy_pass http://$sync; +``` + +See a complete example of maps in the file [maps.conf](maps.conf). + + +# Upstreams + +In our configuration, nginx is not only a reverse proxy, it's also a load balancer. +Just like what `haproxy` does, it can forward requests to "servers" behind it. +Such a server is the inbound UNIX socket of a worker, and there can be several +of them in one group. + +Let's start with a simple one, the `login` worker, that handles the login +process for clients. There's only one worker, so only one socket: + +``` +upstream login { + server unix:/run/matrix-synapse/inbound_login.sock max_fails=0; + keepalive 10; +} +``` + +Ater this definition, we can forward traffic to `login`. What traffic to +forward is decided in the `location` statements, see further. + +## Synchronisation + +A more complex example are the sync workers. Under [Maps](#Maps) we split sync +requests into two different types; those different types are handled by +different worker pools. In our case we have 2 workers for the initial_sync +requests, and 3 for the normal ones: + +``` +upstream initial_sync { + hash $mxid_localpart consistent; + server unix:/run/matrix-synapse/inbound_initial_sync1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_initial_sync2.sock max_fails=0; + keepalive 10; +} + +upstream normal_sync { + hash $mxid_localpart consistent; + server unix:/run/matrix-synapse/inbound_normal_sync1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_normal_sync2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_normal_sync3.sock max_fails=0; + keepalive 10; +} +``` + +The `hash` bit is to make sure that request from one user are consistently +forwarded to the same worker. We filled the variable `$mxid_localpart` in the +maps. + +## Federation + +Something similar goes for the federation workers. Some requests need to go +to the same worker as all the other requests from the same IP-addres, other +can go to any of these workers. + +We define two upstreams with the same workers, only with different names and +the explicit IP-address ordering for one: + +``` +upstream incoming_federation { + server unix:/run/matrix-synapse/inbound_federation_reader1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader3.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader4.sock max_fails=0; + keepalive 10; +} + +upstream federation_requests { + hash $remote_addr consistent; + server unix:/run/matrix-synapse/inbound_federation_reader1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader3.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader4.sock max_fails=0; + keepalive 10; +} +``` + +Same workers, different handling. See how we forward requests in the next +paragraph. + +See [upstreams.conf](upstreams.conf) for a complete example. + + +# Locations + +Now that we have defined the workers and/or worker pools, we have to forward +the right traffic to the right workers. 
The Synapse documentation about +[available worker +types](https://element-hq.github.io/synapse/latest/workers.html#available-worker-applications) +lists which endpoints a specific worker type can handle. + +## Login + +Let's forward login requests to our login worker. The [documentation for the +generic_worker](https://element-hq.github.io/synapse/latest/workers.html#synapseappgeneric_worker) +says these endpoints are for registration and login: + +``` +# Registration/login requests +^/_matrix/client/(api/v1|r0|v3|unstable)/login$ +^/_matrix/client/(r0|v3|unstable)/register$ +^/_matrix/client/(r0|v3|unstable)/register/available$ +^/_matrix/client/v1/register/m.login.registration_token/validity$ +^/_matrix/client/(r0|v3|unstable)/password_policy$ +``` + +We forward that to our worker with this `location` definition, using the +`proxy_forward` settings we defined earlier: + +``` +location ~ ^(/_matrix/client/(api/v1|r0|v3|unstable)/login|/_matrix/client/(r0|v3|unstable)/register|/_matrix/client/(r0|v3|unstable)/register/available|/_matrix/client/v1/register/m.login.registration_token/validity|/_matrix/client/(r0|v3|unstable)/password_policy)$ { + include snippets/proxy.conf; + proxy_pass http://login; +} +``` + +## Synchronisation + +The docs say that the `generic_worker` can handle these requests for synchronisation +requests: + +``` +# Sync requests +^/_matrix/client/(r0|v3)/sync$ +^/_matrix/client/(api/v1|r0|v3)/events$ +^/_matrix/client/(api/v1|r0|v3)/initialSync$ +^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$ +``` + +We forward those to our 2 worker pools making sure the heavy initial syncs go +to the `initial_sync` pool, and the normal ones to `normal_sync`. We use the +variable `$sync`for that, which we defined in maps.conf. + +``` +# Normal/initial sync +location ~ ^/_matrix/client/(r0|v3)/sync$ { + include snippets/proxy.conf; + proxy_pass http://$sync; +} + +# Normal sync +location ~ ^/_matrix/client/(api/v1|r0|v3)/events$ { + include snippets/proxy.conf; + proxy_pass http://normal_sync; +} + +# Initial sync +location ~ ^(/_matrix/client/(api/v1|r0|v3)/initialSync|/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync)$ { + include snippets/proxy.conf; + proxy_pass http://initial_sync; +} +``` + +## Media + +The media worker is slightly different: some parts are public, but a few bits +are admin stuff. We split those, and limit the admin endpoints to the trusted +addresses we defined earlier: + +``` +# Media, public +location ~* ^(/_matrix/((client|federation)/[^/]+/)media/|/_matrix/media/v3/upload/) { + include snippets/proxy.conf; + proxy_pass http://media; +} + +# Media, admin +location ~ ^/_synapse/admin/v1/(purge_)?(media(_cache)?|room|user|quarantine_media|users)/[\s\S]+|media$ { + include snippets/private.conf; + include snippets/proxy.conf; + proxy_pass http://media; +} +``` + +# Federation + +Federation is done by two types of workers: one pool for requests from our +server to the rest of the world, and one pool for everything coming in from the +outside world. Only the latter is relevant for nginx. + +The documentation mentions two different types of federation: +* Federation requests +* Inbound federation transaction request + +The second is special, in that requests for that specific endpoint must be +balanced by IP-address. The "normal" federation requests can be sent to any +worker. 
We're sending all these requests to the same workers, but we make sure +to always send requests from 1 IP-address to the same worker: + +``` +# Federation readers +location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/(v1|v2)/send_join/|/_matrix/federation/(v1|v2)/send_leave/|/_matrix/federation/v1/make_knock/|/_matrix/federation/v1/send_knock/|/_matrix/federation/(v1|v2)/invite/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/timestamp_to_event/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/key/v2/query|/_matrix/federation/v1/hierarchy/) { + include snippets/proxy.conf; + proxy_pass http://incoming_federation; +} +# Inbound federation transactions +location ~ ^/_matrix/federation/v1/send/ { + include snippets/proxy.conf; + proxy_pass http://federation_requests; +} +``` + diff --git a/matrix/nginx/workers/conn_optimizations.conf b/matrix/nginx/workers/conn_optimizations.conf new file mode 100644 index 00000000..6822bc25 --- /dev/null +++ b/matrix/nginx/workers/conn_optimizations.conf @@ -0,0 +1,13 @@ +# These settings optimize the connection handling. Store this file under /etc/nginx/conf.d, because +# it should be loaded by default. + +client_body_buffer_size 32m; +client_header_buffer_size 32k; +client_max_body_size 1g; +http2_max_concurrent_streams 128; +keepalive_timeout 65; +keepalive_requests 100; +large_client_header_buffers 4 16k; +server_names_hash_bucket_size 128; +tcp_nodelay on; +server_tokens off; diff --git a/matrix/nginx/workers/locations.conf b/matrix/nginx/workers/locations.conf new file mode 100644 index 00000000..b7adf25c --- /dev/null +++ b/matrix/nginx/workers/locations.conf @@ -0,0 +1,111 @@ +# This file describes the forwarding of (almost) every endpoint to a worker or pool of +# workers. This file should go in /etc/nginx/snippets, because we need to load it once, on +# the right place in our site-definition. 
+ +# Account-data +location ~ ^(/_matrix/client/(r0|v3|unstable)/.*/tags|/_matrix/client/(r0|v3|unstable)/.*/account_data) { + include snippets/proxy.conf; + proxy_pass http://account_data; +} + +# Typing +location ~ ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/typing { + include snippets/proxy.conf; + proxy_pass http://typing; +} + +# Receipts +location ~ ^(/_matrix/client/(r0|v3|unstable)/rooms/.*/receipt|/_matrix/client/(r0|v3|unstable)/rooms/.*/read_markers) { + include snippets/proxy.conf; + proxy_pass http://receipts; +} + +# Presence +location ~ ^/_matrix/client/(api/v1|r0|v3|unstable)/presence/ { + include snippets/proxy.conf; + proxy_pass http://presence; +} + +# To device +location ~ ^/_matrix/client/(r0|v3|unstable)/sendToDevice/ { + include snippets/proxy.conf; + proxy_pass http://todevice; +} + +# Push rules +location ~ ^/_matrix/client/(api/v1|r0|v3|unstable)/pushrules/ { + include snippets/proxy.conf; + proxy_pass http://push_rules; +} + +# Userdir +location ~ ^/_matrix/client/(r0|v3|unstable)/user_directory/search$ { + include snippets/proxy.conf; + proxy_pass http://userdir; +} + +# Media, users1 +location ~* ^/_matrix/((client|federation)/[^/]+/)media/ { + include snippets/proxy.conf; + proxy_pass http://media; +} +# Media, users2 +location ~* ^/_matrix/media/v3/upload { + include snippets/proxy.conf; + proxy_pass http://media; +} + +# Media, admin +location ~ ^/_synapse/admin/v1/(purge_)?(media(_cache)?|room|user|quarantine_media|users)/[\s\S]+|media$ { + include snippets/private.conf; + include snippets/proxy.conf; + proxy_pass http://media; +} + +# Login +location ~ ^(/_matrix/client/(api/v1|r0|v3|unstable)/login|/_matrix/client/(r0|v3|unstable)/register|/_matrix/client/(r0|v3|unstable)/register/available|/_matrix/client/v1/register/m.login.registration_token/validity|/_matrix/client/(r0|v3|unstable)/password_policy)$ { + include snippets/proxy.conf; + proxy_pass http://login; +} + +# Normal/initial sync: +# To which upstream to pass the request depends on the map "$sync" +location ~ ^/_matrix/client/(r0|v3)/sync$ { + include snippets/proxy.conf; + proxy_pass http://$sync; +} +# Normal sync: +# These endpoints are used for normal syncs +location ~ ^/_matrix/client/(api/v1|r0|v3)/events$ { + include snippets/proxy.conf; + proxy_pass http://normal_sync; +} +# Initial sync: +# These endpoints are used for initial syncs +location ~ ^/_matrix/client/(api/v1|r0|v3)/initialSync$ { + include snippets/proxy.conf; + proxy_pass http://initial_sync; +} +location ~ ^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$ { + include snippets/proxy.conf; + proxy_pass http://initial_sync; +} + +# Federation +# All the "normal" federation stuff: +location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/(v1|v2)/send_join/|/_matrix/federation/(v1|v2)/send_leave/|/_matrix/federation/v1/make_knock/|/_matrix/federation/v1/send_knock/|/_matrix/federation/(v1|v2)/invite/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/timestamp_to_event/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/key/v2/query|/_matrix/federation/v1/hierarchy/) { + include snippets/proxy.conf; + proxy_pass http://incoming_federation; +} +# Inbound federation transactions: +location 
~ ^/_matrix/federation/v1/send/ { + include snippets/proxy.conf; + proxy_pass http://federation_requests; +} + + +# Main thread for all the rest +location / { + include snippets/proxy.conf; + proxy_pass http://inbound_main; + diff --git a/matrix/nginx/workers/maps.conf b/matrix/nginx/workers/maps.conf new file mode 100644 index 00000000..702da847 --- /dev/null +++ b/matrix/nginx/workers/maps.conf @@ -0,0 +1,55 @@ +# These maps set all kinds of variables we can use later in our configuration. This fil +# should be stored under /etc/nginx/conf.d so that it is loaded whenever nginx starts. + +# List of allowed origins, can only send one. +map $http_origin $allow_origin { + ~^https?://element.example.com$ $http_origin; + ~^https?://call.example.com$ $http_origin; + ~^https?://someserver.example.com$ $http_origin; + # NGINX won't set empty string headers, so if no match, header is unset. + default ""; +} + +# Client username from MXID +map $http_authorization $mxid_localpart { + default $http_authorization; + "~Bearer syt_(?.*?)_.*" $username; + "" $accesstoken_from_urlparam; +} + +# Whether to upgrade HTTP connection +map $http_upgrade $connection_upgrade { + default upgrade; + '' close; +} + +#Extract room name from URI +map $request_uri $room_name { + default "not_room"; + "~^/_matrix/(client|federation)/.*?(?:%21|!)(?[\s\S]+)(?::|%3A)(?[A-Za-z0-9.\-]+)" "!$room:$domain"; +} + +# Choose sync worker based on the existence of "since" query parameter +map $arg_since $sync { + default normal_sync; + '' initial_sync; +} + +# Extract username from access token passed as URL parameter +map $arg_access_token $accesstoken_from_urlparam { + # Defaults to just passing back the whole accesstoken + default $arg_access_token; + # Try to extract username part from accesstoken URL parameter + "~syt_(?.*?)_.*" $username; +} + +# Extract username from access token passed as authorization header +map $http_authorization $mxid_localpart { + # Defaults to just passing back the whole accesstoken + default $http_authorization; + # Try to extract username part from accesstoken header + "~Bearer syt_(?.*?)_.*" $username; + # if no authorization-header exist, try mapper for URL parameter "access_token" + "" $accesstoken_from_urlparam; +} + diff --git a/matrix/nginx/workers/private.conf b/matrix/nginx/workers/private.conf new file mode 100644 index 00000000..461857ae --- /dev/null +++ b/matrix/nginx/workers/private.conf @@ -0,0 +1,13 @@ +# This file defines the "safe" IP addresses that are allowed to use the admin endpoints +# of our installation. Store this file under /etc/nginx/snippets, so you can load it on +# demand for the bits you want/need to protect. + +allow 127.0.0.1; +allow ::1; +allow 12.23.45.78; +allow 87.65.43.21; +allow dead:beef::/48; +allow 2a10:1234:abcd::1; +deny all; +satisfy all; + diff --git a/matrix/nginx/workers/proxy.conf b/matrix/nginx/workers/proxy.conf new file mode 100644 index 00000000..4c3dbc54 --- /dev/null +++ b/matrix/nginx/workers/proxy.conf @@ -0,0 +1,8 @@ +# These are a few proxy settings that should be default. These are not used in the proxy_forward to +# our workers, we don't want buffering there. Store this file under /etc/nginx/conf.d because it contains +# defaults. 
+ +proxy_buffer_size 128k; +proxy_buffers 4 256k; +proxy_busy_buffers_size 256k; + diff --git a/matrix/nginx/workers/proxy_forward.conf b/matrix/nginx/workers/proxy_forward.conf new file mode 100644 index 00000000..95bd3c25 --- /dev/null +++ b/matrix/nginx/workers/proxy_forward.conf @@ -0,0 +1,20 @@ +# Settings that we want for every proxy_forward to our workers. This file should live +# under /etc/nginx/snippets, because it should not be loaded automatically but on demand. + +proxy_connect_timeout 2s; +proxy_buffering off; +proxy_http_version 1.1; +proxy_read_timeout 3600s; +proxy_redirect off; +proxy_send_timeout 120s; +proxy_socket_keepalive on; +proxy_ssl_verify off; + +proxy_set_header Accept-Encoding ""; +proxy_set_header Host $host; +proxy_set_header X-Forwarded-For $remote_addr; +proxy_set_header X-Forwarded-Proto $scheme; +proxy_set_header Connection $connection_upgrade; +proxy_set_header Upgrade $http_upgrade; + +client_max_body_size 50M; diff --git a/matrix/nginx/workers/upstreams.conf b/matrix/nginx/workers/upstreams.conf new file mode 100644 index 00000000..a9120301 --- /dev/null +++ b/matrix/nginx/workers/upstreams.conf @@ -0,0 +1,116 @@ +# Stream workers first, they are special. The documentation says: +# "each stream can only have a single writer" + +# Account-data +upstream account_data { + server unix:/run/matrix-synapse/inbound_accountdata.sock max_fails=0; + keepalive 10; +} + +# Userdir +upstream userdir { + server unix:/run/matrix-synapse/inbound_userdir.sock max_fails=0; + keepalive 10; +} + +# Typing +upstream typing { + server unix:/run/matrix-synapse/inbound_typing.sock max_fails=0; + keepalive 10; +} + +# To device +upstream todevice { + server unix:/run/matrix-synapse/inbound_todevice.sock max_fails=0; + keepalive 10; +} + +# Receipts +upstream receipts { + server unix:/run/matrix-synapse/inbound_receipts.sock max_fails=0; + keepalive 10; +} + +# Presence +upstream presence { + server unix:/run/matrix-synapse/inbound_presence.sock max_fails=0; + keepalive 10; +} + +# Push rules +upstream push_rules { + server unix:/run/matrix-synapse/inbound_push_rules.sock max_fails=0; + keepalive 10; +} + +# End of the stream workers, the following workers are of a "normal" type + +# Media +# If more than one media worker is used, they *must* all run on the same machine +upstream media { + server unix:/run/matrix-synapse/inbound_mediaworker.sock max_fails=0; + keepalive 10; +} + +# Synchronisation by clients: + +# Normal sync. 
Not particularly heavy, but happens a lot +upstream normal_sync { + # Use the username mapper result for hash key + hash $mxid_localpart consistent; + server unix:/run/matrix-synapse/inbound_normal_sync1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_normal_sync2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_normal_sync3.sock max_fails=0; + keepalive 10; +} +# Initial sync +# Much heavier than a normal sync, but happens less often +upstream initial_sync { + # Use the username mapper result for hash key + hash $mxid_localpart consistent; + server unix:/run/matrix-synapse/inbound_initial_sync1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_initial_sync2.sock max_fails=0; + keepalive 10; +} + +# Login +upstream login { + server unix:/run/matrix-synapse/inbound_login.sock max_fails=0; + keepalive 10; +} + +# Clients +upstream client { + hash $mxid_localpart consistent; + server unix:/run/matrix-synapse/inbound_clientworker1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_clientworker2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_clientworker3.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_clientworker4.sock max_fails=0; + keepalive 10; +} + +# Federation +# "Normal" federation, balanced round-robin over 4 workers. +upstream incoming_federation { + server unix:/run/matrix-synapse/inbound_federation_reader1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader3.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader4.sock max_fails=0; + keepalive 10; +} +# Inbound federation requests, need to be balanced by IP-address, but can go +# to the same pool of workers as the other federation stuff. +upstream federation_requests { + hash $remote_addr consistent; + server unix:/run/matrix-synapse/inbound_federation_reader1.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader2.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader3.sock max_fails=0; + server unix:/run/matrix-synapse/inbound_federation_reader4.sock max_fails=0; + keepalive 10; +} + +# Main thread for all the rest +upstream inbound_main { + server unix:/run/matrix-synapse/inbound_main.sock max_fails=0; + keepalive 10; +} diff --git a/matrix/postgresql/README.md b/matrix/postgresql/README.md index ae86a943..a5421024 100644 --- a/matrix/postgresql/README.md +++ b/matrix/postgresql/README.md @@ -75,8 +75,10 @@ Make sure you add these lines under the one that gives access to the postgres superuser, the first line. -# Tuning +# Tuning {#tuning} This is for later, check [Tuning your PostgreSQL Server](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server) on the PostgreSQL wiki. +For tuning in the scenario with [Synapse workers](../synapse/workers), see [this +useful site](https://tcpipuk.github.io/postgres/tuning/index.html). diff --git a/matrix/synapse/README.md b/matrix/synapse/README.md index c93f3fe3..bd9c6f73 100644 --- a/matrix/synapse/README.md +++ b/matrix/synapse/README.md @@ -180,7 +180,11 @@ Pointing clients to the correct server needs this at Very important: both names (example.com and matrix.example.com) must be A and/or AAAA records in DNS, not CNAME. -See [nginx](../nginx) for details about how to publish this data. +You can also publish support data: administrator, security officer, helpdesk +page. Publish that as `.well-known/matrix/support`. 
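+
+Once everything is in place (the [nginx](../nginx) part of this documentation
+shows how to actually serve these files), you can verify from the outside that
+they are returned correctly. A quick check with `curl` could look like this,
+using the example domain from this documentation:
+
+```
+curl https://example.com/.well-known/matrix/server
+curl https://example.com/.well-known/matrix/client
+curl https://example.com/.well-known/matrix/support
+```
+
+Each request should return the corresponding JSON document with an HTTP 200
+status.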
+ +See the included files for more elaborate examples, and check +[nginx](../nginx) for details about how to publish this data. # E-mail {#Email} diff --git a/matrix/synapse/well-known-client.json b/matrix/synapse/well-known-client.json new file mode 100644 index 00000000..28a67dbc --- /dev/null +++ b/matrix/synapse/well-known-client.json @@ -0,0 +1,12 @@ +{ + "m.homeserver": { + "base_url": "https://matrix.example.com" + }, + + "org.matrix.msc4143.rtc_foci":[ + { + "type": "livekit", + "livekit_service_url": "https://livekit.example.com" + } + ] +} diff --git a/matrix/synapse/well-known-server.json b/matrix/synapse/well-known-server.json new file mode 100644 index 00000000..b9ffd998 --- /dev/null +++ b/matrix/synapse/well-known-server.json @@ -0,0 +1 @@ +{"m.server": "matrix.example.com"} diff --git a/matrix/synapse/well-known-support.json b/matrix/synapse/well-known-support.json new file mode 100644 index 00000000..ef9be1a1 --- /dev/null +++ b/matrix/synapse/well-known-support.json @@ -0,0 +1,17 @@ +{ + "contacts": [ + { + "email_address": "admin@example.com", + "matrix_id": "@john:example.com", + "role": "m.role.admin" + }, + + { + "email_address": "security@example.com", + "matrix_id": "@bob:example.com", + "role": "m.role.security" + } + ], + + "support_page": "https://support.example.com/" +} diff --git a/matrix/synapse/workers.md b/matrix/synapse/workers.md deleted file mode 100644 index c84eec72..00000000 --- a/matrix/synapse/workers.md +++ /dev/null @@ -1,99 +0,0 @@ ---- -gitea: none -include_toc: true ---- - -# Worker-based setup - -Very busy servers are brought down because a single thread can't keep up with -the load. So you want to create several threads for different types of work. - -See this [Matrix blog](https://matrix.org/blog/2020/11/03/how-we-fixed-synapse-s-scalability/) -for some background information. - - -# Redis - -First step is to install Redis. - -``` -apt install redis-server -``` - -For less overhead we use a UNIX socket instead of a network connection to -localhost. Disable the TCP listener and enable the socket in -`/etc/redis/redis.conf`: - -``` -port 0 - -unixsocket /run/redis/redis-server.sock -unixsocketperm 770 -``` - -Our matrix user (`matrix-synapse`) has to be able to read from and write to -that socket, which is created by Redis and owned by `redis:redis`, so we add -user `matrix-synapse` to the group `redis`. - -``` -adduser matrix-synapse redis -``` - -Restart Redis for these changes to take effect. Check if port 6379 is no -longer active, and if the socketfile `/run/redis/redis-server.sock` exists. - - -# Synapse - -First, create the directory where all the socket files for workers will come, -and give it the correct user, group and permission: - -``` -mkdir /run/matrix-synapse -dpkg-statoverride --add --update matrix-synapse matrix-synapse 0770 /run/matrix-synapse -``` - -Add a replication listener: - -``` -listeners: - - ... - - - path: /run/matrix-synapse/replication.sock - mode: 0660 - type: http - resources: - - names: - - replication -``` - -Check if the socket is created and has the correct permissions. 
Now point
-Synapse at Redis in `conf.d/redis.yaml`:
-
-```
-redis:
-  enabled: true
-  path: /run/redis/redis-server.sock
-```
-
-Check if Synapse can connect to Redis via the socket, you should find log
-entries like this:
-
-```
-synapse.replication.tcp.redis - 292 - INFO - sentinel - Connecting to redis server UNIXAddress('/run/redis/redis-server.sock')
-synapse.util.httpresourcetree - 56 - INFO - sentinel - Attaching to path b'/_synapse/replication'
-synapse.replication.tcp.redis - 126 - INFO - sentinel - Connected to redis
-synapse.replication.tcp.redis - 138 - INFO - subscribe-replication-0 - Sending redis SUBSCRIBE for ['matrix.example.com/USER_IP', 'matrix.example.com']
-synapse.replication.tcp.redis - 141 - INFO - subscribe-replication-0 - Successfully subscribed to redis stream, sending REPLICATE command
-synapse.replication.tcp.redis - 146 - INFO - subscribe-replication-0 - REPLICATE successfully sent
-```
-
-
-# Workers
-
-Workers are Synapse instances that perform a single job (or a set of jobs).
-Their configuration goes into `/etc/matrix-synapse/workers`, which we have to
-create first.
-
-
diff --git a/matrix/synapse/workers/README.md b/matrix/synapse/workers/README.md
new file mode 100644
index 00000000..da351193
--- /dev/null
+++ b/matrix/synapse/workers/README.md
@@ -0,0 +1,593 @@
+---
+gitea: none
+include_toc: true
+---
+
+# Introduction to a worker-based setup
+
+Very busy servers are brought down because a single thread can't keep up with
+the load. So you want to create several threads for different types of work.
+
+See this [Matrix blog](https://matrix.org/blog/2020/11/03/how-we-fixed-synapse-s-scalability/)
+for some background information.
+
+The traditional Synapse setup is one monolithic piece of software that does
+everything. Joining a very busy room creates a bottleneck, as the server will
+spend all its cycles on synchronizing that room.
+
+You can split the server into workers, which are basically Synapse servers
+themselves. Redirect specific tasks to them and you have several different
+servers doing all kinds of tasks at the same time. A busy room will no longer
+freeze the rest.
+
+Workers communicate with each other via UNIX sockets and Redis. We choose
+UNIX sockets because they're much more efficient than network sockets. Of
+course, if you scale to more than one machine, you will need network sockets
+instead.
+
+**Important note**
+
+While the use of workers can drastically improve speed, the law of diminishing
+returns applies. Splitting off more and more workers will not further improve
+speed after a certain point. Plus: you need to understand what the most
+resource-consuming tasks are before you can start to plan how many workers for
+what tasks you need.
+
+In this document we'll basically create a worker for every task, and several
+workers for a few heavy tasks, as an example. Your mileage may not only vary,
+it will.
+
+Tuning the rest of the machine and network also counts, especially PostgreSQL.
+A well-tuned PostgreSQL can make a really big difference and should probably
+be considered even before configuring workers.
+
+With workers, PostgreSQL's configuration should be changed accordingly: see
+[Tuning PostgreSQL for a Matrix Synapse
+server](https://tcpipuk.github.io/postgres/tuning/index.html) for hints and
+examples.
+
+A worker-based Synapse is tailor-made; there is no one-size-fits-all approach.
+All we can do here is explain how things work, what to consider and how to
+build what you need by providing examples.
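+
+If you're not sure whether your server actually needs workers, a very rough
+first check is to see whether the single main Synapse process is saturating
+one CPU core while the rest of the machine is idle. A sketch, assuming the
+default Debian package setup where the main process runs
+`synapse.app.homeserver`:
+
+```
+# Show CPU usage of the main Synapse process (one snapshot)
+top -b -n 1 -p "$(pgrep -f synapse.app.homeserver | head -n 1)"
+```
+
+If that process is constantly near 100% of a single core, splitting off
+workers is likely to help; if it's mostly idle, tuning PostgreSQL first is
+probably the better investment.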
+
+
+# Redis
+
+Workers need Redis as part of their communication, so our first step will be
+to install Redis.
+
+```
+apt install redis-server
+```
+
+For less overhead we use a UNIX socket instead of a network connection to
+localhost. Disable the TCP listener and enable the socket in
+`/etc/redis/redis.conf`:
+
+```
+port 0
+
+unixsocket /run/redis/redis-server.sock
+unixsocketperm 770
+```
+
+Our matrix user (`matrix-synapse`) has to be able to read from and write to
+that socket, which is created by Redis and owned by `redis:redis`, so we add
+user `matrix-synapse` to the group `redis`. You may come up with a
+finer-grained permission solution, but for our example this will do.
+
+```
+adduser matrix-synapse redis
+```
+
+Restart Redis for these changes to take effect. Check the logs for error
+messages, and check that port 6379 is no longer active and that the socketfile
+`/run/redis/redis-server.sock` exists.
+
+Now point Synapse at Redis in `conf.d/redis.yaml`:
+
+```
+redis:
+  enabled: true
+  path: /run/redis/redis-server.sock
+```
+
+Restart Synapse and check if it can connect to Redis via the socket; you should
+find log entries like this:
+
+```
+synapse.replication.tcp.redis - 292 - INFO - sentinel - Connecting to redis server UNIXAddress('/run/redis/redis-server.sock')
+synapse.util.httpresourcetree - 56 - INFO - sentinel - Attaching to path b'/_synapse/replication'
+synapse.replication.tcp.redis - 126 - INFO - sentinel - Connected to redis
+synapse.replication.tcp.redis - 138 - INFO - subscribe-replication-0 - Sending redis SUBSCRIBE for ['matrix.example.com/USER_IP', 'matrix.example.com']
+synapse.replication.tcp.redis - 141 - INFO - subscribe-replication-0 - Successfully subscribed to redis stream, sending REPLICATE command
+synapse.replication.tcp.redis - 146 - INFO - subscribe-replication-0 - REPLICATE successfully sent
+```
+
+
+# Synapse
+
+Workers communicate with each other over sockets that are all placed in one
+directory. These sockets are owned by `matrix-synapse:matrix-synapse`, so make
+sure nginx can write to them: add user `www-data` to group `matrix-synapse`
+and restart nginx.
+
+Then, make sure systemd creates the directory for the sockets as soon as
+Synapse starts:
+
+```
+systemctl edit matrix-synapse
+```
+
+Now override parts of the `Service` stanza to add these two lines:
+
+```
+[Service]
+RuntimeDirectory=matrix-synapse
+RuntimeDirectoryPreserve=yes
+```
+
+The directory `/run/matrix-synapse` will be created as soon
+as Synapse starts, and will not be removed on restart or stop, because that
+would create problems for workers that suddenly lose their sockets.
+
+Then we change Synapse from listening on `localhost:8008` to listening on a
+socket. We'll do most of our worker-related work in `conf.d/listeners.yaml`, so
+let's put the new listener configuration for the main process there.
+
+Remove the `localhost:8008` stanza, and configure these two sockets:
+
+```
+listeners:
+  - path: /run/matrix-synapse/inbound_main.sock
+    mode: 0660
+    type: http
+    resources:
+      - names:
+        - client
+        - consent
+        - federation
+
+  - path: /run/matrix-synapse/replication_main.sock
+    mode: 0660
+    type: http
+    resources:
+      - names:
+        - replication
+```
+
+This means Synapse will create two sockets under `/run/matrix-synapse`: one
+for incoming traffic that is forwarded by nginx (`inbound_main.sock`), and one
+for communicating with all the other workers (`replication_main.sock`).
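+
+After the nginx change described right below and a restart of both Synapse and
+nginx, you can do a small sanity check on these sockets; something like this
+(assuming nginx runs as `www-data`, as it does on Debian):
+
+```
+# Both sockets should exist and be owned by matrix-synapse:matrix-synapse
+ls -l /run/matrix-synapse/
+
+# nginx must be able to connect to the inbound socket, so its user needs
+# write permission on it
+sudo -u www-data test -w /run/matrix-synapse/inbound_main.sock && echo "nginx can use the socket"
+```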
+
+If you restart Synapse now, it won't receive any traffic anymore, because
+nginx is still forwarding its traffic to `localhost:8008`. We'll get to nginx
+later, but for now you should change:
+
+```
+proxy_pass http://localhost:8008;
+```
+
+to
+
+```
+proxy_pass http://unix:/run/matrix-synapse/inbound_main.sock;
+```
+
+If you've done this, restart Synapse and nginx, and check if the sockets are
+created and have the correct permissions.
+
+Synapse should now work normally again: we've switched from network sockets to
+UNIX sockets, and added Redis. Now we'll create the actual workers.
+
+
+# Worker overview
+
+Every worker is, in fact, a Synapse server, only with a limited set of tasks.
+Some tasks can be handled by a number of workers, others only by one. Every
+worker starts as a normal Synapse process, reading all the normal
+configuration files, and then a bit of configuration for the specific worker
+itself.
+
+Workers need to communicate with each other and the main process; they do that
+via the `replication` sockets under `/run/matrix-synapse` and Redis.
+
+Most workers also need a way to be fed traffic by nginx: they have an `inbound`
+socket for that, in the same directory.
+
+Finally, all those replicating workers need to be registered in the main
+process: all workers and their replication sockets are listed in the `instance_map`.
+
+
+## Types of workers
+
+We'll make separate workers for almost every task, and several for the
+heaviest task: synchronising. An overview of which endpoints are to be
+forwarded to a worker is in [Synapse's documentation](https://element-hq.github.io/synapse/latest/workers.html#available-worker-applications).
+
+We'll create the following workers:
+
+* login
+* federation_sender
+* mediaworker
+* userdir
+* pusher
+* push_rules
+* typing
+* todevice
+* accountdata
+* presence
+* receipts
+* initial_sync: 1 and 2
+* normal_sync: 1, 2 and 3
+
+Some of them are `stream_writers`, and the [documentation about
+stream_writers](https://element-hq.github.io/synapse/latest/workers.html#stream-writers)
+says:
+
+```
+Note: The same worker can handle multiple streams, but unless otherwise documented, each stream can only have a single writer.
+```
+
+So, stream writers must have unique tasks: you can't have two or more workers
+writing to the same stream. Stream writers have to be listed in `stream_writers`:
+
+```
+stream_writers:
+  account_data:
+    - accountdata
+  presence:
+    - presence
+  receipts:
+    - receipts
+  to_device:
+    - todevice
+  typing:
+    - typing
+  push_rules:
+    - push_rules
+```
+
+As you can see, we've given the stream workers the name of the stream they're
+writing to. We could combine all those streams into one worker, which would
+probably be enough for most instances.
+
+We could also define one worker with the name `streamwriter` and list it under
+all streams, instead of a single worker for every stream.
+
+Finally, we have to list all these workers under `instance_map`: their name
+and their replication socket:
+
+```
+instance_map:
+  main:
+    path: "/run/matrix-synapse/replication_main.sock"
+  login:
+    path: "/run/matrix-synapse/replication_login.sock"
+  federation_sender:
+    path: "/run/matrix-synapse/replication_federation_sender.sock"
+  mediaworker:
+    path: "/run/matrix-synapse/replication_mediaworker.sock"
+...
+  normal_sync1:
+    path: "/run/matrix-synapse/replication_normal_sync1.sock"
+  normal_sync2:
+    path: "/run/matrix-synapse/replication_normal_sync2.sock"
+  normal_sync3:
+    path: "/run/matrix-synapse/replication_normal_sync3.sock"
+```
+
+
+## Defining a worker
+
+Every worker starts with the normal configuration files, and then loads its
+own. We put those files under `/etc/matrix-synapse/workers`. You have to
+create that directory, and make sure Synapse can read them. Being
+professionally paranoid, we restrict access to that directory and the files in
+it:
+
+```
+mkdir /etc/matrix-synapse/workers
+chown matrix-synapse:matrix-synapse /etc/matrix-synapse/workers
+chmod 750 /etc/matrix-synapse/workers
+```
+
+We'll fill this directory with `yaml` files; one for each worker.
+
+
+### Generic worker
+
+Workers look very much alike; very little configuration is needed. This is
+what you need:
+
+* name
+* replication socket (not every worker needs this)
+* inbound socket (not every worker needs this)
+* log configuration
+
+One worker we use handles the login actions; this is how it's configured in
+`/etc/matrix-synapse/workers/login.yaml`:
+
+```
+worker_app: "synapse.app.generic_worker"
+worker_name: "login"
+worker_log_config: "/etc/matrix-synapse/logconf.d/login-log.yaml"
+
+worker_listeners:
+  - path: "/run/matrix-synapse/inbound_login.sock"
+    type: http
+    resources:
+      - names:
+        - client
+        - consent
+        - federation
+
+  - path: "/run/matrix-synapse/replication_login.sock"
+    type: http
+    resources:
+      - names: [replication]
+```
+
+The first line defines the type of worker. In the past there were quite a few
+different types, but most of them have been phased out in favour of one
+generic worker.
+
+The `worker_log_config` defines how and where the worker logs. Of course you'll
+need to configure that too, see further on.
+
+The first `listener` is the inbound socket, which nginx uses to forward
+login-related traffic to this worker. You have to configure nginx to do that;
+we'll get to that later. Make sure nginx can write to this socket. The
+`resources` vary between workers.
+
+The second `listener` is used for communication with the other workers and the
+main thread. The only `resource` it needs is `replication`. This socket needs
+to be listed in the `instance_map` in the main thread; the inbound socket does
+not.
+
+Of course, if you need to scale up to the point where you need more than one
+machine, these listeners can no longer use UNIX sockets, but will have to use
+the network. This creates extra overhead, so you want to use sockets whenever
+possible.
+
+
+### Media worker
+
+The media worker is slightly different from the generic one. It doesn't use
+`synapse.app.generic_worker`, but a specialised one: `synapse.app.media_repository`.
+To prevent the main process from handling media itself, you have to explicitly
+tell it to leave that to the worker, by adding this to the configuration (in
+our setup `conf.d/listeners.yaml`):
+
+```
+enable_media_repo: false
+media_instance_running_background_jobs: mediaworker
+```
+
+The worker `mediaworker` looks like this:
+
+```
+worker_app: "synapse.app.media_repository"
+worker_name: "mediaworker"
+worker_log_config: "/etc/matrix-synapse/logconf.d/media-log.yaml"
+
+worker_listeners:
+  - path: "/run/matrix-synapse/inbound_mediaworker.sock"
+    type: http
+    resources:
+      - names: [media]
+
+  - path: "/run/matrix-synapse/replication_mediaworker.sock"
+    type: http
+    resources:
+      - names: [replication]
+```
+
+If you use more than one mediaworker, know that they must all run on the same
+machine; scaling it over more than one machine will not work.
+
+
+## Worker logging
+
+As stated before, you configure the logging of workers in a separate yaml
+file. As with the definitions of the workers themselves, you need a directory
+for these files. We'll use `/etc/matrix-synapse/logconf.d`; create it and fix
+the permissions.
+
+```
+mkdir /etc/matrix-synapse/logconf.d
+chgrp matrix-synapse /etc/matrix-synapse/logconf.d
+chmod 750 /etc/matrix-synapse/logconf.d
+```
+
+There's a lot you can configure for logging, but for now we'll give every
+worker the same layout. Here's the configuration for the `login` worker:
+
+```
+version: 1
+formatters:
+  precise:
+    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s'
+handlers:
+  file:
+    class: logging.handlers.TimedRotatingFileHandler
+    formatter: precise
+    filename: /var/log/matrix-synapse/login.log
+    when: midnight
+    backupCount: 3
+    encoding: utf8
+
+  buffer:
+    class: synapse.logging.handlers.PeriodicallyFlushingMemoryHandler
+    target: file
+    capacity: 10
+    flushLevel: 30
+    period: 5
+
+loggers:
+  synapse.metrics:
+    level: WARN
+    handlers: [buffer]
+  synapse.replication.tcp:
+    level: WARN
+    handlers: [buffer]
+  synapse.util.caches.lrucache:
+    level: WARN
+    handlers: [buffer]
+  twisted:
+    level: WARN
+    handlers: [buffer]
+  synapse:
+    level: INFO
+    handlers: [buffer]
+
+root:
+  level: INFO
+  handlers: [buffer]
+```
+
+The only thing you need to change is the filename to which the logs are
+written. You could create only one configuration and use that in every worker,
+but that would mean all logs end up in the same file, which is probably
+not what you want.
+
+See the [Python
+documentation](https://docs.python.org/3/library/logging.config.html#configuration-dictionary-schema)
+for all the ins and outs of logging.
+
+
+# Systemd
+
+You want Synapse and its workers managed by systemd. First of all we define a
+`target`: a group of services that belong together.
+
+```
+systemctl edit --force --full matrix-synapse.target
+```
+
+Feed it with this bit:
+
+```
+[Unit]
+Description=Matrix Synapse with all its workers
+After=network.target
+
+[Install]
+WantedBy=multi-user.target
+```
+
+First add `matrix-synapse.service` to this target by overriding the `WantedBy`
+in the unit file. We're overriding and adding a bit more.
+
+```
+systemctl edit matrix-synapse.service
+```
+
+Add this to the overrides:
+
+```
+[Unit]
+PartOf=matrix-synapse.target
+Before=matrix-synapse-worker
+ReloadPropagatedFrom=matrix-synapse.target
+
+[Service]
+RuntimeDirectory=matrix-synapse
+RuntimeDirectoryMode=0770
+RuntimeDirectoryPreserve=yes
+
+[Install]
+WantedBy=matrix-synapse.target
+```
+
+The additions under `Unit` mean that `matrix-synapse.service` is part of the
+target we created earlier, and that it should start before the workers.
+Restarting the target means this service must be restarted too.
+
+Under `Service` we define the directory where the sockets live (`/run` is
+prefixed automatically), its permissions and that it should not be removed if
+the service is stopped.
+
+The `WantedBy` under `Install` includes it in the target. The target itself is
+included in `multi-user.target`, so it should always be started in the
+multi-user runlevel.
+
+For the workers we're using a template instead of separate unit files for every
+single one. Create the template:
+
+```
+systemctl edit --full --force matrix-synapse-worker@
+```
+
+Mind the `@` at the end; that's not a typo. Fill it with this content:
+
+```
+[Unit]
+Description=Synapse worker %i
+AssertPathExists=/etc/matrix-synapse/workers/%i.yaml
+
+# This service should be restarted when the synapse target is restarted.
+PartOf=matrix-synapse.target
+ReloadPropagatedFrom=matrix-synapse.target
+
+# If this is started at the same time as the main process, let the main
+# process start first, to initialise the database schema.
+After=matrix-synapse.service
+
+[Service]
+Type=notify
+NotifyAccess=main
+User=matrix-synapse
+Group=matrix-synapse
+WorkingDirectory=/var/lib/matrix-synapse
+ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.generic_worker --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/%i.yaml
+ExecReload=/bin/kill -HUP $MAINPID
+Restart=always
+RestartSec=3
+SyslogIdentifier=matrix-synapse-%i
+
+[Install]
+WantedBy=matrix-synapse.target
+```
+
+Now you can start/stop/restart every worker individually. Starting the `login`
+worker would be done by:
+
+```
+systemctl start matrix-synapse-worker@login
+```
+
+Every worker needs to be enabled individually. The quickest way to do that is
+to run a loop in the workers directory:
+
+```
+cd /etc/matrix-synapse/workers
+for worker in `ls *yaml | sed -n 's/\.yaml//p'`; do systemctl enable matrix-synapse-worker@$worker; done
+```
+
+After a reboot, Synapse and all its workers should be started. But starting
+the target should also do that:
+
+```
+systemctl start matrix-synapse.target
+```
+
+This should start `matrix-synapse.service`, the main process, first. After that
+all the workers should be started too. Check if the correct sockets appear and
+if there are any error messages in the logs.
+
+
+# nginx
+
+We may have a lot of workers, but if nginx doesn't forward traffic to the
+correct worker(s), it won't work. We're going to have to change nginx's
+configuration quite a bit.
+
+See [Deploying a Synapse Homeserver with
+Docker](https://tcpipuk.github.io/synapse/deployment/nginx.html) for the
+inspiration. This details a Docker installation, which we don't have, but the
+reasoning behind it applies to our configuration too.
+
+Here's [how to configure nginx for use with workers](../../nginx/workers).
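+
+Once the nginx configuration from that page is in place, a short end-to-end
+sanity check could look like this (socket and unit names follow the examples
+in this document):
+
+```
+# Check the nginx configuration and reload it
+nginx -t && systemctl reload nginx
+
+# All inbound_* and replication_* sockets should now exist
+ls /run/matrix-synapse/
+
+# Every worker unit should be active
+systemctl --no-pager list-units 'matrix-synapse-worker@*'
+```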
diff --git a/matrix/synapse/workers/federation_receiver1.yaml b/matrix/synapse/workers/federation_receiver1.yaml new file mode 100644 index 00000000..64f394fa --- /dev/null +++ b/matrix/synapse/workers/federation_receiver1.yaml @@ -0,0 +1,15 @@ +worker_app: "synapse.app.generic_worker" +worker_name: "federation_reader1" +worker_log_config: "/etc/matrix-synapse/logconf.d/federation_reader-log.yaml" + +worker_listeners: + - path: "/run/matrix-synapse/replication_federation_reader1.sock" + type: http + resources: + - names: [replication] + + - path: "/run/matrix-synapse/inbound_federation_reader1.sock" + type: http + resources: + - names: [federation] + diff --git a/matrix/synapse/workers/federation_sender1.yaml b/matrix/synapse/workers/federation_sender1.yaml new file mode 100644 index 00000000..d2b0399c --- /dev/null +++ b/matrix/synapse/workers/federation_sender1.yaml @@ -0,0 +1,10 @@ +worker_app: "synapse.app.generic_worker" +worker_name: "federation_sender1" +worker_log_config: "/etc/matrix-synapse/logconf.d/federation_sender-log.yaml" + +worker_listeners: + - path: "/run/matrix-synapse/replication_federation_sender1.sock" + type: http + resources: + - names: [replication] + diff --git a/matrix/synapse/workers/initial_sync1.yaml b/matrix/synapse/workers/initial_sync1.yaml new file mode 100644 index 00000000..45d9b858 --- /dev/null +++ b/matrix/synapse/workers/initial_sync1.yaml @@ -0,0 +1,19 @@ +worker_app: "synapse.app.generic_worker" +worker_name: "initial_sync1" +worker_log_config: "/etc/matrix-synapse/logconf.d/initial_sync-log.yaml" + +worker_listeners: + + - path: "/run/matrix-synapse/inbound_initial_sync1.sock" + type: http + resources: + - names: + - client + - consent + - federation + + - path: "/run/matrix-synapse/replication_initial_sync1.sock" + type: http + resources: + - names: [replication] + diff --git a/matrix/synapse/workers/login-log.yaml b/matrix/synapse/workers/login-log.yaml new file mode 100644 index 00000000..7cb59753 --- /dev/null +++ b/matrix/synapse/workers/login-log.yaml @@ -0,0 +1,41 @@ +version: 1 +formatters: + precise: + format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s' +handlers: + file: + class: logging.handlers.TimedRotatingFileHandler + formatter: precise + filename: /var/log/matrix-synapse/login.log + when: midnight + backupCount: 3 + encoding: utf8 + + buffer: + class: synapse.logging.handlers.PeriodicallyFlushingMemoryHandler + target: file + capacity: 10 + flushLevel: 30 + period: 5 + +loggers: + synapse.metrics: + level: WARN + handlers: [buffer] + synapse.replication.tcp: + level: WARN + handlers: [buffer] + synapse.util.caches.lrucache: + level: WARN + handlers: [buffer] + twisted: + level: WARN + handlers: [buffer] + synapse: + level: INFO + handlers: [buffer] + +root: + level: INFO + handlers: [buffer] + diff --git a/matrix/synapse/workers/login.yaml b/matrix/synapse/workers/login.yaml new file mode 100644 index 00000000..c21bd540 --- /dev/null +++ b/matrix/synapse/workers/login.yaml @@ -0,0 +1,19 @@ +worker_app: "synapse.app.generic_worker" +worker_name: "login" +worker_log_config: "/etc/matrix-synapse/logconf.d/login-log.yaml" + +worker_listeners: + + - path: "/run/matrix-synapse/inbound_login.sock" + type: http + resources: + - names: + - client + - consent + - federation + + - path: "/run/matrix-synapse/replication_login.sock" + type: http + resources: + - names: [replication] + diff --git a/matrix/synapse/workers/media-log.yaml b/matrix/synapse/workers/media-log.yaml new file mode 100644 index 
00000000..bbddbc1d --- /dev/null +++ b/matrix/synapse/workers/media-log.yaml @@ -0,0 +1,41 @@ +version: 1 +formatters: + precise: + format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s' +handlers: + file: + class: logging.handlers.TimedRotatingFileHandler + formatter: precise + filename: /var/log/matrix-synapse/media.log + when: midnight + backupCount: 3 + encoding: utf8 + + buffer: + class: synapse.logging.handlers.PeriodicallyFlushingMemoryHandler + target: file + capacity: 10 + flushLevel: 30 + period: 5 + +loggers: + synapse.metrics: + level: WARN + handlers: [buffer] + synapse.replication.tcp: + level: WARN + handlers: [buffer] + synapse.util.caches.lrucache: + level: WARN + handlers: [buffer] + twisted: + level: WARN + handlers: [buffer] + synapse: + level: INFO + handlers: [buffer] + +root: + level: INFO + handlers: [buffer] + diff --git a/matrix/synapse/workers/media.yaml b/matrix/synapse/workers/media.yaml new file mode 100644 index 00000000..65b3bf18 --- /dev/null +++ b/matrix/synapse/workers/media.yaml @@ -0,0 +1,15 @@ +worker_app: "synapse.app.media_repository" +worker_name: "mediaworker" +worker_log_config: "/etc/matrix-synapse/logconf.d/media-log.yaml" + +worker_listeners: + - path: "/run/matrix-synapse/inbound_mediaworker.sock" + type: http + resources: + - names: [media] + + - path: "/run/matrix-synapse/replication_mediaworker.sock" + type: http + resources: + - names: [replication] +