# Nadir
Nadir is a lightweight, modular Linux system-administration backend for modern,
FOSS system admin panels. It exposes a typed REST API for the everyday tasks
you'd otherwise SSH in to do: inspect the host, manage systemd services, edit
local users and groups, install packages, and read logs - all behind
role-based access control and a tamper-evident audit trail.
The API is generated with [Huma](https://huma.rocks) (OpenAPI 3.1) and ships
interactive docs at `/docs`. The backend is Go and self-contained: no external
database, no agent, no runtime dependencies beyond the standard system tools it
drives (`systemctl`, `hostnamectl`, `useradd`, the host package manager, …).
---
## What it does
Functionality is organized into **modules**. Each module owns a slice of the
API and declares its own permission vocabulary.
- **System** - Dashboard overview (OS/kernel, CPU, memory, disks, load, uptime,
network interfaces, temperatures); get/set hostname; time, timezone, and NTP;
locale and console keymap; reboot and power off.
- **Services** - List and inspect systemd units; start / stop / restart / enable
/ disable; read service logs from the journal or an allowlisted file, as a
snapshot or a live Server-Sent-Events stream.
- **Users** - List, inspect, create, and delete local accounts; set a password;
set supplementary groups.
- **Groups** - List, inspect, create, and delete local groups.
- **Packages** - List installed packages and available updates; install, remove,
and upgrade - streamed live over SSE. Auto-detects `dnf`, `apt`, or `pacman`.
- **Networking** - List network interfaces, routing tables, and DNS settings; apply IPv4 settings provisionally, with an automatic rollback as a safety net; bring interfaces up or down.
- **Audit** - Read-only trail of every privileged write (who, what, when, result).
- **Terminal** - Interactive shell access. Upgrades the connection to a WebSocket and spawns a PTY shell as the logged-in user (requires `root` permission).
- **Meta** - Self-description for clients: `/api/_modules`, `/api/whoami`,
`/api/health`.
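Each module plugs into the server through a common contract. Below is a minimal Go sketch of that idea, assuming hypothetical names (`Module`, `Name`, `describe`, and the two toy modules). This README only confirms that modules expose `Permissions()`, and the permission lists given to the toy modules are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// Module is a hypothetical sketch of the per-module contract: a stable name
// (used as the RBAC module key) and the permission tiers the module exports.
type Module interface {
	Name() string
	Permissions() []string
}

// servicesModule is an illustrative module exporting all three tiers.
type servicesModule struct{}

func (servicesModule) Name() string          { return "services" }
func (servicesModule) Permissions() []string { return []string{"read", "write", "root"} }

// auditModule is illustrative too: a read-only trail exports only "read".
type auditModule struct{}

func (auditModule) Name() string          { return "audit" }
func (auditModule) Permissions() []string { return []string{"read"} }

// describe builds the module -> permissions map that a discovery endpoint
// such as /api/_modules could serialize.
func describe(mods []Module) map[string][]string {
	out := make(map[string][]string, len(mods))
	for _, m := range mods {
		out[m.Name()] = m.Permissions()
	}
	return out
}

func main() {
	desc := describe([]Module{servicesModule{}, auditModule{}})
	// Sort names so the printed order is stable (Go map order is random).
	names := make([]string, 0, len(desc))
	for n := range desc {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Println(n, desc[n])
	}
}
```

Because every module reports its own vocabulary, wildcard roles and startup validation can both be driven from the same list without a central registry of permission names.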
### Security model at a glance
- **Authentication** is delegated to PAM (`pam_unix`), so logins use real system
credentials. A successful login sets an `HttpOnly`, `SameSite=Strict` session
cookie; sessions are stored in SQLite and survive restarts.
- **Machine credentials** for non-interactive callers (e.g. a central dashboard
managing many nodes) authenticate with a static `Authorization: Bearer nad_…`
token instead of a PAM session. Mint with `nadir token add <name>` (shown once,
only its SHA-256 is stored); revoke with `nadir token rm <name>` (immediate, no
restart). A token is an ordinary RBAC subject - its name is assigned a role in
`config.yaml` `assignments`, so a leaked token is scoped, not implicitly admin.
The audit trail records the actor as `token:<name>` to distinguish it from a
human. CSRF does not apply: browsers never auto-attach a Bearer header, so the
same-origin cookie defense is irrelevant for token auth. Bad-token guesses are
throttled per source IP.
- **Authorization** is RBAC driven entirely by `config.yaml`. Every protected
  operation declares a `module` and one of three permission tiers:
  - `read` - inspect (list users, read status, view logs…)
  - `write` - routine changes (create a user, restart a service, set the hostname…)
  - `root` - high-impact or irreversible actions (reboot, delete an account,
    **reset a password**, **change group membership**). Password and
    group-membership changes are `root` precisely because they can hand someone root.
- **Brute-force throttling** on login (per username + source IP cooldown).
- **CSRF** defense via `SameSite=Strict` plus a same-origin check on writes.
- **Audit** of every mutation, written off the request path to SQLite.
- The server **must run as root** - PAM reads `/etc/shadow`, and the system
tools it drives (`hostnamectl`, `systemctl`, `useradd`, `shutdown`, …) require
it.
<!-- api-desc-end -->
---
## Installing
### Prerequisites
- Linux with **systemd** (the Services module and the `nadir` service wrapper
use it).
- **Root** access (see above).
- Go (recent) to build from source.
### Build
The entry point is the `main` package under `cmd/server`:
```bash
go build -o nadir ./cmd/server
```
This produces a single static-ish binary, `nadir`.
### Run directly
`nadir` will not start without a configuration file. If none exists at the resolved path, the server exits and asks you to run `nadir install` (to install the systemd service) or use `--save-config`.
To generate a default configuration file (assigning the admin role to your current user) without installing the systemd service:
```bash
./nadir --save-config
```
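Because the default path lives under the running user's home and the server runs as root, generate it as root so it lands in `/root/.config/config.yaml`: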
```bash
sudo ./nadir --save-config
```
You can also specify a custom path using `-f`/`--config`:
```bash
./nadir --save-config -f ./config.yaml
```
Once the configuration file is created, start the server directly:
```bash
sudo ./nadir # same as: sudo ./nadir run
```
By default it reads `~/.config/config.yaml` (resolving to the running user's home, i.e., `/root/.config/config.yaml` when run as root); override with the `-f`/`--config` flag or `CONFIG_PATH` env var:
```bash
sudo ./nadir -f /etc/nadir/config.yaml
# or: sudo CONFIG_PATH=/etc/nadir/config.yaml ./nadir
```
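The lookup order can be sketched as a small pure function. The name `resolveConfigPath` is hypothetical, and so is the precedence when both the flag and `CONFIG_PATH` are set; this README only says that either one overrides the default.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveConfigPath is a hypothetical sketch of the config lookup.
// Assumption: an explicit -f/--config flag wins over CONFIG_PATH.
func resolveConfigPath(flagVal, envVal, home string) string {
	if flagVal != "" {
		return flagVal
	}
	if envVal != "" {
		return envVal
	}
	// Default: ~/.config/config.yaml under the running user's home,
	// i.e. /root/.config/config.yaml when the process runs as root.
	return filepath.Join(home, ".config", "config.yaml")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "no home directory:", err)
		os.Exit(1)
	}
	fmt.Println(resolveConfigPath("", os.Getenv("CONFIG_PATH"), home))
}
```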
By default it serves **HTTPS** with a self-signed certificate (see
[Deployment note 2](#2-tls-three-modes)) on the `hostname:port` from the config,
and exposes interactive docs at `https://<host>:<port>/docs` and the raw spec at
`/openapi.json`.
### Run in the background (`-d`)
Like `docker run -d`, this detaches from the terminal and returns your shell:
```bash
sudo ./nadir run -d
# nadir running in background (pid 12345); logs: /var/lib/nadir/server.log
# follow with: nadir logs
```
Output goes to `/var/lib/nadir/server.log`.
### Install as a systemd service (start on boot)
For a real deployment, register nadir as a service so it starts on boot and is
managed with the usual tooling:
```bash
sudo ./nadir install # writes the unit, enables it, and starts it now
sudo ./nadir status
sudo ./nadir logs # follow the journal live
```
`install` writes `/etc/systemd/system/nadir.service` pinning the **absolute**
paths of the binary and the config file (so it doesn't depend on the working
directory at boot), runs `systemctl daemon-reload`, and `systemctl enable --now`. If no configuration file
exists at the target path, `install` automatically creates a default config file and
assigns the admin role to the installing user.
### CLI reference
| Command | Effect |
| ------------------------------------------------ | --------------------------------------------------------------------------- |
| `nadir [run] [-d]` | Start the server. `-d` / `--detach` runs it in the background. |
| `nadir --save-config` | Save the default configuration template to the target path and exit. |
| `nadir install` | Install + enable the systemd service (starts now and on boot). |
| `nadir uninstall` | Stop, disable, and remove the systemd service. |
| `nadir start` \| `stop` \| `restart` \| `status` | Control the running service. |
| `nadir enable` \| `disable` | Toggle start-on-boot without removing the unit. |
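| `nadir logs`                                     | Follow logs - journald if installed as a service, otherwise the detach log. |
| `nadir token add <name>`                         | Mint a Bearer token for machine clients; shown once, only its SHA-256 is stored. |
| `nadir token rm <name>`                          | Revoke a token; takes effect immediately, no restart.                       |
| `nadir help`                                     | Show usage.                                                                 |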
Most commands need root.
---
## Configuration (`config.yaml`)
`config.yaml` is the single source of truth for runtime configuration: server
and TLS settings, which roles exist, what each role can do, and who holds which
role. By default, it reads `~/.config/config.yaml`. The path can be overridden using
the `-f` / `--config` CLI flags or the `CONFIG_PATH` environment variable.
```yaml
server:
  secure_tls: true # Secure flag on the session cookie (keep true behind TLS)
  trust_proxy: true # a reverse proxy terminates TLS; see Deployment note 3
  # tls_cert: /etc/nadir/tls/cert.pem # or terminate TLS in nadir yourself
  # tls_key: /etc/nadir/tls/key.pem
  hostname: 100.64.0.189
  port: 9999
# Quote "*" - bare * is YAML alias syntax and fails to parse.
roles:
  admin:
    "*": ["*"] # every permission on every module (including future ones)
  auditor:
    "*": ["read"] # read-only everywhere
  system_ops:
    system: ["read", "write"]
assignments:
  urania: [admin]
# Optional: per-unit allowlist of log files the Services module may read.
log_files:
  nginx:
    - /var/log/nginx/access.log
    - /var/log/nginx/error.log
```
### `server`
| Key | Default | Meaning |
| --------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `secure_tls` | `true` | Sets the `Secure` flag on the session cookie. Keep `true` whenever the browser reaches nadir over HTTPS (direct or via proxy); `false` only for local plain-HTTP dev. |
| `trust_proxy` | `false` | When `true`, nadir serves plaintext HTTP and trusts `X-Forwarded-For` / forwarded `Host` from the proxy. See [Deployment note 3](#3-reverse-proxy--vpn). |
| `tls_cert`, `tls_key` | - | PEM paths. When both are set (and `trust_proxy` is off), nadir terminates TLS with this pair. |
| `hostname` | - | Address to bind. Use `127.0.0.1` for local-only, or an overlay/VPN address to expose nadir only on that interface. |
| `port` | - | TCP port to listen on. |
TLS selection is covered in [Deployment note 2](#2-tls-three-modes).
### `roles` / `assignments`
- `roles` maps a role name to `module → [permissions]`. `"*"` as the module key
means "all modules"; `"*"` in the permission list means "all permissions".
- `assignments` maps a username to the roles they hold; effective grants are the
union.
- `"*"` must be quoted - bare `*` is YAML alias syntax and fails to parse.
- Module keys and permissions are validated at startup against the modules
actually compiled in. An unknown module, an unexported permission, or an
assignment to an undefined role aborts startup with a clear message rather
than silently granting or denying access.
- Each module owns its permission vocabulary via `Permissions()`, so adding a
module automatically makes it available to wildcard roles and validatable for
restricted ones. Clients discover the live module/permission set at
`GET /api/_modules`, and a user's own grants at `GET /api/whoami`.
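The grant rules above fit in a few lines. This is a minimal sketch: the `Roles` type and `allowed` function are hypothetical, not code from `internal/rbac`, and it treats the three tiers as independent labels (holding `root` does not imply `write`); whether nadir nests them is not covered here.

```go
package main

import "fmt"

// Roles maps role -> module -> permissions, mirroring the roles block.
type Roles map[string]map[string][]string

// allowed grants (module, perm) if any of the user's roles lists it, either
// under the module's own key or under "*", with "*" also matching any perm.
// Unassigned users and undefined roles simply grant nothing.
func allowed(roles Roles, assignments map[string][]string, user, module, perm string) bool {
	for _, role := range assignments[user] {
		grants := roles[role]
		for _, k := range []string{module, "*"} {
			for _, p := range grants[k] {
				if p == perm || p == "*" {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	roles := Roles{
		"admin":      {"*": {"*"}},
		"auditor":    {"*": {"read"}},
		"system_ops": {"system": {"read", "write"}},
	}
	assignments := map[string][]string{
		"urania":    {"admin"},
		"dashboard": {"auditor", "system_ops"},
	}
	fmt.Println(allowed(roles, assignments, "dashboard", "system", "write"))   // true via system_ops
	fmt.Println(allowed(roles, assignments, "dashboard", "services", "write")) // false: auditor is read-only
	fmt.Println(allowed(roles, assignments, "urania", "users", "root"))        // true via admin
}
```

The union falls out naturally: the loop returns as soon as any held role matches.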
### `log_files`
An allowlist, keyed by unit, of log file paths the Services module is allowed to
read via the `source=file` log endpoints. The caller can only read paths an
admin has listed here - never an arbitrary file.
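One way to enforce such an allowlist is exact matching after lexical cleaning. A hypothetical sketch (`logAllowed` is not nadir's function name):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// logAllowed accepts a path only if it exactly matches an entry listed for
// that unit. filepath.Clean normalizes harmless variants such as "./"; exact
// matching (rather than a directory prefix check) means "../" sequences can
// never resolve to a file that was not listed.
func logAllowed(allow map[string][]string, unit, requested string) bool {
	want := filepath.Clean(requested)
	for _, p := range allow[unit] {
		if filepath.Clean(p) == want {
			return true
		}
	}
	return false
}

func main() {
	allow := map[string][]string{
		"nginx": {"/var/log/nginx/access.log", "/var/log/nginx/error.log"},
	}
	fmt.Println(logAllowed(allow, "nginx", "/var/log/nginx/access.log"))          // true
	fmt.Println(logAllowed(allow, "nginx", "/var/log/nginx/../../../etc/shadow")) // false
	fmt.Println(logAllowed(allow, "sshd", "/var/log/nginx/access.log"))          // false: wrong unit
}
```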
---
## Deployment notes
These notes capture the non-obvious operational decisions. They'll seed
the formal installation guide.
### 1. PAM service
**Nadir authenticates against its own PAM service, `/etc/pam.d/nadir`, and the
server creates that file on startup if it is missing** (see
`internal/auth/pamservice.go`). Here is why.
#### What went wrong with stock services
Originally we authenticated against the `"login"` service. On a Framework
laptop (and many other machines) `/etc/pam.d/login` pulls in `system-auth`,
whose auth stack lists `pam_fprintd.so` as `sufficient` **before**
`pam_unix.so`:
```
auth sufficient pam_fprintd.so # fingerprint, tried first
auth sufficient pam_unix.so nullok # password, only reached if fprintd fails
```
Our PAM conversation callback only answers the password prompt; it can't swipe
a finger. So `pam_fprintd` would start a fingerprint scan and **block until its
~30-second timeout** before falling through to the password check. Every login
took 30s. (It was never a network, D-Bus, systemd, or NSS problem:
`hostnamectl` was instant and there is no SSSD/LDAP on the box.)
Switching to `"passwd"` is not a fix either: `/etc/pam.d/passwd` has only a
`password` stack and no `auth` stack, so it can't verify a login.
#### The fix
Ship a dedicated, minimal service - exactly what `sshd`, `cockpit`, and
`polkit` do. `/etc/pam.d/nadir` contains only:
```
#%PAM-1.0
auth required pam_unix.so
account required pam_unix.so
```
That is a straight `/etc/shadow` password check plus an account-validity check,
with no fingerprint, no systemd, no env loading, and no DNS. Authentication drops from
~30s to milliseconds, and we stop inheriting whatever the distro's login stack
happens to do.
Notes:
- We omit `nullok` on purpose: this service is reachable over the network, and
`nullok` would let passwordless accounts log in.
- `EnsurePAMService()` **only writes the file when it is absent** - a missing
service falls through to `/etc/pam.d/other` (`pam_deny`), which looks identical
to "wrong credentials". If an admin customizes the file, nadir leaves it
untouched.
- `pam_unix` reads `/etc/shadow`, so the server must run as root.
### 2. TLS: three modes
Credentials and session cookies must never travel in cleartext. Nadir picks how
the connection is secured from `config.yaml`, in priority order:
1. **Behind a reverse proxy** (`trust_proxy: true`) - a proxy such as Traefik
terminates TLS and forwards plaintext to nadir on a trusted network. Keep
`secure_tls: true` (the browser↔proxy leg is HTTPS). This is the deployment
covered in note 3.
2. **Nadir terminates TLS** (`tls_cert` + `tls_key`) - point both at a PEM
certificate/key pair and nadir serves HTTPS directly. Use this when there is
no proxy.
3. **Self-signed (dev only)** - when none of the above is configured, nadir
generates a fresh in-memory self-signed certificate (valid for `localhost`
and the loopback addresses, one year). Browsers will warn; that's expected.
Never rely on this in production.
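The priority order reduces to a single switch. A sketch with hypothetical type and field names mirroring the YAML keys:

```go
package main

import "fmt"

// serverConfig holds the subset of the server block that drives TLS choice.
type serverConfig struct {
	TrustProxy bool
	TLSCert    string
	TLSKey     string
}

// tlsMode applies the priority order: proxy, then a configured pair, then
// self-signed. Setting only one of tls_cert/tls_key falls through to
// self-signed in this sketch; the real server may reject that instead.
func tlsMode(c serverConfig) string {
	switch {
	case c.TrustProxy:
		return "plain-http-behind-proxy"
	case c.TLSCert != "" && c.TLSKey != "":
		return "https-with-configured-pair"
	default:
		return "https-self-signed-dev"
	}
}

func main() {
	fmt.Println(tlsMode(serverConfig{TLSCert: "/etc/nadir/tls/cert.pem", TLSKey: "/etc/nadir/tls/key.pem"}))
}
```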
To create a persistent self-signed pair for mode 2 in development:
```bash
openssl req -x509 -newkey rsa:2048 -nodes \
-keyout key.pem -out cert.pem -days 365 \
-subj "/O=nadir-dev-local/CN=localhost" \
-addext "subjectAltName=DNS:localhost,IP:127.0.0.1,IP:::1"
```
…then set `tls_cert`/`tls_key` to those paths.
### 3. Reverse proxy + VPN
When nadir runs behind a TLS-terminating reverse proxy (e.g. Traefik) on a
private overlay network, set `trust_proxy: true`. Nadir then serves plaintext
HTTP and trusts `X-Forwarded-For` (used by the login throttle) and the forwarded
`Host` (used by the CSRF same-origin check). **That trust is only safe if
nothing but the proxy can reach the app's port** - otherwise any client that
reaches it directly can forge those headers.
The recommended shape: the proxy and the app each sit on a WireGuard-based
overlay, and nadir binds to its overlay address so the public/LAN interfaces
never answer.
```yaml
server:
  trust_proxy: true
  secure_tls: true # browser↔proxy leg is HTTPS, so keep the cookie Secure
  hostname: 100.64.0.189 # the app's overlay IP - only the VPN interface listens
  port: 9999
```
**Netbird / Tailscale** assign peers out of `100.64.0.0/10` (RFC 6598 CGNAT),
which is not publicly routable - binding there means only VPN peers can connect.
**Plain WireGuard** is the same idea with a private range you pick (e.g.
`10.0.0.0/24`); bind to the app's address on the `wg0` interface.
Two things make the header trust airtight:
1. **Restrict the port to the proxy peer only.** Binding to the overlay limits
   reachability to _all_ VPN peers, not just the proxy. Tighten it so only the
   proxy can reach `:9999`:
   - _Netbird_: an access-control policy allowing the proxy peer/group → the app
     peer on tcp/9999, denying others.
   - _Tailscale_: an ACL rule (`"src": ["tag:proxy"], "dst": ["tag:nadir:9999"]`).
   - _Plain WireGuard_: a host firewall rule on the app, e.g.
     `iptables -A INPUT -i wg0 ! -s <proxy-wg-ip> -p tcp --dport 9999 -j DROP`.
2. **Make the proxy overwrite client-supplied forwarded headers.** Otherwise a
   client sending its own `X-Forwarded-For` / `X-Forwarded-Host` can have it
   passed through. Traefik does this by default: an entrypoint strips incoming
   `X-Forwarded-*` headers from any source not listed in
   `forwardedHeaders.trustedIPs` and sets its own. So do **not** list the
   overlay or client networks there, and keep `insecure` off; only list
   upstream proxies or load balancers that actually sit in front of Traefik:
   ```yaml
   # traefik static config
   entryPoints:
     websecure:
       address: ":443"
       forwardedHeaders:
         insecure: false # default: never trust client-supplied X-Forwarded-*
         # trustedIPs: only proxies in front of Traefik, if any
   ```
   And ensure it forwards the original host (Traefik does by default; nginx needs
   `proxy_set_header Host $host;`), since the CSRF check compares `Origin`
   against `Host`.
With both in place, the only path to the app is proxy → overlay → app, and the
forwarded headers are trustworthy. Without step 1 you're trusting every peer on
the overlay - fine for a single-tenant network you fully control, risky on a
shared one.
### 4. Connecting a dashboard (machine clients)
To manage one or more Nadir instances via a central dashboard or non-interactive client, authenticate requests using a static Bearer token rather than interactive PAM credentials.
Here is how to authorize and connect a dashboard:
#### Step 1: Mint a token
Run `nadir token add <name>` (for example, `dashboard`) to generate a unique API key:
```bash
sudo nadir token add dashboard
```
This generates a secure token starting with `nad_`. **Copy this token immediately**; only its SHA-256 hash is stored in `/var/lib/nadir/tokens.db` (shared via SQLite WAL between server and CLI), and the raw key cannot be retrieved again.
#### Step 2: Authorize the token in `config.yaml`
Minting and authorizing are deliberately separate steps (safe default). A newly minted token does not grant any access.
To grant the token a role, edit the `assignments` map in your `config.yaml`:
```yaml
assignments:
  dashboard: [admin] # or another role like [system_ops] or [auditor]
```
The audit log will record mutations performed by this token as `token:<name>` (e.g., `token:dashboard`), distinguishing it from human logins.
#### Step 3: Restart Nadir
While token creation and revocation (`nadir token rm`) are written to the database and take effect immediately, policy assignments live in `config.yaml`. To reload the configuration and authorize the new token name, you must restart the Nadir server:
```bash
sudo systemctl restart nadir
```
#### Step 4: Configure the dashboard client
Configure your client to include the token in the HTTP `Authorization` header of every API request:
```http
Authorization: Bearer nad_your_secret_token_here
```
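A minimal Go client for the header above, calling the `/api/whoami` endpoint to check what the token is allowed to do. `newWhoamiRequest`, `whoami`, and the `NADIR_URL` / `NADIR_TOKEN` environment variables are hypothetical. Against the self-signed development certificate, `http.Client` will refuse the connection; add that certificate to a trusted pool rather than disabling verification.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// newWhoamiRequest builds the authenticated request; it is split out so the
// header handling can be checked without a live server.
func newWhoamiRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, strings.TrimSuffix(baseURL, "/")+"/api/whoami", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// whoami performs the call and returns the raw response body.
func whoami(client *http.Client, baseURL, token string) (string, error) {
	req, err := newWhoamiRequest(baseURL, token)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("nadir returned %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	out, err := whoami(client, os.Getenv("NADIR_URL"), os.Getenv("NADIR_TOKEN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```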
#### Note on CORS / Cross-Origin requests
If your dashboard runs as a web application directly in the user's browser (cross-origin relative to the Nadir instance) and makes state-changing write requests (`POST`, `PUT`, `DELETE`), the browser will include an `Origin` header.
To defend against CSRF, Nadir's middleware rejects state-changing requests if an `Origin` header is present and does not match the request's `Host` header.
To connect a browser-based dashboard hosted on a different origin, choose one of these patterns:
1. **Server-to-Server Calls (Recommended):** Build the dashboard with a backend that calls Nadir's API. Because the backend is not a browser, it does not send an `Origin` header, allowing the requests to pass.
2. **Reverse Proxy:** Serve the dashboard and the Nadir instance from the same origin (e.g., dashboard at `https://control.example.com/` and Nadir at `https://control.example.com/api/nadir-node-1/`), and let a reverse proxy route the requests.
3. **Header Rewriting:** Have a proxy in front of Nadir rewrite/strip the `Origin` header for authorized token requests before forwarding them to Nadir.
---
## Layout
```
cmd/               process entry point + CLI (run / install / logs …), TLS, service wiring
internal/auth      PAM auth, sessions, login/logout, login throttle, PAM service install
internal/config    config.yaml loader + startup validation
internal/meta      /api/_modules, /api/whoami, /api/health discovery endpoints
internal/module    the Module interface
internal/modules   concrete modules:
  system           - info, hostname, time/timezone/NTP, locale/keymap, power
  services         - systemd unit control + journal/file logs (snapshot + SSE)
  users            - local accounts
  groups           - local groups
  packages         - dnf/apt/pacman install/remove/upgrade (streamed)
  audit            - read-only audit trail
  networking       - network interfaces, routing tables, DNS, and IP configurations
  terminal         - interactive PTY shell over WebSocket
internal/oscmd     shared command runner (timeouts, stderr surfacing) + helpers
internal/rbac      roles, permissions ("*" wildcards), HTTP middleware (RBAC + CSRF)
internal/audit     SQLite-backed audit log writer
```
## API docs
With the server running, browse `https://<host>:<port>/docs` for the Scalar UI,
or fetch the raw OpenAPI document from `/openapi.json`.
---
## Built with LLM assistance
This project was built with the help of large language models - but every
architectural choice, security decision, and operational trade-off is the
author's. The LLM never drove; it was a power tool, not a co-pilot with the
wheel.
In practice, the workflow looks like this: the author designs the feature,
decides how it should fit into the existing module structure, specifies the API
surface, and defines the security and permission semantics. The LLM then
accelerates the mechanical side - scaffolding boilerplate, drafting
implementations from precise instructions, generating documentation, and
proposing test cases. Every line of output is reviewed, corrected where needed,
and integrated only when it meets the project's standards.
What the LLM provides is _commodity leverage_: it collapses the time between
"I know exactly what I want" and "it's written, tested, and documented." What
it does not provide is judgment - that stays with the person who understands the
system, its threat model, and its users.
---
## License
[MIT](./LICENSE)
## Credits
Favicon: [Orbit](https://lucide.dev/icons/orbit) from [Lucide](https://lucide.dev), recolored. Lucide icons are licensed under the [ISC License](https://github.com/lucide-icons/lucide/blob/main/LICENSE).