Add control server operations guide
Commit 4cb3a41f1c (parent 14f6348bf4)
1 changed file with 560 additions and 0 deletions
docs/control-server-guide.md (new file, 560 lines)

# Control Server Operations Guide

**Host:** control (CT 127)
**IP:** 192.168.1.127
**Location:** pve2
**User:** maddox
**Last Updated:** January 23, 2026

---

## Overview

The control server is the centralized command center for managing the Proxmox cluster infrastructure. It provides:

- **Passwordless SSH** to all 13 managed hosts
- **Ansible automation** for cluster-wide operations
- **tmux sessions** for multi-host management
- **Git-based configuration** synced to Forgejo

---

## Quick Start

### Launch Interactive Menu
```bash
~/scripts/control-menu.sh
```

### Launch Multi-Host SSH Session
```bash
~/scripts/ssh-manager.sh
```

### Run Ansible Ad-Hoc Command
```bash
cd ~/clustered-fucks
ansible all -m ping
# {% raw %} stops Ansible's Jinja2 templating from parsing Docker's Go-template braces
ansible docker_hosts -m shell -a "docker ps --format 'table {% raw %}{{.Names}}\t{{.Status}}{% endraw %}'"
```

---

## Directory Structure

```
/home/maddox/
├── .ssh/
│   ├── config              # SSH host definitions
│   ├── tmux-hosts.conf     # tmux session configuration
│   ├── id_ed25519          # SSH private key
│   └── id_ed25519.pub      # SSH public key (add to new hosts)
│
├── clustered-fucks/        # Git repo (synced to Forgejo)
│   ├── ansible.cfg         # Ansible configuration
│   ├── inventory/
│   │   ├── hosts.yml       # Host inventory
│   │   └── group_vars/
│   │       └── all.yml     # Global variables
│   └── playbooks/
│       ├── check-status.yml
│       ├── docker-prune.yml
│       ├── restart-utils.yml
│       ├── update-all.yml
│       └── deploy-utils.yml
│
└── scripts/
    ├── ssh-manager.sh      # tmux multi-host launcher
    ├── control-menu.sh     # Interactive Ansible menu
    └── add-host.sh         # New host onboarding
```

---

## Managed Hosts

| Host | IP (192.168.1.x) | User | Port | Type | Group |
|------|------------------|------|------|------|-------|
| pve2 | .3 | root | 22 | Proxmox | proxmox_nodes |
| pve-dell | .4 | root | 22 | Proxmox | proxmox_nodes |
| replicant | .80 | maddox | 22 | VM | docker_hosts |
| databases | .81 | root | 22 | VM | docker_hosts |
| immich | .82 | root | 22 | VM | docker_hosts |
| media-transcode | .120 | root | 22 | LXC | docker_hosts |
| network-services | .121 | root | 22 | LXC | docker_hosts |
| download-stack | .122 | root | 22 | LXC | docker_hosts |
| docker666 | .123 | root | 22 | LXC | docker_hosts |
| tailscale-home | .124 | root | 22 | LXC | docker_hosts |
| dns-lxc | .125 | root | 22 | LXC | infrastructure |
| nas | .251 | maddox | 44822 | NAS | legacy |
| alien | .252 | maddox | 22 | Docker | legacy |

---

## Ansible Host Groups

| Group | Members | Use Case |
|-------|---------|----------|
| `all` | All 13 hosts | Connectivity tests |
| `docker_hosts` | 8 hosts | Docker operations |
| `all_managed` | 11 hosts | System updates |
| `proxmox_nodes` | pve2, pve-dell | Node-level ops |
| `infrastructure` | dns-lxc | Non-Docker infra |
| `legacy` | nas, alien | Manual operations |
| `vms` | replicant, databases, immich | VM-specific |
| `lxcs` | 6 LXC containers | LXC-specific |

---

## Playbooks Reference

### check-status.yml
Reports disk usage, memory usage, and container counts.

```bash
ansible-playbook playbooks/check-status.yml
```

**Target:** all_managed
**Output:** Per-host status line (`Disk=X% Mem=X% Containers=X`)
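
To make the status line format concrete, here is a minimal local sketch. It is not the contents of `check-status.yml`; the function names `format_status` and `disk_pct` are hypothetical, and `disk_pct` assumes a POSIX-style `df -P`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from check-status.yml.

# Print one status line in the documented shape: Disk=X% Mem=X% Containers=X
format_status() {
    local disk="$1" mem="$2" containers="$3"
    printf 'Disk=%s%% Mem=%s%% Containers=%s\n' "$disk" "$mem" "$containers"
}

# Root filesystem usage as a bare number, read from POSIX df output.
disk_pct() {
    df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Example with fixed values
format_status 42 63 7
```

On a Docker host the container count could come from `docker ps -q | wc -l`, the same command used in the ad-hoc examples in this guide.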

---

### update-all.yml
Runs `apt update` and `apt upgrade` on all Docker hosts.

```bash
ansible-playbook playbooks/update-all.yml

# With reboot if required:
ansible-playbook playbooks/update-all.yml -e "reboot=true"
```

**Target:** docker_hosts
**Note:** Checks whether a reboot is required and reports it; hosts are rebooted only when `-e "reboot=true"` is passed.
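
The reboot decision described in the note can be sketched in shell. This assumes the check relies on the marker file `/var/run/reboot-required` that Debian and Ubuntu create after updates that need a reboot; the playbook source is not shown here, so that is an assumption, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: the reboot check uses the standard Debian/Ubuntu marker file.
REBOOT_FLAG="${REBOOT_FLAG:-/var/run/reboot-required}"

# Succeeds when the marker file exists.
reboot_required() {
    [ -f "$REBOOT_FLAG" ]
}

# Decide what to do; the argument mirrors the -e "reboot=true" extra var.
reboot_action() {
    local reboot="${1:-false}"
    if ! reboot_required; then
        echo "no reboot needed"
    elif [ "$reboot" = "true" ]; then
        echo "rebooting"   # a real run would call: systemctl reboot
    else
        echo "reboot required (not rebooting)"
    fi
}
```

Calling `reboot_action` with no argument only reports, which matches the default behavior of notifying without rebooting.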

---

### docker-prune.yml
Cleans unused Docker resources (images, networks, build cache).

```bash
ansible-playbook playbooks/docker-prune.yml
```

**Target:** docker_hosts
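**Note:** dns-lxc has no Docker, so it fails if the play reaches it; this is expected. It is listed under `infrastructure`, not `docker_hosts`, so a run limited to `docker_hosts` should not touch it.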

---

### restart-utils.yml
Restarts the utils stack (watchtower, autoheal, docker-proxy) on all Docker hosts.

```bash
ansible-playbook playbooks/restart-utils.yml
```

**Target:** docker_hosts
**Note:** Uses the host-specific `docker_appdata` variable for non-standard paths

---

### deploy-utils.yml
Deploys the standardized utils stack to a new host.

```bash
ansible-playbook playbooks/deploy-utils.yml --limit new-host
```

**Target:** docker_hosts
**Note:** Creates the directory structure and `.env` file only; the compose file must be added separately

---

## Scripts Reference

### ssh-manager.sh

Launches a tmux session with SSH connections to all hosts.

```bash
~/scripts/ssh-manager.sh
```

**Features:**
- Window 0: Control (local shell)
- Windows 1-13: Individual host SSH sessions
- Final window: Multi-View (all hosts in split panes)

**Navigation:**
- `Ctrl+b` then window number to switch (windows 0-9); `Ctrl+b w` lists all windows, which is needed for window 10 and above
- `Ctrl+b d` to detach (keeps session running)
- `tmux attach -t cluster` to reattach
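
The window layout above can be pictured with a dry-run sketch that prints the tmux commands instead of running them. This is not the contents of `ssh-manager.sh`; the function name `tmux_plan` is hypothetical, the host list is a shortened subset, and the Multi-View window is left out.

```bash
#!/usr/bin/env bash
# Hypothetical dry run: prints tmux commands, does not execute them.
tmux_plan() {
    local session="$1"; shift
    # Window 0: local control shell
    echo "tmux new-session -d -s $session -n control"
    # One window per host, each running ssh to that host
    local host
    for host in "$@"; do
        echo "tmux new-window -t $session: -n $host ssh $host"
    done
    echo "tmux attach -t $session"
}

# Session name "cluster" matches the reattach command above.
tmux_plan cluster pve2 replicant nas
```

Printing the commands first makes it easy to review the layout before piping the output to `bash`.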

---

### control-menu.sh

Interactive menu for common operations.

```bash
~/scripts/control-menu.sh
```

**Menu Options:**
```
[1] Ping All        - Test connectivity
[2] Check Status    - Disk/memory/containers
[3] Update All      - apt upgrade docker hosts
[4] Docker Prune    - Clean unused resources
[5] Restart Utils   - Restart utils stack everywhere

[A] Ad-hoc Command  - Run custom command
[I] Inventory       - Show host list
[S] SSH Manager     - Launch tmux session

[Q] Quit
```

---

### add-host.sh

Wizard for onboarding new hosts.

```bash
~/scripts/add-host.sh
```

**Steps:**
1. Prompts for hostname, IP, user, port, description
2. Tests SSH connectivity
3. Copies SSH key if needed
4. Adds the host to `~/.ssh/config`
5. Adds the host to `~/.ssh/tmux-hosts.conf`

**Note:** The script does not update the Ansible inventory; it must be edited manually.
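
Step 4 can be sketched as follows. This is an illustration under assumptions, not the actual `add-host.sh`: the function names `ssh_stanza` and `add_stanza` are hypothetical, and writing `Port` only for non-default ports is a design choice made here for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of appending a host to an SSH config file.

# Emit a config stanza; Port is written only when it is not 22.
ssh_stanza() {
    local name="$1" ip="$2" user="$3" port="${4:-22}"
    printf 'Host %s\n    HostName %s\n    User %s\n' "$name" "$ip" "$user"
    if [ "$port" != "22" ]; then
        printf '    Port %s\n' "$port"
    fi
}

# Append the stanza unless the host is already defined (safe to re-run).
add_stanza() {
    local config="$1" name="$2"
    if grep -qx "Host $name" "$config" 2>/dev/null; then
        echo "skip: $name already in $config"
    else
        { echo; ssh_stanza "${@:2}"; } >> "$config"
        echo "added: $name"
    fi
}
```

For example, `add_stanza ~/.ssh/config nas 192.168.1.251 maddox 44822` would append a stanza with `Port 44822`, and a second call with the same host name would be skipped.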

---

## Common Operations

### SSH to a Specific Host
```bash
ssh replicant
ssh databases
ssh nas  # Uses port 44822 automatically
```

### Run Command on All Docker Hosts
```bash
cd ~/clustered-fucks
ansible docker_hosts -m shell -a "docker ps -q | wc -l"
```

### Run Command on Specific Host
```bash
ansible replicant -m shell -a "df -h"
```

### Copy File to All Docker Hosts
```bash
ansible docker_hosts -m copy -a "src=/path/to/file dest=/path/to/dest"
```

### Check Specific Service
```bash
# {% raw %} stops Ansible's Jinja2 templating from parsing Docker's Go-template braces
ansible docker_hosts -m shell -a "docker ps --filter name=watchtower --format '{% raw %}{{.Status}}{% endraw %}'"
```

### View Ansible Inventory
```bash
ansible-inventory --graph
ansible-inventory --list
```

---

## Git Workflow

### Repository Location
- **Local:** `~/clustered-fucks/`
- **Remote:** `ssh://git@192.168.1.81:2222/maddox/clustered-fucks.git`
- **Web:** https://git.3ddbrewery.com/maddox/clustered-fucks

### Standard Workflow
```bash
cd ~/clustered-fucks

# Make changes to playbooks/inventory
vim playbooks/new-playbook.yml

# Commit and push
git add -A
git commit -m "Add new playbook"
git push origin main
```

### Pull Latest Changes
```bash
cd ~/clustered-fucks
git pull origin main
```

---

## Adding a New Host

### 1. Run Onboarding Script
```bash
~/scripts/add-host.sh
```

### 2. Edit Ansible Inventory
```bash
vim ~/clustered-fucks/inventory/hosts.yml
```

Add the host under the appropriate group:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
```

If the host uses a non-standard appdata path:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
  docker_appdata: /custom/path/appdata
```

### 3. Test Connection
```bash
ansible new-host -m ping
```

### 4. Commit Changes
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Add new-host to inventory"
git push origin main
```

---

## Troubleshooting

### SSH Connection Refused
```bash
# Check if SSH is running on target
ssh -v hostname

# If connection refused, access via Proxmox console:
# For LXC: pct enter <CT_ID>
# For VM: qm terminal <VM_ID>

# Inside container/VM:
apt install openssh-server
systemctl enable ssh
systemctl start ssh
```

### SSH Permission Denied
```bash
# Check key is in authorized_keys on target
ssh-copy-id hostname

# If still failing, check permissions on target:
# (via Proxmox console)
chmod 700 ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R root:root ~/.ssh  # or appropriate user
```

### Ansible "Missing sudo password"
The host is configured with `ansible_become: yes` but no password is set.

Fix: Either remove `ansible_become: yes` from inventory, or set up passwordless sudo on target:
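```bash
echo "username ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/username   # run as root on the target
chmod 0440 /etc/sudoers.d/username && visudo -cf /etc/sudoers.d/username   # restrict mode, check syntax
```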

### Playbook Skips Host
Check if host is in the correct group:
```bash
ansible-inventory --graph
```

Check host variables:
```bash
ansible-inventory --host hostname
```
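
For a large inventory, the graph output can be filtered for one host. The helper below is a hypothetical sketch that reads `ansible-inventory --graph` output on stdin and prints the group each matching host line sits under. It tracks only the most recent group line, so it assumes hosts are listed directly beneath their own group.

```bash
#!/usr/bin/env bash
# Hypothetical helper: which group(s) list this host in --graph output?
groups_for_host() {
    local host="$1"
    awk -v host="$host" '
        # Group lines look like "  |--@docker_hosts:"; remember the name.
        /--@/ { g = $0; sub(/.*--@/, "", g); sub(/:$/, "", g) }
        # Host lines look like "  |  |--replicant"; strip the tree prefix.
        { line = $0; sub(/.*--/, "", line) }
        line == host { print g }
    '
}
```

Usage: `ansible-inventory --graph | groups_for_host dns-lxc`. If the host is missing from the group a playbook targets, that explains why the play skips it.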

### Docker Command Not Found
The host is in `docker_hosts` but doesn't have Docker. Move it to the `infrastructure` group:
```yaml
infrastructure:
  hosts:
    hostname:
      ansible_host: 192.168.1.XXX
```

---

## Non-Standard Configurations

### Hosts with Different Appdata Paths

| Host | Path |
|------|------|
| replicant | `/home/maddox/docker/appdata` |
| docker666 | `/root/docker/appdata` |
| All others | `/home/docker/appdata` |

These are handled via the `docker_appdata` variable in the inventory.
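
The override-with-default lookup in the table can be pictured with a small shell function. This is only an illustration: in the real setup the value comes from the `docker_appdata` inventory variable, and `appdata_for` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the table above: per-host override, else default.
appdata_for() {
    case "$1" in
        replicant) echo "/home/maddox/docker/appdata" ;;
        docker666) echo "/root/docker/appdata" ;;
        *)         echo "/home/docker/appdata" ;;
    esac
}

appdata_for replicant
appdata_for immich
```

To see the value Ansible actually resolves for a host, `ansible-inventory --host replicant` prints its variables.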

### Hosts with Non-Standard SSH

| Host | Port | User |
|------|------|------|
| nas | 44822 | maddox |

Configured in both `~/.ssh/config` and `inventory/hosts.yml`.

### Hosts Without Utils Stack

| Host | Reason |
|------|--------|
| tailscale-home | Only runs Headscale, no utils needed |
| dns-lxc | No Docker installed |

---

## Maintenance

### Update Ansible
```bash
sudo apt update
sudo apt install --only-upgrade ansible   # upgrades only the ansible package
```

### Regenerate SSH Keys (if compromised)
```bash
# Generate new key (confirm the overwrite prompt for the existing file)
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Distribute to all hosts (will prompt for passwords)
for host in pve2 pve-dell replicant databases immich media-transcode network-services download-stack docker666 tailscale-home dns-lxc alien; do
    ssh-copy-id "$host"
done

# NAS requires special handling
ssh-copy-id -p 44822 maddox@192.168.1.251

# Afterwards, remove the old public key from ~/.ssh/authorized_keys on every host
```
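
Distributing a new key does not revoke a compromised one: the old public key stays in each host's `authorized_keys` until it is removed. A minimal sketch of that removal, with a hypothetical function name `revoke_key`, could look like this. It matches on the base64 field of the old public key and keeps a `.bak` copy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop every line containing the old key's base64 blob.
revoke_key() {
    local file="$1" blob="$2"
    cp "$file" "$file.bak"                              # keep a backup copy
    grep -vF -- "$blob" "$file.bak" > "$file" || true   # grep exits 1 if no lines remain
}
```

The blob is the second field of the old `.pub` file, for example `awk '{print $2}' old_key.pub` (the file name is a placeholder). The function would be run on each host against `~/.ssh/authorized_keys`.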

### Backup Configuration
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Backup: $(date +%Y-%m-%d)"
git push origin main
```

---

## Reference Files

### ~/.ssh/config
```
Host *
    StrictHostKeyChecking accept-new
    ServerAliveInterval 60
    ServerAliveCountMax 3

Host pve2
    HostName 192.168.1.3
    User root

Host pve-dell
    HostName 192.168.1.4
    User root

Host replicant
    HostName 192.168.1.80
    User maddox

Host databases
    HostName 192.168.1.81
    User root

Host immich
    HostName 192.168.1.82
    User root

Host media-transcode
    HostName 192.168.1.120
    User root

Host network-services
    HostName 192.168.1.121
    User root

Host download-stack
    HostName 192.168.1.122
    User root

Host docker666
    HostName 192.168.1.123
    User root

Host tailscale-home
    HostName 192.168.1.124
    User root

Host dns-lxc
    HostName 192.168.1.125
    User root

Host nas
    HostName 192.168.1.251
    User maddox
    Port 44822

Host alien
    HostName 192.168.1.252
    User maddox
```

### ~/clustered-fucks/ansible.cfg
```ini
[defaults]
inventory = inventory/hosts.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
forks = 10

[privilege_escalation]
become = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```

---

## Changelog

| Date | Change |
|------|--------|
| 2026-01-23 | Initial deployment, all hosts connected, playbooks tested |