Disaster Recovery Plan

This document outlines the procedures to restore critical services in the event of a catastrophic failure. The primary backup mechanism is ArchiveForge, which stores compressed tar.gz archives of service data on the NAS at 192.168.1.251.

Guiding Principles

Prioritize Critical Services: Restore essential services first (e.g., Traefik, Authelia, ArchiveForge), followed by high-priority applications.
Assume Total Loss: This plan assumes the primary application host (192.168.1.252) is unrecoverable and a new host is being provisioned.
Test Regularly: This plan should be tested quarterly to ensure its effectiveness.

Phase 1: Infrastructure Restoration

This phase focuses on bringing the core infrastructure back online on a new host.

Provision New Host:
- Install a fresh OS (e.g., Ubuntu Server).
- Install Docker and Docker Compose.
- Configure networking to match the old host's static IP (192.168.1.252).
Mount External Storage:
- Mount the NAS storage to the new host. Ensure the mount points are identical to the previous setup (e.g., /volume1/Media, /volume1/docker/backup).
- Verify read/write access.
Restore Core Services:
- Traefik: Restore the Traefik configuration from its ArchiveForge backup, or from its separate backup location if it is not stored under appdata. Start Traefik.
- Authelia: Restore the Authelia configuration and start the service.
- ArchiveForge: Restore the ArchiveForge service. This is critical for restoring other applications.
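The read/write check in the Mount External Storage step can be scripted so it is repeatable during quarterly tests. The following is a minimal sketch, not part of the original procedure: the function name `check_rw` and the probe file name are hypothetical, and it only confirms that a directory accepts writes, not that the NAS is actually mounted there.

```shell
# check_rw DIR: succeed only if DIR exists and a probe file can be
# written, read back, and removed. The probe file name is hypothetical.
check_rw() {
    rw_dir="$1"
    rw_probe="$rw_dir/.dr-rw-probe.$$"
    [ -d "$rw_dir" ] || { echo "missing: $rw_dir" >&2; return 1; }
    { echo ok > "$rw_probe"; } 2>/dev/null \
        || { echo "not writable: $rw_dir" >&2; return 1; }
    [ "$(cat "$rw_probe")" = "ok" ] \
        || { rm -f "$rw_probe"; echo "read-back failed: $rw_dir" >&2; return 1; }
    rm -f "$rw_probe"
}

# Example usage with the mount points named in this plan:
#   for d in /volume1/Media /volume1/docker/backup; do
#       check_rw "$d" || exit 1
#   done
```

Because an unmounted mount point is still a writable local directory, it may be worth confirming the mount itself (for example with `mountpoint -q` from util-linux) before trusting this check.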

Phase 2: Application Service Restoration

This phase details the process of restoring individual application services from the ArchiveForge backups.

General Restoration Steps

The general process for restoring a service from an ArchiveForge backup is as follows:

Identify the Latest Backup: Locate the most recent backup for the desired service in the ArchiveForge backup directory on the NAS (e.g., /volume1/docker/backup/ArchiveForge/daily/...).
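Locating the newest archive can be scripted. The sketch below assumes that backup files follow the `[service-name]-YYYYMMDD-HHMMSS.tar.gz` naming used in this plan, so that sorting by file name is the same as sorting by time. The function name `latest_backup` is hypothetical, and the search is recursive so it does not depend on how the directories under the backup root are organized.

```shell
# latest_backup SERVICE ROOT: print the path of the newest archive for
# SERVICE found anywhere under ROOT; return non-zero if none exists.
# Assumes names of the form SERVICE-YYYYMMDD-HHMMSS.tar.gz and paths
# without tabs or newlines.
latest_backup() {
    lb_pattern="$1-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9].tar.gz"
    lb_result=$(
        find "$2" -type f -name "$lb_pattern" 2>/dev/null \
            | awk -F/ '{ print $NF "\t" $0 }' \
            | LC_ALL=C sort \
            | tail -n 1 \
            | cut -f2-
    )
    [ -n "$lb_result" ] && printf '%s\n' "$lb_result"
}

# Example usage:
#   latest_backup readarr /volume1/docker/backup/ArchiveForge/daily
```

Sorting on the file name rather than the modification time keeps the result stable even if archives were copied or touched after they were created.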
Stop the Service: If the service is running (e.g., with a fresh but empty configuration), stop it:
```
cd /mnt/docker-storage/appdata/[service-name]
docker-compose down
```
Restore the Data: Extract the backup archive into the service's appdata directory. This will overwrite the existing configuration and data.
```
tar -xzf /path/to/backup/[service-name]-YYYYMMDD-HHMMSS.tar.gz -C /mnt/docker-storage/appdata/[service-name]
```
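Whether `-C` should point at the service's appdata directory or at its parent depends on how paths are stored inside the archive, which this plan does not state. A quick, non-destructive way to check is to list the archive first. The helper below is a sketch with a hypothetical name (`archive_top_levels`); it prints the distinct top-level entries of an archive without extracting anything.

```shell
# archive_top_levels FILE: print the distinct top-level path components
# stored in a tar.gz archive, without extracting it.
archive_top_levels() {
    tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | grep -v '^$' | sort -u
}

# Example usage:
#   archive_top_levels /path/to/backup/[service-name]-YYYYMMDD-HHMMSS.tar.gz
```

If the only top-level entry is the service's own directory name, extracting with `-C /mnt/docker-storage/appdata` avoids a nested duplicate directory. If the entries are the service's files and folders themselves, the `-C` target shown above is the right one.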
Verify Permissions: Ensure the restored files have the correct ownership and permissions. This is especially important if PUID and PGID are set in the service's docker-compose.yml, because the container will run as that user and group.
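One way to check ownership is to list anything under the appdata directory that is not owned by the expected user and group. This is a sketch only: the function name `list_wrong_owner` is hypothetical, and the IDs in the usage comment are placeholder values rather than values taken from this environment.

```shell
# list_wrong_owner DIR UID GID: print every path under DIR whose owner
# or group differs from the expected numeric IDs. No output means the
# ownership already matches.
list_wrong_owner() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example (hypothetical IDs; take the real PUID/PGID from the service's
# docker-compose.yml):
#   list_wrong_owner /mnt/docker-storage/appdata/[service-name] 1000 1000
#   sudo chown -R 1000:1000 /mnt/docker-storage/appdata/[service-name]
```

Listing first makes it easier to spot an unexpected owner before `chown -R` rewrites everything.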

Start the Service:
```
cd /mnt/docker-storage/appdata/[service-name]
docker-compose up -d
```

Verify Functionality: Check the container logs and access the service's web UI to ensure it's running correctly and the data has been restored.

Example: Restoring Readarr

Locate Backup: Find the latest readarr backup on the NAS.

Stop Readarr:
```
cd /mnt/docker-storage/appdata/readarr
docker-compose down
```

Restore Data:
```
# Example path, replace with actual backup file
tar -xzf /volume1/docker/backup/ArchiveForge/daily/2025-12-09/readarr-20251209-020000.tar.gz -C /mnt/docker-storage/appdata/readarr
```

Start and Verify:
```
docker-compose up -d
docker-compose logs -f
```
Access https://readarr.3ddbrewery.com to confirm your library and settings are restored.

Phase 3: External Database Restoration

For services that use the external database on the NAS (192.168.1.251), a separate restoration procedure is required. This procedure depends on how that database is backed up (e.g., mysqldump snapshots).

This section needs to be completed once the backup strategy for the external database is fully documented.

Identify Backup: Locate the latest SQL dump file.

Restore Dump: Use the command that matches the database engine to load the dump.
```
# For MySQL/MariaDB
mysql -u [username] -p [database_name] < /path/to/backup.sql

# For PostgreSQL
psql -U [username] -d [database_name] -f /path/to/backup.sql
```

Verify: Check the database to ensure the data has been restored correctly.

Next Review: This document should be reviewed and updated quarterly, or whenever there is a significant change to the infrastructure.
