# Troubleshooting Guide

_Last updated: 2026-01-05_

This guide provides solutions to common issues encountered in this Docker-based infrastructure.

## Issue: Container is restarting or won't start

**Symptoms:**

- `docker ps -a` shows the container's status as `Restarting` or `Exited`.
- The `docker-compose up -d` command fails with an error.
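
When several services are affected, a small filter over `docker ps -a` output can list every container in either state at once. This is a sketch: `find_failing_containers` is a hypothetical helper, and it assumes Docker's default status strings, which start with `Restarting` or `Exited`.

```bash
# Print "name: status" for every container whose status shows a restart loop
# or an exit. Reads "name|status" lines on stdin (hypothetical helper).
find_failing_containers() {
  local name status
  while IFS='|' read -r name status || [ -n "$name" ]; do
    case "$status" in
      Restarting*|Exited*) printf '%s: %s\n' "$name" "$status" ;;
    esac
  done
}

# Example (requires Docker):
#   docker ps -a --format '{{.Names}}|{{.Status}}' | find_failing_containers
```

Reading from standard input keeps the parsing separate from Docker, so the filter can also be tried on saved output.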

**Diagnosis:**

1. **Check the logs:** The first step is always to check the container's logs.

   ```bash
   docker-compose logs -f <service-name>
   ```

   Look for error messages, stack traces, or any indication of what might be wrong.

2. **Check dependencies:** If the container depends on other services (e.g., a database), ensure those services are running and healthy.

   ```bash
   docker-compose ps
   ```

3. **Check configuration:**

   - **Environment variables:** Ensure all required environment variables are set correctly in the `.env` file or `docker-compose.yml`.
   - **Volumes:** Verify that all volume paths are correct and that the files and directories on the host have the correct permissions. The user running the Docker container (often specified with `PUID` and `PGID`) needs to have read and write access to the volume paths.
   - **Ports:** Check for port conflicts. If another service on the host is using the same port, the container will fail to start. Use `sudo lsof -i -P -n | grep LISTEN` to check for listening ports.
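
To answer the narrower question "is this one port already taken?", `ss` from iproute2 (assumed to be installed, as it is on most current Linux distributions) works as an alternative to `lsof`. A minimal sketch; `port_is_listening` is a hypothetical name:

```bash
# Succeed (exit 0) if TCP port $1 appears as a listening local port in
# `ss -ltn` output read from stdin; fail (exit 1) otherwise.
port_is_listening() {
  local port="$1"
  # Column 4 is "Local Address:Port", e.g. 0.0.0.0:80 or [::]:443.
  # The text after the last colon is compared with the requested port;
  # the header row never matches because its column 4 is not a number.
  awk -v p="$port" '
    { n = split($4, parts, ":"); if (parts[n] == p) found = 1 }
    END { if (found) exit 0; exit 1 }
  '
}

# Example:
#   ss -ltn | port_is_listening 8080 && echo "port 8080 is already in use"
```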

**Resolution:**

- Once the root cause is identified from the logs or configuration check, address the issue. This may involve:
  - Correcting an environment variable.
  - Fixing file permissions on a volume.
  - Changing a port mapping.
  - Restarting a dependency.
- After applying the fix, try starting the container again:

  ```bash
  docker-compose up -d --force-recreate <service-name>
  ```
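
A container stuck in a restart loop can report `running` for a few seconds between crashes, so it helps to poll its state for a while after recreating it. A sketch of a hypothetical retry helper (not part of Docker or Compose), assuming a bash shell:

```bash
# Re-run a command until it succeeds or the attempt limit is reached.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry_until: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example (requires Docker; <container_name> is a placeholder):
#   retry_until 10 3 sh -c '[ "$(docker inspect -f "{{.State.Status}}" <container_name>)" = running ]'
```

A `running` state only shows that the process is up; if the logs still show errors, return to the diagnosis steps.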

## Issue: 502 Bad Gateway from Traefik

**Symptoms:**

- Accessing a service through its domain (e.g., `https://books.3ddbrewery.com`) returns a "502 Bad Gateway" error from Traefik.

**Diagnosis:**

1. **Check the Traefik dashboard:** The Traefik dashboard (if accessible) shows the state of every router, service, and middleware. Look for any errors related to the service in question.

2. **Check Traefik's logs:**

   ```bash
   docker logs traefik
   ```

   Look for errors related to the service, such as "no servers found".

3. **Check the service's logs:**

   ```bash
   docker-compose logs -f <service-name>
   ```

   The service itself might be crashing or unhealthy.

4. **Check network connectivity:**

   - Ensure the service is connected to the `traefik_proxy` network in its `docker-compose.yml`.
   - From the Traefik container, try to ping the service's container:

     ```bash
     # Open a shell inside the Traefik container...
     docker exec -it traefik /bin/sh
     # ...then, from that shell, send three pings to the service's container:
     ping -c 3 <container_name>
     ```

5. **Check Traefik labels:**

   - Ensure the `traefik.http.services.<service-name>.loadbalancer.server.port` label in the `docker-compose.yml` file is set to the port the application listens on *inside* the container, not a port published on the host.
   - Verify that all Traefik labels are correctly formatted.
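
To verify step 5 against the running container instead of the compose file, the labels Docker actually applied can be listed with `docker inspect` and filtered for the port label. A sketch; `traefik_lb_ports` is a hypothetical helper, and it assumes labels follow the `traefik.http.services.<name>.loadbalancer.server.port` pattern used above:

```bash
# Print the value of every Traefik load-balancer port label found in
# "key=value" lines read from stdin (hypothetical helper).
traefik_lb_ports() {
  local key value
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      traefik.http.services.*.loadbalancer.server.port) printf '%s\n' "$value" ;;
    esac
  done
}

# Example (requires Docker; <container_name> is a placeholder):
#   docker inspect -f '{{range $k, $v := .Config.Labels}}{{$k}}={{$v}}{{println}}{{end}}' <container_name> | traefik_lb_ports
#   docker inspect -f '{{json .Config.ExposedPorts}}' <container_name>
```

The second command shows the ports the image declares with `EXPOSE`. That declaration is only metadata, so the application's own documentation or startup logs remain the final word on which port it listens on.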

**Resolution:**

- **Service not on the `traefik_proxy` network:** Add the service to the `traefik_proxy` network in its `docker-compose.yml`.
- **Incorrect port:** Correct the port in the `traefik.http.services.<service-name>.loadbalancer.server.port` label.
- **Service not running:** Troubleshoot the service using the "Container is restarting or won't start" section above.
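
After adding the network, a quick way to confirm the container actually joined it is to list its networks. A sketch with a hypothetical `on_network` helper:

```bash
# Succeed if the network named $1 appears in the whitespace-separated list of
# network names read from stdin (hypothetical helper).
on_network() {
  local wanted="$1"
  # One name per line, then require an exact, whole-line match so that
  # e.g. "traefik_proxy_old" does not count as "traefik_proxy".
  tr -s '[:space:]' '\n' | grep -qxF -- "$wanted"
}

# Example (requires Docker; <container_name> is a placeholder):
#   docker inspect -f '{{range $net, $cfg := .NetworkSettings.Networks}}{{$net}} {{end}}' <container_name> \
#     | on_network traefik_proxy || echo "not attached to traefik_proxy"
```

If `traefik_proxy` is declared in the service's compose file without `external: true` (or an explicit `name:`), Compose creates a separate, project-prefixed network such as `<project>_traefik_proxy` that Traefik is not attached to. As a temporary measure, `docker network connect traefik_proxy <container_name>` attaches a running container, but that change is lost when the container is recreated.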

## Issue: 404 Not Found from Traefik

**Symptoms:**

- Accessing a service through its domain results in a "404 Not Found" error.

**Diagnosis:**

1. **Check the Traefik dashboard:** Verify that a router has been created for the domain you are trying to access.
2. **Check the `rule` label:** Ensure the `traefik.http.routers.<service-name>.rule` label contains the correct `Host(...)` matcher.
3. **Check DNS:** Make sure DNS resolves the domain to the IP address of the Traefik server.
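
The `Host(...)` check in step 2 can be scripted against the label Docker applied. A sketch under two assumptions: `rule_has_host` is a hypothetical helper, and it only reads the first hostname inside each `Host(...)` matcher (Traefik v2 also allows several hostnames in one matcher).

```bash
# Succeed if hostname $2 appears in a Host(...) matcher of Traefik rule $1.
rule_has_host() {
  local rule="$1" host="$2"
  printf '%s\n' "$rule" |
    grep -o 'Host(`[^`]*`' |        # each Host( plus its first quoted hostname
    sed 's/^Host(`//; s/`$//' |     # keep only the hostname
    grep -qxF -- "$host"            # exact, whole-line comparison
}

# Example (requires Docker; placeholders in angle brackets):
#   rule=$(docker inspect -f '{{index .Config.Labels "traefik.http.routers.<router>.rule"}}' <container_name>)
#   rule_has_host "$rule" books.3ddbrewery.com || echo "rule does not match this host"
#   dig +short books.3ddbrewery.com   # if dig is installed; should print the Traefik server's IP
```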

**Resolution:**

- **Incorrect rule:** Correct the `Host(...)` rule in the `docker-compose.yml` file.
- **DNS issue:** Correct the DNS record for the domain.

## Issue: Authentication Failures

**Symptoms:**

- You cannot log in to a service that is protected by Authelia.
- Requests fail with "401 Unauthorized" or "403 Forbidden" errors.

**Diagnosis:**

1. **Check Authelia's logs:**

   ```bash
   docker logs authelia
   ```

   Look for any errors related to the authentication attempt.

2. **Check the application's logs:** The application itself might be rejecting the request even after Authelia lets it through.

   ```bash
   docker-compose logs -f <service-name>
   ```

   In the case of `books_webv2`, check the backend logs for any errors related to the `Remote-User` header.

3. **Check the Traefik middleware:** Ensure the `traefik.http.routers.<service-name>.middlewares` label is correctly set to `authelia-brewery` or `authelia-fails`.
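
Step 3 can also be checked against the running container. A sketch; `uses_authelia_middleware` is hypothetical, and the accepted names are the two given above. Traefik references a middleware defined by another provider as `name@provider` (for example `authelia-brewery@file`), so the suffix is stripped before comparing.

```bash
# Succeed if the comma-separated middleware list in $1 includes one of the
# Authelia middlewares named in this guide (hypothetical helper).
uses_authelia_middleware() {
  local list="$1" item
  local IFS=','
  for item in $list; do
    item="${item// /}"      # drop spaces around commas
    item="${item%%@*}"      # strip an optional @provider suffix
    case "$item" in
      authelia-brewery|authelia-fails) return 0 ;;
    esac
  done
  return 1
}

# Example (requires Docker; placeholders in angle brackets):
#   uses_authelia_middleware "$(docker inspect -f '{{index .Config.Labels "traefik.http.routers.<router>.middlewares"}}' <container_name>)" \
#     || echo "router is not using an Authelia middleware"
```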

**Resolution:**

- **Restart Authelia:** Restarting Authelia can clear transient issues.

  ```bash
  docker restart authelia
  ```

- **Check user credentials:** Double-check the username and password.
- **Check Authelia configuration:** Review Authelia's `configuration.yml` for errors.

## Issue: MariaDB/MySQL Replication Stopped

**⚠️ CURRENT STATUS**: As of January 2026, `node` database replication has been **intentionally disabled**. All applications connect directly to the primary server (`192.168.1.251`). This section is retained for reference if replication is re-enabled in the future.

**Symptoms:**

- The secondary database server shows `Replica_IO_Running` or `Replica_SQL_Running` as `No`.
- `Seconds_Behind_Source` is `NULL`, or is well above `0` and keeps growing.
- Applications using the secondary database see stale data.

**Diagnosis:**

1. **Check replication status on the secondary server:** Connect to the secondary database server with the `mysql` client (or phpMyAdmin) and run the statement below. The `\G` terminator, which prints one field per line, only works in the `mysql` client; in phpMyAdmin, end the statement with `;` instead.

   ```sql
   SHOW REPLICA STATUS\G
   ```

   On versions that predate `SHOW REPLICA STATUS` (MySQL before 8.0.22, MariaDB before 10.5.1), use:

   ```sql
   SHOW SLAVE STATUS\G
   ```

2. **Check key fields:**

   - `Replica_IO_Running`: Should be `Yes`.
   - `Replica_SQL_Running`: Should be `Yes`.
   - `Seconds_Behind_Source`: Should be `0` or close to it.
   - `Last_Error`: Should be empty; if it is not, it describes what went wrong.

   On MariaDB and on MySQL versions before 8.0.22, these fields are named `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master`.

3. **Check primary server status:** On the primary, run:

   ```sql
   SHOW MASTER STATUS;
   ```

   Note the `File` and `Position` values. (MySQL 8.4 removed this statement in favor of `SHOW BINARY LOG STATUS`.)

4. **Check binary log settings:** Ensure binary logging is enabled on the primary server:

   ```sql
   SHOW VARIABLES LIKE 'log_bin';
   ```

   The `Value` column should be `ON`.
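
The checks in step 2 can be turned into a pass/fail test, for example for a cron job. A sketch; `check_replica_status` is hypothetical. It accepts both the `Replica_*`/`Seconds_Behind_Source` and the older `Slave_*`/`Seconds_Behind_Master` field names, and its optional argument is the maximum acceptable lag in seconds (default `0`):

```bash
# Read `SHOW REPLICA STATUS\G` (or `SHOW SLAVE STATUS\G`) output on stdin,
# print a one-line summary, and exit 0 if replication looks healthy, 1 if not.
check_replica_status() {
  awk -v max="${1:-0}" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)        # vertical output right-aligns field names
      i = index(line, ": ")
      if (i == 0) next                # skip the "*** 1. row ***" banner
      key = substr(line, 1, i - 1)
      val = substr(line, i + 2)       # keeps any further ": " in the value
    }
    key ~ /^(Replica|Slave)_IO_Running$/     { io = val }
    key ~ /^(Replica|Slave)_SQL_Running$/    { sql = val }
    key ~ /^Seconds_Behind_(Source|Master)$/ { lag = val }
    key == "Last_Error"                      { err = val }
    END {
      printf "IO=%s SQL=%s lag=%s\n", io, sql, lag
      if (err != "") print "Last_Error: " err
      if (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= max + 0) exit 0
      exit 1
    }
  '
}

# Example (placeholders in angle brackets):
#   mysql -h <secondary_host> -u <user> -p -e 'SHOW REPLICA STATUS\G' | check_replica_status 30
```

A `NULL` lag counts as unhealthy, since it means the SQL thread is not running or the I/O thread is not connected.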

**Resolution:**

**Common Fix - Restart Replication:**

```sql
-- On the secondary server
-- (before MySQL 8.0.22 / MariaDB 10.5.1, use STOP SLAVE; and START SLAVE;)
STOP REPLICA;
START REPLICA;
SHOW REPLICA STATUS\G
```
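
**If there's a specific error:**

- **Skip one transaction (only if the error is known to be safe to skip):**

  ```sql
  STOP REPLICA;
  -- MySQL 8.0.26+ also names this variable sql_replica_skip_counter.
  -- On MySQL it cannot be used while GTID-based replication (gtid_mode=ON) is active.
  SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
  START REPLICA;
  ```

  **⚠️ Warning:** Only use this if you understand the error and know it is safe to skip. Skipped events are never applied, so the secondary's data can drift from the primary's.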
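
**If replication is completely broken:**

- **Re-establish replication from the primary's current position:** Any changes made on the primary between the point where replication stopped and this new position are never replayed on the secondary, so its data can drift out of sync. If consistency matters, rebuild the secondary from a fresh dump instead.

  1. Get the current position from the primary:

     ```sql
     -- On the primary
     SHOW MASTER STATUS;
     ```

  2. Reconfigure the replica with that position:

     ```sql
     -- On the secondary
     STOP REPLICA;
     -- MySQL 8.0.23+ deprecates CHANGE MASTER TO in favor of
     -- CHANGE REPLICATION SOURCE TO ... SOURCE_LOG_FILE=..., SOURCE_LOG_POS=...
     CHANGE MASTER TO
       MASTER_LOG_FILE='<file from primary>',
       MASTER_LOG_POS=<position from primary>;
     START REPLICA;
     SHOW REPLICA STATUS\G
     ```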
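
Copying the file and position by hand is error-prone, so a small filter can build the statement for step 2 from the primary's output. A sketch; `build_change_master` is hypothetical and assumes `mysql` is run with `-N -B`, so the row is printed tab-separated without a header:

```bash
# Turn the first row of `SHOW MASTER STATUS` batch output (File, Position, ...)
# into a CHANGE MASTER TO statement. Fails on empty or malformed input.
build_change_master() {
  local file pos rest
  IFS=$'\t' read -r file pos rest || [ -n "$file" ] || return 1
  case "$pos" in
    ''|*[!0-9]*) echo "unexpected SHOW MASTER STATUS output" >&2; return 1 ;;
  esac
  printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$file" "$pos"
}

# Example (<user> is a placeholder):
#   mysql -h 192.168.1.251 -u <user> -p -N -B -e 'SHOW MASTER STATUS' | build_change_master
```

Empty output from `SHOW MASTER STATUS` usually means binary logging is off on the primary (see the `log_bin` check above); the function reports that case as an error rather than emitting an incomplete statement.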

**Prevention:**

- Monitor replication status regularly.
- Ensure both servers have sufficient disk space.
- Check network connectivity between the primary and secondary servers.
- Review the MariaDB error log: `/var/log/mysql/error.log`.
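
For the disk-space item, a minimal check that could run on both servers, for example from cron. `check_disk_usage` and its default threshold of 90% are assumptions, not values taken from this setup. It parses POSIX-format `df -P` output and does not handle mount points that contain spaces:

```bash
# Print "mountpoint usage%" for every filesystem at or above the threshold
# percentage ($1, default 90). Exit status 1 if any were printed.
check_disk_usage() {
  local threshold="${1:-90}"
  awk -v limit="$threshold" '
    NR == 1 { next }                    # skip the header row
    {
      use = $5; sub(/%$/, "", use)      # "Capacity" column, e.g. "93%"
      if (use + 0 >= limit + 0) { print $6 " " $5; over = 1 }
    }
    END { if (over) exit 1; exit 0 }
  '
}

# Example (/var/lib/mysql is the usual default data directory; adjust if needed):
#   df -P /var/lib/mysql | check_disk_usage 85 || echo "disk space is running low"
```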