
# Troubleshooting Guide

_Last updated: 2026-01-05_

This guide provides solutions to common issues encountered in this Docker-based infrastructure.

## Issue: Container is restarting or won't start

**Symptoms:**
- `docker ps -a` shows the container status as `Restarting` or `Exited` (plain `docker ps` hides exited containers).
- The `docker-compose up -d` command fails with an error.

**Diagnosis:**
1.  **Check the logs:** The first step is always to check the container's logs.
    ```bash
    docker-compose logs -f <service-name>
    ```
    Look for error messages, stack traces, or any indication of what might be wrong.

2.  **Check dependencies:** If the container depends on other services (e.g., a database), ensure those services are running and healthy.
    ```bash
    docker-compose ps
    ```

3.  **Check configuration:**
    -   **Environment variables:** Ensure all required environment variables are set correctly in the `.env` file or `docker-compose.yml`.
    -   **Volumes:** Verify that all volume paths are correct and that the host files and directories have the right ownership and permissions. The user the container process runs as (often set with `PUID` and `PGID`) needs read and write access to the volume paths.
    -   **Ports:** Check for port conflicts. If another service on the host is using the same port, the container will fail to start. Use `sudo lsof -i -P -n | grep LISTEN` to check for listening ports.
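
To narrow a suspected port conflict down to one port, the `lsof` output from the command above can be filtered. This is a minimal sketch that assumes the default `lsof -i -P -n` column layout; `listeners_on_port` is a hypothetical helper name, not part of Docker or `lsof`.

```bash
# Hypothetical helper: reads `lsof -i -P -n` output on stdin and prints
# "COMMAND PID" for every process listening on the port given as $1.
listeners_on_port() {
  awk -v port="$1" '
    $NF == "(LISTEN)" {
      # The address column looks like *:8080 or [::1]:8080; the port is
      # whatever follows the last colon.
      n = split($(NF - 1), parts, ":")
      if (parts[n] == port) print $1, $2
    }'
}
```

For example, `sudo lsof -i -P -n | listeners_on_port 8080` prints nothing when the port is free.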

**Resolution:**
-   Once the root cause is identified from the logs or configuration check, address the issue. This may involve:
    -   Correcting an environment variable.
    -   Fixing file permissions on a volume.
    -   Changing a port mapping.
    -   Restarting a dependency.
-   After applying the fix, try starting the container again:
    ```bash
    docker-compose up -d --force-recreate <service-name>
    ```
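
After recreating the container it can help to confirm that it stays up rather than falling back into a restart loop. The sketch below polls the container state a few times; `retry_until` and `container_is_running` are hypothetical helper names, and the attempt count and delay are arbitrary assumptions to tune.

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds after each failure. Succeeds as soon as the command does.
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds only when Docker reports the container state as "running".
container_is_running() {
  [ "$(docker inspect --format '{{.State.Status}}' "$1" 2>/dev/null)" = "running" ]
}
```

A call such as `retry_until 10 3 container_is_running <container_name>` waits up to roughly 30 seconds. If it fails, return to the log check in the diagnosis steps.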

## Issue: 502 Bad Gateway from Traefik

**Symptoms:**
-   Accessing a service through its domain (e.g., `https://books.3ddbrewery.com`) results in a "502 Bad Gateway" error from Traefik.

**Diagnosis:**
1.  **Check the Traefik dashboard:** The Traefik dashboard (if accessible) lists every router, service, and middleware along with its status. Look for any errors related to the service in question.

2.  **Check Traefik's logs:**
    ```bash
    docker logs traefik
    ```
    Look for errors related to the service, such as "no servers found".

3.  **Check the service's logs:**
    ```bash
    docker-compose logs -f <service-name>
    ```
    The service itself might be crashing or unhealthy.

4.  **Check network connectivity:**
    -   Ensure the service is connected to the `traefik_proxy` network in its `docker-compose.yml`.
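    -   From the Traefik container, try to ping the service's container (this assumes the Traefik image includes `ping`). A successful ping only proves network reachability, not that the application is listening on its port.
        ```bash
        docker exec -it traefik ping -c 3 <container_name>
        ```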

5.  **Check Traefik labels:**
    -   Ensure the `traefik.http.services.<service-name>.loadbalancer.server.port` label in the `docker-compose.yml` file matches the port the application listens on inside the container, not the port published on the host.
    -   Verify that all Traefik labels are correctly formatted.
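
The network check in step 4 can also be run against the live container instead of the compose file. This is a sketch only: `network_listed` is a hypothetical helper, and it assumes the network is named exactly `traefik_proxy` on the host (Compose may prefix non-external networks with the project name).

```bash
# Hypothetical helper: succeeds if the network name given as $1 appears in
# the whitespace-separated list of network names read from stdin.
network_listed() {
  awk -v name="$1" '
    { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }'
}
```

To feed it real data, list the container's networks with `docker inspect --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' <container_name>` and pipe the result into `network_listed traefik_proxy`.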

**Resolution:**
-   **Service not on `traefik_proxy` network:** Add the service to the `traefik_proxy` network in its `docker-compose.yml`.
-   **Incorrect port:** Correct the port in the `traefik.http.services.<service-name>.loadbalancer.server.port` label.
-   **Service not running:** Troubleshoot the service using the "Container is restarting" guide above.

## Issue: 404 Not Found from Traefik

**Symptoms:**
-   Accessing a service through its domain results in a "404 Not Found" error.

**Diagnosis:**
1.  **Check the Traefik dashboard:** Verify that a router has been created for the domain you are trying to access.
2.  **Check the `rule` label:** Ensure the `traefik.http.routers.<service-name>.rule` label is set to the correct `Host(...)`.
3.  **Check DNS:** Make sure your DNS is correctly pointing the domain to the IP address of the Traefik server.
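
A quick way to catch a typo in the router rule is to compare the requested host name against the hosts named in the rule. The sketch below assumes the rule uses the backtick-quoted `Host(...)` syntax; `rule_matches_host` is a hypothetical helper and does not evaluate other matchers such as `PathPrefix`.

```bash
# Hypothetical helper: succeeds if host $2 is named inside a Host(...)
# matcher of the Traefik rule string $1.
rule_matches_host() {
  printf '%s\n' "$1" |
    grep -oE 'Host\(`[^`]+`(, *`[^`]+`)*\)' |  # keep only the Host(...) matchers
    grep -oE '`[^`]+`' |                       # pull out each quoted host
    tr -d '`' |
    grep -qxF "$2"                             # exact, literal match
}
```

The rule string itself can be read from a running container with `docker inspect --format '{{index .Config.Labels "traefik.http.routers.<service-name>.rule"}}' <container_name>`.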

**Resolution:**
-   **Incorrect rule:** Correct the `Host(...)` rule in the `docker-compose.yml` file.
-   **DNS issue:** Correct the DNS record for the domain.

## Issue: Authentication Failures

**Symptoms:**
-   Being unable to log in to a service that is protected by Authelia.
-   Seeing "Unauthorized" or "Forbidden" errors.

**Diagnosis:**
1.  **Check Authelia's logs:**
    ```bash
    docker logs authelia
    ```
    Look for any errors related to the authentication attempt.

2.  **Check the application's logs:** The application might be rejecting the authentication for some reason.
    ```bash
    docker-compose logs -f <service-name>
    ```
    In the case of `books_webv2`, check the backend logs for any errors related to the `Remote-User` header.

3.  **Check the Traefik middleware:** Ensure the `traefik.http.routers.<service-name>.middlewares` label (note the plural) is correctly set to `authelia-brewery` or `authelia-fails`.
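
Authelia's log can be noisy, so filtering it down to error-level entries makes the failing attempt easier to spot. This sketch assumes the log lines carry either a `level=error` field (text format) or a `"level":"error"` field (JSON format); the actual format depends on the Authelia logging configuration, and `auth_errors` is a hypothetical helper name.

```bash
# Hypothetical helper: keep only error- and fatal-level lines from a log
# stream on stdin, in either text (level=error) or JSON ("level":"error") form.
auth_errors() {
  grep -E 'level=(error|fatal)|"level":"(error|fatal)"'
}
```

Because `docker logs` writes container output to both stdout and stderr, merge them before filtering: `docker logs authelia 2>&1 | auth_errors`.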

**Resolution:**
-   **Restart Authelia:** Sometimes, simply restarting Authelia can resolve issues.
    ```bash
    docker restart authelia
    ```
-   **Check user credentials:** Double-check the username and password.
-   **Check Authelia configuration:** Review Authelia's `configuration.yml` for any errors.

## Issue: MariaDB/MySQL Replication Stopped

**⚠️ CURRENT STATUS**: As of January 2026, `node` database replication has been **intentionally disabled**. All applications connect directly to the primary server (`192.168.1.251`). This section is retained for reference if replication is re-enabled in the future.

**Symptoms:**
-   Secondary database server shows `Replica_IO_Running` or `Replica_SQL_Running` as `No`.
-   `Seconds_Behind_Source` is `NULL` (replication is stopped) or keeps growing instead of returning to `0`.
-   Applications using the secondary database have stale data.

**Diagnosis:**
1.  **Check replication status on secondary server:** Connect to the secondary database server using phpMyAdmin or MySQL client and run:
    ```sql
    SHOW REPLICA STATUS\G
    ```
    Or, on MySQL versions before 8.0.22 and MariaDB versions before 10.5:
    ```sql
    SHOW SLAVE STATUS\G
    ```

2.  **Check key fields** (MariaDB and older MySQL releases report these as `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master`):
    -   `Replica_IO_Running`: Should be `Yes`.
    -   `Replica_SQL_Running`: Should be `Yes`.
    -   `Seconds_Behind_Source`: Should be `0`.
    -   `Last_Error`: Should be empty. If it is not, the message indicates what went wrong.

3.  **Check primary server status:**
    ```sql
    SHOW MASTER STATUS;
    ```
    Note the `File` and `Position` values.

4.  **Check binary log settings:** Ensure binary logging is enabled on the primary server:
    ```sql
    SHOW VARIABLES LIKE 'log_bin';
    ```
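
The key-field check from step 2 can be scripted so that it returns a pass/fail result. The sketch below parses the vertical (`\G`) status output and accepts both the `Replica_*`/`Source` and the `Slave_*`/`Master` field names; `replication_healthy` is a hypothetical helper, and treating any numeric lag as healthy is an assumption you may want to tighten.

```bash
# Hypothetical helper: reads `SHOW REPLICA STATUS\G` (or SHOW SLAVE STATUS\G)
# output on stdin, prints a one-line summary, and succeeds only if both
# threads are running and the lag is a number (it is NULL when stopped).
replication_healthy() {
  awk -F': *' '
    { gsub(/^[ \t]+/, "", $1); sub(/[ \t\r]+$/, "", $2) }
    $1 ~ /^(Replica|Slave)_IO_Running$/      { io = $2 }
    $1 ~ /^(Replica|Slave)_SQL_Running$/     { sql = $2 }
    $1 ~ /^Seconds_Behind_(Source|Master)$/  { lag = $2 }
    END {
      printf "io=%s sql=%s lag=%s\n", io, sql, lag
      exit !(io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/)
    }'
}
```

One possible invocation, with placeholder connection details, is `mysql -h <secondary-host> -u <user> -p -e 'SHOW REPLICA STATUS\G' | replication_healthy`.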

**Resolution:**

**Common Fix - Restart Replication:**
```sql
-- On secondary server
STOP REPLICA;
START REPLICA;
SHOW REPLICA STATUS\G
```

**If there's a specific error:**
-   **Skip one transaction (if error is known to be safe):**
    ```sql
    STOP REPLICA;
    SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
    START REPLICA;
    ```
    **⚠️ Warning:** Only use this if you understand the error and know it's safe to skip.

**If replication is completely broken:**
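-   **Re-establish replication from the primary's current position:** This skips every transaction the replica missed while replication was down, so use it only after re-syncing the data (for example, by restoring a fresh dump of the primary) or when the gap is known to be acceptable.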
    1.  Get current position from primary:
        ```sql
        -- On primary
        SHOW MASTER STATUS;
        ```
    2.  Point the replica at that position:
        ```sql
        -- On secondary
        STOP REPLICA;
        CHANGE MASTER TO
            MASTER_LOG_FILE='<file from primary>',
            MASTER_LOG_POS=<position from primary>;
        START REPLICA;
        SHOW REPLICA STATUS\G
        ```
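
Copying the file name and position by hand is easy to get wrong, so the statement can be generated from the primary's status output instead. This is a sketch under stated assumptions: it expects the vertical (`\G`) form of `SHOW MASTER STATUS`, it emits the `CHANGE MASTER TO` syntax used above (newer MySQL releases use `CHANGE REPLICATION SOURCE TO` instead), and `build_change_master` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `SHOW MASTER STATUS\G` output on stdin and
# prints the matching CHANGE MASTER TO statement. Fails if the File or
# Position field cannot be found.
build_change_master() {
  awk -F': *' '
    { gsub(/^[ \t]+/, "", $1); sub(/[ \t\r]+$/, "", $2) }
    $1 == "File"     { file = $2 }
    $1 == "Position" { pos = $2 }
    END {
      if (file == "" || pos !~ /^[0-9]+$/) {
        print "could not parse File/Position" > "/dev/stderr"
        exit 1
      }
      # \047 is a single quote
      printf "CHANGE MASTER TO MASTER_LOG_FILE=\047%s\047, MASTER_LOG_POS=%s;\n", file, pos
    }'
}
```

For example, `mysql -h <primary-host> -u <user> -p -e 'SHOW MASTER STATUS\G' | build_change_master` prints a statement to review before running it on the secondary between `STOP REPLICA;` and `START REPLICA;`.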

**Prevention:**
-   Monitor replication status regularly
-   Ensure both servers have sufficient disk space
-   Check network connectivity between primary and secondary servers
-   Review MariaDB error logs: `/var/log/mysql/error.log`
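
The disk-space check is simple to automate. The sketch below flags filesystems at or above a usage threshold from `df -P` output; `disks_over_threshold` is a hypothetical helper, the threshold is an arbitrary assumption, and mount points containing spaces are not handled.

```bash
# Hypothetical helper: reads `df -P` output on stdin and prints
# "<mount point> <usage>" for every filesystem whose usage percentage is
# at or above the limit given as $1.
disks_over_threshold() {
  awk -v limit="$1" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)          # "91%" -> "91"
      if (use + 0 >= limit) print $6, $5
    }'
}
```

Running `df -P | disks_over_threshold 90` on both database servers, for instance from a scheduled job, prints nothing while every filesystem is below 90% usage.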