Deployment
Production setup, Docker, reverse proxy, monitoring, and HA considerations
Production Checklist
- Use gunicorn, not the Flask dev server (omit the `--dev` flag)
- Set `server.external_url` to the public HTTPS URL
- Use a strong `admin_api.token_secret` (32+ random bytes)
- Store secrets in environment variables, not config files
- Set `database.sslmode: require` or `verify-full`
- Enable `database.auto_setup: true` for the first deploy, then disable it
- Set appropriate `server.workers` (2-4 x CPU cores)
- Configure `server.max_requests` to restart workers periodically
- Enable rate limiting (`security.rate_limits.enabled: true`)
- Set `logging.format: json` for structured log ingestion
- Restrict CA private key file permissions to `0400`
- Put ACMEEH behind a reverse proxy for TLS termination
- Enable CRL for revocation checking
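A token secret of 32+ random bytes can be generated with Python's `secrets` module (shown as one convenient option; any CSPRNG works):

```python
# Generate a URL-safe secret from 32 random bytes
# (43 characters after base64url encoding, padding stripped).
import secrets

token_secret = secrets.token_urlsafe(32)
print(token_secret)
```

Export the resulting value as an environment variable rather than writing it into the config file.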
CLI Reference
All operations are invoked through the acmeeh module:
python -m acmeeh -c CONFIG [options] [command]
Global Flags
| Flag | Description |
|---|---|
| `-c CONFIG` | Configuration file path (required) |
| | Enable debug output with full tracebacks |
| `--validate-only` | Validate config and exit |
| `--dev` | Use Flask development server |
| | Show version |
Subcommands
| Command | Description |
|---|---|
| | Start the server (default action when no subcommand is given) |
| | Check database connectivity and schema |
| | Run database schema migration |
| | Test CA backend with an ephemeral CSR |
| | Force CRL rebuild (requires CRL to be enabled) |
| | Create an admin user |
| | Inspect an order with its authorizations and challenges |
| | Inspect certificate details |
| | Inspect an account with its contacts and order count |
Tip
Quick Validation
Use --validate-only in CI/CD pipelines to verify configuration changes before deploying:
python -m acmeeh -c /etc/acmeeh/config.yaml --validate-only
Environment Variable Substitution
ACMEEH config files support environment variable references that are resolved before JSON Schema validation. This allows you to keep secrets out of config files entirely.
database:
password: ${DB_PASSWORD}
host: ${DB_HOST:-localhost}
port: ${DB_PORT:-5432}
| Syntax | Behavior |
|---|---|
| `${VAR}` | Required — startup fails if the variable is not set |
| `${VAR:-default}` | Uses the default value if the variable is not set |
Environment variables are resolved during additional_checks() in the config class, which runs after YAML parsing but before JSON Schema validation. This means the substituted values are still subject to full schema validation.
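To make the documented syntax concrete, here is a minimal sketch of such a resolver. This is illustrative only, not ACMEEH's actual implementation; the function name `substitute` is made up:

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}. Group 1 is the variable name,
# group 2 the optional default after ":-".
_VAR = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute(text: str, env=os.environ) -> str:
    """Resolve ${VAR} / ${VAR:-default} references; fail on unset required vars."""
    def repl(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"required environment variable {name} is not set")
    return _VAR.sub(repl, text)
```

For example, `substitute("host: ${DB_HOST:-localhost}", {})` yields `"host: localhost"`, while an unset `${DB_PASSWORD}` raises an error at startup.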
Gunicorn Configuration
In production mode, ACMEEH runs under gunicorn. All gunicorn settings are configured via YAML:
server:
external_url: https://acme.example.com
bind: 0.0.0.0
port: 8443
workers: 8 # 2-4x CPU cores
worker_class: sync
timeout: 30
graceful_timeout: 30
keepalive: 2
max_requests: 1000 # restart workers after 1000 requests
max_requests_jitter: 50 # add randomness to prevent thundering herd
Start the production server:
PYTHONPATH=src DB_PASSWORD=secret python -m acmeeh -c /etc/acmeeh/config.yaml
WSGI Entry Point
For advanced deployments you can bypass the python -m acmeeh wrapper and use gunicorn (or any WSGI server) directly via the WSGI entry point:
export ACMEEH_CONFIG=/etc/acmeeh/config.yaml
gunicorn "acmeeh.server.wsgi:app"
This is useful when you need full control over gunicorn flags (e.g., --preload, custom logging config, or --certfile / --keyfile for direct TLS). The ACMEEH_CONFIG environment variable tells the WSGI module where to find the configuration file.
Docker
ACMEEH ships with a production-ready Dockerfile, docker-compose.yaml,
and fully parameterized docker/config.yaml. See the Docker page
for the complete guide, including build ARGs, environment variables, and
common operations.
Quick start:
cp docker/.env.example .env # set POSTGRES_PASSWORD
mkdir -p certs # place root.pem + root-key.pem
docker compose up -d
curl http://localhost:8443/livez
Reverse Proxy Setup
ACMEEH should sit behind a reverse proxy that handles TLS termination. Enable proxy support in config:
proxy:
enabled: true
trusted_proxies:
- 172.16.0.0/12
- 10.0.0.0/8
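For intuition about what `trusted_proxies` does, here is a hedged sketch of the usual algorithm for recovering the client IP behind trusted proxies. The helper name `client_ip` and its signature are illustrative, not ACMEEH's API:

```python
import ipaddress

def client_ip(remote_addr: str, forwarded_for: str, trusted_cidrs: list[str]) -> str:
    """Walk X-Forwarded-For right to left, skipping trusted proxy hops;
    the first untrusted address is treated as the real client."""
    trusted = [ipaddress.ip_network(c) for c in trusted_cidrs]

    def is_trusted(addr: str) -> bool:
        return any(ipaddress.ip_address(addr) in net for net in trusted)

    # If the directly connected peer is not a trusted proxy,
    # the header cannot be trusted at all.
    if not is_trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return remote_addr
```

This is why `trusted_proxies` should list only the networks your reverse proxies actually occupy: any address outside those ranges is assumed to be the client itself.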
Nginx Example
upstream acmeeh {
server 127.0.0.1:8443;
}
server {
listen 443 ssl http2;
server_name acme.example.com;
ssl_certificate /etc/nginx/tls/cert.pem;
ssl_certificate_key /etc/nginx/tls/key.pem;
location / {
proxy_pass http://acmeeh;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# ACME clients may send large JWS payloads
client_max_body_size 64k;
}
}
Caddy Example
acme.example.com {
reverse_proxy localhost:8443
}
Health Check Endpoints
ACMEEH exposes three health check endpoints designed for container orchestrators, load balancers, and monitoring systems.
GET /livez — Liveness Probe
Minimal liveness check. Returns 200 OK if the process is running and able to serve HTTP. No backend checks are performed.
{
"alive": true,
"version": "1.0.0"
}
Use this for Kubernetes liveness probes or basic load balancer health checks.
GET /healthz — Comprehensive Health Check
Deep health check that verifies all subsystems. Returns 200 OK when all components are healthy, or 503 Service Unavailable if the database, CA backend, or CRL subsystem is unhealthy.
{
"status": "ok",
"checks": {
"database": {
"status": "ok",
"pool": {
"size": 10,
"available": 8,
"waiting": 0
}
},
"ca_backend": { "status": "ok" },
"crl": { "status": "ok", "stale": false },
"workers": {
"challenge": true,
"cleanup": true,
"expiration": true
},
"smtp": { "status": "ok" },
"dns_resolver": { "status": "ok" }
},
"shutting_down": false
}
When the connection pool is exhausted (all connections in use), the database check is skipped to avoid blocking the health probe, and the response reports "database": "pool_exhausted" with "status": "degraded":
{
"status": "degraded",
"checks": {
"database": "pool_exhausted",
"ca_backend": { "status": "ok" },
...
},
"shutting_down": false
}
Note
503 Triggers
The /healthz endpoint returns 503 if any of the following are unhealthy: database (including pool_exhausted), ca_backend, or crl (when CRL is enabled and stale). Non-critical subsystems like SMTP and DNS resolver are reported but do not affect the HTTP status code.
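The 503 rules above can be condensed into a small decision function. This is an illustrative sketch of the documented behavior, not ACMEEH's implementation:

```python
# Subsystems whose failure flips /healthz to 503; smtp and dns_resolver
# are reported but never affect the status code.
CRITICAL = {"database", "ca_backend", "crl"}

def overall_status(checks: dict) -> int:
    """Return the HTTP status the documented rules would yield."""
    for name, result in checks.items():
        if name not in CRITICAL:
            continue
        if result == "pool_exhausted":
            return 503
        if isinstance(result, dict):
            if result.get("status") != "ok":
                return 503
            if name == "crl" and result.get("stale"):
                return 503
    return 200
```

Note that an SMTP or DNS resolver error leaves the status at 200; only the critical subsystems gate the response code.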
GET /readyz — Readiness Probe
Kubernetes readiness probe. Returns 200 OK when the server is ready to accept traffic, or 503 Service Unavailable with a reason when it is not.
Success response:
{
"ready": true
}
Failure responses:
{
"ready": false,
"reason": "database unavailable"
}
When the connection pool is critically exhausted:
{
"ready": false,
"reason": "Connection pool exhausted",
"pool": { "size": 20, "available": 0, "waiting": 5 }
}
Use this for Kubernetes readiness probes so that traffic is only routed to instances that have completed startup and can serve requests.
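In a deploy script you might gate traffic cutover on `/readyz`. A minimal polling sketch using only the standard library; `wait_until_ready` is a hypothetical helper, not part of ACMEEH:

```python
import json
import time
import urllib.request

def is_ready(status: int, body: dict) -> bool:
    """Decide readiness from the HTTP status and parsed /readyz body."""
    return status == 200 and body.get("ready") is True

def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the readiness endpoint until it reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_ready(resp.status, json.load(resp)):
                    return True
        except OSError:
            pass  # not up yet, or a 503; keep polling
        time.sleep(interval)
    return False
```

A rolling deploy could call `wait_until_ready("https://acme.example.com/readyz")` for each new instance before shifting traffic to it.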
Signal Handling & Graceful Shutdown
ACMEEH handles Unix signals for clean lifecycle management.
SIGTERM / SIGINT — Graceful Shutdown
Sending SIGTERM or SIGINT initiates a graceful shutdown sequence:
1. The server stops accepting new connections
2. In-flight requests are allowed to complete for up to `server.graceful_timeout` seconds
3. Challenges in `PROCESSING` state are drained back to `PENDING` so they will be retried on next startup
4. Background workers (challenge, cleanup, expiration) stop cleanly after their current cycle
5. The database connection pool is drained and closed
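The drain sequence rests on the standard Unix signal pattern: a handler sets a flag, and request handlers and worker loops poll it. A minimal sketch of that pattern (not ACMEEH's code):

```python
import signal
import threading

# Event that request handlers and worker loops can poll to stop cleanly.
shutting_down = threading.Event()

def _handle_shutdown(signum, frame):
    # Orchestrators (systemd, Kubernetes) send SIGTERM and wait
    # before escalating to SIGKILL, so setting a flag is enough here.
    shutting_down.set()

signal.signal(signal.SIGTERM, _handle_shutdown)
signal.signal(signal.SIGINT, _handle_shutdown)
```

A worker loop would then check `shutting_down.is_set()` after each cycle and exit cleanly instead of starting new work.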
SIGHUP — Config Hot-Reload (Unix only)
Sending SIGHUP triggers a live configuration reload without restarting the process. Only a subset of settings can be safely reloaded at runtime:
| Safely Reloaded | Requires Restart |
|---|---|
| | CA backend settings |
| | Database settings |
| | Server bind/port/workers |
| | Challenge types |
Warning
Reload Limitations
CA backend, database, server, and challenge type settings are not reloaded by SIGHUP. Changes to these settings require a full process restart.
Background Workers
ACMEEH runs three background workers that perform periodic maintenance tasks. Each worker operates independently and uses PostgreSQL advisory locks for leader election in multi-instance deployments.
Challenge Worker
Reprocesses challenges that have been stuck in PROCESSING state beyond a configurable threshold. This handles cases where a validation attempt was interrupted (e.g., by a restart or crash).
challenges:
background_worker:
enabled: false # default: false
poll_seconds: 10 # how often to check for stale challenges
stale_seconds: 300 # age threshold before a PROCESSING challenge is retried
Uses PostgreSQL advisory lock ID 712003.
Cleanup Worker
Runs multiple independent maintenance tasks, each on its own interval:
- Nonce garbage collection — `nonce.gc_interval_seconds` (default: 300)
- Order expiry — `order.cleanup_interval_seconds` (default: 3600)
- Stale processing recovery — `order.stale_processing_threshold_seconds` (default: 600)
- Audit log retention — purges old audit records per the configured retention period
- Rate limit GC — cleans up expired rate limit entries
- Authorization/challenge/order/notice retention — removes expired records per configured retention periods
Uses PostgreSQL advisory lock ID 712001.
Expiration Worker
Sends certificate expiration warning notifications to account contacts when certificates approach their expiry date.
notifications:
expiration_warning_days: [30, 14, 7, 1]
expiration_check_interval_seconds: 3600
Uses PostgreSQL advisory lock ID 712002. Deduplicates notifications via the certificate_expiration_notices database table so that each warning is sent only once per certificate per threshold.
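To make the threshold semantics concrete, here is an illustrative helper computing which warnings are due for a certificate. The function and its exact rule (a threshold fires once days-to-expiry is at or below it, unless already sent) are an assumption for illustration, not ACMEEH's code:

```python
def warnings_due(days_until_expiry: int,
                 thresholds=(30, 14, 7, 1),
                 already_sent=()) -> list[int]:
    """Return thresholds that have been crossed but not yet notified,
    largest first. Dedup mirrors the per-certificate, per-threshold
    notices table described above."""
    return [t for t in sorted(thresholds, reverse=True)
            if days_until_expiry <= t and t not in already_sent]
```

For a certificate 14 days from expiry with no prior notices, both the 30-day and 14-day warnings are outstanding; once recorded, subsequent checks return nothing until the 7-day mark.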
Note
HA Leader Election
In multi-instance deployments, all three workers use PostgreSQL advisory locks for leader election. Only one instance runs each worker at a time. No additional coordination (e.g., Redis, ZooKeeper) is needed — the database handles it.
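The mechanism can be seen in miniature below. The lock IDs come from this page; the helper and the way you would issue the query (e.g. via psycopg) are illustrative:

```python
# Advisory lock IDs documented above.
WORKER_LOCKS = {"cleanup": 712001, "expiration": 712002, "challenge": 712003}

def leader_lock_sql(worker: str) -> str:
    """SQL a worker session runs to attempt leadership.
    pg_try_advisory_lock returns true for exactly one session at a time;
    the lock is released when that session ends or unlocks it."""
    return f"SELECT pg_try_advisory_lock({WORKER_LOCKS[worker]})"

# Sketch of usage with a DB-API cursor (connection setup omitted):
#   cur.execute(leader_lock_sql("cleanup"))
#   (is_leader,) = cur.fetchone()
#   if is_leader:
#       run_cleanup_cycle()
```

Because the lock lives in PostgreSQL, a crashed leader's session disappears and another instance acquires the lock on its next attempt.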
Email Notifications
ACMEEH can send email notifications for certificate expiration warnings and other events. Notifications are recorded in the database and optionally delivered via SMTP.
Notification Configuration
notifications:
enabled: true
expiration_warning_days: [30, 14, 7, 1]
expiration_check_interval_seconds: 3600
max_retries: 3
retry_delay_seconds: 60
retry_backoff_multiplier: 2.0
retry_max_delay_seconds: 3600
batch_size: 50
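Assuming the conventional reading of these settings (delay = `retry_delay_seconds` x `retry_backoff_multiplier`^attempt, capped at `retry_max_delay_seconds` — an interpretation, not confirmed by the source), the retry schedule looks like:

```python
def retry_delay(attempt: int, base=60, multiplier=2.0, max_delay=3600) -> float:
    """Exponential backoff for notification delivery retries:
    base * multiplier**attempt, capped at max_delay."""
    return min(base * multiplier ** attempt, max_delay)
```

With the defaults shown, retries would fire roughly 60 s, 120 s, and 240 s after the initial failure, never waiting longer than an hour.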
SMTP Configuration
smtp:
enabled: true
host: smtp.example.com
port: 587
use_tls: true
username: acmeeh@example.com
password: ${SMTP_PASSWORD}
from_address: acmeeh@example.com
cc: [] # addresses CC'd on every notification
bcc: [] # addresses BCC'd (envelope only)
timeout_seconds: 30
templates_path: /etc/acmeeh/templates # optional custom Jinja2 templates
Graceful Degradation
The notification system degrades gracefully depending on configuration:
| Scenario | Behavior |
|---|---|
| `notifications.enabled: false` | Complete no-op — no notifications recorded, no emails sent |
| `smtp.enabled: false` | Notifications are recorded in the database (audit trail) but not emailed |
| SMTP delivery failure | Notification is marked as failed and retried per the configured retry/backoff settings |
Maintenance Mode
ACMEEH supports a maintenance mode via the admin API that allows you to gracefully pause new certificate issuance during planned upgrades or CA maintenance windows.
Enabling Maintenance Mode
Enable via the admin API (requires admin authentication):
# Enable maintenance mode
curl -X POST https://acme.example.com/api/maintenance \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"enabled": true}'
# Check current status
curl https://acme.example.com/api/maintenance \
-H "Authorization: Bearer <admin-token>"
Behavior During Maintenance
| Operation | Behavior |
|---|---|
| New order creation | Returns 503 with a `Retry-After` header |
| Pre-authorization creation | Returns 503 with a `Retry-After` header |
| Order finalization | Allowed (in-progress orders can complete) |
| Challenge validation | Allowed (in-progress challenges can complete) |
| Certificate downloads | Allowed |
| Account operations | Allowed |
Tip
Planned Upgrades
Enable maintenance mode before a planned upgrade, perform the upgrade, then disable maintenance mode. ACME clients that respect the Retry-After header will automatically retry after the maintenance window.
Database Sizing
| Scale | Certificates | Connections | Disk |
|---|---|---|---|
| Small | < 1,000 | | 100 MB |
| Medium | 1,000 - 50,000 | | 1 GB |
| Large | 50,000+ | | 10+ GB |
Tip
Connection Pool
Set database.max_connections to roughly server.workers x 2. PostgreSQL’s default max_connections is 100, which is usually sufficient.
Connection Pool Pressure Guard
ACMEEH automatically sheds load when the database connection pool is under pressure using a four-tier model that runs on every request before any database work:
| Tier | Retry-After | Condition |
|---|---|---|
| Growth headroom | (allowed) | Pool has not yet reached its configured maximum; the request proceeds |
| Exhausted | | All connections in use (zero available). Hard reject. |
| Critical | | Available connections at or below 30% of pool max (or 10% for pools > 20). Hard reject. |
| Pressure | | Available connections at or below 50% of pool max (or 20% for pools > 20) and requests are waiting. Soft reject. |
Health check endpoints (/livez, /healthz, /readyz) are always exempt from this guard so that monitoring remains functional even during pool exhaustion.
This is transparent to ACME clients — well-behaved clients will retry after the Retry-After delay. If you see frequent 503 responses in logs, increase database.max_connections or add more ACMEEH instances.
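A sketch of how such a classifier might look, using the thresholds from the table above. The tier names and the function itself are illustrative; the growth-headroom tier is folded into the `ok` case:

```python
def pool_tier(max_size: int, in_use: int, waiting: int) -> str:
    """Classify pool pressure per the documented four-tier model (sketch)."""
    available = max_size - in_use
    # Pools larger than 20 connections use the tighter percentage thresholds.
    critical_pct, pressure_pct = (0.10, 0.20) if max_size > 20 else (0.30, 0.50)
    if available <= 0:
        return "exhausted"   # hard reject
    if available <= max_size * critical_pct:
        return "critical"    # hard reject
    if available <= max_size * pressure_pct and waiting > 0:
        return "pressure"    # soft reject
    return "ok"
```

For a 20-connection pool, 8 available connections with requests queued would already land in the soft-reject tier, which is why sizing the pool to `server.workers` x 2 matters.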
Note
Recovery Probes
When the pool is fully exhausted, the guard periodically allows a single request through (every 2 seconds) to prevent a deadlock where all connections are held by in-flight requests that cannot complete because the guard rejects every new request. This ensures the pool can eventually drain.
Monitoring
Prometheus Metrics
Enable the built-in metrics endpoint:
metrics:
enabled: true
path: /metrics
auth_required: false
Scrape https://acme.example.com/metrics with Prometheus. The following metrics are exposed:
| Metric | Type | Description |
|---|---|---|
| | gauge | Server uptime in seconds |
| | counter | Total accounts created |
| | counter | Total accounts deactivated |
| | counter | Total certificates issued |
| | counter | Total certificates revoked |
| | counter | Total orders created |
| | counter | Challenge validations (labeled: success, retry, failure) |
| | counter | Total challenges expired |
| | counter | Challenge worker poll cycles |
| | counter | Challenge worker errors |
| | counter | Cleanup task runs (labeled by task name) |
| | counter | Cleanup task errors (labeled by task name) |
| | counter | Expiration warnings sent |
| | counter | Expiration worker errors |
| | counter | CA signing errors |
| | counter | Total HTTP requests (labeled by method and status code) |
Structured Logging
Set logging.format: json to output structured JSON logs suitable for log aggregation systems (ELK, Loki, Splunk):
logging:
level: INFO
format: json
audit:
enabled: true
file: /var/log/acmeeh/audit.log
High Availability
ACMEEH is stateless at the application layer — all state is in PostgreSQL. This means you can run multiple instances behind a load balancer.
Multi-Instance Setup
1. Deploy 2+ ACMEEH instances with the same config (same `external_url`)
2. Point all instances at the same PostgreSQL database
3. Load balance across instances (round-robin or least-connections)
4. Use PostgreSQL replication for database HA
Note
CRL Worker
The CRL rebuild worker, like all background workers, uses PostgreSQL advisory locks for leader election. Only one instance runs the CRL worker at a time, regardless of how many ACMEEH instances are deployed. No additional coordination is needed.
Backup & Recovery
- **Database:** Regular `pg_dump` backups. The database contains all accounts, orders, certificates, and audit logs.
- **CA Keys:** Back up the root CA private key securely (encrypted, offline). Loss of the CA key means you cannot issue new certificates or rebuild CRLs.
- **Configuration:** Version-control your config YAML (excluding secrets, which should live in env vars).
Systemd Service
[Unit]
Description=ACMEEH ACME Server
After=network.target postgresql.service
[Service]
Type=simple
User=acmeeh
Group=acmeeh
WorkingDirectory=/opt/acmeeh
Environment=PYTHONPATH=src
EnvironmentFile=/etc/acmeeh/env
ExecStart=/opt/acmeeh/.venv/bin/python -m acmeeh -c /etc/acmeeh/config.yaml
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Create /etc/acmeeh/env with your secrets:
DB_PASSWORD=your-database-password
ADMIN_TOKEN_SECRET=your-jwt-secret