Troubleshooting
Order your debugging from the outside in: confirm the record is discoverable, then the answer returns, then the handshake completes, then the upstream responds. Most dial failures stop at one of the first two steps.
Before you start, re-run the daemon with verbose logs so the rest of these steps have context:
RUST_LOG=openhost_daemon=debug,openhost_pkarr=debug openhostd run
“The DHT / relays can’t find my record”
Symptom: openhost-resolve oh://<pubkey>/ returns no record, or openhost-dial never moves past “dialing”.
Check:
- Look for a successful initial publish in the daemon log:
  INFO openhost-pkarr: initial publish succeeded attempt=1
  If you see initial publish retries exhausted instead, every configured relay is either rejecting the record or unreachable. Check your outbound network.
- Confirm the public relays accept your pubkey. The bundled defaults are https://pkarr.pubky.app and https://relay.iroh.network. Run the resolver with a single explicit relay to isolate the failure:
  openhost-resolve oh://<pubkey>/ --relay https://pkarr.pubky.app --fast
  --fast skips the 1.5 s grace window so you get a clean success/failure.
- Verify clock skew. The protocol enforces a ±2-hour freshness window on records (spec/01-wire-format.md §3). A machine with a badly set clock will publish records the resolver immediately rejects. timedatectl status (Linux) or sntp -sS pool.ntp.org (macOS) both work.
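The freshness rule in the last item can be sketched as a small check. This is only an illustration of a ±2-hour window: the function name is hypothetical, and how the resolver actually reads the record timestamp or treats the exact boundary is an assumption, not something taken from the spec.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Freshness window as stated in the text above (plus or minus 2 hours).
FRESHNESS_WINDOW = timedelta(hours=2)


def is_fresh(record_time: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if record_time lies within the freshness window of now.

    Hypothetical helper: the boundary is treated as inclusive here, which
    is an assumption about the real resolver.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return abs(now - record_time) <= FRESHNESS_WINDOW
```

A publisher whose clock runs three hours slow would stamp records that fail this check on any correctly synced resolver, which is why the symptom looks like "no record" rather than a clock error.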
“A relay is returning 5xx”
Symptom: The publisher logs warn! lines with HTTP 500/502/503 from the relay host.
The bundled default relays are shared public infrastructure, rate-limited and occasionally unavailable. Override them in ~/.config/openhost/daemon.toml:
[pkarr]
relays = [
  "https://your-own-relay.example.com",
]
Mainline DHT publishing still happens regardless of the relay list; relays are a convenience for faster lookup. A daemon with relays = [] publishes only to the DHT and is still discoverable, just slower.
“The client times out with PollAnswerTimeout”
Symptom: openhost-dial errors with openhost-dial: failed to round-trip request: PollAnswerTimeout(30) (or whatever --timeout you passed).
Three common causes, in order of likelihood:
- The client’s pubkey isn’t in the daemon’s watched list. The daemon only polls offer records under pubkeys listed in pkarr.offer_poll.watched_clients. If you’re using the ephemeral keypair openhost-dial generates, that pubkey changes on every invocation. Either add the current pubkey explicitly or switch to a persisted identity via --identity <path>.
- Allowlist enforcement is on and the client isn’t paired. Check openhostd pair list; the client pubkey must appear. With enforce_allowlist = true (the default as of v0.1.0), an unpaired offer is silently dropped after unseal, and no answer is ever produced.
- The residual answer-size gap. Real WebRTC answer SDPs, after full ICE trickle, still exceed the BEP44 1000-byte packet budget on some configurations, even with PR #15’s fragmentation. The daemon produces the answer and queues it, but the publisher evicts it. You can confirm by checking the daemon log for:
  WARN openhost-pkarr: answer entry evicted — packet would exceed BEP44 1000-byte limit
  This is tracked as the next line item after Phase 2 in ROADMAP.md; there is no clean workaround at v0.1.0.
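The three checks above can be walked through mechanically. The sketch below is a hypothetical triage helper, not part of openhost: you supply the watched list, the output of openhostd pair list, and the daemon log yourself, and it only mirrors the reasoning in the list.

```python
from typing import Iterable, List

# Substring of the eviction warning quoted above.
EVICTION_MARKER = "answer entry evicted"


def triage_poll_timeout(
    client_pubkey: str,
    watched_clients: Iterable[str],
    paired_clients: Iterable[str],
    enforce_allowlist: bool,
    daemon_log: str,
) -> List[str]:
    """Return the likely causes of PollAnswerTimeout, most likely first."""
    causes = []
    if client_pubkey not in set(watched_clients):
        causes.append("client pubkey is not in pkarr.offer_poll.watched_clients")
    if enforce_allowlist and client_pubkey not in set(paired_clients):
        causes.append("allowlist is enforced and the client is not paired")
    if EVICTION_MARKER in daemon_log:
        causes.append("answer was evicted for exceeding the BEP44 size limit")
    return causes
```

An empty result means none of the three common causes applies and the timeout needs a closer look at the debug logs.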
“The DTLS handshake fails”
Symptom: Daemon log shows webrtc error: handshake failed, or the client gets an openhost-dial: WebRtcSetup error after a successful poll.
Check:
- Fingerprint pin agrees on both sides. The resolved record’s dtls_fp must equal the one in the daemon’s own “up” line:
  openhost-resolve oh://<pubkey>/ | grep dtls_fp
  Compare with the daemon’s:
  INFO openhost_daemon::app: openhostd: up … dtls_fp=AB:CD:…
  A mismatch usually means the client resolved a cached or stale record. Retry with --fast to skip the grace window and pick up the freshest.
- Cert rotation crossed mid-dial. dtls.rotate_secs defaults to 86400 (24 h). If your daemon rotated between the resolver fetching the record and the handshake starting, the fingerprint won’t match. Retry.
- UDP traffic is blocked. WebRTC needs outbound UDP to the STUN servers and to the eventual peer. Corporate / hotel networks sometimes drop it. A quick test from both sides:
  nc -u -v stun.l.google.com 19302
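Comparing the two fingerprints by eye is error-prone, so a small normalising comparison can help. This sketch assumes the fingerprint is colon-separated hex as in the “up” line above; the helper names are hypothetical, and you pass in only the fingerprint value, not the whole log line.

```python
def normalize_fp(fp: str) -> str:
    """Strip whitespace and colons and uppercase, so formats compare equal."""
    return fp.strip().replace(":", "").upper()


def fingerprints_match(resolved_fp: str, daemon_fp: str) -> bool:
    """True if both fingerprints are non-empty and identical once normalised."""
    a = normalize_fp(resolved_fp)
    b = normalize_fp(daemon_fp)
    return bool(a) and a == b
```

If this returns False right after a retry with --fast, cert rotation or a stale record on the relay is the more likely explanation than a formatting difference.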
“openhostd pair add doesn’t seem to take effect”
Symptom: Paired a client, but the daemon still rejects their offers.
The pair-DB file watcher reloads the allow list within ~250 ms; look for this on the daemon side:
INFO openhost_daemon::pair_watcher: openhostd: pair-DB file watcher armed
INFO openhost_daemon::app: openhostd: pairing DB reloaded; republishing source=file-watcher
If you see the “armed” line but never the “reloaded” line, the watcher is running but not seeing file events. If the “armed” line is missing as well, the watcher never started. The two common causes:
- Network filesystem. inotify (Linux) and FSEvents (macOS) do not fire reliably on NFS, SMB, or FUSE mounts. If ~/.config/openhost/allow.toml is on a remote filesystem, move the pair DB to a local path via pairing.db_path in your config, or fall back to SIGHUP on Unix (kill -HUP $(pgrep openhostd)) or a restart on Windows.
- Spawn failure. If the watcher never armed, you’ll see a warn! at daemon startup: pair-DB file watcher could not be started. Check the path exists and the parent directory is writable.
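The log reading described above can be sketched as a short classifier. It is a hypothetical helper that matches only the message fragments quoted in this section; the real log format may differ between versions.

```python
ARMED = "pair-DB file watcher armed"
RELOADED = "pairing DB reloaded"
SPAWN_FAILED = "pair-DB file watcher could not be started"


def diagnose_watcher(daemon_log: str) -> str:
    """Classify the pair-DB watcher state from daemon log text."""
    lines = daemon_log.splitlines()
    if any(SPAWN_FAILED in line for line in lines):
        return "spawn failure: check the path and the parent directory"
    if not any(ARMED in line for line in lines):
        return "unknown: watcher never logged that it armed"
    # Only count reloads triggered by the file watcher, not by SIGHUP.
    if any(RELOADED in line and "source=file-watcher" in line for line in lines):
        return "ok: watcher armed and reload observed"
    return "no file events: pair DB may be on a network filesystem"
```

Run it over the log captured after an openhostd pair add; the "no file events" result points at the network-filesystem case.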
Still stuck
Capture the RUST_LOG=openhost_daemon=debug,openhost_pkarr=debug output from the daemon plus the exact openhost-dial invocation and open an issue on GitHub. Bug reports that include the openhost-resolve --json output for your host are dramatically easier to act on.