Troubleshooting
This page covers common issues encountered when operating GolemXV, along with their solutions.
Agent Issues
Agent can't check in
Symptom: POST /checkin returns 401, or the connection is refused.
Possible causes:
Invalid API key -- The key may be revoked or incorrectly copied, or it may belong to an inactive project.
- Verify the key is correct in GOLEM.yaml or the X-API-Key header
- Check that the project is active in the dashboard
- Generate a new key if needed via the dashboard project settings
Wrong server URL -- The agent is pointing at the wrong host.
- Verify that the server setting in GOLEM.yaml matches the actual coordinator URL
- Check that the URL includes the correct protocol (http/https) and port
Network connectivity -- A firewall or DNS issue is blocking the connection.
- Test with curl -I https://your-coordinator/_gxv/api/v1/presence -H "X-API-Key: your_key"
- Check that firewall rules allow traffic on the coordinator port
GOLEM.yaml format error -- The YAML file has syntax errors.
- Validate the YAML syntax (indentation, colons, quotes)
- Re-generate it with /gxv:init if needed
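To tell these causes apart from a script, here is a minimal sketch using only the Python standard library. It assumes the presence endpoint answers a plain GET carrying the X-API-Key header (the curl example above sends a HEAD request via -I); probe_presence and classify_probe are hypothetical helper names, and the coordinator URL is a placeholder.

```python
import urllib.error
import urllib.request

# Endpoint path taken from the curl example on this page.
PRESENCE_PATH = "/_gxv/api/v1/presence"


def classify_probe(status=None, network_error=None):
    """Map a probe outcome to the likely cause from the list above."""
    if network_error is not None:
        return f"network: check server URL, protocol/port, firewall, DNS ({network_error})"
    if status == 401:
        return "auth: API key invalid or revoked, or project inactive"
    if status is not None and 200 <= status < 300:
        return "ok: coordinator reachable and key accepted"
    return f"unexpected: HTTP {status}"


def probe_presence(server_url, api_key, timeout=5.0):
    """GET the presence endpoint with X-API-Key and classify what happened."""
    request = urllib.request.Request(
        server_url.rstrip("/") + PRESENCE_PATH,
        headers={"X-API-Key": api_key},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_probe(status=response.status)
    except urllib.error.HTTPError as err:  # the server answered with 4xx/5xx
        return classify_probe(status=err.code)
    except OSError as err:  # URLError, refused connection, timeout, DNS failure
        return classify_probe(network_error=getattr(err, "reason", err))
```

For example, probe_presence("https://your-coordinator", "your_key") returns a one-line hint: a 401 points at the key or project, while a refused connection or DNS failure points at the URL or the network.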
Heartbeat timeout / stale agents
Symptom: Agent sessions show as timed_out in the dashboard, and tasks assigned to them are marked as stale.
Possible causes:
Agent process crashed -- The Claude Code process exited without checking out.
- Check spawner logs at storage/logs/spawner-{session_id}.log
- Manually clean up stale sessions via the dashboard
Network interruption -- Heartbeats are not reaching the server.
- Check network stability between the agent and the coordinator
- Consider increasing heartbeat_ttl_seconds for flaky connections
Clock skew -- If the agent and server clocks are significantly out of sync, heartbeat expiry calculations may be incorrect.
- Ensure both systems use NTP
- The ExpireHeartbeats command uses server-side time for comparison
TTL too aggressive -- The default TTL (120s) with a 30s heartbeat interval allows for 3 consecutive missed heartbeats. Under heavy load, this may be too tight.
- Increase heartbeat_ttl_seconds on the project (e.g., 300s)
- The TTL should be at least 3x the heartbeat interval
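The TTL arithmetic and the server-side expiry comparison can be made concrete. This sketch assumes a session expires once the server's current time is more than heartbeat_ttl_seconds past the server-recorded time of the last heartbeat; the exact comparison ExpireHeartbeats performs is not documented here, and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone


def tolerated_missed_heartbeats(ttl_seconds, interval_seconds):
    """Consecutive heartbeats that can be missed before the TTL lapses.

    A heartbeat due exactly at the TTL boundary races with expiry, so it is
    not counted: 120s TTL / 30s interval -> 3 missed heartbeats.
    """
    return math.ceil(ttl_seconds / interval_seconds) - 1


def ttl_is_safe(ttl_seconds, interval_seconds):
    """Rule of thumb from this page: TTL of at least 3x the heartbeat interval."""
    return ttl_seconds >= 3 * interval_seconds


def is_expired(last_heartbeat_at, ttl_seconds, now=None):
    """Expiry check where both timestamps come from the server clock (assumed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_heartbeat_at > timedelta(seconds=ttl_seconds)
```

With the 300s TTL suggested above, tolerated_missed_heartbeats(300, 30) gives 9. In this sketch both timestamps in is_expired come from the server clock, which is what keeps agent clock drift out of the comparison.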
Agent checked in but scope conflict blocks work
Symptom: Check-in returns 409 Conflict and the session is closed immediately.
Explanation: The project's conflict_mode is set to block. When another agent is already working in the same area or file scope, new agents are rejected.
Solutions:
- Declare a different declared_area or declared_files scope
- Wait for the conflicting agent to check out
- Switch the project to warn mode if blocking is too restrictive
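The block/warn behaviour can be pictured as an overlap test at check-in. The overlap rule below (same declared_area, or any shared path in declared_files) is an assumption for illustration; the real coordinator may match globs or directory prefixes, and Scope and checkin_outcome are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scope:
    # Mirrors the declared_area / declared_files fields sent at check-in.
    declared_area: str | None = None
    declared_files: set[str] = field(default_factory=set)


def overlaps(a: Scope, b: Scope) -> bool:
    """Hypothetical rule: same area, or at least one file declared by both."""
    same_area = a.declared_area is not None and a.declared_area == b.declared_area
    shared_files = bool(a.declared_files & b.declared_files)
    return same_area or shared_files


def checkin_outcome(new: Scope, active: list[Scope], conflict_mode: str) -> str:
    """What a new agent sees, given the scopes of agents already checked in."""
    conflict = any(overlaps(new, other) for other in active)
    if conflict and conflict_mode == "block":
        return "409 Conflict"  # session is closed immediately
    if conflict and conflict_mode == "warn":
        return "accepted with warning"
    return "accepted"
```

Narrowing declared_files or choosing a different declared_area makes overlaps() false, which is why the first solution above works without changing conflict_mode.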
Messaging Issues
Messages not delivered in real-time
Symptom: Messages appear when polling but not via WebSocket push.
Possible causes:
Centrifugo not running -- The real-time server is down or misconfigured.
- Check that Centrifugo is running: systemctl status centrifugo
- Verify CENTRIFUGO_API_URL and CENTRIFUGO_API_KEY in .env
- Test the Centrifugo API: curl http://localhost:8000/api/info -H "Authorization: apikey your_key"
Wrong channel -- The subscriber is on a different channel than the publisher.
- Verify the project slug is consistent
- The channel format should be golemxv:project-{slug}
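A quick way to confirm that publisher and subscriber agree is to derive the channel name from the slug in one place. This is a minimal sketch of the documented golemxv:project-{slug} format; the helper names and the whitespace check are hypothetical.

```python
from __future__ import annotations

CHANNEL_PREFIX = "golemxv:project-"


def project_channel(slug: str) -> str:
    """Channel name in the documented golemxv:project-{slug} format."""
    if not slug or slug != slug.strip():
        # Empty or whitespace-padded slugs cause silent channel mismatches.
        raise ValueError(f"suspicious project slug: {slug!r}")
    return CHANNEL_PREFIX + slug


def slug_from_channel(channel: str) -> str | None:
    """Recover the slug from a channel name, or None if the format doesn't match."""
    if channel.startswith(CHANNEL_PREFIX) and len(channel) > len(CHANNEL_PREFIX):
        return channel[len(CHANNEL_PREFIX):]
    return None
```

Running slug_from_channel() on both the publisher's and the subscriber's channel names shows immediately whether they disagree on the slug or on the prefix.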
Firewall blocking WebSocket -- WebSocket connections require persistent TCP connections that some proxies block.
- Ensure the reverse proxy (Nginx) has WebSocket upgrade support configured
- Check that proxy_set_header Upgrade $http_upgrade is set in the Nginx config
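Besides the Upgrade header, Nginx WebSocket proxying also needs proxy_http_version 1.1 and a Connection header set to "upgrade" (or to a $connection_upgrade variable defined via map). The sketch below is a rough text scan of a config file for those three directives. It does not parse Nginx syntax or check which location block they sit in, and missing_websocket_directives is a hypothetical name.

```python
import re

# The three directives a standard Nginx WebSocket proxy location needs.
REQUIRED = {
    "proxy_http_version 1.1": re.compile(r"proxy_http_version\s+1\.1\s*;"),
    "proxy_set_header Upgrade $http_upgrade": re.compile(
        r"proxy_set_header\s+Upgrade\s+\$http_upgrade\s*;"
    ),
    'proxy_set_header Connection "upgrade"': re.compile(
        r'proxy_set_header\s+Connection\s+("upgrade"|upgrade|\$connection_upgrade)\s*;',
        re.IGNORECASE,
    ),
}


def missing_websocket_directives(nginx_conf: str) -> list:
    """Return the WebSocket proxy directives not found anywhere in the config text."""
    # Drop "#" comments so a commented-out directive is not counted as present.
    active = "\n".join(line.split("#", 1)[0] for line in nginx_conf.splitlines())
    return [name for name, pattern in REQUIRED.items() if not pattern.search(active)]
```

An empty list means all three directives appear somewhere in the file; it does not prove they are in the location that proxies Centrifugo.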
Fire-and-forget delivery -- Centrifugo publish errors are logged but don't fail the request. Check storage/logs/laravel.log for warnings about failed publishes.
Note: Messages are always persisted to the database regardless of real-time delivery. The MCP get_messages tool reads directly from SQLite, so agents will always receive messages through polling even if WebSockets fail.
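The fire-and-forget behaviour and the note above come down to ordering: persist first, then publish, and never let a publish failure fail the request. This is a minimal sketch with an in-process SQLite table; the messages schema, send_message, get_messages, and the publish callback (standing in for the Centrifugo call) are assumptions, not the GolemXV code.

```python
import logging
import sqlite3
import uuid

log = logging.getLogger("golemxv.sketch")


def send_message(db: sqlite3.Connection, body: str, publish) -> str:
    """Persist first, then publish; a publish failure is logged, not raised."""
    message_id = str(uuid.uuid4())
    with db:  # commits the insert before any real-time delivery is attempted
        db.execute("INSERT INTO messages (uuid, body) VALUES (?, ?)", (message_id, body))
    try:
        publish({"uuid": message_id, "body": body})
    except Exception as exc:  # fire-and-forget: the request still succeeds
        log.warning("Centrifugo publish failed for %s: %s", message_id, exc)
    return message_id


def get_messages(db: sqlite3.Connection) -> list:
    """Polling path: reads straight from the database, like get_messages in MCP."""
    return [row[0] for row in db.execute("SELECT body FROM messages ORDER BY rowid")]
```

Because the row is committed before publish() runs, a polling reader sees the message even when every real-time push fails.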
Messages arrive twice in the dashboard
Symptom: The same message appears twice in the message thread.
Explanation: The dashboard receives messages from both the REST API response (on send) and the Centrifugo push. UUID deduplication should prevent duplicates.
If it happens:
- This is likely a timing issue with the optimistic append pattern
- Messages with the same UUID are deduplicated on display
- If duplicates persist, check that the message UUID is being returned correctly from the API
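The deduplication rule can be sketched as a guarded append that both delivery paths share. The real dashboard is a Nuxt app; this Python version only illustrates the logic, and append_unique plus the message dict shape are hypothetical.

```python
def append_unique(thread: list, incoming: dict) -> list:
    """Append a message unless one with the same uuid is already displayed.

    Both the REST response (optimistic append on send) and the Centrifugo push
    go through this, so whichever copy arrives second is dropped.
    """
    message_id = incoming.get("uuid")
    if message_id is None:
        # No UUID from the API means nothing to deduplicate on,
        # so both copies end up in the thread.
        return thread + [incoming]
    if any(msg.get("uuid") == message_id for msg in thread):
        return thread
    return thread + [incoming]
```

The second branch is why a missing or altered UUID in the API response shows up as a duplicate in the thread.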
Task Issues
Task claim fails
Symptom: POST /tasks/claim returns success but claimed: false.
Possible causes:
Task already claimed -- Another agent claimed it first, so the task is no longer in the pending state. This is expected behavior with optimistic locking.
- List available tasks again with list_tasks and claim a different one
Agent already has an active task -- The advisory concurrency check limits agents to 1 active task.
- Complete or release the current task before claiming another
- The error message will say "Agent already has an active task"
Task not pending -- The task was assigned by an admin (status: assigned) or is in another non-pending state.
- Only tasks with status: pending and assigned_agent_id: NULL can be claimed
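The claim rules above can be illustrated with a conditional UPDATE, which is one common way to implement optimistic claiming. The tasks schema, the post-claim status 'assigned', and the set of statuses counted as active are assumptions for this sketch, not the coordinator's actual implementation.

```python
import sqlite3

# Hypothetical: statuses that count as "active" for the one-task-per-agent check.
ACTIVE_STATUSES = ("assigned", "in_progress")


def claim_task(db: sqlite3.Connection, task_id: int, agent_id: int) -> dict:
    # Advisory check: not atomic with the UPDATE below, so it is only a guard.
    active = db.execute(
        "SELECT COUNT(*) FROM tasks WHERE assigned_agent_id = ? AND status IN (?, ?)",
        (agent_id, *ACTIVE_STATUSES),
    ).fetchone()[0]
    if active:
        return {"claimed": False, "error": "Agent already has an active task"}

    # Optimistic claim: the WHERE clause only matches a row that is still
    # pending and unassigned, so a concurrent claimer updates zero rows.
    with db:
        cur = db.execute(
            "UPDATE tasks SET status = 'assigned', assigned_agent_id = ? "
            "WHERE id = ? AND status = 'pending' AND assigned_agent_id IS NULL",
            (agent_id, task_id),
        )
    return {"claimed": cur.rowcount == 1}
```

A losing claimer gets a normal response with claimed set to false rather than an error, which matches the symptom described above.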
Task stuck in assigned/in_progress state
Symptom: Task remains in assigned or in_progress after the agent has disconnected.
Explanation: If an agent's session times out, the HeartbeatManager flags the agent's assigned tasks by setting stale_since. The task itself, however, stays in its current state until an admin intervenes.
Solutions:
- From the dashboard, manually transition the task back to pending (unassign it)
- Or cancel the task if the work should be abandoned
- The stale_since field and the task.stale Centrifugo event alert admins to stale tasks
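To see which tasks are in this state, a read-only query over the tasks table is enough. The column names follow the fields mentioned on this page (status, assigned_agent_id, stale_since), but the table layout is an assumption; make the actual transition from the dashboard rather than by editing the database.

```python
import sqlite3


def stale_tasks(db: sqlite3.Connection) -> list:
    """Tasks flagged stale but still holding their assigned/in_progress state."""
    return db.execute(
        "SELECT id, status, assigned_agent_id, stale_since FROM tasks "
        "WHERE stale_since IS NOT NULL AND status IN ('assigned', 'in_progress') "
        "ORDER BY stale_since"
    ).fetchall()
```

Each row returned is a candidate for unassigning or cancelling; once a task is back in pending with no assigned agent, it becomes claimable again.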
Dashboard Issues
Dashboard can't connect (401 Unauthorized)
Symptom: Dashboard API calls return 401.
Possible causes:
JWT token expired -- The token's TTL has passed.
- Log out and log back in from the dashboard login page
- Check that the auth store's refresh mechanism is working
Invalid JWT secret -- The JWT_SECRET in .env was changed after tokens were issued.
- Tokens signed with the old secret will fail validation
- All users must re-authenticate after a secret rotation
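Both failure modes can be reproduced with a token check. This sketch assumes dashboard tokens are HS256 JWTs signed with JWT_SECRET and carrying a standard exp claim. The backend presumably uses a JWT library, and this standard-library version skips checks a real validator performs (for example, the alg header), so do not use it to issue production tokens.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _hs256(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_b64url(_hs256(secret, f'{header}.{payload}'))}"


def check_hs256(token: str, secret: str, now: float | None = None) -> str:
    """Return 'ok', 'bad signature' (e.g. JWT_SECRET rotated) or 'expired'."""
    header, payload, signature = token.split(".")
    expected = _hs256(secret, f"{header}.{payload}")
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        return "bad signature"
    claims = json.loads(_b64url_decode(payload))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        return "expired"
    return "ok"
```

A token minted before a secret rotation fails at the signature step no matter how long it has left to live, which is why every user has to log in again.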
CORS misconfiguration -- The dashboard SPA is hosted on a different domain than the API.
- Check CORS_ALLOWED_ORIGINS in the backend configuration
- The dashboard's origin (scheme, host, and port) must be in the allowed origins list
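Browsers compare origins, not full URLs, so a different scheme or port on the dashboard URL is enough to fail the check, while a path makes no difference. This is a minimal sketch assuming CORS_ALLOWED_ORIGINS holds a comma-separated list of origins; the actual format may differ, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port], the value browsers send as Origin."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def origin_allowed(dashboard_url: str, cors_allowed_origins: str) -> bool:
    # Assumed format: comma-separated origins, e.g. "https://a.example, http://localhost:3000".
    allowed = {
        entry.strip().rstrip("/").lower()
        for entry in cors_allowed_origins.split(",")
        if entry.strip()
    }
    # Ports must match exactly as the browser sends them (default ports are omitted).
    return origin_of(dashboard_url) in allowed
```

For example, a dashboard served from https://dash.example.com:8443 is rejected if only https://dash.example.com is listed.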
Dashboard WebSockets not connecting
Symptom: Real-time updates (agent check-ins, messages, task events) don't appear in the dashboard until page refresh.
Possible causes:
Centrifugo token not obtained -- The /_gxv/dashboard/api/ws-token endpoint may be failing.
- Check the browser network tab for 401 or 503 responses on the ws-token request
- 503 means Centrifugo is not configured on the server
WebSocket URL mismatch -- The Centrifugo WebSocket URL configured in the SPA doesn't match the actual server.
- Verify CENTRIFUGO_WS_URL in the Nuxt app configuration
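The ws-token request can also be checked outside the browser. This sketch assumes the endpoint accepts a GET with the dashboard JWT sent as a Bearer token in the Authorization header; both the method and the header scheme are assumptions, check_ws_token is a hypothetical name, and network errors are left to propagate.

```python
import urllib.error
import urllib.request

WS_TOKEN_PATH = "/_gxv/dashboard/api/ws-token"


def explain_ws_token_status(status: int) -> str:
    """Translate the ws-token response code into the causes listed above."""
    if status == 401:
        return "dashboard JWT missing, expired, or rejected: log in again"
    if status == 503:
        return "Centrifugo is not configured on the server"
    if 200 <= status < 300:
        return "token issued: if updates still fail, compare CENTRIFUGO_WS_URL with the real server"
    return f"unexpected HTTP {status}"


def check_ws_token(base_url: str, dashboard_jwt: str, timeout: float = 5.0) -> str:
    request = urllib.request.Request(
        base_url.rstrip("/") + WS_TOKEN_PATH,
        headers={"Authorization": f"Bearer {dashboard_jwt}"},  # assumed auth scheme
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return explain_ws_token_status(response.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return explain_ws_token_status(err.code)
```

A successful token response narrows the problem to the WebSocket URL or the proxy rather than to authentication.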
MCP Server Issues
MCP tools return database errors
Symptom: MCP tools like get_messages or list_tasks fail with SQLite errors.
Possible causes:
Wrong database path -- The DB_PATH environment variable points to a non-existent or incorrect SQLite file.
- Verify DB_PATH in the MCP server configuration
- The path should point to the same database.sqlite used by the PHP application
File permissions -- The MCP server process doesn't have read access to the SQLite file.
- Ensure the file is readable by the user running the MCP server
- SQLite also needs read access to the WAL and SHM files (database.sqlite-wal, database.sqlite-shm)
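The path and permission checks above can be scripted and run as the same user that runs the MCP server. This is a minimal sketch: check_db_path is a hypothetical helper, and it opens the database read-only through an SQLite URI (mode=ro) so the check itself cannot modify anything.

```python
import os
import sqlite3
from pathlib import Path


def check_db_path(db_path: str) -> list:
    """Return problems found with DB_PATH; an empty list means it looks usable."""
    path = Path(db_path)
    if not path.is_file():
        return [f"{path} does not exist or is not a file"]
    problems = []
    for candidate in (path, Path(f"{path}-wal"), Path(f"{path}-shm")):
        # The -wal/-shm files only exist while the database is in WAL mode and in use.
        if candidate.exists() and not os.access(candidate, os.R_OK):
            problems.append(f"{candidate} is not readable by this user")
    if problems:
        return problems
    try:
        conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
        try:
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:  # e.g. "file is not a database"
        problems.append(f"SQLite could not read {path}: {exc}")
    return problems
```

Running it with the value of DB_PATH from the MCP server configuration, and again with the PHP application's database path, should point at the same file and report no problems.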
Node.js version -- The MCP server requires a recent Node.js version.
- Check with node --version
- Minimum recommended: Node.js 18+
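When the MCP server is launched by another tool, the node on its PATH may differ from the one in your shell. This sketch looks up node on the current PATH, parses the output of node --version, and compares the major version with the recommended minimum of 18; the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_node_version(output: str) -> tuple:
    """Parse `node --version` output such as 'v18.19.0' into (18, 19, 0)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", output.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def node_meets_minimum(minimum_major: int = 18) -> bool:
    node = shutil.which("node")
    if node is None:
        return False  # Node.js is not on PATH in this environment
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return parse_node_version(result.stdout)[0] >= minimum_major
```

Run it from the same environment that starts the MCP server to catch a stale Node.js installation that your interactive shell would not reveal.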
MCP send_message fails
Symptom: The send_message MCP tool returns an error.
Possible causes:
API URL misconfigured -- The GXV_SERVER_URL doesn't point to the PHP backend.
- Verify the URL in the MCP server configuration
- Test with curl ${GXV_SERVER_URL}/_gxv/api/v1/presence -H "X-API-Key: ${GXV_API_KEY}"
API key invalid -- The GXV_API_KEY doesn't match any active project.
- Verify the key matches what's configured in the dashboard
Session not active -- The session token in .gxv-session refers to an expired or closed session.
- Run /gxv:init to create a fresh session
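The three causes above can be checked in one pass before restarting the MCP server. This sketch reads GXV_SERVER_URL and GXV_API_KEY from the environment and looks for .gxv-session in the current directory. It does not contact the server (the curl test above does that). The session file's location, and the rule that GXV_SERVER_URL is a base URL without /_gxv/api/v1 (inferred from the curl example), are assumptions; check_mcp_env is a hypothetical name.

```python
import os
from urllib.parse import urlsplit


def check_mcp_env(env=os.environ, session_file=".gxv-session") -> list:
    """Return configuration problems for the MCP send_message path."""
    problems = []
    server = env.get("GXV_SERVER_URL", "")
    parts = urlsplit(server)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append(f"GXV_SERVER_URL should look like https://host[:port], got {server!r}")
    if parts.path.rstrip("/").endswith("/_gxv/api/v1"):
        # Assumed: the API path is appended to the base URL, as in the curl example.
        problems.append("GXV_SERVER_URL should be the base URL without /_gxv/api/v1")
    if not env.get("GXV_API_KEY"):
        problems.append("GXV_API_KEY is empty or unset")
    if not os.path.isfile(session_file):
        problems.append(f"{session_file} not found: run /gxv:init")
    return problems
```

An empty list only means the configuration is well-formed; whether the key matches an active project still has to be confirmed against the server.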
Known Issues
WebSockets plugin namespace collision
The WebSockets plugin's ProxyController.php has a namespace collision that causes php artisan route:list to fail. This is only a development inconvenience: all routing works correctly at runtime.
WinterCMS PluginTestCase compatibility
The WinterCMS PluginTestCase bootstrap calls Artisan::call('plugin:refresh', ['plugin' => $code]) which has an argument mismatch in the current WinterCMS version. PHPUnit tests exist but cannot run via php artisan winter:test without patching the WinterCMS test bootstrap or overriding setUp().
Alpine.js dashboard dead code
The original Phase 4 Alpine.js dashboard components still exist alongside the Nuxt Vue dashboard. Both dashboards can coexist without functional impact. Cleanup is deferred to a future maintenance phase.
Getting Help
If your issue isn't covered here:
- Check the server logs at storage/logs/laravel.log for detailed error messages
- Check spawner logs at storage/logs/spawner-{session_id}.log for agent-specific errors
- Review the Architecture page to understand the system flow
- Review the Configuration page for the environment variable reference