n8n service stuck — multiple pods scheduled but never reach ContainerCreating (likely stuck PVC on a node)

Region: hnd1 (Tokyo)
Service template: n8n (PREBUILT_V2)
Service ID: 68837633e1ea6f93dd5106ed
Environment ID: 688375f2c094e8c11f72c306

Hi team — my n8n service has been down for ~7 hours and I can't recover
it from the dashboard or via the API. Looking for help force-clearing
what looks like a stuck PVC on a node.

WHAT I DID
1. Used the dashboard "Change Image Version" dialog to upgrade
 n8nio/n8n from 1.123.26 → 1.123.46.
2. Image pulled successfully (40s). New pod started, then was killed
 ~2 seconds later. Another pod was scheduled immediately, also killed,
 and so on — a restart loop.
3. I tried clicking Restart and rolling the version back to 1.123.26
 from the dashboard a couple of times. After that the dashboard
 started showing "Stopping" / "Restarting" indefinitely.

WHAT THE RUNTIME LOG SHOWS NOW
The previous pod (866dfbd64f-nq2q8) received SIGTERM and shut down
normally. Then this sequence repeats — no progress past "Scheduled":

 Pod 65bdbd45cb-62cxl - Scheduled / Pulling / Pulled / Started / Killing
 Pod b67bf6678-dzbd9 - Scheduled (no Pulling / Created / Started)
 Pod 5697f7c4c4-77nqx - Scheduled (no Pulling / Created / Started)
 Pod 86c999dbc8-9tsdv - Scheduled (no Pulling / Created / Started)
 Pod 578db9b69d-qhcc5 - Scheduled (no Pulling / Created / Started)
 Pod 5587dbdb45-7p2hj - Scheduled (no Pulling / Created / Started)
 Pod 7b4fd7c754-pmdwg - Scheduled (no Pulling / Created / Started)

All assigned to node:
 ip-172-31-46-46.ap-northeast-1.compute.internal
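
A minimal sketch of how the stuck pods can be picked out of a summary like the one above, assuming the "Pod <name> - <events>" line format used in this post (the format is my own shorthand, not Zeabur's raw log format). The function names are hypothetical.

```python
import re

# Matches summary lines such as "Pod b67bf6678-dzbd9 - Scheduled (no Pulling ...)".
# The line format is the shorthand used in this post, not an official log format.
LINE_RE = re.compile(r"^\s*Pod\s+(\S+)\s+-\s+(.*)$")


def parse_summary(lines):
    """Map each pod name to the ordered list of events observed for it."""
    pods = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        pod, rest = match.groups()
        # Drop parenthetical notes like "(no Pulling / Created / Started)".
        rest = re.sub(r"\(.*?\)", "", rest)
        pods[pod] = [event.strip() for event in rest.split("/") if event.strip()]
    return pods


def stuck_after_scheduling(pods):
    """Return pods whose only observed event is 'Scheduled'."""
    return [pod for pod, events in pods.items() if events == ["Scheduled"]]


if __name__ == "__main__":
    sample = [
        " Pod 65bdbd45cb-62cxl - Scheduled / Pulling / Pulled / Started / Killing",
        " Pod b67bf6678-dzbd9 - Scheduled (no Pulling / Created / Started)",
        " Pod 5697f7c4c4-77nqx - Scheduled (no Pulling / Created / Started)",
    ]
    print(stuck_after_scheduling(parse_summary(sample)))
```

Run against the seven lines above, this would flag every pod except 65bdbd45cb-62cxl.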

WHAT I TRIED VIA API (no progress)
- suspendService → status went to SUSPENDED
- waited 3 minutes
- restartService → status went to STARTING, but the pod never reaches
 ContainerCreating; healthz at the public URL has returned 502 the
 whole time
- Re-checked logs: still only "Scheduled" lines for the new pods,
 no Pulling / Created / Started
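
For anyone reproducing the check, here is a rough sketch of polling the health endpoint after a restart, using only the Python standard library. The URL in the usage note is a placeholder, and the attempt count and delay are arbitrary assumptions. The suspendService / restartService calls themselves are not shown.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 502 from the gateway) land here.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, etc.
        return None


def wait_until_healthy(url, attempts=18, delay=10.0,
                       fetch=http_status, sleep=time.sleep):
    """Poll url until it returns 200; return every status seen, in order."""
    seen = []
    for i in range(attempts):
        status = fetch(url)
        seen.append(status)
        if status == 200:
            break
        if i < attempts - 1:
            sleep(delay)
    return seen
```

Usage would look like wait_until_healthy("https://<your-service-domain>/healthz"). In my case the result would be a list of 502s with no 200 at the end.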

DIAGNOSIS (best guess)
Looks like the classic stuck-PVC pattern: a previous pod is wedged in
Terminating on that node and the kubelet hasn't released the
ReadWriteOnce "data" volume (mount: /root/.n8n). Every new pod gets
scheduled to that node but never starts its container because the
volume is still held by the zombie pod. The Zeabur control plane
reports STARTING but the kubelet is wedged.

WHAT I'M ASKING
Could someone please:
1. Force-delete the stuck pods (--grace-period=0 --force) so the
 PVC detaches, OR
2. Cordon/drain that node and let the pod reschedule elsewhere.
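
To be concrete about the two options, here is a sketch of the kubectl invocations I have in mind, built as argument lists in Python and only printed, never executed. The namespace and the full pod name are placeholders I don't have access to, and the drain flags are my assumption about what would be needed. None of these commands touch the PVC or the PV.

```python
import shlex


def force_delete_cmd(pod, namespace):
    """Option 1: force-delete one stuck pod without waiting for graceful shutdown."""
    return ["kubectl", "delete", "pod", pod, "-n", namespace,
            "--grace-period=0", "--force"]


def cordon_and_drain_cmds(node):
    """Option 2: mark the node unschedulable, then evict its pods."""
    return [
        ["kubectl", "cordon", node],
        # Drain flags are an assumption; adjust to cluster policy.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


if __name__ == "__main__":
    node = "ip-172-31-46-46.ap-northeast-1.compute.internal"
    # "<namespace>" and "<full-pod-name>" are placeholders, not real values.
    print(shlex.join(force_delete_cmd("<full-pod-name>", "<namespace>")))
    for cmd in cordon_and_drain_cmds(node):
        print(shlex.join(cmd))
```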

The data volume itself must NOT be deleted — it contains the n8n
encryption key, without which my existing credentials become
unreadable.

Postgres service (separate, ID 68837633e1ea6f93dd51070f) is healthy
and I've already triggered a manual backup of it as a safety net.

Thanks in advance — happy to provide anything else needed.