Recently, due to the deprecation of the existing zbpack v1 infrastructure, Zeabur has been rolling out the next-generation build system (zbpack v2) to all users. We are aware that this migration introduced certain compatibility issues. This post explains the reasons, how we addressed them, and how you can mitigate the issues if you encounter them.
Here is how the build infrastructure behind Zeabur works:
Registry v1 was built using the distribution/distribution registry. A number of issues with it prompted us to migrate to a self-designed registry v2:
- The registry would report `blob unknown` and reject the manifest, so images couldn't be pulled at startup.
- distribution/distribution favors global blob deduplication, which prevents us from knowing which blobs are unused without reading all manifests. The official garbage-collection tool requires stop-the-world downtime. At Zeabur's registry scale this was unworkable, causing a huge accumulation of blobs in R2 and severe performance issues.

We designed registry v2 to build OCI images and push them directly to an R2 bucket, then use Cloudflare Workers to expose a read-only Pull API that maps the bucket's OCI image layout to the OCI Distribution Specification. This dramatically improves performance, maximizes multipart upload efficiency, and avoids many registry-related issues. At the same time, we scope blob deduplication within each repository, so deleting a repository allows us to GC its blobs, greatly simplifying operations without stop-the-world downtime.
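To make that mapping concrete, here is a minimal Go sketch of how a read-only pull endpoint can translate OCI Distribution Specification routes into object keys inside a per-repository OCI image layout. The key scheme, function name, and tag-resolution shortcut are illustrative assumptions for this post, not the actual Worker implementation.

```go
package registry

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed bucket layout: each repository is stored as an OCI image layout,
// i.e. <name>/index.json plus content-addressed files under
// <name>/blobs/<algorithm>/<hex>. The real key scheme may differ.
var digestRe = regexp.MustCompile(`^[a-z0-9]+:[a-f0-9]{32,}$`)

// ObjectKeyForPullRequest translates read-only Distribution Spec routes:
//
//	/v2/<name>/blobs/sha256:<hex>     -> <name>/blobs/sha256/<hex>
//	/v2/<name>/manifests/sha256:<hex> -> <name>/blobs/sha256/<hex>
//	/v2/<name>/manifests/<tag>        -> <name>/index.json (tag resolved there)
func ObjectKeyForPullRequest(path string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(path, "/v2/"), "/")
	if len(parts) < 3 {
		return "", fmt.Errorf("not a pull route: %s", path)
	}
	ref := parts[len(parts)-1]
	kind := parts[len(parts)-2] // "blobs" or "manifests"
	name := strings.Join(parts[:len(parts)-2], "/")

	switch kind {
	case "blobs", "manifests":
		if digestRe.MatchString(ref) {
			// Both blobs and manifests are content-addressed files in an
			// OCI image layout, keyed by algorithm and hex digest.
			return name + "/blobs/" + strings.Replace(ref, ":", "/", 1), nil
		}
		if kind == "manifests" {
			// Tag reference: first read the repository's index.json and
			// resolve the tag to a digest (resolution omitted here).
			return name + "/index.json", nil
		}
	}
	return "", fmt.Errorf("unsupported route: %s", path)
}
```

Because every key is prefixed with the repository name, dropping a repository's prefix is enough to garbage-collect all of its blobs, which is what makes per-repository deduplication operationally much simpler than the global model.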
However, as you can see, the push flow changed significantly in registry v2. zbpack v2 already implements the new flow, but zbpack v1's complex push process and heavy reliance on the BuildKit CLI for image building made porting it difficult. For roughly the past month, projects still on zbpack v1 continued to build against registry v1, and we only switched them to zbpack v2 manually when users reported that zbpack v1 failed to start.
As registry v1 became increasingly overloaded and error‑prone, and related support tickets surged, we had to consider moving the image‑handling part of zbpack v1 to zbpack v2.
Sharp‑eyed developers may have noticed that the zbpack (v1) GitHub repo was un‑archived and received many Dockerfile‑related updates. In fact, this was preparation for the compatibility layer that bridges zbpack v1 to zbpack v2.
We wanted to keep zbpack v1’s Dockerfile generation capability but switch to zbpack v2 for the actual build after the Dockerfile is produced. Therefore, we exposed v1’s Dockerfile generation as a public function, had the build service call it to generate a Dockerfile, and then passed that to the build machines running zbpack v2.
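Conceptually, the build-service side of this handoff looks like the following Go sketch. The generator signature is a hypothetical stand-in, since the exact function zbpack v1 now exports is not spelled out in this post.

```go
package buildservice

import (
	"fmt"
	"os"
	"path/filepath"
)

// DockerfileGenerator is a hypothetical stand-in for the Dockerfile
// generation entry point exposed by zbpack v1; the real name, package,
// and parameters may differ.
type DockerfileGenerator func(planType string, planMeta map[string]string) (string, error)

// PrepareBuild sketches the flow: ask zbpack v1 for a Dockerfile based on
// the detected plan, write it into the source directory, and then hand the
// directory to a build machine running zbpack v2.
func PrepareBuild(gen DockerfileGenerator, srcDir, planType string, planMeta map[string]string) error {
	dockerfile, err := gen(planType, planMeta)
	if err != nil {
		return fmt.Errorf("generate Dockerfile: %w", err)
	}

	// zbpack v2 then treats the project as a plain Dockerfile build.
	if err := os.WriteFile(filepath.Join(srcDir, "Dockerfile"), []byte(dockerfile), 0o644); err != nil {
		return fmt.Errorf("write Dockerfile: %w", err)
	}

	// Dispatching srcDir to a zbpack v2 build machine is omitted here.
	return nil
}
```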
However, zbpack v1 was originally designed to run on build machines, so many parts needed to be implemented or adapted before the build service could drive it.
Therefore, we implemented a zbpack v1 compatibility layer that mirrors how the v1 build machines invoked zbpack v1. We had already fixed most obvious issues (like code detection) before the global rollout, and internal tests on staging machines didn’t show mis‑detections. On‑call engineers also closely monitored the rollout. However, once deployed to all machines, we discovered many new issues that the compatibility layer hadn’t accounted for. For example:
- Environment variables prefixed with `ZBPACK_` were ignored.
- `ZBPACK_DOCKERFILE_NAME` behaved differently than before (both issues are sketched in the example below).

Given that some scenarios lacked representative test environments, and rolling back could have had a larger impact, during on-call we chose to proceed and iterate quickly (fix, test, and roll out) while also providing workarounds to customers. Between 8/26 and 8/27 we quickly completed the zbpack compatibility layer, added plan type and plan meta displays, and set up a Mainland China CDN to accelerate registry v2 downloads.
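To illustrate the kind of handling the compatibility layer needed for the two environment-variable issues above, here is a minimal Go sketch. The helper names and the exact `ZBPACK_DOCKERFILE_NAME` semantics are assumptions for illustration, not zbpack's actual API.

```go
package compat

import (
	"os"
	"strings"
)

// CollectZbpackOptions gathers ZBPACK_-prefixed variables from the build
// environment so they reach the planner again instead of being silently
// dropped by the compatibility layer.
func CollectZbpackOptions(environ []string) map[string]string {
	opts := make(map[string]string)
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(key, "ZBPACK_") {
			continue
		}
		opts[strings.TrimPrefix(key, "ZBPACK_")] = value
	}
	return opts
}

// DockerfileName shows one way to honor ZBPACK_DOCKERFILE_NAME again:
// if it is set, prefer that file over the default "Dockerfile". The exact
// v1 lookup rules are not reproduced here.
func DockerfileName() string {
	if name := CollectZbpackOptions(os.Environ())["DOCKERFILE_NAME"]; name != "" {
		return name
	}
	return "Dockerfile"
}
```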
We sincerely thank all customers who reported these issues; your feedback helped surface edge cases the compatibility layer missed and prompted us to investigate and fix them.
The issues mentioned above have been addressed in the compatibility layer. If anything else remains incomplete, please open a ticket and our engineers will take care of it.
Known issues at the moment:
- Some services need the environment variable `PORT=<public port>` set explicitly (e.g., `PORT=8080`).

Looking back at this incident, the core causes include:

- zbpack v1 was designed to run directly on build machines, so the compatibility layer had to reproduce many of its behaviors, and some were initially missed.
- Some scenarios lacked representative test environments, so staging tests and the monitored rollout did not surface these edge cases before the global deployment.
If you were affected by this incident, we can provide credits proportional to the duration of impact as compensation. We will be more cautious with future rollouts of these features, and we greatly appreciate everyone who helped uncover issues in the compatibility layer.