Migrating from Alauda Container Platform Tracing
This document describes how to migrate an existing tracing deployment based on the legacy Alauda Container Platform (ACP) Tracing stack — Alauda Build of Jaeger (Jaeger 1.60.0) plus Alauda Build of OpenTelemetry — to Alauda Distributed Tracing based on Jaeger v2 (2.16.0) plus Alauda Build of OpenTelemetry v2.
The migration is performed in two stages:
- OpenTelemetry stack migration. Replace the legacy Alauda Build of OpenTelemetry Operator and Collector with Alauda Build of OpenTelemetry v2. Application pods are rolled out so that the v2 Java agent is injected. After this stage, telemetry is collected by the v2 Collector but still written to the legacy Jaeger backend. This stage is performed by following Migrating from Alauda Build of OpenTelemetry to Alauda Build of OpenTelemetry v2.
- Jaeger backend migration. Deploy a new Jaeger v2 instance alongside the legacy Jaeger, switch the v2 Collector's trace exporter to the new backend, and uninstall the legacy Jaeger after the legacy retention period has elapsed.
After the cutover, new trace data flows into the new Jaeger v2 backend while the legacy Jaeger keeps serving previously stored traces. Once the legacy retention period elapses (default 7 days), the legacy stack is uninstalled.
Overview
What changes between ACP Tracing and Alauda Distributed Tracing
For changes to the OpenTelemetry Operator, Collector, and Instrumentation resources (including the now-required spec.java.image field, Service Mesh v1 incompatibility, and Collector configuration schema migration), see What changes between v1 and v2 in the OpenTelemetry v2 migration guide.
The legacy Alauda Build of Jaeger Operator owns a separate CRD (jaegertracing.io/v1.Jaeger) and does not conflict with the v2 OpenTelemetry Operator. It is therefore kept running during the migration so that the legacy Jaeger continues to serve historical trace data.
Migration outage windows
Trace ingestion is interrupted in two places:
- During the OpenTelemetry v1 → v2 migration, between the time the legacy OpenTelemetryCollector is deleted and the time the v2 OpenTelemetryCollector becomes ready. See Migration outage window in the OpenTelemetry v2 migration guide.
- During the Jaeger cutover, when the v2 Collector is patched to redirect its trace exporter from the legacy Jaeger to the new Jaeger v2 backend. The Collector deployment is rolled, so a short ingestion gap may occur during the rollout.
Application pods continue to run normally throughout both windows, but telemetry generated during the gaps may be temporarily buffered and can be dropped if it cannot be exported in time. Plan each stage during a low-traffic window and notify telemetry consumers (developers, SRE, Kiali users) in advance.
The legacy Jaeger query path remains available throughout the migration, so previously stored traces can still be searched in the legacy Jaeger UI while the new pipeline is being brought up.
Migration flow at a glance
Trace data continuity strategy
By following the Recreate the OpenTelemetryCollector resources guidance during the OpenTelemetry v2 migration, the v2 OpenTelemetry Collector is deployed in the same namespace and with the same Service name (otel-collector in cpaas-system) as the legacy Collector. Applications that export OTLP to otel-collector.cpaas-system keep working without any configuration change.
After the Jaeger cutover, trace data is partitioned cleanly by time:
- Traces ingested before the cutover remain only in the legacy Jaeger. They are queryable through the legacy Jaeger UI at <platform-url>/clusters/<cluster>/acp/jaeger for as long as the legacy jaeger-es-index-cleaner retains them (by default, 7 days from the index creation date).
- Traces ingested after the cutover are written only to the new Jaeger v2 backend. They are queryable through the new Jaeger UI at <platform-url>/clusters/<cluster>/jaeger.
The legacy Jaeger instance and its Operator are intentionally kept running during the observation period so that users can continue to search recent historical traces via the legacy UI. After the legacy retention has fully aged out (≥ 7 days), the legacy stack is uninstalled in Cleanup.
Communicate the split to your users in advance: the new Jaeger UI does not contain pre-cutover traces. During the observation period, users searching for a trace that started before the cutover should fall back to the legacy Jaeger UI.
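The time-based split can be expressed as a small lookup helper, for example in an internal runbook tool. This is a minimal sketch: the function name and the cutover timestamp are hypothetical, and it assumes the default single-export path, where no trace is stored in both backends.

```python
from datetime import datetime, timezone

# UI paths as given in this guide; the placeholders are left unresolved.
LEGACY_UI = "<platform-url>/clusters/<cluster>/acp/jaeger"
NEW_UI = "<platform-url>/clusters/<cluster>/jaeger"


def ui_for_trace(trace_start: datetime, cutover: datetime) -> str:
    """Return the Jaeger UI that should hold a trace started at trace_start."""
    # Pre-cutover traces live only in the legacy Jaeger; traces from the
    # cutover onwards live only in the new Jaeger v2 backend.
    return LEGACY_UI if trace_start < cutover else NEW_UI


# Hypothetical cutover time, for illustration only.
cutover = datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc)
print(ui_for_trace(datetime(2025, 1, 9, 23, 0, tzinfo=timezone.utc), cutover))
print(ui_for_trace(datetime(2025, 1, 10, 2, 5, tzinfo=timezone.utc), cutover))
```

The helper does not check whether the legacy retention has already removed the trace; it only answers which UI to try.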
Some teams prefer to dual-export every span to both the new Jaeger and the legacy Jaeger during the observation period, so that post-cutover traces appear in both UIs. This is useful for parallel validation, gradual UI switchover across teams, or a faster in-place rollback path. It approximately doubles the Elasticsearch write load and storage during the observation period and adds an extra cleanup step. See (Optional) Enabling dual export to the legacy Jaeger.
Dual export does not backfill pre-cutover traces into the new Jaeger; pre-cutover traces are only ever queryable through the legacy Jaeger UI.
Prerequisites
- An active ACP CLI (kubectl) session by a cluster administrator with the cluster-admin role.
- The legacy ACP Tracing stack (Alauda Build of Jaeger Operator with a Jaeger instance, plus Alauda Build of OpenTelemetry with an OpenTelemetryCollector and one or more Instrumentation resources) is currently installed.
- Elasticsearch 8.x is reachable from the cluster, and you have an Elasticsearch user with permission to create ILM policies, index templates, and index aliases.
- The jq and envsubst command-line tools are installed on the workstation that runs the migration commands.
- Telemetry consumers (developers, Kiali users, SRE dashboards) and application owners are notified of the planned outage windows.
Pre-migration tasks
Migrate Alauda Build of OpenTelemetry to v2
Before migrating the Jaeger backend, complete the migration from Alauda Build of OpenTelemetry to Alauda Build of OpenTelemetry v2 by following Migrating from Alauda Build of OpenTelemetry to Alauda Build of OpenTelemetry v2. That guide covers:
- Backing up and uninstalling the legacy Alauda Build of OpenTelemetry Operator and its OpenTelemetryCollector and Instrumentation resources.
- Installing the v2 Operator.
- Preparing a Java auto-instrumentation image and recreating OpenTelemetryCollector and Instrumentation resources, including setting the now-required spec.java.image field.
- Rolling out application pods so that the new Java agent is injected.
At the end of this stage:
- The v2 Alauda Build of OpenTelemetry Operator is installed and the legacy Operator is uninstalled.
- The v2 OpenTelemetryCollector in cpaas-system is running with its trace exporter still pointing at the legacy Jaeger (the natural outcome of recreating the Collector from the v1 backup).
- All Instrumentation resources have spec.java.image set, and application pods have been rolled out with the v2 Java agent.
Trace ingestion continues to flow into the legacy Jaeger until Switch the v2 OpenTelemetry Collector to the new Jaeger in this guide.
Inventory the legacy Jaeger deployment
Capture the current state of the legacy Jaeger so that you understand the migration scope and can produce backups for rollback.
- List the legacy Alauda Build of Jaeger Operator and Jaeger instances:
- Record the legacy Elasticsearch endpoint, credentials, and index prefix referenced by the legacy Jaeger resource. They will also be reused by the new Jaeger v2 instance.
Back up the legacy Jaeger resources
Export the legacy Jaeger resources so that you can rebuild them (and roll back if needed):
The backup files are only used as configuration references and rollback artifacts. When you rebuild Jaeger on v2, follow the v2 conventions described in Installing Alauda Distributed Tracing.
Verify Elasticsearch capacity
The new Jaeger writes to a separate index family (acp-<cluster>-jaeger-*) while the legacy indices (acp-tracing-<cluster>-jaeger-*) age out over the legacy retention period. Plan for one extra full retention's worth of trace storage in Elasticsearch. If you opt into (Optional) Enabling dual export to the legacy Jaeger, plan for approximately twice the steady-state storage during the observation period.
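To see why dual export approaches twice the steady-state footprint, the day-by-day storage can be modelled roughly as follows. This is an idealized sketch, not a sizing tool: it assumes a constant daily trace volume, the same retention for both index families, and identical on-disk size per span in both. The function name and the sample numbers are illustrative.

```python
def storage_timeline(daily_gb: float, retention_days: int, dual_export: bool):
    """Approximate total trace storage (GB) on each day after the cutover."""
    timeline = []
    for day in range(retention_days + 1):
        # The new index family starts empty and gains one day of data per day.
        new_days = day
        # Single export: the legacy family stops growing and ages out daily.
        # Dual export: it keeps receiving every span, so it stays full.
        legacy_days = retention_days if dual_export else retention_days - day
        timeline.append((new_days + legacy_days) * daily_gb)
    return timeline


# Illustrative numbers: 10 GB of traces per day, 7-day retention.
print(storage_timeline(10, 7, dual_export=False))
print(storage_timeline(10, 7, dual_export=True))
```

In this model the dual-export total climbs from one retention's worth to two by the end of the observation period, while the single-export total stays flat. The planning figures given above remain the ones to provision for, since real volumes are bursty and the two backends may not age data out at exactly the same rate.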
Migration procedure
Deploy the new Jaeger v2 instance
Follow Installing Alauda Build of Jaeger v2 in the Alauda Distributed Tracing installation guide. The new Jaeger v2 instance is deployed in a dedicated namespace (jaeger-system by default) so that it does not collide with the legacy jaeger-prod instance in cpaas-system.
Only follow the Installing Alauda Build of Jaeger v2 section linked above. Do not execute the Deploying the OpenTelemetry Collector section of the same installation guide — the application-facing v2 OpenTelemetry Collector was already deployed in cpaas-system during Stage 1 (OpenTelemetry v2 migration). Running that section would create a duplicate otel Collector in jaeger-system that no application talks to.
When you reach the variable-setup step, keep the default index prefix so that the new Jaeger writes to indices that are clearly separated from the legacy ones:
After completing the installation steps, verify that:
- The Jaeger Pod in jaeger-system is Ready.
- The Jaeger UI is reachable at <platform-url>/clusters/<cluster>/jaeger. It is empty at this point, since no exporter is yet writing to it.
- The Service jaeger-collector.jaeger-system.svc.cluster.local accepts OTLP gRPC on port 4317. This is the endpoint that the v2 OpenTelemetry Collector will export to in the next step.
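A quick way to sanity-check the last item is a plain TCP connection test. The sketch below only confirms that something is listening on the port; it does not speak OTLP or gRPC, so a successful result is a necessary but not sufficient check. The helper name is illustrative.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from a Pod inside the cluster (the Service DNS name does not resolve from an external workstation), a call such as tcp_reachable("jaeger-collector.jaeger-system.svc.cluster.local", 4317) should return True once the Jaeger Pod is Ready.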
Switch the v2 OpenTelemetry Collector to the new Jaeger
After the OpenTelemetry v1 → v2 migration, the recreated otel OpenTelemetryCollector in cpaas-system still writes traces to the legacy Jaeger because its trace exporter was inherited from the v1 backup. The trace-related portion of its spec.config typically looks like this (other fields are unrelated to this step and are omitted):
- The legacy-Jaeger exporter, inherited from the v1 backup, is named otlp and targets the legacy Jaeger collector's headless Service in cpaas-system. balancer_name: round_robin distributes spans across the headless Service endpoints.
- The trace pipeline sends spans to debug (logs) and otlp (legacy Jaeger).
Patch the Collector to (1) add a new otlp/jaeger-v2 exporter pointing at the new Jaeger v2 collector Service in jaeger-system, (2) remove the legacy otlp exporter by setting it to null, and (3) replace the trace pipeline's exporter list with [debug, otlp/jaeger-v2]:
service.pipelines.traces.exporters is an array, and a merge patch replaces arrays in their entirety rather than appending. The patch above lists every exporter that must remain in the trace pipeline (debug, otlp/jaeger-v2). If your trace pipeline contains additional custom exporters, add them to this list before applying the patch.
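The array-replacement and null-removal behaviour comes from JSON merge patch semantics (RFC 7386). The following sketch re-implements those rules in a few lines and applies them to a simplified, hypothetical config to show what happens to an exporter that is left out of the patched list. The endpoint placeholders and the custom exporter are illustrative only.

```python
def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386) and return the result."""
    if not isinstance(patch, dict):
        return patch  # scalars and arrays replace the target wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


config = {
    "exporters": {"debug": {}, "custom": {}, "otlp": {"endpoint": "<legacy-endpoint>"}},
    "service": {"pipelines": {"traces": {"exporters": ["debug", "otlp", "custom"]}}},
}
patch = {
    "exporters": {"otlp": None, "otlp/jaeger-v2": {"endpoint": "<new-endpoint>"}},
    "service": {"pipelines": {"traces": {"exporters": ["debug", "otlp/jaeger-v2"]}}},
}
merged = merge_patch(config, patch)
print(merged["service"]["pipelines"]["traces"]["exporters"])
# ['debug', 'otlp/jaeger-v2']
print(sorted(merged["exporters"]))
# ['custom', 'debug', 'otlp/jaeger-v2']
```

The custom exporter is still defined after the patch but no longer receives traces, because the pipeline array was replaced rather than merged.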
If the inherited legacy-Jaeger exporter on your Collector is not named otlp (for example, your v1 backup used jaeger or a different OTLP variant), substitute that name in the null removal step accordingly.
The new exporter is named otlp/jaeger-v2 rather than reusing the otlp name so that, if you later choose to also write to the legacy Jaeger during the observation period, the additional exporter can be patched back in symmetrically as otlp/jaeger-v1. See (Optional) Enabling dual export to the legacy Jaeger.
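The three changes described in this step can be assembled into a merge patch document programmatically. This is a sketch under stated assumptions, not the exact patch from the installation material: the function and parameter names are hypothetical, the endpoint is the Service and port listed in the verification step above, and TLS settings are left to the caller because they depend on how the new Jaeger was deployed.

```python
import json


def build_cutover_patch(legacy_exporter="otlp", extra_exporters=(), tls=None):
    """Build a JSON merge patch that redirects traces to the new Jaeger v2."""
    new_exporter = {
        "endpoint": "jaeger-collector.jaeger-system.svc.cluster.local:4317",
    }
    if tls is not None:
        new_exporter["tls"] = tls  # assumption: supplied to match your deployment
    # The pipeline array is replaced wholesale, so list every exporter to keep.
    pipeline = ["debug", "otlp/jaeger-v2", *extra_exporters]
    return {
        "spec": {
            "config": {
                "exporters": {
                    "otlp/jaeger-v2": new_exporter,
                    legacy_exporter: None,  # null removes the legacy exporter
                },
                "service": {"pipelines": {"traces": {"exporters": pipeline}}},
            }
        }
    }


print(json.dumps(build_cutover_patch(), indent=2))
```

The printed JSON can be saved to a file and applied with a merge-type patch, for example kubectl patch opentelemetrycollector otel -n cpaas-system --type merge --patch-file <file>. Pass legacy_exporter="jaeger" (or whatever name your v1 backup used) and any custom exporters through extra_exporters.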
Trace ingestion is restored at this point. New traces are written to the new Jaeger v2 backend and become searchable in the new Jaeger UI.
Verify the migration
- Confirm that both OpenTelemetryCollector resources are healthy:
  Example output:
  Both cpaas-system/otel (the application-facing Collector, redeployed by the OpenTelemetry v1 → v2 migration) and jaeger-system/jaeger (the new Jaeger v2 backend, deployed in Deploy the new Jaeger v2 instance) must report READY as <ready>/<desired> with <ready> equal to <desired> (typically 1/1) and MANAGEMENT as managed. If the READY column shows 0/1 or is empty, inspect the Collector pod logs in the corresponding namespace before continuing.
- Generate sample traces with telemetrygen and verify they appear in the new Jaeger UI:
  The new Jaeger UI at <platform-url>/clusters/<cluster>/jaeger should list the jaeger-migration-check service and its traces.
- Confirm that the new index family is being created in Elasticsearch and that the legacy index family is intact:
  You should see legacy indices matching acp-tracing-<cluster>-jaeger-* (date-stamped, no longer growing) and new indices matching acp-<cluster>-jaeger-*-000001 (rollover, growing).
- Spot-check that a real business request produces a trace in the new Jaeger UI. Pick one or two already-instrumented applications, trigger a representative request, and look up its traceID in the new Jaeger UI.
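If the health check in the first step is scripted, the READY rule (<ready> equal to <desired>, and not empty) can be captured in a small helper. This is an illustrative sketch that works on the column value as a string; how that value is obtained is left to your tooling.

```python
def is_ready(ready: str) -> bool:
    """Return True when a READY value such as "1/1" shows all replicas ready."""
    parts = ready.strip().split("/")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return False  # empty or malformed column
    ready_count, desired = int(parts[0]), int(parts[1])
    # Treat 0/0 as not ready: no replica is serving traffic.
    return desired > 0 and ready_count == desired


for value in ("1/1", "0/1", ""):
    print(repr(value), is_ready(value))
```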
Observation period
Keep the legacy Jaeger running for at least 7 days after the cutover, matching the legacy esIndexCleaner.numberOfDays retention. During this window:
- The legacy Jaeger UI continues to serve any pre-cutover traces that have not yet been deleted by the legacy jaeger-es-index-cleaner. The cleaner removes each index ~7 days after its creation date; once all pre-cutover indices have been cleaned, the legacy Jaeger has no useful data left and can be uninstalled.
- The new Jaeger accumulates traces produced after the cutover. Validate dashboards, alerts, and Kiali integrations against the new Jaeger UI and the v2 Java agent metric names.
- Communicate the split clearly: tell users that the new Jaeger UI shows traces from the cutover onwards, and that older traces remain in the legacy Jaeger UI.
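The end of the observation period follows from simple date arithmetic. The sketch below assumes the legacy indices are daily and that the cleaner removes an index roughly retention_days after its creation date, as described above; the exact day depends on when the cleaner job runs, so treat the result as an estimate. The function name and the cutover date are illustrative.

```python
from datetime import date, timedelta


def earliest_cleanup_date(cutover_day: date, retention_days: int = 7) -> date:
    """Estimate the date by which the last pre-cutover index has aged out."""
    # The newest legacy index that still holds pre-cutover spans is the one
    # created on the cutover day; it is removed about retention_days later.
    return cutover_day + timedelta(days=retention_days)


print(earliest_cleanup_date(date(2025, 1, 10)))  # hypothetical cutover day
# 2025-01-17
```

If your legacy esIndexCleaner.numberOfDays differs from the default, pass that value as retention_days.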
If pre-cutover traces have no business value, you can shorten or skip the observation period and proceed directly to Cleanup; the trade-off is that any traces older than the cutover are no longer reachable as soon as the legacy stack is removed.
Once the legacy Jaeger instance is deleted in Uninstall the legacy Jaeger instance, pre-cutover traces are no longer recoverable. Confirm that nobody still relies on them before proceeding.
Cleanup
Uninstall the legacy Jaeger instance
If you opted into (Optional) Enabling dual export to the legacy Jaeger, first remove the legacy exporter from the v2 Collector pipeline as described in Stop the dual export. The legacy Jaeger collector Service must still exist while the v2 Collector references it; otherwise the OTLP exporter accumulates connection errors until the Collector is patched.
Delete the legacy Jaeger instance and its supporting resources:
Uninstall the Alauda Build of Jaeger Operator:
The jaegers.jaegertracing.io CRD can also be deleted if no other Jaeger resources remain in the cluster:
Disable the legacy feature switch and clean up legacy indices
- In the ACP web console, open Feature Switch and disable acp-tracing-ui. The platform-customized Observability → Tracing view is no longer functional after the legacy Alauda Build of OpenTelemetry v1 Operator is uninstalled. Update internal documentation and runbooks to point to the Jaeger UI URL <platform-url>/clusters/<cluster>/jaeger.
- The legacy jaeger-es-index-cleaner CronJob was deleted together with the legacy Jaeger instance, so any remaining acp-tracing-<cluster>-jaeger-* indices in Elasticsearch are no longer auto-rotated. Delete them manually:
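Before deleting anything, it helps to derive the exact list of legacy index names from an index listing and review it. The following sketch filters a list of names by the legacy prefix and prints the corresponding Elasticsearch delete-index requests. The cluster name and the sample index names are hypothetical, and the script only prints the requests; it does not contact Elasticsearch.

```python
def legacy_indices(index_names, cluster):
    """Return the legacy Jaeger indices for one cluster, sorted by name."""
    prefix = f"acp-tracing-{cluster}-jaeger-"
    return sorted(name for name in index_names if name.startswith(prefix))


# Hypothetical listing, for example taken from the _cat/indices output.
names = [
    "acp-tracing-prod-jaeger-span-2025-01-09",
    "acp-tracing-prod-jaeger-service-2025-01-09",
    "acp-prod-jaeger-span-000001",
]
for name in legacy_indices(names, "prod"):
    print(f"DELETE /{name}")
```

Listing explicit names keeps the new acp-<cluster>-jaeger-* family out of the delete set and avoids relying on wildcard deletes, which Elasticsearch can be configured to reject.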
(Optional) Enabling dual export to the legacy Jaeger
Configuring the v2 OpenTelemetry Collector to write every span to both the new Jaeger and the legacy Jaeger is an opt-in alternative to the default single-export pipeline. Consider it when one of the following applies:
- Parallel validation. You want to compare the new Jaeger against the legacy Jaeger on production traffic during the observation period before fully relying on the new backend.
- Faster in-place rollback. If the new Jaeger turns out to have problems during the observation period, you can drop otlp/jaeger-v2 from the trace pipeline with a single patch and the legacy Jaeger keeps receiving traces uninterrupted.
- Gradual UI switchover. Different teams plan to switch from the legacy UI to the new UI on independent schedules and you want both UIs to show post-cutover data during the transition.
Trade-offs:
- Elasticsearch write load and storage usage roughly double during the observation period because every span is indexed in both index families.
- The trace pipeline must be reverted by an extra patch in Stop the dual export before the legacy Jaeger is uninstalled.
- Two export pipelines must be monitored for failures (otelcol_exporter_send_failed_spans_total for both otlp/jaeger-v2 and otlp/jaeger-v1).
Dual export does not backfill pre-cutover traces into the new Jaeger; pre-cutover traces are only ever queryable through the legacy Jaeger UI.
Enable dual export
After Switch the v2 OpenTelemetry Collector to the new Jaeger has converged, the otel Collector has otlp/jaeger-v2 as the only trace exporter writing to a Jaeger backend (the trace pipeline is [debug, otlp/jaeger-v2]). Patch the Collector to add the legacy Jaeger back as a second exporter otlp/jaeger-v1, mirroring the legacy-Jaeger endpoint, headless-Service load balancing, and TLS settings of the original otlp exporter that was removed in the switch step:
service.pipelines.traces.exporters is an array, and a merge patch replaces arrays in their entirety rather than appending. The patch above lists every exporter that must remain in the trace pipeline (debug, otlp/jaeger-v2, otlp/jaeger-v1). If your trace pipeline contains additional custom exporters, add them to this list before applying the patch.
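A dual-export merge patch can be generated from the exporter settings saved in your v1 backup. This is a sketch only: the function name is hypothetical, and the endpoint shown is a placeholder that must be replaced with the values of the original otlp exporter, including its TLS settings, which are omitted here.

```python
import json


def build_dual_export_patch(legacy_exporter_config, extra_exporters=()):
    """Build a JSON merge patch that adds the legacy Jaeger as otlp/jaeger-v1."""
    # The pipeline array is replaced wholesale, so list every exporter to keep.
    pipeline = ["debug", "otlp/jaeger-v2", "otlp/jaeger-v1", *extra_exporters]
    return {
        "spec": {
            "config": {
                "exporters": {"otlp/jaeger-v1": dict(legacy_exporter_config)},
                "service": {"pipelines": {"traces": {"exporters": pipeline}}},
            }
        }
    }


# Copy these values from the original otlp exporter in your v1 backup;
# the endpoint below is a placeholder, not a real setting.
legacy = {"endpoint": "<legacy-jaeger-endpoint>", "balancer_name": "round_robin"}
print(json.dumps(build_dual_export_patch(legacy), indent=2))
```

Because the patch only adds the otlp/jaeger-v1 key under exporters, the existing otlp/jaeger-v2 definition is left untouched by the merge.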
After the patch is applied, every span is written to both Jaeger backends.
Stop the dual export
After the observation period, before Uninstall the legacy Jaeger instance, remove the legacy exporter from the trace pipeline:
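The reverse patch mirrors the one used to enable dual export. As a hedged sketch with a hypothetical function name, it sets otlp/jaeger-v1 to null and restores the single-export pipeline in the same document, so the exporter definition and its pipeline reference are removed together.

```python
import json


def build_stop_dual_export_patch(extra_exporters=()):
    """Build a JSON merge patch that removes the legacy otlp/jaeger-v1 exporter."""
    # The pipeline array is replaced wholesale, so list every exporter to keep.
    pipeline = ["debug", "otlp/jaeger-v2", *extra_exporters]
    return {
        "spec": {
            "config": {
                "exporters": {"otlp/jaeger-v1": None},  # null removes the key
                "service": {"pipelines": {"traces": {"exporters": pipeline}}},
            }
        }
    }


print(json.dumps(build_stop_dual_export_patch(), indent=2))
```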
After this patch, the v2 Collector writes traces only to the new Jaeger. You can now proceed with Uninstall the legacy Jaeger instance.
Rollback
For rollback of the OpenTelemetry v1 → v2 stage, see Rollback in the OpenTelemetry v2 migration guide. For the Jaeger migration stage performed in this document, choose the rollback path that matches the phase you are in:
FAQ
Do applications need to update their OTLP endpoints?
No. By following the Recreate the OpenTelemetryCollector resources guidance in the OpenTelemetry v2 migration guide, the v2 OpenTelemetry Collector is deployed in the same namespace (cpaas-system), with the same Service name (otel-collector) and the same ports (4317/4318). Workloads that exported to otel-collector.cpaas-system:4317 keep working without any change.
Will the new Jaeger UI show traces from before the cutover?
No. Pre-cutover traces are stored only in the legacy Jaeger and remain queryable through the legacy Jaeger UI for the duration of the legacy retention window (default 7 days). The new Jaeger UI starts showing traces from the cutover onwards. Communicate this to users in advance, and direct them to the legacy UI for older traces during the observation period.
When should I enable dual export?
The default single-export path is sufficient for most migrations. Enable dual export, described in (Optional) Enabling dual export to the legacy Jaeger, only when one of the following applies:
- You need to validate the new Jaeger against the legacy Jaeger using production traffic before relying on it.
- Different teams will switch from the legacy UI to the new UI on independent schedules and you want both UIs to show post-cutover data during the transition.
- You want the fastest possible in-place rollback path during the observation period.
Be aware that dual export approximately doubles the Elasticsearch write load and storage during the observation period, and adds an extra patch step before the legacy Jaeger is uninstalled.
Can the legacy and the new Jaeger share the same Elasticsearch index?
No. The legacy Jaeger uses date-stamped daily indices (acp-tracing-<cluster>-jaeger-span-YYYY-MM-DD), while Jaeger v2 uses rollover aliases (acp-<cluster>-jaeger-span-write / -read backed by *-000001, *-000002, …). The schemas and lifecycle are managed differently, so the two index families must remain separate. Keep the default prefixes shown in Deploy the new Jaeger v2 instance to avoid collisions.
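The two naming schemes are distinct enough to tell apart mechanically, which is handy when auditing an index listing. The sketch below classifies span index names only, using patterns derived from the names given above; the six-digit rollover suffix follows the *-000001 form shown, and the sample names are hypothetical.

```python
import re

# Legacy: date-stamped daily indices. v2: rollover indices with a numeric suffix.
LEGACY_DAILY = re.compile(r"^acp-tracing-.+-jaeger-span-\d{4}-\d{2}-\d{2}$")
V2_ROLLOVER = re.compile(r"^acp-.+-jaeger-span-\d{6}$")


def index_family(name: str) -> str:
    """Classify a span index name as 'legacy', 'v2', or 'other'."""
    if LEGACY_DAILY.match(name):
        return "legacy"
    if V2_ROLLOVER.match(name):
        return "v2"
    return "other"


for name in (
    "acp-tracing-prod-jaeger-span-2025-01-09",
    "acp-prod-jaeger-span-000001",
    "unrelated-index",
):
    print(name, "->", index_family(name))
```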
How much extra Elasticsearch storage is required during the observation period?
With the default single-export path, the new index family grows from zero while the legacy index family ages out over the legacy retention; plan for one extra full retention's worth of trace storage as a steady-state buffer. With dual export enabled, plan for approximately twice the steady-state size of the legacy index family for the duration of the observation period, plus headroom for retries and ingestion bursts.
Can Service Performance Monitoring (SPM) be enabled as part of the migration?
SPM is optional and can be enabled at any time after Verify the migration. Follow (Optional) Enabling Service Performance Monitoring (SPM) to add the spanmetrics connector to the v2 OpenTelemetry Collector and configure a metrics backend in the new Jaeger.