Losing metrics with the same name and different tags using the OTel Java agent, Alloy and Mimir


I am using the OpenTelemetry Java agent to read Micrometer metrics from my Java app and push them to Alloy, from Alloy on to Mimir, and from there into Grafana.

Micrometer -> OTel agent -> Alloy -> Mimir -> Grafana

I have three metrics:

// Registered in service A
this.incomingCounter = Counter.builder("direct_auth_requests_total")
        .tag("outcome", "incoming")
        .register(meterRegistry);

// Registered in service B
this.completedCounter = Counter.builder("direct_auth_requests_total")
        .tag("outcome", "completed")
        .register(meterRegistry);
this.failedCounter = Counter.builder("direct_auth_requests_total")
        .tag("outcome", "failed")
        .register(meterRegistry);

They are always incremented in that order: incoming first, then either completed or failed.
The MeterRegistry is injected via Spring.
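To make the setup concrete, here is a minimal, self-contained sketch (outside Spring, using micrometer-core's SimpleMeterRegistry as a stand-in for the injected registry; the class name and counter values are made up for illustration) showing that one name with different tag values produces independent counters:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CounterTagDemo {
    public static void main(String[] args) {
        // Stand-in for the Spring-injected MeterRegistry
        MeterRegistry registry = new SimpleMeterRegistry();

        Counter incoming = Counter.builder("direct_auth_requests_total")
                .tag("outcome", "incoming")
                .register(registry);
        Counter completed = Counter.builder("direct_auth_requests_total")
                .tag("outcome", "completed")
                .register(registry);
        Counter failed = Counter.builder("direct_auth_requests_total")
                .tag("outcome", "failed")
                .register(registry);

        incoming.increment();
        incoming.increment();
        completed.increment();
        failed.increment();

        // Each tag combination is its own meter with its own count
        System.out.println(incoming.count());   // 2.0
        System.out.println(completed.count());  // 1.0
        System.out.println(failed.count());     // 1.0
    }
}
```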

I know they are being incremented correctly, and I can see each series independently on the Prometheus endpoint exposed by Spring Boot.
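On that Prometheus endpoint, all three series appear as separate lines in the standard exposition format, along these lines (the values here are hypothetical):

direct_auth_requests_total{outcome="incoming"} 42.0
direct_auth_requests_total{outcome="completed"} 40.0
direct_auth_requests_total{outcome="failed"} 2.0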

However, in Grafana I only get the first one, the outcome="incoming" series; the rest seem to get lost. I have other counters that follow the same pattern of one name with multiple tag values, and they show the same issue: only the first one ever appears in Grafana.
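For reference, a query like the following (illustrative, not my exact Grafana panel query) should return all three series but only returns the outcome="incoming" one:

sum by (outcome) (rate(direct_auth_requests_total[5m]))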

Here are the relevant parts of the Alloy config:

// OTLP Receiver
otelcol.receiver.otlp "collector" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    metrics = [otelcol.processor.attributes.add_attributes.input]
  }
}

// Attributes Processor to add system_name and server_name labels
otelcol.processor.attributes "add_attributes" {
  action {
    key    = "region"
    value  = "{{ system_region }}"
    action = "insert"
  }
  action {
    key    = "system"
    value  = "{{ system_name }}"
    action = "insert"
  }
  action {
    key    = "host"
    value  = "{{ inventory_hostname }}"
    action = "insert"
  }
  output {
    metrics = [otelcol.processor.transform.sanitize_resource_attrs.input]
  }
}

// Delete oversized resource attributes from METRICS before sending to Mimir
// These attributes get added to target_info and exceed Mimir's 2048 byte limit
otelcol.processor.transform "sanitize_resource_attrs" {
  error_mode = "propagate"
  metric_statements {
    context = "resource"
    statements = [
      // Delete attributes that can exceed 2048 bytes
      "delete_key(resource.attributes, \"process.command_args\")",
      "delete_key(resource.attributes, \"process.command_line\")",
      "delete_key(resource.attributes, \"process.executable.path\")",
    ]
  }
  output {
    metrics = [otelcol.processor.batch.mimir.input]
  }
}

// Basic Auth credentials for Tempo, Mimir, and Loki
otelcol.auth.basic "creds" {
  username = "{{ username }}"
  password = "{{ password }}"
}

// Mimir Exporter
otelcol.exporter.otlphttp "mimir" {
  client {
    endpoint = "{{ mimir_endpoint }}/otel"
    headers = {
      "X-Scope-OrgID" = "{{ org_id }}",
    }
    auth = otelcol.auth.basic.creds.handler
  }
}

otelcol.processor.batch "mimir" {
  output {
    metrics = [otelcol.exporter.otlphttp.mimir.input]
  }
}

We are using the Mimir /otel endpoint; the Prometheus endpoint caused a myriad of other problems. The Alloy logs are empty, and in Alloy's graph view metrics appear to flow through (which makes sense, since the first counter clearly gets through).
