Observability

Testplan provides built-in observability through OpenTelemetry tracing and logging. You can follow test execution flow and timing, locate performance bottlenecks, and export log output, correlated with your traces, to Loki.

Overview

The observability feature integrates OpenTelemetry to create spans and collect logs for various levels of test execution:

  1. Testplan level: Top-level span for the entire test plan

  2. Test level: Spans for individual test runnables (MultiTest, PyTest, GTest, etc.)

  3. Testsuite level: Spans for test suites (MultiTest only)

  4. Testcase level: Spans for individual test cases (MultiTest only)

For test types other than MultiTest (such as PyTest, GTest, JUnit), only the entire test execution is traced as a single span, without breaking down into individual suites or cases.

Each span captures timing information, attributes, and status (pass/fail), allowing you to visualize the complete test execution in your observability platform. When logging is enabled, all logs are automatically correlated with their corresponding spans via trace_id and span_id.

Configuration

Environment Variables

To enable OpenTelemetry tracing, set the following environment variables.

The exporter type is auto-detected from the endpoint URL: endpoints starting with http:// or https:// use the HTTP exporter, otherwise the gRPC exporter is used.
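
The detection rule above can be pictured with a few lines of Python. This is only a sketch of the rule as stated, not Testplan's actual implementation; the function name detect_exporter is hypothetical.

```python
def detect_exporter(endpoint: str) -> str:
    """Return "http" for http:// or https:// endpoints, otherwise "grpc"."""
    if endpoint.startswith(("http://", "https://")):
        return "http"
    return "grpc"


print(detect_exporter("https://your-otlp-endpoint:4318/v1/traces"))  # http
print(detect_exporter("your-otlp-endpoint:4317"))  # grpc
```

Note that under this rule an endpoint written without a scheme always selects the gRPC exporter, even if the port is the one usually used for HTTP.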

Required Variables (all exporters):

# OTLP exporter endpoint (set one of the two forms below)
# HTTP endpoint (uses HTTP exporter):
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://your-otlp-endpoint:4318/v1/traces
# gRPC endpoint (uses gRPC exporter):
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=your-otlp-endpoint:4317

# Resource attributes (key-value format)
export OTEL_RESOURCE_ATTRIBUTES="service.name=my-testplan,environment=staging,team=qa"
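
To make the key-value format concrete, the following sketch splits such a string into a dictionary. It is an illustration only: parse_resource_attributes is a hypothetical helper, and the OpenTelemetry SDK does its own parsing of this variable.

```python
def parse_resource_attributes(raw: str) -> dict:
    """Split a comma-separated list of key=value pairs into a dict."""
    attrs = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # ignore empty entries such as a trailing comma
        key, _, value = pair.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


print(parse_resource_attributes(
    "service.name=my-testplan,environment=staging,team=qa"
))
# {'service.name': 'my-testplan', 'environment': 'staging', 'team': 'qa'}
```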

Additional Required Variables (gRPC exporter only):

# TLS certificates for gRPC mTLS connection
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca-cert.pem
export OTEL_EXPORTER_OTLP_CLIENT_KEY=/path/to/client-key.pem
export OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE=/path/to/client-cert.pem

Additional Variables for Logging:

# Loki endpoint for log export
export OTEL_EXPORTER_LOKI_ENDPOINT=https://your-loki-endpoint:3100

Optional Variables:

# Batch span processor delay in milliseconds (default: 200)
export OTEL_BSP_SCHEDULE_DELAY=500
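
Before launching a run, it can help to check that the variables listed above are set. The script below is a minimal sketch of such a preflight check, assuming the required-variable rules described in this section; missing_otel_vars is a hypothetical helper, not part of Testplan.

```python
import os

COMMON_VARS = ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "OTEL_RESOURCE_ATTRIBUTES"]
GRPC_VARS = [
    "OTEL_EXPORTER_OTLP_CERTIFICATE",
    "OTEL_EXPORTER_OTLP_CLIENT_KEY",
    "OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE",
]


def missing_otel_vars(env):
    """Return the names of required variables that are unset or empty in env."""
    required = list(COMMON_VARS)
    endpoint = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "")
    # Endpoints without an http:// or https:// scheme use the gRPC exporter,
    # which additionally needs the TLS certificate files.
    if endpoint and not endpoint.startswith(("http://", "https://")):
        required += GRPC_VARS
    return [name for name in required if not env.get(name)]


# Report what is missing in the current shell environment.
print(missing_otel_vars(dict(os.environ)))
```

An empty list means every variable that this section marks as required is present; the Loki endpoint and the optional batch delay are deliberately not checked.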

Tracing

Enabling Tracing

Use the --otel-traces command line flag:

# Enable tracing with the flag
python my_testplan.py --otel-traces <LEVEL>

Where <LEVEL> can be:
  • Plan: Trace at the Testplan level

  • Test: Trace at the Testplan and Test levels

  • TestSuite: Trace at the Testplan, Test, and Testsuite levels

  • TestCase: Trace at all levels including Testcase
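
Each level includes all of the levels above it. The sketch below models that ordering with a hypothetical Level enum; it is a stand-in for illustration and is not Testplan's own TraceLevel class.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical stand-in for the --otel-traces levels, ordered by depth."""
    PLAN = 1
    TEST = 2
    TESTSUITE = 3
    TESTCASE = 4


def traced_levels(selected: Level) -> list:
    """Return the names of every level that gets a span at the selected setting."""
    return [level.name for level in Level if level <= selected]


print(traced_levels(Level.TESTSUITE))  # ['PLAN', 'TEST', 'TESTSUITE']
```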

You can also set the tracing level programmatically in your test plan definition:

from testplan import test_plan
from testplan.common.utils.observability import TraceLevel

@test_plan(name="MyTestPlan", otel_traces=TraceLevel.TESTCASE)
def main(plan):
    # Your test plan definition here
    pass

Trace Hierarchy

The tracing hierarchy follows the test structure:

Testplan (root span)
├── MultiTest
│   ├── TestSuite1
│   │   ├── testcase_1
│   │   ├── testcase_2
│   │   └── testcase_3
│   └── TestSuite2
│       ├── testcase_4
│       └── testcase_5
├── PyTest
└── GTest

Each span includes:

  • Name: The name of the test/suite/case

  • Attributes: Metadata like test level, status, etc.

  • Status: Pass/error based on test results

  • Timing: Start and end timestamps

Trace Context Propagation

When you need to integrate Testplan traces into an existing distributed trace, use the --otel-traceparent flag to specify the parent trace context in W3C Trace Context format:

# Link Testplan execution to an existing trace
python my_testplan.py --otel-traces TestCase --otel-traceparent "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"

The traceparent format is: version-trace_id-parent_span_id-trace_flags
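
The four fields have fixed widths in the W3C Trace Context format: 2, 32, 16 and 2 lowercase hexadecimal characters. The sketch below validates and splits a traceparent string on that basis; parse_traceparent and TraceParent are hypothetical names, and Testplan's own parsing may differ.

```python
import re
from typing import NamedTuple

# version (2 hex) - trace_id (32 hex) - parent_span_id (16 hex) - trace_flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_span_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_span_id: str
    trace_flags: str


def parse_traceparent(value: str) -> TraceParent:
    """Split a traceparent string into its fields, or raise ValueError."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    return TraceParent(**match.groupdict())


tp = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(tp.trace_id)        # 0af7651916cd43dd8448eb211c80319c
print(tp.parent_span_id)  # b7ad6b7169203331
```

Running a check like this in a CI script catches a truncated or non-hexadecimal value before it is passed to --otel-traceparent.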

If you are only concerned with setting a specific trace ID and don’t need to link to an actual parent span, you can use a dummy span ID of all zeros:

# Set trace ID without parent span linkage
python my_testplan.py --otel-traces TestCase --otel-traceparent "00-0af7651916cd43dd8448eb211c80319c-0000000000000000-01"

Passing a traceparent makes your Testplan execution appear as a child span in your broader system trace, giving you end-to-end observability across multiple test executions. A common use case is to start a parent trace and run multiple Testplan executions in parallel as child spans under it. You can generate a traceparent with a script like the following:

# start_trace.py
import sys
from testplan.common.utils.observability import tracing

def main():
    tracing._setup()
    with tracing.span("Trace Start"):
        tracing._inject_root_context()
        print(tracing._get_traceparent())

if __name__ == "__main__":
    sys.exit(main())

Then pass the output to multiple parallel Testplan runs:

# Start the parent trace and capture the traceparent
TRACEPARENT=$(python start_trace.py | tail -n 1)

# Launch multiple testplan runs in parallel under the same parent trace
python testplan1.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT" &
python testplan2.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT" &
python testplan3.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT" &
wait

Deterministic Trace IDs

For better traceability in CI/CD pipelines, you can generate deterministic trace IDs based on build and testplan identifiers. This allows you to correlate traces with specific builds and test executions:

# generate_traceparent.py
import hashlib
import sys

def generate_deterministic_traceparent(build_id, testplan_name):
    """Generate a deterministic traceparent from build and testplan identifiers."""
    # A W3C trace ID must be 32 lowercase hex characters, so hash the
    # identifiers rather than embedding them directly (names such as
    # "BUILD123" or "smoke_tests" contain non-hex characters).
    # The ":" separator reduces accidental collisions, e.g. between
    # ("ab", "c") and ("a", "bc").
    digest = hashlib.sha256(f"{build_id}:{testplan_name}".encode("utf-8"))
    trace_id = digest.hexdigest()[:32]
    # Use dummy span ID since we only care about trace ID grouping
    parent_span_id = "0000000000000000"
    trace_flags = "01"
    return f"00-{trace_id}-{parent_span_id}-{trace_flags}"

if __name__ == "__main__":
    build_id = sys.argv[1]  # e.g., "BUILD123"
    testplan_name = sys.argv[2]  # e.g., "smoke_tests"
    print(generate_deterministic_traceparent(build_id, testplan_name))

Usage:

# Generate traceparent for a specific build and testplan
TRACEPARENT=$(python generate_traceparent.py BUILD123 smoke_tests)
python testplan.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT"

Manual Span Creation

For custom instrumentation within your tests, you can manually create spans:

Context Manager Style

from testplan.common.utils.observability import tracing
from testplan.testing.multitest import testcase

@testcase
def my_test(self, env, result):
    with tracing.span("database_query", db="postgres", query="SELECT"):
        # Your code here
        result.true(perform_database_query())

Start/End Style

@testcase
def my_test(self, env, result):
    # Start a span
    span = tracing.start_span(
        "complex_operation",
        operation="data_processing",
        record_count=1000
    )

    try:
        result.true(process_data())
    finally:
        # End the span
        tracing.end_span(span)

Setting Span Attributes

@testcase
def my_test(self, env, result):
    with tracing.span("api_call") as span:
        response = call_api()

        # Add custom attributes to the span
        tracing.set_span_attrs(
            span=span,
            status_code=response.status_code,
            response_time_ms=response.elapsed.total_seconds() * 1000
        )

        result.equal(response.status_code, 200)

Marking Spans as Failed

@testcase
def my_test(self, env, result):
    with tracing.span("validation") as span:
        data = fetch_data()

        if not validate(data):
            tracing.set_span_as_failed(
                span=span,
                description="Data validation failed"
            )
            result.fail("Invalid data")

See example here.

Distributed Tracing

When running tests in distributed environments (e.g., with pools), tracing context is automatically propagated across process boundaries.

Automatic Environment Variable Propagation

For remote pools such as RemotePool, OpenTelemetry environment variables (those prefixed with OTEL_) are automatically passed to remote workers, so no additional configuration is typically needed:

from testplan import test_plan
from testplan.runners.pools import RemotePool
from testplan.runners.pools.tasks import Task

@test_plan(name="DistributedTracing")
def main(plan):
    # OTEL_ environment variables are automatically propagated
    pool = RemotePool(
        name="MyPool",
        size=4
    )
    plan.add_resource(pool)

    # Tasks will inherit the root trace context
    for idx in range(10):
        task = Task(
            target='make_multitest',
            module='tasks',
            path='.'
        )
        plan.schedule(task, resource='MyPool')
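
To preview which variables fall under the OTEL_ prefix in your current environment, a short script such as the following can be used. It only illustrates the prefix rule described above; otel_env is a hypothetical helper and this is not the code Testplan runs internally.

```python
import os


def otel_env(environ) -> dict:
    """Select the variables whose names start with OTEL_."""
    return {key: value for key, value in environ.items() if key.startswith("OTEL_")}


# List the names that match the prefix in the current environment.
print(sorted(otel_env(os.environ)))
```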

Explicit Environment Variable Configuration

If you need to customize the OpenTelemetry configuration for remote workers (e.g., when the credentials path is not accessible on the remote host), you can explicitly pass the environment variables via the env parameter:

from testplan import test_plan
from testplan.runners.pools import RemotePool
from testplan.runners.pools.tasks import Task

@test_plan(name="DistributedTracing")
def main(plan):
    # Explicitly configure OTEL environment variables for remote workers
    env_dict = {
        'OTEL_RESOURCE_ATTRIBUTES': 'service.name=worker-service,environment=remote',
        'OTEL_EXPORTER_OTLP_TRACES_ENDPOINT': 'https://remote-otlp-endpoint:4318/v1/traces',
        'OTEL_EXPORTER_OTLP_HEADERS': 'api-key=remote-key',
        'OTEL_EXPORTER_OTLP_CERTIFICATE': '/path/to/remote-ca-cert.pem',
    }

    # Add a pool with custom OTEL environment variables
    pool = RemotePool(
        name="MyPool",
        size=4,
        env=env_dict  # Override default OTEL configuration
    )
    plan.add_resource(pool)

    # Tasks will use the custom configuration
    for idx in range(10):
        task = Task(
            target='make_multitest',
            module='tasks',
            path='.'
        )
        plan.schedule(task, resource='MyPool')