.. _Observability:

Observability
*************

Testplan provides built-in observability through OpenTelemetry tracing and
logging. You can follow test execution flow and timing, locate performance
bottlenecks, and export log output to Loki, correlated with your traces.

Overview
========

The observability feature integrates OpenTelemetry to create spans and collect
logs at several levels of test execution:

1. **Testplan level**: Top-level span for the entire test plan
2. **Test level**: Spans for individual test runnables (MultiTest, PyTest, GTest, etc.)
3. **Testsuite level**: Spans for test suites (MultiTest only)
4. **Testcase level**: Spans for individual test cases (MultiTest only)

For test types other than MultiTest (such as PyTest, GTest, JUnit), the entire
test execution is traced as a single span and is not broken down into
individual suites or cases.

Each span captures timing information, attributes, and status (pass/fail),
allowing you to visualize the complete test execution in your observability
platform. When logging is enabled, all logs are automatically correlated with
their corresponding spans via ``trace_id`` and ``span_id``.

Configuration
-------------

Environment Variables
+++++++++++++++++++++

To enable OpenTelemetry tracing, set the following environment variables. The
exporter type is auto-detected from the endpoint URL: endpoints starting with
``http://`` or ``https://`` use the HTTP exporter; any other endpoint uses the
gRPC exporter.

**Required Variables (all exporters):**

.. code-block:: bash

    # OTLP exporter endpoint (set exactly one of the following)
    # HTTP endpoint (uses HTTP exporter):
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://your-otlp-endpoint:4318/v1/traces
    # gRPC endpoint (uses gRPC exporter):
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=your-otlp-endpoint:4317

    # Resource attributes (comma-separated key=value pairs)
    export OTEL_RESOURCE_ATTRIBUTES="service.name=my-testplan,environment=staging,team=qa"

**Additional Required Variables (gRPC exporter only):**

.. code-block:: bash

    # TLS certificates for gRPC mTLS connection
    export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca-cert.pem
    export OTEL_EXPORTER_OTLP_CLIENT_KEY=/path/to/client-key.pem
    export OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE=/path/to/client-cert.pem

**Additional Variables for Logging:**

.. code-block:: bash

    # Loki endpoint for log export
    export OTEL_EXPORTER_LOKI_ENDPOINT=https://your-loki-endpoint:3100

**Optional Variables:**

.. code-block:: bash

    # Batch span processor delay in milliseconds (default: 200)
    export OTEL_BSP_SCHEDULE_DELAY=500

Tracing
=======

Enabling Tracing
----------------

Use the ``--otel-traces`` command line flag, followed by a trace level:

.. code-block:: bash

    # Enable tracing down to the testcase level
    python my_testplan.py --otel-traces TestCase

The trace level can be one of:

* ``Plan``: Trace at the Testplan level
* ``Test``: Trace at the Testplan and Test levels
* ``TestSuite``: Trace at the Testplan, Test, and Testsuite levels
* ``TestCase``: Trace at all levels including Testcase

You can also set the tracing level programmatically in your test plan
definition:

.. code-block:: python

    from testplan import test_plan
    from testplan.common.utils.observability import TraceLevel


    @test_plan(name="MyTestPlan", otel_traces=TraceLevel.TESTCASE)
    def main(plan):
        # Your test plan definition here
        pass

Trace Hierarchy
---------------

The tracing hierarchy follows the test structure:

.. code-block:: text

    Testplan (root span)
    ├── MultiTest
    │   ├── TestSuite1
    │   │   ├── testcase_1
    │   │   ├── testcase_2
    │   │   └── testcase_3
    │   └── TestSuite2
    │       ├── testcase_4
    │       └── testcase_5
    ├── PyTest
    └── GTest

Each span includes:

* **Name**: The name of the test/suite/case
* **Attributes**: Metadata like test level, status, etc.
* **Status**: Pass/error based on test results
* **Timing**: Start and end timestamps

Trace Context Propagation
++++++++++++++++++++++++++

To integrate Testplan traces into an existing distributed trace, use the
``--otel-traceparent`` flag to specify the parent trace context in W3C Trace
Context format:

.. code-block:: bash

    # Link Testplan execution to an existing trace
    python my_testplan.py --otel-traces TestCase \
        --otel-traceparent "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"

The traceparent format is: ``version-trace_id-parent_span_id-trace_flags``

If you only need to set a specific trace ID and do not need to link to an
actual parent span, you can use a dummy span ID of all zeros:

.. code-block:: bash

    # Set trace ID without parent span linkage
    python my_testplan.py --otel-traces TestCase \
        --otel-traceparent "00-0af7651916cd43dd8448eb211c80319c-0000000000000000-01"

This allows your Testplan execution to appear as a child span in your broader
system trace, enabling end-to-end observability across multiple test
executions.

A common use case is to start a parent trace and have multiple Testplan runs
execute in parallel as child spans under that trace. You can generate a
traceparent with a script like the following:

.. code-block:: python

    # start_trace.py
    import sys

    from testplan.common.utils.observability import tracing


    def main():
        tracing._setup()
        with tracing.span("Trace Start"):
            tracing._inject_root_context()
            # Print the traceparent last so callers can capture it
            print(tracing._get_traceparent())


    if __name__ == "__main__":
        sys.exit(main())

Then pass the output to multiple parallel Testplan runs:

.. code-block:: bash

    # Start the parent trace and capture the traceparent
    TRACEPARENT=$(python start_trace.py | tail -n 1)

    # Launch multiple testplan runs in parallel under the same parent trace
    python testplan1.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT" &
    python testplan2.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT" &
    python testplan3.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT" &
    wait

Deterministic Trace IDs
+++++++++++++++++++++++

For better traceability in CI/CD pipelines, you can generate deterministic
trace IDs based on build and testplan identifiers. This allows you to correlate
traces with specific builds and test executions:

.. code-block:: python

    # generate_traceparent.py
    import hashlib
    import sys


    def generate_deterministic_traceparent(build_id, testplan_name):
        """Generate a deterministic traceparent from build and testplan identifiers."""
        # A W3C trace ID must be 32 lowercase hex characters, so hash the
        # identifiers rather than embedding them verbatim. The separator keeps
        # ("ab", "c") and ("a", "bc") from producing the same trace ID.
        key = f"{build_id}:{testplan_name}".encode("utf-8")
        trace_id = hashlib.sha256(key).hexdigest()[:32]

        # Use dummy span ID since we only care about trace ID grouping
        parent_span_id = "0000000000000000"
        trace_flags = "01"

        return f"00-{trace_id}-{parent_span_id}-{trace_flags}"


    if __name__ == "__main__":
        build_id = sys.argv[1]  # e.g., "BUILD123"
        testplan_name = sys.argv[2]  # e.g., "smoke_tests"
        print(generate_deterministic_traceparent(build_id, testplan_name))

Usage:

.. code-block:: bash

    # Generate traceparent for a specific build and testplan
    TRACEPARENT=$(python generate_traceparent.py BUILD123 smoke_tests)
    python testplan.py --otel-traces TestCase --otel-traceparent "$TRACEPARENT"

Manual Span Creation
--------------------

For custom instrumentation within your tests, you can manually create spans:

Context Manager Style
+++++++++++++++++++++

.. code-block:: python

    from testplan.common.utils.observability import tracing
    from testplan.testing.multitest import testcase

    @testcase
    def my_test(self, env, result):
        with tracing.span("database_query", db="postgres", query="SELECT"):
            # Your code here
            result.true(perform_database_query())

Start/End Style
+++++++++++++++

.. code-block:: python

    from testplan.common.utils.observability import tracing
    from testplan.testing.multitest import testcase

    @testcase
    def my_test(self, env, result):
        # Start a span
        span = tracing.start_span(
            "complex_operation",
            operation="data_processing",
            record_count=1000
        )

        try:
            result.true(process_data())
        finally:
            # End the span, even if the code above raises
            tracing.end_span(span)

Setting Span Attributes
++++++++++++++++++++++++

.. code-block:: python

    from testplan.common.utils.observability import tracing
    from testplan.testing.multitest import testcase

    @testcase
    def my_test(self, env, result):
        with tracing.span("api_call") as span:
            response = call_api()

            # Add custom attributes to the span
            tracing.set_span_attrs(
                span=span,
                status_code=response.status_code,
                response_time_ms=response.elapsed.total_seconds() * 1000
            )

            result.equal(response.status_code, 200)

Marking Spans as Failed
++++++++++++++++++++++++

.. code-block:: python

    from testplan.common.utils.observability import tracing
    from testplan.testing.multitest import testcase

    @testcase
    def my_test(self, env, result):
        with tracing.span("validation") as span:
            data = fetch_data()

            if not validate(data):
                tracing.set_span_as_failed(
                    span=span,
                    description="Data validation failed"
                )
                result.fail("Invalid data")

See :ref:`example here `.

Distributed Tracing
-------------------

When running tests in distributed environments (e.g., with :ref:`pools `),
tracing context is automatically propagated across process boundaries.

**Automatic Environment Variable Propagation**

For remote pools such as RemotePool, OpenTelemetry environment variables (those
prefixed with ``OTEL_``) are automatically passed to remote workers, so no
additional configuration is typically needed:

.. code-block:: python

    from testplan import test_plan
    from testplan.runners.pools import RemotePool
    from testplan.runners.pools.tasks import Task


    @test_plan(name="DistributedTracing")
    def main(plan):
        # OTEL_ environment variables are automatically propagated
        pool = RemotePool(
            name="MyPool",
            size=4
        )
        plan.add_resource(pool)

        # Tasks will inherit the root trace context
        for _ in range(10):
            task = Task(
                target='make_multitest',
                module='tasks',
                path='.'
            )
            plan.schedule(task, resource='MyPool')

**Explicit Environment Variable Configuration**

If you need to customize the OpenTelemetry configuration for remote workers
(e.g., the credentials path is not accessible on the remote host), you can
explicitly pass the environment variables via the ``env`` parameter:

.. code-block:: python

    from testplan import test_plan
    from testplan.runners.pools import RemotePool
    from testplan.runners.pools.tasks import Task


    @test_plan(name="DistributedTracing")
    def main(plan):
        # Explicitly configure OTEL environment variables for remote workers
        env_dict = {
            'OTEL_RESOURCE_ATTRIBUTES': 'service.name=worker-service,environment=remote',
            'OTEL_EXPORTER_OTLP_TRACES_ENDPOINT': 'https://remote-otlp-endpoint:4317',
            'OTEL_EXPORTER_OTLP_HEADERS': 'api-key=remote-key',
            'OTEL_EXPORTER_OTLP_CERTIFICATE': '/path/to/remote-ca-cert.pem',
        }

        # Add a pool with custom OTEL environment variables
        pool = RemotePool(
            name="MyPool",
            size=4,
            env=env_dict  # Override default OTEL configuration
        )
        plan.add_resource(pool)

        # Tasks will use the custom configuration
        for _ in range(10):
            task = Task(
                target='make_multitest',
                module='tasks',
                path='.'
            )
            plan.schedule(task, resource='MyPool')
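
When only one or two values differ on the remote host, it may be more
convenient to start from the local ``OTEL_`` variables and override just those
entries than to spell out the whole dictionary. The following is a minimal
sketch of that idea; ``otel_env_with_overrides`` is a hypothetical helper
written for illustration and is not part of Testplan:

.. code-block:: python

    import os
    from typing import Mapping, Optional


    def otel_env_with_overrides(
        overrides: Mapping[str, str],
        environ: Optional[Mapping[str, str]] = None,
    ) -> dict:
        """Copy OTEL_-prefixed variables, then apply overrides (hypothetical helper)."""
        source = os.environ if environ is None else environ
        # Keep only the OpenTelemetry settings from the source environment
        env = {key: value for key, value in source.items() if key.startswith("OTEL_")}
        # Values that differ on the remote host replace the local ones
        env.update(overrides)
        return env


    # Keep the local settings, but point the CA certificate at a remote path
    env_dict = otel_env_with_overrides(
        {"OTEL_EXPORTER_OTLP_CERTIFICATE": "/path/to/remote-ca-cert.pem"}
    )

The resulting dictionary can then be passed as ``env=env_dict`` when creating
the pool, in the same way as the hand-written dictionary in the preceding
example.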
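
When several runs are launched with the same ``--otel-traceparent`` value, as
described under Trace Context Propagation, a quick format check before starting
them can catch a malformed value early. The sketch below checks the
``version-trace_id-parent_span_id-trace_flags`` layout for version ``00`` of
the W3C format. ``parse_traceparent`` is a hypothetical helper, not a Testplan
API, and it assumes that the all-zero dummy span ID shown earlier should be
accepted:

.. code-block:: python

    import re

    # Version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex trace flags
    _TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


    def parse_traceparent(value):
        """Return (trace_id, parent_span_id, trace_flags) or raise ValueError."""
        match = _TRACEPARENT_RE.fullmatch(value)
        if match is None:
            raise ValueError(f"malformed traceparent: {value!r}")
        trace_id, parent_span_id, trace_flags = match.groups()
        # An all-zero trace ID is invalid in W3C Trace Context
        if set(trace_id) == {"0"}:
            raise ValueError("trace ID must not be all zeros")
        # Assumption: an all-zero parent span ID is allowed here as the dummy value
        return trace_id, parent_span_id, trace_flags

Because the trace ID and span ID fields accept only lowercase hex digits, a
value built from raw identifiers such as a build name is rejected by this
check.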