Logging & Observability

All GospeLib services emit structured JSON logs, expose health endpoints, are instrumented with OpenTelemetry, and feed continuous profiles to Pyroscope. This guide covers the implementation patterns for each language.

Observability Stack

| Signal | Go | Python | Frontend (Next.js) | Backend |
|---|---|---|---|---|
| Logs | zerolog → stdout | structlog → stdout | Faro SDK → Alloy | Loki |
| Metrics | promhttp /metrics | prometheus-fastapi /metrics | Faro SDK (Web Vitals) | Prometheus |
| Traces | otelchi middleware | FastAPIInstrumentor | Faro SDK → Alloy | Tempo |
| Profiles | pprof /debug/pprof/ | pyroscope-io SDK (push) | | Pyroscope |
| Errors | sentry-go SDK | sentry-sdk | Faro SDK + Sentry | Sentry |

See Observability for the infrastructure setup and Grafana access.

Log Format

All services emit JSON-structured logs with these standard fields:

{
  "level": "info",
  "service": "gateway",
  "trace_id": "abc123",
  "method": "GET",
  "path": "/api/v1/passages/gen.1.1",
  "status": 200,
  "latency_ms": 12,
  "timestamp": "2026-03-07T12:00:00Z"
}

Alloy tails Docker container logs, parses the level and service fields, and stamps an env label before pushing to Loki.
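
The field contract above can be checked mechanically, for example in a test that captures a service's request logs. The following is a minimal sketch, assuming the check is applied to request (access) log lines only; `STANDARD_FIELDS` and `missing_standard_fields` are hypothetical names, not part of any GospeLib package.

```python
import json

# Field names copied from the sample log entry above.
STANDARD_FIELDS = (
    "level", "service", "trace_id", "method",
    "path", "status", "latency_ms", "timestamp",
)


def missing_standard_fields(line: str) -> list[str]:
    """Return the standard fields absent from one JSON log line.

    Raises ValueError if the line is not a JSON object.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("log line must be a JSON object")
    # Preserve the documented field order in the result.
    return [name for name in STANDARD_FIELDS if name not in record]
```

An empty list means the line carries every standard field; anything else names what the service still needs to add.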

Go Logging (zerolog)

Setup

Configure zerolog in cmd/server/main.go:

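import (
	"os"
	"time"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

func main() {
	// Match the standard log format: RFC 3339 timestamps under "timestamp".
	zerolog.TimeFieldFormat = time.RFC3339
	zerolog.TimestampFieldName = "timestamp"

	// Replace the global logger so every entry carries the service name.
	log.Logger = zerolog.New(os.Stdout).
		With().
		Timestamp().
		Str("service", "gateway").
		Logger()
}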

Usage

log.Info().
	Str("passage_id", "gen.1.1").
	Int("status", 200).
	Dur("latency", elapsed).
	Msg("passage retrieved")

log.Error().
	Err(err).
	Str("user_id", userID).
	Msg("failed to fetch entitlements")

Middleware logging

Chi's built-in middleware.Logger writes plain-text access logs, not JSON. For structured request logging, use a custom middleware:

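import (
	"net/http"
	"time"

	"github.com/go-chi/chi/v5/middleware" // chi v5 assumed
	"github.com/rs/zerolog/log"
)

// RequestLogger writes one structured log entry per request.
func RequestLogger(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// Wrap the writer so the status code is readable after the handler runs.
		ww := middleware.NewWrapResponseWriter(w, r.ProtoMajor)
		next.ServeHTTP(ww, r)
		log.Info().
			Str("method", r.Method).
			Str("path", r.URL.Path).
			Int("status", ww.Status()).
			// Dur writes milliseconds by default, matching the latency_ms field.
			Dur("latency_ms", time.Since(start)).
			// Empty unless middleware.RequestID runs earlier in the chain.
			Str("request_id", middleware.GetReqID(r.Context())).
			Msg("request")
	})
}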

Python Logging (structlog)

Setup

Configure structlog in the app factory or a dedicated logger.py:

import logging

import structlog

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,               # adds "level"
        structlog.processors.TimeStamper(fmt="iso"),  # adds "timestamp" (ISO 8601)
        structlog.processors.JSONRenderer(),          # renders the event dict as JSON
    ],
    # Drop entries below INFO.
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)

log = structlog.get_logger()

Usage

log.info("passage retrieved", passage_id="gen.1.1", latency_ms=12)
log.error("graph query failed", error=str(exc), query=cypher_template)

FastAPI middleware logging

import time

from fastapi import Request

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    latency = (time.monotonic() - start) * 1000  # milliseconds
    log.info(
        "request",
        method=request.method,
        path=request.url.path,
        status=response.status_code,
        latency_ms=round(latency, 2),
        request_id=request.headers.get("x-request-id", ""),
    )
    return response

Health Endpoints

Every service must expose two health endpoints:

| Endpoint | Purpose | Returns 200 when… |
|---|---|---|
| GET /health | Liveness probe | The process is running |
| GET /ready | Readiness probe | All dependencies are reachable |

Go example

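// Liveness: succeeds whenever the process can serve HTTP.
r.Get("/health", func(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.Write([]byte(`{"status":"ok"}`))
})

// Readiness: succeeds only when the database answers a ping.
r.Get("/ready", func(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if err := db.Ping(r.Context()); err != nil {
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"status":"not ready","error":"db unreachable"}`))
		return
	}
	w.Write([]byte(`{"status":"ready"}`))
})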

Python example

from fastapi import Depends
from fastapi.responses import JSONResponse

@router.get("/health")
async def health():
    return {"status": "ok"}

@router.get("/ready")
async def ready(graph: GraphClient = Depends(get_graph_client)):
    try:
        # A trivial query proves the graph database is reachable.
        await graph.query("RETURN 1")
        return {"status": "ready"}
    except Exception as e:
        return JSONResponse(status_code=503, content={"status": "not ready", "error": str(e)})
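
The readiness rule ("all dependencies are reachable") becomes harder to hand-write once a service has more than one dependency. The sketch below shows one possible way to run several checks concurrently with a per-check timeout, using only the standard library. `run_readiness_checks`, the `Check` type, and the 2-second default timeout are assumptions made for illustration, not existing GospeLib code.

```python
import asyncio
from typing import Awaitable, Callable

# A check is any zero-argument coroutine function that raises on failure.
Check = Callable[[], Awaitable[None]]


async def run_readiness_checks(
    checks: dict[str, Check], timeout_s: float = 2.0
) -> tuple[int, dict]:
    """Run all checks concurrently and return (http_status, response_body)."""

    async def run_one(name: str, check: Check) -> tuple[str, str]:
        try:
            await asyncio.wait_for(check(), timeout=timeout_s)
            return name, "ok"
        except asyncio.TimeoutError:
            return name, "timeout"
        except Exception as exc:
            # Report only the exception type so internal details do not leak.
            return name, type(exc).__name__

    results = dict(
        await asyncio.gather(*(run_one(n, c) for n, c in checks.items()))
    )
    ready = all(state == "ok" for state in results.values())
    body = {"status": "ready" if ready else "not ready", "checks": results}
    return (200 if ready else 503), body
```

A /ready handler could await this function with one entry per dependency and pass the returned status code and body straight to JSONResponse.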

OpenTelemetry Tracing

Go

Use otelchi middleware:

import "github.com/riandyrn/otelchi"

r.Use(otelchi.Middleware("gateway"))

Python

Use opentelemetry-instrumentation-fastapi:

from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

FastAPIInstrumentor.instrument_app(app)

Services export spans over OTLP gRPC to Alloy (port 4317), which forwards them to Tempo. Traces are visualized in Grafana.

Continuous Profiling (Pyroscope)

Go — pprof (pull-based)

All Go services expose standard pprof endpoints on the metrics port (the same http.ServeMux that serves /metrics):

import (
	"net/http"
	"net/http/pprof"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

mux := http.NewServeMux()
mux.Handle("/metrics", promhttp.Handler())
// pprof.Index also serves the named profiles (heap, goroutine, allocs, ...).
mux.HandleFunc("/debug/pprof/", pprof.Index)
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

Alloy scrapes these endpoints via pyroscope.scrape and pushes the profiles to Pyroscope. The scrape targets are defined in infra/alloy/config.alloy:

| Service | pprof address |
|---|---|
| gateway | gateway:8081 |
| auth | auth:8201 |
| billing | billing:8301 |
| notifications | notifications:8501 |

Python — Pyroscope SDK (push-based)

Python services use the pyroscope-io SDK, which pushes profiles directly to Pyroscope. It is configured in each service's create_app():

import pyroscope

if settings.GOSPELIB_PYROSCOPE_URL:
    pyroscope.configure(
        application_name="content",
        server_address=settings.GOSPELIB_PYROSCOPE_URL,
        tags={"env": settings.NODE_ENV},
    )

The GOSPELIB_PYROSCOPE_URL env var defaults to http://pyroscope:4040 in the Docker Compose dev stack.

Request ID Propagation

The gateway injects X-Request-ID on every request. Downstream services must:

  1. Read the header from the incoming request
  2. Include it in all log entries
  3. Forward it when calling other services

This creates a correlation trail across all services for a single user request.
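
The three steps above can be sketched with the standard library alone. This is an illustrative outline, not the services' actual implementation: `bind_request_id`, `RequestIdFilter`, and `outgoing_headers` are hypothetical names, and generating a fallback ID when the header is missing is an assumption (the gateway is expected to always set it). With structlog, `structlog.contextvars.bind_contextvars` plays a similar role to the logging filter shown here.

```python
import contextvars
import logging
import uuid
from typing import Mapping, Optional

# Holds the ID of the request being handled in the current task or thread.
_request_id = contextvars.ContextVar("request_id", default="")


def bind_request_id(headers: Mapping[str, str]) -> str:
    """Step 1: read X-Request-ID from the incoming headers and remember it."""
    lowered = {key.lower(): value for key, value in headers.items()}
    # Assumption: fall back to a fresh ID if the gateway header is absent.
    request_id = lowered.get("x-request-id") or uuid.uuid4().hex
    _request_id.set(request_id)
    return request_id


class RequestIdFilter(logging.Filter):
    """Step 2: attach the current request ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = _request_id.get()
        return True


def outgoing_headers(extra: Optional[Mapping[str, str]] = None) -> dict:
    """Step 3: build headers for a call to another service."""
    headers = dict(extra or {})
    request_id = _request_id.get()
    if request_id:
        headers["X-Request-ID"] = request_id
    return headers
```

Because the ID lives in a context variable, concurrent requests handled by the same process do not see each other's IDs.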

Frontend Observability (Faro)

The Next.js web app is instrumented with Grafana Faro for browser-side telemetry. Faro captures JavaScript errors, console output, Web Vitals, navigation traces, and session replay recordings.

Initialization

Faro initializes lazily via apps/web/lib/faro.ts, loaded by the FaroInit component in the Providers tree. It only activates when NEXT_PUBLIC_FARO_COLLECTOR_URL is set.

import { initFaro, getFaro } from '../lib/faro';

// Faro initializes automatically on mount via FaroInit.
// To push a custom event or set user context:
const faro = getFaro();
faro?.api.pushEvent('passage_opened', { passage_id: 'gen.1.1' });
faro?.api.setUser({ id: userId, email: userEmail });

What gets captured automatically

  • Errors: uncaught exceptions and promise rejections with full stack traces
  • Console: console.error and console.warn output (debug/trace filtered out)
  • Web Vitals: LCP, FID, CLS, TTFB, INP
  • Traces: page navigations, fetch requests, user interactions (connected to backend traces via W3C traceparent)
  • Session replay: DOM recording for post-hoc playback of user sessions

Browser → backend trace correlation

Faro's TracingInstrumentation automatically propagates traceparent headers on fetch requests to the API. This means a frontend navigation span and the backend gateway/content/AI spans appear in the same Tempo trace — you can follow a user action from click to database query.
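
For debugging correlation issues it can help to see what the traceparent header actually carries. The OpenTelemetry instrumentation parses it for you; the sketch below only illustrates the W3C Trace Context layout (version 00) so the trace ID can be compared by hand with the trace_id in a log line. `parse_traceparent` and `TraceParent` are hypothetical names, and the parser deliberately rejects any version other than 00.

```python
import re
from typing import NamedTuple, Optional

# Layout: version-traceid-parentid-flags, all lowercase hex (version 00 only).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str   # shared by every span in the trace
    parent_id: str  # ID of the calling span
    sampled: bool   # lowest bit of the flags byte


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed or invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

The trace_id value returned here is the one to search for in Tempo when following a browser request into the backend.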

Sentry Integration

Configure Sentry in each service via the SENTRY_DSN environment variable. Sentry catches unhandled exceptions and provides stack traces, breadcrumbs, and release tracking.

SENTRY_DSN=https://xxx@sentry.io/xxx