JSON is all you need: Easily monitor LLM apps with structlog

With the meteoric rise of LLM-powered apps, new tools promising to solve the observability problem for LLMs are emerging. The truth is, LLM apps are no different from regular applications, and you don’t need specialized tools to monitor them.

Many of our customers deploy LLM applications, and we’ve helped many of them implement their observability stack. In 95% of use cases, the solution that I’ll describe in this post works perfectly. Let’s dive right in!

Introducing structlog

The key component in our stack is structlog, a powerful Python logging library. Among many other things, it allows you to generate JSON logs and keep thread-local context during the lifetime of a web request. These two features will enable us to build a robust monitoring system for an LLM application. Let’s see how!

First, install the library:

pip install structlog

structlog basics

structlog is a highly customizable logging library, so we’ll limit ourselves to explaining the basics. To configure structlog, call the configure function:

import logging
import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.dev.set_exc_info,
        structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S", utc=False),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.NOTSET),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
    cache_logger_on_first_use=False,
)

These values are mostly default configuration, with a slight change: I’ve modified the last element in processors, so the log entry is formatted as JSON.
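Each entry in processors is a callable that receives the logger, the method name, and the event dictionary, and returns the (possibly modified) dictionary for the next processor. As a rough sketch, a hypothetical truncate_long_values processor could keep entries compact when logging long prompts. The function name and the 200-character limit are assumptions for illustration, not part of structlog:

```python
MAX_LEN = 200  # assumed limit; tune to your needs


def truncate_long_values(logger, method_name, event_dict):
    """Shorten long string values so each log line stays compact."""
    # iterate over a copy of the items so we can safely reassign values
    for key, value in list(event_dict.items()):
        if isinstance(value, str) and len(value) > MAX_LEN:
            event_dict[key] = value[:MAX_LEN] + "..."
    return event_dict


# processors are plain callables, so they can be exercised without structlog
result = truncate_long_values(None, "info", {"event": "demo", "prompt": "x" * 500})
print(len(result["prompt"]))
```

To try it, you would place the function in the processors list before structlog.processors.JSONRenderer(), since the renderer must run last.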

This configuration is all we need to get started; we can print some logs like this:

logger = structlog.get_logger()

logger.info("This is some message", key="value", number=42)
logger.info("This is some message", key="another", number=43)

You’ll see the following in your terminal:

{"key": "value", "number": 42, "event": "This is some message", "level": "info", "timestamp": "2024-05-29 16:04:13"}
{"key": "another", "number": 43, "event": "This is some message", "level": "info", "timestamp": "2024-05-29 16:04:13"}

As you can see, we’re rendering each log entry as a JSON object (one per line). This will allow us to easily query our logs to understand our app’s usage.

Tracing requests

So far, we’ve only shown how to render log entries in JSON format (which you can also do with the logging module in the standard library), but we haven’t leveraged any structlog features.
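For comparison, here is roughly what JSON output looks like with only the standard library. This is a minimal sketch: the JsonFormatter class and the "fields" convention for passing extra key-value pairs are hypothetical choices, not a standard API:

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object (hypothetical helper)."""

    def format(self, record):
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "timestamp": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
        }
        # extra key-value pairs are passed via extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


stream = io.StringIO()  # write to memory so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

std_logger = logging.getLogger("json-demo")
std_logger.setLevel(logging.INFO)
std_logger.addHandler(handler)
std_logger.propagate = False  # keep the demo output out of the root logger

std_logger.info("This is some message", extra={"fields": {"key": "value", "number": 42}})
entry = json.loads(stream.getvalue())
print(entry["event"], entry["number"])
```

It works, but every call needs the extra={"fields": ...} ceremony, whereas structlog accepts the key-value pairs directly as keyword arguments.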

One of the most valuable features of structlog is that it allows us to keep thread-local context, which helps trace the lifetime of a web request. For example, let’s say we have an LLM-powered Flask API that looks like this:

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def root():
    input_user = request.args.get("input_user")
    input_clean = clean_input(input_user)
    output_llm = run_llm(input_clean)
    output_final = clean_output(output_llm)
    return {"output": output_final}

The API reads user data, cleans it, passes it through the LLM, performs final processing, and returns the result to the user. To have complete visibility of the request, we might want to generate a unique ID so we can follow all the logs generated by a user request. We might be tempted to pass our request_id to every step to do so:

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def root():
    request_id = generate_request_id()
    input_user = request.args.get("input_user")
    input_clean = clean_input(input_user, request_id)
    output_llm = run_llm(input_clean, request_id)
    output_final = clean_output(output_llm, request_id)
    return {"output": output_final}

structlog offers a much simpler solution since it allows us to bind thread-local context. Once set, the context will persist:

import structlog
from flask import Flask, request

app = Flask(__name__)
logger = structlog.get_logger()

@app.route("/")
def root():
    request_id = generate_request_id()
    # bind request_id to the logging context
    structlog.contextvars.bind_contextvars(request_id=request_id)

    input_user = request.args.get("input_user")

    # no need to pass request_id since it's already in the logging context!
    input_clean = clean_input(input_user)
    output_llm = run_llm(input_clean)
    output_final = clean_output(output_llm)
    return {"output": output_final}
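The snippets above assume a generate_request_id helper that is never defined. structlog.contextvars is built on Python’s standard contextvars module, so the following sketch uses that module directly (no structlog) to illustrate why the ID doesn’t need to be passed around. The bind and log functions, the body of clean_input, and generate_request_id itself are hypothetical stand-ins:

```python
import contextvars
import json
from uuid import uuid4

# holds the key-value pairs bound for the current request
_log_context = contextvars.ContextVar("log_context", default={})


def generate_request_id():
    """Hypothetical helper: a short random ID, as in the full example below."""
    return str(uuid4())[:8]


def bind(**kwargs):
    """Add key-value pairs to the current logging context."""
    # build a new dict instead of mutating the shared default
    _log_context.set({**_log_context.get(), **kwargs})


def log(event, **kwargs):
    """Merge the bound context into the entry and return it as JSON."""
    return json.dumps({**_log_context.get(), "event": event, **kwargs})


def clean_input(text):
    # no request_id parameter, yet the log entry still carries it
    return log("clean-input", length=len(text))


bind(request_id=generate_request_id())
entry = json.loads(clean_input("cats"))
print(entry)
```

The printed entry contains the request_id even though clean_input never received it, which is the behavior bind_contextvars and merge_contextvars provide in structlog.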

Tracing Flask requests

Now that we’ve explained important structlog concepts (configuration and context), we’re ready to show a complete example. Our sample application returns fun facts based on a user query:

# app.py
from uuid import uuid4
import logging

import structlog
from openai import OpenAI
from flask import Flask, request

app = Flask(__name__)


# system prompt
prompt_sys = """
You're a professional historian, your job is to share interesting facts about a topic.
The user will provide a topic and you will generate an interesting fact about it.
"""


# structlog config
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.dev.set_exc_info,
        structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S", utc=False),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.NOTSET),
    context_class=dict,
    logger_factory=structlog.WriteLoggerFactory(open("app.log", "at")),
    cache_logger_on_first_use=False,
)


logger = structlog.get_logger()


def generate_fun_fact(model, prompt_sys, prompt_user):
    client = OpenAI()

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": prompt_sys,
            },
            {
                "role": "user",
                "content": prompt_user,
            },
        ],
    )

    logger.info("openai-completion", sys_prompt=prompt_sys, user_prompt=prompt_user)

    thread = completion.choices[0].message.content
    return thread

@app.before_request
def before_req():
    # we moved the context binding logic to @app.before_request so it's done before
    # every request
    structlog.contextvars.bind_contextvars(request_id=str(uuid4())[:8])


@app.route("/")
def fun_fact():
    topic = request.args.get("topic")
    model = "gpt-4"
    fact = generate_fun_fact(model, prompt_sys, topic)
    # log under "fun_fact" so the field name matches the queries we run later
    logger.info("fun-fact-success", model=model, topic=topic, fun_fact=fact)
    return {"fun_fact": fact}


if __name__ == "__main__":
    app.run()

Note that we’ve made a slight change in the structlog.configure call: we changed the logger_factory argument, so the logs are stored in a file instead of displayed to the terminal.

Logging errors

Our current application has limited observability: if errors happen, they won’t be recorded. Let’s update our configuration so we log all errors. Flask provides an errorhandler that allows us to run code when an exception happens; we can use this to log all HTTP errors:

import json
from werkzeug.exceptions import HTTPException

@app.errorhandler(HTTPException)
def handle_exception(e):
    response = e.get_response()
    response.data = json.dumps(
        {
            "code": e.code,
            "name": e.name,
            "description": e.description,
        }
    )
    response.content_type = "application/json"
    logger.error(
        "http-exception",
        exc_code=e.code,
        exc_name=e.name,
        exc_description=e.description,
    )
    return response

Production LLM applications often raise an error when the user input doesn’t meet specific criteria. Let’s simulate such a scenario by modifying our generate_fun_fact function so it raises an exception when a user asks for a fun fact about raccoons:

from werkzeug.exceptions import BadRequest

def generate_fun_fact(model, prompt_sys, prompt_user):
    # do not generate raccoon fun facts!
    if "raccoon" in prompt_user:
        raise BadRequest("this application cannot generate fun facts about raccoons")

    client = OpenAI()

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": prompt_sys,
            },
            {
                "role": "user",
                "content": prompt_user,
            },
        ],
    )

    logger.info("openai-completion", sys_prompt=prompt_sys, user_prompt=prompt_user)

    thread = completion.choices[0].message.content
    return thread

Our app is ready. Let’s run it:

pip install flask openai structlog
python app.py

Generating logs

Let’s generate some sample logs so we can analyze them:

# request.py (don't name this file requests.py, or it will shadow the requests library)
import requests

topics = ["cats", "dogs", "pandas", "raccoon", "procyon lotor"]

for topic in topics:
    response = requests.get(f"http://127.0.0.1:5000?topic={topic}")
    response_parsed = response.json()
    print(response_parsed)


# generate some 404 logs
response = requests.get("http://127.0.0.1:5000/missing")
print(response.json())
response = requests.get("http://127.0.0.1:5000/unknown")
print(response.json())

Perform the requests:

pip install requests
python request.py

Analyzing requests with DuckDB

DuckDB is an embedded analytical database; think of it as the “SQLite for analytics.” It allows us to run SQL on files such as CSV and JSON.

Let’s use it along with JupySQL to run some queries on our log file. First, let’s install the required packages:

pip install duckdb duckdb-engine jupysql

Now, let’s initialize JupySQL and start a DuckDB connection (you can run this in a Jupyter notebook or an IPython console):

%load_ext sql
%sql duckdb://

How many requests did we get?

%%sql
SELECT COUNT(DISTINCT(request_id))
FROM read_json("app.log")

+----------------------------+
| count(DISTINCT request_id) |
+----------------------------+
|             7              |
+----------------------------+

How many requests failed?

%%sql
SELECT COUNT(DISTINCT(request_id))
FROM read_json("app.log")
WHERE exc_code IS NOT NULL

+----------------------------+
| count(DISTINCT request_id) |
+----------------------------+
|             3              |
+----------------------------+
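If you want to cross-check these counts without DuckDB, the same logic fits in a few lines of plain Python. This is only a sketch: the log lines below are hypothetical records that reuse the app’s field names, not the contents of the real app.log:

```python
import json

# hypothetical log lines with the same fields the app writes
sample_log = """\
{"request_id": "aaaa1111", "event": "openai-completion", "level": "info"}
{"request_id": "aaaa1111", "event": "fun-fact-success", "level": "info"}
{"request_id": "bbbb2222", "event": "http-exception", "exc_code": 400, "level": "error"}
{"request_id": "cccc3333", "event": "http-exception", "exc_code": 404, "level": "error"}
"""

entries = [json.loads(line) for line in sample_log.splitlines() if line.strip()]

# equivalent of COUNT(DISTINCT request_id)
total = len({e["request_id"] for e in entries})
# equivalent of adding WHERE exc_code IS NOT NULL
failed = len({e["request_id"] for e in entries if e.get("exc_code") is not None})
print(total, failed)
```

For anything beyond simple counts, SQL is more convenient, which is why the rest of this section sticks to DuckDB.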

Did we generate any fun facts about raccoons?

%%sql
SELECT request_id, fun_fact[:80]
FROM read_json("app.log")
WHERE fun_fact LIKE '%raccoon%'

+------------+----------------------------------------------------------------------------------+
| request_id |                                  fun_fact[:80]                                   |
+------------+----------------------------------------------------------------------------------+
|  e502cf22  | The Procyon lotor, otherwise known as the common raccoon, is a fascinating creat |
+------------+----------------------------------------------------------------------------------+

Interesting! It looks like a user got past our security mechanism. Let’s analyze the request:

%%sql
SELECT user_prompt
FROM read_json("app.log")
WHERE request_id = 'e502cf22'
AND event = 'openai-completion'

+---------------+
|  user_prompt  |
+---------------+
| procyon lotor |
+---------------+

OK, we found the issue! The user bypassed our security by entering the raccoon’s scientific name. We can patch it so this doesn’t happen again!
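One possible patch, as a sketch: normalize the input and check it against a small blocklist. The is_blocked_topic helper and the BLOCKED_TERMS tuple are hypothetical names, and the list only covers the two terms we’ve seen so far:

```python
# assumed blocklist: the common name plus the scientific name found in the logs
BLOCKED_TERMS = ("raccoon", "procyon lotor")


def is_blocked_topic(prompt_user):
    """Return True if the prompt mentions a blocked term, ignoring case and extra spaces."""
    normalized = " ".join(prompt_user.lower().split())
    return any(term in normalized for term in BLOCKED_TERMS)


print(is_blocked_topic("Procyon  Lotor"))
print(is_blocked_topic("cats"))
```

In generate_fun_fact, the condition "raccoon" in prompt_user would become is_blocked_topic(prompt_user). Keyword lists like this are easy to get around (a different language, a misspelling), so querying the logs remains the way to find out what slipped through.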

Conclusions

structlog allows you to improve the observability of your LLM applications quickly. Because the logs are stored as JSON objects, you can query them to better understand usage. In this example, we used DuckDB to analyze the logs, but you can use other tools, such as ClickHouse or CloudWatch Logs Insights (if you’re on AWS).

There is no need to install fancy LLM monitoring frameworks; log as JSON objects and that’s it!
