With the meteoric rise of LLM-powered apps, new tools promising to solve the observability problem for LLMs are emerging. The truth is, LLM apps are no different from regular applications, and you don't need specialized tools to monitor them.
Many of our customers deploy LLM applications, and we've helped them implement their observability stacks. In 95% of use cases, the solution that I'll describe in this post works perfectly. Let's dive right in!
Introducing structlog
The key component in our stack is structlog, a powerful Python logging library. Among many other things, it allows you to generate JSON logs and keep thread-local context during the lifetime of a web request. These two features will enable us to build a robust monitoring system for an LLM application. Let's see how!
First, install the library:
pip install structlog
structlog basics
structlog is a highly customizable logging library, so we'll limit ourselves to explaining the basics. To configure structlog, call the configure function:
import logging
import structlog
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.dev.set_exc_info,
        structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S", utc=False),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.NOTSET),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
    cache_logger_on_first_use=False,
)
These values are mostly the default configuration, with one slight change: I've modified the last element in processors so each log entry is rendered as JSON.
This configuration is all we need to get started; we can print some logs like this:
logger = structlog.get_logger()
logger.info("This is some message", key="value", number=42)
logger.info("This is some message", key="anoter", number=43)
You’ll see the following in your terminal:
{"key": "value", "number": 42, "event": "This is some message", "level": "info", "timestamp": "2024-05-29 16:04:13"}
{"key": "another", "number": 43, "event": "This is some message", "level": "info", "timestamp": "2024-05-29 16:04:13"}
As you can see, we’re rendering each log entry as a JSON object (one per line). This will allow us to easily query our logs to understand our app’s usage.
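Since every entry is valid JSON, even the standard library can query these logs. Here's a minimal sketch, assuming you've captured the two lines above into a file called logs.jsonl (a hypothetical file name):

import json

# parse one log entry per line and filter on any structured field
with open("logs.jsonl") as f:
    entries = [json.loads(line) for line in f]

print([e["number"] for e in entries if e["level"] == "info"])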
Tracing requests
So far, we've only shown how to render log entries in JSON format (which you can also do with the logging module in the standard library), but we haven't leveraged any structlog features.
One of the most valuable features of structlog is that it allows us to keep thread-local context, which helps trace the lifetime of a web request. For example, let's say we have an LLM-powered Flask API that looks like this:
from flask import Flask, request

app = Flask(__name__)


@app.route("/")
def root():
    input_user = request.args.get("input_user")
    input_clean = clean_input(input_user)
    output_llm = run_llm(input_clean)
    output_final = clean_output(output_llm)
    return {"output": output_final}
The API reads user data, cleans it, passes it through the LLM, performs final processing, and returns the result to the user. To have complete visibility into the request, we might want to generate a unique ID so we can follow all the logs generated by a single user request. We might be tempted to pass our request_id to every step:
from flask import Flask, request

app = Flask(__name__)


@app.route("/")
def root():
    request_id = generate_request_id()
    input_user = request.args.get("input_user")
    input_clean = clean_input(input_user, request_id)
    output_llm = run_llm(input_clean, request_id)
    output_final = clean_output(output_llm, request_id)
    return {"output": output_final}
structlog offers a much simpler solution, since it allows us to bind thread-local context. Once set, the context persists:
import structlog
from flask import Flask, request

app = Flask(__name__)
logger = structlog.get_logger()


@app.route("/")
def root():
    request_id = generate_request_id()
    # bind request_id to the logging context
    structlog.contextvars.bind_contextvars(request_id=request_id)
    input_user = request.args.get("input_user")
    # no need to pass request_id since it's already in the logging context!
    input_clean = clean_input(input_user)
    output_llm = run_llm(input_clean)
    output_final = clean_output(output_llm)
    return {"output": output_final}
Tracing Flask requests
Now that we've explained the important structlog concepts (configuration and context), we're ready to show a complete example. Our sample application prints fun facts based on a user query:
# app.py
import logging
from uuid import uuid4

import structlog
from flask import Flask, request
from openai import OpenAI

app = Flask(__name__)

# system prompt
prompt_sys = """
You're a professional historian; your job is to share interesting facts about a topic.
The user will provide a topic and you will generate an interesting fact about it.
"""

# structlog config
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.dev.set_exc_info,
        structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S", utc=False),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.NOTSET),
    context_class=dict,
    logger_factory=structlog.WriteLoggerFactory(open("app.log", "at")),
    cache_logger_on_first_use=False,
)

logger = structlog.get_logger()


def generate_fun_fact(model, prompt_sys, prompt_user):
    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": prompt_sys,
            },
            {
                "role": "user",
                "content": prompt_user,
            },
        ],
    )
    logger.info("openai-completion", sys_prompt=prompt_sys, user_prompt=prompt_user)
    fact = completion.choices[0].message.content
    return fact


@app.before_request
def before_req():
    # we moved the context binding logic to @app.before_request so it runs before
    # every request
    structlog.contextvars.bind_contextvars(request_id=str(uuid4())[:8])


@app.route("/")
def fun_fact():
    topic = request.args.get("topic")
    model = "gpt-4"
    fact = generate_fun_fact(model, prompt_sys, topic)
    logger.info("fun-fact-success", model=model, topic=topic, fun_fact=fact)
    return {"fun_fact": fact}


if __name__ == "__main__":
    app.run()
Note that we've made a slight change in the structlog.configure call: we changed the logger_factory argument so the logs are stored in a file instead of displayed in the terminal.
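One caveat worth knowing: web servers reuse threads, so context bound during one request can leak into the next. The structlog documentation recommends clearing the context at the start of each request; a small tweak to our before_req handler does it:

@app.before_request
def before_req():
    # drop any context left over from a previous request on this thread
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(request_id=str(uuid4())[:8])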
Logging errors
Our current application has limited observability: if errors happen, they won’t be recorded.
Let's update our app so we log all errors. Flask provides an errorhandler decorator that allows us to run code when an exception happens; we can use it to log all HTTP errors:
import json

from werkzeug.exceptions import HTTPException


@app.errorhandler(HTTPException)
def handle_exception(e):
    # start from the default error response and convert the body to JSON
    response = e.get_response()
    response.data = json.dumps(
        {
            "code": e.code,
            "name": e.name,
            "description": e.description,
        }
    )
    response.content_type = "application/json"
    logger.error(
        "http-exception",
        exc_code=e.code,
        exc_name=e.name,
        exc_description=e.description,
    )
    return response
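With this handler in place, a failing request produces a log entry roughly like the following (the exact values will differ in your logs):

{"request_id": "9f1c2ab3", "exc_code": 404, "exc_name": "Not Found", "exc_description": "The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.", "event": "http-exception", "level": "error", "timestamp": "2024-05-29 16:05:02"}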
In production LLM applications, it's common to raise an error when the user input doesn't meet specific criteria. Let's simulate such a scenario by modifying our generate_fun_fact function to raise an exception whenever a user asks for a fun fact about raccoons:
from werkzeug.exceptions import BadRequest


def generate_fun_fact(model, prompt_sys, prompt_user):
    # do not generate raccoon fun facts!
    if "raccoon" in prompt_user:
        raise BadRequest("this application cannot generate fun facts about raccoons")

    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": prompt_sys,
            },
            {
                "role": "user",
                "content": prompt_user,
            },
        ],
    )
    logger.info("openai-completion", sys_prompt=prompt_sys, user_prompt=prompt_user)
    fact = completion.choices[0].message.content
    return fact
Our app is ready; let's run it:
pip install flask openai structlog
python app.py
Generating logs
Let’s generate some sample logs so we can analyze them:
# request.py (note: don't name this file requests.py, or it will shadow the requests package)
import requests

topics = ["cats", "dogs", "pandas", "raccoon", "procyon lotor"]

for topic in topics:
    response = requests.get(f"http://127.0.0.1:5000?topic={topic}")
    response_parsed = response.json()
    print(response_parsed)

# generate some 404 logs
response = requests.get("http://127.0.0.1:5000/missing")
print(response.json())

response = requests.get("http://127.0.0.1:5000/unknown")
print(response.json())
Perform the requests:
pip install requests
python request.py
Analyzing requests with DuckDB
DuckDB is an embedded analytical database; think of it as the “SQLite for analytics.” It allows us to run SQL on files such as CSV and JSON.
Let’s use it along with JupySQL to run some queries on our log file. First, let’s install the required packages:
pip install duckdb duckdb-engine jupysql
Now, let’s initialize JupySQL and start a DuckDB connection (you can run this in a Jupyter notebook or an IPython console):
%load_ext sql
%sql duckdb://
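If you'd rather skip the Jupyter magics, the same queries run through the duckdb Python module directly; a quick sketch:

import duckdb

# read_json parses one JSON log entry per line; any of the queries below work here too
print(duckdb.sql("SELECT COUNT(DISTINCT request_id) FROM read_json('app.log')"))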
How many requests did we get?
%%sql
SELECT COUNT(DISTINCT(request_id))
FROM read_json("app.log")
+----------------------------+
| count(DISTINCT request_id) |
+----------------------------+
| 7 |
+----------------------------+
How many requests failed?
%%sql
SELECT COUNT(DISTINCT(request_id))
FROM read_json("app.log")
WHERE exc_code IS NOT NULL
+----------------------------+
| count(DISTINCT request_id) |
+----------------------------+
| 3 |
+----------------------------+
Did we generate any fun facts about raccoons?
%%sql
SELECT request_id, fun_fact[:80]
FROM read_json("app.log")
WHERE fun_fact LIKE '%raccoon%'
+------------+----------------------------------------------------------------------------------+
| request_id | fun_fact[:80] |
+------------+----------------------------------------------------------------------------------+
| e502cf22 | The Procyon lotor, otherwise known as the common raccoon, is a fascinating creat |
+------------+----------------------------------------------------------------------------------+
Interesting! It looks like a user bypassed our security mechanism; let's analyze the request:
%%sql
SELECT user_prompt
FROM read_json("app.log")
WHERE request_id = 'e502cf22'
AND event = 'openai-completion'
+---------------+
| user_prompt |
+---------------+
| procyon lotor |
+---------------+
Ok, we found the issue! The user bypassed our security by entering the raccoon's scientific name. We can patch it so this doesn't happen again!
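As a quick illustration (a sketch, not a robust guardrail), the patch could check a short list of known aliases case-insensitively rather than a single substring; denylists are easy to bypass, so a production fix would pair this with a classifier or LLM-based input check:

# hypothetical patched guard: check a few known aliases, case-insensitively
BLOCKED_TERMS = ("raccoon", "procyon lotor")


def is_blocked(prompt_user):
    prompt_lower = prompt_user.lower()
    return any(term in prompt_lower for term in BLOCKED_TERMS)


# inside generate_fun_fact, replace the substring check with:
# if is_blocked(prompt_user):
#     raise BadRequest("this application cannot generate fun facts about raccoons")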
Conclusions
structlog allows you to quickly improve the observability of your LLM applications. By storing your logs as JSON objects, you can analyze them to better understand how your app is used. In this example we used DuckDB to analyze the logs, but you can use other tools, such as ClickHouse or CloudWatch Insights (if you're on AWS).
There is no need to install fancy LLM monitoring frameworks; log as JSON objects and that’s it!