# Get Your ML Data Back: Export Your MLFlow Data to SQLite

In this blog post, I’ll show you how you can export your MLFlow data to a SQLite database so you can easily filter, search and aggregate all your Machine Learning experiments with SQL.

Note: You can get this post in notebook format from here.

## The problem with MLFlow

MLFlow is one of the most widely used tools for experiment tracking, yet, its features for comparing experiments are severely limited. One basic operation I do with my Machine Learning experiments is to put plots side by side (e.g., confusion matrices or ROC curves); this simple operation isn’t possible in MLFlow (the issue has been open since April 2020).

Another critical limitation are its filtering and search capabilities. MLFlow defines its own filter DSL which is a limited version of a SQL WHERE clause. Even though MLFlow supports several SQL databases as backend, you have to use this limited DSL which only allows basic filtering but does not support aggregation or any other operation that you can trivially do with some lines of SQL. For me, this is a major drawback since often I want to perform more than a basic filter.

So let’s see how we can get our data out of MLFlow and into a SQLite database so we can easily explore our Machine Learning experiments! Under the hood, this will use the SQLiteTracker from the sklearn-evaluation package.

## Setting up

The code is in a Python package that you can install with pip:

pip install mlflow2sql


## Optional: Generate sample data

If you don’t have some sample MLFlow data, I wrote some utility functions to generate a few experiments and log them. However, you need to install a few extra dependencies:

pip install "mlflow2sql[demo]"


If you’re running this from a notebook, you can run the following cel to install the dependencies:

%pip install "mlflow2sql[demo]" --quiet


Console output (1/1):

Note: you may need to restart the kernel to use updated packages.


You can download some sample MLFlow data from here. Alternatively, you can generate it by running the following in a Python session (this will create the data in the ./mlflow-data directory):

import shutil
from pathlib import Path

from mlflow2sql import generate, export, ml

# clean up the directory, if any
path = Path("mlflow-data")
if path.exists():
shutil.rmtree(path)

# generate sample data - takes a few minutes
with generate.start_mlflow(backend_store_uri="mlflow-data",
default_artifact_root="mlflow-data"):
ml.run_default()


Console output (1/3):

[2022-12-21 14:28:49 -0600] [31007] [INFO] Starting gunicorn 20.1.0
[2022-12-21 14:28:49 -0600] [31007] [INFO] Listening at: http://127.0.0.1:5000 (31007)
[2022-12-21 14:28:49 -0600] [31007] [INFO] Using worker: sync
[2022-12-21 14:28:49 -0600] [31009] [INFO] Booting worker with pid: 31009
[2022-12-21 14:28:49 -0600] [31010] [INFO] Booting worker with pid: 31010
[2022-12-21 14:28:49 -0600] [31011] [INFO] Booting worker with pid: 31011
[2022-12-21 14:28:49 -0600] [31012] [INFO] Booting worker with pid: 31012


Console output (2/3):

Training GradientBoostingClassifier...
Executed 10/30 so far...
Executed 20/30 so far...
Executed 30/30 so far...
Training SVC...
Executed 10/12 so far...
Training RandomForestClassifier...
Executed 10/54 so far...
Executed 20/54 so far...
Executed 30/54 so far...
Executed 40/54 so far...
Executed 50/54 so far...


Console output (3/3):

[2022-12-21 14:32:29 -0600] [31007] [INFO] Handling signal: term
[2022-12-21 14:32:29 -0600] [31010] [INFO] Worker exiting (pid: 31010)
[2022-12-21 14:32:29 -0600] [31011] [INFO] Worker exiting (pid: 31011)
[2022-12-21 14:32:29 -0600] [31009] [INFO] Worker exiting (pid: 31009)
[2022-12-21 14:32:29 -0600] [31012] [INFO] Worker exiting (pid: 31012)
[2022-12-21 14:32:30 -0600] [31007] [INFO] Shutting down: Master


## Getting all MLFlow runs

Once you have some MLFlow data, start a Python session, define the BACKEND_STORE_URI and DEFAULT_ARTIFACT_ROOT variables in the next code snippet.

Note that the two variables correspond to the two arguments with the same name that you pass when starting the MLFlow server:

mlflow server --backend-store-uri BACKEND_STORE_URI --default-artifact-root DEFAULT_ARTIFACT_ROOT


Note: If you didn’t pass those arguments when running mlflow server, the default value is ./mlruns.

from mlflow2sql import export, ml

# these two variables are ./mlruns by default
BACKEND_STORE_URI = "mlflow-data"
DEFAULT_ARTIFACT_ROOT = "mlflow-data"

# note: this was tested with MLFlow versions 1.30.0 and 2.0.1
runs = export.find_runs(backend_store_uri=BACKEND_STORE_URI,
default_artifact_root=DEFAULT_ARTIFACT_ROOT)


We can get the number of experiments we extracted:

len(runs)


Console output (1/1):

96


## Importing into a SQLite database

Let’s now initialize our experiment tracker (documentation available here), and insert our MLFlow experiments:

from sklearn_evaluation import SQLiteTracker

# clean up the database, if any
db = Path("experiments.db")

if db.exists():

tracker = SQLiteTracker('experiments.db')
tracker.insert_many((run.to_dict() for run in runs))


Let’s check the number of extracted experiments:

len(tracker)


Console output (1/1):

96


Great! We got all of them in the database. Let’s start exploring them with SQL!

## Querying experiments with SQL

SQLiteTracker creates a table with all our experiments and it stores the logged data in a JSON object with three keys: metrics, params, and artifacts:

tracker.query("""
SELECT
uuid,
json_extract(parameters, '$.metrics') as metrics, json_extract(parameters, '$.params') as params,
json_extract(parameters, '$.artifacts') as artifacts FROM experiments LIMIT 3 """)  Console output (1/1): metricsparamsartifacts uuid 42dc4d08{"recall":[["1671654737960"],["0.8666260657734...{"max_depth":10,"model_name":"RandomForestClas...{"confusion_matrix_png":"<img src=\"data:image... 9cef8649{"recall":[["1671654632100"],["0.8331303288672...{"probability":true,"model_name":"SVC","C":2.0...{"confusion_matrix_png":"<img src=\"data:image... 053f4f97{"recall":[["1671654699203"],["0.8623629719853...{"max_depth":10,"model_name":"RandomForestClas...{"confusion_matrix_png":"<img src=\"data:image... We can extract the metrics easily. Note that MLFlow stores metrics in the following format: (timestamp 1) (value 1) (timestamp 2) (value 2)  Hence, when parsing it, we create two lists. One with the timestamps and another one with the values: [[ts1, ts2], [val1, val2]]  Let’s extract the metrics values and sort by F1 score, note that .query() returns a pandas.DataFrame by default: tracker.query(""" SELECT uuid, json_extract(parameters, '$.metrics.f1[1][0]') as f1,
json_extract(parameters, '$.metrics.precision[1][0]') as precision, json_extract(parameters, '$.metrics.recall[1][0]') as recall,
json_extract(parameters, '$.params.model_name') as model_name FROM experiments ORDER BY f1 DESC LIMIT 10 """)  Console output (1/1): f1precisionrecallmodel_name uuid 1fb0cd230.80400235432607410.77790432801822330.8319123020706456RandomForestClassifier 07043ccb0.80328831473869640.77551020408163260.833130328867235RandomForestClassifier b9a0c2640.8025700934579440.77104377104377110.8367844092570037RandomForestClassifier 9dfec05b0.80186752261453160.76974789915966390.8367844092570037RandomForestClassifier d14170de0.80139982502187230.76888640179071070.8367844092570037RandomForestClassifier 3da71d720.80058565153733540.77100958826847150.8325213154689404RandomForestClassifier cdccf8580.79943582510578280.7446137677351550.8629719853836785RandomForestClassifier 461ec80d0.79874750925135220.74986638161411010.8544457978075518RandomForestClassifier 9cef86490.79813302217036160.76595744680851060.833130328867235SVC db59ecf80.79788172992056490.7717700626067160.8258221680876979RandomForestClassifier We can also extract the artifacts and display them inline by passing as_frame=False and render_plots=True: results = tracker.query(""" SELECT uuid, json_extract(parameters, '$.metrics.f1[1][0]') as f1,
json_extract(parameters, '$.params.model_name') as model_name, json_extract(parameters, '$.metrics.precision[1][0]') as precision,
json_extract(parameters, '$.metrics.recall[1][0]') as recall, json_extract(parameters, '$.artifacts.confusion_matrix_png') as confusion_matrix,
json_extract(parameters, '$.artifacts.precision_recall_png') as precision_recall FROM experiments ORDER BY f1 DESC LIMIT 2 """, as_frame=False, render_plots=True) results  Console output (1/1): uuidf1model_nameprecisionrecallconfusion_matrixprecision_recall 1fb0cd230.8040023543260741RandomForestClassifier0.77790432801822330.8319123020706456 07043ccb0.8032883147386964RandomForestClassifier0.77551020408163260.833130328867235 ## Displaying image artifacts Looking at the plots is a bit difficult in the table view, but we can zoom in and extract them. Let’s get a tab view from our top two experiments. Here are the confusion matrices: results.get("confusion_matrix")  Console output (1/1): And here’s the precision-recall curve: results.get("precision_recall")  Console output (1/1): ## Filtering and sorting In our generated experiments, we trained some Support Vector Machines, Gradient Boosting and Random Forest. Let’s filter by Random Forest, sort by F1 score and extract their parameters. We can easily do this with SQL! tracker.query(""" SELECT uuid, json_extract(parameters, '$.metrics.f1[1][0]') as f1,
json_extract(parameters, '$.params.model_name') as model_name, json_extract(parameters, '$.metrics.precision[1][0]') as precision,
json_extract(parameters, '$.metrics.recall[1][0]') as recall, json_extract(parameters, '$.params.max_depth') as max_depth,
json_extract(parameters, '$.params.n_estimators') as n_estimators, json_extract(parameters, '$.params.criterion') as criterion
FROM experiments
WHERE model_name = 'RandomForestClassifier'
ORDER BY f1 DESC
LIMIT 3
""", as_frame=False, render_plots=True)


Console output (1/1):

uuidf1model_nameprecisionrecallmax_depthn_estimatorscriterion
1fb0cd230.8040023543260741RandomForestClassifier0.77790432801822330.8319123020706456None50gini
07043ccb0.8032883147386964RandomForestClassifier0.77551020408163260.833130328867235None50log_loss
b9a0c2640.802570093457944RandomForestClassifier0.77104377104377110.8367844092570037None100entropy

## Aggregating and plotting

Finally, let’s do something a bit more sophisticated. Let’s get all the Random Forests, group by number of trees (n_estimators), and take the mean of our metrics. This will allow us to see what’s the effect of increasing the number of trees in the model’s perfomance:

df = tracker.query(
"""
SELECT
json_extract(parameters, '$.params.n_estimators') as n_estimators, AVG(json_extract(parameters, '$.metrics.precision[1][0]')) as precision,
AVG(json_extract(parameters, '$.metrics.recall[1][0]')) as recall, AVG(json_extract(parameters, '$.metrics.f1[1][0]')) as f1
FROM experiments
WHERE json_extract(parameters, '$.params.model_name') = 'RandomForestClassifier' GROUP BY n_estimators """, as_frame=True, ).set_index("n_estimators") df  Console output (1/1): precisionrecallf1 n_estimators 50.7122300.7930710.749882 100.7316490.7921910.759146 150.7271180.8262280.772904 250.7340890.8413180.783521 500.7385590.8464610.788263 1000.7358300.8493030.788023 import matplotlib.pyplot as plt fig, ax = plt.subplots() for metric in ["f1", "recall", "precision"]: df[metric].plot(ax=ax, marker="o", style="--") ax.legend() ax.grid() ax.set_title("Effect of increasing n_estimators") ax.set_ylabel("Metric") _ = ax.set_xticks(df.index)  Console output (1/1): We can see that the increase in performance diminishes rapidly, after 50 estimators, there isn’t much performance gain. This analysis allows us to find these type of insights and focus on other parameters for improving performance. Finally, let’s group by model type and see how our models are doing on average: df = tracker.query( """ SELECT json_extract(parameters, '$.params.model_name') as model_name,
AVG(json_extract(parameters, '$.metrics.precision[1][0]')) as precision, AVG(json_extract(parameters, '$.metrics.recall[1][0]')) as recall,
AVG(json_extract(parameters, '\$.metrics.f1[1][0]')) as f1
FROM experiments
GROUP BY model_name
""",
as_frame=True,
).set_index("model_name")

ax = plt.gca()
df.plot(kind="barh", ax=ax)
ax.grid()


Console output (1/1):

We see that Random Forest and Gradient Boosting have comparable performance, on average (although take into account that we ran more Random Forest experiments), and SVC has lower performance.

## Closing remarks

In this blog post, we showed how to export MLFlow data to a SQLite database, which allows us to use SQL to explore, aggregate and analyze our Machine Learning experiments, a lot better than MLFlow’s limited querying capabilities!

There are a few limitations to this first implementation, it’ll only work if you’re using the local filesystem for storing your MLFlow experiments and artifacts. And only .txt, .json and .png artifacts are supported. If you have any suggestions, feel free to open an issue on GitHub, or send us a message on Slack!

Found an error? Click here to let us know.