Posts
Introducing ploomber
Today I am announcing the release of ploomber, a library to accelerate Data Science and Machine Learning pipeline experimentation. This post describes the motivation, core features, short and medium-term objectives.
Motivation When I started working on Data Science projects back in 2015, I realized that there were no standard practices for developing pipelines; which caused teams to develop fragile, hardly reproducible software. During one of my first projects, our pipeline consisted of a bunch of shell, SQL and Python scripts loosely glued together; each member would edit a “master” shell script so we could “reproduce” our end result, but since our pipeline would take several hours to run, no one would test that such script actually worked, hence, there was no guarantee that given a clean environment, the pipeline would execute without errors, let alone give the same final output.