Build modular and production-ready pipelines with Ploomber and RStudio

Today we’re happy to announce Ploomber’s integration with RStudio. You can now use Ploomber to develop modular and production-ready R pipelines interactively.

Adding support for R Markdown

RStudio’s most popular file format is R Markdown (.Rmd), which allows users to mix R code and Markdown in a single text file:

Some text

```{r}
# some code
df = read.csv(upstream$clean$data)
head(df)
```

R Markdown and RStudio are highly popular in the data science community, and we want to bring Ploomber to them. All the features available to Python users work for R: incremental builds, parallelization, even running in the cloud!

Furthermore, some people feel more comfortable with R and others with Python; with Ploomber, you can use both languages in the same pipeline. For example, let’s say you’re working with some data and need to apply a statistical method that has only been implemented in R; you can easily integrate a Python script with an R Markdown file:

# load data with Python
- source: tasks/load.py
  product:
    nb: out/load.html
    # output data
    data: out/raw.parquet

# process data with R (use out/raw.parquet as input)
- source: tasks/some-statistical-method.Rmd
  product:
    nb: out/report.html
    data: out/results.parquet

However, bear in mind that this makes the project setup more complex, so if possible, consider using a single language.
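One way to picture the two-task pipeline above is as a small dependency graph that fixes the execution order. The sketch below is plain Python (3.9+, for `graphlib`) and is not Ploomber code: the task names `load` and `some-statistical-method`, the `upstream` key, and the hard-coded dependency between the two tasks are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory version of the two-task spec shown above.
# The "upstream" lists are an assumption: they state that the R task
# consumes the output of the Python task.
tasks = {
    "load": {
        "source": "tasks/load.py",
        "product": {"nb": "out/load.html", "data": "out/raw.parquet"},
        "upstream": [],
    },
    "some-statistical-method": {
        "source": "tasks/some-statistical-method.Rmd",
        "product": {"nb": "out/report.html", "data": "out/results.parquet"},
        "upstream": ["load"],
    },
}


def execution_order(tasks):
    """Return task names so that every task comes after its upstream tasks."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["upstream"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(tasks)
print(order)  # ['load', 'some-statistical-method']
```

The Python task runs first because the R task lists it as a predecessor; the language of each task plays no role in the ordering.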

Note that the .Rmd file in the example above generates an output report, out/report.html. Ploomber converts .Rmd files to Jupyter notebooks (.ipynb) at runtime and then executes them, so every pipeline execution generates an output report. If you want Ploomber to convert that report to another format, change the product’s extension; in the example, we used .html. You can also use plain .R scripts if you prefer; they work in the same way. And if you use another editor that supports .R or .Rmd files, that works as well.
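To give a rough feel for what turning an .Rmd file into a notebook involves, here is a deliberately naive sketch in plain Python. The post does not describe how Ploomber implements the conversion, so treat everything here as an assumption: `rmd_to_notebook` is a hypothetical helper, it ignores chunk options and inline code, and it only produces a minimal nbformat 4 structure with an R kernel named `ir` (the default name registered by IRkernel).

```python
import json
import re

# Built programmatically so this example nests cleanly inside Markdown.
FENCE = "`" * 3

# Matches one R chunk: an opening fence with {r ...}, the body, a closing fence.
CHUNK = re.compile(
    r"^" + FENCE + r"\{r[^}]*\}\n(.*?)^" + FENCE + r"[ \t]*$\n?",
    re.DOTALL | re.MULTILINE,
)


def rmd_to_notebook(text):
    """Split R Markdown text into Markdown and code cells (nbformat 4 layout)."""
    cells = []
    position = 0
    for match in CHUNK.finditer(text):
        # Anything between two chunks is prose and becomes a Markdown cell.
        prose = text[position:match.start()].strip()
        if prose:
            cells.append({"cell_type": "markdown", "metadata": {}, "source": prose})
        cells.append({
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": match.group(1).rstrip("\n"),
        })
        position = match.end()
    tail = text[position:].strip()
    if tail:
        cells.append({"cell_type": "markdown", "metadata": {}, "source": tail})
    return {
        "cells": cells,
        "metadata": {"kernelspec": {"name": "ir", "language": "R", "display_name": "R"}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


sample = "\n".join([
    "Some text",
    "",
    FENCE + "{r}",
    "# some code",
    "head(df)",
    FENCE,
    "",
])
notebook = rmd_to_notebook(sample)
print(json.dumps(notebook, indent=2))
```

The prose becomes one Markdown cell and the chunk becomes one code cell, which is the shape a Jupyter kernel needs in order to execute the file and record its outputs in a report.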

Setting up

To execute R pipelines, Ploomber needs a working R installation with the IRkernel package installed and configured. Check out our documentation to learn more.

Try it out!

Try out an example pipeline by executing the following commands in a terminal:

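# install ploomber
pip install ploomber --upgrade

# get R example (downloaded into the example/ directory)
ploomber examples -n templates/spec-api-r -o example

# move into the example directory
cd example

# execute the pipeline
ploomber build

# inject input paths into each task
ploomber nb --inject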

Now open plot.Rmd and start running things interactively. When you’re done, go back to the terminal and rerun your pipeline with:

ploomber build
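The post does not spell out what `ploomber nb --inject` writes into each file. Judging from the earlier R chunk, which reads `upstream$clean$data`, a task sees its inputs as a nested mapping from upstream task name to product key to path. The plain-Python sketch below builds such a mapping; the helper `build_injected_params`, the task names, and the dictionary layout are all assumptions for illustration and not Ploomber’s API.

```python
def build_injected_params(tasks, name):
    """Return the `upstream` and `product` values one task would receive."""
    spec = tasks[name]
    # For every declared dependency, expose that task's products by key.
    upstream = {dep: dict(tasks[dep]["product"]) for dep in spec["upstream"]}
    return {"upstream": upstream, "product": dict(spec["product"])}


# Hypothetical spec mirroring the Python + R pipeline from earlier in the post.
tasks = {
    "load": {
        "product": {"nb": "out/load.html", "data": "out/raw.parquet"},
        "upstream": [],
    },
    "some-statistical-method": {
        "product": {"nb": "out/report.html", "data": "out/results.parquet"},
        "upstream": ["load"],
    },
}

params = build_injected_params(tasks, "some-statistical-method")
print(params["upstream"]["load"]["data"])  # out/raw.parquet
print(params["product"]["data"])           # out/results.parquet
```

With values like these available in the session, an R task could read its input as `upstream$load$data` without hard-coding any path.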

Running in the cloud

Do you need more computing power? We’ve got you covered: you can export your pipelines and execute them in Kubernetes, Airflow, AWS Batch, and SLURM without code changes. Check out our documentation for details.

Closing remarks

Many of the teams using Ploomber use both R and Python to develop their pipelines, and they asked us for recommendations to enhance collaboration. Let us help you ship data products faster. Join our community, and we’ll be happy to answer all your questions.

