Combining R and Python using Reticulate
By Vamshidhar Gangu, Research Computing, NUS Information Technology
When starting a data science project, one of the important decisions is which programming language and libraries to use. The two languages that most often come to mind are R and Python. Though both are excellent tools in their own right, they are often perceived as rivals rather than as complementary options.
Using the best of both worlds
Most people in the data science community are committed to a single programming language but wish they had some of the capabilities of the other. For instance, Python is widely used for data processing, offers object-oriented capabilities and enjoys incredible community support, but it lacks the breadth of specialised statistical packages that R has. Similarly, R has packages for almost any statistical application one can think of and is equipped with excellent visualisation libraries like ggplot2, but it can be memory-hungry and slow when dealing with large datasets. With reticulate, we can combine R and Python and utilise the strengths of both.
Reticulate
Using reticulate, one can easily interoperate between R and Python. It works by embedding a Python session within an R session, thus providing a seamless interface between the two. Once the package is installed from CRAN, you can choose which Python interpreter to use. In addition, reticulate provides functions for selecting existing virtualenv, conda and miniconda environments. There is more than one way to call Python from within an R project, as described in the following sections.
Python in R Markdown
Using reticulate, one can use both Python and R chunks within the same notebook, with full access to each other’s objects. Built-in conversions are provided for many Python object types, including NumPy arrays and Pandas data frames. A typical example is k-means clustering, with the model fitted by sklearn in a Python chunk and the result plotted with ggplot2 in an R chunk.
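As a rough illustration of the Python half of such a notebook, the chunk below clusters a small synthetic dataset with scikit-learn. The data, the column names x and y, the variable name points and the choice of three clusters are all assumptions made for this sketch, not taken from the original example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three well-separated
# blobs of 50 points each in two dimensions.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
coords = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

# Fit k-means with three clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(coords)

# Collect the results in a pandas data frame, which reticulate converts
# to an R data.frame when it is accessed from R.
points = pd.DataFrame({"x": coords[:, 0], "y": coords[:, 1], "cluster": labels})
print(points.groupby("cluster").size())
```

In a following R chunk, the data frame should be reachable as py$points and could be passed to ggplot2, for example with ggplot(py$points, aes(x, y, colour = factor(cluster))) + geom_point().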
Importing Python Modules
One can import Python modules using the import() function. For example, importing the os module gives access to its listdir() function.
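For reference, the plain-Python counterpart of that call is sketched below. In R, the equivalent with reticulate would be os <- import("os") followed by os$listdir("."); the directory "." is only an example argument.

```python
import os

# List the entries of the current working directory. From R, reticulate
# exposes the same function as os$listdir(".") after os <- import("os").
entries = os.listdir(".")

# The result is a list of names (strings), which reticulate converts
# to an R character vector by default.
for name in sorted(entries):
    print(name)
```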
Source Python scripts
Another option for integration is sourcing Python scripts. One can source any Python script with the source_python() function, just as one would source an R script.
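A minimal sketch of a script that could be sourced this way follows. The file name helpers.py and both functions are hypothetical, chosen only for illustration.

```python
# helpers.py (hypothetical file name)


def add(x, y):
    """Return the sum of two numbers."""
    return x + y


def column_means(rows):
    """Return the mean of each column in a list of equal-length numeric rows."""
    if not rows:
        return []
    n_rows = len(rows)
    # zip(*rows) transposes the rows into columns.
    return [sum(column) / n_rows for column in zip(*rows)]
```

After source_python("helpers.py") in R, the functions defined at the top level of the script become callable from R, for example add(5, 10).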
Python REPL
One of the notable benefits of reticulate is that it allows the use of both the Python and R REPLs within a single console, which is particularly useful during the EDA (Exploratory Data Analysis) phase. To run Python interactively, you can call the repl_python() function, which provides a Python REPL within your R session. All objects created in the Python REPL can be accessed from R through the py object exported by reticulate. A typical workflow reads and filters the data with pandas in the Python REPL and then visualises it with ggplot2 in R.
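The Python side of such a session might look like the sketch below. The inline CSV data, the column names and the filter threshold are invented for illustration; in practice the data would more likely be read from a file with pd.read_csv().

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (made-up data for illustration).
csv_text = """name,height_cm,group
a,150,x
b,172,y
c,181,x
d,165,y
"""

# Read the input with pandas.
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows above a hypothetical threshold of 160 cm.
tall = df[df["height_cm"] > 160].reset_index(drop=True)
print(tall)
```

After leaving the REPL by typing exit, the filtered data frame should be available in R as py$tall and can be handed to ggplot2 like any other data frame.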
Resources
You can refer to the resources below for more information about the reticulate package:
● The RStudio documentation on reticulate is comprehensive and includes many examples
● https://www.mango-solutions.com/snakes-in-a-package-combining-python-and-r-with-reticulate/
● https://medium.com/data-newday/a-polyglots-dream-reticulated-python-r-2f8d0f542847