I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other extension (such as .csv). Any ideas on the best way to accomplish this?
As an alternative for those who would prefer not to install R just for this task (rpy2 requires it), there is a package, "pyreadr", which reads RData and Rds files directly into Python with no external dependencies such as R.
It is a wrapper around the C library librdata, so it is very fast.
You can install it easily with pip:
pip install pyreadr
For example:
import pyreadr
result = pyreadr.read_r('/path/to/file.RData')  # also works for Rds files
# result is a dictionary: the keys are the R object names and the values are
# pandas data frames
print(result.keys())  # check which objects were loaded
df1 = result["df1"]  # extract the pandas data frame for the object named df1
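Since the question is about a bunch of files, the same call can be wrapped in a loop. The following is only a minimal sketch: the helper name load_rdata_dir and its reader parameter are my own illustration, not part of pyreadr, and it assumes every file ends in exactly ".RData" (the glob is case-sensitive on most systems).

```python
from pathlib import Path


def load_rdata_dir(directory, reader=None):
    """Read every .RData file in `directory`.

    Returns {file stem: {R object name: pandas data frame}}.
    `reader` is a hypothetical hook for swapping in another loader;
    by default pyreadr.read_r is used.
    """
    if reader is None:
        import pyreadr  # imported lazily so a stub reader can be used instead
        reader = pyreadr.read_r
    frames = {}
    for path in sorted(Path(directory).glob("*.RData")):
        # read_r returns a dict-like mapping of object names to data frames
        frames[path.stem] = dict(reader(str(path)))
    return frames
```

You would then call all_frames = load_rdata_dir('/path/to/folder') and pick out a single object with all_frames["file"]["df1"], where "file" and "df1" stand in for your own file and object names.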
The repo is here: https://github.com/ofajardo/pyreadr
Disclaimer: I am the developer of this package.