Python Codes
Introduction
After generating the data in R and exporting the necessary information, the following code can be used to load the data in Python.
Generate data in R
If you only want to transform your data in R and export it to Python, the pins or reticulate packages work just fine. Here is how to do it with rpwf, which wraps around pins.
library(rpwf)
library(pins)
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
tmp_dir <- tempdir() # Temp folder
- Create a pins::board_<your board type> and pass it to rpwf_connect_db().
- Create a database called "db.SQLite" with rpwf_connect_db().
board <- board_temp()
db_con <- rpwf_connect_db(paste(tmp_dir, "db.SQLite", sep = "/"), board)
- Create two recipes and pass them to rpwf_data_set().
r <- recipe(mpg ~ ., data = mtcars) |>
step_normalize(all_numeric_predictors()) |>
rpwf_tag_recipe("r")
r1 <- r |>
step_YeoJohnson(all_numeric_predictors()) |>
rpwf_tag_recipe("r1")
d <- rpwf_data_set(r, r1, db_con = db_con)
#> No pandas idx added. Use update_roles() with 'pd.index' for one
#> No pandas idx added. Use update_roles() with 'pd.index' for one
- Write the transformed data, export the metadata to the database, and write the board YAML file.
rpwf_write_df(d)
#> Creating new version '20221219T051130Z-06963'
#> Writing to pin 'df.b1b6afb83db8b5cd2753f5b454bf7774.parquet'
#> Creating new version '20221219T051130Z-4a858'
#> Writing to pin 'df.a8472f0060dc2de7b5f2701fa91f07e0.parquet'
rpwf_export_db(d, db_con)
#> Exporting workflows to db...
#> [1] 2
rpwf_write_board_yaml(board, paste(tmp_dir, "board.yml", sep = "/"))
Get the data in Python
- Import the modules
- Create a board from the written yml file and a database object
from rpwf import database, rpwf
from pathlib import Path
db_path = # <replace with tmp_dir>
board_yml = # <replace with paste(tmp_dir, "board.yml", sep = "/")>
db_obj = database.Base(db_path)
board_obj = database.Board(board_yml)
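The two placeholders above are ordinary file-system paths. As a minimal sketch, assuming the value of tmp_dir has been copied over from the R session, pathlib can build the paths of the two files written in the R section in the same way paste(tmp_dir, ..., sep = "/") does. The directory used here is a hypothetical stand-in so that the snippet runs on its own, and the name db_file is introduced only for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical stand-in for the R tempdir(); replace it with the value of
# tmp_dir printed in the R session.
tmp_dir = Path(tempfile.gettempdir())

# Equivalents of paste(tmp_dir, "db.SQLite", sep = "/") and
# paste(tmp_dir, "board.yml", sep = "/") from the R code.
db_file = tmp_dir / "db.SQLite"
board_yml = tmp_dir / "board.yml"

# Report whether the files exported from R are actually present.
for p in (db_file, board_yml):
    print(f"{p.name}: {'found' if p.exists() else 'missing'}")
```

Checking that both files exist before constructing the database and board objects gives a clearer error than a failure deeper inside the library.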
- List all the exported workflows as follows
db_obj.all_wflow()
# wflow_id  model_tag  recipe_tag  result_pin_name  model_pin_name
# 1         None       r           None             None
# 2         None       r1          None             None
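As a cross-check that does not depend on rpwf, the exported file can also be opened with Python's built-in sqlite3 module, assuming it is an ordinary SQLite database as its name suggests. The sketch below only lists table names, because the schema that rpwf creates is not described here. The helper list_tables and the stand-in database with its demo_tbl table are hypothetical and exist only so that the snippet runs on its own.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_file):
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    con = sqlite3.connect(db_file)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


# Stand-in database so the snippet is self-contained; the table name
# "demo_tbl" is hypothetical and is NOT the schema that rpwf creates.
demo_db = Path(tempfile.mkdtemp()) / "demo.SQLite"
con = sqlite3.connect(demo_db)
con.execute("CREATE TABLE demo_tbl (wflow_id INTEGER PRIMARY KEY, recipe_tag TEXT)")
con.executemany("INSERT INTO demo_tbl (recipe_tag) VALUES (?)", [("r",), ("r1",)])
con.commit()
con.close()

tables = list_tables(demo_db)
print(tables)
```

To inspect the real export, call list_tables() with the path of the db.SQLite file written from R instead of the stand-in.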
- Pick a wflow_id and create an rpwf.Wflow object associated with that wflow_id
wflow_id = 2
wflow_obj = rpwf.Wflow(db_obj, board_obj, wflow_id)
- Finally, create an rpwf.TrainDf object and use the get_df_X and get_df_y methods to get the training pandas.DataFrame and the response pandas.Series
df_obj = rpwf.TrainDf(db_obj, board_obj, wflow_obj)
X, y = df_obj.get_df_X(), df_obj.get_df_y()