Internal R6 Object that Processes the Data Transformation
TrainDf.Rd
An R6 object that manages the export of the metadata and the parquet file of the transformed data defined by a data transformation recipe. It accepts a special role, "pd.index", assigned with recipes::update_role(), as the index of a pandas DataFrame. If no outcome is provided, the data.frame is considered a "test" data.frame.
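The role assignment described above can be sketched with the recipes package alone. This is a minimal illustration, not code from rpwf: the toy data and the column name `id` are assumptions made for the example.

```r
library(recipes)

# Toy data; `id` is a hypothetical row identifier column.
df <- data.frame(
  id = 1:5,
  x1 = c(0.1, 0.4, 0.35, 0.8, 0.6),
  x2 = c(1, 0, 1, 1, 0),
  y  = c(2.1, 3.3, 2.9, 4.8, 4.0)
)

# Train-style recipe: `y` is the outcome, `id` gets the "pd.index" role.
train_rec <- recipe(y ~ ., data = df)
train_rec <- update_role(train_rec, id, new_role = "pd.index")

# Test-style recipe: no outcome is declared at all.
test_rec <- recipe(~ ., data = df)
test_rec <- update_role(test_rec, id, new_role = "pd.index")

# summary() returns one row per variable with its role.
train_roles <- summary(train_rec)
test_roles <- summary(test_rec)

# Column carrying the pandas index role.
index_col <- train_roles$variable[train_roles$role == "pd.index"]

# A recipe without an "outcome" role is what gets treated as test data.
has_outcome <- any(test_roles$role == "outcome")
```

A recipe built without a left-hand side in its formula has no "outcome" role, which is the condition described above for treating the data as a "test" data.frame.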
Details
This object works as follows:
- Inherits from the BaseEx class.
- Accepts a recipe and generates a prepped object with recipes::prep().
- Uses the prepped object to get the names of the pd.index, target, and predictor columns.
- Calculates the hash of the prepped recipe (not the data frame) and looks that hash up in the database.
- If the hash is found in the database:
  - Assigns the pin_name associated with that hash to self$pin_name.
  - Checks whether a file exists at self$pin_name.
  - If the file does not exist, transforms the prepped object with recipes::juice() and assigns the result to self$df.
  - If the file exists, assigns NULL to the self$df attribute to stop self$export_parquet() from executing.
- If the hash is not found in the database:
  - Transforms the prepped object with recipes::juice() and assigns the result to self$df.
  - Generates a new pin_name to write the transformed data to.
  - Generates a SQL query to update the database with the new hash and new pin_name.
- Updates the database with the generated SQL query via self$export_db().
- Writes the parquet file with self$export_parquet().
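The hash lookup branches above can be sketched in base R. This is only an illustration of the decision logic under stated assumptions, not the rpwf implementation: `resolve_pin()` is a hypothetical helper, a data.frame stands in for the database table, and the pin naming scheme is invented for the example.

```r
# Hypothetical helper, not part of rpwf: decide what to do for a recipe hash.
# `db` is a data.frame with columns `hash` and `pin_name` standing in for the
# database table; `file_exists` is injectable so the logic is easy to test.
resolve_pin <- function(hash, db, file_exists = file.exists) {
  hit <- db$pin_name[db$hash == hash]
  if (length(hit) > 0) {
    pin_name <- hit[[1]]
    # Known hash: re-create the data only when the parquet file is missing.
    list(
      pin_name = pin_name,
      needs_transform = !file_exists(pin_name),
      new_db_row = FALSE
    )
  } else {
    # Unknown hash: a new pin name is needed (naming scheme is an assumption).
    list(
      pin_name = paste0("df_", hash, ".parquet"),
      needs_transform = TRUE,
      new_db_row = TRUE
    )
  }
}

db <- data.frame(
  hash = "abc123",
  pin_name = "df_abc123.parquet",
  stringsAsFactors = FALSE
)

# Known hash, file already on disk: nothing to transform or write.
known_cached <- resolve_pin("abc123", db, file_exists = function(p) TRUE)
# Known hash, file missing: transform again, but no new database row.
known_missing <- resolve_pin("abc123", db, file_exists = function(p) FALSE)
# Unknown hash: transform and register a new row.
unknown <- resolve_pin("zzz999", db, file_exists = function(p) FALSE)
```

In this sketch, `needs_transform = FALSE` plays the part of assigning NULL to self$df, so that the parquet export is skipped.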
Super class
rpwf::BaseEx -> TrainDf
Public fields
prepped
(recipes::prep()) Holds the prepped object.
term_info
(tibble::tibble()) The self$prepped object has the attribute term_info, which holds information about the transformed variables before the data is actually transformed.
idx_col
(character()) Having a pre-defined index in R makes working with pandas.DataFrame less error prone. Defined by the provided recipe.
target
(character()) Name of the target variable. If missing, a message is returned saying that a test df is assumed to be generated. Defined by the provided recipe.
predictors
(character()) List of names of the predictors. Stored as a JSON string to be parsed in Python into a Python list.
Methods
Method new()
Create a new instance of the TrainDf class. Accepts a special role, "pd.index", assigned with recipes::update_role(), as the index of a pandas DataFrame. If no outcome is provided, the data.frame is considered a "test" data.frame. See ?rpwf::BaseEx for details about the attributes and methods.
Arguments
recipe
(recipes::recipe()) Provided recipe that defines how the data is transformed.
db_con
(DbCon) A DbCon object, generated by rpwf_connect_db().
seed
(numeric()) Random seed to control the recipe at the prep level.
Method set_attrs()
Refresh the attributes using the current hash. This must be run because it updates the attributes with information from the database.
Method set_target_col()
Set the target column as defined by the recipe. The data is assumed to be test data if no target is found.
Method set_predictors()
Store the list of predictors defined by the recipe as a JSON string to be parsed in Python.
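What set_target_col() and set_predictors() describe can be approximated from the role table of a prepped recipe. The sketch below is an assumption about the general approach, not the package's code: it uses summary() on the prepped recipe and jsonlite::toJSON() for serialization, and the toy data and column names are invented for the example.

```r
library(recipes)
library(jsonlite)

df <- data.frame(
  id = 1:4,
  x1 = c(0.2, 0.5, 0.1, 0.9),
  x2 = c(3, 1, 4, 1),
  y  = c(1.0, 2.0, 1.5, 3.0)
)

rec <- recipe(y ~ ., data = df)
rec <- update_role(rec, id, new_role = "pd.index")
prepped <- prep(rec)

# Role table of the prepped recipe: one row per column after transformation.
roles <- summary(prepped)

# Target: the column with the "outcome" role, if any.
target <- roles$variable[roles$role == "outcome"]
if (length(target) == 0) {
  message("No outcome found; assuming a test data.frame.")
  target <- NA_character_
}

# Predictors: serialized as a JSON array so Python can parse it into a list.
predictors <- roles$variable[roles$role == "predictor"]
predictors_json <- as.character(toJSON(predictors))
```

On the Python side, a string like this could be turned back into a list with `json.loads()`.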