Internal R6 Object that Processes the Data Transformation
An R6 object that manages the export of the metadata and the parquet file of
the transformed data defined by a recipe. Accepts a special role "pd.index",
set with recipes::update_role(), as the index for a pandas DataFrame. If no
outcome is provided, the data.frame is considered a "test" data.frame.
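As a minimal sketch of how such a recipe might be declared, the example below assigns the "pd.index" role with recipes::update_role() and builds a second recipe with no outcome. The data and the column names (id, mpg, hp, wt) are assumptions chosen for illustration, not part of the package.

```r
library(recipes)

# Hypothetical data: `id` identifies rows, `mpg` is the outcome.
train <- data.frame(id = seq_len(nrow(mtcars)), mtcars[, c("mpg", "hp", "wt")])

# Training recipe: `mpg` is the outcome, `id` is moved to the "pd.index" role.
train_rec <- recipe(mpg ~ ., data = train)
train_rec <- update_role(train_rec, id, new_role = "pd.index")

# Recipe without an outcome: this is the case described as "test" data.
test <- train[, c("id", "hp", "wt")]
test_rec <- recipe(~ ., data = test)
test_rec <- update_role(test_rec, id, new_role = "pd.index")

# Inspect the roles assigned to each variable.
train_roles <- summary(train_rec)
test_roles <- summary(test_rec)
train_roles[, c("variable", "role")]
```

In the training recipe, id carries the "pd.index" role and mpg the "outcome" role; the second recipe has no "outcome" row at all.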
Details
This object works as follows:
- Inherits from the BaseEx class.
- Accepts a recipe and generates a prepped object with recipes::prep().
- Uses the prepped object to get the names of the pd.index, target, and predictors.
- Calculates the hash of the prepped recipe (not the data frame) and looks that hash up in the database.
- If the hash is found in the database:
  - Assigns the pin_name associated with that hash to self$pin_name.
  - Checks whether a file exists at self$pin_name.
  - If the file does not exist, transforms the prepped object with recipes::juice() and assigns the result to self$df.
  - If the file exists, assigns NULL to self$df to stop self$export_parquet() from executing.
- If the hash is not found in the database:
  - Transforms the prepped object with recipes::juice() and assigns the result to self$df.
  - Generates a new pin_name to write the transformed data.
  - Generates a SQL query to update the database with the new hash and new pin_name.
- Updates the database with the generated SQL query using self$export_db().
- Writes the parquet file with self$export_parquet().
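The branching in the steps above can be sketched in base R as follows. This is not the package's implementation: the function resolve_pin(), the data.frame standing in for the database, the table name df_tbl, the column names, and the pin-name format are all hypothetical. The hash is passed in as a string; how the package computes it is not stated here.

```r
# Hypothetical helper that mirrors the hash lookup described above.
# `db` is a plain data.frame standing in for the database table and
# `transform` is a callback standing in for recipes::juice(prepped).
resolve_pin <- function(hash, db, dir, transform) {
  hit <- db$pin_name[db$hash == hash]
  if (length(hit) > 0) {
    pin_name <- hit[[1]]
    # Hash known: transform only if the file is missing, otherwise NULL.
    df <- if (file.exists(file.path(dir, pin_name))) NULL else transform()
    query <- NULL
  } else {
    # Hash unknown: transform, pick a new pin name, prepare the SQL update.
    df <- transform()
    pin_name <- paste0("df_", hash, ".parquet")
    # Illustration only; real code should use a parameterized query.
    query <- sprintf(
      "INSERT INTO df_tbl (hash, pin_name) VALUES ('%s', '%s');",
      hash, pin_name
    )
  }
  list(pin_name = pin_name, df = df, query = query)
}

db <- data.frame(hash = "abc123", pin_name = "df_abc123.parquet",
                 stringsAsFactors = FALSE)
dir <- tempfile("pins")
dir.create(dir)

# Known hash, file missing: the data is transformed again, no SQL is needed.
res_missing <- resolve_pin("abc123", db, dir, function() head(mtcars))

# Known hash, file present: df is NULL, so nothing would be exported again.
file.create(file.path(dir, "df_abc123.parquet"))
res_cached <- resolve_pin("abc123", db, dir, function() head(mtcars))

# Unknown hash: the data is transformed and an INSERT statement is generated.
res_new <- resolve_pin("zzz999", db, dir, function() head(mtcars))
```

Returning NULL for df in the cached case is what lets a later export step skip writing a file that already exists.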
Super class
rpwf::BaseEx -> TrainDf
Public fields
- prepped
  (recipes::prep()) Holds the prepped object.
- term_info
  (tibble::tibble()) The self$prepped object contains term_info, which describes the transformed variables before the data is actually transformed.
- idx_col
  (character()) Name of the index column, defined by the provided recipe. Having a pre-defined index in R makes working with pandas.DataFrame less error prone.
- target
  (character()) Name of the target variable, defined by the provided recipe. If it is missing, a message states that a test data.frame is assumed.
- predictors
  (character()) Names of the predictors, stored as a JSON string to be parsed into a Python list.
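A minimal sketch of how the index, target, and predictor names could be read from term_info of a prepped recipe. The data, the normalization step, and the variable names are assumptions for illustration; the package's own extraction code may differ.

```r
library(recipes)

# Hypothetical data: `id` identifies rows, `mpg` is the outcome.
train <- data.frame(id = seq_len(nrow(mtcars)), mtcars[, c("mpg", "hp", "wt")])

rec <- recipe(mpg ~ ., data = train)
rec <- update_role(rec, id, new_role = "pd.index")
rec <- step_normalize(rec, all_numeric_predictors())
prepped <- prep(rec)

# term_info lists each variable of the transformed data with its role.
info <- prepped$term_info
idx_col <- info$variable[info$role == "pd.index"]
target <- info$variable[info$role == "outcome"]
predictors <- info$variable[info$role == "predictor"]
```

Because the roles are read from the prepped recipe, the names are available without calling recipes::juice() first.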
Methods
Method new()
Creates a new instance of the TrainDf class. Accepts a special role
"pd.index", set with recipes::update_role(), as the index for a pandas
DataFrame. If no outcome is provided, the data.frame is considered a "test"
data.frame. See ?rpwf::BaseEx for details about the attributes and methods.
Arguments
- recipe
  (recipes::recipe()) The recipe that defines how the data is transformed.
- db_con
  (DbCon) A DbCon object, generated by rpwf_connect_db().
- seed
  (numeric()) Random seed that controls the recipe at the prep level.
Method set_attrs()
Refreshes the attributes using the current hash. This method must be run because it updates the attributes with information from the database.
Method set_target_col()
Sets the target column as defined by the recipe. The data is assumed to be test data if no target is found.
Method set_predictors()
Stores the list of predictors defined by the recipe as a JSON string to be parsed in Python.
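A minimal sketch of encoding predictor names as a JSON string. It assumes the jsonlite package; the package documented here may use a different encoder.

```r
library(jsonlite)

# Hypothetical predictor names.
predictors <- c("hp", "wt")

# Encode as a JSON array; as.character() drops the "json" class.
predictors_json <- as.character(toJSON(predictors))
predictors_json
# "[\"hp\",\"wt\"]"

# A single predictor is still encoded as an array, not a bare string.
single_json <- as.character(toJSON("hp"))

# Round trip back into an R character vector.
decoded <- fromJSON(predictors_json)
```

On the Python side, a string of this shape can be decoded into a list with the standard json module.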