An R6 object that manages the export of the metadata and the parquet file of the transformed data defined by a data-transformation recipe. It accepts a special role, "pd.index", assigned with recipes::update_role(), and uses that column as the index of a pandas DataFrame. If no outcome is provided, the data.frame is treated as a "test" data.frame.
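As a minimal sketch of the roles described above, a recipe could mark one column as the pandas index with recipes::update_role(). The mtcars data and the car_id column are illustrative assumptions and are not part of rpwf.

```r
library(recipes)

# Illustrative data: add an explicit row identifier to use as the pandas index.
df <- mtcars
df$car_id <- seq_len(nrow(df))

# Training-style recipe: `mpg` is the outcome, `car_id` gets the "pd.index" role.
train_rec <- recipe(mpg ~ ., data = df) |>
  update_role(car_id, new_role = "pd.index")

# Test-style recipe: no outcome is declared, so the data would be treated as "test".
test_rec <- recipe(~ ., data = df[, setdiff(names(df), "mpg")]) |>
  update_role(car_id, new_role = "pd.index")

# summary() lists every variable with its role.
roles_train <- summary(train_rec)
roles_test <- summary(test_rec)
```

Inspecting roles_train and roles_test before handing a recipe to TrainDf is a quick way to confirm that exactly one column carries the "pd.index" role.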

Details

This object works as follows:

  • Inherits from the BaseEx class.

  • Accepts a recipe and generates a prepped object with recipes::prep().

  • Uses the prepped object to get the names of the pd.index, target, and predictor columns.

  • Calculates the hash of the prepped recipe (not of the data frame) and looks that hash up in the database.

  • If the hash is found in the database:

    • Assigns the pin_name associated with that hash to self$pin_name.

    • Checks whether a file already exists for self$pin_name.

    • If the file does not exist, transforms the prepped object with recipes::juice() and assigns the result to self$df.

    • If the file exists, assigns NULL to self$df to stop self$export_parquet() from executing.

  • If the hash is not found in the database:

    • Transforms the prepped object with recipes::juice() and assigns the result to self$df.

    • Generates a new pin_name under which to write the transformed data.

    • Generates a SQL query that adds the new hash and the new pin_name to the database.

  • Updates the database with the generated SQL query through self$export_db().

  • Writes the parquet file with self$export_parquet().
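The branching in the steps above can be sketched in base R. This is not the rpwf implementation: the named vector standing in for the database, the plan_export() helper, the file_exists callback, and the "df_" naming scheme are all hypothetical, and the assumption that no database update is needed when the hash is already known is mine.

```r
# Hypothetical stand-in for the database table mapping recipe hashes to pin names.
known_hashes <- c(abc123 = "df_pin_1")

# Decide what needs to happen for a given recipe hash.
# `file_exists` is a hypothetical callback reporting whether the parquet
# file for a pin name has already been written.
plan_export <- function(recipe_hash, known_hashes, file_exists) {
  if (recipe_hash %in% names(known_hashes)) {
    pin_name <- known_hashes[[recipe_hash]]
    list(
      pin_name = pin_name,
      transform = !file_exists(pin_name), # transform only if the file is missing
      update_db = FALSE                   # assumed: hash is already recorded
    )
  } else {
    list(
      pin_name = paste0("df_", recipe_hash), # assumed naming scheme
      transform = TRUE,
      update_db = TRUE
    )
  }
}

hit_with_file <- plan_export("abc123", known_hashes, function(p) TRUE)
hit_no_file <- plan_export("abc123", known_hashes, function(p) FALSE)
miss <- plan_export("zzz999", known_hashes, function(p) FALSE)
```

Here transform = FALSE plays the role of assigning NULL to self$df: nothing is transformed, so there is nothing for the parquet export to write.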

Super class

rpwf::BaseEx -> TrainDf

Public fields

prepped

(recipes::prep())
holds the prepped object.

term_info

(tibble::tibble())
The term_info attribute of self$prepped. It describes the transformed variables before the data is actually transformed.
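A minimal sketch of what term_info holds, using recipes directly rather than TrainDf. The data, the car_id column, and the step_dummy() step are illustrative assumptions; how TrainDf reads the roles internally is not confirmed by this page.

```r
library(recipes)

# Illustrative data with an index column and one factor predictor.
df <- mtcars
df$car_id <- seq_len(nrow(df))
df$cyl <- factor(df$cyl)

prepped <- recipe(mpg ~ ., data = df) |>
  update_role(car_id, new_role = "pd.index") |>
  step_dummy(cyl) |>
  prep()

# term_info describes the columns as they will look after transformation;
# summary(prepped) returns the same information.
info <- prepped$term_info

# Column names by role, available without calling juice().
idx_col <- info$variable[info$role == "pd.index"]
target <- info$variable[info$role == "outcome"]
predictors <- info$variable[info$role == "predictor"]
```

The original cyl column no longer appears in info; its dummy columns do, which is why the names can be collected before any data is transformed.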

idx_col

(character())
Name of the index column, as defined by the provided recipe. Defining the index in R makes working with the pandas.DataFrame less error-prone.

target

(character())
Name of the target variable, as defined by the provided recipe. If it is missing, a message reports that the data is assumed to be a test data.frame.

predictors

(character())
Names of the predictors. Stored as a JSON string so that Python can parse it into a Python list.
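As a rough illustration of the JSON encoding, the sketch below uses jsonlite. Whether rpwf uses jsonlite or another encoder is an assumption, and the predictor names are made up.

```r
library(jsonlite)

# Illustrative predictor names.
predictors <- c("cyl", "disp", "hp")

# Encode the character vector as a JSON array stored in a plain string.
predictors_json <- as.character(toJSON(predictors))

# A single predictor is still encoded as an array, so Python's json.loads()
# would return a list rather than a bare string.
single_json <- as.character(toJSON("cyl"))

# Round trip back into R to check that nothing is lost.
round_trip <- fromJSON(predictors_json)
```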

Methods

Inherited methods


Method new()

Create a new instance of the TrainDf class. It accepts a special role, "pd.index", assigned with recipes::update_role(), and uses that column as the index of a pandas DataFrame. If no outcome is provided, the data.frame is treated as a "test" data.frame. See ?rpwf::BaseEx for details about the inherited attributes and methods.

Usage

TrainDf$new(recipe, db_con, seed = sample(1:1e+05, size = 1))

Arguments

recipe

(recipes::recipe())
provided recipe that defines how the data is transformed.

db_con

(DbCon)
a DbCon object, generated by rpwf_connect_db().

seed

(numeric())
Random seed that controls the recipe at the prep step.
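The sketch below shows why a seed matters when prepping a recipe, using a step with random behavior. It assumes the seed is applied with set.seed() immediately before recipes::prep(); the prep_with_seed() helper is hypothetical and is not how TrainDf is confirmed to work.

```r
library(recipes)

# step_sample() draws rows at random while the recipe is prepped,
# so the prepped result depends on the random seed.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_sample(size = 5)

# Hypothetical helper: fix the seed, prep the recipe, return the transformed data.
prep_with_seed <- function(rec, seed) {
  set.seed(seed) # assumed: the seed is set right before prep()
  juice(prep(rec))
}

a <- prep_with_seed(rec, seed = 1234)
b <- prep_with_seed(rec, seed = 1234)
```

Because the default seed is drawn at random, passing an explicit value is the way to make a run repeatable.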


Method set_attrs()

Refresh the attributes using the current hash. This method must be run because it updates the attributes with information from the database.

Usage

TrainDf$set_attrs()


Method set_idx_col()

Set the index column as defined by the recipe.

Usage

TrainDf$set_idx_col()


Method set_target_col()

Set the target column as defined by the recipe. If no target is found, the data is assumed to be test data.

Usage

TrainDf$set_target_col()


Method set_predictors()

Store the list of predictors defined by the recipe as a JSON string to be parsed in Python.

Usage

TrainDf$set_predictors()


Method clone()

The objects of this class are cloneable with this method.

Usage

TrainDf$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
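The difference between a shallow and a deep clone only shows up when a field itself holds an R6 object. The sketch below uses two toy classes, Inner and Outer, which are hypothetical and unrelated to rpwf.

```r
library(R6)

# Toy classes: Outer holds a reference to an Inner object.
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

orig <- Outer$new()
shallow <- orig$clone()          # copies the reference to the same Inner object
deep <- orig$clone(deep = TRUE)  # copies the Inner object as well

# Change the original's nested object after cloning.
orig$inner$value <- 2

shallow_value <- shallow$inner$value # follows the change, shared object
deep_value <- deep$inner$value       # unchanged, independent copy
```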