Custom Steps

The Yeast library ships with many built-in Steps, but you can define your own operations with CustomStep:

class yeast.steps.CustomStep(to_prepare=None, to_bake=None, to_validate=None, role='all')

CustomStep extends Yeast Pipelines to cover scenarios that the built-in steps do not handle, such as custom transformations, business rules, or calls into third-party libraries. It is designed to be quick to implement. It accepts up to three functions, all optional:

  • to_validate(step, df)
  • to_prepare(step, df): returns df
  • to_bake(step, df): returns df

Note that to_prepare and to_bake must return a DataFrame so that later steps in the pipeline can continue. CustomStep also lets you structure and document your code and business rules as Steps that can be shared across Recipes.
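
The return-value requirement can be pictured with a small stand-in. The sketch below is not Yeast's implementation: SketchStep and run_steps are hypothetical names, and a plain dict of column lists stands in for a DataFrame. It only illustrates that each step's return value becomes the next step's input, so a to_bake that forgets "return df" hands None to whatever follows.

```python
class SketchStep:
    """Hypothetical stand-in for CustomStep, not Yeast's actual code."""

    def __init__(self, to_bake=None):
        self.to_bake = to_bake

    def bake(self, df):
        # Without a to_bake function the data passes through unchanged
        if self.to_bake is None:
            return df
        return self.to_bake(self, df)


def run_steps(steps, df):
    # Each step receives whatever the previous step returned
    for step in steps:
        df = step.bake(df)
    return df


def add_total(step, df):
    df["total_sales"] = [s + f for s, f in zip(df["sales"], df["fees"])]
    return df


def forgetful(step, df):
    df["flag"] = [True] * len(df["sales"])
    # Missing "return df": the function implicitly returns None


data = {"sales": [10, 20], "fees": [1, 2]}
good = run_steps([SketchStep(add_total), SketchStep()], dict(data))
broken = run_steps([SketchStep(forgetful), SketchStep()], dict(data))
```

Here good still carries the data, including the new total_sales column, while broken is None because the first step dropped its return value.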

Parameters:

  • to_validate: perform validations on the data. Raise YeastValidationError on a problem.
  • to_prepare: prepare the step before baking, for example by training a model or calculating aggregations.
  • to_bake: execute the bake (processing). This is the core method.
  • role: String name of the role to control baking flows on new data. Default: all.

Inline Usage:

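# Import paths assumed; adjust them to your installed Yeast version
from yeast import Recipe
from yeast.steps import CustomStep

recipe = Recipe([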
    # Custom Business Rules:
    # assign() returns the whole DataFrame, as to_bake requires
    CustomStep(to_bake=lambda step, df: df.assign(sales=df['sales'].fillna(0)))
])

Custom rules:

def my_bake(step, df):
    # Calculate total sales or anything you need:
    df['total_sales'] = df['sales'] + df['fees']
    return df

recipe = Recipe([
    # Custom Business Rules:
    CustomStep(to_bake=my_bake)
])

Custom Checks and Validations:

def my_validate(step, df):
    if 'sales' not in df.columns:
        raise YeastValidationError('sales column not found')
    if 'fees' not in df.columns:
        raise YeastValidationError('fees column not found')

recipe = Recipe([
    CustomStep(to_validate=my_validate, to_bake=my_bake)
])
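
When several steps need the same kind of check, the column test can be factored into a small factory. This is a sketch only: require_columns is a hypothetical helper, SketchValidationError stands in for YeastValidationError so the block runs without Yeast installed, and SimpleNamespace stands in for a DataFrame (any object exposing .columns works).

```python
from types import SimpleNamespace


class SketchValidationError(Exception):
    """Stand-in for YeastValidationError so the sketch runs on its own."""


def require_columns(*names):
    """Build a to_validate function that checks for the given columns."""
    def to_validate(step, df):
        missing = [name for name in names if name not in df.columns]
        if missing:
            raise SketchValidationError(
                "columns not found: " + ", ".join(missing)
            )
    return to_validate


my_validate = require_columns("sales", "fees")

# Stand-in for a DataFrame that has 'sales' but lacks 'fees'
frame = SimpleNamespace(columns=["sales", "region"])
try:
    my_validate(None, frame)
    outcome = "ok"
except SketchValidationError as error:
    outcome = str(error)
```

With Yeast, the returned function would be passed as to_validate=require_columns('sales', 'fees') and would raise YeastValidationError instead of the stand-in.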

Define the Estimation/Preparation procedure:

def my_preparation(step, df):
    # Store the learned value on the step so my_bake can reuse it
    step.mean_sales = df['sales'].mean()
    return df

def my_bake(step, df):
    df['sales_deviation'] = df['sales'] - step.mean_sales
    return df

recipe = Recipe([
    CustomStep(to_prepare=my_preparation, to_bake=my_bake)
])
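
Separating preparation from baking means a value learned once can be reused on other data. The sketch below calls the two functions by hand, with SimpleNamespace as a hypothetical stand-in for the step object; how and when Yeast itself invokes them is not shown here. It assumes pandas is installed.

```python
from types import SimpleNamespace

import pandas as pd


def my_preparation(step, df):
    # Learn the mean once and store it on the step
    step.mean_sales = df["sales"].mean()
    return df


def my_bake(step, df):
    # Reuse the stored mean instead of recomputing it on the new data
    df["sales_deviation"] = df["sales"] - step.mean_sales
    return df


step = SimpleNamespace()  # stand-in for the CustomStep instance
train = pd.DataFrame({"sales": [100.0, 200.0, 300.0]})
new = pd.DataFrame({"sales": [250.0, 150.0]})

my_preparation(step, train)
baked = my_bake(step, new)
deviations = baked["sales_deviation"].tolist()
```

The deviations are measured against the mean of the training data (200.0), not against the mean of the new rows.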

Creating a custom step inheriting from CustomStep:

class MyCustomStep(CustomStep):

    def do_validate(self, df):
        # Some validations that could raise YeastValidationError
        pass

    def do_prepare(self, df):
        # Prepare the step if needed
        return df

    def do_bake(self, df):
        # Logic to process the df
        return df

recipe = Recipe([
    MyCustomStep()
])

Raises:

  • YeastValidationError: if any of the parameters is defined but not callable.
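
The kind of check described here could look like the following sketch. It is an assumption about the behavior, not Yeast's code: check_callables is a hypothetical helper and SketchValidationError stands in for YeastValidationError.

```python
class SketchValidationError(Exception):
    """Stand-in for YeastValidationError so the sketch runs on its own."""


def check_callables(**params):
    # None means "not defined" and is allowed; anything else must be callable
    for name, value in params.items():
        if value is not None and not callable(value):
            raise SketchValidationError(
                f"{name} must be callable, got {type(value).__name__}"
            )


# A function (or None) passes silently
check_callables(to_bake=lambda step, df: df, to_prepare=None)

# A non-callable value is rejected
try:
    check_callables(to_bake="not a function")
    message = None
except SketchValidationError as error:
    message = str(error)
```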