Custom Steps
The Yeast library contains a number of different Steps, but you might need to define your own operations using CustomStep:
yeast.steps.CustomStep(to_prepare=None, to_bake=None, to_validate=None, role='all')
CustomStep extends Yeast Pipelines to cover the scenarios where the built-in steps are not adequate: custom transformations, business rules, or integrations with third-party libraries. Usage is straightforward and designed to keep implementation effort low. It accepts up to three functions as arguments, all of them optional:
to_validate(step, df)
to_prepare(step, df): returns df
to_bake(step, df): returns df
Note that to_prepare and to_bake must return a DataFrame so that the pipeline can continue executing the following steps. CustomStep lets you structure and document your code and business rules in Steps that can be shared across Recipes.
Parameters:
to_validate: perform validations on the data. Raise YeastValidationError on a problem.
to_prepare: prepare the step before bake, such as training or calculating aggregations.
to_bake: execute the bake (processing). This is the core method.
role: string name of the role to control baking flows on new data. Default: all.
Inline Usage:
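recipe = Recipe([
    # Custom business rule: replace missing sales with 0.
    # to_bake must return a DataFrame, so fill the column through the frame
    # instead of returning the df['sales'] Series.
    CustomStep(to_bake=lambda step, df: df.fillna({'sales': 0}))
])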
Custom rules:
def my_bake(step, df):
    # Calculate total sales or anything you need:
    df['total_sales'] = df['sales'] + df['fees']
    return df

recipe = Recipe([
    # Custom Business Rules:
    CustomStep(to_bake=my_bake)
])
Custom Checks and Validations:
def my_validate(step, df):
    if 'sales' not in df.columns:
        raise YeastValidationError('sales column not found')
    if 'fees' not in df.columns:
        raise YeastValidationError('fees column not found')

recipe = Recipe([
    CustomStep(to_validate=my_validate, to_bake=my_bake)
])
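When several steps need the same kind of column check, the validation function can be produced by a small factory. The sketch below is only an illustration: require_columns is a hypothetical helper, not part of Yeast, and SketchValidationError stands in for YeastValidationError so the example runs without Yeast installed. Any object with a columns attribute (such as a pandas DataFrame) would work as df.

```python
from types import SimpleNamespace


class SketchValidationError(Exception):
    """Stand-in for YeastValidationError (assumption, keeps the sketch self-contained)."""


def require_columns(*names):
    """Build a function with the to_validate(step, df) signature."""
    def validate(step, df):
        # Collect every missing column so the error reports all of them at once.
        missing = [name for name in names if name not in df.columns]
        if missing:
            raise SketchValidationError('columns not found: ' + ', '.join(missing))
    return validate


# A minimal object exposing .columns, used here in place of a DataFrame.
frame = SimpleNamespace(columns=['sales'])
check = require_columns('sales', 'fees')

try:
    check(None, frame)
    message = None
except SketchValidationError as error:
    message = str(error)
```

With the real library, the returned function would be passed as to_validate and the stand-in exception replaced by YeastValidationError.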
Define the Estimation/Preparation procedure:
def my_preparation(step, df):
    # Store the estimate on the step so that my_bake can reuse it
    step.mean_sales = df['sales'].mean()
    # to_prepare must return the DataFrame
    return df

def my_bake(step, df):
    df['sales_deviation'] = df['sales'] - step.mean_sales
    return df

recipe = Recipe([
    CustomStep(to_prepare=my_preparation, to_bake=my_bake)
])
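The point of splitting preparation from baking is that a value estimated once can be reused on data seen later. The following is a rough sketch of that idea, not Yeast's implementation: SketchStep and its prepare and bake methods are hypothetical stand-ins, and plain lists of dicts replace the DataFrame so the example needs no dependencies. That preparation runs once and baking runs on new data is an assumption drawn from the parameter descriptions.

```python
class SketchStep:
    """Simplified stand-in for CustomStep; not the library's actual code."""

    def __init__(self, to_prepare=None, to_bake=None):
        self.to_prepare = to_prepare
        self.to_bake = to_bake

    def prepare(self, rows):
        # Pass the step itself so the user function can store state on it.
        return self.to_prepare(self, rows) if self.to_prepare else rows

    def bake(self, rows):
        return self.to_bake(self, rows) if self.to_bake else rows


def my_preparation(step, rows):
    # Estimate the mean once and keep it on the step.
    step.mean_sales = sum(row['sales'] for row in rows) / len(rows)
    return rows


def my_bake(step, rows):
    # Reuse the stored mean; nothing is re-estimated here.
    return [dict(row, sales_deviation=row['sales'] - step.mean_sales) for row in rows]


step = SketchStep(to_prepare=my_preparation, to_bake=my_bake)
step.prepare([{'sales': 10.0}, {'sales': 30.0}])  # stores mean_sales on the step
new_rows = step.bake([{'sales': 25.0}])           # uses the stored mean
```

Keeping the estimate as an attribute on the step, rather than in a global variable, is what lets the same bake function be shared across Recipes.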
Creating a custom step inheriting from CustomStep:
class MyCustomStep(CustomStep):

    def do_validate(self, df):
        # Some validations that could raise YeastValidationError
        pass

    def do_prepare(self, df):
        # Prepare the step if needed
        return df

    def do_bake(self, df):
        # Logic to process the df
        return df

recipe = Recipe([
    MyCustomStep()
])
Raises:
YeastValidationError: if any of the parameters is defined but not callable.
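The "defined but not callable" rule can be pictured with a short check like the one below. This is a sketch under stated assumptions, not the library's code: check_callables is a hypothetical helper and SketchValidationError stands in for YeastValidationError.

```python
class SketchValidationError(Exception):
    """Stand-in for YeastValidationError (assumption)."""


def check_callables(**params):
    """Raise if a parameter is defined (not None) but is not callable."""
    for name, value in params.items():
        if value is not None and not callable(value):
            raise SketchValidationError(name + ' must be callable')


# None means "not defined", so it is accepted alongside real functions.
check_callables(to_prepare=None, to_bake=lambda step, df: df)

try:
    check_callables(to_bake='not a function')
    rejected = False
except SketchValidationError:
    rejected = True
```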