Join two DataFrames together
It’s rare that a data analysis involves only a single DataFrame. In practice, you’ll normally have many tables that contribute to an analysis, and you need flexible tools to combine them.
df_stocks
   ticker     name    market
0  'APPL'  'Apple'  'NASDAQ'
...
df_prices
   ticker          date  price
0  'APPL'  '28-02-2020'   2219
1  'APPL'  '30-03-2020'   2203
2  'APPL'  '30-04-2020'   3322
...
# Attach every matching price row to each stock, matching on "ticker"
recipe = Recipe([
    LeftJoinStep(df_prices, by="ticker")
])
recipe.bake(df_stocks)
   ticker     name    market          date  price
0  'APPL'  'Apple'  'NASDAQ'  '28-02-2020'   2219
0  'APPL'  'Apple'  'NASDAQ'  '30-03-2020'   2203
0  'APPL'  'Apple'  'NASDAQ'  '30-04-2020'   3322
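Yeast supports four types of joins: left, right, inner and full. In the descriptions below, x is the DataFrame passed to bake and y is the DataFrame given to the join step: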
LeftJoinStep(y)
Return all rows from x, and all columns from x and y.
Rows in x with no match in y will have NA values in the new columns.
If there are multiple matches between x and y, all combinations of the matches are returned.

RightJoinStep(y)
Return all rows from y, and all columns from x and y. Rows in y with no match in x
will have NA values in the new columns. If there are multiple matches between x and y,
all combinations of the matches are returned.

InnerJoinStep(y)
Return all rows from x where there are matching values in y, and all columns from x and y.
If there are multiple matches between x and y, all combinations of the matches are returned.

FullJoinStep(y)
Return all rows and all columns from both x and y. Where a row has no match in the other
DataFrame, the columns coming from that DataFrame are filled with NA.
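
The four behaviours described above can be sketched in plain Python. This is only an illustration of the join semantics on lists of row dictionaries, not Yeast's implementation: the `join` helper, the `NA` placeholder and the 'MSFT' and 'TSLA' rows are all invented for the example, and it assumes non-empty tables whose only shared column is the key.

```python
# Toy illustration of left/right/inner/full join semantics.
# Hypothetical helper, NOT part of Yeast; rows are plain dicts.
NA = None  # stand-in for a missing value


def join(x, y, by, how="left"):
    """Join two non-empty lists of row dicts on the key column `by`."""
    x_extra = [c for c in x[0] if c != by]  # columns contributed only by x
    y_extra = [c for c in y[0] if c != by]  # columns contributed only by y
    rows = []
    matched_y = set()  # positions of y rows that found a partner in x

    for xr in x:
        partners = [(i, yr) for i, yr in enumerate(y) if yr[by] == xr[by]]
        # Multiple matches: emit one output row per (x row, y row) combination
        for i, yr in partners:
            matched_y.add(i)
            rows.append({**xr, **{c: yr[c] for c in y_extra}})
        # Unmatched x rows survive only in left and full joins
        if not partners and how in ("left", "full"):
            rows.append({**xr, **{c: NA for c in y_extra}})

    # Unmatched y rows survive only in right and full joins
    if how in ("right", "full"):
        for i, yr in enumerate(y):
            if i not in matched_y:
                rows.append({by: yr[by],
                             **{c: NA for c in x_extra},
                             **{c: yr[c] for c in y_extra}})
    return rows


# The 'MSFT' and 'TSLA' rows are made up so that each side has an unmatched row
stocks = [
    {"ticker": "APPL", "name": "Apple", "market": "NASDAQ"},
    {"ticker": "MSFT", "name": "Microsoft", "market": "NASDAQ"},
]
prices = [
    {"ticker": "APPL", "date": "28-02-2020", "price": 2219},
    {"ticker": "APPL", "date": "30-03-2020", "price": 2203},
    {"ticker": "TSLA", "date": "28-02-2020", "price": 100},
]

for how in ("left", "right", "inner", "full"):
    print(how, len(join(stocks, prices, "ticker", how)))
```

With this data, the left join keeps 'MSFT' with NA in date and price, the right join keeps 'TSLA' with NA in name and market, the inner join keeps only the two 'APPL' combinations, and the full join keeps all of them.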

Joining with the result of a Recipe
In more complex scenarios you may want to join against the output of another recipe that has
not been executed yet. All join steps accept a Recipe as the y argument:
# Left join with the DataFrame produced by executing another Recipe

# Recipe that prepares the prices dataset
prices_recipe = Recipe([
    SortStep(['ticker', 'date'])
])

# Recipe that prepares the stocks data
stocks_recipe = Recipe([
    LeftJoinStep(prices_recipe, by="ticker", df=df_prices)
])

# `prices_recipe` is executed on `df_prices` while `stocks_recipe` runs
stocks_recipe.prepare(df_stocks).bake(df_stocks)
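
To make the idea of a deferred y more concrete, here is a toy model in plain Python. It is a sketch under assumptions, not Yeast's code: `ToyRecipe` and `ToyLeftJoin` are hypothetical names, rows are plain dictionaries, and the only point is that the inner recipe is baked on `df` at the moment the join step runs.

```python
# Toy model of "a recipe as the y argument". Hypothetical classes, NOT Yeast's.
class ToyRecipe:
    """Holds a list of steps; each step is a callable taking and returning rows."""

    def __init__(self, steps):
        self.steps = steps

    def bake(self, rows):
        for step in self.steps:
            rows = step(rows)
        return rows


class ToyLeftJoin:
    """Left join whose y is either ready-made rows or an unexecuted ToyRecipe."""

    def __init__(self, y, by, df=None):
        self.y, self.by, self.df = y, by, df

    def __call__(self, x_rows):
        # Resolve y lazily: an unexecuted recipe is baked on `df` only now
        y_rows = self.y.bake(self.df) if isinstance(self.y, ToyRecipe) else self.y
        y_extra = [c for c in y_rows[0] if c != self.by] if y_rows else []
        out = []
        for xr in x_rows:
            partners = [yr for yr in y_rows if yr[self.by] == xr[self.by]]
            if partners:
                out.extend({**xr, **yr} for yr in partners)
            else:
                # No match: keep the x row and fill y's columns with None
                out.append({**xr, **{c: None for c in y_extra}})
        return out


def sort_by_ticker_and_date(rows):
    # Plain string sort on the two columns, enough for this small example
    return sorted(rows, key=lambda r: (r["ticker"], r["date"]))


df_stocks = [{"ticker": "APPL", "name": "Apple", "market": "NASDAQ"}]
df_prices = [  # deliberately unsorted
    {"ticker": "APPL", "date": "30-04-2020", "price": 3322},
    {"ticker": "APPL", "date": "28-02-2020", "price": 2219},
    {"ticker": "APPL", "date": "30-03-2020", "price": 2203},
]

prices_recipe = ToyRecipe([sort_by_ticker_and_date])
stocks_recipe = ToyRecipe([ToyLeftJoin(prices_recipe, by="ticker", df=df_prices)])

result = stocks_recipe.bake(df_stocks)
for row in result:
    print(row)
```

In this model nothing happens to `df_prices` when `prices_recipe` is created; the sort runs only inside `stocks_recipe.bake`, which mirrors the behaviour described in the comment above.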