Join two DataFrames together

It’s rare that a data analysis involves only a single DataFrame. In practice, you’ll normally have many tables that contribute to an analysis, and you need flexible tools to combine them.

df_stocks
      ticker      name    market
0     'APPL'   'Apple'  'NASDAQ'
...
df_prices
      ticker           date    price
0     'APPL'    '28-02-2020'    2219
1     'APPL'    '30-03-2020'    2203
2     'APPL'    '30-04-2020'    3322
...
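from yeast import Recipe
from yeast.steps import LeftJoinStep

# Left join df_prices onto whatever DataFrame the recipe is baked on
recipe = Recipe([
  LeftJoinStep(df_prices, by="ticker")
])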

recipe.bake(df_stocks)
      ticker      name    market          date    price
0     'APPL'   'Apple'  'NASDAQ'  '28-02-2020'     2219
0     'APPL'   'Apple'  'NASDAQ'  '30-03-2020'     2203
0     'APPL'   'Apple'  'NASDAQ'  '30-04-2020'     3322
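
The row multiplication in the output above (one stock row becoming three) can be reproduced outside Yeast. The following is a minimal sketch in plain pandas, assuming the step behaves like a standard left merge on the `by` column; it is not Yeast's implementation, and it uses only the sample rows shown above (the rows elided by `...` are left out).

```python
import pandas as pd

# Sample rows copied from the tables above
df_stocks = pd.DataFrame({
    "ticker": ["APPL"],
    "name": ["Apple"],
    "market": ["NASDAQ"],
})
df_prices = pd.DataFrame({
    "ticker": ["APPL", "APPL", "APPL"],
    "date": ["28-02-2020", "30-03-2020", "30-04-2020"],
    "price": [2219, 2203, 3322],
})

# Plain-pandas analogue of a left join on "ticker":
# every row of df_stocks is kept and repeated once per matching price row
joined = df_stocks.merge(df_prices, how="left", on="ticker")

print(joined.shape)
print(list(joined.columns))
```

The single stock row matches three price rows, so the result has three rows and the columns of both tables, with the join key appearing once.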

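Yeast supports four types of joins: left, right, inner and full. In the descriptions below, x is the DataFrame the recipe is baked on and y is the DataFrame passed to the step: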

LeftJoinStep(y)

Return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.

RightJoinStep(y)

Return all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.

InnerJoinStep(y)

Return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.

FullJoinStep(y)

Return all rows and all columns from both x and y. Where a row has no match in the other table, the columns coming from that table are filled with NA.
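
To make the difference between the four join types concrete, here is a small sketch in plain pandas. It assumes the Yeast steps follow the usual relational semantics described above; the frames `x` and `y` and their contents are made up for illustration. Note that pandas marks missing values as `NaN`.

```python
import pandas as pd

# Hypothetical data: "A" exists only in x, "C" only in y, "B" matches twice
x = pd.DataFrame({"ticker": ["A", "B"], "name": ["Alpha", "Beta"]})
y = pd.DataFrame({"ticker": ["B", "B", "C"], "price": [10, 11, 20]})

left = x.merge(y, how="left", on="ticker")    # keeps A (price missing), B twice
right = x.merge(y, how="right", on="ticker")  # keeps B twice, C (name missing)
inner = x.merge(y, how="inner", on="ticker")  # keeps only B twice
full = x.merge(y, how="outer", on="ticker")   # keeps A, B twice, and C

for label, df in [("left", left), ("right", right), ("inner", inner), ("full", full)]:
    print(label, len(df), sorted(df["ticker"]))
```

The ticker "B" appears twice in every result because each matching pair of rows is returned, which is the "all combinations of the matches" behaviour mentioned in the descriptions.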

Joining with the result of a Recipe

Sometimes, in complex scenarios, you want to join against the result of another recipe that has not been executed yet. All join steps accept a Recipe as the y argument, together with the DataFrame that recipe should run on, passed as df:

# Left join with the DataFrame obtained from the execution of another Recipe

# Recipe to prepare the prices dataset
prices_recipe = Recipe([
    SortStep(['ticker', 'date'])
])

# Recipe to prepare the stocks data:
stocks_recipe = Recipe([
    LeftJoinStep(prices_recipe, by="ticker", df=df_prices)
])

# `prices_recipe` will be executed using `df_prices` inside the `stocks_recipe` execution
stocks_recipe.prepare(df_stocks).bake(df_stocks)
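
Conceptually, the nested execution amounts to two stages: first prepare the prices, then join them. The sketch below spells those stages out in plain pandas, assuming the sort step orders rows by the given columns and the join behaves like a left merge. It illustrates the data flow only and is not how Yeast runs recipes internally; the price rows are the sample rows from the first example, shuffled so the sort is visible.

```python
import pandas as pd

df_stocks = pd.DataFrame({
    "ticker": ["APPL"],
    "name": ["Apple"],
    "market": ["NASDAQ"],
})
# Sample price rows, deliberately out of order
df_prices = pd.DataFrame({
    "ticker": ["APPL", "APPL", "APPL"],
    "date": ["30-04-2020", "28-02-2020", "30-03-2020"],
    "price": [3322, 2219, 2203],
})

# Stage 1: the role of prices_recipe, sorting by ticker and date.
# The dates are strings here, so they are ordered as text, not as calendar dates.
prices_prepared = df_prices.sort_values(["ticker", "date"]).reset_index(drop=True)

# Stage 2: the role of the left join step, using the prepared prices as y
result = df_stocks.merge(prices_prepared, how="left", on="ticker")

print(result)
```

Keeping the preparation of the prices in its own recipe means the same preparation can be reused with a different DataFrame without rewriting the join.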

What's next?

Available Steps