Join two DataFrames together

It’s rare that a data analysis involves only a single DataFrame. In practice, you’ll normally have many tables that contribute to an analysis, and you need flexible tools to combine them.

Stocks DataFrame (the table the recipe is applied to):

      ticker      name    market
0     'APPL'   'Apple'  'NASDAQ'

df_prices:

      ticker           date    price
0     'APPL'    '28-02-2020'    2219
1     'APPL'    '30-03-2020'    2203
2     'APPL'    '30-04-2020'    3322

recipe = Recipe([
  LeftJoinStep(df_prices, by="ticker")
])

Result of applying the recipe to the stocks DataFrame:

      ticker      name    market          date    price
0     'APPL'   'Apple'  'NASDAQ'  '28-02-2020'     2219
0     'APPL'   'Apple'  'NASDAQ'  '30-03-2020'     2203
0     'APPL'   'Apple'  'NASDAQ'  '30-04-2020'     3322
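
For readers who know pandas, the left join above yields the same rows and columns as a left merge on `ticker`. The following is only a sketch in plain pandas, not Yeast's implementation. The name `df_stocks` is an assumption for the first table, which the example does not name.

```python
import pandas as pd

# The stocks table (left side). `df_stocks` is an assumed name.
df_stocks = pd.DataFrame({
    "ticker": ["APPL"],
    "name": ["Apple"],
    "market": ["NASDAQ"],
})

# The prices table (right side), with the values from the example above.
df_prices = pd.DataFrame({
    "ticker": ["APPL", "APPL", "APPL"],
    "date": ["28-02-2020", "30-03-2020", "30-04-2020"],
    "price": [2219, 2203, 3322],
})

# A left join keeps every row of df_stocks and repeats it
# once for each matching row in df_prices.
joined = df_stocks.merge(df_prices, how="left", on="ticker")
print(joined)
```

The single stocks row matches three price rows, so the result has three rows and the five columns shown in the table above.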

Yeast supports four types of joins: left, right, inner and full. In the descriptions below, x is the DataFrame the recipe is applied to and y is the DataFrame passed to the join step.

Left join

Return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.

Right join

Return all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.

Inner join

Return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.

Full join

Return all rows and all columns from both x and y. Where a row has no match on the other side, the columns from that side are filled with NA.
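
The difference between the four join types only shows up when some keys have no match. The sketch below uses plain pandas `merge` rather than Yeast steps, on the assumption that the semantics described above correspond to the `how` values `left`, `right`, `inner` and `outer`. The frames and column names (`key`, `x_val`, `y_val`) are made up for illustration.

```python
import pandas as pd

# x has keys a and b; y has key a twice and key c once.
x = pd.DataFrame({"key": ["a", "b"], "x_val": [1, 2]})
y = pd.DataFrame({"key": ["a", "a", "c"], "y_val": [10, 11, 30]})

# Run the same join with each of the four join types.
results = {
    how: x.merge(y, how=how, on="key")
    for how in ["left", "right", "inner", "outer"]
}

for how, df in results.items():
    print(how, len(df))
# left 3   -> a, a, b (b has NA in y_val)
# right 3  -> a, a, c (c has NA in x_val)
# inner 2  -> a, a
# outer 4  -> a, a, b, c
```

Key `a` matches twice, so it appears twice in every result. Keys `b` and `c` appear only in the joins that keep unmatched rows from their side.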

Joining with the result of a Recipe

In more complex scenarios you may want to join against the output of another recipe that has not been executed yet. All join steps accept a Recipe as the y argument:

# Left join with the DataFrame obtained from the execution of another Recipe

# Recipe to prepare the prices dataset
prices_recipe = Recipe([
    SortStep(['ticker', 'date'])
])

# Recipe to prepare the stocks data:
stocks_recipe = Recipe([
    LeftJoinStep(prices_recipe, by="ticker", df=df_prices)
])

# `prices_recipe` will be executed using `df_prices` inside the `stocks_recipe` execution
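
The deferred execution described above can be illustrated without Yeast. The sketch below is not how Yeast implements it. The helpers `prepare_prices` and `left_join_step` are hypothetical stand-ins for `prices_recipe` and `LeftJoinStep`, and `df_stocks` is an assumed name for the stocks table. The point is only that the prices data is prepared when the join runs, not when the step is defined.

```python
import pandas as pd

df_stocks = pd.DataFrame({"ticker": ["APPL"], "name": ["Apple"], "market": ["NASDAQ"]})

# Prices in unsorted order, so the effect of the preparation is visible.
df_prices = pd.DataFrame({
    "ticker": ["APPL", "APPL", "APPL"],
    "date": ["30-04-2020", "28-02-2020", "30-03-2020"],
    "price": [3322, 2219, 2203],
})

def prepare_prices(df):
    # Hypothetical stand-in for prices_recipe: sort by ticker and date.
    # Dates are strings here, so the sort is lexicographic.
    return df.sort_values(["ticker", "date"]).reset_index(drop=True)

def left_join_step(prepare_y, by, df):
    # Hypothetical stand-in for LeftJoinStep(recipe, by=..., df=...).
    def step(x):
        y = prepare_y(df)  # y is prepared only now, when the step runs
        return x.merge(y, how="left", on=by)
    return step

# Defining the step does not touch df_prices yet.
step = left_join_step(prepare_prices, by="ticker", df=df_prices)

# Running the step prepares the prices and then joins them.
result = step(df_stocks)
print(result)
```

Because the prices are sorted before the join, the joined rows come out in date order even though `df_prices` was not.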

What's next?

Available Steps