Methods for Selecting Variables

The step that selects variables from a DataFrame is SelectColumnsStep / SelectStep. It keeps columns based on their names or on the results of selectors.

from yeast import Recipe
from yeast.steps import SelectStep
from yeast.selectors import AllNumeric

Recipe([
  # AllNumeric() is a selector that keeps all numeric variables,
  # so this step keeps every numeric column plus 'title'
  SelectStep([AllNumeric(), 'title'])
])

Selectors choose columns based on their data type or name. They are shortcuts for selecting a subset of columns/predictors that share a common attribute.

Usage is simple: pass a selector to any parameter that expects column names, either on its own or in a list together with explicit column names.

Recipe([
  # Will keep all numeric and 2 more columns:
  SelectStep([AllNumeric(), 'title', 'aired']),
  # Will keep all the numeric variables:
  SelectStep(AllNumeric()),
  # Will keep only one column:
  SelectStep('seasons')
])
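
To make the behaviour above concrete, here is a minimal pandas-only sketch of how a single name, a selector, or a mixed list of both could be expanded into column names. It is an illustration under assumptions, not yeast's implementation: `all_numeric` and `resolve_columns` are hypothetical helpers, and the `shows` DataFrame is made-up sample data.

```python
import pandas as pd

# Hypothetical stand-in for a selector such as AllNumeric(); not part of yeast.
def all_numeric(df):
    return list(df.select_dtypes(include="number").columns)

# Hypothetical helper: expand a name, a selector, or a list of both.
def resolve_columns(df, selection):
    if not isinstance(selection, (list, tuple)):
        selection = [selection]
    resolved = []
    for item in selection:
        # A selector is modelled as a callable; anything else is a column name.
        names = item(df) if callable(item) else [item]
        for name in names:
            if name not in resolved:  # drop duplicates, keep first-seen order
                resolved.append(name)
    return resolved

# Made-up sample data
shows = pd.DataFrame({
    "title": ["A", "B"],
    "seasons": [3, 5],
    "rating": [8.1, 7.4],
    "aired": pd.to_datetime(["2001-01-01", "2010-06-15"]),
})

numeric_and_more = resolve_columns(shows, [all_numeric, "title", "aired"])
only_seasons = resolve_columns(shows, "seasons")
```

In this sketch the numeric columns come first because the selector is listed first; whether yeast orders the resulting columns the same way is not something the text above states.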

Available Selectors

AllColumns

class yeast.selectors.AllColumns()

Return all columns in the DataFrame

Recipe([
    # Will keep all columns
    SelectStep(AllColumns())
])

AllString

class yeast.selectors.AllString()

Return all string columns

Recipe([
    # Will keep all strings
    SelectStep(AllString())
])

AllBoolean

class yeast.selectors.AllBoolean()

Return all boolean columns

Recipe([
    # Will keep all booleans
    SelectStep(AllBoolean())
])

AllNumeric

class yeast.selectors.AllNumeric()

Return all numerical columns

Recipe([
    # Will keep all numeric columns (int, float, etc.)
    SelectStep(AllNumeric())
])

AllDatetime

class yeast.selectors.AllDatetime()

Return all DateTime columns

Recipe([
    # Will keep all dates and times
    SelectStep(AllDatetime())
])

AllCategorical

class yeast.selectors.AllCategorical()

Return all Categorical columns

Recipe([
    # Will keep all categorical
    SelectStep(AllCategorical())
])
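
For readers who think in pandas terms, the data-type selectors above resemble `DataFrame.select_dtypes` with different dtype families. The mapping below is an assumption made for illustration, not taken from yeast's source. The `shows` DataFrame is made-up sample data, and AllString is left out because the dtype of string columns differs between pandas versions.

```python
import pandas as pd

# Made-up sample data with one column per dtype family
shows = pd.DataFrame({
    "seasons": [3, 5],
    "rating": [8.1, 7.4],
    "ended": [True, False],
    "aired": pd.to_datetime(["2001-01-01", "2010-06-15"]),
    "genre": pd.Categorical(["drama", "comedy"]),
})

# Assumed pandas dtype family for each selector (illustrative only)
DTYPE_FAMILY = {
    "AllNumeric": "number",
    "AllBoolean": "bool",
    "AllDatetime": "datetime",
    "AllCategorical": "category",
}

# Column names each dtype family picks out of the sample data
selected = {
    selector: list(shows.select_dtypes(include=family).columns)
    for selector, family in DTYPE_FAMILY.items()
}
```

Note that in pandas the "number" family does not include boolean columns, which is why "ended" only appears under the boolean entry here.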

AllMatching

class yeast.selectors.AllMatching(pattern='')

Return all columns matching the regular expression given by pattern

Recipe([
    # Will keep all the columns ending with "ed" (ed$)
    SelectStep(AllMatching('ed$'))
])
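
The "ed$" example implies that the pattern is searched for anywhere in the column name rather than required to match the whole name from its start. A small sketch of that idea using Python's standard `re` module follows. `columns_matching` is a hypothetical helper, the column names are made up, and the use of `re.search` is an assumption about the matching semantics, not a description of yeast's internals.

```python
import re

# Hypothetical helper: keep the names in which the pattern is found.
def columns_matching(columns, pattern):
    regex = re.compile(pattern)
    # re.search scans the whole name, so "ed$" matches names ending in "ed"
    return [name for name in columns if regex.search(name)]

# Made-up column names
columns = ["title", "aired", "seasons", "rated"]
ending_in_ed = columns_matching(columns, "ed$")
```

With `re.match` the same pattern would only accept a column named exactly "ed", so the choice between searching and anchored matching matters when writing patterns.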

What's next?