Methods for Creating and Transforming Variables

Besides selecting sets of existing columns, it’s often useful to add new columns that are functions of existing columns, or to modify the values of existing ones. This is the job of MutateStep().

This step takes a dictionary that maps column names (keys) to the transformers (values) that should be applied to them. Transformers may refer to columns that were created earlier in the same step.

The most basic usage is the following:

recipe = Recipe([
    MutateStep({
        # Column "fullname" from: "JONATHAN ARCHER" to "Jonathan Archer"
        'fullname': StrToTitle()
    })
])

You can also pass the name of the column to transform explicitly:

# Column "fullname" from: "JONATHAN ARCHER" to "Jonathan Archer"
MutateStep({'fullname': StrToTitle('fullname')})
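To make the relationship between the dictionary key (the column that is written) and the column argument (the column that is read) concrete, here is a minimal plain-Python model. It is only a sketch: `mutate` is a hypothetical helper written for this illustration, not part of yeast, and it assumes entries are applied in dictionary order so that later entries can read columns created by earlier ones.

```python
def mutate(rows, transformers):
    """Apply {target: (source, function)} entries, in order, to a list of row dicts.

    Hypothetical stand-in for the idea behind MutateStep, not the yeast implementation.
    """
    result = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for target, (source, function) in transformers.items():
        for row in result:
            # Read from `source`, write to `target`; `target` may be a new column.
            row[target] = function(row[source])
    return result


crew = [{"fullname": "  JONATHAN ARCHER "}]
mutated = mutate(crew, {
    "clean": ("fullname", str.strip),         # new column from an existing one
    "fullname_title": ("clean", str.title),   # refers to the column just created
})
```

The original column is left unchanged here because each entry writes to a new key; using the same name for key and source would overwrite it in place.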

Moreover, you can pass a list of transformers to build more complex chains of transformations:

# Let's transform/create some variables:
MutateStep({
  # Transform the "name" column
  'name': [
      # "JONATHAN ARCHER" to "Jonathan Archer"
      StrToTitle('name'),
      # " Data " to "Data"
      StrTrim('name'),
      # "Philippa  Georgiou" to "Philippa Georgiou"
      StrReplace('  ', ' ', 'name'),
      # "Jean--Luc PICARD" to "Jean-Luc Picard"
      StrReplaceAll('--', '-', 'name')
  ],
  'rank': StrToTitle('rank')
})
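The effect of such a chain on individual values can be traced with plain Python string methods. This sketch assumes that the transformers in a list are applied in order, each one receiving the output of the previous one; `clean_name` is a hypothetical helper for illustration and does not use yeast.

```python
def clean_name(name):
    """Mimic the chain above with built-in string methods (illustration only)."""
    name = name.title()                # like StrToTitle: "JONATHAN ARCHER" -> "Jonathan Archer"
    name = name.strip()                # like StrTrim: " Data " -> "Data"
    name = name.replace("  ", " ", 1)  # like StrReplace: first occurrence only
    name = name.replace("--", "-")     # like StrReplaceAll: every occurrence
    return name


raw_names = ["JONATHAN ARCHER", " Data ", "Philippa  Georgiou", "Jean--Luc PICARD"]
cleaned = [clean_name(name) for name in raw_names]
```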

Currently the transformers are categorized as:

  • General Transformers: Replace specified values with new values (MapValues).
  • String Transformers: A cohesive set of transformers designed to make working with strings as easy as possible.
  • Rank Transformers: Return the sample ranks of the values in a column.
  • Date Transformers: Return components of a Date or DateTime column.

General Transformers

MapValues

class yeast.transformers.MapValues(mapping, column=None)

Replace specified values with new values.

# Map String/Categorical values
# Replace old_value with new_value
MapValues({'old_value': 'new_value', ...})

# Map Numerical values
# Replace 90 with NaN
MapValues({90: np.nan})

Parameters:

  • mapping: Dictionary that specifies a replacement value for each existing value. For example, {'old': 'new'} replaces the value 'old' with 'new'.
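The mapping semantics can be sketched in plain Python. This is an illustration rather than the yeast implementation, and it assumes that values missing from the mapping are left unchanged; `map_values` is a hypothetical helper.

```python
import math


def map_values(values, mapping):
    """Replace each value found in `mapping`; keep every other value as it is."""
    return [mapping.get(value, value) for value in values]


# String/categorical values: only "ENS" has a replacement.
ranks = map_values(["ENS", "LT", "ENS"], {"ENS": "Ensign"})

# Numerical values: treat the sentinel 90 as missing.
ages = map_values([35, 90, 42], {90: math.nan})
```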

String Transformers

StrToUpper

class yeast.transformers.StrToUpper(column=None)

Converts a string to upper case: ("Yeast" to "YEAST")

StrToLower

class yeast.transformers.StrToLower(column=None)

Converts a string to lower case: ("Yeast" to "yeast")

StrToSentence

class yeast.transformers.StrToSentence(column=None)

Converts first character to uppercase and remaining to lowercase: ("yeast help" to "Yeast help")

StrToTitle

class yeast.transformers.StrToTitle(column=None)

Converts first character of each word to uppercase and remaining to lowercase: ("yeast help" to "Yeast Help")

StrTrim

class yeast.transformers.StrTrim(column=None)

Removes whitespace from the start and end of a string: (" Yeast " to "Yeast")

StrReplace

class yeast.transformers.StrReplace(pattern, replacement, column=None)

Replace the first occurrence of a matched pattern in a string: 'Y' to 'X' ("YYY" to "XYY")

Parameters:

  • pattern: Pattern or string to look for.
  • replacement: A string of replacements.

StrReplaceAll

class yeast.transformers.StrReplaceAll(pattern, replacement, column=None)

Replace all occurrences of a matched pattern in a string: 'Y' to 'X' ("YYY" to "XXX")

Parameters:

  • pattern: Pattern or string to look for.
  • replacement: A string of replacements.
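The difference between replacing the first match and replacing every match can be seen with Python's `re.sub`. This is a sketch of the behavior described above, assuming patterns are treated as regular expressions; it does not call yeast.

```python
import re

# First match only, as described for StrReplace.
first_only = re.sub("Y", "X", "YYY", count=1)

# Every match, as described for StrReplaceAll.
every_match = re.sub("Y", "X", "YYY")

# With a regular expression pattern: mask each run of digits.
digits_masked = re.sub(r"\d+", "#", "NCC-1701-D 74656")
```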

StrPad

class yeast.transformers.StrPad(width, side='left', pad=' ', column=None)

Pad a string to a minimum width. For example, padding 'Y' to 4 characters on the left with '0' turns "Y" into "000Y".

Parameters:

  • width: Minimum width of padded strings.
  • side: Side on which padding character is added (left, right or both).
  • pad: Single padding character (default is a space).
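The three values of side correspond to familiar Python string methods. The helper below is a hypothetical plain-Python sketch of the behavior described by the parameters, not the yeast implementation.

```python
def pad_string(text, width, side="left", pad=" "):
    """Pad `text` with `pad` until it is at least `width` characters long."""
    if side == "left":
        return text.rjust(width, pad)   # padding is added on the left
    if side == "right":
        return text.ljust(width, pad)   # padding is added on the right
    if side == "both":
        return text.center(width, pad)  # padding is split across both sides
    raise ValueError("side must be 'left', 'right' or 'both'")


left_padded = pad_string("Y", 4, side="left", pad="0")
right_padded = pad_string("Y", 4, side="right", pad="0")
both_padded = pad_string("Y", 5, side="both", pad="0")
already_wide = pad_string("Yeast", 4)  # width is a minimum, so nothing changes
```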

StrSlice

class yeast.transformers.StrSlice(start, stop, column=None)

Extract a substring from a string:

StrSlice(start=6, stop=10) # "Yeast Help" to "Help"

Parameters:

  • start: integer position of the first character
  • stop: integer position of the last character
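The "Yeast Help" example is consistent with Python's own slicing, where positions are counted from zero and the slice ends just before stop. Assuming StrSlice follows that convention (an assumption, not something stated above), the equivalent plain-Python expressions are:

```python
title = "Yeast Help"

# Characters at positions 6, 7, 8 and 9; position 10 is not included.
help_word = title[6:10]

# The first five characters.
yeast_word = title[0:5]
```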

StrRemove

class yeast.transformers.StrRemove(pattern, column=None)

Remove the first match of a pattern in a string:

StrRemove("_temp") # "Yeast_temp" to "Yeast"

Parameters:

  • pattern: Pattern or string to look for.

StrRemoveAll

class yeast.transformers.StrRemoveAll(pattern, column=None)

Remove all matches of a pattern in a string:

StrRemoveAll("_temp") # "Yeast_temp_temp" to "Yeast"

Parameters:

  • pattern: Pattern or string to look for.

StrContains

class yeast.transformers.StrContains(pattern, column=None, case=True, regex=True)

Test whether a pattern or regex is contained within a string column. Returns a boolean variable (True or False).

MutateStep({
    'feature': StrContains("_temp", column="text", case=True, regex=True)
}),
# You can convert to numerical (0 and 1) with:
CastStep({'feature': 'integer'})

Parameters:

  • pattern: Pattern or string to look for.
  • case: If True, case sensitive.
  • regex: If True, treats the pattern as a regular expression. If False, treats the pattern as a literal string.
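How case and regex interact can be sketched with the standard library. The function below is a hypothetical illustration of the parameter semantics described above and does not use yeast.

```python
import re


def contains(text, pattern, case=True, regex=True):
    """Return True if `pattern` is found anywhere in `text`."""
    if regex:
        flags = 0 if case else re.IGNORECASE
        return re.search(pattern, text, flags) is not None
    if case:
        return pattern in text
    return pattern.lower() in text.lower()


texts = ["Yeast_temp", "yeast_TEMP", "Yeast"]
sensitive = [contains(text, "_temp") for text in texts]
insensitive = [contains(text, "_temp", case=False) for text in texts]

# Booleans convert to 0 and 1, which is what casting to integer yields.
as_integers = [int(flag) for flag in sensitive]

# "." matches any character as a regex, but only a literal dot otherwise.
dot_as_regex = contains("ab", ".")
dot_as_literal = contains("ab", ".", regex=False)
```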

Rank Transformers

class yeast.transformers.RankTransformer(column=None, ties_method='first', ascending=True, percentage=False)

Returns the sample ranks of the values in the column. Ties (i.e., equal values) and missing values can be handled in several ways.

Ties Methods:

The 'first' method assigns increasing ranks to tied values in the order in which they appear, so the result is a permutation. 'average' replaces the ranks of tied values with their mean, while 'max' and 'min' replace them with their maximum and minimum respectively. 'dense' is like 'min', but with no gaps between ranks.

Parameters:

  • column: name used to rank values
  • ties_method: string specifying how ties are treated: {'average', 'min', 'max', 'first', 'dense'}
  • ascending: boolean with the order of the row numbers
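The five ties methods are easiest to compare on a small example. The following is a minimal plain-Python sketch of the rules described above, not the yeast implementation; the `rank` helper is hypothetical and ignores missing values.

```python
def rank(values, ties_method="first", ascending=True):
    """Return 1-based ranks of `values` using the given ties method."""
    # Stable sort of positions: tied values keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    first = [0] * len(values)
    for position, index in enumerate(order, start=1):
        first[index] = position
    if ties_method == "first":
        return first

    # Collect the 'first' ranks that each distinct value received.
    groups = {}
    for index, value in enumerate(values):
        groups.setdefault(value, []).append(first[index])

    if ties_method == "dense":
        # Number the distinct values consecutively, so there are no gaps.
        distinct = sorted(groups, reverse=not ascending)
        dense = {value: k for k, value in enumerate(distinct, start=1)}
        return [dense[value] for value in values]

    summarize = {
        "min": min,
        "max": max,
        "average": lambda ranks: sum(ranks) / len(ranks),
    }[ties_method]
    return [summarize(groups[value]) for value in values]


scores = [10, 20, 10, 30, 20]
by_first = rank(scores, "first")      # [1, 3, 2, 5, 4]
by_min = rank(scores, "min")          # [1, 3, 1, 5, 3]
by_max = rank(scores, "max")          # [2, 4, 2, 5, 4]
by_average = rank(scores, "average")  # [1.5, 3.5, 1.5, 5.0, 3.5]
by_dense = rank(scores, "dense")      # [1, 2, 1, 3, 2]
```

Note how 'min' skips rank 2 and rank 4 because each pair of ties occupies two positions, while 'dense' uses consecutive ranks.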

RowNumber

class yeast.transformers.RowNumber(column=None, ascending=True)

Creates/transforms a variable containing the row number.

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

RankFirst

class yeast.transformers.RankFirst(column=None, ascending=True)

Ties receive increasing ranks in the order in which they appear.

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

RankMin

class yeast.transformers.RankMin(column=None, ascending=True)

Ties are replaced by their minimum rank.

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

RankMax

class yeast.transformers.RankMax(column=None, ascending=True)

Ties are replaced by their maximum rank.

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

RankDense

class yeast.transformers.RankDense(column=None, ascending=True)

Ties are replaced by their minimum rank, like RankMin, but with no gaps between ranks.

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

RankMean

class yeast.transformers.RankMean(column=None, ascending=True)

Ties are replaced by their mean (average) rank.

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

RankPercent

class yeast.transformers.RankPercent(column=None, ascending=True)

A number between 0 and 1 computed by rescaling RankMin to [0, 1]

Parameters:

  • ascending: boolean with the order of the row numbers
  • column: used to sort/arrange and rank values

Date Transformers

DateYear

class yeast.transformers.DateYear(column=None)

Extract the year of a Date column

DateQuarter

class yeast.transformers.DateQuarter(column=None)

Extract the quarter of a Date column

DateMonth

class yeast.transformers.DateMonth(column=None)

Extract the month of a Date column

DateWeek

class yeast.transformers.DateWeek(column=None)

Extract the week of a Date column

DateDay

class yeast.transformers.DateDay(column=None)

Extract the day of a Date column

DateDayOfWeek

class yeast.transformers.DateDayOfWeek(column=None)

Extract the day of week of a Date column. Days are numbered from Monday=0 to Sunday=6.

DateDayOfYear

class yeast.transformers.DateDayOfYear(column=None)

Extract the day of year of a Date column.

DateHour

class yeast.transformers.DateHour(column=None)

Extract the hour of a Date column.

DateMinute

class yeast.transformers.DateMinute(column=None)

Extract the minute of a Date column.

DateSecond

class yeast.transformers.DateSecond(column=None)

Extract the second of a Date column.
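For reference, the components listed above can be computed for a single timestamp with Python's standard datetime module. This is a sketch of what each component means, not of how yeast computes it; in particular, using the ISO 8601 week number for the week component is an assumption.

```python
from datetime import datetime

stamp = datetime(2021, 8, 15, 13, 45, 30)  # a Sunday

components = {
    "year": stamp.year,
    "quarter": (stamp.month - 1) // 3 + 1,       # months 1-3 -> 1, 4-6 -> 2, ...
    "month": stamp.month,
    "week": stamp.isocalendar()[1],              # ISO 8601 week number (assumed)
    "day": stamp.day,
    "day_of_week": stamp.weekday(),              # Monday=0 ... Sunday=6
    "day_of_year": stamp.timetuple().tm_yday,
    "hour": stamp.hour,
    "minute": stamp.minute,
    "second": stamp.second,
}
```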

What's next?