Data - combine features and labels (, featurefunc, labelfunc)[source]

Return features and labels for the given equity data.

Each row of the features returned contains 2 * n_sessions + 1 columns (or 1 less if the constant feature is excluded). After the constant feature, if present, there will be n_sessions columns derived from daily growth of the given price column, which defaults to ‘Adj Close’. There will then follow another n_sessions columns representing current volume as a multiple of average volume over the previous 252 (or other value determined by the user) sessions.

The returned features are not centered or normalized because these adjustments need to be made after test or cross-validation data has been removed.

The constant feature is prepended by default.

The labels are derived from eqdata using labelfunc.


eqdata : DataFrame

Expected is a dataframe as return by the get() function. A column labeled ‘Volume’ must be present.

featurefunc : function

Function taking a dataframe of simple equity data as argument and returning a dataframe of features and an integer representing the number of rows that had to be skipped at the beginning of the index of the input dataframe. The rows skipped are used to synchronize the indices of features and labels. For example, if the features are composed of 4 successive daily returns, then the date of row 0 of features would be the same as the date of row 3 (counting from 0) of input data. So the corresponding featurefunc would return a dataframe and the value 3.

labelfunc : function

function for deriving labels from eqdata. labelfunc must take a single argument: df, a dataframe to which labelfunc will be applied. labelfunc should return a dataframe of labels followed by an int specifying the number of feature rows to skip at the end of the feature dataframe. For example, if features are relative prices 64 days out, these features will only be known up until 64 days before the data runs out. In order to properly align features and labels, the features should not include the last 64 rows that would otherwise be possible.

Usage: labels, skipatend = labelfunc(eqdata)


features : DataFrame

The features derived from the given parameters.

labels : DataFrame

The labels derived from the given parameters.