Data preprocessing functions (pynance.data.prep)

pynance.data.prep.center(dataset, out=None)
Returns a centered data set.
Each column of the returned data will have mean 0. The row vector subtracted from each row to achieve this transformation is also returned.
Parameters: dataset : DataFrame or ndarray
out : DataFrame or ndarray, optional
Alternate output array in which to place the result. If provided, it must have the same shape and type (DataFrame or ndarray) as the expected output.
Returns: out : tuple of DataFrame or ndarray
The centered data, followed by the row vector of column means that was subtracted from each row. The output data is of the same type as the input.
Notes
To exclude a column (such as a constant feature, which is usually the first or last column of data), simply don’t include it in the input. For example:
>>> centered_data, means = pn.center(mydata.iloc[:, 1:])
To perform this operation in place:
>>> _, means = pn.center(mydata.iloc[:, 1:], out=mydata.iloc[:, 1:])
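The centering operation described above can be sketched in plain NumPy/pandas. This is an illustration of the arithmetic, not the pynance implementation: center_sketch and the sample frame are hypothetical names, and the sketch returns the means as a 1-D vector rather than as a row vector.

```python
import numpy as np
import pandas as pd


def center_sketch(dataset):
    """Hypothetical helper: subtract each column's mean from that column.

    Works for both DataFrame and ndarray input and returns
    (centered data, vector of column means).
    """
    # Series (indexed by column name) for a DataFrame, 1-D array for an ndarray.
    means = dataset.mean(axis=0)
    # DataFrame - Series aligns on column labels; ndarray - 1-D array broadcasts by row.
    return dataset - means, means


# Sample data with a constant feature in the first column.
mydata = pd.DataFrame({"const": [1.0, 1.0, 1.0],
                       "x": [1.0, 2.0, 3.0],
                       "y": [10.0, 20.0, 40.0]})

# Exclude the constant column by not passing it in.
centered_data, means = center_sketch(mydata.iloc[:, 1:])
```

Adding means back to centered_data recovers the original columns, which is why the subtracted vector is returned alongside the data.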

pynance.data.prep.normalize(centered_data, out=None)
Returns a data set with standard deviation of 1.
The input data must be centered for the operation to yield valid results: The mean of each column must be 0. Each column of the returned data set will have standard deviation 1.
The row vector by which each row of data is divided is also returned.
Parameters: centered_data : DataFrame or ndarray
out : DataFrame or ndarray, optional
Alternate output array in which to place the result. If provided, it must have the same shape and type (DataFrame or ndarray) as the expected output.
Returns: out : tuple of DataFrame or ndarray
The normalized data, followed by the row vector by which each row was divided. The output data is of the same type as the input.
Notes
To exclude a column (such as a constant feature, which is usually the first or last column of data), simply don’t include it in the input. For example:
>>> normalized_data, sd_adj = pn.normalize(mydata.iloc[:, 1:])
To perform this operation in place:
>>> _, sd_adj = pn.normalize(mydata.iloc[:, 1:], out=mydata.iloc[:, 1:])
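As a rough illustration of the scaling step, the following sketch divides each column of already-centered data by its standard deviation. normalize_sketch is a hypothetical helper, not the pynance function, and the choice of the population standard deviation (ddof=0) is an assumption: the documentation above does not say which estimator the library uses.

```python
import numpy as np


def normalize_sketch(centered_data, ddof=0):
    """Hypothetical helper: scale each column to standard deviation 1.

    Assumes every column of centered_data already has mean 0.
    ddof=0 (population standard deviation) is an assumption of this sketch.
    Returns (normalized data, vector of per-column divisors).
    """
    sd_adj = centered_data.std(axis=0, ddof=ddof)
    return centered_data / sd_adj, sd_adj


raw = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 40.0]])
centered = raw - raw.mean(axis=0)          # center first, as the function requires
normalized_data, sd_adj = normalize_sketch(centered)
```

Multiplying normalized_data by sd_adj recovers the centered data, so the returned divisor can be reused to apply the same scaling to new observations.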

pynance.data.prep.transform(data_frame, **kwargs)
Return a transformed DataFrame.
Transform data_frame along the given axis. By default, each row will be normalized (axis=0).
Parameters: data_frame : DataFrame
Data to be normalized.
axis : int, optional
0 (default) to normalize each row, 1 to normalize each column.
method : str, optional
Valid methods are:
 “vector” : Default for normalization by row (axis=0). Scale each vector along the axis so that its vector norm equals the value of the norm parameter.
 “last” : Linear normalization setting the last value along the axis to the value of the norm parameter.
 “first” : Default for normalization of columns (axis=1). Linear normalization setting the first value along the given axis to the value of the norm parameter.
 “mean” : Normalize so that the mean of each vector along the given axis equals the value of the norm parameter.
norm : float, optional
Target value of normalization, defaults to 1.0.
labels : DataFrame, optional
Labels may be passed as a keyword argument, in which case the label values will also be normalized and returned.
Returns: df : DataFrame
Normalized data.
labels : DataFrame, optional
Normalized labels, if provided as input.
Notes
If labels are real-valued, they should also be normalized.
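To make the four methods concrete, here is a minimal row-wise sketch written with NumPy and pandas only. transform_rows_sketch is a hypothetical helper, not the pynance function. It assumes that “vector” refers to the Euclidean (L2) norm and that linear normalization means multiplying the whole row by a single factor. Label handling and column-wise operation are left out.

```python
import numpy as np
import pandas as pd


def transform_rows_sketch(data_frame, method="vector", norm=1.0):
    """Hypothetical helper: scale each row so a reference value equals norm."""
    values = data_frame.to_numpy(dtype=float)
    if method == "vector":
        # Assumption: the vector norm is the Euclidean (L2) norm of the row.
        ref = np.linalg.norm(values, axis=1)
    elif method == "last":
        ref = values[:, -1]
    elif method == "first":
        ref = values[:, 0]
    elif method == "mean":
        ref = values.mean(axis=1)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # One scale factor per row, broadcast across that row's columns.
    scaled = values * (norm / ref)[:, np.newaxis]
    return pd.DataFrame(scaled, index=data_frame.index,
                        columns=data_frame.columns)


prices = pd.DataFrame([[3.0, 4.0], [6.0, 8.0]], columns=["open", "close"])
by_vector = transform_rows_sketch(prices)  # each row scaled to L2 norm 1
by_first = transform_rows_sketch(prices, method="first", norm=100.0)
```

With method="first" and norm=100.0, every row starts at 100, which puts series with different price levels on a common scale.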