Data preprocessing functions (pynance.data.prep)

pynance.data.prep.center(dataset, out=None)
Returns a centered data set.
Each column of the returned data will have mean 0. The row vector subtracted from each row to achieve this transformation is also returned.
Parameters: dataset : DataFrame or ndarray
out : DataFrame or ndarray, optional
Alternate output array in which to place the result. If provided, it must have the same shape and type (DataFrame or ndarray) as the expected output.
Returns: out : tuple of DataFrame or ndarray
The centered data, followed by the row vector of column means that was subtracted. The output data is of the same type as the input.
Notes
To exclude a column (such as a constant feature, which is usually the first or last column of data) simply don’t include it in the input. For example:
>>> centered_data, means = pn.center(mydata.iloc[:, 1:])
To perform this operation in place:
>>> _, means = pn.center(mydata.iloc[:, 1:], out=mydata.iloc[:, 1:])
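The centering described above can be illustrated with plain NumPy. This is a minimal sketch of the documented behavior (subtract each column's mean and return that row vector as well), not the library's implementation; the helper name center_sketch and the sample data are hypothetical.

```python
import numpy as np

def center_sketch(dataset):
    """Hypothetical stand-in that mimics the documented behavior of center()."""
    data = np.asarray(dataset, dtype=float)
    # Row vector (shape 1 x n_columns) holding the mean of each column.
    means = data.mean(axis=0, keepdims=True)
    # Broadcasting subtracts the row vector from every row.
    return data - means, means

sample = np.array([[1.0, 10.0],
                   [3.0, 30.0],
                   [5.0, 50.0]])
centered, means = center_sketch(sample)
print(centered.mean(axis=0))  # each column mean is now 0 (up to rounding)
print(means)
```

Returning the subtracted row vector makes it possible to apply the same shift to new data later, for example a test set.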

pynance.data.prep.normalize(centered_data, out=None)
Returns a data set with standard deviation of 1.
The input data must be centered for the operation to yield valid results: The mean of each column must be 0. Each column of the returned data set will have standard deviation 1.
The row vector by which each row of data is divided is also returned.
Parameters: centered_data : DataFrame or ndarray
out : DataFrame or ndarray, optional
Alternate output array in which to place the result. If provided, it must have the same shape and type (DataFrame or ndarray) as the expected output.
Returns: out : tuple of DataFrame or ndarray
The normalized data, followed by the row vector of divisors. The output data is of the same type as the input.
Notes
To exclude a column (such as a constant feature, which is usually the first or last column of data) simply don’t include it in the input. For example:
>>> normalized_data, sd_adj = pn.normalize(mydata.iloc[:, 1:])
To perform this operation in place:
>>> _, sd_adj = pn.normalize(mydata.iloc[:, 1:], out=mydata.iloc[:, 1:])
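The scaling step can be sketched in NumPy as follows. This is an illustration of the documented behavior, not the library's code; the helper name normalize_sketch is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption, since the documentation does not say which convention the library uses.

```python
import numpy as np

def normalize_sketch(centered_data, ddof=0):
    """Hypothetical stand-in that mimics the documented behavior of normalize().

    Assumes the input is already centered (each column has mean 0).
    ddof=0 (population standard deviation) is an assumption of this sketch.
    """
    data = np.asarray(centered_data, dtype=float)
    # Row vector (shape 1 x n_columns) of per-column standard deviations.
    sd = data.std(axis=0, ddof=ddof, keepdims=True)
    # Broadcasting divides every row by the row vector.
    return data / sd, sd

centered_sample = np.array([[-2.0, -20.0],
                            [0.0, 0.0],
                            [2.0, 20.0]])
normalized, sd_adj = normalize_sketch(centered_sample)
print(normalized.std(axis=0))  # each column now has standard deviation 1
```

A column with zero variance would produce a division by zero here, which is one reason to leave constant features out of the input.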

pynance.data.prep.transform(data_frame, **kwargs)
Returns a transformed DataFrame.
Transform data_frame along the given axis. By default, each row will be normalized (axis=0).
Parameters: data_frame : DataFrame
data to be normalized
axis : int in {0, 1}, default: 0
0 to normalize each row, 1 to normalize each column
method : str
valid methods are:
 “vector” : Default for normalization by row (axis=0). Scale each vector along the axis so that its vector norm equals norm
 “last” : Linear normalization setting last value along the axis to norm
 “first” : Default for normalization of columns (axis=1). Linear normalization setting first value along the given axis to norm
 “mean” : Normalize so that the mean of each vector along the given axis is norm
norm : float, default 1.0
Target value of normalization.
labels : DataFrame
labels may be passed as keyword argument, in which case the label values will also be normalized and returned
Returns: out : DataFrame or tuple of 2 DataFrames
Normalized data_frame if no labels are provided. Otherwise, a tuple containing the normalized data_frame first, then the normalized labels.
Notes
If labels are real-valued, they should also be normalized.
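To make the four methods listed above concrete, the sketch below applies each one to a single vector. It is an illustration only, not the library's implementation: the helper name scale_vector is hypothetical, "vector" is assumed to mean the Euclidean norm, and "linear normalization" is assumed to mean multiplying the whole vector by one constant.

```python
import numpy as np

def scale_vector(vec, method="vector", norm=1.0):
    """Hypothetical helper that rescales one vector according to `method`."""
    vec = np.asarray(vec, dtype=float)
    if method == "vector":
        # Assumption: Euclidean (L2) norm of the result equals `norm`.
        return vec * (norm / np.linalg.norm(vec))
    if method == "last":
        # Last value of the result equals `norm`.
        return vec * (norm / vec[-1])
    if method == "first":
        # First value of the result equals `norm`.
        return vec * (norm / vec[0])
    if method == "mean":
        # Mean of the result equals `norm`.
        return vec * (norm / vec.mean())
    raise ValueError("unknown method: {}".format(method))

row = [3.0, 4.0]
by_vector = scale_vector(row, "vector")
by_last = scale_vector(row, "last")
by_first = scale_vector(row, "first")
by_mean = scale_vector(row, "mean")
print(by_vector)  # Euclidean norm of the result is 1
print(by_last)    # last element of the result is 1
```

Under these assumptions, applying the same scale factor to the matching label values would keep features and labels in the same units.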