Data - preprocessing functions (pynance.data.prep)

pynance.data.prep.center(dataset, out=None)

Returns a centered data set.

Each column of the returned data will have mean 0. The row vector subtracted from each row to achieve this transformation is also returned.
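For an ndarray input, the centering described above can be sketched in a few lines of NumPy. The helper below is hypothetical (center_sketch is not part of pynance). It assumes a 2-D numeric array and omits the DataFrame branch. It shows how a single row vector of column means is broadcast across every row, and how an out array can receive the result.

```python
import numpy as np


def center_sketch(dataset, out=None):
    """Hypothetical ndarray-only centering (not pynance's implementation).

    Returns (centered, means), where means is the row vector of column means.
    """
    data = np.asarray(dataset, dtype=float)
    # Row vector of column means, shape (n_cols,)
    means = data.mean(axis=0)
    # Broadcasting subtracts the same row vector from every row; if out is
    # given, the result is written into it rather than into a new array.
    centered = np.subtract(data, means, out=out)
    return centered, means


data = np.array([[1.0, 10.0],
                 [3.0, 20.0],
                 [5.0, 30.0]])
centered, means = center_sketch(data)
# means holds the column means 3.0 and 20.0; every column of centered has mean 0
```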

Parameters:

dataset : DataFrame or ndarray

out : DataFrame or ndarray, optional

Alternate output array in which to place the result. If provided, it must have the same shape and type (DataFrame or ndarray) as the expected output.

Returns:

out : tuple of DataFrame or ndarray

The centered data (of the same type as the input), followed by the row vector of column means that was subtracted from each row.

Notes

To exclude a column (such as a constant feature, which is usually the first or last column of data) simply don’t include it in the input. For example:

>>> centered_data, means = pn.center(mydata.iloc[:, 1:])

To perform this operation in place:

>>> _, means = pn.center(mydata.iloc[:, 1:], out=mydata.iloc[:, 1:])
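The following plain NumPy sketch illustrates the same idea: a constant first column is excluded, and the remaining columns are centered in place. It uses a hypothetical array rather than pynance. In NumPy, basic slicing such as mydata[:, 1:] returns a view, so writing through it modifies the original array. Whether a pandas .iloc slice behaves the same way depends on the pandas version and its copy semantics, so this sketch deliberately stays with ndarrays.

```python
import numpy as np

# Hypothetical data: column 0 is a constant feature (all ones),
# columns 1: are the features to be centered.
mydata = np.array([[1.0, 2.0, 100.0],
                   [1.0, 4.0, 200.0],
                   [1.0, 6.0, 300.0]])

# Basic slicing returns a view, so writes through `features` land in mydata.
features = mydata[:, 1:]
means = features.mean(axis=0)
np.subtract(features, means, out=features)

# Column 0 is untouched; columns 1: of mydata now have mean 0.
```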

pynance.data.prep.normalize(centered_data, out=None)

Returns a data set in which each column has standard deviation 1.

The input data must already be centered for the operation to yield valid results: the mean of each column must be 0. Each column of the returned data set will then have standard deviation 1.

The row vector by which each row of data is divided is also returned.
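One way to see why centering is required is to compute the per-column divisor as a root mean square. That quantity equals the standard deviation only when the column mean is 0. The helper below is a hypothetical ndarray-only sketch (normalize_sketch is not pynance's code). Its use of the population convention (dividing by n rather than n - 1) is an assumption. The library may use a different formula.

```python
import numpy as np


def normalize_sketch(centered_data, out=None):
    """Hypothetical ndarray normalization (assumption, not pynance's code).

    Assumes every column already has mean 0 and uses the population
    convention (divide by n, not n - 1).
    """
    data = np.asarray(centered_data, dtype=float)
    # Root mean square of each column: equals the standard deviation
    # only when the column mean is 0.
    sd_adj = np.sqrt((data ** 2).mean(axis=0))
    # Divide every row by the same row vector (broadcasting).
    normalized = np.divide(data, sd_adj, out=out)
    return normalized, sd_adj


centered = np.array([[-2.0, -10.0],
                     [0.0, 0.0],
                     [2.0, 10.0]])
normalized, sd_adj = normalize_sketch(centered)
# Each column of normalized now has (population) standard deviation 1.

# Skipping the centering step gives the wrong scale:
uncentered = np.array([[1.0], [3.0], [5.0]])
wrong, _ = normalize_sketch(uncentered)
# wrong.std(axis=0) is well below 1 because the divisor included the mean.
```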

Parameters:

centered_data : DataFrame or ndarray

out : DataFrame or ndarray, optional

Alternate output array in which to place the result. If provided, it must have the same shape and type (DataFrame or ndarray) as the expected output.

Returns:

out : tuple of DataFrame or ndarray

The normalized data (of the same type as the input), followed by the row vector by which each row was divided.

Notes

To exclude a column (such as a constant feature, which is usually the first or last column of data) simply don’t include it in the input. For example:

>>> normalized_data, sd_adj = pn.normalize(mydata.iloc[:, 1:])

To perform this operation in place:

>>> _, sd_adj = pn.normalize(mydata.iloc[:, 1:], out=mydata.iloc[:, 1:])
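Returning the row vectors from both center and normalize makes the transformation reversible and reusable. The NumPy sketch below chains the two steps on a hypothetical array. It assumes the same root-mean-square divisor as above, which may differ from pynance's exact formula. It then maps the standardized values back to raw units and applies the identical scaling to a new observation.

```python
import numpy as np

# Hypothetical raw data: 4 observations of 2 features.
raw = np.array([[1.0, 50.0],
                [2.0, 60.0],
                [3.0, 80.0],
                [6.0, 90.0]])

# Step 1 (like center): subtract the row vector of column means.
means = raw.mean(axis=0)
centered = raw - means

# Step 2 (like normalize): divide by the per-column scale factor.
sd_adj = np.sqrt((centered ** 2).mean(axis=0))
standardized = centered / sd_adj

# The saved row vectors map standardized values back to raw units ...
recovered = standardized * sd_adj + means

# ... and apply exactly the same transformation to new observations.
new_obs = np.array([[4.0, 70.0]])
new_standardized = (new_obs - means) / sd_adj
```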

pynance.data.prep.transform(data_frame, **kwargs)

Returns a transformed DataFrame.

Transforms data_frame along the given axis. By default (axis=0), each row is normalized.

Parameters:

data_frame : DataFrame

Data to be normalized.

axis : int in {0, 1}, default: 0

0 to normalize each row, 1 to normalize each column

method : str

Valid methods are:

  • "vector" : Default for normalization by row (axis=0). Normalize each vector along the axis so that its vector norm equals the target value norm.
  • "last" : Linear normalization setting the last value along the axis to norm.
  • "first" : Default for normalization by column (axis=1). Linear normalization setting the first value along the axis to norm.
  • "mean" : Normalize so that the mean of each vector along the axis equals norm.
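The four methods listed above differ only in which per-row statistic is scaled to norm. The NumPy sketch below covers row-wise normalization (axis=0) only and works on an ndarray rather than a DataFrame. Its helpers row_scale_factors and transform_rows are hypothetical, not pynance functions. It assumes "vector" means the Euclidean (L2) norm, which is not confirmed by the description above.

```python
import numpy as np


def row_scale_factors(values, method="vector"):
    """Per-row statistic that each method scales to `norm` (hypothetical helper)."""
    if method == "vector":
        # Assumed Euclidean length of each row
        return np.linalg.norm(values, axis=1)
    if method == "last":
        return values[:, -1]
    if method == "first":
        return values[:, 0]
    if method == "mean":
        return values.mean(axis=1)
    raise ValueError(f"unknown method: {method!r}")


def transform_rows(values, method="vector", norm=1.0):
    """Scale each row linearly so its chosen statistic equals `norm` (sketch)."""
    values = np.asarray(values, dtype=float)
    factors = row_scale_factors(values, method)
    # factors has shape (n_rows,); add an axis so each row uses its own factor
    return values * norm / factors[:, np.newaxis]


prices = np.array([[10.0, 11.0, 12.5],
                   [100.0, 95.0, 105.0]])
by_first = transform_rows(prices, method="first")             # first column becomes 1.0
by_last = transform_rows(prices, method="last", norm=100.0)   # last column becomes 100.0
```

Scaling by norm / factor keeps the transformation linear within each row, so relative moves inside a row are preserved while rows on very different price levels become comparable.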

norm : float, default 1.0

Target value of normalization.

labels : DataFrame

Labels may be passed as a keyword argument, in which case the label values are also normalized and returned.

Returns:

out : DataFrame or tuple of 2 DataFrames

The normalized data_frame if no labels are provided. Otherwise, a tuple containing first the normalized data_frame, then the normalized labels.

Notes

If labels are real-valued, they should also be normalized.
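Real-valued labels, such as a future price, should be scaled consistently with the features they belong to. The sketch below makes an assumption about how this works in pynance: each label is rescaled by the same per-row factor computed from its row of features. That is one natural reading of the description above, not confirmed library behavior. It uses row-wise "last" normalization on hypothetical NumPy arrays.

```python
import numpy as np

# Hypothetical example: each row holds recent prices (features), and the
# label is the next price, which is real-valued and on the same scale.
features = np.array([[10.0, 11.0, 12.0],
                     [200.0, 210.0, 190.0]])
labels = np.array([[13.0],
                   [205.0]])

norm = 1.0
# Row-wise "last" normalization: one factor per row, computed from features only.
factors = norm / features[:, -1]

norm_features = features * factors[:, np.newaxis]
# Applying the same per-row factor keeps each label on its row's scale.
norm_labels = labels * factors[:, np.newaxis]
```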
