sciris.sc_math¶
Extensions to Numpy, including finding array elements and smoothing data.
- Highlights:
sc.findinds()
: find indices of an array matching a conditionsc.findnearest()
: find nearest matching valuesc.rolling()
: calculate rolling averagesc.smooth()
: simple smoothing of 1D or 2D arrays
Functions
Determine whether two scalars (or an array and a scalar) approximately match. |
|
Like np.concatenate(), but takes anything and returns an array. |
|
Like |
|
Take an array of data and return either the first or last (or some other) non-NaN entry. |
|
Alias for findinds(..., first=True). |
|
Little function to find matches even if two things aren't eactly equal (eg. |
|
Alias for findinds(..., last=True). |
|
Return the index of the nearest match in series to value -- like findinds, but always returns an object with the same type as value (i.e. |
|
Gaussian 1D smoothing kernel. |
|
Gaussian 2D smoothing kernel. |
|
Return the data value indices that are valid based on the validity of the input data. |
|
Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs. |
|
Like |
|
Determine if a number is prime. |
|
Rescale an array between a minimum value and a maximum value. |
|
Multiply a list or array by some normalizing factor so that its sum is equal to the total. |
|
Define an array of numbers uniformly perturbed with a mean of 1. |
|
Round a float, list, or array probabilistically to the nearest integer. |
|
Alias to Pandas' rolling() (window) method to smooth a series. |
|
Handle divide-by-zero and divide-by-nan elegantly. |
|
Sanitize input to remove NaNs. |
|
Very simple function to smooth a 1D or 2D array. |
|
Smoothly interpolate over values and keep end points. |
- approx(val1=None, val2=None, eps=None, **kwargs)[source]¶
Determine whether two scalars (or an array and a scalar) approximately match. Alias for np.isclose() and may be removed in future versions.
- Parameters
val1 (number or array) – the first value
val2 (number) – the second value
eps (float) – absolute tolerance
kwargs (dict) – passed to np.isclose()
Examples:
sc.approx(2*6, 11.9999999, eps=1e-6) # Returns True sc.approx([3,12,11.9], 12) # Returns array([False, True, False], dtype=bool)
- safedivide(numerator=None, denominator=None, default=None, eps=None, warn=False)[source]¶
Handle divide-by-zero and divide-by-nan elegantly.
Examples:
sc.safedivide(numerator=0, denominator=0, default=1, eps=0) # Returns 1 sc.safedivide(numerator=5, denominator=2.0, default=1, eps=1e-3) # Returns 2.5 sc.safedivide(3, np.array([1,3,0]), -1, warn=True) # Returns array([ 3, 1, -1])
- findinds(arr=None, val=None, eps=1e-06, first=False, last=False, die=True, **kwargs)[source]¶
Little function to find matches even if two things aren’t eactly equal (eg. due to floats vs. ints). If one argument, find nonzero values. With two arguments, check for equality using eps. Returns a tuple of arrays if val1 is multidimensional, else returns an array. Similar to calling np.nonzero(np.isclose(arr, val))[0].
- Parameters
arr (array) – the array to find values in
val (float) – if provided, the value to match
eps (float) – the precision for matching (default 1e-6, equivalent to np.isclose’s atol)
first (bool) – whether to return the first matching value
last (bool) – whether to return the last matching value
die (bool) – whether to raise an exception if first or last is true and no matches were found
kwargs (dict) – passed to np.isclose()
Examples:
sc.findinds(rand(10)<0.5) # returns e.g. array([2, 4, 5, 9]) sc.findinds([2,3,6,3], 3) # returs array([1,3]) sc.findinds([2,3,6,3], 3, first=True) # returns 1
New in version 1.2.3: “die” argument
- findnearest(series=None, value=None)[source]¶
Return the index of the nearest match in series to value – like findinds, but always returns an object with the same type as value (i.e. findnearest with a number returns a number, findnearest with an array returns an array).
Examples:
findnearest(rand(10), 0.5) # returns whichever index is closest to 0.5 findnearest([2,3,6,3], 6) # returns 2 findnearest([2,3,6,3], 6) # returns 2 findnearest([0,2,4,6,8,10], [3, 4, 5]) # returns array([1, 2, 2])
Version: 2017jan07
- dataindex(dataarray, index)[source]¶
Take an array of data and return either the first or last (or some other) non-NaN entry.
This function is deprecated.
- getvalidinds(data=None, filterdata=None)[source]¶
Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs. Warning, closely related to getvaliddata().
This function is deprecated.
Example:
getvalidinds([3,5,8,13], [2000, nan, nan, 2004]) # Returns array([0,3])
- sanitize(data=None, returninds=False, replacenans=None, die=True, defaultval=None, label=None, verbose=True)[source]¶
Sanitize input to remove NaNs. Warning, does not work on multidimensional data!!
Examples:
sanitized,inds = sanitize(array([3,4,nan,8,2,nan,nan,nan,8]), returninds=True) sanitized = sanitize(array([3,4,nan,8,2,nan,nan,nan,8]), replacenans=True) sanitized = sanitize(array([3,4,nan,8,2,nan,nan,nan,8]), replacenans=0)
- getvaliddata(data=None, filterdata=None, defaultind=0)[source]¶
Return the data value indices that are valid based on the validity of the input data.
This function is deprecated.
Example:
getvaliddata(array([3,5,8,13]), array([2000, nan, nan, 2004])) # Returns array([3,13])
- isprime(n, verbose=False)[source]¶
Determine if a number is prime.
From https://stackoverflow.com/questions/15285534/isprime-function-for-python-language
Example:
for i in range(100): print(i) if sc.isprime(i) else None
- perturb(n=1, span=0.5, randseed=None, normal=False)[source]¶
Define an array of numbers uniformly perturbed with a mean of 1.
- Parameters
n (int) – number of points
span (float) – width of distribution on either side of 1
normal (bool) – whether to use a normal distribution instead of uniform
Example:
sc.perturb(5, 0.3) # Returns e.g. array([0.73852362, 0.7088094 , 0.93713658, 1.13150755, 0.87183371])
- normsum(arr, total=None)[source]¶
Multiply a list or array by some normalizing factor so that its sum is equal to the total. Formerly called
sc.scaleratio()
.- Parameters
arr (array) – array (or list) to normalize
total (float) – amount to sum to (default 1)
Example:
normarr = sc.normsum([2,5,3,10], 100) # Scale so sum equals 100; returns [10.0, 25.0, 15.0, 50.0]
Renamed in version 1.0.0.
- normalize(arr, minval=0.0, maxval=1.0)[source]¶
Rescale an array between a minimum value and a maximum value.
- Parameters
arr (array) – array to normalize
minval (float) – minimum value in rescaled array
maxval (float) – maximum value in rescaled array
Example:
normarr = sc.normalize([2,3,7,27]) # Returns array([0. , 0.04, 0.2 , 1. ])
- inclusiverange(*args, **kwargs)[source]¶
Like
np.arange()
/np.linspace()
, but includes the start and stop points. Accepts 0-3 args, or the kwargs start, stop, step.In most cases, equivalent to
np.linspace(start, stop, int((stop-start)/step)+1)
.- Parameters
start (float) – value to start at
stop (float) – value to stop at
step (float) – step size
kwargs (dict) – passed to
np.linspace()
Examples:
x = sc.inclusiverange(10) # Like np.arange(11) x = sc.inclusiverange(3,5,0.2) # Like np.linspace(3, 5, int((5-3)/0.2+1)) x = sc.inclusiverange(stop=5) # Like np.arange(6) x = sc.inclusiverange(6, step=2) # Like np.arange(0, 7, 2)
- randround(x)[source]¶
Round a float, list, or array probabilistically to the nearest integer. Works for both positive and negative values.
- Parameters
x (int, list, arr) – the floating point numbers to probabilistically convert to the nearest integer
- Returns
Array of integers
Example:
sc.randround(np.random.randn(8)) # Returns e.g. array([-1, 0, 1, -2, 2, 0, 0, 0])
New in version 1.0.0.
- cat(*args, axis=None, copy=False, **kwargs)[source]¶
Like np.concatenate(), but takes anything and returns an array. Useful for e.g. appending a single number onto the beginning or end of an array.
- Parameters
args (any) – items to concatenate into an array
axis (int) – axis along which to concatenate
copy (bool) – whether or not to deepcopy the result
kwargs (dict) – passed to
np.array()
Examples:
arr = sc.cat(4, np.ones(3)) arr = sc.cat(np.array([1,2,3]), [4,5], 6) arr = sc.cat(np.random.rand(2,4), np.random.rand(2,6), axis=1)
New in version 1.0.0.New in version 1.1.0: “copy” and keyword arguments.
- rolling(data, window=7, operation='mean', **kwargs)[source]¶
Alias to Pandas’ rolling() (window) method to smooth a series.
- Parameters
data (list/arr) – the 1D or 2D data to be smoothed
window (int) – the length of the window
operation (str) – the operation to perform: ‘mean’ (default), ‘median’, ‘sum’, or ‘none’
kwargs (dict) – passed to pd.Series.rolling()
Example:
data = [5,5,5,0,0,0,0,7,7,7,7,0,0,3,3,3] rolled = sc.rolling(data)
- convolve(a, v)[source]¶
Like
np.convolve()
, but always returns an array the size of the first array (equivalent to mode=’same’), and solves the boundary problem present innp.convolve()
by adjusting the edges by the weight of the convolution kernel.- Parameters
a (arr) – the input array
v (arr) – the convolution kernel
Example:
a = np.ones(5) v = np.array([0.3, 0.5, 0.2]) c1 = np.convolve(a, v, mode='same') # Returns array([0.8, 1. , 1. , 1. , 0.7]) c2 = sc.convolve(a, v) # Returns array([1., 1., 1., 1., 1.])
New in version 1.3.0.New in version 1.3.1: handling the case where len(a) < len(v)
- smooth(data, repeats=None, kernel=None, legacy=False)[source]¶
Very simple function to smooth a 1D or 2D array.
See also
sc.gauss1d()
for simple Gaussian smoothing.- Parameters
data (arr) – 1D or 2D array to smooth
repeats (int) – number of times to apply smoothing (by default, scale to be 1/5th the length of data)
kernel (arr) – the smoothing kernel to use (default:
[0.25, 0.5, 0.25]
)legacy (bool) – if True, use the old (pre-1.3.0) method of calculation that doesn’t correct for edge effects
Example:
data = pl.randn(5,5) smoothdata = sc.smooth(data)
New in version 1.3.0: Fix edge effects.
- smoothinterp(newx=None, origx=None, origy=None, smoothness=None, growth=None, ensurefinite=False, keepends=True, method='linear')[source]¶
Smoothly interpolate over values and keep end points. Same format as numpy.interp().
- Parameters
newx (arr) – the points at which to interpolate
origx (arr) – the original x coordinates
origy (arr) – the original y coordinates
smoothness (float) – how much to smooth
growth (float) – the growth rate to apply past the ends of the data [deprecated]
ensurefinite (bool) – ensure all values are finite
keepends (bool) – whether to keep the ends [deprecated]
method (str) – the type of interpolation to use (options are ‘linear’ or ‘nearest’)
- Returns
the new y coordinates
- Return type
newy (arr)
Example:
origy = np.array([0,0.1,0.3,0.8,0.7,0.9,0.95,1]) origx = np.linspace(0,1,len(origy)) newx = np.linspace(0,1,5*len(origy)) newy = sc.smoothinterp(newx, origx, origy, smoothness=5) pl.plot(newx,newy) pl.scatter(origx,origy)
Version: 2018jan24
- gauss1d(x=None, y=None, xi=None, scale=None, use32=True)[source]¶
Gaussian 1D smoothing kernel.
Create smooth interpolation of input points at interpolated points. If no points are supplied, use the same as the input points.
- Parameters
x (arr) – 1D list of x coordinates
y (arr) – 1D list of y values at each of the x coordinates
xi (arr) – 1D list of points to calculate the interpolated y
scale (float) – how much smoothing to apply (by default, width of 5 data points)
use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)
Examples:
# Setup import pylab as pl x = pl.rand(40) y = (x-0.3)**2 + 0.2*pl.rand(40) # Smooth yi = sc.gauss1d(x, y) yi2 = sc.gauss1d(x, y, scale=0.3) xi3 = pl.linspace(0,1) yi3 = sc.gauss1d(x, y, xi) # Plot oiginal and interpolated versions pl.scatter(x, y, label='Original') pl.scatter(x, yi, label='Default smoothing') pl.scatter(x, yi2, label='More smoothing') pl.scatter(xi3, yi3, label='Uniform spacing') pl.show() # Simple usage sc.gauss1d(y)
New in version 1.3.0.
- gauss2d(x=None, y=None, z=None, xi=None, yi=None, scale=1.0, xscale=1.0, yscale=1.0, grid=False, use32=True)[source]¶
Gaussian 2D smoothing kernel.
Create smooth interpolation of input points at interpolated points. Can handle either 1D or 2D inputs.
- Parameters
x (arr) – 1D or 2D array of x coordinates (if None, take from z)
y (arr) – ditto, for y
z (arr) – 1D or 2D array of z values at each of the (x,y) points
xi (arr) – 1D or 2D array of points to calculate the interpolated Z; if None, same as x
yi (arr) – ditto, for y
scale (float) – overall scale factor
xscale (float) – ditto, just for x
yscale (float) – ditto, just for y
grid (bool) – if True, then return Z at a grid of (xi,yi) rather than at points
use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)
Examples:
# Setup import pylab as pl x = pl.rand(40) y = pl.rand(40) z = 1-(x-0.5)**2 + (y-0.5)**2 # Make a saddle # Simple usage -- only works if z is 2D zi0 = sc.gauss2d(pl.rand(10,10)) sc.surf3d(zi0) # Method 1 -- form grid xi = pl.linspace(0,1,20) yi = pl.linspace(0,1,20) zi = sc.gauss2d(x, y, z, xi, yi, scale=0.1, grid=True) # Method 2 -- use points directly xi2 = pl.rand(400) yi2 = pl.rand(400) zi2 = sc.gauss2d(x, y, z, xi2, yi2, scale=0.1) # Plot oiginal and interpolated versions sc.scatter3d(x, y, z, c=z) sc.surf3d(zi) sc.scatter3d(xi2, yi2, zi2, c=zi2) pl.show()
New in version 1.3.0.New in version 1.3.1: default arguments; support for 2D inputs