sciris.sc_math

Extensions to Numpy, including finding array elements and smoothing data.

Highlights:
  • sc.findinds(): find indices of an array matching a condition

  • sc.findnearest(): find nearest matching value

  • sc.rolling(): calculate rolling average

  • sc.smooth(): simple smoothing of 1D or 2D arrays

Functions

approx

Determine whether two scalars (or an array and a scalar) approximately match.

cat

Like np.concatenate(), but takes anything and returns an array.

convolve

Like np.convolve(), but always returns an array the size of the first array (equivalent to mode='same'), and solves the boundary problem present in np.convolve() by adjusting the edges by the weight of the convolution kernel.

dataindex

Take an array of data and return either the first or last (or some other) non-NaN entry.

findfirst

Alias for findinds(..., first=True).

findinds

Little function to find matches even if two things aren't eactly equal (eg.

findlast

Alias for findinds(..., last=True).

findnearest

Return the index of the nearest match in series to value -- like findinds, but always returns an object with the same type as value (i.e.

gauss1d

Gaussian 1D smoothing kernel.

gauss2d

Gaussian 2D smoothing kernel.

getvaliddata

Return the data value indices that are valid based on the validity of the input data.

getvalidinds

Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs.

inclusiverange

Like np.arange()/np.linspace(), but includes the start and stop points.

isprime

Determine if a number is prime.

normalize

Rescale an array between a minimum value and a maximum value.

normsum

Multiply a list or array by some normalizing factor so that its sum is equal to the total.

perturb

Define an array of numbers uniformly perturbed with a mean of 1.

randround

Round a float, list, or array probabilistically to the nearest integer.

rolling

Alias to Pandas' rolling() (window) method to smooth a series.

safedivide

Handle divide-by-zero and divide-by-nan elegantly.

sanitize

Sanitize input to remove NaNs.

smooth

Very simple function to smooth a 1D or 2D array.

smoothinterp

Smoothly interpolate over values and keep end points.

approx(val1=None, val2=None, eps=None, **kwargs)[source]

Determine whether two scalars (or an array and a scalar) approximately match. Alias for np.isclose() and may be removed in future versions.

Parameters
  • val1 (number or array) – the first value

  • val2 (number) – the second value

  • eps (float) – absolute tolerance

  • kwargs (dict) – passed to np.isclose()

Examples:

sc.approx(2*6, 11.9999999, eps=1e-6) # Returns True
sc.approx([3,12,11.9], 12) # Returns array([False, True, False], dtype=bool)
safedivide(numerator=None, denominator=None, default=None, eps=None, warn=False)[source]

Handle divide-by-zero and divide-by-nan elegantly.

Examples:

sc.safedivide(numerator=0, denominator=0, default=1, eps=0) # Returns 1
sc.safedivide(numerator=5, denominator=2.0, default=1, eps=1e-3) # Returns 2.5
sc.safedivide(3, np.array([1,3,0]), -1, warn=True) # Returns array([ 3,  1, -1])
findinds(arr=None, val=None, eps=1e-06, first=False, last=False, die=True, **kwargs)[source]

Little function to find matches even if two things aren’t eactly equal (eg. due to floats vs. ints). If one argument, find nonzero values. With two arguments, check for equality using eps. Returns a tuple of arrays if val1 is multidimensional, else returns an array. Similar to calling np.nonzero(np.isclose(arr, val))[0].

Parameters
  • arr (array) – the array to find values in

  • val (float) – if provided, the value to match

  • eps (float) – the precision for matching (default 1e-6, equivalent to np.isclose’s atol)

  • first (bool) – whether to return the first matching value

  • last (bool) – whether to return the last matching value

  • die (bool) – whether to raise an exception if first or last is true and no matches were found

  • kwargs (dict) – passed to np.isclose()

Examples:

sc.findinds(rand(10)<0.5) # returns e.g. array([2, 4, 5, 9])
sc.findinds([2,3,6,3], 3) # returs array([1,3])
sc.findinds([2,3,6,3], 3, first=True) # returns 1

New in version 1.2.3: “die” argument

findfirst(*args, **kwargs)[source]

Alias for findinds(…, first=True). New in version 1.0.0.

findlast(*args, **kwargs)[source]

Alias for findinds(…, last=True). New in version 1.0.0.

findnearest(series=None, value=None)[source]

Return the index of the nearest match in series to value – like findinds, but always returns an object with the same type as value (i.e. findnearest with a number returns a number, findnearest with an array returns an array).

Examples:

findnearest(rand(10), 0.5) # returns whichever index is closest to 0.5
findnearest([2,3,6,3], 6) # returns 2
findnearest([2,3,6,3], 6) # returns 2
findnearest([0,2,4,6,8,10], [3, 4, 5]) # returns array([1, 2, 2])

Version: 2017jan07

dataindex(dataarray, index)[source]

Take an array of data and return either the first or last (or some other) non-NaN entry.

This function is deprecated.

getvalidinds(data=None, filterdata=None)[source]

Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs. Warning, closely related to getvaliddata().

This function is deprecated.

Example:

getvalidinds([3,5,8,13], [2000, nan, nan, 2004]) # Returns array([0,3])
sanitize(data=None, returninds=False, replacenans=None, die=True, defaultval=None, label=None, verbose=True)[source]

Sanitize input to remove NaNs. Warning, does not work on multidimensional data!!

Examples:

sanitized,inds = sanitize(array([3,4,nan,8,2,nan,nan,nan,8]), returninds=True)
sanitized = sanitize(array([3,4,nan,8,2,nan,nan,nan,8]), replacenans=True)
sanitized = sanitize(array([3,4,nan,8,2,nan,nan,nan,8]), replacenans=0)
getvaliddata(data=None, filterdata=None, defaultind=0)[source]

Return the data value indices that are valid based on the validity of the input data.

This function is deprecated.

Example:

getvaliddata(array([3,5,8,13]), array([2000, nan, nan, 2004])) # Returns array([3,13])
isprime(n, verbose=False)[source]

Determine if a number is prime.

From https://stackoverflow.com/questions/15285534/isprime-function-for-python-language

Example:

for i in range(100): print(i) if sc.isprime(i) else None
perturb(n=1, span=0.5, randseed=None, normal=False)[source]

Define an array of numbers uniformly perturbed with a mean of 1.

Parameters
  • n (int) – number of points

  • span (float) – width of distribution on either side of 1

  • normal (bool) – whether to use a normal distribution instead of uniform

Example:

sc.perturb(5, 0.3) # Returns e.g. array([0.73852362, 0.7088094 , 0.93713658, 1.13150755, 0.87183371])
normsum(arr, total=None)[source]

Multiply a list or array by some normalizing factor so that its sum is equal to the total. Formerly called sc.scaleratio().

Parameters
  • arr (array) – array (or list) to normalize

  • total (float) – amount to sum to (default 1)

Example:

normarr = sc.normsum([2,5,3,10], 100) # Scale so sum equals 100; returns [10.0, 25.0, 15.0, 50.0]

Renamed in version 1.0.0.

normalize(arr, minval=0.0, maxval=1.0)[source]

Rescale an array between a minimum value and a maximum value.

Parameters
  • arr (array) – array to normalize

  • minval (float) – minimum value in rescaled array

  • maxval (float) – maximum value in rescaled array

Example:

normarr = sc.normalize([2,3,7,27]) # Returns array([0.  , 0.04, 0.2 , 1.  ])
inclusiverange(*args, **kwargs)[source]

Like np.arange()/np.linspace(), but includes the start and stop points. Accepts 0-3 args, or the kwargs start, stop, step.

In most cases, equivalent to np.linspace(start, stop, int((stop-start)/step)+1).

Parameters
  • start (float) – value to start at

  • stop (float) – value to stop at

  • step (float) – step size

  • kwargs (dict) – passed to np.linspace()

Examples:

x = sc.inclusiverange(10)        # Like np.arange(11)
x = sc.inclusiverange(3,5,0.2)   # Like np.linspace(3, 5, int((5-3)/0.2+1))
x = sc.inclusiverange(stop=5)    # Like np.arange(6)
x = sc.inclusiverange(6, step=2) # Like np.arange(0, 7, 2)
randround(x)[source]

Round a float, list, or array probabilistically to the nearest integer. Works for both positive and negative values.

Adapted from:

https://stackoverflow.com/questions/19045971/random-rounding-to-integer-in-python

Parameters

x (int, list, arr) – the floating point numbers to probabilistically convert to the nearest integer

Returns

Array of integers

Example:

sc.randround(np.random.randn(8)) # Returns e.g. array([-1,  0,  1, -2,  2,  0,  0,  0])

New in version 1.0.0.

cat(*args, axis=None, copy=False, **kwargs)[source]

Like np.concatenate(), but takes anything and returns an array. Useful for e.g. appending a single number onto the beginning or end of an array.

Parameters
  • args (any) – items to concatenate into an array

  • axis (int) – axis along which to concatenate

  • copy (bool) – whether or not to deepcopy the result

  • kwargs (dict) – passed to np.array()

Examples:

arr = sc.cat(4, np.ones(3))
arr = sc.cat(np.array([1,2,3]), [4,5], 6)
arr = sc.cat(np.random.rand(2,4), np.random.rand(2,6), axis=1)
New in version 1.0.0.
New in version 1.1.0: “copy” and keyword arguments.
rolling(data, window=7, operation='mean', **kwargs)[source]

Alias to Pandas’ rolling() (window) method to smooth a series.

Parameters
  • data (list/arr) – the 1D or 2D data to be smoothed

  • window (int) – the length of the window

  • operation (str) – the operation to perform: ‘mean’ (default), ‘median’, ‘sum’, or ‘none’

  • kwargs (dict) – passed to pd.Series.rolling()

Example:

data = [5,5,5,0,0,0,0,7,7,7,7,0,0,3,3,3]
rolled = sc.rolling(data)
convolve(a, v)[source]

Like np.convolve(), but always returns an array the size of the first array (equivalent to mode=’same’), and solves the boundary problem present in np.convolve() by adjusting the edges by the weight of the convolution kernel.

Parameters
  • a (arr) – the input array

  • v (arr) – the convolution kernel

Example:

a = np.ones(5)
v = np.array([0.3, 0.5, 0.2])
c1 = np.convolve(a, v, mode='same') # Returns array([0.8, 1.  , 1.  , 1.  , 0.7])
c2 = sc.convolve(a, v)              # Returns array([1., 1., 1., 1., 1.])
New in version 1.3.0.
New in version 1.3.1: handling the case where len(a) < len(v)
smooth(data, repeats=None, kernel=None, legacy=False)[source]

Very simple function to smooth a 1D or 2D array.

See also sc.gauss1d() for simple Gaussian smoothing.

Parameters
  • data (arr) – 1D or 2D array to smooth

  • repeats (int) – number of times to apply smoothing (by default, scale to be 1/5th the length of data)

  • kernel (arr) – the smoothing kernel to use (default: [0.25, 0.5, 0.25])

  • legacy (bool) – if True, use the old (pre-1.3.0) method of calculation that doesn’t correct for edge effects

Example:

data = pl.randn(5,5)
smoothdata = sc.smooth(data)

New in version 1.3.0: Fix edge effects.

smoothinterp(newx=None, origx=None, origy=None, smoothness=None, growth=None, ensurefinite=False, keepends=True, method='linear')[source]

Smoothly interpolate over values and keep end points. Same format as numpy.interp().

Parameters
  • newx (arr) – the points at which to interpolate

  • origx (arr) – the original x coordinates

  • origy (arr) – the original y coordinates

  • smoothness (float) – how much to smooth

  • growth (float) – the growth rate to apply past the ends of the data [deprecated]

  • ensurefinite (bool) – ensure all values are finite

  • keepends (bool) – whether to keep the ends [deprecated]

  • method (str) – the type of interpolation to use (options are ‘linear’ or ‘nearest’)

Returns

the new y coordinates

Return type

newy (arr)

Example:

origy = np.array([0,0.1,0.3,0.8,0.7,0.9,0.95,1])
origx = np.linspace(0,1,len(origy))
newx = np.linspace(0,1,5*len(origy))
newy = sc.smoothinterp(newx, origx, origy, smoothness=5)
pl.plot(newx,newy)
pl.scatter(origx,origy)

Version: 2018jan24

gauss1d(x=None, y=None, xi=None, scale=None, use32=True)[source]

Gaussian 1D smoothing kernel.

Create smooth interpolation of input points at interpolated points. If no points are supplied, use the same as the input points.

Parameters
  • x (arr) – 1D list of x coordinates

  • y (arr) – 1D list of y values at each of the x coordinates

  • xi (arr) – 1D list of points to calculate the interpolated y

  • scale (float) – how much smoothing to apply (by default, width of 5 data points)

  • use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)

Examples:

# Setup
import pylab as pl
x = pl.rand(40)
y = (x-0.3)**2 + 0.2*pl.rand(40)

# Smooth
yi = sc.gauss1d(x, y)
yi2 = sc.gauss1d(x, y, scale=0.3)
xi3 = pl.linspace(0,1)
yi3 = sc.gauss1d(x, y, xi)

# Plot oiginal and interpolated versions
pl.scatter(x, y,     label='Original')
pl.scatter(x, yi,    label='Default smoothing')
pl.scatter(x, yi2,   label='More smoothing')
pl.scatter(xi3, yi3, label='Uniform spacing')
pl.show()

# Simple usage
sc.gauss1d(y)

New in version 1.3.0.

gauss2d(x=None, y=None, z=None, xi=None, yi=None, scale=1.0, xscale=1.0, yscale=1.0, grid=False, use32=True)[source]

Gaussian 2D smoothing kernel.

Create smooth interpolation of input points at interpolated points. Can handle either 1D or 2D inputs.

Parameters
  • x (arr) – 1D or 2D array of x coordinates (if None, take from z)

  • y (arr) – ditto, for y

  • z (arr) – 1D or 2D array of z values at each of the (x,y) points

  • xi (arr) – 1D or 2D array of points to calculate the interpolated Z; if None, same as x

  • yi (arr) – ditto, for y

  • scale (float) – overall scale factor

  • xscale (float) – ditto, just for x

  • yscale (float) – ditto, just for y

  • grid (bool) – if True, then return Z at a grid of (xi,yi) rather than at points

  • use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)

Examples:

# Setup
import pylab as pl
x = pl.rand(40)
y = pl.rand(40)
z = 1-(x-0.5)**2 + (y-0.5)**2 # Make a saddle

# Simple usage -- only works if z is 2D
zi0 = sc.gauss2d(pl.rand(10,10))
sc.surf3d(zi0)

# Method 1 -- form grid
xi = pl.linspace(0,1,20)
yi = pl.linspace(0,1,20)
zi = sc.gauss2d(x, y, z, xi, yi, scale=0.1, grid=True)

# Method 2 -- use points directly
xi2 = pl.rand(400)
yi2 = pl.rand(400)
zi2 = sc.gauss2d(x, y, z, xi2, yi2, scale=0.1)

# Plot oiginal and interpolated versions
sc.scatter3d(x, y, z, c=z)
sc.surf3d(zi)
sc.scatter3d(xi2, yi2, zi2, c=zi2)
pl.show()
New in version 1.3.0.
New in version 1.3.1: default arguments; support for 2D inputs