sciris.sc_math

Extensions to Numpy, including finding array elements and smoothing data.

Highlights:

Functions

approx

Determine whether two scalars (or an array and a scalar) approximately match.

cat

Like np.concatenate(), but takes anything and returns an array.

convolve

Like np.convolve(), but always returns an array the size of the first array (equivalent to mode='same'), and solves the boundary problem present in np.convolve() by adjusting the edges by the weight of the convolution kernel.

count

Count the number of matching elements.

dataindex

Take an array of data and return either the first or last (or some other) non-NaN entry.

fillnans

Alias for ``sc.sanitize(..., replacenans=True) with nearest interpolation (or a specified value).

findfirst

Alias for findinds(..., first=True).

findinds

Find matches even if two things aren't eactly equal (e.g.

findlast

Alias for findinds(..., last=True).

findnearest

Return the index of the nearest match in series to value -- like findinds, but always returns an object with the same type as value (i.e.

gauss1d

Gaussian 1D smoothing kernel.

gauss2d

Gaussian 2D smoothing kernel.

getvaliddata

Return the data value indices that are valid based on the validity of the input data.

getvalidinds

Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs.

inclusiverange

Like np.arange()/np.linspace(), but includes the start and stop points.

isprime

Determine if a number is prime.

linregress

Simple linear regression returning the line of best fit and R value.

normalize

Rescale an array between a minimum value and a maximum value.

normsum

Multiply a list or array by some normalizing factor so that its sum is equal to the total.

numdigits

Count the number of digits in a number (or list of numbers).

perturb

Define an array of numbers uniformly perturbed with a mean of 1.

randround

Round a float, list, or array probabilistically to the nearest integer.

rmnans

Sanitize input to remove NaNs.

rolling

Alias to pandas' rolling() (window) method to smooth a series.

safedivide

Handle divide-by-zero and divide-by-nan elegantly.

sanitize

Sanitize input to remove NaNs.

smooth

Very simple function to smooth a 1D or 2D array.

smoothinterp

Smoothly interpolate over values and keep end points.

approx(val1=None, val2=None, eps=None, **kwargs)[source]

Determine whether two scalars (or an array and a scalar) approximately match. Alias for np.isclose() and may be removed in future versions.

Parameters
  • val1 (number or array) – the first value

  • val2 (number) – the second value

  • eps (float) – absolute tolerance

  • kwargs (dict) – passed to np.isclose()

Examples:

sc.approx(2*6, 11.9999999, eps=1e-6) # Returns True
sc.approx([3,12,11.9], 12) # Returns array([False, True, False], dtype=bool)
safedivide(numerator=None, denominator=None, default=None, eps=None, warn=False)[source]

Handle divide-by-zero and divide-by-nan elegantly.

Examples:

sc.safedivide(numerator=0, denominator=0, default=1, eps=0) # Returns 1
sc.safedivide(numerator=5, denominator=2.0, default=1, eps=1e-3) # Returns 2.5
sc.safedivide(3, np.array([1,3,0]), -1, warn=True) # Returns array([ 3,  1, -1])
findinds(arr=None, val=None, *args, eps=1e-06, first=False, last=False, ind=None, die=True, **kwargs)[source]

Find matches even if two things aren’t eactly equal (e.g. floats vs. ints).

If one argument, find nonzero values. With two arguments, check for equality using eps. Returns a tuple of arrays if val1 is multidimensional, else returns an array. Similar to calling np.nonzero(np.isclose(arr, val))[0].

Parameters
  • arr (array) – the array to find values in

  • val (float) – if provided, the value to match

  • args (list) – if provided, additional boolean arrays

  • eps (float) – the precision for matching (default 1e-6, equivalent to np.isclose()’s atol)

  • first (bool) – whether to return the first matching value (equivalent to ind=0)

  • last (bool) – whether to return the last matching value (equivalent to ind=-1)

  • ind (int) – index of match to retrieve

  • die (bool) – whether to raise an exception if first or last is true and no matches were found

  • kwargs (dict) – passed to np.isclose()

Examples:

data = np.random.rand(10)
sc.findinds(data<0.5) # Standard usage; returns e.g. array([2, 4, 5, 9])
sc.findinds(data>0.1, data<0.5) # Multiple arguments

sc.findinds([2,3,6,3], 3) # Returs array([1,3])
sc.findinds([2,3,6,3], 3, first=True) # Returns 1
New in version 1.2.3: “die” argument
New in version 2.0.0: fix string matching; allow multiple arguments
findfirst(*args, **kwargs)[source]

Alias for findinds(…, first=True). New in version 1.0.0.

findlast(*args, **kwargs)[source]

Alias for findinds(…, last=True). New in version 1.0.0.

findnearest(series=None, value=None)[source]

Return the index of the nearest match in series to value – like findinds, but always returns an object with the same type as value (i.e. findnearest with a number returns a number, findnearest with an array returns an array).

Examples:

findnearest(rand(10), 0.5) # returns whichever index is closest to 0.5
findnearest([2,3,6,3], 6) # returns 2
findnearest([2,3,6,3], 6) # returns 2
findnearest([0,2,4,6,8,10], [3, 4, 5]) # returns array([1, 2, 2])

Version: 2017jan07

count(arr=None, val=None, eps=1e-06, **kwargs)[source]

Count the number of matching elements.

Similar to np.count_nonzero(), but allows for slight mismatches (e.g., floats vs. ints). Equivalent to len(sc.findinds()).

Parameters
  • arr (array) – the array to find values in

  • val (float) – if provided, the value to match

  • eps (float) – the precision for matching (default 1e-6, equivalent to np.isclose’s atol)

  • kwargs (dict) – passed to np.isclose()

Examples:

sc.count(rand(10)<0.5) # returns e.g. 4
sc.count([2,3,6,3], 3) # returs 2

New in version 2.0.0.

dataindex(dataarray, index)[source]

Take an array of data and return either the first or last (or some other) non-NaN entry.

This function is deprecated.

getvalidinds(data=None, filterdata=None)[source]

Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs. Warning, closely related to getvaliddata().

This function is deprecated.

Example:

getvalidinds([3,5,8,13], [2000, nan, nan, 2004]) # Returns array([0,3])
sanitize(data=None, returninds=False, replacenans=None, defaultval=None, die=True, verbose=False, label=None)[source]

Sanitize input to remove NaNs. (NB: sc.sanitize() and sc.rmnans() are aliases.)

Returns an array with the sanitized data. If replacenans=True, the sanitized array is of the same length/size as data. If replacenans=False, the sanitized array may be shorter than data.

Parameters
  • data (arr/list) – array or list with numbers to be sanitized

  • returninds (bool) – whether to return indices of non-nan/valid elements, indices are with respect the shape of data

  • replacenans (float/str) – whether to replace the NaNs with the specified value, or if True or a string, using interpolation

  • defaultval (float) – value to return if the sanitized array is empty

  • die (bool) – whether to raise an exception if the sanitization failed (otherwise return an empty array)

  • verbose (bool) – whether to print out a warning if no valid values are found

  • label (str) – human readable label for data (for use with verbose mode only)

Examples:

data = [3, 4, np.nan, 8, 2, np.nan, np.nan, 8]
sanitized1, inds = sc.sanitize(data, returninds=True) # Remove NaNs
sanitized2 = sc.sanitize(data, replacenans=True) # Replace NaNs using nearest neighbor interpolation
sanitized3 = sc.sanitize(data, replacenans='nearest') # Eequivalent to replacenans=True
sanitized4 = sc.sanitize(data, replacenans='linear') # Replace NaNs using linear interpolation
sanitized5 = sc.sanitize(data, replacenans=0) # Replace NaNs with 0

New in version 2.0.0: handle multidimensional arrays

rmnans(data=None, returninds=False, replacenans=None, defaultval=None, die=True, verbose=False, label=None)

Sanitize input to remove NaNs. (NB: sc.sanitize() and sc.rmnans() are aliases.)

Returns an array with the sanitized data. If replacenans=True, the sanitized array is of the same length/size as data. If replacenans=False, the sanitized array may be shorter than data.

Parameters
  • data (arr/list) – array or list with numbers to be sanitized

  • returninds (bool) – whether to return indices of non-nan/valid elements, indices are with respect the shape of data

  • replacenans (float/str) – whether to replace the NaNs with the specified value, or if True or a string, using interpolation

  • defaultval (float) – value to return if the sanitized array is empty

  • die (bool) – whether to raise an exception if the sanitization failed (otherwise return an empty array)

  • verbose (bool) – whether to print out a warning if no valid values are found

  • label (str) – human readable label for data (for use with verbose mode only)

Examples:

data = [3, 4, np.nan, 8, 2, np.nan, np.nan, 8]
sanitized1, inds = sc.sanitize(data, returninds=True) # Remove NaNs
sanitized2 = sc.sanitize(data, replacenans=True) # Replace NaNs using nearest neighbor interpolation
sanitized3 = sc.sanitize(data, replacenans='nearest') # Eequivalent to replacenans=True
sanitized4 = sc.sanitize(data, replacenans='linear') # Replace NaNs using linear interpolation
sanitized5 = sc.sanitize(data, replacenans=0) # Replace NaNs with 0

New in version 2.0.0: handle multidimensional arrays

fillnans(data=None, replacenans=True, **kwargs)[source]

Alias for ``sc.sanitize(…, replacenans=True) with nearest interpolation (or a specified value).

New in version 2.0.0.

Sanitize input to remove NaNs. (NB: sc.sanitize() and sc.rmnans() are aliases.)

Returns an array with the sanitized data. If replacenans=True, the sanitized array is of the same length/size as data. If replacenans=False, the sanitized array may be shorter than data.

Args:

data (arr/list) : array or list with numbers to be sanitized returninds (bool) : whether to return indices of non-nan/valid elements, indices are with respect the shape of data replacenans (float/str) : whether to replace the NaNs with the specified value, or if True or a string, using interpolation defaultval (float) : value to return if the sanitized array is empty die (bool) : whether to raise an exception if the sanitization failed (otherwise return an empty array) verbose (bool) : whether to print out a warning if no valid values are found label (str) : human readable label for data (for use with verbose mode only)

Examples:

data = [3, 4, np.nan, 8, 2, np.nan, np.nan, 8]
sanitized1, inds = sc.sanitize(data, returninds=True) # Remove NaNs
sanitized2 = sc.sanitize(data, replacenans=True) # Replace NaNs using nearest neighbor interpolation
sanitized3 = sc.sanitize(data, replacenans='nearest') # Eequivalent to replacenans=True
sanitized4 = sc.sanitize(data, replacenans='linear') # Replace NaNs using linear interpolation
sanitized5 = sc.sanitize(data, replacenans=0) # Replace NaNs with 0

New in version 2.0.0: handle multidimensional arrays

getvaliddata(data=None, filterdata=None, defaultind=0)[source]

Return the data value indices that are valid based on the validity of the input data.

This function is deprecated.

Example:

getvaliddata(array([3,5,8,13]), array([2000, nan, nan, 2004])) # Returns array([3,13])
isprime(n, verbose=False)[source]

Determine if a number is prime.

From https://stackoverflow.com/questions/15285534/isprime-function-for-python-language

Example:

for i in range(100): print(i) if sc.isprime(i) else None
numdigits(n, *args, count_minus=False, count_decimal=False)[source]

Count the number of digits in a number (or list of numbers).

Useful for e.g. knowing how long a string needs to be to fit a given number.

If a number is less than 1, return the number of digits until the decimal place.

Reference: https://stackoverflow.com/questions/22656345/how-to-count-the-number-of-digits-in-python

Parameters
  • n (int/float/list/array) – number or list of numbers

  • args (list) – additional numbers

  • count_minus (bool) – whether to count the minus sign as a digit

  • count_decimal (bool) – whether to count the decimal point as a digit

Examples:

sc.numdigits(12345) # Returns 5
sc.numdigits(12345.5) # Returns 5
sc.numdigits(0) # Returns 1
sc.numdigits(-12345) # Returns 5
sc.numdigits(-12345, count_minus=True) # Returns 6
sc.numdigits(12, 123, 12345) # Returns [2, 3, 5]
sc.numdigits(0.01) # Returns -2
sc.numdigits(0.01, count_decimal=True) # Returns -4

New in version 2.0.0.

perturb(n=1, span=0.5, randseed=None, normal=False)[source]

Define an array of numbers uniformly perturbed with a mean of 1.

Parameters
  • n (int) – number of points

  • span (float) – width of distribution on either side of 1

  • randseed (int) – seed passed to the reseed numpy’s legacy MT19937 BitGenerator

  • normal (bool) – whether to use a normal distribution instead of uniform

Example:

sc.perturb(5, 0.3) # Returns e.g. array([0.73852362, 0.7088094 , 0.93713658, 1.13150755, 0.87183371])
normsum(arr, total=None)[source]

Multiply a list or array by some normalizing factor so that its sum is equal to the total. Formerly called sc.scaleratio().

Parameters
  • arr (array) – array (or list) to normalize

  • total (float) – amount to sum to (default 1)

Example:

normarr = sc.normsum([2,5,3,10], 100) # Scale so sum equals 100; returns [10.0, 25.0, 15.0, 50.0]

Renamed in version 1.0.0.

normalize(arr, minval=0.0, maxval=1.0)[source]

Rescale an array between a minimum value and a maximum value.

Parameters
  • arr (array) – array to normalize

  • minval (float) – minimum value in rescaled array

  • maxval (float) – maximum value in rescaled array

Example:

normarr = sc.normalize([2,3,7,27]) # Returns array([0.  , 0.04, 0.2 , 1.  ])
inclusiverange(*args, **kwargs)[source]

Like np.arange()/np.linspace(), but includes the start and stop points. Accepts 0-3 args, or the kwargs start, stop, step.

In most cases, equivalent to np.linspace(start, stop, int((stop-start)/step)+1).

Parameters
  • start (float) – value to start at

  • stop (float) – value to stop at

  • step (float) – step size

  • kwargs (dict) – passed to np.linspace()

Examples:

x = sc.inclusiverange(10)        # Like np.arange(11)
x = sc.inclusiverange(3,5,0.2)   # Like np.linspace(3, 5, int((5-3)/0.2+1))
x = sc.inclusiverange(stop=5)    # Like np.arange(6)
x = sc.inclusiverange(6, step=2) # Like np.arange(0, 7, 2)
randround(x)[source]

Round a float, list, or array probabilistically to the nearest integer. Works for both positive and negative values.

Adapted from:

https://stackoverflow.com/questions/19045971/random-rounding-to-integer-in-python

Parameters

x (int, list, arr) – the floating point numbers to probabilistically convert to the nearest integer

Returns

Array of integers

Example:

sc.randround(np.random.randn(8)) # Returns e.g. array([-1,  0,  1, -2,  2,  0,  0,  0])

New in version 1.0.0.

cat(*args, copy=False, **kwargs)[source]

Like np.concatenate(), but takes anything and returns an array. Useful for e.g. appending a single number onto the beginning or end of an array.

Parameters
  • args (any) – items to concatenate into an array

  • kwargs (dict) – passed to np.concatenate()

Examples:

arr = sc.cat(4, np.ones(3))
arr = sc.cat(np.array([1,2,3]), [4,5], 6)
arr = sc.cat(np.random.rand(2,4), np.random.rand(2,6), axis=1)
New in version 1.0.0.
New in version 1.1.0: “copy” and keyword arguments.
New in version 2.0.2: removed “copy” argument; changed default axis of 0; arguments passed to np.concatenate()
linregress(x, y, full=False, **kwargs)[source]

Simple linear regression returning the line of best fit and R value. Similar to scipy.stats.linregress but simpler.

Parameters
  • x (array) – the x coordinates

  • y (array) – the y coordinates

  • full (bool) – whether to return a full data structure

  • kwargs (dict) – passed to np.polyfit()

Examples:

x = range(10)
y = sorted(2*np.random.rand(10) + 1)
m,b = sc.linregress(x, y) # Simple usage
out = sc.linregress(x, y, full=True) # Has out.m, out.b, out.x, out.y, out.corr, etc.
pl.scatter(x, y)
pl.plot(x, m*x+b)
pl.bar(x, out.residuals)
pl.title(f'R² = {out.r2}')
rolling(data, window=7, operation='mean', **kwargs)[source]

Alias to pandas’ rolling() (window) method to smooth a series.

Parameters
  • data (list/arr) – the 1D or 2D data to be smoothed

  • window (int) – the length of the window

  • operation (str) – the operation to perform: ‘mean’ (default), ‘median’, ‘sum’, or ‘none’

  • kwargs (dict) – passed to pd.Series.rolling()

Example:

data = [5,5,5,0,0,0,0,7,7,7,7,0,0,3,3,3]
rolled = sc.rolling(data)
convolve(a, v)[source]

Like np.convolve(), but always returns an array the size of the first array (equivalent to mode=’same’), and solves the boundary problem present in np.convolve() by adjusting the edges by the weight of the convolution kernel.

Parameters
  • a (arr) – the input array

  • v (arr) – the convolution kernel

Example:

a = np.ones(5)
v = np.array([0.3, 0.5, 0.2])
c1 = np.convolve(a, v, mode='same') # Returns array([0.8, 1.  , 1.  , 1.  , 0.7])
c2 = sc.convolve(a, v)              # Returns array([1., 1., 1., 1., 1.])
New in version 1.3.0.
New in version 1.3.1: handling the case where len(a) < len(v)
smooth(data, repeats=None, kernel=None, legacy=False)[source]

Very simple function to smooth a 1D or 2D array.

See also sc.gauss1d() for simple Gaussian smoothing.

Parameters
  • data (arr) – 1D or 2D array to smooth

  • repeats (int) – number of times to apply smoothing (by default, scale to be 1/5th the length of data)

  • kernel (arr) – the smoothing kernel to use (default: [0.25, 0.5, 0.25])

  • legacy (bool) – if True, use the old (pre-1.3.0) method of calculation that doesn’t correct for edge effects

Example:

data = pl.randn(5,5)
smoothdata = sc.smooth(data)

New in version 1.3.0: Fix edge effects.

smoothinterp(newx=None, origx=None, origy=None, smoothness=None, growth=None, ensurefinite=False, keepends=True, method='linear')[source]

Smoothly interpolate over values and keep end points. Same format as numpy.interp().

Parameters
  • newx (arr) – the points at which to interpolate

  • origx (arr) – the original x coordinates

  • origy (arr) – the original y coordinates

  • smoothness (float) – how much to smooth

  • growth (float) – the growth rate to apply past the ends of the data [deprecated]

  • ensurefinite (bool) – ensure all values are finite

  • keepends (bool) – whether to keep the ends [deprecated]

  • method (str) – the type of interpolation to use (options are ‘linear’ or ‘nearest’)

Returns

the new y coordinates

Return type

newy (arr)

Example:

origy = np.array([0,0.1,0.3,0.8,0.7,0.9,0.95,1])
origx = np.linspace(0,1,len(origy))
newx = np.linspace(0,1,5*len(origy))
newy = sc.smoothinterp(newx, origx, origy, smoothness=5)
pl.plot(newx,newy)
pl.scatter(origx,origy)

Version: 2018jan24

gauss1d(x=None, y=None, xi=None, scale=None, use32=True)[source]

Gaussian 1D smoothing kernel.

Create smooth interpolation of input points at interpolated points. If no points are supplied, use the same as the input points.

Parameters
  • x (arr) – 1D list of x coordinates

  • y (arr) – 1D list of y values at each of the x coordinates

  • xi (arr) – 1D list of points to calculate the interpolated y

  • scale (float) – how much smoothing to apply (by default, width of 5 data points)

  • use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)

Examples:

# Setup
import pylab as pl
x = pl.rand(40)
y = (x-0.3)**2 + 0.2*pl.rand(40)

# Smooth
yi = sc.gauss1d(x, y)
yi2 = sc.gauss1d(x, y, scale=0.3)
xi3 = pl.linspace(0,1)
yi3 = sc.gauss1d(x, y, xi)

# Plot oiginal and interpolated versions
pl.scatter(x, y,     label='Original')
pl.scatter(x, yi,    label='Default smoothing')
pl.scatter(x, yi2,   label='More smoothing')
pl.scatter(xi3, yi3, label='Uniform spacing')
pl.show()

# Simple usage
sc.gauss1d(y)

New in version 1.3.0.

gauss2d(x=None, y=None, z=None, xi=None, yi=None, scale=1.0, xscale=1.0, yscale=1.0, grid=False, use32=True)[source]

Gaussian 2D smoothing kernel.

Create smooth interpolation of input points at interpolated points. Can handle either 1D or 2D inputs.

Parameters
  • x (arr) – 1D or 2D array of x coordinates (if None, take from z)

  • y (arr) – ditto, for y

  • z (arr) – 1D or 2D array of z values at each of the (x,y) points

  • xi (arr) – 1D or 2D array of points to calculate the interpolated Z; if None, same as x

  • yi (arr) – ditto, for y

  • scale (float) – overall scale factor

  • xscale (float) – ditto, just for x

  • yscale (float) – ditto, just for y

  • grid (bool) – if True, then return Z at a grid of (xi,yi) rather than at points

  • use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)

Examples:

# Setup
import pylab as pl
x = pl.rand(40)
y = pl.rand(40)
z = 1-(x-0.5)**2 + (y-0.5)**2 # Make a saddle

# Simple usage -- only works if z is 2D
zi0 = sc.gauss2d(pl.rand(10,10))
sc.surf3d(zi0)

# Method 1 -- form grid
xi = pl.linspace(0,1,20)
yi = pl.linspace(0,1,20)
zi = sc.gauss2d(x, y, z, xi, yi, scale=0.1, grid=True)

# Method 2 -- use points directly
xi2 = pl.rand(400)
yi2 = pl.rand(400)
zi2 = sc.gauss2d(x, y, z, xi2, yi2, scale=0.1)

# Plot oiginal and interpolated versions
sc.scatter3d(x, y, z, c=z)
sc.surf3d(zi)
sc.scatter3d(xi2, yi2, zi2, c=zi2)
pl.show()
New in version 1.3.0.
New in version 1.3.1: default arguments; support for 2D inputs