# sciris.sc_math¶

Extensions to Numpy, including finding array elements and smoothing data.

Highlights:

Functions

 `approx` Determine whether two scalars (or an array and a scalar) approximately match. `cat` Like np.concatenate(), but takes anything and returns an array. `convolve` Like `np.convolve()`, but always returns an array the size of the first array (equivalent to mode='same'), and solves the boundary problem present in `np.convolve()` by adjusting the edges by the weight of the convolution kernel. `count` Count the number of matching elements. `dataindex` Take an array of data and return either the first or last (or some other) non-NaN entry. `fillnans` Alias for ``sc.sanitize(..., replacenans=True) with nearest interpolation (or a specified value). `findfirst` Alias for findinds(..., first=True). `findinds` Find matches even if two things aren't eactly equal (e.g. `findlast` Alias for findinds(..., last=True). `findnearest` Return the index of the nearest match in series to value -- like findinds, but always returns an object with the same type as value (i.e. `gauss1d` Gaussian 1D smoothing kernel. `gauss2d` Gaussian 2D smoothing kernel. `getvaliddata` Return the data value indices that are valid based on the validity of the input data. `getvalidinds` Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs. `inclusiverange` Like `np.arange()`/`np.linspace()`, but includes the start and stop points. `isprime` Determine if a number is prime. `linregress` Simple linear regression returning the line of best fit and R value. `normalize` Rescale an array between a minimum value and a maximum value. `normsum` Multiply a list or array by some normalizing factor so that its sum is equal to the total. `numdigits` Count the number of digits in a number (or list of numbers). `perturb` Define an array of numbers uniformly perturbed with a mean of 1. `randround` Round a float, list, or array probabilistically to the nearest integer. `rmnans` Sanitize input to remove NaNs. `rolling` Alias to pandas' rolling() (window) method to smooth a series. `safedivide` Handle divide-by-zero and divide-by-nan elegantly. `sanitize` Sanitize input to remove NaNs. `smooth` Very simple function to smooth a 1D or 2D array. `smoothinterp` Smoothly interpolate over values and keep end points.
approx(val1=None, val2=None, eps=None, **kwargs)[source]

Determine whether two scalars (or an array and a scalar) approximately match. Alias for np.isclose() and may be removed in future versions.

Parameters
• val1 (number or array) – the first value

• val2 (number) – the second value

• eps (float) – absolute tolerance

• kwargs (dict) – passed to np.isclose()

Examples:

```sc.approx(2*6, 11.9999999, eps=1e-6) # Returns True
sc.approx([3,12,11.9], 12) # Returns array([False, True, False], dtype=bool)
```
safedivide(numerator=None, denominator=None, default=None, eps=None, warn=False)[source]

Handle divide-by-zero and divide-by-nan elegantly.

Examples:

```sc.safedivide(numerator=0, denominator=0, default=1, eps=0) # Returns 1
sc.safedivide(numerator=5, denominator=2.0, default=1, eps=1e-3) # Returns 2.5
sc.safedivide(3, np.array([1,3,0]), -1, warn=True) # Returns array([ 3,  1, -1])
```
findinds(arr=None, val=None, *args, eps=1e-06, first=False, last=False, ind=None, die=True, **kwargs)[source]

Find matches even if two things aren’t eactly equal (e.g. floats vs. ints).

If one argument, find nonzero values. With two arguments, check for equality using eps. Returns a tuple of arrays if val1 is multidimensional, else returns an array. Similar to calling `np.nonzero(np.isclose(arr, val))[0]`.

Parameters
• arr (array) – the array to find values in

• val (float) – if provided, the value to match

• args (list) – if provided, additional boolean arrays

• eps (float) – the precision for matching (default 1e-6, equivalent to `np.isclose()`’s atol)

• first (bool) – whether to return the first matching value (equivalent to ind=0)

• last (bool) – whether to return the last matching value (equivalent to ind=-1)

• ind (int) – index of match to retrieve

• die (bool) – whether to raise an exception if first or last is true and no matches were found

• kwargs (dict) – passed to `np.isclose()`

Examples:

```data = np.random.rand(10)
sc.findinds(data<0.5) # Standard usage; returns e.g. array([2, 4, 5, 9])
sc.findinds(data>0.1, data<0.5) # Multiple arguments

sc.findinds([2,3,6,3], 3) # Returs array([1,3])
sc.findinds([2,3,6,3], 3, first=True) # Returns 1
```
New in version 1.2.3: “die” argument
New in version 2.0.0: fix string matching; allow multiple arguments
findfirst(*args, **kwargs)[source]

Alias for findinds(…, first=True). New in version 1.0.0.

findlast(*args, **kwargs)[source]

Alias for findinds(…, last=True). New in version 1.0.0.

findnearest(series=None, value=None)[source]

Return the index of the nearest match in series to value – like findinds, but always returns an object with the same type as value (i.e. findnearest with a number returns a number, findnearest with an array returns an array).

Examples:

```findnearest(rand(10), 0.5) # returns whichever index is closest to 0.5
findnearest([2,3,6,3], 6) # returns 2
findnearest([2,3,6,3], 6) # returns 2
findnearest([0,2,4,6,8,10], [3, 4, 5]) # returns array([1, 2, 2])
```

Version: 2017jan07

count(arr=None, val=None, eps=1e-06, **kwargs)[source]

Count the number of matching elements.

Similar to `np.count_nonzero()`, but allows for slight mismatches (e.g., floats vs. ints). Equivalent to `len(sc.findinds())`.

Parameters
• arr (array) – the array to find values in

• val (float) – if provided, the value to match

• eps (float) – the precision for matching (default 1e-6, equivalent to np.isclose’s atol)

• kwargs (dict) – passed to `np.isclose()`

Examples:

```sc.count(rand(10)<0.5) # returns e.g. 4
sc.count([2,3,6,3], 3) # returs 2
```

New in version 2.0.0.

dataindex(dataarray, index)[source]

Take an array of data and return either the first or last (or some other) non-NaN entry.

This function is deprecated.

getvalidinds(data=None, filterdata=None)[source]

Return the indices that are valid based on the validity of the input data from an arbitrary number of 1-D vector inputs. Warning, closely related to getvaliddata().

This function is deprecated.

Example:

```getvalidinds([3,5,8,13], [2000, nan, nan, 2004]) # Returns array([0,3])
```
sanitize(data=None, returninds=False, replacenans=None, defaultval=None, die=True, verbose=False, label=None)[source]

Sanitize input to remove NaNs. (NB: `sc.sanitize()` and `sc.rmnans()` are aliases.)

Returns an array with the sanitized data. If `replacenans=True`, the sanitized array is of the same length/size as data. If `replacenans=False`, the sanitized array may be shorter than data.

Parameters
• data (arr/list) – array or list with numbers to be sanitized

• returninds (bool) – whether to return indices of non-nan/valid elements, indices are with respect the shape of data

• replacenans (float/str) – whether to replace the NaNs with the specified value, or if `True` or a string, using interpolation

• defaultval (float) – value to return if the sanitized array is empty

• die (bool) – whether to raise an exception if the sanitization failed (otherwise return an empty array)

• verbose (bool) – whether to print out a warning if no valid values are found

• label (str) – human readable label for data (for use with verbose mode only)

Examples:

```data = [3, 4, np.nan, 8, 2, np.nan, np.nan, 8]
sanitized1, inds = sc.sanitize(data, returninds=True) # Remove NaNs
sanitized2 = sc.sanitize(data, replacenans=True) # Replace NaNs using nearest neighbor interpolation
sanitized3 = sc.sanitize(data, replacenans='nearest') # Eequivalent to replacenans=True
sanitized4 = sc.sanitize(data, replacenans='linear') # Replace NaNs using linear interpolation
sanitized5 = sc.sanitize(data, replacenans=0) # Replace NaNs with 0
```

New in version 2.0.0: handle multidimensional arrays

rmnans(data=None, returninds=False, replacenans=None, defaultval=None, die=True, verbose=False, label=None)

Sanitize input to remove NaNs. (NB: `sc.sanitize()` and `sc.rmnans()` are aliases.)

Returns an array with the sanitized data. If `replacenans=True`, the sanitized array is of the same length/size as data. If `replacenans=False`, the sanitized array may be shorter than data.

Parameters
• data (arr/list) – array or list with numbers to be sanitized

• returninds (bool) – whether to return indices of non-nan/valid elements, indices are with respect the shape of data

• replacenans (float/str) – whether to replace the NaNs with the specified value, or if `True` or a string, using interpolation

• defaultval (float) – value to return if the sanitized array is empty

• die (bool) – whether to raise an exception if the sanitization failed (otherwise return an empty array)

• verbose (bool) – whether to print out a warning if no valid values are found

• label (str) – human readable label for data (for use with verbose mode only)

Examples:

```data = [3, 4, np.nan, 8, 2, np.nan, np.nan, 8]
sanitized1, inds = sc.sanitize(data, returninds=True) # Remove NaNs
sanitized2 = sc.sanitize(data, replacenans=True) # Replace NaNs using nearest neighbor interpolation
sanitized3 = sc.sanitize(data, replacenans='nearest') # Eequivalent to replacenans=True
sanitized4 = sc.sanitize(data, replacenans='linear') # Replace NaNs using linear interpolation
sanitized5 = sc.sanitize(data, replacenans=0) # Replace NaNs with 0
```

New in version 2.0.0: handle multidimensional arrays

fillnans(data=None, replacenans=True, **kwargs)[source]

Alias for ``sc.sanitize(…, replacenans=True) with nearest interpolation (or a specified value).

New in version 2.0.0.

Sanitize input to remove NaNs. (NB: `sc.sanitize()` and `sc.rmnans()` are aliases.)

Returns an array with the sanitized data. If `replacenans=True`, the sanitized array is of the same length/size as data. If `replacenans=False`, the sanitized array may be shorter than data.

Args:

data (arr/list) : array or list with numbers to be sanitized returninds (bool) : whether to return indices of non-nan/valid elements, indices are with respect the shape of data replacenans (float/str) : whether to replace the NaNs with the specified value, or if `True` or a string, using interpolation defaultval (float) : value to return if the sanitized array is empty die (bool) : whether to raise an exception if the sanitization failed (otherwise return an empty array) verbose (bool) : whether to print out a warning if no valid values are found label (str) : human readable label for data (for use with verbose mode only)

Examples:

```data = [3, 4, np.nan, 8, 2, np.nan, np.nan, 8]
sanitized1, inds = sc.sanitize(data, returninds=True) # Remove NaNs
sanitized2 = sc.sanitize(data, replacenans=True) # Replace NaNs using nearest neighbor interpolation
sanitized3 = sc.sanitize(data, replacenans='nearest') # Eequivalent to replacenans=True
sanitized4 = sc.sanitize(data, replacenans='linear') # Replace NaNs using linear interpolation
sanitized5 = sc.sanitize(data, replacenans=0) # Replace NaNs with 0
```

New in version 2.0.0: handle multidimensional arrays

getvaliddata(data=None, filterdata=None, defaultind=0)[source]

Return the data value indices that are valid based on the validity of the input data.

This function is deprecated.

Example:

```getvaliddata(array([3,5,8,13]), array([2000, nan, nan, 2004])) # Returns array([3,13])
```
isprime(n, verbose=False)[source]

Determine if a number is prime.

Example:

```for i in range(100): print(i) if sc.isprime(i) else None
```
numdigits(n, *args, count_minus=False, count_decimal=False)[source]

Count the number of digits in a number (or list of numbers).

Useful for e.g. knowing how long a string needs to be to fit a given number.

If a number is less than 1, return the number of digits until the decimal place.

Parameters
• n (int/float/list/array) – number or list of numbers

• args (list) – additional numbers

• count_minus (bool) – whether to count the minus sign as a digit

• count_decimal (bool) – whether to count the decimal point as a digit

Examples:

```sc.numdigits(12345) # Returns 5
sc.numdigits(12345.5) # Returns 5
sc.numdigits(0) # Returns 1
sc.numdigits(-12345) # Returns 5
sc.numdigits(-12345, count_minus=True) # Returns 6
sc.numdigits(12, 123, 12345) # Returns [2, 3, 5]
sc.numdigits(0.01) # Returns -2
sc.numdigits(0.01, count_decimal=True) # Returns -4
```

New in version 2.0.0.

perturb(n=1, span=0.5, randseed=None, normal=False)[source]

Define an array of numbers uniformly perturbed with a mean of 1.

Parameters
• n (int) – number of points

• span (float) – width of distribution on either side of 1

• randseed (int) – seed passed to the reseed numpy’s legacy MT19937 BitGenerator

• normal (bool) – whether to use a normal distribution instead of uniform

Example:

```sc.perturb(5, 0.3) # Returns e.g. array([0.73852362, 0.7088094 , 0.93713658, 1.13150755, 0.87183371])
```
normsum(arr, total=None)[source]

Multiply a list or array by some normalizing factor so that its sum is equal to the total. Formerly called `sc.scaleratio()`.

Parameters
• arr (array) – array (or list) to normalize

• total (float) – amount to sum to (default 1)

Example:

```normarr = sc.normsum([2,5,3,10], 100) # Scale so sum equals 100; returns [10.0, 25.0, 15.0, 50.0]
```

Renamed in version 1.0.0.

normalize(arr, minval=0.0, maxval=1.0)[source]

Rescale an array between a minimum value and a maximum value.

Parameters
• arr (array) – array to normalize

• minval (float) – minimum value in rescaled array

• maxval (float) – maximum value in rescaled array

Example:

```normarr = sc.normalize([2,3,7,27]) # Returns array([0.  , 0.04, 0.2 , 1.  ])
```
inclusiverange(*args, **kwargs)[source]

Like `np.arange()`/`np.linspace()`, but includes the start and stop points. Accepts 0-3 args, or the kwargs start, stop, step.

In most cases, equivalent to `np.linspace(start, stop, int((stop-start)/step)+1)`.

Parameters
• start (float) – value to start at

• stop (float) – value to stop at

• step (float) – step size

• kwargs (dict) – passed to `np.linspace()`

Examples:

```x = sc.inclusiverange(10)        # Like np.arange(11)
x = sc.inclusiverange(3,5,0.2)   # Like np.linspace(3, 5, int((5-3)/0.2+1))
x = sc.inclusiverange(stop=5)    # Like np.arange(6)
x = sc.inclusiverange(6, step=2) # Like np.arange(0, 7, 2)
```
randround(x)[source]

Round a float, list, or array probabilistically to the nearest integer. Works for both positive and negative values.

https://stackoverflow.com/questions/19045971/random-rounding-to-integer-in-python

Parameters

x (int, list, arr) – the floating point numbers to probabilistically convert to the nearest integer

Returns

Array of integers

Example:

```sc.randround(np.random.randn(8)) # Returns e.g. array([-1,  0,  1, -2,  2,  0,  0,  0])
```

New in version 1.0.0.

cat(*args, copy=False, **kwargs)[source]

Like np.concatenate(), but takes anything and returns an array. Useful for e.g. appending a single number onto the beginning or end of an array.

Parameters
• args (any) – items to concatenate into an array

• kwargs (dict) – passed to `np.concatenate()`

Examples:

```arr = sc.cat(4, np.ones(3))
arr = sc.cat(np.array([1,2,3]), [4,5], 6)
arr = sc.cat(np.random.rand(2,4), np.random.rand(2,6), axis=1)
```
New in version 1.0.0.
New in version 1.1.0: “copy” and keyword arguments.
New in version 2.0.2: removed “copy” argument; changed default axis of 0; arguments passed to `np.concatenate()`
linregress(x, y, full=False, **kwargs)[source]

Simple linear regression returning the line of best fit and R value. Similar to `scipy.stats.linregress` but simpler.

Parameters
• x (array) – the x coordinates

• y (array) – the y coordinates

• full (bool) – whether to return a full data structure

• kwargs (dict) – passed to `np.polyfit()`

Examples:

```x = range(10)
y = sorted(2*np.random.rand(10) + 1)
m,b = sc.linregress(x, y) # Simple usage
out = sc.linregress(x, y, full=True) # Has out.m, out.b, out.x, out.y, out.corr, etc.
pl.scatter(x, y)
pl.plot(x, m*x+b)
pl.bar(x, out.residuals)
pl.title(f'R² = {out.r2}')
```
rolling(data, window=7, operation='mean', **kwargs)[source]

Alias to pandas’ rolling() (window) method to smooth a series.

Parameters
• data (list/arr) – the 1D or 2D data to be smoothed

• window (int) – the length of the window

• operation (str) – the operation to perform: ‘mean’ (default), ‘median’, ‘sum’, or ‘none’

• kwargs (dict) – passed to pd.Series.rolling()

Example:

```data = [5,5,5,0,0,0,0,7,7,7,7,0,0,3,3,3]
rolled = sc.rolling(data)
```
convolve(a, v)[source]

Like `np.convolve()`, but always returns an array the size of the first array (equivalent to mode=’same’), and solves the boundary problem present in `np.convolve()` by adjusting the edges by the weight of the convolution kernel.

Parameters
• a (arr) – the input array

• v (arr) – the convolution kernel

Example:

```a = np.ones(5)
v = np.array([0.3, 0.5, 0.2])
c1 = np.convolve(a, v, mode='same') # Returns array([0.8, 1.  , 1.  , 1.  , 0.7])
c2 = sc.convolve(a, v)              # Returns array([1., 1., 1., 1., 1.])
```
New in version 1.3.0.
New in version 1.3.1: handling the case where len(a) < len(v)
smooth(data, repeats=None, kernel=None, legacy=False)[source]

Very simple function to smooth a 1D or 2D array.

See also `sc.gauss1d()` for simple Gaussian smoothing.

Parameters
• data (arr) – 1D or 2D array to smooth

• repeats (int) – number of times to apply smoothing (by default, scale to be 1/5th the length of data)

• kernel (arr) – the smoothing kernel to use (default: `[0.25, 0.5, 0.25]`)

• legacy (bool) – if True, use the old (pre-1.3.0) method of calculation that doesn’t correct for edge effects

Example:

```data = pl.randn(5,5)
smoothdata = sc.smooth(data)
```

New in version 1.3.0: Fix edge effects.

smoothinterp(newx=None, origx=None, origy=None, smoothness=None, growth=None, ensurefinite=False, keepends=True, method='linear')[source]

Smoothly interpolate over values and keep end points. Same format as numpy.interp().

Parameters
• newx (arr) – the points at which to interpolate

• origx (arr) – the original x coordinates

• origy (arr) – the original y coordinates

• smoothness (float) – how much to smooth

• growth (float) – the growth rate to apply past the ends of the data [deprecated]

• ensurefinite (bool) – ensure all values are finite

• keepends (bool) – whether to keep the ends [deprecated]

• method (str) – the type of interpolation to use (options are ‘linear’ or ‘nearest’)

Returns

the new y coordinates

Return type

newy (arr)

Example:

```origy = np.array([0,0.1,0.3,0.8,0.7,0.9,0.95,1])
origx = np.linspace(0,1,len(origy))
newx = np.linspace(0,1,5*len(origy))
newy = sc.smoothinterp(newx, origx, origy, smoothness=5)
pl.plot(newx,newy)
pl.scatter(origx,origy)
```

Version: 2018jan24

gauss1d(x=None, y=None, xi=None, scale=None, use32=True)[source]

Gaussian 1D smoothing kernel.

Create smooth interpolation of input points at interpolated points. If no points are supplied, use the same as the input points.

Parameters
• x (arr) – 1D list of x coordinates

• y (arr) – 1D list of y values at each of the x coordinates

• xi (arr) – 1D list of points to calculate the interpolated y

• scale (float) – how much smoothing to apply (by default, width of 5 data points)

• use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)

Examples:

```# Setup
import pylab as pl
x = pl.rand(40)
y = (x-0.3)**2 + 0.2*pl.rand(40)

# Smooth
yi = sc.gauss1d(x, y)
yi2 = sc.gauss1d(x, y, scale=0.3)
xi3 = pl.linspace(0,1)
yi3 = sc.gauss1d(x, y, xi)

# Plot oiginal and interpolated versions
pl.scatter(x, y,     label='Original')
pl.scatter(x, yi,    label='Default smoothing')
pl.scatter(x, yi2,   label='More smoothing')
pl.scatter(xi3, yi3, label='Uniform spacing')
pl.show()

# Simple usage
sc.gauss1d(y)
```

New in version 1.3.0.

gauss2d(x=None, y=None, z=None, xi=None, yi=None, scale=1.0, xscale=1.0, yscale=1.0, grid=False, use32=True)[source]

Gaussian 2D smoothing kernel.

Create smooth interpolation of input points at interpolated points. Can handle either 1D or 2D inputs.

Parameters
• x (arr) – 1D or 2D array of x coordinates (if None, take from z)

• y (arr) – ditto, for y

• z (arr) – 1D or 2D array of z values at each of the (x,y) points

• xi (arr) – 1D or 2D array of points to calculate the interpolated Z; if None, same as x

• yi (arr) – ditto, for y

• scale (float) – overall scale factor

• xscale (float) – ditto, just for x

• yscale (float) – ditto, just for y

• grid (bool) – if True, then return Z at a grid of (xi,yi) rather than at points

• use32 (bool) – convert arrays to 32-bit floats (doubles speed for large arrays)

Examples:

```# Setup
import pylab as pl
x = pl.rand(40)
y = pl.rand(40)
z = 1-(x-0.5)**2 + (y-0.5)**2 # Make a saddle

# Simple usage -- only works if z is 2D
zi0 = sc.gauss2d(pl.rand(10,10))
sc.surf3d(zi0)

# Method 1 -- form grid
xi = pl.linspace(0,1,20)
yi = pl.linspace(0,1,20)
zi = sc.gauss2d(x, y, z, xi, yi, scale=0.1, grid=True)

# Method 2 -- use points directly
xi2 = pl.rand(400)
yi2 = pl.rand(400)
zi2 = sc.gauss2d(x, y, z, xi2, yi2, scale=0.1)

# Plot oiginal and interpolated versions
sc.scatter3d(x, y, z, c=z)
sc.surf3d(zi)
sc.scatter3d(xi2, yi2, zi2, c=zi2)
pl.show()
```
New in version 1.3.0.
New in version 1.3.1: default arguments; support for 2D inputs