sciris.sc_dataframe

Extension of the pandas dataframe to be more flexible, especially with filtering rows/columns and concatenating data.

Classes

dataframe

An extension of the pandas dataframe with additional convenience methods for accessing rows and columns and performing other operations.

class dataframe(data=None, columns=None, nrows=None, **kwargs)[source]

An extension of the pandas dataframe with additional convenience methods for accessing rows and columns and performing other operations.

Parameters
  • data (dict/array/dataframe) – the data to use

  • columns (list) – column labels

  • nrows (int) – the number of rows to preallocate (default 0)

  • kwargs (dict) – passed to pd.DataFrame()

Examples:

a = sc.dataframe(cols=['x','y'],data=[[1238,2],[384,5],[666,7]]) # Create data frame
a['x'] # Print out a column
a[0] # Print out a row
a['x',0] # Print out an element
a[0] = [123,6]; print(a) # Set values for a whole row
a['y'] = [8,5,0]; print(a) # Set values for a whole column
a['z'] = [14,14,14]; print(a) # Add new column
a.addcol('z', [14,14,14]); print(a) # Alternate way to add new column
a.rmcol('z'); print(a) # Remove a column
a.pop(1); print(a) # Remove a row
a.append([555,2,14]); print(a) # Append a new row
a.insert(1,[555,2,14]); print(a) # Insert a new row
a.sort(); print(a) # Sort by the first column
a.sort('y'); print(a) # Sort by the second column
a.addrow([555,2,14]); print(a) # Replace the previous row and sort
a.getrow(1) # Return the row starting with value '1'
a.rmrow(); print(a) # Remove last row
a.rmrow(1238); print(a) # Remove the row starting with element '1238'

The dataframe can be used for both numeric and non-numeric data.

New in version 2.0.0: subclass pandas DataFrame

property cols

Get columns as a list

set(key, value=None)[source]

Alias to pandas __setitem__ method

flexget(cols=None, rows=None, asarray=False, cast=True, default=None)[source]

A more flexible way of getting data from a dataframe. Whereas indexing directly by key usually returns the underlying array data, this method usually returns another dataframe.

Parameters
  • cols (str/list) – the column(s) to get

  • rows (int/list) – the row(s) to get

  • asarray (bool) – whether to return an array (otherwise, return a dataframe)

  • cast (bool) – attempt to cast to an all-numeric array

  • default (any) – the value to return if the column(s)/row(s) can’t be found
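
As a rough point of comparison, the same kind of row/column subset can be selected in plain pandas, as in the sketch below. The helper name flexget_sketch is hypothetical; it assumes integer row positions and existing column labels, and it does not reproduce the cast or default handling listed above.

```python
import pandas as pd

def flexget_sketch(df, cols=None, rows=None, asarray=False):
    """Hypothetical plain-pandas approximation, not the sciris implementation."""
    if cols is None:
        cols = list(df.columns)
    elif isinstance(cols, str):
        cols = [cols]
    if rows is None:
        rows = list(range(len(df)))
    elif isinstance(rows, int):
        rows = [rows]
    out = df.iloc[list(rows)][list(cols)]  # positional rows first, then labeled columns
    return out.to_numpy() if asarray else out

df = pd.DataFrame([[1238, 2, -1], [384, 5, -2], [666, 7, -3]], columns=['x', 'y', 'z'])
sub = flexget_sketch(df, cols=['x', 'z'], rows=[0, 2])          # a smaller dataframe
arr = flexget_sketch(df, cols='y', asarray=True)                # a NumPy array instead
```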

Example:

df = sc.dataframe(cols=['x','y','z'],data=[[1238,2,-1],[384,5,-2],[666,7,-3]]) # Create data frame
df.flexget(cols=['x','z'], rows=[0,2])

disp(nrows=None, ncols=None, width=999, precision=4, options=None)[source]

Flexible display of a dataframe, showing all rows/columns by default.

Parameters
  • nrows (int) – maximum number of rows to show (default: all)

  • ncols (int) – maximum number of columns to show (default: all)

  • width (int) – maximum screen width (default: 999)

  • precision (int) – number of decimal places to show (default: 4)

  • options (dict) – passed to pd.option_context()

Examples:

df = sc.dataframe(data=np.random.rand(100,10))
df.disp()
df.disp(precision=1, ncols=5, options={'display.colheader_justify': 'left'})

New in version 2.0.1.
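
For readers who want to see how such a display could be built on pd.option_context(), here is a minimal sketch. The helper disp_sketch is hypothetical and only assumes that the parameters map onto the standard pandas display options; the actual implementation may differ.

```python
import numpy as np
import pandas as pd

def disp_sketch(df, nrows=None, ncols=None, width=999, precision=4, options=None):
    """Hypothetical helper: render a frame without truncation and return the text."""
    opts = {
        'display.max_rows': nrows,      # None means no row limit
        'display.max_columns': ncols,   # None means no column limit
        'display.width': width,
        'display.precision': precision,
    }
    opts.update(options or {})          # extra pandas display options, if any
    flat = [item for pair in opts.items() for item in pair]  # option_context takes key, value, key, value, ...
    with pd.option_context(*flat):
        return repr(df)

df = pd.DataFrame(np.random.rand(100, 10))
text = disp_sketch(df, precision=1, options={'display.colheader_justify': 'left'})
```

The options are restored to their previous values as soon as the with block exits, so the global pandas display settings are left untouched.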

poprow(key, returnval=True)[source]

Remove a row from the data frame

replacedata(newdata=None, newdf=None, reset_index=True, inplace=True)[source]

Replace data in the dataframe with other data

Parameters
  • newdata (array) – replace the dataframe’s data with these data

  • newdf (dataframe) – substitute the current dataframe with this one

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place

appendrow(value, reset_index=True, inplace=True)[source]

Add a row to the end of the dataframe. See also concat() and insertrow().

Parameters
  • value (array) – the row(s) to append

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place

insertrow(row=0, value=None, reset_index=True, inplace=True)[source]

Insert a row at the specified location. See also concat() and appendrow().

Parameters
  • row (int) – index at which to insert new row(s)

  • value (array) – the row(s) to insert

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place
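
The effect of inserting a row at a given position can be approximated in plain pandas by splitting the frame and concatenating the pieces. This is only a sketch: insertrow_sketch is a hypothetical helper, and unlike the method documented above it always returns a new frame rather than modifying in place.

```python
import pandas as pd

def insertrow_sketch(df, row=0, value=None, reset_index=True):
    """Hypothetical helper: return a new frame with `value` inserted at position `row`."""
    new = pd.DataFrame([value], columns=df.columns)
    # Rows before the insertion point, the new row, then the remaining rows
    return pd.concat([df.iloc[:row], new, df.iloc[row:]], ignore_index=reset_index)

df = pd.DataFrame([[1238, 2], [384, 5], [666, 7]], columns=['x', 'y'])
inserted = insertrow_sketch(df, row=1, value=[555, 2])
```

With reset_index=True the result is renumbered from zero, which is the role the reset_index argument plays in the signatures above.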

concat(data, *args, columns=None, reset_index=True, inplace=False, dfargs=None, **kwargs)[source]

Concatenate additional data onto the current dataframe. See also appendrow() and insertrow().

Parameters
  • data (dataframe/array) – the data to concatenate

  • *args (dataframe/array) – additional data to concatenate

  • columns (list) – if supplied, columns to go with the data

  • reset_index (bool) – update the index

  • inplace (bool) – whether to append in place

  • dfargs (dict) – arguments passed to construct each dataframe

  • **kwargs (dict) – passed to pd.concat()

New in version 2.0.2: “inplace” defaults to False

static cat(data, *args, dfargs=None, **kwargs)[source]

Convenience method for concatenating multiple dataframes.

Parameters
  • data (dataframe/array) – the dataframe/data to use as the basis of the new dataframe

  • args (list) – additional dataframes (or objects that can be converted to dataframes) to concatenate

  • dfargs (dict) – arguments passed to construct each dataframe

  • kwargs (dict) – passed to sc.dataframe.concat()

Example:

arr1 = np.random.rand(6,3)
df2 = pd.DataFrame(np.random.rand(4,3))
df3 = sc.dataframe.cat(arr1, df2)

New in version 2.0.2.
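
A plain-pandas approximation of this kind of concatenation is sketched below. The helper cat_sketch is hypothetical; it simply assumes that each input can be passed to pd.DataFrame() and that the rows are stacked with a fresh index.

```python
import numpy as np
import pandas as pd

def cat_sketch(data, *args, **kwargs):
    """Hypothetical helper: coerce every input to a DataFrame, then stack the rows."""
    frames = [pd.DataFrame(d) for d in (data, *args)]
    return pd.concat(frames, ignore_index=True, **kwargs)

arr1 = np.random.rand(6, 3)
df2 = pd.DataFrame(np.random.rand(4, 3))
df3 = cat_sketch(arr1, df2)  # 6 rows from the array followed by 4 rows from the frame
```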

property ncols

Get the number of columns in the dataframe

property nrows

Get the number of rows in the dataframe

addcol(key=None, value=None)[source]

Add a new column to the dataframe (provided for consistency only)

rmcol(key, die=True)[source]

Remove a column or columns from the data frame

rmrow(value=None, col=None, returnval=False, die=True)[source]

Like pop, but removes by matching the value in the given column instead of the index

rmrows(inds=None, reset_index=True, inplace=True)[source]

Remove rows by index

replacecol(col=None, old=None, new=None)[source]

Replace all of one value in a column with a new value
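
In plain pandas, replacing one value throughout a single column is usually written with Series.replace(). The snippet below is an illustrative equivalent using made-up data, not the sciris implementation.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 1, 3], 'y': ['a', 'b', 'a', 'c']})
df['x'] = df['x'].replace(1, 99)     # swap every 1 in column x for 99
df['y'] = df['y'].replace('a', 'z')  # the same approach works for non-numeric values
```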

to_odict(row=None)[source]

Convert dataframe to a dict of columns, optionally specifying certain rows.

Parameters
  • row (int/list) – the rows to include

findrow(value=None, col=None, default=None, closest=False, die=False, asdict=False)[source]

Return a row by searching for a matching value.

Parameters
  • value – the value to look for

  • col – the column to look for this value in

  • default – the value to return if the value is not found (overrides die)

  • closest – whether or not to return the closest row (overrides default and die)

  • die – whether to raise an exception if the value is not found

  • asdict – whether to return results as dict rather than list
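
The interplay of exact matching, default, and closest can be sketched in plain pandas and NumPy as follows. The helper findrow_sketch is hypothetical; it assumes that "closest" means the smallest absolute difference in a numeric column and that the first column is searched when col is not given.

```python
import numpy as np
import pandas as pd

def findrow_sketch(df, value, col=None, closest=False, default=None):
    """Hypothetical helper: return the first matching (or nearest) row as a dict."""
    col = df.columns[0] if col is None else col
    column = df[col].to_numpy()
    if closest:
        ind = int(np.argmin(np.abs(column - value)))  # nearest value; assumes numeric data
    else:
        matches = np.flatnonzero(column == value)
        if len(matches) == 0:
            return default                            # nothing found
        ind = int(matches[0])
    return df.iloc[ind].to_dict()

df = pd.DataFrame([[2016, 0.3], [2017, 0.5]], columns=['year', 'val'])
exact = findrow_sketch(df, 2016)                   # the 2016 row
missing = findrow_sketch(df, 2013)                 # no match, so the default (None)
nearest = findrow_sketch(df, 2013, closest=True)   # 2016 is the closest year to 2013
```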

Example:

df = dataframe(cols=['year','val'],data=[[2016,0.3],[2017,0.5]])
df.findrow(2016) # returns array([2016, 0.3], dtype=object)
df.findrow(2013) # returns None, or exception if die is True
df.findrow(2013, closest=True) # returns array([2016, 0.3], dtype=object)
df.findrow(2016, asdict=True) # returns {'year':2016, 'val':0.3}

findinds(value=None, col=None)[source]

Return the indices of all rows matching the given value in a given column.

filterin(inds=None, value=None, col=None, verbose=False, reset_index=True, inplace=False)[source]

Keep only rows matching a criterion
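
Filtering of this kind corresponds to boolean masking in plain pandas. The example below uses made-up data and is a sketch of the idea rather than the sciris implementation; inverting the mask gives the complementary selection.

```python
import pandas as pd

df = pd.DataFrame({'year': [2016, 2017, 2016, 2018], 'val': [0.3, 0.5, 0.7, 0.9]})
mask = df['year'] == 2016                    # rows matching the criterion
kept = df[mask].reset_index(drop=True)       # keep only the matches
dropped = df[~mask].reset_index(drop=True)   # the complement: everything else
```

Both results are new frames, so the original df is unchanged, in line with the inplace=False default in the signature.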

filterout(inds=None, value=None, col=None, verbose=False, reset_index=True, inplace=False)[source]

Remove rows matching a criterion (not in place by default; set inplace=True to modify the dataframe)

filtercols(cols=None, die=True, reset_index=True, inplace=False)[source]

Keep only the specified columns. By default, this is not performed in place.

sortrows(col=None, reverse=False, returninds=False)[source]

Sort the dataframe rows in place by the specified column(s)

sortcols(sortorder=None, reverse=False, returninds=False)[source]

Like sortrows(), but change column order (in place) instead
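
For comparison, row and column sorting in plain pandas can be written as below. This is an illustrative sketch with made-up data; note that these pandas calls return new frames, whereas the two methods above are described as sorting in place.

```python
import pandas as pd

df = pd.DataFrame({'y': [7, 2, 5], 'x': [666, 1238, 384]})

# Sort rows by column x in descending order (comparable to reverse=True)
by_rows = df.sort_values('x', ascending=False).reset_index(drop=True)

# Reorder the columns alphabetically, leaving the rows as they are
by_cols = df[sorted(df.columns)]
```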

to_pandas(**kwargs)[source]

Convert to a plain pandas dataframe