sciris.sc_fileio

Functions for reading/writing to files, including pickles, JSONs, and Excel.

Highlights:
  • sc.saveobj()/sc.loadobj(): efficiently save/load any Python object (via pickling)

  • sc.savejson()/sc.loadjson(): likewise, for JSONs

  • sc.thisdir(): get current folder

  • sc.getfilelist(): easy way to access glob

Functions

dumpstr

Dump an object as a bytes-like string (rarely used); see sc.loadstr()

getfilelist

A shortcut for using glob.

jsonify

This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).

jsonpickle

Convert any Python object to a JSON representation using jsonpickle.

jsonunpickle

Use jsonpickle to restore an object (see jsonpickle())

load

Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj().

loadjson

Convenience function for reading a JSON file (or string).

loadobj

Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj().

loadobj2or3

Try to load as a (Sciris-saved) Python 3 pickle; if that fails, try to load as a Python 2 pickle.

loadspreadsheet

Load a spreadsheet as a dataframe or a list of lists.

loadstr

Like loadobj(), but for a bytes-like string (rarely used).

loadtext

Convenience function for reading a text file

makefailed

Create a class -- not an object! -- that contains the failure info for a pickle that failed to load

makefilepath

Utility for taking a filename and folder -- or not -- and generating a valid path from them.

path

Alias to pathlib.Path().

pickleMethod

sanitizefilename

Takes a potentially Linux- and Windows-unfriendly candidate file name, and returns a "sanitized" version that is more usable.

sanitizejson

This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).

save

Save an object to file as a gzipped pickle -- use compression 5 by default, since more is much slower but not much smaller.

savejson

Convenience function for saving to a JSON file.

saveobj

Save an object to file as a gzipped pickle -- use compression 5 by default, since more is much slower but not much smaller.

savespreadsheet

Not-so-little function to format data nicely for Excel.

savetext

Convenience function for saving a text file -- accepts a string or list of strings.

savezip

Create a zip file from the supplied list of files

thisdir

Tiny helper function to get the folder for a file, usually the current file.

unpickleMethod

Classes

Blobject

A wrapper for a binary file -- rarely used directly.

Empty

Another empty class to represent a failed object loading, but do not proceed with setstate

Failed

An empty class to represent a failed object loading

Spreadsheet

A class for reading and writing Excel files in binary format.

UniversalFailed

A universal failed object class, that preserves as much data as possible

loadobj(filename=None, folder=None, verbose=False, die=None, remapping=None, method='pickle', **kwargs)

Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj(). Accepts either a filename (standard usage) or a file object as the first argument. Note that loadobj()/load() are aliases of each other.

Note: be careful when loading pickle files, since a malicious pickle can be used to execute arbitrary code.

When a pickle file is loaded, Python imports any modules that are referenced in it. This is a problem if a module has been renamed. In this case, you can use the remapping argument to point to the new modules or classes.

Parameters
  • filename (str/Path) – the filename (or full path) to load

  • folder (str/Path) – the folder

  • verbose (bool) – print details

  • die (bool) – whether to raise an exception if errors are encountered (otherwise, load as much as possible)

  • remapping (dict) – way of mapping old/unavailable module names to new

  • method (str) – method for loading (usually pickle or dill)

  • kwargs (dict) – passed to pickle.loads()/dill.loads()

Examples:

obj = sc.loadobj('myfile.obj') # Standard usage
old = sc.loadobj('my-old-file.obj', method='dill', ignore=True) # Load classes from saved files
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':cat.Mat}) # If loading a saved object containing a reference to foo.Bar that is now cat.Mat
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':('cat', 'Mat')}) # Equivalent to the above
New in version 1.1.0: “remapping” argument
New in version 1.2.2: ability to load non-gzipped pickles; support for dill; arguments passed to loader
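
To illustrate what the remapping argument has to accomplish, here is a minimal standard-library sketch. It is not Sciris's actual implementation: RemappingUnpickler is a hypothetical helper, and the module foo and the classes Bar and Mat are made up for the example, echoing the names used above.

```python
import io
import pickle
import sys
import types

class RemappingUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that redirects old 'module.Class' names to new classes."""

    def __init__(self, file, remapping=None):
        super().__init__(file)
        self.remapping = remapping or {}

    def find_class(self, module, name):
        target = self.remapping.get(f'{module}.{name}')
        if target is None:
            return super().find_class(module, name)  # Normal lookup
        if isinstance(target, tuple):  # ('new_module', 'NewClass') form
            return super().find_class(*target)
        return target  # Already a class object

# Simulate an old pickle: create a temporary module "foo" containing a class "Bar"
foo = types.ModuleType('foo')
foo.Bar = type('Bar', (), {'__module__': 'foo'})
sys.modules['foo'] = foo
old = foo.Bar()
old.value = 42
data = pickle.dumps(old)
del sys.modules['foo']  # From now on, the module "no longer exists"

class Mat:
    """The renamed class that should stand in for foo.Bar."""

restored = RemappingUnpickler(io.BytesIO(data), remapping={'foo.Bar': Mat}).load()
print(type(restored).__name__, restored.value)
```

Overriding find_class() is the hook that the pickle module documents for controlling which classes get loaded, so it is a natural place to apply a mapping like this.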

loadstr(string, verbose=False, die=None, remapping=None)

Like loadobj(), but for a bytes-like string (rarely used).

Example:

obj1 = sc.objdict(a=1, b=2)
string = sc.dumpstr(obj1) # Bytes-like string representing the object
obj2 = sc.loadstr(string) # Restore the object from the string
assert obj1 == obj2

saveobj(filename=None, obj=None, compresslevel=5, verbose=0, folder=None, method='pickle', die=True, *args, **kwargs)

Save an object to file as a gzipped pickle – use compression 5 by default, since more is much slower but not much smaller. Once saved, can be loaded with sc.loadobj(). Note that saveobj()/save() are identical.

Parameters
  • filename (str or Path) – the filename to save to; if str, passed to sc.makefilepath()

  • obj (literally anything) – the object to save

  • compresslevel (int) – the level of gzip compression

  • verbose (int) – detail to print

  • folder (str) – passed to sc.makefilepath()

  • method (str) – whether to use pickle (default) or dill

  • die (bool) – whether to fail if no object is provided

  • args (list) – passed to pickle.dumps()

  • kwargs (dict) – passed to pickle.dumps()

Example:

myobj = ['this', 'is', 'a', 'weird', {'object':44}]
sc.saveobj('myfile.obj', myobj)
sc.saveobj('myfile.obj', myobj, method='dill') # Use dill instead, to save custom classes as well
New in version 1.1.1: removed Python 2 support.
New in version 1.2.2: automatic swapping of arguments if order is incorrect; correct passing of arguments
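
For readers unfamiliar with the underlying format, a gzipped pickle can be written and read with the standard library alone. The sketch below only illustrates the idea and is not Sciris's implementation; the helper names save_gzipped_pickle and load_gzipped_pickle are hypothetical.

```python
import gzip
import os
import pickle
import tempfile

def save_gzipped_pickle(filename, obj, compresslevel=5):
    """Pickle an object and write it through a gzip stream."""
    with gzip.open(filename, 'wb', compresslevel=compresslevel) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_gzipped_pickle(filename):
    """Read a gzip stream and unpickle its contents (only do this for trusted files)."""
    with gzip.open(filename, 'rb') as f:
        return pickle.load(f)

myobj = ['this', 'is', 'a', 'weird', {'object': 44}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'myfile.obj')
    save_gzipped_pickle(path, myobj)
    restored = load_gzipped_pickle(path)

print(restored)
```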

dumpstr(obj=None)

Dump an object as a bytes-like string (rarely used); see sc.loadstr()

load(filename=None, folder=None, verbose=False, die=None, remapping=None, method='pickle', **kwargs)

Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj(). Accepts either a filename (standard usage) or a file object as the first argument. Note that loadobj()/load() are aliases of each other.

Note: be careful when loading pickle files, since a malicious pickle can be used to execute arbitrary code.

When a pickle file is loaded, Python imports any modules that are referenced in it. This is a problem if a module has been renamed. In this case, you can use the remapping argument to point to the new modules or classes.

Parameters
  • filename (str/Path) – the filename (or full path) to load

  • folder (str/Path) – the folder

  • verbose (bool) – print details

  • die (bool) – whether to raise an exception if errors are encountered (otherwise, load as much as possible)

  • remapping (dict) – way of mapping old/unavailable module names to new

  • method (str) – method for loading (usually pickle or dill)

  • kwargs (dict) – passed to pickle.loads()/dill.loads()

Examples:

obj = sc.loadobj('myfile.obj') # Standard usage
old = sc.loadobj('my-old-file.obj', method='dill', ignore=True) # Load classes from saved files
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':cat.Mat}) # If loading a saved object containing a reference to foo.Bar that is now cat.Mat
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':('cat', 'Mat')}) # Equivalent to the above
New in version 1.1.0: “remapping” argument
New in version 1.2.2: ability to load non-gzipped pickles; support for dill; arguments passed to loader

save(filename=None, obj=None, compresslevel=5, verbose=0, folder=None, method='pickle', die=True, *args, **kwargs)

Save an object to file as a gzipped pickle – use compression 5 by default, since more is much slower but not much smaller. Once saved, can be loaded with sc.loadobj(). Note that saveobj()/save() are identical.

Parameters
  • filename (str or Path) – the filename to save to; if str, passed to sc.makefilepath()

  • obj (literally anything) – the object to save

  • compresslevel (int) – the level of gzip compression

  • verbose (int) – detail to print

  • folder (str) – passed to sc.makefilepath()

  • method (str) – whether to use pickle (default) or dill

  • die (bool) – whether to fail if no object is provided

  • args (list) – passed to pickle.dumps()

  • kwargs (dict) – passed to pickle.dumps()

Example:

myobj = ['this', 'is', 'a', 'weird', {'object':44}]
sc.saveobj('myfile.obj', myobj)
sc.saveobj('myfile.obj', myobj, method='dill') # Use dill instead, to save custom classes as well
New in version 1.1.1: removed Python 2 support.
New in version 1.2.2: automatic swapping of arguments if order is incorrect; correct passing of arguments

loadtext(filename=None, folder=None, splitlines=False)

Convenience function for reading a text file

Example:

mytext = sc.loadtext('my-document.txt')

savetext(filename=None, string=None)

Convenience function for saving a text file – accepts a string or list of strings.

Example:

text = ['Here', 'is', 'a', 'poem']
sc.savetext('my-document.txt', text)
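
A rough standard-library equivalent of the text helpers is shown below. It assumes that a list of strings is joined with newlines before writing; the helper names save_text and load_text are hypothetical and this is a sketch rather than the Sciris code.

```python
import os
import tempfile

def save_text(filename, text):
    """Write a string, or a list of strings joined by newlines, to a file."""
    if not isinstance(text, str):
        text = '\n'.join(text)  # Assumption: list items become separate lines
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(text)

def load_text(filename, splitlines=False):
    """Read a text file, optionally returning a list of lines."""
    with open(filename, encoding='utf-8') as f:
        text = f.read()
    return text.splitlines() if splitlines else text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my-document.txt')
    save_text(path, ['Here', 'is', 'a', 'poem'])
    as_string = load_text(path)
    as_lines = load_text(path, splitlines=True)

print(as_lines)
```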

savezip(filename=None, filelist=None, folder=None, basename=True, verbose=True)

Create a zip file from the supplied list of files

Example:

scripts = sc.getfilelist('./code/*.py')
sc.savezip('scripts.zip', scripts)
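
The same idea can be sketched with the zipfile module. The helper save_zip is hypothetical, and the reading of basename (store only the file name, not its folder) is an assumption based on the argument's name rather than on the Sciris source.

```python
import os
import tempfile
import zipfile

def save_zip(zipname, filelist, basename=True):
    """Write the listed files into a zip archive."""
    with zipfile.ZipFile(zipname, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for path in filelist:
            # Assumption: basename=True stores files without their folder structure
            arcname = os.path.basename(path) if basename else path
            zf.write(path, arcname=arcname)

with tempfile.TemporaryDirectory() as tmp:
    scripts = []
    for name in ['a.py', 'b.py']:  # Create two small files to archive
        path = os.path.join(tmp, name)
        with open(path, 'w') as f:
            f.write('print("hello")\n')
        scripts.append(path)
    zippath = os.path.join(tmp, 'scripts.zip')
    save_zip(zippath, scripts)
    with zipfile.ZipFile(zippath) as zf:
        stored = zf.namelist()

print(stored)
```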

getfilelist(folder=None, pattern=None, abspath=False, nopath=False, filesonly=False, foldersonly=False, recursive=False, aspath=None)

A shortcut for using glob.

Parameters
  • folder (str) – the folder to find files in (default, current)

  • pattern (str) – the pattern to match (default, wildcard); can be excluded if part of the folder

  • abspath (bool) – whether to return the full path

  • nopath (bool) – whether to return no path

  • filesonly (bool) – whether to only return files (not folders)

  • foldersonly (bool) – whether to only return folders (not files)

  • recursive (bool) – passed to glob()

  • aspath (bool) – whether to return Path objects

Returns

List of files/folders

Examples:

sc.getfilelist() # return all files and folders in current folder
sc.getfilelist('~/temp', '*.py', abspath=True) # return absolute paths of all Python files in ~/temp folder
sc.getfilelist('~/temp/*.py') # Like above

New in version 1.1.0: “aspath” argument
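
To show how these options map onto glob, here is a simplified sketch using only the standard library. The function get_file_list is hypothetical; it does not handle a pattern embedded in the folder argument or the aspath option, and the sorting is an assumption made so the output is predictable.

```python
import glob
import os
import tempfile

def get_file_list(folder='.', pattern='*', abspath=False, nopath=False,
                  filesonly=False, foldersonly=False, recursive=False):
    """Simplified glob wrapper mirroring some of the options described above."""
    folder = os.path.expanduser(folder)
    matches = sorted(glob.glob(os.path.join(folder, pattern), recursive=recursive))
    if filesonly:
        matches = [m for m in matches if os.path.isfile(m)]
    elif foldersonly:
        matches = [m for m in matches if os.path.isdir(m)]
    if nopath:
        matches = [os.path.basename(m) for m in matches]
    elif abspath:
        matches = [os.path.abspath(m) for m in matches]
    return matches

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    for name in ['a.py', 'b.txt']:
        open(os.path.join(tmp, name), 'w').close()  # Create empty files
    everything = get_file_list(tmp, nopath=True)
    py_files = get_file_list(tmp, '*.py', nopath=True)
    folders = get_file_list(tmp, foldersonly=True, nopath=True)

print(everything, py_files, folders)
```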

sanitizefilename(rawfilename)

Takes a potentially Linux- and Windows-unfriendly candidate file name, and returns a “sanitized” version that is more usable.

Example:

bad_name = 'How*is*this*even*a*filename?!.doc'
good_name = sc.sanitizefilename(bad_name) # Returns 'How_is_this_even_a_filename.doc'
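
One way to get the result shown above is a pair of regular-expression substitutions. The character sets below are assumptions chosen to reproduce the documented example, not necessarily the sets Sciris uses, and sanitize_filename is a hypothetical name.

```python
import re

def sanitize_filename(raw):
    """Remove or replace characters that are awkward in file names."""
    name = re.sub('[!?"\'<>]', '', raw)      # Drop characters that are unwanted entirely
    name = re.sub(r'[:/\\*|,]', '_', name)   # Turn separator-like characters into underscores
    return name

good_name = sanitize_filename('How*is*this*even*a*filename?!.doc')
print(good_name)
```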

makefilepath(filename=None, folder=None, ext=None, default=None, split=False, aspath=None, abspath=True, makedirs=True, checkexists=None, sanitize=False, die=True, verbose=False)

Utility for taking a filename and folder – or not – and generating a valid path from them. By default, this function will combine a filename and folder using os.path.join, create the folder(s) if needed with os.makedirs, and return the absolute path.

Parameters
  • filename (str or Path) – the filename, or full file path, to save to – in which case this utility does nothing

  • folder (str/Path/list) – the name of the folder to be prepended to the filename; if a list, fed to os.path.join()

  • ext (str) – the extension to ensure the file has

  • default (str or list) – a name or list of names to use if filename is None

  • split (bool) – whether to return the path and filename separately

  • aspath (bool) – whether to return a Path object

  • makedirs (bool) – whether or not to make the folders to save into if they don’t exist

  • checkexists (bool) – if False/True, raises an exception if the path does/doesn’t exist

  • sanitize (bool) – whether or not to remove special characters from the path; see sc.sanitizefilename() for details

  • verbose (bool) – how much detail to print

Returns

the validated path (or the folder and filename if split=True)

Return type

filepath (str or Path)

Simple example:

filepath = sc.makefilepath('myfile.obj') # Equivalent to os.path.abspath(os.path.expanduser('myfile.obj'))

Complex example:

filepath = sc.makefilepath(filename=None, folder='./congee', ext='prj', default=[project.filename, project.name], split=True, abspath=True, makedirs=True)

Assuming project.filename is None, project.name is “recipe”, and ./congee doesn’t exist, this will make the folder ./congee and return e.g. (‘/home/myname/congee’, ‘recipe.prj’)

New in version 1.1.0: “aspath” argument
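
The steps in the complex example can be approximated with os.path as follows. This is a simplified sketch under stated assumptions (the first non-None default is used, and the extension is appended only if missing); make_file_path is a hypothetical helper and omits the sanitize, checkexists and aspath options.

```python
import os
import tempfile

def make_file_path(filename=None, folder=None, ext=None, default=None,
                   split=False, makedirs=True):
    """Combine a filename and folder into an absolute path, creating folders if needed."""
    if filename is None:  # Fall back to the first usable default (raises if there is none)
        candidates = default if isinstance(default, (list, tuple)) else [default]
        filename = next(c for c in candidates if c is not None)
    if ext and not filename.endswith('.' + ext):
        filename = f'{filename}.{ext}'  # Ensure the file has the requested extension
    head, tail = os.path.split(filename)
    parts = [folder] if isinstance(folder, str) else list(folder or [])
    folder_path = os.path.abspath(os.path.expanduser(os.path.join(*parts, head)))
    if makedirs:
        os.makedirs(folder_path, exist_ok=True)
    return (folder_path, tail) if split else os.path.join(folder_path, tail)

with tempfile.TemporaryDirectory() as tmp:
    folder, name = make_file_path(folder=os.path.join(tmp, 'congee'), ext='prj',
                                  default=[None, 'recipe'], split=True)
    folder_exists = os.path.isdir(folder)

print(os.path.basename(folder), name, folder_exists)
```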

path(*args, **kwargs)

Alias to pathlib.Path(). New in version 1.2.2.

PurePath subclass that can make system calls.

Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.

thisdir(file=None, path=None, *args, aspath=None, **kwargs)

Tiny helper function to get the folder for a file, usually the current file. If not supplied, then use the current file.

Parameters
  • file (str) – the file to get the directory from; usually __file__

  • path (str/list) – additional path to append; passed to os.path.join()

  • args (list) – also passed to os.path.join()

  • aspath (bool) – whether to return a Path object instead of a string

  • kwargs (dict) – passed to Path()

Returns

the full path to the folder (or filename if additional arguments are given)

Return type

filepath (str)

Examples:

thisdir = sc.thisdir() # Get folder of calling file
thisdir = sc.thisdir('.') # Ditto (usually)
thisdir = sc.thisdir(__file__) # Ditto (usually)
file_in_same_dir = sc.thisdir(path='new_file.txt')
file_in_sub_dir = sc.thisdir('..', 'tests', 'mytests.py') # Merge parent folder with subfolders and a file
np_dir = sc.thisdir(np) # Get the folder that Numpy is loaded from (assuming "import numpy as np")
New in version 1.1.0: “as_path” argument renamed “aspath”
New in version 1.2.2: “path” argument
New in version 1.3.0: allow modules
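
The core of this helper can be sketched with os.path. Unlike sc.thisdir(), the hypothetical this_dir below requires the file (or module) to be passed explicitly instead of inferring the calling file; the json module is used only because it is always available.

```python
import json
import os

def this_dir(file, *parts):
    """Return the folder containing `file` (a path or an imported module), plus optional extra parts."""
    if hasattr(file, '__file__'):  # Allow a module to be supplied instead of a path
        file = file.__file__
    folder = os.path.dirname(os.path.abspath(file))
    return os.path.join(folder, *parts)

json_dir = this_dir(json)                    # Folder the json package is loaded from
decoder_path = this_dir(json, 'decoder.py')  # A file within that folder
print(json_dir)
```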

sanitizejson(obj, verbose=True, die=False, tostring=False, **kwargs)

This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).

Parameters
  • obj (any) – almost any kind of data structure that is a combination of list, numpy.ndarray, odicts, etc.

  • verbose (bool) – level of detail to print

  • die (bool) – whether or not to raise an exception if conversion failed (otherwise, return a string)

  • tostring (bool) – whether to return a string representation of the sanitized object instead of the object itself

  • kwargs (dict) – passed to json.dumps() if tostring=True

Returns

the converted object that should be JSON compatible, or its representation as a string if tostring=True

Return type

object (any or str)

Version: 2020apr11
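
As an illustration of what such a conversion involves, the sketch below recursively converts a few common standard-library types. It is a simplified stand-in, not the Sciris function: to_jsonable is a hypothetical name, NumPy arrays and odicts are not handled, and falling back to repr() for unknown objects is an assumption modelled on the "otherwise, return a string" behaviour of die=False.

```python
import datetime
import json
import pathlib

def to_jsonable(obj):
    """Recursively convert an object into JSON-compatible types."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        return {str(key): to_jsonable(val) for key, val in obj.items()}  # JSON keys must be strings
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_jsonable(val) for val in obj]
    if isinstance(obj, datetime.date):  # Also covers datetime.datetime
        return obj.isoformat()
    if isinstance(obj, pathlib.PurePath):
        return str(obj)
    return repr(obj)  # Assumption: unknown objects fall back to a string

raw = {'when': datetime.date(2020, 4, 11), 'tags': ('a', 'b'), 1: {'x'}}
clean = to_jsonable(raw)
as_string = json.dumps(clean)  # Comparable to tostring=True
print(as_string)
```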

jsonify(obj, verbose=True, die=False, tostring=False, **kwargs)

This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).

Parameters
  • obj (any) – almost any kind of data structure that is a combination of list, numpy.ndarray, odicts, etc.

  • verbose (bool) – level of detail to print

  • die (bool) – whether or not to raise an exception if conversion failed (otherwise, return a string)

  • tostring (bool) – whether to return a string representation of the sanitized object instead of the object itself

  • kwargs (dict) – passed to json.dumps() if tostring=True

Returns

the converted object that should be JSON compatible, or its representation as a string if tostring=True

Return type

object (any or str)

Version: 2020apr11

loadjson(filename=None, folder=None, string=None, fromfile=True, **kwargs)

Convenience function for reading a JSON file (or string).

Parameters
  • filename (str) – the file to load, or the JSON object if using positional arguments

  • folder (str) – folder if not part of the filename

  • string (str) – if not loading from a file, a string representation of the JSON

  • fromfile (bool) – whether or not to load from file

  • kwargs (dict) – passed to json.load()

Returns

the JSON object

Return type

output (dict)

Examples:

json = sc.loadjson('my-file.json')
json = sc.loadjson(string='{"a":null, "b":[1,2,3]}')

savejson(filename=None, obj=None, folder=None, die=True, indent=2, keepnone=False, **kwargs)

Convenience function for saving to a JSON file.

Parameters
  • filename (str) – the file to save

  • obj (anything) – the object to save; if not already in JSON format, conversion will be attempted

  • folder (str) – folder if not part of the filename

  • die (bool) – whether or not to raise an exception if saving an empty object

  • indent (int) – indentation to use for saved JSON

  • keepnone (bool) – allow sc.savejson(None) to return ‘null’ rather than raising an exception

  • kwargs (dict) – passed to json.dump()

Returns

None

Example:

json = {'foo':'bar', 'data':[1,2,3]}
sc.savejson('my-file.json', json)
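
For comparison, the equivalent round trip using only the standard json module looks like this. It also shows that JSON null maps to Python None, which is relevant to the string example given for sc.loadjson(). This is a plain-json sketch and skips the automatic conversion that sc.savejson() attempts.

```python
import json
import os
import tempfile

data = {'foo': 'bar', 'data': [1, 2, 3], 'missing': None}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my-file.json')
    with open(path, 'w') as f:
        json.dump(data, f, indent=2)  # indent=2 matches the default shown above
    with open(path) as f:
        reloaded = json.load(f)

from_string = json.loads('{"a":null, "b":[1,2,3]}')  # Parse from a string instead of a file
print(reloaded, from_string)
```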

jsonpickle(obj, tostring=False)

Convert any Python object to a JSON representation using jsonpickle.

Parameters
  • obj (any) – the object to pickle as a JSON

  • tostring (bool) – whether to return a string (rather than the JSONified Python object)

Returns

Either a string or a Python object for the JSON

Wrapper for the jsonpickle library: https://jsonpickle.github.io/

jsonunpickle(json)

Use jsonpickle to restore an object (see jsonpickle())

class Blobject(source=None, name=None, filename=None, blob=None)

A wrapper for a binary file – rarely used directly.

So named because it’s an object representing a blob.

“source” is a specification of where to get the data from. It can be anything supported by Blobject.load(): either (a) a filename, which will get loaded, or (b) an io.BytesIO, which will get dumped into this instance.

Alternatively, you can specify blob, a binary string that gets stored directly in the blob attribute.

load(source=None)

This function loads the spreadsheet from a file or object. If no input argument is supplied, then it will read self.bytes, assuming it exists.

save(filename=None)

This function writes the spreadsheet to a file on disk.

tofile(output=True)

Return a file-like object with the contents of the file.

This can then be used to open the workbook from memory without writing anything to disk e.g.

  • book = openpyxl.load_workbook(self.tofile())

  • book = xlrd.open_workbook(file_contents=self.tofile().read())
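
The in-memory pattern described for tofile() can be sketched with the standard library alone. The class Blob below is a hypothetical, stripped-down stand-in for Blobject, and a zip archive is used in place of a workbook so that the example needs no third-party packages.

```python
import io
import zipfile

class Blob:
    """Minimal wrapper that stores a file's contents as bytes."""

    def __init__(self, blob=None):
        self.blob = blob

    def load(self, source):
        """Load from an io.BytesIO object or from a filename."""
        if isinstance(source, io.BytesIO):
            source.seek(0)
            self.blob = source.read()
        else:
            with open(source, 'rb') as f:
                self.blob = f.read()

    def tofile(self):
        """Return a fresh file-like object with the stored contents."""
        return io.BytesIO(self.blob)

# Build a small archive entirely in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('hello.txt', 'hi')

blob = Blob()
blob.load(buffer)

# Open the stored bytes as if they were a file, without touching the disk
with zipfile.ZipFile(blob.tofile()) as zf:
    names = zf.namelist()
    content = zf.read('hello.txt')

print(names, content)
```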

freshbytes()

Refresh the bytes object to accept new data

class Spreadsheet(*args, **kwargs)

A class for reading and writing Excel files in binary format. No disk IO needs to happen to manipulate the spreadsheets with openpyxl (or xlrd or pandas).

New in version 1.3.0: Changed default from xlrd to openpyxl and added self.wb attribute to avoid the need to reload workbooks.

xlrd(reload=False, store=True, **kwargs)

Return a book as opened by xlrd

openpyxl(reload=False, store=True, **kwargs)

Return a book as opened by openpyxl

openpyexcel(*args, **kwargs)

Legacy name for openpyxl()

pandas(reload=False, store=True, **kwargs)

Return a book as opened by pandas

update(book)

Update the stored spreadsheet with book instead

readcells(wbargs=None, *args, **kwargs)

Alias to loadspreadsheet()

writecells(cells=None, startrow=None, startcol=None, vals=None, sheetname=None, sheetnum=None, verbose=False, wbargs=None)

Specify cells to write. Can supply either a list of cells of the same length as the values, or else specify a starting row and column and write the values from there.

Examples:

S = sc.Spreadsheet()
S.writecells(cells=['A6','B7'], vals=['Cat','Dog']) # Method 1
S.writecells(cells=[np.array([2,3])+i for i in range(2)], vals=['Foo', 'Bar']) # Method 2
S.writecells(startrow=14, startcol=1, vals=np.random.rand(3,3)) # Method 3
S.save('myfile.xlsx')
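
To make the addressing styles above concrete, the sketch below converts Excel-style cell names to (row, column) pairs and expands a starting row and column into one address per value. Both helpers are hypothetical, and the 1-based indexing is an assumption (it follows the openpyxl convention); it may not match how writecells() interprets its arguments internally.

```python
import re

def cell_to_rowcol(cell):
    """Convert an Excel-style name such as 'B7' to a 1-based (row, column) pair."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', cell)
    if match is None:
        raise ValueError(f'Not a valid cell name: {cell!r}')
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():  # Column letters are a base-26 number: A=1, ..., Z=26, AA=27
        col = col * 26 + (ord(ch) - ord('A') + 1)
    return int(digits), col

def block_cells(startrow, startcol, vals):
    """Map each value of a 2D list to its (row, column) address, starting from a corner."""
    return {(startrow + i, startcol + j): val
            for i, row in enumerate(vals)
            for j, val in enumerate(row)}

named = [cell_to_rowcol(c) for c in ['A6', 'B7', 'AA1']]
block = block_cells(14, 1, [[1, 2], [3, 4]])
print(named, block)
```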

loadspreadsheet(filename=None, folder=None, fileobj=None, sheet=0, asdataframe=None, header=True, method='pandas', **kwargs)

Load a spreadsheet as a dataframe or a list of lists.

By default, an alias to pandas.read_excel() with a header, but also supports loading via openpyxl or xlrd. Read from either a filename or a file object.

Parameters
  • filename (str) – filename or path to read

  • folder (str) – optional folder to use with the filename

  • fileobj (obj) – load from file object rather than path

  • sheet (str/int/list) – name or number of sheet(s) to use (default 0)

  • asdataframe (bool) – whether to return as a pandas/Sciris dataframe (default True)

  • method (str) – how to read (default ‘pandas’, other choices ‘openpyxl’ and ‘xlrd’)

  • kwargs (dict) – passed to pd.read_excel(), openpyxl(), etc.

Examples:

df = sc.loadspreadsheet('myfile.xlsx') # Alias to pd.read_excel(header=1)
wb = sc.loadspreadsheet('myfile.xlsx', method='openpyxl') # Returns workbook
data = sc.loadspreadsheet('myfile.xlsx', method='xlrd', asdataframe=False) # Returns raw data; requires xlrd

New in version 1.3.0: change default from xlrd to pandas; renamed sheetname and sheetnum arguments to sheet.

savespreadsheet(filename=None, data=None, folder=None, sheetnames=None, close=True, formats=None, formatdata=None, verbose=False)

Not-so-little function to format data nicely for Excel.

Note: this function, while not deprecated, is not actively maintained.

Examples:

import sciris as sc
import pylab as pl

# Simple example
testdata1 = pl.rand(8,3)
sc.savespreadsheet(filename='test1.xlsx', data=testdata1)

# Include column headers
test2headers = [['A','B','C']] # Need double to get right shape
test2values = pl.rand(8,3).tolist()
testdata2 = test2headers + test2values
sc.savespreadsheet(filename='test2.xlsx', data=testdata2)

# Multiple sheets
testdata3 = [pl.rand(10,10), pl.rand(20,5)]
sheetnames = ['Ten by ten', 'Twenty by five']
sc.savespreadsheet(filename='test3.xlsx', data=testdata3, sheetnames=sheetnames)

# Supply data as an odict
testdata4 = sc.odict([('First sheet', pl.rand(6,2)), ('Second sheet', pl.rand(3,3))])
sc.savespreadsheet(filename='test4.xlsx', data=testdata4, sheetnames=sheetnames)

# Include formatting
nrows = 15
ncols = 3
formats = {
    'header':{'bold':True, 'bg_color':'#3c7d3e', 'color':'#ffffff'},
    'plain': {},
    'big':   {'bg_color':'#ffcccc'}}
testdata5  = pl.zeros((nrows+1, ncols), dtype=object) # Includes header row
formatdata = pl.zeros((nrows+1, ncols), dtype=object) # Format data needs to be the same size
testdata5[0,:] = ['A', 'B', 'C'] # Create header
testdata5[1:,:] = pl.rand(nrows,ncols) # Create data
formatdata[1:,:] = 'plain' # Format data
formatdata[testdata5>0.7] = 'big' # Find "big" numbers and format them differently
formatdata[0,:] = 'header' # Format header
sc.savespreadsheet(filename='test5.xlsx', data=testdata5, formats=formats, formatdata=formatdata)

class Failed(*args, **kwargs)

An empty class to represent a failed object loading

class Empty(*args, **kwargs)

Another empty class to represent a failed object loading, but do not proceed with setstate
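
The role of such placeholder classes can be illustrated with a small standard-library sketch: when a pickled class cannot be imported, a placeholder class is created on the fly so that the rest of the data still loads. All names below (FailedBase, make_failed, TolerantUnpickler, gone_module, Thing) are hypothetical, and Sciris's own mechanism may differ in its details.

```python
import io
import pickle
import sys
import types

class FailedBase:
    """Marks objects whose original class was unavailable at load time."""
    failure_info = None

def make_failed(module, name, error):
    """Create a class (not an instance) that records why loading failed."""
    info = {'module': module, 'name': name, 'error': repr(error)}
    return type('Failed', (FailedBase,), {'failure_info': info})

class TolerantUnpickler(pickle.Unpickler):
    """Substitute a placeholder class instead of raising on missing classes."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError) as error:
            return make_failed(module, name, error)

# Simulate a pickle that refers to a module that has since been removed
gone = types.ModuleType('gone_module')
gone.Thing = type('Thing', (), {'__module__': 'gone_module'})
sys.modules['gone_module'] = gone
thing = gone.Thing()
thing.value = 3
data = pickle.dumps([1, thing])
del sys.modules['gone_module']

restored = TolerantUnpickler(io.BytesIO(data)).load()
print(restored[0], restored[1].value, restored[1].failure_info['module'])
```

In this sketch the placeholder still receives the original object's attributes, so the data remains accessible even though the class's methods are gone.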

loadobj2or3(filename=None, filestring=None, recursionlimit=None, **kwargs)

Try to load as a (Sciris-saved) Python 3 pickle; if that fails, try to load as a Python 2 pickle. For legacy support only.

For available keyword arguments, see sc.load().

Parameters
  • filename (str) – the name of the file to load

  • filestring (str) – alternatively, specify an already-loaded bytestring

  • recursionlimit (int) – how deeply to parse objects before failing (default 1000)
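
The "try Python 3, then fall back to Python 2" idea can be sketched with the standard pickle module, which accepts an encoding argument for decoding Python 2 byte strings. The helper load_2or3 is hypothetical, the choice of latin1 is an assumption, and the byte string below is a hand-written protocol-0 pickle of the Python 2 string 'caf\xe9'; this is not the Sciris implementation.

```python
import pickle

def load_2or3(data):
    """Try a normal load first; if decoding fails, retry with a Python 2 friendly encoding."""
    try:
        return pickle.loads(data)
    except UnicodeDecodeError:
        # Python 2 str objects hold raw bytes; latin1 maps every byte to a character
        return pickle.loads(data, encoding='latin1')

py2_pickle = b"S'caf\\xe9'\np0\n."  # Protocol-0 pickle as Python 2 would write it
text = load_2or3(py2_pickle)
print(text)
```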