sciris.sc_fileio
Functions for reading/writing to files, including pickles, JSONs, and Excel.
Highlights:
- sc.saveobj()/sc.loadobj(): efficiently save/load any Python object (via pickling)
- sc.savejson()/sc.loadjson(): likewise, for JSONs
- sc.thisdir(): get current folder
- sc.getfilelist(): easy way to access glob
Functions
- Dump an object as a bytes-like string (rarely used); see
- A shortcut for using glob.
- This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).
- Serialize any Python object to JSON using jsonpickle.
- Use jsonpickle to restore an object (see jsonpickle()).
- Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj().
- Convenience function for reading a JSON file (or string).
- Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj().
- Try to load as a (Sciris-saved) Python 3 pickle; if that fails, try to load as a Python 2 pickle.
- Load a spreadsheet as a dataframe or a list of lists.
- Like loadobj(), but for a bytes-like string (rarely used).
- Convenience function for reading a text file.
- Create a class -- not an object! -- that contains the failure info for a pickle that failed to load.
- Utility for taking a filename and folder -- or not -- and generating a valid path from them.
- Alias to pathlib.Path().
- Takes a potentially Linux- and Windows-unfriendly candidate file name, and returns a "sanitized" version that is more usable.
- This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).
- Save an object to file as a gzipped pickle -- use compression 5 by default, since more is much slower but not much smaller.
- Convenience function for saving to a JSON file.
- Save an object to file as a gzipped pickle -- use compression 5 by default, since more is much slower but not much smaller.
- Not-so-little function to format data nicely for Excel.
- Convenience function for saving a text file -- accepts a string or list of strings.
- Create a zip file from the supplied list of files.
- Tiny helper function to get the folder for a file, usually the current file.
Classes
- A wrapper for a binary file -- rarely used directly.
- Another empty class to represent a failed object load, but one that does not proceed with setstate.
- An empty class to represent a failed object load.
- A class for reading and writing Excel files in binary format.
- A universal failed object class that preserves as much data as possible.
- loadobj(filename=None, folder=None, verbose=False, die=None, remapping=None, method='pickle', **kwargs)
Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj(). Accepts either a filename (standard usage) or a file object as the first argument. Note that loadobj()/load() are aliases of each other.
Note: be careful when loading pickle files, since a malicious pickle can be used to execute arbitrary code.
When a pickle file is loaded, Python imports any modules that are referenced in it. This is a problem if a module has been renamed. In this case, you can use the remapping argument to point to the new modules or classes.
- Parameters
filename (str/Path) – the filename (or full path) to load
folder (str/Path) – the folder
verbose (bool) – print details
die (bool) – whether to raise an exception if errors are encountered (otherwise, load as much as possible)
remapping (dict) – mapping from old/unavailable module or class names to new ones
method (str) – method for loading (usually pickle or dill)
kwargs (dict) – passed to pickle.loads()/dill.loads()
Examples:
obj = sc.loadobj('myfile.obj') # Standard usage
old = sc.loadobj('my-old-file.obj', method='dill', ignore=True) # Load classes from saved files
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':cat.Mat}) # If loading a saved object containing a reference to foo.Bar that is now cat.Mat
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':('cat', 'Mat')}) # Equivalent to the above
New in version 1.1.0: “remapping” argument
New in version 1.2.2: ability to load non-gzipped pickles; support for dill; arguments passed to loader
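The remapping idea can be illustrated with the standard library's documented hook for redirecting globals, pickle.Unpickler.find_class(). This is a minimal sketch under our own assumptions, not Sciris's implementation: RemappingUnpickler, loads_remapped, and the stand-in class Mat are hypothetical names, and the pickle bytes are written by hand so that they reference a module foo that does not exist. Both forms used in the examples above (a class object, or a (module, name) tuple) are handled.

```python
import importlib
import io
import pickle


class RemappingUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that redirects 'module.Name' references (not Sciris's code)."""

    def __init__(self, file, remapping=None):
        super().__init__(file)
        self.remapping = remapping or {}

    def find_class(self, module, name):
        target = self.remapping.get(f'{module}.{name}')
        if target is None:
            # No remapping requested: fall back to the normal import lookup
            return super().find_class(module, name)
        if isinstance(target, tuple):
            # ('newmodule', 'NewName') form: import the module, then fetch the attribute
            newmodule, newname = target
            return getattr(importlib.import_module(newmodule), newname)
        return target  # Already a class (or other callable)


def loads_remapped(data, remapping=None):
    """Load pickle bytes, applying the remapping dictionary (hypothetical helper)."""
    return RemappingUnpickler(io.BytesIO(data), remapping).load()


class Mat:
    """Stand-in for the class that foo.Bar was renamed to."""


# Hand-written protocol-2 pickle of an instance of foo.Bar (module foo does not exist):
# PROTO 2, GLOBAL 'foo Bar', EMPTY_TUPLE, NEWOBJ, STOP
old_bytes = b'\x80\x02cfoo\nBar\n)\x81.'

obj1 = loads_remapped(old_bytes, remapping={'foo.Bar': Mat})
obj2 = loads_remapped(old_bytes, remapping={'foo.Bar': ('fractions', 'Fraction')})
```

Without a remapping, the same bytes fail at import time, because find_class() is where the unpickler resolves every module and class name the pickle references.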
- loadstr(string, verbose=False, die=None, remapping=None)
Like loadobj(), but for a bytes-like string (rarely used).
Example:
obj = sc.objdict(a=1, b=2)
string = sc.dumpstr(obj)
obj2 = sc.loadstr(string)
assert obj == obj2
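An in-memory round trip of this kind can be sketched with the standard library alone. This assumes, by analogy with saveobj(), that the bytes-like string is a gzip-compressed pickle; dump_to_bytes and load_from_bytes are hypothetical names, and Sciris's actual byte format may differ.

```python
import gzip
import pickle


def dump_to_bytes(obj, compresslevel=5):
    """Hypothetical helper: pickle an object and gzip-compress the result."""
    return gzip.compress(pickle.dumps(obj), compresslevel=compresslevel)


def load_from_bytes(data):
    """Hypothetical helper: reverse dump_to_bytes()."""
    return pickle.loads(gzip.decompress(data))


original = {'a': 1, 'b': [2, 3]}
blob = dump_to_bytes(original)    # bytes, suitable for storing in a database or sending over a socket
restored = load_from_bytes(blob)  # an equal but distinct object
```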
- saveobj(filename=None, obj=None, compresslevel=5, verbose=0, folder=None, method='pickle', die=True, *args, **kwargs)
Save an object to file as a gzipped pickle – use compression 5 by default, since more is much slower but not much smaller. Once saved, can be loaded with sc.loadobj(). Note that saveobj()/save() are identical.
- Parameters
filename (str or Path) – the filename to save to; if str, passed to sc.makefilepath()
obj (literally anything) – the object to save
compresslevel (int) – the level of gzip compression
verbose (int) – detail to print
folder (str) – passed to sc.makefilepath()
method (str) – whether to use pickle (default) or dill
die (bool) – whether to fail if no object is provided
args (list) – passed to pickle.dumps()
kwargs (dict) – passed to pickle.dumps()
Example:
myobj = ['this', 'is', 'a', 'weird', {'object':44}]
sc.saveobj('myfile.obj', myobj)
sc.saveobj('myfile.obj', myobj, method='dill') # Use dill instead, to save custom classes as well
New in version 1.1.1: removed Python 2 support.
New in version 1.2.2: automatic swapping of arguments if order is incorrect; correct passing of arguments
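The gzipped-pickle format described above can be sketched with the standard library's gzip and pickle modules. This is an illustration of the general technique, not Sciris's code: save_gzipped_pickle and load_gzipped_pickle are hypothetical names, and the default compresslevel=5 simply mirrors the trade-off stated in the description.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped_pickle(filename, obj, compresslevel=5):
    """Write obj to filename as a gzip-compressed pickle (hypothetical helper)."""
    with gzip.open(filename, 'wb', compresslevel=compresslevel) as f:
        pickle.dump(obj, f)
    return filename


def load_gzipped_pickle(filename):
    """Read an object written by save_gzipped_pickle()."""
    with gzip.open(filename, 'rb') as f:
        return pickle.load(f)


myobj = ['this', 'is', 'a', 'weird', {'object': 44}]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'myfile.obj')
    save_gzipped_pickle(path, myobj)
    loaded = load_gzipped_pickle(path)
    with open(path, 'rb') as f:
        magic = f.read(2)  # Every gzip file starts with the bytes 1f 8b
```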
- load(filename=None, folder=None, verbose=False, die=None, remapping=None, method='pickle', **kwargs)
Alias of loadobj(); see loadobj() for the full description, parameters, and examples.
- save(filename=None, obj=None, compresslevel=5, verbose=0, folder=None, method='pickle', die=True, *args, **kwargs)
Alias of saveobj(); see saveobj() for the full description, parameters, and examples.
- loadtext(filename=None, folder=None, splitlines=False)
Convenience function for reading a text file
Example:
mytext = sc.loadtext('my-document.txt')
- savetext(filename=None, string=None)
Convenience function for saving a text file – accepts a string or list of strings.
Example:
text = ['Here', 'is', 'a', 'poem']
sc.savetext('my-document.txt', text)
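The text helpers can be sketched as follows. Two behaviors are assumptions rather than documented facts: that a list of strings is joined with newlines before writing, and that splitlines=True returns a list of lines. save_text and load_text are hypothetical names.

```python
import os
import tempfile


def save_text(filename, string):
    """Write a string, or a list of strings joined by newlines (hypothetical helper)."""
    if not isinstance(string, str):
        string = '\n'.join(string)  # Assumed joining rule for lists
    with open(filename, 'w') as f:
        f.write(string)
    return filename


def load_text(filename, splitlines=False):
    """Read a text file, optionally returning a list of lines (hypothetical helper)."""
    with open(filename) as f:
        text = f.read()
    return text.splitlines() if splitlines else text


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'my-document.txt')
    save_text(path, ['Here', 'is', 'a', 'poem'])
    whole = load_text(path)
    lines = load_text(path, splitlines=True)
```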
- savezip(filename=None, filelist=None, folder=None, basename=True, verbose=True)
Create a zip file from the supplied list of files
Example:
scripts = sc.getfilelist('./code/*.py')
sc.savezip('scripts.zip', scripts)
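A zip file like this can be built with the standard library's zipfile module. In this sketch, basename=True is assumed to mean "store each file under its bare name, without its folders", which is our reading of the signature rather than documented behavior; save_zip is a hypothetical name.

```python
import os
import tempfile
import zipfile


def save_zip(filename, filelist, basename=True):
    """Zip the given files; if basename, drop folder paths from member names (hypothetical)."""
    with zipfile.ZipFile(filename, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for path in filelist:
            arcname = os.path.basename(path) if basename else None
            zf.write(path, arcname=arcname)
    return filename


with tempfile.TemporaryDirectory() as tmpdir:
    scripts = []
    for name in ['a.py', 'b.py']:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as f:
            f.write('print(1)\n')
        scripts.append(path)
    zippath = save_zip(os.path.join(tmpdir, 'scripts.zip'), scripts)
    with zipfile.ZipFile(zippath) as zf:
        names = sorted(zf.namelist())
```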
- getfilelist(folder=None, pattern=None, abspath=False, nopath=False, filesonly=False, foldersonly=False, recursive=False, aspath=None)
A shortcut for using glob.
- Parameters
folder (str) – the folder to find files in (default, current)
pattern (str) – the pattern to match (default, wildcard); can be excluded if part of the folder
abspath (bool) – whether to return the full path
nopath (bool) – whether to return no path
filesonly (bool) – whether to only return files (not folders)
foldersonly (bool) – whether to only return folders (not files)
recursive (bool) – passed to glob()
aspath (bool) – whether to return Path objects
- Returns
List of files/folders
Examples:
sc.getfilelist() # return all files and folders in current folder
sc.getfilelist('~/temp', '*.py', abspath=True) # return absolute paths of all Python files in ~/temp folder
sc.getfilelist('~/temp/*.py') # Like above
New in version 1.1.0: “aspath” argument
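Since getfilelist() is described as a shortcut for glob, its main options can be sketched directly on top of glob.glob(). This is a hypothetical re-creation for illustration (get_file_list is not a Sciris function), and details such as sorting are our own choices.

```python
import glob
import os
import tempfile


def get_file_list(folder='.', pattern='*', abspath=False, nopath=False,
                  filesonly=False, foldersonly=False, recursive=False):
    """Hypothetical glob wrapper mirroring the options described above."""
    folder = os.path.expanduser(folder)  # Allow '~/...' paths
    matches = sorted(glob.glob(os.path.join(folder, pattern), recursive=recursive))
    if filesonly:
        matches = [m for m in matches if os.path.isfile(m)]
    if foldersonly:
        matches = [m for m in matches if os.path.isdir(m)]
    if abspath:
        matches = [os.path.abspath(m) for m in matches]
    if nopath:
        matches = [os.path.basename(m) for m in matches]
    return matches


with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['one.py', 'two.py', 'notes.txt']:
        open(os.path.join(tmpdir, name), 'w').close()  # Create empty files
    os.mkdir(os.path.join(tmpdir, 'sub'))
    pyfiles = get_file_list(tmpdir, '*.py', nopath=True)
    folders = get_file_list(tmpdir, foldersonly=True, nopath=True)
    everything = get_file_list(tmpdir, nopath=True)
```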
- sanitizefilename(rawfilename)
Takes a potentially Linux- and Windows-unfriendly candidate file name, and returns a “sanitized” version that is more usable.
Example:
bad_name = 'How*is*this*even*a*filename?!.doc'
good_name = sc.sanitizefilename(bad_name) # Returns 'How_is_this_even_a_filename.doc'
- makefilepath(filename=None, folder=None, ext=None, default=None, split=False, aspath=None, abspath=True, makedirs=True, checkexists=None, sanitize=False, die=True, verbose=False)
Utility for taking a filename and folder – or not – and generating a valid path from them. By default, this function will combine a filename and folder using os.path.join, create the folder(s) if needed with os.makedirs, and return the absolute path.
- Parameters
filename (str or Path) – the filename, or full file path, to save to – in which case this utility does nothing
folder (str/Path/list) – the name of the folder to be prepended to the filename; if a list, fed to os.path.join()
ext (str) – the extension to ensure the file has
default (str or list) – a name or list of names to use if filename is None
split (bool) – whether to return the path and filename separately
aspath (bool) – whether to return a Path object
makedirs (bool) – whether or not to make the folders to save into if they don’t exist
checkexists (bool) – if False/True, raises an exception if the path does/doesn’t exist
sanitize (bool) – whether or not to remove special characters from the path; see sc.sanitizefilename() for details
verbose (bool) – how much detail to print
- Returns
the validated path (or the folder and filename if split=True)
- Return type
filepath (str or Path)
Simple example:
filepath = sc.makefilepath('myfile.obj') # Equivalent to os.path.abspath(os.path.expanduser('myfile.obj'))
Complex example:
filepath = sc.makefilepath(filename=None, folder='./congee', ext='prj', default=[project.filename, project.name], split=True, abspath=True, makedirs=True)
Assuming project.filename is None, project.name is “recipe”, and ./congee doesn’t exist, this will make the folder ./congee and return e.g. (‘/home/myname/congee’, ‘recipe.prj’)
New in version 1.1.0: “aspath” argument
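The complex example above can be approximated with os.path alone. This is a hypothetical sketch (make_file_path is not a Sciris function) that covers only the filename, folder, ext, default, split, and makedirs options; it assumes the first non-empty entry of default is used when filename is None, as the example suggests.

```python
import os
import tempfile


def make_file_path(filename=None, folder=None, ext=None, default=None,
                   split=False, makedirs=True):
    """Hypothetical sketch: join folder and filename, add extension, create folder."""
    if filename is None:
        defaults = default if isinstance(default, list) else [default]
        filename = next((d for d in defaults if d), None)  # First non-empty default
        if filename is None:
            raise ValueError('No filename or usable default supplied')
    if ext and not filename.endswith('.' + ext):
        filename = f'{filename}.{ext}'
    folder = os.path.expanduser(folder) if folder else ''
    fullpath = os.path.abspath(os.path.join(folder, os.path.expanduser(filename)))
    if makedirs:
        os.makedirs(os.path.dirname(fullpath), exist_ok=True)  # Create missing folders
    if split:
        return os.path.split(fullpath)  # (folder, filename)
    return fullpath


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, 'congee')
    folder, name = make_file_path(folder=target, ext='prj', default=[None, 'recipe'], split=True)
    created = os.path.isdir(folder)
```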
- path(*args, **kwargs)
Alias to pathlib.Path(). New in version 1.2.2.
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
- thisdir(file=None, path=None, *args, aspath=None, **kwargs)
Tiny helper function to get the folder for a file, usually the current file. If not supplied, then use the current file.
- Parameters
file (str) – the file to get the directory from; usually __file__
path (str/list) – additional path to append; passed to os.path.join()
args (list) – also passed to os.path.join()
aspath (bool) – whether to return a Path object instead of a string
kwargs (dict) – passed to Path()
- Returns
the full path to the folder (or filename if additional arguments are given)
- Return type
filepath (str)
Examples:
thisdir = sc.thisdir() # Get folder of calling file
thisdir = sc.thisdir('.') # Ditto (usually)
thisdir = sc.thisdir(__file__) # Ditto (usually)
file_in_same_dir = sc.thisdir(path='new_file.txt')
file_in_sub_dir = sc.thisdir('..', 'tests', 'mytests.py') # Merge parent folder with subfolders and a file
np_dir = sc.thisdir(np) # Get the folder that Numpy is loaded from (assuming "import numpy as np")
New in version 1.1.0: “as_path” argument renamed “aspath”
New in version 1.2.2: “path” argument
New in version 1.3.0: allow modules
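The core of such a helper can be sketched with inspect and os.path. This is a hypothetical simplification (this_dir is not Sciris's code) that omits the aspath and kwargs options: with no file it uses the caller's source file, a module is replaced by its __file__, and any extra parts are joined onto the folder.

```python
import inspect
import json  # Any installed package works for the module example
import os


def this_dir(file=None, *parts):
    """Hypothetical sketch: folder of a file or module, optionally joined with parts."""
    if file is None:
        file = inspect.stack()[1].filename  # Source file of the calling code
    elif hasattr(file, '__file__'):
        file = file.__file__  # A module was passed: use its source file
    folder = os.path.abspath(os.path.dirname(file))
    return os.path.join(folder, *parts)


json_dir = this_dir(json)                        # Folder the json package is loaded from
decoder_path = this_dir(json, 'decoder.py')      # A file inside that folder
```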
- sanitizejson(obj, verbose=True, die=False, tostring=False, **kwargs)
This is the main conversion function for Python data-structures into JSON-compatible data structures (note: sanitizejson/jsonify are identical).
- Parameters
obj (any) – almost any kind of data structure that is a combination of list, numpy.ndarray, odicts, etc.
verbose (bool) – level of detail to print
die (bool) – whether or not to raise an exception if conversion failed (otherwise, return a string)
tostring (bool) – whether to return a string representation of the sanitized object instead of the object itself
kwargs (dict) – passed to json.dumps() if tostring=True
- Returns
the converted object that should be JSON compatible, or its representation as a string if tostring=True
- Return type
object (any or str)
Version: 2020apr11
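The kind of recursive conversion described above can be sketched with the standard library. This is a hypothetical illustration, not Sciris's algorithm: to_jsonable is our name, NumPy objects are handled only by duck-typing on a tolist() method, and mapping non-finite floats to None is a choice made here.

```python
import json
import math


def to_jsonable(obj, die=False):
    """Hypothetical recursive converter to JSON-compatible Python objects."""
    if obj is None or isinstance(obj, (bool, int, str)):
        return obj
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else None  # JSON has no NaN/Infinity
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v, die) for k, v in obj.items()}  # JSON keys must be strings
    if isinstance(obj, (list, tuple, set, frozenset)):
        items = sorted(obj, key=repr) if isinstance(obj, (set, frozenset)) else obj
        return [to_jsonable(v, die) for v in items]
    if hasattr(obj, 'tolist'):  # Duck-typing for NumPy arrays and scalars
        return to_jsonable(obj.tolist(), die)
    if die:
        raise TypeError(f'Cannot convert {type(obj).__name__} to JSON')
    return str(obj)  # Fallback: string representation


data = {'a': (1, 2), 1: {'x', 'y'}, 'c': float('nan'), 'd': object}
clean = to_jsonable(data)
text = json.dumps(clean)
```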
- jsonify(obj, verbose=True, die=False, tostring=False, **kwargs)
Alias of sanitizejson(); see sanitizejson() for the full description and parameters.
- loadjson(filename=None, folder=None, string=None, fromfile=True, **kwargs)
Convenience function for reading a JSON file (or string).
- Parameters
filename (str) – the file to load, or the JSON object if using positional arguments
folder (str) – folder if not part of the filename
string (str) – if not loading from a file, a string representation of the JSON
fromfile (bool) – whether or not to load from file
kwargs (dict) – passed to json.load()
- Returns
the JSON object
- Return type
output (dict)
Examples:
json = sc.loadjson('my-file.json')
json = sc.loadjson(string='{"a":null, "b":[1,2,3]}')
- savejson(filename=None, obj=None, folder=None, die=True, indent=2, keepnone=False, **kwargs)
Convenience function for saving to a JSON file.
- Parameters
filename (str) – the file to save
obj (anything) – the object to save; if not already in JSON format, conversion will be attempted
folder (str) – folder if not part of the filename
die (bool) – whether or not to raise an exception if saving an empty object
indent (int) – indentation to use for saved JSON
keepnone (bool) – allow sc.savejson(None) to return ‘null’ rather than raising an exception
kwargs (dict) – passed to json.dump()
- Returns
None
Example:
json = {'foo':'bar', 'data':[1,2,3]}
sc.savejson('my-file.json', json)
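The JSON save/load pair can be sketched with the standard json module. save_json and load_json are hypothetical names; the sketch covers only the indent option and the file-or-string choice described for loadjson() and savejson(), not folder handling or automatic conversion.

```python
import json
import os
import tempfile


def save_json(filename, obj, indent=2):
    """Hypothetical helper: write obj as indented JSON."""
    with open(filename, 'w') as f:
        json.dump(obj, f, indent=indent)


def load_json(filename=None, string=None):
    """Hypothetical helper: parse JSON from a string if given, else from a file."""
    if string is not None:
        return json.loads(string)
    with open(filename) as f:
        return json.load(f)


data = {'foo': 'bar', 'data': [1, 2, 3]}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'my-file.json')
    save_json(path, data)
    from_file = load_json(path)
from_string = load_json(string='{"a":null, "b":[1,2,3]}')
```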
- jsonpickle(obj, tostring=False)
Serialize any Python object to JSON using jsonpickle, returning either a string or a JSON-compatible Python object (no file is written).
- Parameters
obj (any) – the object to pickle as a JSON
tostring (bool) – whether to return a string (rather than the JSONified Python object)
- Returns
Either a string or a Python object for the JSON
Wrapper for the jsonpickle library: https://jsonpickle.github.io/
- class Blobject(source=None, name=None, filename=None, blob=None)
A wrapper for a binary file – rarely used directly.
So named because it’s an object representing a blob.
“source” is a specification of where to get the data from. It can be anything supported by Blobject.load(), namely (a) a filename, which will get loaded, or (b) an io.BytesIO object, which will get dumped into this instance.
Alternatively, you can specify blob, which is a binary string that gets stored directly in the blob attribute.
- load(source=None)
This function loads the spreadsheet from a file or object. If no input argument is supplied, then it will read self.bytes, assuming it exists.
- class Spreadsheet(*args, **kwargs)
A class for reading and writing Excel files in binary format. No disk IO needs to happen to manipulate the spreadsheets with openpyxl (or xlrd or pandas).
New in version 1.3.0: Changed default from xlrd to openpyxl and added self.wb attribute to avoid the need to reload workbooks.
- writecells(cells=None, startrow=None, startcol=None, vals=None, sheetname=None, sheetnum=None, verbose=False, wbargs=None)
Specify cells to write. Can supply either a list of cells of the same length as the values, or else specify a starting row and column and write the values from there.
Examples:
import numpy as np
S = sc.Spreadsheet()
S.writecells(cells=['A6','B7'], vals=['Cat','Dog']) # Method 1
S.writecells(cells=[np.array([2,3])+i for i in range(2)], vals=['Foo', 'Bar']) # Method 2
S.writecells(startrow=14, startcol=1, vals=np.random.rand(3,3)) # Method 3
S.save('myfile.xlsx')
- loadspreadsheet(filename=None, folder=None, fileobj=None, sheet=0, asdataframe=None, header=True, method='pandas', **kwargs)
Load a spreadsheet as a dataframe or a list of lists.
By default, an alias to pandas.read_excel() with a header, but also supports loading via openpyxl or xlrd. Read from either a filename or a file object.
- Parameters
filename (str) – filename or path to read
folder (str) – optional folder to use with the filename
fileobj (obj) – load from file object rather than path
sheet (str/int/list) – name or number of sheet(s) to use (default 0)
asdataframe (bool) – whether to return as a pandas/Sciris dataframe (default True)
method (str) – how to read (default ‘pandas’, other choices ‘openpyxl’ and ‘xlrd’)
kwargs (dict) – passed to pd.read_excel(), openpyxl(), etc.
Examples:
df = sc.loadspreadsheet('myfile.xlsx') # Alias to pd.read_excel(header=1)
wb = sc.loadspreadsheet('myfile.xlsx', method='openpyxl') # Returns workbook
data = sc.loadspreadsheet('myfile.xlsx', method='xlrd', asdataframe=False) # Returns raw data; requires xlrd
New in version 1.3.0: change default from xlrd to pandas; renamed sheetname and sheetnum arguments to sheet.
- savespreadsheet(filename=None, data=None, folder=None, sheetnames=None, close=True, formats=None, formatdata=None, verbose=False)
Not-so-little function to format data nicely for Excel.
Note: this function, while not deprecated, is not actively maintained.
Examples:
import sciris as sc
import pylab as pl

# Simple example
testdata1 = pl.rand(8,3)
sc.savespreadsheet(filename='test1.xlsx', data=testdata1)

# Include column headers
test2headers = [['A','B','C']] # Need double to get right shape
test2values = pl.rand(8,3).tolist()
testdata2 = test2headers + test2values
sc.savespreadsheet(filename='test2.xlsx', data=testdata2)

# Multiple sheets
testdata3 = [pl.rand(10,10), pl.rand(20,5)]
sheetnames = ['Ten by ten', 'Twenty by five']
sc.savespreadsheet(filename='test3.xlsx', data=testdata3, sheetnames=sheetnames)

# Supply data as an odict
testdata4 = sc.odict([('First sheet', pl.rand(6,2)), ('Second sheet', pl.rand(3,3))])
sc.savespreadsheet(filename='test4.xlsx', data=testdata4, sheetnames=sheetnames)

# Include formatting
nrows = 15
ncols = 3
formats = {
    'header':{'bold':True, 'bg_color':'#3c7d3e', 'color':'#ffffff'},
    'plain': {},
    'big': {'bg_color':'#ffcccc'}}
testdata5 = pl.zeros((nrows+1, ncols), dtype=object) # Includes header row
formatdata = pl.zeros((nrows+1, ncols), dtype=object) # Format data needs to be the same size
testdata5[0,:] = ['A', 'B', 'C'] # Create header
testdata5[1:,:] = pl.rand(nrows,ncols) # Create data
formatdata[1:,:] = 'plain' # Format data
formatdata[1:,:][testdata5[1:,:].astype(float) > 0.7] = 'big' # Find "big" numbers (skipping the text header row) and format them differently
formatdata[0,:] = 'header' # Format header
sc.savespreadsheet(filename='test5.xlsx', data=testdata5, formats=formats, formatdata=formatdata)
- class Empty(*args, **kwargs)
Another empty class to represent a failed object load, but one that does not proceed with setstate.
- loadobj2or3(filename=None, filestring=None, recursionlimit=None, **kwargs)
Try to load as a (Sciris-saved) Python 3 pickle; if that fails, try to load as a Python 2 pickle. For legacy support only.
For available keyword arguments, see sc.load().
- Parameters
filename (str) – the name of the file to load
filestring (str) – alternatively, specify an already-loaded bytestring
recursionlimit (int) – how deeply to parse objects before failing (default 1000)
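The Python 2 fallback can be sketched with the encoding argument that pickle.loads() provides for exactly this case: Python 2 str objects are raw bytes, and decoding them as latin1 maps every byte to one character. This is a minimal sketch under our own assumptions (load_py2_or_py3 is a hypothetical name); Sciris's actual fallback may differ, for example by also remapping renamed modules or adjusting the recursion limit.

```python
import pickle


def load_py2_or_py3(data):
    """Hypothetical sketch: try a normal load, then retry with Python 2 string handling."""
    try:
        return pickle.loads(data)
    except (UnicodeDecodeError, pickle.UnpicklingError):
        # Python 2 'str' objects are raw bytes; latin1 maps each byte to one character
        return pickle.loads(data, encoding='latin1')


# A protocol-2 pickle of the Python 2 byte string '\xc3\xa9', as Python 2 would write it:
# PROTO 2, SHORT_BINSTRING of length 2, STOP
py2_bytes = b'\x80\x02U\x02\xc3\xa9.'
result = load_py2_or_py3(py2_bytes)
```

With the default ASCII decoding, the first attempt fails on the non-ASCII bytes, so the latin1 retry is what returns a result.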