class reproman.utils.HashableDict[source]

Bases: dict

A dict that is hashable, so it can be used as a key (for example, in another dict)
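As a rough illustration of the idea, and not necessarily how ReproMan implements it, a dict subclass can be made hashable by hashing its items. The class name `HashableDictSketch` is hypothetical, and the sketch assumes every value is itself hashable.

```python
class HashableDictSketch(dict):
    """Hypothetical stand-in: a dict hashed by its (key, value) pairs."""

    def __hash__(self):
        # frozenset makes the hash independent of insertion order;
        # this assumes every value stored in the dict is hashable.
        return hash(frozenset(self.items()))


key = HashableDictSketch(name="pkg", version="1.0")
lookup = {key: "found"}

# An equal dict built in a different order finds the same entry.
same = HashableDictSketch(version="1.0", name="pkg")
result = lookup[same]
```

Mutating such a dict after using it as a key would change its hash, so a sketch like this is only safe for dicts that are treated as read-only.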

class reproman.utils.PathRoot(predicate)[source]

Bases: object

Find the root of paths based on a predicate function.

The path -> root mapping is cached across calls.

Parameters:predicate (callable) – A callable that will be passed a path and should return true if that path should be considered a root.
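The following sketch shows one way a cached, predicate-driven root lookup could work. It is an assumption-laden illustration: the class name `PathRootSketch` and its `find` method are hypothetical and are not taken from ReproMan.

```python
import os.path


class PathRootSketch:
    """Hypothetical sketch: walk up from a path until predicate(path) is true."""

    def __init__(self, predicate):
        self._predicate = predicate
        self._cache = {}  # path -> root (or None), kept across calls

    def find(self, path):
        visited = []
        current = path
        root = None
        while True:
            if current in self._cache:
                root = self._cache[current]
                break
            visited.append(current)
            if self._predicate(current):
                root = current
                break
            parent = os.path.dirname(current)
            if not parent or parent == current:  # reached the top
                break
            current = parent
        # Remember the answer for every path inspected on the way up.
        for seen in visited:
            self._cache[seen] = root
        return root


calls = []


def is_root(path):
    calls.append(path)
    return path == "/data/project"


finder = PathRootSketch(is_root)
first = finder.find("/data/project/src/a.py")
second = finder.find("/data/project/src/b.py")  # parent is already cached
```

Because the parent directory of the second path was cached by the first lookup, the predicate is only called once more for the second path.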
class reproman.utils.SemanticVersion(major, minor, patch, tag)

Bases: tuple


major – Alias for field number 0

minor – Alias for field number 1

patch – Alias for field number 2

tag – Alias for field number 3

Return whether any of the regexes (list or str) searches successfully for value

reproman.utils.assure_bytes(s, encoding='utf-8')[source]

Convert/encode unicode to bytes if of ‘str’

Parameters:encoding (str, optional) – Encoding to use. “utf-8” is the default
reproman.utils.assure_dict_from_str(s, **kwargs)[source]

Given a multiline string with key=value items convert it to a dictionary

Parameters:s (str or dict) –
Returns:None if input s is empty

Make sure directory exists.

Joins the list of arguments into an OS-specific path to the desired directory and creates it if it does not exist yet.


Given something that is not a list, place it into a list. If None is given, an empty list is returned

Parameters:s (list or anything) –
reproman.utils.assure_list_from_str(s, sep='\n')[source]

Given a multiline string, convert it to a list or return None if empty

Parameters:s (str or list) –

Given an object, wrap into a tuple if not list or tuple

reproman.utils.assure_unicode(s, encoding=None, confidence=None)[source]

Convert/decode to str if of ‘bytes’

  • encoding (str, optional) – Encoding to use. If None, “utf-8” is tried, and then if not a valid UTF-8, encoding will be guessed
  • confidence (float, optional) – A value between 0 and 1, so if guessing of encoding is of lower than specified confidence, ValueError is raised
reproman.utils.attrib(*args, **kwargs)[source]

Extend the attr.ib to include our metadata elements.

ATM we support additional keyword args, which are then stored within metadata:

  • doc – documentation to describe the attribute (e.g. in --help)

Also, when the default argument of attr.ib is unspecified, set it to None.


Decorator for a class to assign it an automagic quick and dirty __repr__

It uses public class attributes to prepare repr of a class

Original idea:


Cache a property’s return value.

This avoids using lru_cache, which is more complicated than needed for simple properties and isn’t available in Python 2’s stdlib.

Use this only if the property’s return value is constant over the life of the object. This isn’t appropriate for a property with a setter or a property whose getter value may change based on some outside state.

This should be positioned below the @property declaration.
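A minimal sketch of such a caching decorator is given here, assuming the value is stored on the instance under a private attribute. The name `cached_property_sketch` and the `_cached_` prefix are hypothetical choices for illustration, not ReproMan's implementation.

```python
import functools


def cached_property_sketch(getter):
    """Hypothetical sketch: compute the value once, then reuse it."""
    attr = "_cached_" + getter.__name__

    @functools.wraps(getter)
    def wrapper(self):
        # Only call the real getter the first time.
        if not hasattr(self, attr):
            setattr(self, attr, getter(self))
        return getattr(self, attr)

    return wrapper


class Report:
    def __init__(self):
        self.computations = 0

    @property
    @cached_property_sketch  # placed below @property, as noted above
    def total(self):
        self.computations += 1
        return 42


report = Report()
first, second = report.total, report.total
```

The getter body runs only once even though the property is read twice.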

class reproman.utils.chpwd(path, mkdir=False, logsuffix='')[source]

Bases: object

Wrapper around os.chdir which also adjusts environ[‘PWD’]

The reason is that otherwise PWD is simply inherited from the shell and we have no ability to assess directory path without dereferencing symlinks.

If used as a context manager, it temporarily changes the directory to the given path
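The context-manager behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: `chpwd_sketch` is a hypothetical name, it covers only the context-manager use, and it ignores the `mkdir` and `logsuffix` parameters.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def chpwd_sketch(path):
    """Hypothetical sketch: chdir and keep os.environ['PWD'] in step."""
    old_cwd = os.getcwd()
    old_pwd = os.environ.get("PWD")
    os.chdir(path)
    os.environ["PWD"] = path  # keep the symlink-preserving path visible
    try:
        yield
    finally:
        # Restore both the real working directory and PWD.
        os.chdir(old_cwd)
        if old_pwd is None:
            os.environ.pop("PWD", None)
        else:
            os.environ["PWD"] = old_pwd


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with chpwd_sketch(tmp):
        inside_pwd = os.environ["PWD"]
after = os.getcwd()
```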


Creates a filter for CommandErrors that match a specific error string

Parameters:err_string (basestring) – The error string we want to match
Return type:func object -> boolean

Convert command to the string representation.

Parameters:command (list or str) – If it is a list, convert it to a string, quoting each element as needed. If it is a string, it is returned as is.

Encode unicode filename


Surround filename in “” and escape ” in the filename

reproman.utils.execute_command_batch(session, command, args, exception_filter=None)[source]

Generator that executes session.execute_command, with batches of args

We want to call commands like “apt-cache policy” on a large number of packages, but risk creating command-lines that are too long. This function is a generator that will call execute_command but with batches of arguments (to stay within the command-line length limit) and yield the results.

  • session – Session object that implements the execute_command() member
  • command (sequence) – The command that we wish to execute
  • args (sequence) – The long list of additional arguments we wish to pass to the command
  • exception_filter (func x -> bool) – A filter of exception types that the calling code will gracefully handle

Returns:stdout of the command, stderr of the command, and an exception that is in the list of expected exceptions
Return type:(out, err, exception)
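The batching idea described above can be sketched without a real session. In this illustration, `execute_in_batches_sketch`, its `run` callable and the explicit `batch_len` argument are all hypothetical. The documented function takes a session object and works out the batch size itself.

```python
def batches_sketch(args, batch_len):
    """Yield consecutive slices of args, each at most batch_len long."""
    for start in range(0, len(args), batch_len):
        yield args[start:start + batch_len]


def execute_in_batches_sketch(run, command, args, batch_len,
                              exception_filter=None):
    """Hypothetical sketch: call run(command + batch), yield (out, err, exc)."""
    for batch in batches_sketch(list(args), batch_len):
        try:
            out, err = run(list(command) + batch)
        except Exception as exc:
            # Only exceptions accepted by the filter are handed to the caller.
            if exception_filter is None or not exception_filter(exc):
                raise
            yield None, None, exc
        else:
            yield out, err, None


def fake_run(cmd):
    # Stand-in for a session's execute_command(); just echoes the command.
    return " ".join(cmd), ""


results = list(execute_in_batches_sketch(
    fake_run, ["apt-cache", "policy"], ["a", "b", "c"], batch_len=2))
```

Writing it as a generator lets the caller process each batch's output as soon as it is available instead of waiting for the whole argument list.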

reproman.utils.expandpath(path, force_absolute=True)[source]

Expand all variables and user handles in a path.

By default return an absolute path

reproman.utils.file_basename(name, return_ext=False)[source]

Strips up to 2 extensions, each up to 4 characters long and starting with a letter rather than a digit, so that suffixes such as .tar.gz are removed

reproman.utils.find_files(regex, topdir='.', exclude=None, exclude_vcs=True, exclude_reproman=False, dirs=False)[source]

Generator to find files matching regex

  • regex (basestring) –
  • exclude (basestring, optional) – Matches to exclude
  • exclude_vcs – If True, excludes commonly known VCS subdirectories. If string, used as regex to exclude those files (regex: ‘/.(?:git|gitattributes|svn|bzr|hg)(?:/|$)’)
  • exclude_reproman – If True, excludes files known to be reproman meta-data files (e.g. under .reproman/ subdirectory) (regex: ‘/.(?:reproman)(?:/|$)’)
  • topdir (basestring, optional) – Directory where to search
  • dirs (bool, optional) – Either to match directories as well as files
reproman.utils.generate_unique_name(pattern, nameset)[source]

Create a unique numbered name from a pattern and a set

  • pattern (basestring) – The pattern for the name (to be used with %) that includes one %d location
  • nameset (collection) – Collection (set or list) of existing names. If the generated name is used, then add the name to the nameset.

Returns:The generated unique name

Return type:
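One way to picture the numbering is the sketch below. It assumes counting starts at 0 and leaves adding the name to the collection to the caller. Both are assumptions, and `generate_unique_name_sketch` is a hypothetical name.

```python
from itertools import count


def generate_unique_name_sketch(pattern, nameset):
    """Hypothetical sketch: first 'pattern % n' that is not already taken."""
    for n in count():  # assumption: numbering starts at 0
        name = pattern % n
        if name not in nameset:
            return name


existing = {"env-0", "env-1"}
new_name = generate_unique_name_sketch("env-%d", existing)
existing.add(new_name)  # the caller records the name once it is used
```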


reproman.utils.get_cmd_batch_len(arg_list, cmd_len)[source]

Estimate the maximum batch length for a given argument list

To make sure we don’t call shell commands with too many arguments this function looks at an argument list and the command length without any arguments, and estimates the number of arguments we want to batch together at one time.

  • arg_list (list) – The list to process in the command
  • cmd_len (number) – The length of the command without arguments

Returns:The maximum number in a single batch

Return type:



Provides args for a function

Parameters:func (str) – name of the function from which args are being requested
Returns:The args that the function takes in
Return type:list
reproman.utils.get_tempfile_kwargs(tkwargs={}, prefix='', wrapped=None)[source]

Updates kwargs to be passed to tempfile calls, depending on environment variables


Backward-compatibility wrapper for inspect.getargspec.


Try to return a CWD without dereferencing possible symlinks

If no PWD found in the env, output of getcwd() is returned

reproman.utils.instantiate_attr_object(item_type, items)[source]

Instantiate item_type given items (for a list or dict)

Provides a more informative exception message in case some arguments are incorrect


Return true if an object is a binary string (not unicode)


Return whether a path explicitly points to a location

Any absolute path, or relative path starting with either ‘../’ or ‘./’ is assumed to indicate a location on the filesystem. Any other path format is not considered explicit.
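The rule can be expressed in a few lines. This sketch assumes POSIX-style separators and uses the hypothetical name `is_explicit_path_sketch`.

```python
import posixpath


def is_explicit_path_sketch(path):
    """Hypothetical sketch of the rule above, for POSIX-style paths only."""
    return (posixpath.isabs(path)
            or path.startswith("./")
            or path.startswith("../"))


checks = {p: is_explicit_path_sketch(p)
          for p in ["/tmp/x", "./x", "../x", "x/y", "x"]}
```

Here `x/y` and `x` are not explicit, because they could equally well be names of something other than a filesystem location.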


Return True if all in/outs are tty

reproman.utils.is_subpath(path, directory)[source]

Test whether path is below (or is itself) directory.

Symbolic links are not resolved before the check.
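A purely lexical check along these lines might look like the sketch below. It assumes POSIX-style paths and normalizes them with `posixpath.normpath`. The normalization step and the name `is_subpath_sketch` are assumptions made for illustration.

```python
import posixpath


def is_subpath_sketch(path, directory):
    """Hypothetical sketch: lexical test, no symlink resolution."""
    path = posixpath.normpath(path)
    directory = posixpath.normpath(directory)
    # Append a separator so that '/a/bc' is not mistaken for a child of '/a/b'.
    prefix = directory.rstrip("/") + "/"
    return path == directory or path.startswith(prefix)


inside = is_subpath_sketch("/a/b/c", "/a/b")
itself = is_subpath_sketch("/a/b", "/a/b")
sibling = is_subpath_sketch("/a/bc", "/a/b")
```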


Return true if an object is unicode

reproman.utils.items_to_dict(l, attrs='name', ordered=False)[source]

Given a list of attr instances, return a dict using specified attrs as keys

  • attrs (str or list of str) – Which attributes of the items to use to group
  • ordered (bool, optional) – Either to return an ordered dictionary following the original order of items in the list

Raises:ValueError – If there is a conflict - multiple items with the same attrs used for key
Return type:dict or collections.OrderedDict


Joins a sequence of dicts into a single dict

Parameters:seq (sequence) – Sequence of dicts to join
Return type:dict
Raises:RuntimeError if a duplicate key is encountered.
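The described behaviour can be sketched as below. `join_dicts_sketch` is a hypothetical name and the error message is illustrative.

```python
def join_dicts_sketch(seq):
    """Hypothetical sketch: merge dicts, refusing duplicate keys."""
    joined = {}
    for d in seq:
        for key, value in d.items():
            if key in joined:
                raise RuntimeError("duplicate key: %r" % (key,))
            joined[key] = value
    return joined


combined = join_dicts_sketch([{"a": 1}, {"b": 2}])

try:
    join_dicts_sketch([{"a": 1}, {"a": 2}])
    duplicate_rejected = False
except RuntimeError:
    duplicate_rejected = True
```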

Returns whether at a given path there is information about an annex

It is just a thin wrapper around GitRepo.is_with_annex() classmethod which also checks for path to exist first.

This includes actually present annexes, but also uninitialized ones, or even the presence of a remote annex branch.


Q&D helper to line profile the function and spit out stats

reproman.utils.lmtime(filepath, mtime)[source]

Set mtime for files, while not de-referencing symlinks.

To overcome absence of os.lutime

Works only on linux and OSX ATM

reproman.utils.make_tempfile(content=None, wrapped=None, **tkwargs)[source]

Helper class to provide a temporary file name and remove it at the end (context manager)

  • mkdir (bool, optional (default: False)) – If True, temporary directory created using tempfile.mkdtemp()
  • content (str or bytes, optional) – Content to be stored in the file created
  • wrapped (function, optional) – If set, function name used to prefix temporary file name
  • **tkwargs – All other arguments are passed into the call to tempfile.mk{,d}temp(), and the resultant temporary filename is passed as the first argument into the function t. If no ‘prefix’ argument is provided, it will be constructed using module and function names (‘.’ replaced with ‘_’).

To change the used directory without providing keyword argument 'dir' set ...


>>> from os.path import exists
>>> from reproman.utils import make_tempfile
>>> with make_tempfile() as fname:
...    k = open(fname, 'w').write('silly test')
>>> assert not exists(fname)  # was removed
>>> with make_tempfile(content="blah") as fname:
...    assert open(fname).read() == "blah"

Convert an iterable of dictionaries into a single dictionary.

In the case of key collisions, the last value wins.

Parameters:ds (iterable of dicts) –
Return type:dict

A little helper to be invoked to consistently fail whenever functionality is not supported (yet) on Windows


Given a dictionary, return a dictionary containing only the entries that have non-null values


Allows a decorator to take optional positional and keyword arguments. Assumes that taking a single, callable, positional argument means that it is decorating a function, i.e. something like this:

def function(): pass

Calls decorator with decorator(f, *args, **kwargs)
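The dispatch described above can be sketched as follows. The names `optional_args_sketch` and `tag` are hypothetical, and this is an illustration of the technique, not ReproMan's code.

```python
import functools


def optional_args_sketch(decorator):
    """Hypothetical sketch: let a decorator be used with or without arguments."""

    @functools.wraps(decorator)
    def wrapper(*args, **kwargs):
        # A lone callable argument means the decorator was used bare: @deco
        if len(args) == 1 and not kwargs and callable(args[0]):
            return decorator(args[0])

        # Otherwise it was called with options: @deco(1, x=2)
        def with_options(f):
            return decorator(f, *args, **kwargs)

        return with_options

    return wrapper


@optional_args_sketch
def tag(f, label="default"):
    f.label = label
    return f


@tag
def plain():
    pass


@tag(label="custom")
def labelled():
    pass
```

The known limitation of this approach is that a decorator whose only positional option is itself a callable cannot be told apart from the bare form.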


Create a dict from a “key=value” list.

Parameters:params (sequence of str or mapping) – For a sequence, each item should have the form “<key>=<value>”. If params is a mapping, it will be returned as is.
Return type:A mapping from backend key to value.
Raises:ValueError if item in params does not match expected “key=value” format.
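A minimal sketch of this parsing is given below, assuming that only the first “=” separates key from value. `parse_kv_list_sketch` is a hypothetical name.

```python
from collections.abc import Mapping


def parse_kv_list_sketch(params):
    """Hypothetical sketch: turn ['k=v', ...] into {'k': 'v', ...}."""
    if isinstance(params, Mapping):
        return params  # mappings are returned as is
    parsed = {}
    for item in params:
        # Assumption: split on the first '=' only, so values may contain '='.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected 'key=value', got %r" % (item,))
        parsed[key] = value
    return parsed


parsed = parse_kv_list_sketch(["a=1", "b=2=3"])
```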

Split version into major, minor, patch, and tag components.

Parameters:version (str) – A version string X.Y.Z. X, Y, and Z must be digits. Any remaining text is treated as a tag (e.g., “-rc1”).
Return type:A namedtuple with the form (major, minor, patch, tag)
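The split can be sketched with a regular expression. In this illustration the components are kept as strings and the names `SemVerSketch` and `parse_semantic_version_sketch` are hypothetical. Whether the real function converts the numeric parts is not stated here.

```python
import re
from collections import namedtuple

SemVerSketch = namedtuple("SemVerSketch", ["major", "minor", "patch", "tag"])

# X.Y.Z made of digits, followed by any remaining text as the tag.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")


def parse_semantic_version_sketch(version):
    """Hypothetical sketch: split 'X.Y.Z<tag>' into its four components."""
    match = _VERSION_RE.match(version)
    if not match:
        raise ValueError("not of the form X.Y.Z: %r" % (version,))
    return SemVerSketch(*match.groups())


parsed_version = parse_semantic_version_sketch("1.2.3-rc1")
```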
reproman.utils.partition(items, predicate=<class 'bool'>)[source]

Partition items by predicate.

  • items (iterable) –
  • predicate (callable) – A function that will be mapped over each element in items. The elements will be partitioned based on whether the return value is false or true.

Returns:A tuple with two generators, the first for ‘false’ items and the second for ‘true’ ones.
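A common way to build such a pair of lazy generators uses `itertools.tee`, as sketched below. `partition_sketch` is a hypothetical name, and the body is one possible implementation.

```python
from itertools import tee


def partition_sketch(items, predicate=bool):
    """Hypothetical sketch: lazily split items into (false ones, true ones)."""
    # Evaluate the predicate once per item, then share the results.
    a, b = tee((predicate(item), item) for item in items)
    return ((item for pred, item in a if not pred),
            (item for pred, item in b if pred))


odds, evens = partition_sketch(range(6), lambda n: n % 2 == 0)
odds, evens = list(odds), list(evens)
```

`tee` buffers items that one generator has consumed and the other has not, so exhausting one side completely before touching the other holds the whole input in memory.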


Taken from Peter Otten’s snippet posted at


Map a pycache path to the original path.

Parameters:path (str) – A Python cache file.
Returns:Path of cached Python file (str) or None if path doesn’t look like a cache file.
reproman.utils.rmtemp(f, *args, **kwargs)[source]

Wrapper to centralize removing of temp files so we could keep them around

It will not remove the temporary file/directory if REPROMAN_TESTS_KEEPTEMP environment variable is defined

reproman.utils.rmtree(path, chmod_files='auto', *args, **kwargs)[source]

To remove a git-annex .git directory, all files and directories need to be made writable again first

  • chmod_files (string or bool, optional) – Whether to also make files writable before removal. Usually only directories need write permissions. If ‘auto’, files are chmod-ed on Windows by default
  • *args
  • **kwargs – Passed into shutil.rmtree call
reproman.utils.rotree(path, ro=True, chmod_files=True)[source]

To make tree read-only or writable

  • path (string) – Path to the tree/directory to chmod
  • ro (bool, optional) – Either to make it R/O (default) or RW
  • chmod_files (bool, optional) – Either to operate also on files (not just directories)
reproman.utils.safe_write(ostream, s, encoding='utf-8')[source]

Safely write different string types to an output stream


Overloads default sys.excepthook with our exceptionhook handler.

If interactive, our exceptionhook handler will invoke pdb.post_mortem; if not interactive, then invokes default handler.

reproman.utils.shortened_repr(value, l=30)[source]

Return a (sorted) list of files under dout


Context manager to consume all logs.


Context manager to help consuming both stdout and stderr, and print()

stdout is available as cm.out and stderr as cm.err whenever cm is the yielded context manager. Internally uses temporary files to guarantee absent side-effects of swallowing into StringIO which lacks .fileno.

print mocking is necessary for some uses where sys.stdout was already bound to original sys.stdout, thus mocking it later had no effect. Overriding print function had desired effect

reproman.utils.to_binarystring(s, encoding='utf-8')[source]

Converts any type string to binarystring

reproman.utils.to_unicode(s, encoding='utf-8')[source]

Converts any type string to unicode

reproman.utils.unique(seq, key=None)[source]

Given a sequence return a list only with unique elements while maintaining order

This is the fastest solution. Enhancement: added the ability to compare for uniqueness using a key function

  • seq – Sequence to analyze
  • key (callable, optional) – Function to call on each element so we could decide not on a full element, but on its member etc
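An order-preserving de-duplication with an optional key function can be sketched as follows. `unique_sketch` is a hypothetical name, and the sketch assumes the elements (or their keys) are hashable.

```python
def unique_sketch(seq, key=None):
    """Hypothetical sketch: keep the first occurrence of each element."""
    seen = set()
    result = []
    for item in seq:
        # Compare on key(item) when a key function is supplied.
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


plain = unique_sketch([3, 1, 3, 2, 1])
case_insensitive = unique_sketch(["a", "A", "b"], key=str.lower)
```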
reproman.utils.updated(d, update)[source]

Return a copy of the input with the ‘update’ applied

Primarily for updating dictionaries

reproman.utils.write_update(fname, content, encoding=None)[source]

Write content to fname unless it already has matching content.

This is the same as simply writing the content, except no writing occurs if the content of the existing file matches, the write or update is logged, and the leading directories of fname are created if needed.

  • fname (str) – Path to update.
  • content (str) – Content to dump to path.
  • encoding (str or None, optional) – Passed to open.
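The write-only-if-changed logic can be sketched as below. This is a simplified illustration: `write_update_sketch` is a hypothetical name, logging is omitted, and the boolean return value is added only to make the example observable.

```python
import os
import tempfile


def write_update_sketch(fname, content, encoding=None):
    """Hypothetical sketch: write content unless the file already matches."""
    if os.path.exists(fname):
        with open(fname, encoding=encoding) as f:
            if f.read() == content:
                return False  # nothing to do, file is up to date
    dirname = os.path.dirname(fname)
    if dirname:
        os.makedirs(dirname, exist_ok=True)  # create leading directories
    with open(fname, "w", encoding=encoding) as f:
        f.write(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sub", "out.txt")
    outcomes = (
        write_update_sketch(target, "hello"),    # created
        write_update_sketch(target, "hello"),    # unchanged, skipped
        write_update_sketch(target, "changed"),  # rewritten
    )
```

Skipping the write when the content matches leaves the file's modification time untouched, which can matter for tools that rely on timestamps.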