datacatalog package

Subpackages

Submodules

datacatalog.agavehelpers module

class datacatalog.agavehelpers.AgaveHelper(client, storage_system='data-sd2e-community')[source]

Bases: object

Uses an active API client to provide various utility functions

delete(filePath, systemId)[source]
dirname(path, storage_system=None)[source]
exists(path, storage_system=None)[source]

Check if a path exists on an Agave storage resource

Parameters:
  • path (str) – An Agave absolute path
  • storage_system (str, optional) – The storage system against which to resolve the POSIX path
Raises:

AgaveHelperError – The function has failed due to an API error

Returns:

Whether the path exists or not

Return type:

bool

get_storage_system(storage_system)[source]
isdir(path, storage_system=None)[source]

Check if a path on an Agave storage resource is a directory

Parameters:
  • path (str) – An Agave absolute path
  • storage_system (str, optional) – The storage system against which to resolve the POSIX path
Raises:

AgaveHelperError – The function has failed due to an API error

Returns:

Whether the path is a directory or not

Return type:

bool

isfile(path, storage_system=None)[source]

Check if a path on an Agave storage resource is a file

Parameters:
  • path (str) – An Agave absolute path
  • storage_system (str, optional) – The storage system against which to resolve the POSIX path
Raises:

AgaveHelperError – The function has failed due to an API error

Returns:

Whether the path is a file or not

Return type:

bool

listdir(path, recurse, storage_system=None, directories=True)[source]

Get the contents of a directory on an Agave storage resource

Parameters:
  • path (str) – An Agave absolute path to directory
  • storage_system (str, optional) – The storage system where path is found
  • directories (bool, optional) – Whether to include directories in response
Returns:

Directory contents as a list of strings

Return type:

list

listdir_agave_lustre(path, recurse=True, storage_system=None, directories=True)[source]
listdir_agave_native(path, recurse, storage_system=None, directories=True, current_listing=[])[source]
listdir_agave_posix(path, recurse=True, storage_system=None, directories=True)[source]
mapped_posix_path(path, storage_system=None)[source]

Resolve the absolute POSIX path for an Agave directory

Parameters:
  • path (str) – Agave absolute path
  • storage_system (str, optional) – The storage system against which to resolve the POSIX path
Returns:

The path as a string

Return type:

str

mkdir(dirName, systemId, basePath='/', sync=False, timeOut=60)[source]

Creates a directory dirName on a storage system at basePath

Like mkdir -p, this is idempotent. It creates the child path tree so long as paths are specified correctly, and does nothing if all directories are already in place.
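
The idempotent behavior described for mkdir matches what os.makedirs(..., exist_ok=True) provides on a local filesystem. The sketch below is only a local analogy under that assumption: it does not call the Agave API, and make_tree is a hypothetical name that is not part of datacatalog.

```python
import os
import tempfile


def make_tree(dir_name, base_path):
    """Create base_path/dir_name with any missing parents (hypothetical helper)."""
    target = os.path.join(base_path, dir_name)
    # exist_ok=True makes a repeat call a no-op instead of an error
    os.makedirs(target, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as base:
    first = make_tree("a/b/c", base)
    second = make_tree("a/b/c", base)  # directories already exist
    created = os.path.isdir(first)
    same = first == second
```

Calling the helper twice with the same arguments leaves the tree unchanged, which is the property the mkdir docstring describes.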

paths_to_agave_uris(filepaths, storage_system=None)[source]

Transform a list of paths on a storage system into Agave URIs

Parameters:
  • filepaths (list) – A list of agave storage system paths
  • storage_system (str, optional) – The storage system where these paths reside
Returns:

The paths in agave:// format

Return type:

list

Warning

Existence of resources described by the URI list is not validated
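
A minimal sketch of the transformation, assuming the URI layout is agave://<storage_system>/<path>. The function name to_agave_uris is hypothetical and the library's own implementation may differ; the default system name is taken from the AgaveHelper signature above.

```python
def to_agave_uris(filepaths, storage_system="data-sd2e-community"):
    """Prefix each path with agave://<storage_system> (hypothetical helper).

    As with the documented method, no check is made that the paths exist.
    """
    uris = []
    for path in filepaths:
        # Strip leading slashes so exactly one separates system and path
        uris.append("agave://{}/{}".format(storage_system, path.lstrip("/")))
    return uris


uris = to_agave_uris(["/uploads/a.txt", "products/b.csv"])
```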

exception datacatalog.agavehelpers.AgaveHelperError[source]

Bases: agavepy.agave.AgaveError

exception datacatalog.agavehelpers.AgaveHelperException[source]

Bases: datacatalog.agavehelpers.AgaveHelperError

datacatalog.agavehelpers.ag_files_list(client, systemId, filePath, limit=50, offset=0)[source]
datacatalog.agavehelpers.from_agave_uri(uri=None, validate=False)[source]

Partition an Agave storage URI into its components

Parameters:
  • uri (str) – An agave-canonical files URI
  • validate (bool, optional) – Whether to validate the URI using an API call
Raises:

AgaveError – Occurs when an invalid URI is passed

Returns:

Three strings are returned: storageSystem, directoryPath, and fileName

Return type:

tuple
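
The three-way split can be sketched with the standard library, assuming the URI has the form agave://<storageSystem>/<directoryPath>/<fileName>. split_agave_uri is a hypothetical stand-in: it raises ValueError rather than AgaveError and performs no API validation.

```python
import posixpath
from urllib.parse import urlparse


def split_agave_uri(uri):
    """Return (storage_system, directory_path, file_name) for an agave:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "agave" or not parsed.netloc:
        raise ValueError("Not an agave:// URI: {!r}".format(uri))
    # The URI path is POSIX-style regardless of the local operating system
    directory_path, file_name = posixpath.split(parsed.path)
    return parsed.netloc, directory_path, file_name


parts = split_agave_uri("agave://data-sd2e-community/uploads/plan/a.json")
```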

datacatalog.agavehelpers.process_agave_httperror(http_error_object)[source]

datacatalog.dicthelpers module

exception datacatalog.dicthelpers.DictionaryMergeError[source]

Bases: Exception

datacatalog.dicthelpers.data_merge(left, right, setkeys=['child_of'])[source]

Merge two Mapping objects, combining overlapping Mappings and favoring right-hand values

Parameters:
  • left – The left Mapping object
  • right – The right (favored) Mapping object

Note

This operation is not commutative: merge(a, b) != merge(b, a)

datacatalog.dicthelpers.data_merge_diff(a, b, filters=('_id', 'uuid', 'properties', 'measurements_ids', 'measurements', 'files', 'samples'))[source]
datacatalog.dicthelpers.dict_compare(a, b)[source]

Lexically compare values of all primitives in a pair of dicts

datacatalog.dicthelpers.dict_merge(dct, merge_dct, add_keys=True)[source]

Recursively merge two dicts

Inspired by dict.update(). Instead of updating only top-level keys, dict_merge recurses into dicts nested to an arbitrary depth, updating keys. merge_dct is merged into dct. This version returns a copy of the dictionary and leaves the original arguments untouched. The optional argument add_keys determines whether keys that are present in merge_dct but not in dct are included in the new dict.

Parameters:
  • dct (dict) – The dict to merge into
  • merge_dct (dict) – The dict merged into dct
  • add_keys (bool) – Whether to add new keys
Returns:

The updated dict

Return type:

dict
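
The recursive merge and the effect of add_keys can be illustrated with a short sketch. merge_sketch is a hypothetical re-implementation written from the description above, not the library's code, so details such as how non-dict values are copied are assumptions.

```python
import copy
from collections.abc import Mapping


def merge_sketch(dct, merge_dct, add_keys=True):
    """Return a merged copy of dct; neither argument is modified."""
    result = copy.deepcopy(dct)
    for key, value in merge_dct.items():
        if key not in result and not add_keys:
            continue  # skip keys that dct does not already have
        if isinstance(result.get(key), Mapping) and isinstance(value, Mapping):
            # Both sides are mappings: recurse instead of overwriting
            result[key] = merge_sketch(result[key], value, add_keys=add_keys)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {"a": 1, "nested": {"x": 1, "y": 2}}
update = {"nested": {"y": 20, "z": 30}, "b": 2}
merged = merge_sketch(base, update)
restricted = merge_sketch(base, update, add_keys=False)
```

With add_keys=False only keys already present in the left-hand dict are updated, at every nesting level.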
datacatalog.dicthelpers.filter_dict(target_dict, keys_to_filter)[source]

Filter key(s) from the top level of a dict

Parameters:
  • target_dict (dict) – The dictionary to filter
  • keys_to_filter (list, tuple, str) – Set of keys to filter
Returns:

A filtered copy of target_dict
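
A sketch of top-level filtering, assuming that "filter" means the named keys are removed from the returned copy. filter_sketch is a hypothetical name and the library function may behave differently in edge cases.

```python
def filter_sketch(target_dict, keys_to_filter):
    """Return a shallow copy of target_dict without the named top-level keys."""
    if isinstance(keys_to_filter, str):
        # Accept a single key as well as a list or tuple of keys
        keys_to_filter = (keys_to_filter,)
    drop = set(keys_to_filter)
    return {k: v for k, v in target_dict.items() if k not in drop}


record = {"_id": 1, "uuid": "abc", "name": "sample"}
without_ids = filter_sketch(record, ("_id", "uuid"))
without_name = filter_sketch(record, "name")
```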
datacatalog.dicthelpers.flatten(d, parent_key='', sep='_')[source]
datacatalog.dicthelpers.flatten_dict(dd, separator='_', prefix='')[source]
datacatalog.dicthelpers.is_primitive(pyobj)[source]

Determine if pyobj is one of (what other languages would deem) a primitive

datacatalog.dicthelpers.json_diff(j1, j2, filters=('_id', 'uuid', 'properties', 'measurements_ids', 'measurements', 'files', 'samples'))[source]
datacatalog.dicthelpers.linearize_dict(dd, separator='|')[source]
datacatalog.dicthelpers.list_merge(lst, merge_lst)[source]
datacatalog.dicthelpers.right_merge(right_value, left_value)[source]

datacatalog.extensible module

class datacatalog.extensible.ExtensibleAttrDict[source]

Bases: dict

Implements AttrDict-like behavior for complex objects

as_dict(filters=[], private_prefix='__')[source]

datacatalog.githelpers module

datacatalog.githelpers.get_remote_uri(repo='/home/docs/checkouts/readthedocs.org/user_builds/python-datacatalog/checkouts/latest/docs')[source]

Gets the remote origin for the current directory

Parameters:
  • repo (str, optional) – The directory to inspect for a git reflog
Returns:

An SSH or HTTP git repository URL

Return type:

str

datacatalog.githelpers.get_sha1(repo='/home/docs/checkouts/readthedocs.org/user_builds/python-datacatalog/checkouts/latest/docs')[source]

Get SHA-1 hash of given git repository

Inspects the specified directory, assumed to be a git repository, and returns the SHA1 hash from its HEAD-revision. The output is equivalent to calling git rev-parse HEAD.

Parameters:
  • repo (str, optional) – The directory to inspect for a git reflog
Returns:

A hexadecimal string representing the SHA1 hash

Return type:

str

datacatalog.githelpers.get_sha1_short(repo='/home/docs/checkouts/readthedocs.org/user_builds/python-datacatalog/checkouts/latest/docs')[source]

Return abbreviated SHA1 hash for the specified git repository

Returns the initial seven characters of the current repository’s most recent git commit.

Parameters:
  • repo (str, optional) – The directory to inspect for a git reflog
Returns:

A hexadecimal string representing the SHA1 hash

Return type:

str

datacatalog.utils module

datacatalog.utils.camel_to_snake(text_string)[source]

Transform a CamelCase string into snake_case
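
One common way to perform this conversion uses two regular-expression passes. The sketch below is an assumption about the approach, not the library's implementation, and camel_to_snake_sketch is a hypothetical name.

```python
import re


def camel_to_snake_sketch(text_string):
    """Convert CamelCase or mixedCase text to snake_case."""
    # Pass 1: split before a capitalized word, e.g. "HTTPResponse" -> "HTTP_Response"
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", text_string)
    # Pass 2: split a lowercase letter or digit from a following capital
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


simple = camel_to_snake_sketch("CamelCase")
mixed = camel_to_snake_sketch("storageSystem")
acronym = camel_to_snake_sketch("HTTPResponse")
```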

datacatalog.utils.current_time()[source]

Get the current UTC time

Returns:

A datetime object rounded to millisecond precision
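
Millisecond precision can be obtained by adjusting the microsecond field of a datetime. The sketch below truncates rather than rounds, and returns a timezone-aware value; both choices are assumptions, and utc_now_msec is a hypothetical name.

```python
from datetime import datetime, timezone


def utc_now_msec():
    """Return the current UTC time with microseconds cut to whole milliseconds."""
    now = datetime.now(timezone.utc)
    # Truncation avoids overflowing into the next second, which rounding could do
    return now.replace(microsecond=(now.microsecond // 1000) * 1000)


stamp = utc_now_msec()
```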

datacatalog.utils.decode_path(encoded_file_path)[source]

Returns a URL-decoded version of a path

datacatalog.utils.detect_encoding(file_path)[source]

Uses chardet to detect encoding of a file

datacatalog.utils.dynamic_import(module, package=None)[source]

Dynamically import a module by name at runtime

Parameters:
  • module (str) – The name of the module to import
  • package (str, optional) – The package to import module from
Returns:

The imported module

Return type:

object
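
The module and package parameters mirror those of the standard library's importlib.import_module, so a thin wrapper is one plausible way to provide this behavior. That is an assumption about the implementation; import_by_name is a hypothetical name.

```python
import importlib


def import_by_name(module, package=None):
    """Import a module given its name; package anchors relative names."""
    return importlib.import_module(module, package=package)


json_module = import_by_name("json")
# A leading dot makes the name relative to the given package
decoder = import_by_name(".decoder", package="json")
```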

datacatalog.utils.encode_path(file_path)[source]

Returns a URL-encoded version of a path
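
URL encoding and decoding of paths can be sketched with urllib.parse, assuming the library relies on equivalent percent-encoding rules. The example uses the standard library directly rather than datacatalog.utils.

```python
from urllib.parse import quote, unquote

original = "/uploads/my file (1).txt"
# quote() leaves "/" unescaped by default, so path separators survive
encoded = quote(original)
decoded = unquote(encoded)
```

Encoding followed by decoding returns the original path, which is the round trip encode_path and decode_path are expected to provide.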

datacatalog.utils.import_submodules(module, package=None)[source]

Dynamically discover and import submodules at runtime

datacatalog.utils.microseconds()[source]

Get the current time in microseconds as an int

datacatalog.utils.msec_precision(datetimeval)[source]
datacatalog.utils.normalize(filepath)[source]
datacatalog.utils.normpath(filepath)[source]
datacatalog.utils.safen_path(file_path, no_unicode=False, no_spaces=False, url_quote=False, no_equals=False)[source]

Returns a safened version of a path

Trailing whitespace is removed, Unicode characters (sorry!) are transformed to ASCII equivalents, and equals signs and whitespace characters are each replaced with a dash character.
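
The transformations listed above can be sketched as follows. safen_sketch is a hypothetical re-implementation: it assumes each keyword flag switches on the matching transformation, omits url_quote, and uses NFKD normalization for the ASCII conversion, none of which is confirmed by the documentation.

```python
import re
import unicodedata


def safen_sketch(file_path, no_unicode=False, no_spaces=False, no_equals=False):
    """Return a safened copy of file_path (illustrative only)."""
    safe = file_path.rstrip()  # always drop trailing whitespace
    if no_unicode:
        # Decompose accented characters, then discard non-ASCII code points
        safe = unicodedata.normalize("NFKD", safe)
        safe = safe.encode("ascii", "ignore").decode("ascii")
    if no_equals:
        safe = safe.replace("=", "-")
    if no_spaces:
        safe = re.sub(r"\s", "-", safe)
    return safe


cleaned = safen_sketch(
    "/data/caf\u00e9 run=1.txt  ", no_unicode=True, no_spaces=True, no_equals=True
)
untouched = safen_sketch("/data/a b.txt ")
```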

datacatalog.utils.text_uuid_to_binary(text_uuid)[source]
datacatalog.utils.time_stamp(dt=None, rounded=False)[source]

Get time in seconds

Parameters:
  • dt (datetime) – Optional datetime object [current_time()]
  • rounded (bool) – Whether to round the response to the nearest int
Returns:

Time expressed as a float (or int)

datacatalog.utils.validate_file_to_schema(file_path, schema_file='/schemas/default.jsonschema', permissive=False)[source]

Validate a JSON document against a specified JSON schema

Parameters:
  • file_path (str) – Path to the file to validate
  • schema_file (str) – Path to the requisite JSON schema file [/schemas/default.jsonschema]
  • permissive (bool) – Swallow validation errors and return only a boolean [False]
Returns:

Boolean value

Raises:

Validation exceptions if ‘permissive’ is False

datacatalog.version module