datacatalog.linkedstores.fixity package

Submodules

datacatalog.linkedstores.fixity.exceptions module

exception datacatalog.linkedstores.fixity.exceptions.FixityDuplicateError(error, code=None, details=None, max_wire_version=None)[source]

Bases: pymongo.errors.DuplicateKeyError

exception datacatalog.linkedstores.fixity.exceptions.FixtyNotFoundError[source]

Bases: KeyError

exception datacatalog.linkedstores.fixity.exceptions.FixtyUpdateFailure[source]

Bases: datacatalog.linkedstores.basestore.exceptions.CatalogError

datacatalog.linkedstores.fixity.indexer module

class datacatalog.linkedstores.fixity.indexer.FixityIndexer(abs_filename=None, storage_system='data-sd2e-community', cache_stat=True, block_size=128000, schema={}, agave=None, **kwargs)[source]

Bases: object

Captures fixed details for a given file

CHECKSUM_BLOCKSIZE = 128000

Chunk size for computing checksum

DEFAULT_SIZE = -1

Default size in bytes when it cannot be determined

XXHASH32_SEED = 2573985330

Seed for xxHash 32-bit fingerprinting

XXHASH64_SEED = 3759046909696704950

Seed for xxHash 64-bit fingerprinting

checksum_xxhash(file, return_type='int')[source]

Compute xxhash digest for a file

Parameters:
  • file (str) – Path to file
  • return_type (str, optional) – Type of digest to return, either 'str' or 'int'
Returns:

Current digest, as an integer by default or as a string when return_type is 'str'

Return type:

int64

Note

See https://cyan4973.github.io/xxHash/ for details on xxHash

get_checksum(file, algorithm='sha256')[source]

Compute checksum for indexing target

Parameters:
  • file (str) – Absolute path to the file
  • algorithm (str, optional) – Checksum algorithm to use
Returns:

Hexadecimal checksum for the file

Return type:

str
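
As a minimal sketch of how a chunked checksum of this kind can be computed, the example below reads a file in fixed-size blocks and returns a hexadecimal SHA-256 digest. It uses only the standard library hashlib module. The function name sha256_hexdigest and the BLOCK_SIZE constant are hypothetical and chosen for illustration (BLOCK_SIZE mirrors the CHECKSUM_BLOCKSIZE value documented above). This is not necessarily how get_checksum is implemented.

```python
import hashlib

# Hypothetical constant mirroring the documented CHECKSUM_BLOCKSIZE value
BLOCK_SIZE = 128000


def sha256_hexdigest(path, block_size=BLOCK_SIZE):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops once read() returns b'' at end of file
        for chunk in iter(lambda: handle.read(block_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in blocks keeps memory use bounded regardless of file size, and the resulting digest does not depend on the block size chosen.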

get_created(file)[source]

Returns (apparent) file creation time.

Parameters:file (str) – Absolute path to the file
Returns:The file’s ctime
Return type:datetime.datetime

Note

Only millisecond precision is supported, a limitation inherited from BSON.

get_fingerprint(file, algorithm='xxh64')[source]

Compute fast fingerprint for indexing target

Parameters:
  • file (str) – Absolute path to the file
  • algorithm (str, optional) – Fingerprint algorithm to use
Returns:

Hexadecimal fingerprint for the file

Return type:

str

get_modified(file)[source]

Returns (apparent) file modification time.

Note

Only millisecond precision is supported, because the ultimate target for this value is MongoDB, whose BSON datetime type stores milliseconds only.
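
The millisecond limit that applies to get_created and get_modified can be illustrated with a short sketch. It assumes truncation (rather than rounding) and UTC timestamps. Both are assumptions for illustration, as are the helper names truncate_to_msec and mtime_msec, which are not part of this package.

```python
import datetime
import os


def truncate_to_msec(stamp):
    """Drop sub-millisecond digits from a datetime (truncation is assumed)."""
    return stamp.replace(microsecond=(stamp.microsecond // 1000) * 1000)


def mtime_msec(path):
    """Return a file's modification time in UTC at millisecond precision."""
    stamp = datetime.datetime.fromtimestamp(
        os.path.getmtime(path), tz=datetime.timezone.utc)
    return truncate_to_msec(stamp)
```

Normalizing the value before storing it means a timestamp read back from the database compares equal to the one computed locally.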

get_size(file)[source]

Returns size in bytes for files (or DEFAULT_SIZE if unknown)
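
A sentinel fallback of this kind might look like the sketch below, which returns the documented DEFAULT_SIZE value of -1 when the size cannot be read. The function name size_or_default is hypothetical, and treating any OSError as "unknown" is an assumption.

```python
import os

# Mirrors the documented DEFAULT_SIZE class attribute
DEFAULT_SIZE = -1


def size_or_default(path, default=DEFAULT_SIZE):
    """Return the file size in bytes, or the default if it cannot be read."""
    try:
        return os.path.getsize(path)
    except OSError:
        # Missing file, permission problem, or similar: size is unknown
        return default
```

Callers can then test for a negative size instead of handling an exception.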

get_type(file)[source]

Resolves file type for a given file

get_version(file=None)[source]

sync()[source]

Fetch latest values for indexing target

to_dict()[source]

Render fixity record as a dictionary

Returns:Representation of this fixity record
Return type:dict

updated()[source]

Helper to manage updated state

datacatalog.linkedstores.fixity.schema module

class datacatalog.linkedstores.fixity.schema.FixityDocument(inheritance=True, **kwargs)[source]

Bases: datacatalog.linkedstores.basestore.heritableschema.HeritableDocumentSchema

Schema-driven document to represent file fixity

datacatalog.linkedstores.fixity.schemas module

datacatalog.linkedstores.fixity.schemas.get_schemas()[source]

Get JSON schemas for FixityDocument

Returns:Object and document JSON schema that define the store
Return type:JSONSchemaCollection

datacatalog.linkedstores.fixity.store module

class datacatalog.linkedstores.fixity.store.FixityStore(mongodb, agave=None, config={}, session=None, **kwargs)[source]

Bases: datacatalog.linkedstores.basestore.agaveclient.AgaveClient, datacatalog.linkedstores.basestore.store.LinkedStore, datacatalog.linkedstores.basestore.ratelimit.RateLimiter

Defines fixed attributes for a managed file

LOG_JSONDIFF_UPDATES = False

get_typeduuid(payload, binary=False)[source]

index(filename, storage_system=None, **kwargs)[source]

Capture or update current properties of a file

Fixity includes creation and modification dates (rounded to the millisecond), SHA-256 checksum, size in bytes, and inferred file type.

Parameters:
  • filename (str) – Agave-canonical absolute path to the target
  • storage_system (str, optional) – Agave storage system for the target
Returns:

A LinkedStore document containing fixity details

Return type:

dict

class datacatalog.linkedstores.fixity.store.StoreInterface(mongodb, agave=None, config={}, session=None, **kwargs)[source]

Bases: datacatalog.linkedstores.fixity.store.FixityStore