datacatalog.linkedstores.basestore package¶
Submodules¶
datacatalog.linkedstores.basestore.admin module¶
datacatalog.linkedstores.basestore.agaveclient module¶
-
class
datacatalog.linkedstores.basestore.agaveclient.
AgaveClient
(mongodb, config={}, session=None, agave=None, **kwargs)[source]¶ Bases:
datacatalog.linkedstores.basestore.store.LinkedStore
Adds use of AgaveHelper to a LinkedStore
datacatalog.linkedstores.basestore.diff module¶
-
class
datacatalog.linkedstores.basestore.diff.
DocumentDiff
(delta, uuid, admin, action)[source]¶ Bases:
datacatalog.extensible.ExtensibleAttrDict
-
updated
¶ Were any differences found?
-
-
datacatalog.linkedstores.basestore.diff.
get_diff
(source={}, target={}, action='update')[source]¶ Determine the differences between two documents
Generates a document for the updates store that describes the diff between source and target documents. The resulting document includes the document UUID, a timestamp, the document’s tenancy details, and the JSON-diff encoded in URL-safe base64. The encoding is necessary because JSON diff and patch formats include keys beginning with $, which are prohibited in MongoDB documents.
Returns: - A json-diff record
- LinkEdgesDiff: a record of differences in linkage fields
- bool: Whether the json-diff was empty or not
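The URL-safe base64 step described for get_diff can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: the delta dictionary and the helper names encode_delta and decode_delta are hypothetical, and the real diff format may differ.

```python
import base64
import json


def encode_delta(delta):
    """Serialize a diff and encode it as URL-safe base64 text."""
    raw = json.dumps(delta, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_delta(encoded):
    """Reverse encode_delta and return the original diff."""
    raw = base64.urlsafe_b64decode(encoded.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Hypothetical diff whose keys start with "$", which MongoDB would reject
delta = {"$replace": {"title": "new title"}, "$delete": ["obsolete_key"]}
encoded = encode_delta(delta)
```

Because the encoded value is an ordinary string, it can be stored under a regular key without any `$`-prefixed keys reaching the database.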
datacatalog.linkedstores.basestore.documentschema module¶
-
class
datacatalog.linkedstores.basestore.documentschema.
DocumentSchema
(**kwargs)[source]¶ Bases:
datacatalog.jsonschemas.schema.JSONSchemaBaseObject
Extends the JSON schema-driven document class with LinkedStore functions
DocumentSchema objects validate against the schema defined in schema.json, have a defined LinkedStore type and specify fields used to uniquely identify the document. Their get_schemas method can emit both document schemas (which contain all administrative fields) and object schemas (only core data fields).
- Attributes
- _filters (list): A private attribute defining how to render document and object schemas from the larger JSON schema
-
DEFAULT_DOCUMENT_NAME
= 'schema.json'¶ Filename of the JSON schema document, relative to __file__.
-
DEFAULT_FILTERS_NAME
= 'filters.json'¶ Filename of the JSON schema filters document, relative to __file__.
-
DELETE_FIELD
= '_visible'¶
-
RETURN_DOC_FILTERS
= ['_id', '_salt', '_admin', '_properties', '_update_token', '_visible']¶ These keys should never be returned in a document
-
TYPED_UUID_FIELD
= ['id']¶ List of fields used to generate a typed UUID
-
TYPED_UUID_TYPE
= 'generic'¶ The named type for UUIDs assigned to this class of LinkedStore documents
-
get_collection
()[source]¶ Returns the name of the MongoDB collection containing documents with this schema
Documents from a LinkedStore are stored in a specific named MongoDB collection. This method returns the collection name. It is good practice for the collection and name of the LinkedStore-derived class to be related intuitively.
Returns: The name of a MongoDB collection Return type: str
-
get_filename
(document=False)[source]¶ Returns basename for the schema file
When a LinkedStore’s schema is rendered, its relationship with other datacatalog-managed schemas is established via a common base URL. The basename of the URI that is embedded in the $id field of the schema is defined in __filename in the extended JSON schema document and is returned by this method.
Returns: The filename at which this schema is expected to be resolvable Return type: str
-
get_identifiers
()[source]¶ Returns the list of top-level keys that are identifiers
In the extended-form schema, __identifiers describes which keys can be used to uniquely identify documents written using this schema.
Returns: The list of identifying key names Return type: list
-
get_indexes
()[source]¶ Returns the list of indexes declared for documents of this schema
In the extended-form schema, __indexes declares the indexing strategy for documents written using this schema.
Returns: The list of indexes declared for this schema Return type: list
-
get_required
()[source]¶ Returns the list of required fields
This is defined by __required in the extended-form schema.
Returns: The list of required fields Return type: list
-
get_serialized_document
(document, **kwargs)[source]¶ Serializes a complex object into a string
Some UUIDs are constructed from complex data structures like Agave job definitions. Rather than implement specific strategies for selecting from arbitrary nested structures, this method provides guaranteed-order serialization of the object to a linear string.
Parameters: document (object) – A dict or list object to serialize Returns: JSON serialized and minified representation of document
Return type: str
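Guaranteed-order, minified serialization can be sketched with the standard json module. This is an illustration under the assumption that sorted keys and compact separators are what is meant; serialize_document is a hypothetical name, not the library's method.

```python
import json


def serialize_document(document):
    """Return a minified JSON string with keys in sorted order."""
    return json.dumps(document, sort_keys=True, separators=(",", ":"))


# Two dicts with the same content but different insertion order
first = {"name": "job-1", "inputs": {"b": 2, "a": 1}}
second = {"inputs": {"a": 1, "b": 2}, "name": "job-1"}
```

Both inputs serialize to the same string, which is what makes the result usable as a stable input for UUID generation.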
-
get_typeduuid
(payload, binary=False)[source]¶ Generate a UUID with the appropriate type prefix
Parameters: - payload (str/dict) – If payload is a string, the UUID is generated directly from it. Otherwise, it is serialized before being used to generate the UUID.
- binary (bool, optional) – Whether to return a Binary-encoded UUID. Defaults to False.
Returns: A string validating as UUID5 with a 3-character typing prefix
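One way to picture a UUID5 with a 3-character typing prefix is shown below. This is a sketch under stated assumptions: the namespace, the prefix value "100", and the choice to overwrite the first three characters are all hypothetical and may not match how the library builds its typed UUIDs.

```python
import json
import uuid

# Hypothetical values: the library's real namespace and prefixes may differ
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
SKETCH_PREFIX = "100"


def typed_uuid(payload, prefix=SKETCH_PREFIX):
    """Build a UUID5 string whose first three characters carry a type code."""
    if not isinstance(payload, str):
        # Serialize non-string payloads in a stable key order first
        payload = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    text = str(uuid.uuid5(SKETCH_NAMESPACE, payload))
    # Overwrite the leading characters; the version digit sits further right
    return prefix + text[len(prefix):]
```

The version digit of a UUID lives in the third group, so replacing the leading characters leaves a string that still parses as version 5.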
-
get_uuid_fields
()[source]¶ Returns the key names used to generate the document’s TypedUUID
Returns: A list of key names found in the document that contribute to its UUID Return type: list
-
get_uuid_type
()[source]¶ Returns the TypedUUID name for documents with this schema
Each document is assigned a UUID which is a hash of values of specific named keys in the document. The UUID is typed with a prefix to indicate which kind of object it is. All LinkedStore documents have typed UUIDs, but there are several other types as well.
Returns: One of the list of UUID types known to the datacatalog library Return type: str
-
classmethod
time_stamp
()[source]¶ Get a UTC time stamp rounded to millisecond precision
Returns: datetime.datetime representation of utc_now() Return type: object
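A UTC time stamp with millisecond precision can be produced with the datetime module. The sketch below is illustrative only; round_to_millisecond and utc_time_stamp are hypothetical names, and the library may truncate rather than round.

```python
from datetime import datetime, timedelta, timezone


def round_to_millisecond(moment):
    """Round a datetime to the nearest millisecond."""
    whole_ms = round(moment.microsecond / 1000)
    # Going through timedelta lets a value of 1000 ms carry into the next second
    return moment.replace(microsecond=0) + timedelta(milliseconds=whole_ms)


def utc_time_stamp():
    """Return the current UTC time with millisecond precision."""
    return round_to_millisecond(datetime.now(timezone.utc))
```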
-
to_dict
(private_prefix='_', document=False, **kwargs)[source]¶ Render LinkedStore object as a dict suitable for serialization
Returns: A dictionary containing fields represented in the document’s JSON schema
Return type: dict
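The private_prefix argument suggests that keys beginning with that prefix are treated as private. The following sketch assumes this means they are dropped from the rendered dict; public_fields is a hypothetical helper, and the actual to_dict behavior may differ.

```python
def public_fields(document, private_prefix="_"):
    """Return a copy of document without keys that start with private_prefix."""
    return {
        key: value
        for key, value in document.items()
        if not key.startswith(private_prefix)
    }


# Hypothetical stored document with two framework-managed keys
stored = {"id": "lab.sample.12345", "name": "sample", "_admin": {}, "_visible": True}
```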
-
update_id
(document=False)[source]¶ Update the id field in the JSON schema
This method is used solely to let us differentiate object- from document-form JSON schemas by incorporating a specific string into the schema’s id field.
Parameters: document (bool) – Whether the schema is a document schema Returns: The updated value for schema id
Return type: string
datacatalog.linkedstores.basestore.exceptions module¶
-
exception
datacatalog.linkedstores.basestore.exceptions.
CatalogError
[source]¶ Bases:
Exception
Generic DataCatalog error has been encountered
-
exception
datacatalog.linkedstores.basestore.exceptions.
CatalogQueryError
[source]¶ Bases:
datacatalog.linkedstores.basestore.exceptions.CatalogError
Querying the DataCatalog has failed
-
exception
datacatalog.linkedstores.basestore.exceptions.
CatalogUpdateFailure
[source]¶ Bases:
datacatalog.linkedstores.basestore.exceptions.CatalogError
Writing to the DataCatalog has failed
-
exception
datacatalog.linkedstores.basestore.exceptions.
CatalogDataError
[source]¶ Bases:
datacatalog.linkedstores.basestore.exceptions.CatalogError
Invalid data has been encountered
-
exception
datacatalog.linkedstores.basestore.exceptions.
CatalogDatabaseError
[source]¶ Bases:
datacatalog.linkedstores.basestore.exceptions.CatalogError
An error has occurred that is definitely database-related
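Since the more specific exceptions all derive from CatalogError, a caller can handle the whole family with one except clause. The sketch below uses local stand-in classes that mirror the documented hierarchy rather than importing the library; run_query and safe_query are hypothetical.

```python
class CatalogError(Exception):
    """Stand-in for the documented base exception."""


class CatalogQueryError(CatalogError):
    """Stand-in for the documented query exception."""


def run_query(fail):
    """Hypothetical operation that fails with the more specific error."""
    if fail:
        raise CatalogQueryError("query failed")
    return "ok"


def safe_query(fail):
    """Catch the whole family by handling only the base class."""
    try:
        return run_query(fail)
    except CatalogError as exc:
        return "handled: {}".format(exc)
```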
datacatalog.linkedstores.basestore.extensible module¶
datacatalog.linkedstores.basestore.heritableschema module¶
-
class
datacatalog.linkedstores.basestore.heritableschema.
HeritableDocumentSchema
(inheritance=True, document='schema.json', filters='filters.json', **kwargs)[source]¶ Bases:
datacatalog.linkedstores.basestore.documentschema.DocumentSchema
Extends DocumentSchema with inheritance from parent object’s JSON schema
HeritableDocumentSchema objects build a schema from their local schema.json, but that document is layered over the contents of the schema defined by the root class using a right-favoring merge. Filters, which are used in formatting object vs document schemas, are not inherited.
-
DEFAULT_DOCUMENT_NAME
= 'schema.json'¶ Filename of the JSON schema document, relative to __file__.
-
DEFAULT_FILTERS_NAME
= 'filters.json'¶ Filename of the JSON schema filters document, relative to __file__.
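The right-favoring merge mentioned for HeritableDocumentSchema can be sketched as a recursive dictionary merge in which the local (right) document wins on conflict. This is an assumption about the merge semantics, and merge_right and both schema fragments are hypothetical.

```python
import copy


def merge_right(left, right):
    """Recursively merge two dicts; on conflict the right value wins."""
    merged = copy.deepcopy(left)
    for key, value in right.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_right(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical root and local schema fragments
root_schema = {"type": "object", "properties": {"uuid": {"type": "string"}}}
local_schema = {"title": "Sample", "properties": {"name": {"type": "string"}}}
combined = merge_right(root_schema, local_schema)
```

Nested dictionaries are merged key by key, so the local schema can add properties without discarding those declared by the root.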
-
datacatalog.linkedstores.basestore.linkmanager module¶
-
class
datacatalog.linkedstores.basestore.linkmanager.
LinkageManager
[source]¶ Bases:
object
-
add_link
(uuid, linked_uuid, relation='child_of', token=None)[source]¶ Link a Data Catalog record with one or more records by UUID
Returns: Contents of the revised Data Catalog record
Raises: LinkageManagerError – Raised if an invalid relation type or unknown UUID is encountered
-
get_links
(uuid, relation='child_of')[source]¶ Return linkages to this LinkedStore
Return a list of typed UUIDs representing all connections between this LinkedStore and other LinkedStores. This list can be traversed to return a list of all LinkedStore objects using datacatalog.managers.catalog.get()
Returns: A list of typed UUIDs that establish relationships to other LinkedStores
Return type: list
-
datacatalog.linkedstores.basestore.managedfields module¶
-
class
datacatalog.linkedstores.basestore.managedfields.
ManagedField
[source]¶ Bases:
str
A MongoDB managed field
-
exception
datacatalog.linkedstores.basestore.managedfields.
ManagedFieldError
[source]¶ Bases:
ValueError
datacatalog.linkedstores.basestore.merge module¶
datacatalog.linkedstores.basestore.mongomerge module¶
datacatalog.linkedstores.basestore.ratelimit module¶
-
class
datacatalog.linkedstores.basestore.ratelimit.
RateLimiter
(*args, batch_size=1024, batch_window=60, batch_pause=2.5, batch_random=True, except_on_limit=False, **kwargs)[source]¶ Bases:
object
Adds a limit() method to a class
-
BATCH_PAUSE
= 2.5¶
-
BATCH_RANDOM
= True¶
-
BATCH_SIZE
= 1024¶
-
BATCH_WINDOW
= 60¶
-
elapsed
¶ Time in seconds elapsed in current rate-limiting window
-
limit
(increment=True)[source]¶ Enforce a rate limit
Parameters: increment (bool, optional) – Whether to increase the usage count
Raises: RateLimitExceeded – This is raised when the usage rate is exceeded and except_on_limit is True
Returns: Always returns True Return type: bool
-
remaining
¶ A tuple of time and calls remaining
-
remaining_calls
¶ How many metered calls remain
-
remaining_time
¶ How long until the current limiting window expires
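The windowed rate limit described above can be sketched as a small counter class. This is a simplified illustration, not the library's RateLimiter: SketchRateLimiter, the injectable clock and sleep arguments, and the decision to pause when except_on_limit is False are assumptions, and batch_random is left out.

```python
import time


class RateLimitExceeded(Exception):
    """Stand-in for the exception named in the documentation."""


class SketchRateLimiter:
    """Count calls inside a time window and react when the budget is spent."""

    def __init__(self, batch_size=1024, batch_window=60, batch_pause=2.5,
                 except_on_limit=False, clock=time.monotonic, sleep=time.sleep):
        self.batch_size = batch_size
        self.batch_window = batch_window
        self.batch_pause = batch_pause
        self.except_on_limit = except_on_limit
        self._clock = clock
        self._sleep = sleep
        self._started = clock()
        self._calls = 0

    @property
    def elapsed(self):
        """Seconds elapsed in the current window."""
        return self._clock() - self._started

    @property
    def remaining_calls(self):
        """Metered calls left in the current window."""
        return max(self.batch_size - self._calls, 0)

    def limit(self, increment=True):
        """Enforce the limit, then optionally count this call."""
        if self.elapsed >= self.batch_window:
            # Window expired: start a new one with a fresh budget
            self._started = self._clock()
            self._calls = 0
        if self._calls >= self.batch_size:
            if self.except_on_limit:
                raise RateLimitExceeded("call budget exhausted")
            self._sleep(self.batch_pause)
        if increment:
            self._calls += 1
        return True
```

Passing the clock and sleep functions as arguments keeps the sketch testable without real waiting.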
-
datacatalog.linkedstores.basestore.record module¶
-
class
datacatalog.linkedstores.basestore.record.
Record
(value, *args, **kwargs)[source]¶ Bases:
datacatalog.extensible.ExtensibleAttrDict
New document for BaseStore with schema enforcement
-
PARAMS
= [('uuid', False, 'uuid', None), ('child_of', False, 'child_of', []), ('generated_by', False, 'generated_by', []), ('derived_using', False, 'derived_using', []), ('derived_from', False, 'derived_from', [])]¶
-
datacatalog.linkedstores.basestore.schemas module¶
datacatalog.linkedstores.basestore.softdelete module¶
-
class
datacatalog.linkedstores.basestore.softdelete.
SoftDelete
(mongodb, config={}, session=None, **kwargs)[source]¶ Bases:
datacatalog.linkedstores.basestore.store.LinkedStore
Adds field-based soft delete to a LinkedStore
-
DELETE_FIELD
= '_visible'¶
-
add_document
(document, token=None, force=True)[source]¶ Write a new managed document
Parameters: document (dict) – The contents of the document
Raises: CatalogError – Raised when document cannot be written
Returns: Dictionary representation of the new document Return type: dict
-
delete_document
(uuid, token=None, force=False)[source]¶ Delete a document by UUID
Managed interface for removing a document from its linkedstore collection by its typed UUID.
Raises: CatalogError – Raised when an unknown UUID or invalid token is passed, or on general write failure.
Returns: MongoDB deletion response
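Field-based soft delete can be illustrated with an in-memory dictionary standing in for the MongoDB collection. The sketch assumes that a normal delete sets the `_visible` field to False and that force=True removes the document outright; both behaviors, and the helper names, are assumptions rather than confirmed library semantics.

```python
DELETE_FIELD = "_visible"


def soft_delete(collection, uuid, force=False):
    """Hide a document by default; remove it only when force is True."""
    if uuid not in collection:
        raise KeyError("unknown UUID: {}".format(uuid))
    if force:
        return collection.pop(uuid)
    collection[uuid][DELETE_FIELD] = False
    return collection[uuid]


def visible_documents(collection):
    """Return only documents that have not been soft-deleted."""
    return [doc for doc in collection.values() if doc.get(DELETE_FIELD, True)]


# A plain dict stands in for the MongoDB collection
store = {
    "a": {"name": "first", "_visible": True},
    "b": {"name": "second", "_visible": True},
}
```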
-
datacatalog.linkedstores.basestore.store module¶
-
class
datacatalog.linkedstores.basestore.store.
LinkedStore
(mongodb, config={}, session=None, **kwargs)[source]¶ Bases:
datacatalog.linkedstores.basestore.linkmanager.LinkageManager
JSON-schema informed MongoDB document store with diff-based logging
- Attributes
- schema (dict): A JSON schema document
-
identifiers
¶ Ordered list of keys that can be used to uniquely retrieve documents in this schema
Type: list
-
DELETE_FIELD
= '_visible'¶
-
LINKAGE_POLICIES
= ('extend', 'replace')¶ Set of valid strategies for updating document linkages
-
LINK_FIELDS
= ('child_of', 'derived_from', 'derived_using', 'generated_by')¶ Allowed linkage types for this LinkedStore
-
LOG_JSONDIFF_UPDATES
= True¶ Field used to mark a record as deleted
-
MANAGED_FIELDS
= ('uuid', '_admin', '_properties', '_salt', '_enforce_auth')¶ Fields in this LinkedStore that are managed solely by the framework
-
MERGE_DICT_OPTS
= ('left', 'right', 'replace')¶ Set of valid strategies for merging dictionaries
-
MERGE_LIST_OPTS
= ('append', 'replace')¶ Set of valid strategies for merging lists
-
NEVER_INDEX_FIELDS
= 'data'¶ Fields that should never be indexed
-
PROPERTIES_TEMPLATE
= {'_properties': {'created_date': None, 'modified_date': None, 'revision': 0}}¶ Template for a properties subdocument
-
READONLY_FIELDS
= ('child_of', 'derived_from', 'derived_using', 'generated_by', 'uuid', '_admin', '_properties', '_salt', '_enforce_auth')¶ Additional fields that are read-only in this LinkedStore
-
TOKEN_FIELDS
= ('uuid', '_admin')¶ Default set of keys used to issue update tokens
-
add_document
(document, token=None)[source]¶ Write a new managed document
Parameters: document (dict) – The contents of the document
Raises: CatalogError – Raised when document cannot be written
Returns: Dictionary representation of the new document Return type: dict
-
coll
= None Name of MongoDB collection housing this LinkedStore
-
db
= None Name of MongoDB database housing this LinkedStore
-
delete_document
(uuid, token=None, **kwargs)[source]¶ Delete a document by UUID
Managed interface for removing a document from its linkedstore collection by its typed UUID.
Raises: CatalogError – Raised when an unknown UUID or invalid token is passed, or on general write failure.
Returns: MongoDB deletion response
-
document_schema
= None¶ Dictionary containing the LinkedStore’s full document schema
-
find_one_by_id
(**kwargs)[source]¶ Find and return a LinkedStore document by any of its identifiers
Examples
resp = find_one_by_id(name='uniquename')
resp = find_one_by_id(uuid='105bf45a-6282-5e8c-8651-6a0ff78a3741')
resp = find_one_by_id(id='lab.sample.12345')
Raises: CatalogError – Raised when query fails due to an error or invalid value
Returns: Object containing the LinkedStore document Return type: dict
-
find_one_by_uuid
(uuid)[source]¶ Find and return a LinkedStore document by its typed UUID
Parameters: uuid (str) – The UUID to search for
Raises: CatalogError – Raised when query fails due to an error or invalid value
Returns: Object containing the LinkedStore document Return type: dict
-
get_token_fields
(record_dict)[source]¶ Get values for issuing a document’s update token
The fields used to define an update token are set in TOKEN_FIELDS. This method fetches the values of those fields and returns them as a list.
Parameters: record_dict (dict) – Contents of the document from which to extract values Returns: List of values from keys matching TOKEN_FIELDS Return type: list
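Collecting token field values amounts to reading a fixed set of keys in order. The sketch below copies the documented TOKEN_FIELDS default; get_token_values is a hypothetical name, and returning None for a missing key is an assumption.

```python
TOKEN_FIELDS = ("uuid", "_admin")


def get_token_values(record, fields=TOKEN_FIELDS):
    """Collect the values stored under each token field, in field order."""
    return [record.get(name) for name in fields]


# Hypothetical record; only the keys named in TOKEN_FIELDS are read
record = {
    "uuid": "105bf45a-6282-5e8c-8651-6a0ff78a3741",
    "_admin": {"owner": "example"},
    "name": "uniquename",
}
```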
-
identifiers
= None List of identifying keys for this LinkedStore
-
logcoll
= None Name of MongoDB collection housing the general update log
-
name
= None Human-readable name of the LinkedStore schema
-
otherindexes
= None¶ Indexed fields defined in schema.json
-
query
(query={}, projection=None, attr_dict=False, limit=None, skip=None, attr_filters={'filters': [], 'private_prefix': '_'})[source]¶ Query the LinkedStore MongoDB collection and return a Cursor
-
replace_document
(source_document, target_document, token=None)[source]¶ Replace a document distinguished by UUID with a new instance
Raises: CatalogError – Raised when document cannot be replaced
Returns: Dict representation of the new content for the document
Return type: dict
-
schema
= None¶ Dictionary containing the LinkedStore’s object schema
-
schema_name
= None Canonical filename for the document’s JSON schema
-
session
= None Optional correlation string for interlinked events
-
setup
(update_indexes=False)[source]¶ Set up the MongoDB collection that houses data for the LinkedStore
-
update_attrs
(schema)[source]¶ Updates LinkedStore with values in loaded schema
This is used to allow the schema to be patched or amended at runtime
Parameters: schema (dict) – A JSON schema document loaded into a dict
-
update_document
(source_document, target_document, token=None, merge_dicts='right', merge_lists='append', linkage_policy='extend')[source]¶
-
uuid_fields
= None List of keys that are rolled into the document’s UUID
-
uuid_type
= None Named type for this LinkedStore
-
write_key
(uuid, key, value, token=None)[source]¶ Write a value to a top-level key in a document
Managed interface for writing to specific top-level keys, where some keys, namely any that are specified in the schema as identifiers or contributors to UUID generation, are enforced to be read-only.
Raises: CatalogError – Raised when a read-only key is specified or an invalid token is passed
Returns: Dict representation of the updated document
Return type: dict
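The read-only enforcement described for write_key can be sketched as a guard in front of the write. This illustration checks against the documented READONLY_FIELDS default instead of consulting a schema, skips token validation entirely, and uses a local stand-in for CatalogError; the function below is hypothetical and operates on a plain dict.

```python
READONLY_FIELDS = ("child_of", "derived_from", "derived_using", "generated_by",
                   "uuid", "_admin", "_properties", "_salt", "_enforce_auth")


class CatalogError(Exception):
    """Stand-in for the documented exception."""


def write_key(document, key, value, readonly=READONLY_FIELDS):
    """Return an updated copy of document unless key is read-only."""
    if key in readonly:
        raise CatalogError("'{}' is read-only".format(key))
    updated = dict(document)
    updated[key] = value
    return updated
```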
-
-
class
datacatalog.linkedstores.basestore.store.
StoreInterface
(mongodb, config={}, session=None, **kwargs)[source]¶ Bases:
datacatalog.linkedstores.basestore.store.LinkedStore
Alias for the LinkedStore defined in this module
This alias is used generically in methods that iterate over all known LinkedStores.
-
class
datacatalog.linkedstores.basestore.store.
DocumentSchema
(**kwargs)[source]¶ Bases:
datacatalog.jsonschemas.schema.JSONSchemaBaseObject
Extends the JSON schema-driven document class with LinkedStore functions
DocumentSchema objects validate against the schema defined in schema.json, have a defined LinkedStore type and specify fields used to uniquely identify the document. Their get_schemas method can emit both document schemas (which contain all administrative fields) and object schemas (only core data fields).
- Attributes
- _filters (list): A private attribute defining how to render document and object schemas from the larger JSON schema
-
DEFAULT_DOCUMENT_NAME
= 'schema.json'¶ Filename of the JSON schema document, relative to __file__.
-
DEFAULT_FILTERS_NAME
= 'filters.json'¶ Filename of the JSON schema filters document, relative to __file__.
-
DELETE_FIELD
= '_visible'¶
-
RETURN_DOC_FILTERS
= ['_id', '_salt', '_admin', '_properties', '_update_token', '_visible']¶ These keys should never be returned in a document
-
TYPED_UUID_FIELD
= ['id']¶ List of fields used to generate a typed UUID
-
TYPED_UUID_TYPE
= 'generic'¶ The named type for UUIDs assigned to this class of LinkedStore documents
-
get_collection
()[source]¶ Returns the name of the MongoDB collection containing documents with this schema
Documents from a LinkedStore are stored in a specific named MongoDB collection. This method returns the collection name. It is good practice for the collection and name of the LinkedStore-derived class to be related intuitively.
Returns: The name of a MongoDB collection Return type: str
-
get_filename
(document=False)[source]¶ Returns basename for the schema file
When a LinkedStore’s schema is rendered, its relationship with other datacatalog-managed schemas is established via a common base URL. The basename of the URI that is embedded in the $id field of the schema is defined in __filename in the extended JSON schema document and is returned by this method.
Returns: The filename at which this schema is expected to be resolvable Return type: str
-
get_identifiers
()[source]¶ Returns the list of top-level keys that are identifiers
In the extended-form schema, __identifiers describes which keys can be used to uniquely identify documents written using this schema.
Returns: The list of identifying key names Return type: list
-
get_indexes
()[source]¶ Returns the list of indexes declared for documents of this schema
In the extended-form schema, __indexes declares the indexing strategy for documents written using this schema.
Returns: The list of indexes declared for this schema Return type: list
-
get_required
()[source]¶ Returns the list of required fields
This is defined by __required in the extended-form schema.
Returns: The list of required fields Return type: list
-
get_serialized_document
(document, **kwargs)[source]¶ Serializes a complex object into a string
Some UUIDs are constructed from complex data structures like Agave job definitions. Rather than implement specific strategies for selecting from arbitrary nested structures, this method provides guaranteed-order serialization of the object to a linear string.
Parameters: document (object) – A dict or list object to serialize Returns: JSON serialized and minified representation of document
Return type: str
-
get_typeduuid
(payload, binary=False)[source]¶ Generate a UUID with the appropriate type prefix
Parameters: - payload (str/dict) – If payload is a string, the UUID is generated directly from it. Otherwise, it is serialized before being used to generate the UUID.
- binary (bool, optional) – Whether to return a Binary-encoded UUID. Defaults to False.
Returns: A string validating as UUID5 with a 3-character typing prefix
-
get_uuid_fields
()[source]¶ Returns the key names used to generate the document’s TypedUUID
Returns: A list of key names found in the document that contribute to its UUID Return type: list
-
get_uuid_type
()[source]¶ Returns the TypedUUID name for documents with this schema
Each document is assigned a UUID which is a hash of values of specific named keys in the document. The UUID is typed with a prefix to indicate which kind of object it is. All LinkedStore documents have typed UUIDs, but there are several other types as well.
Returns: One of the list of UUID types known to the datacatalog library Return type: str
-
classmethod
time_stamp
()[source]¶ Get a UTC time stamp rounded to millisecond precision
Returns: datetime.datetime representation of utc_now() Return type: object
-
to_dict
(private_prefix='_', document=False, **kwargs)[source]¶ Render LinkedStore object as a dict suitable for serialization
Returns: A dictionary containing fields represented in the document’s JSON schema
Return type: dict
-
update_id
(document=False)[source]¶ Update the id field in the JSON schema
This method is used solely to let us differentiate object- from document-form JSON schemas by incorporating a specific string into the schema’s id field.
Parameters: document (bool) – Whether the schema is a document schema Returns: The updated value for schema id
Return type: string
-
class
datacatalog.linkedstores.basestore.store.
HeritableDocumentSchema
(inheritance=True, document='schema.json', filters='filters.json', **kwargs)[source]¶ Bases:
datacatalog.linkedstores.basestore.documentschema.DocumentSchema
Extends DocumentSchema with inheritance from parent object’s JSON schema
HeritableDocumentSchema objects build a schema from their local schema.json, but that document is layered over the contents of the schema defined by the root class using a right-favoring merge. Filters, which are used in formatting object vs document schemas, are not inherited.
-
DEFAULT_DOCUMENT_NAME
= 'schema.json'¶ Filename of the JSON schema document, relative to __file__.
-
DEFAULT_FILTERS_NAME
= 'filters.json'¶ Filename of the JSON schema filters document, relative to __file__.
-
-
exception
datacatalog.linkedstores.basestore.store.
CatalogError
[source]¶ Bases:
Exception
Generic DataCatalog error has been encountered
-
exception
datacatalog.linkedstores.basestore.store.
CatalogUpdateFailure
[source]¶ Bases:
datacatalog.linkedstores.basestore.exceptions.CatalogError
Writing to the DataCatalog has failed
-
exception
datacatalog.linkedstores.basestore.store.
CatalogQueryError
[source]¶ Bases:
datacatalog.linkedstores.basestore.exceptions.CatalogError
Querying the DataCatalog has failed
-
exception
datacatalog.linkedstores.basestore.store.
DuplicateKeyError
(error, code=None, details=None, max_wire_version=None)[source]¶ Bases:
pymongo.errors.WriteError
Raised when an insert or update fails due to a duplicate key error.
-
datacatalog.linkedstores.basestore.store.
time_stamp
(dt=None, rounded=False)[source]¶ Get time in seconds
Parameters: - dt (datetime) – Optional datetime object. [current_time()]
- rounded (bool) – Whether to round response to nearest int
Returns: Time expressed as a float (or int)
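A function with this signature could be approximated as follows. The sketch assumes that "time in seconds" means seconds since the POSIX epoch and that the default is the current UTC time; seconds_since_epoch is a hypothetical name.

```python
from datetime import datetime, timezone


def seconds_since_epoch(dt=None, rounded=False):
    """Return POSIX seconds for dt, or for the current time if dt is None."""
    if dt is None:
        dt = datetime.now(timezone.utc)
    seconds = dt.timestamp()
    # Convert to int only when the caller asks for a rounded value
    return int(round(seconds)) if rounded else seconds
```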
-
datacatalog.linkedstores.basestore.store.
validate_token
(token, salt=None, *args, permissive=True)[source]¶ Validate a token
Both record-level tokens and administrative tokens can be validated using this function.
Raises: InvalidToken – Raised when validation fails and permissive is not set.
Returns: Validity as a Boolean value, if permissive is set
Return type: bool
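The permissive flag switches between returning a Boolean and raising on failure. The sketch below shows that control flow with an invented token scheme: issue_token, check_token, the SHA-256 hashing, and the 16-character length are all hypothetical and do not describe how the library issues or validates its tokens.

```python
import hashlib
import hmac


class InvalidToken(ValueError):
    """Stand-in for the exception named in the documentation."""


def issue_token(salt, *values):
    """Hypothetical scheme: hash the salt together with the given values."""
    text = ":".join([salt] + [str(value) for value in values])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def check_token(token, salt, *values, permissive=True):
    """Return validity as a bool, or raise when permissive is False."""
    expected = issue_token(salt, *values)
    # Constant-time comparison avoids leaking how many characters matched
    valid = hmac.compare_digest(token, expected)
    if not valid and not permissive:
        raise InvalidToken("token was not valid")
    return valid
```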
-
datacatalog.linkedstores.basestore.store.
validate_admin_token
(token, key=None, permissive=True)[source]¶ Validate an administrative token
Only administrative tokens can be validated using this function.
Raises: InvalidAdminToken – Raised when validation fails and permissive is not set.
Returns: Validity as a Boolean value, if permissive is set
Return type: bool