Definitions¶
Definitions are statically-defined nodes in the JSON schema. They are suited to
for constraining string values and enumerations and for defining
simple data structures comprised of Javascript primitives or other subschemas.
They are implemented in datacatalog.definitions
.
Add a definition¶
Adding a new definition requires just a few steps:
- Decide on a schema id. It must be unique within the SD2 namespace, between 6 and 32 characters long, contain only
[a-z0-9_]
, and be descriptive of the schema contents. - Create a file in
datacatalog/definitions/jsondocs
named after the schema id you have chosed (e.g. <id>.json) - Write your schema in JSON schema Draft 7 (with 7 being preferable).
- Optional: Include examples of valid values for the schema in the
examples
array in the schema file - Validate your schema using the NewtonSoft JSON Schema Validator or equivalent
- Ensure your schema builds and does not disrupt building of other schemas. Use Makefile target
make schemas-test
for this. - Build a persistent instance of your new schema. Use``make schemas`` for this. Check that a schema with the appropriate name was generated and is not empty.
- Validate all built schemas. This is especially important if you have referenced other schemas from within the one you are building. Use
make schemas-validate
for this. - When all is working according to plan, open a pull request to the
python-datacatalog
repository indicating what you have added.
Below is an example of using that Makefile target. If you see both
SCHEMA PACKAGE
and SCHEMA
for definitions, it was likely successful.
$ make schemas-test
LOCALONLY=1 MAKETESTS=1 python -m scripts.build_schemas
...
SCHEMA PACKAGE definitions
SCHEMA: definitions
...
Update a definition¶
The workflow for updating a definition is nearly identical, save that you do not need to create a new JSON file. Be very careful in updating existing definitions (especially their validations and enumerations) as this can break pipeline and data management components that depend on the project schema if the PR including your change makes it into production.
Measure twice. Confirm with someone else what you’re going to cut. Then, cut once