datacatalog.filetypes package
Submodules
datacatalog.filetypes.anytype module
datacatalog.filetypes.filetype module
- class datacatalog.filetypes.filetype.FileType(label, comment)
  Bases: tuple
  - comment
    Alias for field number 1
  - label
    Alias for field number 0
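The "Alias for field number" entries are the wording Python generates for named-tuple fields, so FileType most likely behaves like a two-field named tuple. The sketch below rebuilds a stand-in with `collections.namedtuple` to show what that implies. The stand-in class and the sample values are illustrative assumptions, not the library's own definition.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented fields (label, comment);
# the real class lives in datacatalog.filetypes.filetype.
FileType = namedtuple("FileType", ["label", "comment"])

bam = FileType(label="BAM", comment="Binary SAM")

# Named access and positional access refer to the same fields.
assert bam.label == bam[0]      # label is field number 0
assert bam.comment == bam[1]    # comment is field number 1

# Because it is a tuple, it unpacks in field order.
label, comment = bam
```

Field order matters when a FileType is unpacked or compared with a plain tuple: the label always comes first.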
- class datacatalog.filetypes.filetype.FileTypeComment
  Bases: str
  Verbose human-readable name for a file type
- exception datacatalog.filetypes.filetype.FileTypeError
  Bases: ValueError
  Error that occurs when working with FileTypes
datacatalog.filetypes.infer module
- datacatalog.filetypes.infer.infer_filetype(filename, check_exists=False, permissive=True)
  Infer a file’s canonical file type
  Parameters:
  Note: Use of check_exists requires filename to be an absolute path
  Raises:
    OSError – Existence of the target file cannot be verified
    FileTypeError – The target file’s type could not be inferred
  Returns: The type for the file
  Return type:
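The documented exceptions suggest a calling pattern: treat OSError as "existence could not be verified" and FileTypeError as "type could not be inferred". The sketch below shows that pattern against a stub, so it runs without the package installed. `infer_filetype_stub`, `safe_label`, and the suffix table are hypothetical names introduced here. The stub follows only the documented error contract and does not model the `permissive` flag or the library's actual inference logic.

```python
import os


class FileTypeError(ValueError):
    """Stand-in for the documented FileTypeError (a ValueError subclass)."""


# Hypothetical lookup table, for illustration only.
_KNOWN_SUFFIXES = {".bam": "BAM", ".vcf": "VCF"}


def infer_filetype_stub(filename, check_exists=False):
    """Mimic the documented error contract; not the library's implementation."""
    if check_exists and not os.path.exists(filename):
        raise OSError("Cannot verify that {} exists".format(filename))
    suffix = os.path.splitext(filename)[1].lower()
    if suffix not in _KNOWN_SUFFIXES:
        raise FileTypeError("Could not infer a type for {}".format(filename))
    return _KNOWN_SUFFIXES[suffix]


def safe_label(filename, check_exists=False):
    """Return a label, or None when inference or the existence check fails."""
    try:
        return infer_filetype_stub(filename, check_exists=check_exists)
    except FileTypeError:
        return None  # the type could not be inferred
    except OSError:
        return None  # existence of the file could not be verified
```

Because FileTypeError derives from ValueError, code that already catches ValueError will also catch it. Catching the specific class first keeps the two failure modes distinguishable.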
datacatalog.filetypes.listing module
- datacatalog.filetypes.listing.listall(filter_attrname=None)
  Lists rule- and MIME-based types, labels, or comments
  Parameters: filter_attrname (str, optional) – Attribute name to extract from list
  Returns: A list of FileType, FileTypeLabel, or FileTypeComment objects
  Return type: list
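One plausible reading of `filter_attrname` is that it names an attribute to pull from each entry, so the result holds whole FileType objects when it is None and labels or comments otherwise. The sketch below illustrates that reading with a stand-in list. `listall_sketch` and its sample data are assumptions; the real function also draws on MIME-based types, which are not modelled here.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only.
FileType = namedtuple("FileType", ["label", "comment"])
_TYPES = [
    FileType("BAM", "Binary SAM"),
    FileType("VCF", "Variant Call Format"),
]


def listall_sketch(filter_attrname=None):
    """Return all entries, or one named attribute from each entry."""
    if filter_attrname is None:
        return list(_TYPES)
    return [getattr(entry, filter_attrname) for entry in _TYPES]


labels = listall_sketch("label")      # just the labels
comments = listall_sketch("comment")  # just the comments
everything = listall_sketch()         # full FileType tuples
```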
datacatalog.filetypes.mime module
datacatalog.filetypes.rules module
datacatalog.filetypes.ruleset module
- datacatalog.filetypes.ruleset.FILETYPES = [
    ('BEDGRAPH', 'UCSC Genome Browser bedGraph format', ['.bedgraph$']),
    ('ABI', 'ABI Sequencer Chromatogram file', ['.ab1$', '.abi$', '.ab$', '.ab!$']),
    ('SCF', 'Standard Chromatogram Format file', ['.scf$']),
    ('LOG', 'Log file', ['.err$', '.out$', '.log$']),
    ('ENV', 'Environment file', ['.env$', '.rc$']),
    ('FASTQC', 'FASTQC outputs', ['fastqc.html$', 'fastqc.zip$']),
    ('MULITIQC', 'FASTQC outputs', ['multiqc_report.html$']),
    ('FASTA', 'FASTA sequence file', ['.fa$', '.fasta$', '.fa.gz$', '.fasta.gz$', '.fas$']),
    ('TSV', 'Tab-separated values (override TAB-SEPARATED-VALUES)', ['.tab$', '.tsv$']),
    ('BAM', 'Binary SAM', ['.bam$']),
    ('BAI', 'Binary SAM Index', ['.bai$']),
    ('VCF', 'Variant Call Format', ['.vcf$']),
    ('BCF', 'Binary Variant Call Format', ['.bcf$']),
    ('MD5', 'MD5 checksum file', ['.md5$']),
    ('SAM', 'Sequence Alignment/MAP', ['.sam$']),
    ('FASTQ', 'FASTQ sequence file', ['.fastq$', '.fastq.gz$', '.fq$', '.fq.gz$']),
    ('FCS', 'Flow Cytometry Standard', ['.fcs$']),
    ('SRAW', 'Raw proteomics file', ['.sraw$']),
    ('MZML', 'Proteomics mzML file', ['.mzML$']),
    ('MSF', 'Magellan storage file', ['.msf$']),
    ('SAMPLES', 'Sample Set Metadata (JSON)', ['^metadata-[a-z0-9-]+.json$']),
    ('BPROV', 'Biofab Provenance (JSON)', ['^provenance_dump.json$']),
    ('INI', 'INI config file', ['.ini$']),
    ('SECRETS', 'Abaco secrets file', ['^secrets.json$']),
    ('CONFIG', 'Configuration file', ['config.rc$', 'reactor.rc$', 'config.yml$']),
    ('GIT', 'Git file', ['.git']),
    ('JENKINS', 'Jenkins Pipeline file', ['^Jenkinsfile$']),
    ('DOCKERFILE', 'Docker build file', ['^Dockerfile$']),
    ('REQUIREMENTS', 'Python requirements file', ['^requirements.txt$']),
    ('COMPOSEFILE', 'Docker compose file', ['^docker-compose.yml$']),
    ('GFF3', 'Sequence Ontology General Feature Format', ['.gff$', '.gff3$']),
    ('GTF', 'Ensembl Gene Transfer Format', ['.gtf$']),
    ('AB1', 'ABI Sequencer Chromatogram file', ['.ab1$']),
    ('JPEG', 'Alias for JPEG file', ['.jpg$'])
  ]
  A list of tuples defining classification rules for filenames
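Each FILETYPES entry has the shape (label, comment, list of patterns), and the patterns read as regular expressions. The sketch below shows one way such a ruleset could be applied. It assumes the patterns are tested with `re.search` against the file's base name and that the first matching rule wins. Neither assumption is confirmed by this page, and `match_rule` is a hypothetical helper, not part of the package.

```python
import os
import re

# A few entries copied verbatim from the FILETYPES listing.
RULES = [
    ('FASTA', 'FASTA sequence file', ['.fa$', '.fasta$', '.fa.gz$', '.fasta.gz$', '.fas$']),
    ('BAM', 'Binary SAM', ['.bam$']),
    ('DOCKERFILE', 'Docker build file', ['^Dockerfile$']),
]


def match_rule(filename, rules=RULES):
    """Return (label, comment) for the first rule whose pattern matches, else None."""
    base = os.path.basename(filename)
    for label, comment, patterns in rules:
        for pattern in patterns:
            if re.search(pattern, base):
                return label, comment
    return None


bam_match = match_rule("/data/run1/sample1.bam")
fasta_match = match_rule("genome.fa.gz")
docker_match = match_rule("Dockerfile")
no_match = match_rule("my.Dockerfile")
# The leading "." is an unescaped regex wildcard, so it matches any character:
loose_match = match_rule("xbam")
```

Under these assumptions the unescaped dots make the rules slightly looser than a strict extension check: a name such as `xbam` still matches `.bam$`. Patterns anchored with `^` match only when the whole base name fits.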
datacatalog.filetypes.schemas module
- class datacatalog.filetypes.schemas.FileTypeLabelDoc(**kwargs)
  Bases: datacatalog.jsonschemas.schema.JSONSchemaBaseObject
  Schema document enumerating all FileTypeLabels
- datacatalog.filetypes.schemas.get_schemas()
  Returns the filetype_label subschema
  Returns: One or more schema documents
  Return type: JSONSchemaCollection