enriched_identification_dna

Info

Data contract for the Identification DNA table of the dna domain of the Biocloud. Contains taxonomic identifications derived from DNA sequence analysis via BLAST or similar methods.

- name: Identification DNA
- version: 0.0.1
- status: active

Terms of Use

Purpose

Data contract for the Identification DNA table of the dna domain of the Biocloud. Contains taxonomic identifications derived from DNA sequence analysis via BLAST or similar methods.

Servers

| Name | Type | Environment | Catalog | Schema | Host | Roles |
| --- | --- | --- | --- | --- | --- | --- |
| production | databricks | production | dna_production | enriched | dbc-2030845a-6c3b.cloud.databricks.com | Admins: Access to all the data and settings |
| development | databricks | development | dna_development | enriched | dbc-03ed8bbb-c0ec.cloud.databricks.com | Admins: Access to all the data and settings |
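
To make the server entries concrete, here is a minimal sketch of how the catalog and schema values could be combined into a fully qualified table name. It assumes the Databricks three-level namespace (`catalog.schema.table`) and that the table is named `identification_dna` as in the Schema section; the `SERVERS` mapping and the `qualified_table_name` helper are hypothetical names introduced for illustration, not part of the contract.

```python
# Catalog and schema values copied from the Servers section of the contract.
SERVERS = {
    "production": {"catalog": "dna_production", "schema": "enriched"},
    "development": {"catalog": "dna_development", "schema": "enriched"},
}

# Assumption: the table name matches the model name in the Schema section.
TABLE = "identification_dna"


def qualified_table_name(environment: str) -> str:
    """Return catalog.schema.table for the given environment."""
    server = SERVERS[environment]
    return f"{server['catalog']}.{server['schema']}.{TABLE}"


print(qualified_table_name("production"))
print(qualified_table_name("development"))
```

Keeping the environment lookup in one place means a query written against development can be pointed at production by changing a single argument.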

Schema

identification_dna

The Identification DNA table contains taxonomic identifications derived from DNA sequence analysis. These identifications are generated by comparing consensus sequences against reference databases like BOLD or GenBank using BLAST.
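
The field descriptions for `identification_dna_count`, `max_identity_percentage`, and `minimum_evalue` indicate that identical BLAST hits are merged into one record. The following is a minimal sketch of such a merge, not the pipeline's actual implementation. It assumes hits are grouped by scientific name, which the contract does not state, and that raw hits arrive as dictionaries with the hypothetical keys `name`, `identity`, and `evalue`. The sample values are made up for illustration.

```python
from collections import defaultdict


def merge_hits(hits):
    """Collapse raw BLAST hits into one summary record per scientific name."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["name"]].append(hit)

    records = []
    for name, group in grouped.items():
        records.append({
            "identification_dna": name,
            # Number of raw hits merged into this record.
            "identification_dna_count": len(group),
            # Best (highest) identity across the merged hits.
            "max_identity_percentage": max(h["identity"] for h in group),
            # Best (lowest) E-value across the merged hits.
            "minimum_evalue": min(h["evalue"] for h in group),
        })
    return records


# Illustrative hits only; the E-values are invented for this sketch.
hits = [
    {"name": "Apis mellifera", "identity": 99.5, "evalue": 1e-120},
    {"name": "Apis mellifera", "identity": 100.0, "evalue": 1e-130},
    {"name": "Bombus terrestris", "identity": 98.2, "evalue": 1e-110},
]
merged = merge_hits(hits)
```

Here the two `Apis mellifera` hits collapse into a single record with a count of 2, while `Bombus terrestris` keeps a count of 1.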

| Field | Type | Description | Constraints | Examples |
| --- | --- | --- | --- | --- |
| identification_dna_id | bigint | Biocloud-generated identifier for the DNA identification. | primaryKey; primaryKeyPosition: 1; required | |
| consensus_sequence_id | bigint | Foreign key to the consensus sequence table, linking to the sequence that was BLASTed. | | |
| consensus_sequence | string | Copy of the consensus sequence that was used for identification. | | |
| blast_database | string | Name of the BLAST database used for identification. | | BOLD, GenBank, NCBI_NT |
| date_identified_ts_utc | timestamp | UTC timestamp when the BLAST identification was performed. | | |
| identification_blasted | string | The taxonomic name that was originally submitted for BLASTing (often from morphological identification). | | |
| identification_dna | string | The scientific name returned by the DNA identification (BLAST result). | | Apis mellifera, Bombus terrestris |
| identification_dna_count | long | Count of identical BLAST hits that were merged into this single record. | | |
| identification_dna_family | string | Family-level taxonomic classification from the DNA identification. | | Apidae, Formicidae |
| identification_dna_kingdom | string | Kingdom-level taxonomic classification from the DNA identification. | | Animalia, Fungi, Plantae |
| identification_dna_rank | string | Taxonomic rank of the identification result. | | species, genus, family |
| identification_method | string | Method used for DNA-based identification. | | blast, metabarcoding |
| is_valid_protein | string | Whether the sequence translates to a valid protein. | | yes, no, not translated |
| max_identity_percentage | double | Maximum identity percentage across all BLAST hits for this identification. | | 99.5, 98.2, 100.0 |
| minimum_evalue | double | Minimum E-value across all BLAST hits (lower values indicate better matches). | | |
| source | string | The name of the source system used to create this record in the table. | required | galaxy, bioinformaticians |
| source_database | string | The source of the sequence and taxon ID in the reference database. | | BOLD, GenBank |
| source_id | string | The ID of this record in the source system, including negative BLAST results. | required | |
| blast_hit_id | string | The ID of this record in the source system for positive BLAST hits. | required | |
| taxonomically_validated | boolean | Whether the DNA identification has been taxonomically validated against the morphological identification. | | |
| taxonomically_validated_threshold | double | The threshold value used to determine taxonomic validation. | | |
| inserted_ts_utc | timestamp | UTC timestamp when the record was inserted. | | |
| updated_ts_utc | timestamp | UTC timestamp when the record was last updated. | | |
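
As an illustration of how the required flags and types in this schema might be checked before loading a record, here is a minimal sketch in plain Python. It covers only a subset of the fields, and the mapping of contract types to Python types (bigint to `int`, double to `float`, and so on) is an assumption, as are the `CONTRACT` mapping and `validate` helper themselves. It is not a substitute for a data contract testing tool.

```python
from datetime import datetime

# Subset of the contract: field name -> (assumed Python type, required?)
CONTRACT = {
    "identification_dna_id": (int, True),
    "source": (str, True),
    "source_id": (str, True),
    "blast_hit_id": (str, True),
    "identification_dna": (str, False),
    "max_identity_percentage": (float, False),
    "taxonomically_validated": (bool, False),
    "date_identified_ts_utc": (datetime, False),
}


def validate(record):
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for field, (expected, required) in CONTRACT.items():
        value = record.get(field)
        if value is None:
            if required:
                errors.append(f"{field}: required field is missing")
            continue
        # bool is a subclass of int in Python, so reject it for integer fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# Illustrative records; the ID values are invented for this sketch.
good = {
    "identification_dna_id": 1,
    "source": "galaxy",
    "source_id": "run-1/hit-0",
    "blast_hit_id": "hit-0",
    "identification_dna": "Apis mellifera",
    "max_identity_percentage": 99.5,
}
bad = {
    "identification_dna_id": 1,
    "source": "galaxy",
    "max_identity_percentage": "99.5",
}

print(validate(good))
print(validate(bad))
```

The check is deliberately strict: an integer such as `100` would be flagged for a double field, so a real loader would likely coerce numeric types first.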

SLA Properties