enriched_identification_dna
Info
Data contract for the Identification DNA table of the dna domain of the Biocloud. Contains taxonomic identifications derived from DNA sequence analysis via BLAST or similar methods.
- name: Identification DNA
- version: 0.0.1
- status: active
Terms of Use
Purpose
Data contract for the Identification DNA table of the dna domain of the Biocloud. Contains taxonomic identifications derived from DNA sequence analysis via BLAST or similar methods.
Servers
| Name | Type | Attributes |
|---|---|---|
| production | databricks | • environment: production • roles: Admins (Access to all the data and settings) • catalog: dna_production • host: dbc-2030845a-6c3b.cloud.databricks.com • schema: enriched |
| development | databricks | • environment: development • roles: Admins (Access to all the data and settings) • catalog: dna_development • host: dbc-03ed8bbb-c0ec.cloud.databricks.com • schema: enriched |
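
As an illustration of how the server attributes above combine, the following minimal sketch builds a three-level Databricks name (catalog.schema.table) per environment. The catalog and schema values are copied from the Servers table; the `SERVERS` mapping and the `qualified_table_name` helper are hypothetical names, and the table name is assumed to match the `identification_dna` model in the Schema section.

```python
# Catalog and schema values are taken from the Servers table.
# SERVERS and qualified_table_name are illustrative names, not part of the contract.
SERVERS = {
    "production": {"catalog": "dna_production", "schema": "enriched"},
    "development": {"catalog": "dna_development", "schema": "enriched"},
}


def qualified_table_name(environment: str, table: str = "identification_dna") -> str:
    """Return catalog.schema.table for the given environment.

    Assumption: the physical table is named after the schema model.
    """
    server = SERVERS[environment]
    return f"{server['catalog']}.{server['schema']}.{table}"


print(qualified_table_name("production"))
# prints: dna_production.enriched.identification_dna
```

Keeping the environment as the only input makes it harder to query the production catalog by accident when development was intended.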
Schema
identification_dna
The Identification DNA table contains taxonomic identifications derived from DNA sequence analysis. These identifications are generated by comparing consensus sequences against reference databases like BOLD or GenBank using BLAST.
| Field | Type | Attributes |
|---|---|---|
| identification_dna_id | bigint | Biocloud generated identifier for the DNA identification. • primaryKey • primaryKeyPosition: 1 • required |
| consensus_sequence_id | bigint | Foreign key to the consensus sequence table, linking to the sequence that was BLASTed. |
| consensus_sequence | string | Copy of the consensus sequence that was used for identification. |
| blast_database | string | Name of the BLAST database used for identification. • examples: ['BOLD', 'GenBank', 'NCBI_NT'] |
| date_identified_ts_utc | timestamp | UTC timestamp when the BLAST identification was performed. |
| identification_blasted | string | The taxonomic name that was originally submitted for BLASTing (often from morphological identification). |
| identification_dna | string | The scientific name returned by the DNA identification (BLAST result). • examples: ['Apis mellifera', 'Bombus terrestris'] |
| identification_dna_count | long | Count of identical BLAST hits that were merged into this single record. |
| identification_dna_family | string | Family-level taxonomic classification from the DNA identification. • examples: ['Apidae', 'Formicidae'] |
| identification_dna_kingdom | string | Kingdom-level taxonomic classification from the DNA identification. • examples: ['Animalia', 'Fungi', 'Plantae'] |
| identification_dna_rank | string | Taxonomic rank of the identification result. • examples: ['species', 'genus', 'family'] |
| identification_method | string | Method used for DNA-based identification. • examples: ['blast', 'metabarcoding'] |
| is_valid_protein | string | Whether the sequence translates to a valid protein. • examples: ['yes', 'no', 'not translated'] |
| max_identity_percentage | double | Maximum identity percentage across all BLAST hits for this identification. • examples: [99.5, 98.2, 100.0] |
| minimum_evalue | double | Minimum E-value across all BLAST hits (lower values indicate better matches). |
| source | string | The name of the source system used to create this record in the table. • required • examples: ['galaxy', 'bioinformaticians'] |
| source_database | string | The source of the sequence and taxon ID in the reference database. • examples: ['BOLD', 'GenBank'] |
| source_id | string | The ID of this record in the source system, including negative BLAST results. • required |
| blast_hit_id | string | The ID of this record in the source system for positive BLAST hits. • required |
| taxonomically_validated | boolean | Whether the DNA identification has been taxonomically validated against the morphological identification. |
| taxonomically_validated_threshold | double | The threshold value used to determine taxonomic validation. |
| inserted_ts_utc | timestamp | UTC timestamp when the record was inserted. |
| updated_ts_utc | timestamp | UTC timestamp when the record was last updated. |
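
The required attributes in the table above can be checked with a small helper before records are loaded. This is a minimal sketch, not the contract's own validation: the `missing_required` and `identity_in_range` functions and the sample record are hypothetical, the list of required fields is read from the table, and the 0 to 100 bound on `max_identity_percentage` is an assumption based on it being a percentage.

```python
# Fields marked "required" in the identification_dna schema table.
REQUIRED_FIELDS = ("identification_dna_id", "source", "source_id", "blast_hit_id")


def missing_required(record: dict) -> list:
    """Return the required field names that are absent or None in the record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


def identity_in_range(record: dict) -> bool:
    """Check max_identity_percentage lies in [0, 100] when present.

    Assumption: the contract itself does not state this bound.
    """
    value = record.get("max_identity_percentage")
    return value is None or 0.0 <= value <= 100.0


# Hypothetical record for illustration only; the IDs are made up.
record = {
    "identification_dna_id": 1,
    "source": "galaxy",
    "source_id": "example-source-id",
    "identification_dna": "Apis mellifera",
    "max_identity_percentage": 99.5,
}

print(missing_required(record))   # ['blast_hit_id']
print(identity_in_range(record))  # True
```

A record that passes both checks still needs type checks (for example `bigint` versus `string`) before it conforms to the schema; those are left out here for brevity.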