enriched_consensus_sequence

Info

Data contract for the Consensus Sequence table of the dna domain of the Biocloud. Contains DNA sequences generated from sequencing amplicons, representing the consensus of multiple reads.

name: Consensus Sequence
version: 0.0.1
status: active

Terms of Use

Purpose

Data contract for the Consensus Sequence table of the dna domain of the Biocloud. Contains DNA sequences generated from sequencing amplicons, representing the consensus of multiple reads.

Servers

Name Type Attributes
production databricks No description.
    environment: production
    roles: Admins (Access to all the data and settings)
    catalog: dna_production
    host: dbc-2030845a-6c3b.cloud.databricks.com
    schema_: enriched
development databricks No description.
    environment: development
    roles: Admins (Access to all the data and settings)
    catalog: dna_development
    host: dbc-03ed8bbb-c0ec.cloud.databricks.com
    schema_: enriched
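
As a rough illustration of how the server attributes above combine, the sketch below builds a three-level Databricks table name (catalog.schema.table) for each environment. The `SERVERS` mapping and the `qualified_table_name` helper are hypothetical names introduced here, and it is an assumption that the table is exposed under the model name `consensus_sequence`.

```python
# Catalog and schema values copied from the Servers section of this contract.
SERVERS = {
    "production": {"catalog": "dna_production", "schema": "enriched"},
    "development": {"catalog": "dna_development", "schema": "enriched"},
}

# Assumption: the physical table name matches the model name in the Schema section.
TABLE = "consensus_sequence"


def qualified_table_name(environment: str) -> str:
    """Return the catalog.schema.table name for the given environment."""
    server = SERVERS[environment]
    return f"{server['catalog']}.{server['schema']}.{TABLE}"


if __name__ == "__main__":
    for env in SERVERS:
        print(env, qualified_table_name(env))
```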

Schema

consensus_sequence

The consensus sequence table contains DNA sequences generated from sequencing runs. When an amplicon is sequenced, multiple reads are combined to produce a consensus sequence. This table must be processed after the amplicon and sequencing_run tables.
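
The processing-order requirement can be sketched as a small dependency graph. This is only an illustration using the Python standard library (`graphlib`, Python 3.9+). The `DEPENDENCIES` mapping and the `processing_order` helper are hypothetical, and it is an assumption that amplicon and sequencing_run have no upstream dependencies of their own.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables that must be processed before it.
# Assumption: amplicon and sequencing_run have no prerequisites here.
DEPENDENCIES = {
    "amplicon": set(),
    "sequencing_run": set(),
    "consensus_sequence": {"amplicon", "sequencing_run"},
}


def processing_order(dependencies: dict) -> list:
    """Return one valid processing order that respects all dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(processing_order(DEPENDENCIES))
```

Any order returned places consensus_sequence after both of its prerequisites; the relative order of amplicon and sequencing_run is not constrained.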

Field Type Attributes
consensus_sequence_id bigint Biocloud-generated identifier for the consensus sequence.
    primaryKey
    primaryKeyPosition: 1
    required
amplicon_id bigint Foreign key to the amplicon table, linking this sequence to its source amplicon.
is_valid_protein string Indicates whether the sequence translates to a valid protein without stop codons. 'not translated' means translation was not attempted.
    examples: ['yes', 'no', 'not translated']
supporting_read_count bigint Number of individual reads used to construct this consensus sequence.
read_ratio decimal Ratio metric related to read support for the consensus.
consensus_sequence_length bigint Length of the consensus sequence in base pairs.
    examples: [658, 312, 1500]
consensus_sequence string The actual DNA sequence string (ATCG format).
    required
consensus_sequence_hash string SHA-256 hash of the consensus sequence for deduplication and comparison.
sequence_generated_datetime timestamp Timestamp when the consensus sequence was generated by the sequencing pipeline.
source_id string The ID of this record in the source system.
    required
source string The name of the source system used to create this record in the table.
    required
    examples: ['nanopore']
inserted_ts_utc timestamp UTC timestamp when record was inserted.
updated_ts_utc timestamp UTC timestamp when record was last updated.
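
A minimal sketch of how a single record could be checked against the field definitions above. The `sequence_hash` and `check_record` helpers are hypothetical. The contract does not say how the hash is computed, so upper-casing, UTF-8 encoding, and a hex digest are assumptions; so is the strict reading of "ATCG format" as allowing only the four bases.

```python
import hashlib

# Fields marked "required" in the schema above.
REQUIRED_FIELDS = ("consensus_sequence_id", "consensus_sequence", "source_id", "source")


def sequence_hash(sequence: str) -> str:
    """SHA-256 hex digest of the sequence.

    Assumption: the sequence is stripped, upper-cased, and UTF-8 encoded first.
    """
    return hashlib.sha256(sequence.strip().upper().encode("utf-8")).hexdigest()


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing required field: {field}")

    sequence = record.get("consensus_sequence")
    if sequence is not None:
        # Assumption: only the four unambiguous bases are allowed.
        if set(sequence.upper()) - set("ATCG"):
            problems.append("consensus_sequence contains characters other than A, T, C, G")
        length = record.get("consensus_sequence_length")
        if length is not None and length != len(sequence):
            problems.append("consensus_sequence_length does not match the sequence")
        stored_hash = record.get("consensus_sequence_hash")
        if stored_hash is not None and stored_hash != sequence_hash(sequence):
            problems.append("consensus_sequence_hash does not match the sequence")
    return problems
```

Because identical sequences produce identical digests, comparing consensus_sequence_hash values is a cheap way to find duplicates without comparing the full sequence strings.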

SLA Properties