
Biocloud


enriched_consensus_sequence

Info

Data contract for the Consensus Sequence table of the dna domain of the Biocloud. Contains DNA sequences generated from sequencing amplicons, representing the consensus of multiple reads.

- name: Consensus Sequence
- version: 0.0.1
- status: active

Terms of Use

Purpose

Data contract for the Consensus Sequence table of the dna domain of the Biocloud. Contains DNA sequences generated from sequencing amplicons, representing the consensus of multiple reads.

Servers

Name	Type	Environment	Catalog	Schema	Host	Roles
production	databricks	production	dna_production	enriched	dbc-2030845a-6c3b.cloud.databricks.com	Admins: access to all the data and settings
development	databricks	development	dna_development	enriched	dbc-03ed8bbb-c0ec.cloud.databricks.com	Admins: access to all the data and settings
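Databricks Unity Catalog addresses a table by a three-level `catalog.schema.table` name, so each server entry above fixes the first two parts. The following is a minimal Python sketch of combining them; it assumes the physical table is named `consensus_sequence` after the schema model, and the `Server` class and `qualified_table_name` helper are hypothetical illustrations, not part of any Databricks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Connection attributes copied from the Servers table of this contract."""
    environment: str
    host: str
    catalog: str
    schema: str


SERVERS = {
    "production": Server("production", "dbc-2030845a-6c3b.cloud.databricks.com",
                         "dna_production", "enriched"),
    "development": Server("development", "dbc-03ed8bbb-c0ec.cloud.databricks.com",
                          "dna_development", "enriched"),
}

# Assumption: the physical table carries the schema model's name.
TABLE_NAME = "consensus_sequence"


def qualified_table_name(server_name: str, table: str = TABLE_NAME) -> str:
    """Return the three-level name catalog.schema.table for one server."""
    server = SERVERS[server_name]
    return f"{server.catalog}.{server.schema}.{table}"


print(qualified_table_name("production"))  # dna_production.enriched.consensus_sequence
```

Keeping the environments in one mapping means a job can switch between production and development by name without hard-coding catalog strings.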

Schema

consensus_sequence

The consensus sequence table contains DNA sequences generated from sequencing runs. When an amplicon is sequenced, multiple reads are combined to produce a consensus sequence. This table must be processed after the amplicon and sequencing_run tables.
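The ordering rule can be expressed as a dependency graph and resolved with a topological sort. This minimal sketch uses Python's standard `graphlib` module; only the prerequisites of `consensus_sequence` come from this contract, and the `DEPENDENCIES` mapping and `processing_order` helper are hypothetical names for illustration. Any upstream dependencies of `amplicon` or `sequencing_run` are not described here and are omitted.

```python
from graphlib import TopologicalSorter

# Table -> set of tables that must be processed before it.
# Only the consensus_sequence entry is taken from the contract.
DEPENDENCIES: dict[str, set[str]] = {
    "consensus_sequence": {"amplicon", "sequencing_run"},
}


def processing_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every table comes after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = processing_order(DEPENDENCIES)
print(order)  # amplicon and sequencing_run (in either order), then consensus_sequence
```

`static_order()` raises `graphlib.CycleError` if the declared dependencies ever form a cycle, which surfaces a misconfigured pipeline before any table is processed.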

Field	Type	Attributes
consensus_sequence_id	bigint	Biocloud-generated identifier for the consensus sequence. • `primaryKey` • primaryKeyPosition: 1 • `required`
amplicon_id	bigint	Foreign key to the amplicon table, linking this sequence to its source amplicon.
is_valid_protein	string	Indicates whether the sequence translates to a valid protein without stop codons; 'not translated' means translation was not attempted. • examples: ['yes', 'no', 'not translated']
supporting_read_count	bigint	Number of individual reads used to construct this consensus sequence.
read_ratio	decimal	Ratio metric related to read support for the consensus.
consensus_sequence_length	bigint	Length of the consensus sequence in base pairs. • examples: [658, 312, 1500]
consensus_sequence	string	The DNA sequence itself, written with the bases A, T, C and G. • `required`
consensus_sequence_hash	string	SHA-256 hash of the consensus sequence, used for deduplication and comparison.
sequence_generated_datetime	timestamp	Timestamp when the sequencing pipeline generated the consensus sequence.
source_id	string	The ID of this record in the source system. • `required`
source	string	The name of the source system used to create this record in the table. • `required` • examples: ['nanopore']
inserted_ts_utc	timestamp	UTC timestamp when the record was inserted.
updated_ts_utc	timestamp	UTC timestamp when the record was last updated.
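Several columns are derived from `consensus_sequence`, so a loader can recompute and cross-check them. The following is a minimal Python sketch under stated assumptions that the contract does not confirm: the hash is the hex SHA-256 of the upper-cased sequence encoded as ASCII; `consensus_sequence_length` equals the character count; `is_valid_protein` reads frame 1 only with the standard genetic code (stop codons TAA, TAG, TGA), ignoring a trailing partial codon and treating any stop codon, including a terminal one, as invalid; the listed example values are treated as the full allowed set; and deduplication keeps the first row per hash. All function names are hypothetical.

```python
import hashlib

# Fields marked `required` in the contract.
REQUIRED_FIELDS = ("consensus_sequence_id", "consensus_sequence", "source_id", "source")
# Example values for is_valid_protein, treated here as the full allowed set (assumption).
PROTEIN_FLAGS = {"yes", "no", "not translated"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
VALID_BASES = set("ACGT")


def sequence_hash(sequence: str) -> str:
    """Hex SHA-256 of the upper-cased sequence (assumed hashing convention)."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


def is_valid_protein(sequence: str) -> str:
    """Classify a sequence with the contract's example values (assumed rule).

    Reads frame 1 only and drops a trailing partial codon. Empty or non-ACGT
    sequences get 'not translated' purely for illustration.
    """
    seq = sequence.upper()
    if not seq or not set(seq) <= VALID_BASES:
        return "not translated"
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "no" if any(codon in STOP_CODONS for codon in codons) else "yes"


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one consensus_sequence row (empty if none)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "")
    ]
    seq = record.get("consensus_sequence") or ""
    length = record.get("consensus_sequence_length")
    if length is not None and length != len(seq):
        problems.append("consensus_sequence_length does not match the sequence")
    digest = record.get("consensus_sequence_hash")
    if digest is not None and digest != sequence_hash(seq):
        problems.append("consensus_sequence_hash does not match the sequence")
    flag = record.get("is_valid_protein")
    if flag is not None and flag not in PROTEIN_FLAGS:
        problems.append(f"unexpected is_valid_protein value: {flag!r}")
    return problems


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first row seen for each consensus_sequence_hash (assumed policy)."""
    seen = set()
    unique = []
    for record in records:
        key = record["consensus_sequence_hash"]
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


row = {
    "consensus_sequence_id": 1,
    "consensus_sequence": "ATGGCCAAA",
    "source_id": "example-001",  # hypothetical source-system ID
    "source": "nanopore",
}
row["consensus_sequence_length"] = len(row["consensus_sequence"])
row["consensus_sequence_hash"] = sequence_hash(row["consensus_sequence"])
row["is_valid_protein"] = is_valid_protein(row["consensus_sequence"])
print(row["is_valid_protein"], validate_record(row))  # yes []
```

Hashing the upper-cased sequence makes `atg` and `ATG` compare equal. Whether the Biocloud pipeline normalizes case before hashing is not stated in the contract.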

SLA Properties