enriched_consensus_sequence
Info
Data contract for the Consensus Sequence table of the dna domain of the Biocloud. Contains DNA sequences generated from sequencing amplicons, representing the consensus of multiple reads.
- name: Consensus Sequence
- version: 0.0.1
- status: active
Terms of Use
Purpose
Data contract for the Consensus Sequence table of the dna domain of the Biocloud. Contains DNA sequences generated from sequencing amplicons, representing the consensus of multiple reads.
Servers
| Name | Type | Attributes |
|---|---|---|
| production | databricks | No description. • environment: production • roles: Admins (Access to all the data and settings) • catalog: dna_production • host: dbc-2030845a-6c3b.cloud.databricks.com • schema_: enriched |
| development | databricks | No description. • environment: development • roles: Admins (Access to all the data and settings) • catalog: dna_development • host: dbc-03ed8bbb-c0ec.cloud.databricks.com • schema_: enriched |
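As a small illustration of how the server attributes above combine, the sketch below builds a three-part Databricks table name (catalog, schema, table) for each environment. The helper name `qualified_table_name` is hypothetical, and the table name `consensus_sequence` is an assumption taken from the Schema section rather than something the server entries state.

```python
# Catalog and schema values copied from the Servers table.
# The dictionary layout and helper name are illustrative, not part of the contract.
SERVERS = {
    "production": {"catalog": "dna_production", "schema": "enriched"},
    "development": {"catalog": "dna_development", "schema": "enriched"},
}


def qualified_table_name(environment: str, table: str = "consensus_sequence") -> str:
    """Return catalog.schema.table for the given environment."""
    server = SERVERS[environment]  # raises KeyError for an unknown environment
    return f"{server['catalog']}.{server['schema']}.{table}"


if __name__ == "__main__":
    for env in SERVERS:
        print(env, "->", qualified_table_name(env))
```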
Schema
consensus_sequence
The consensus sequence table contains DNA sequences generated from sequencing runs. When an amplicon is sequenced, multiple reads are combined to produce a consensus sequence. This table must be processed after amplicon and sequencing_run tables.
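The ordering requirement above can be expressed as a dependency graph. The following is a minimal sketch using Python's standard `graphlib` module; it encodes only the two dependencies stated in this contract, and the `DEPENDENCIES` mapping and `processing_order` helper are hypothetical names, not part of any Biocloud pipeline.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tables that must be processed before it.
# Only the dependencies named in this contract are listed; any ordering
# between amplicon and sequencing_run themselves is not assumed here.
DEPENDENCIES = {
    "consensus_sequence": {"amplicon", "sequencing_run"},
}


def processing_order(dependencies: dict) -> list:
    """Return table names ordered so that prerequisites come first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(processing_order(DEPENDENCIES))
```

With this graph, `consensus_sequence` always comes last, while the relative order of the two prerequisite tables is left unspecified.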
| Field | Type | Attributes |
|---|---|---|
| consensus_sequence_id | bigint | Biocloud generated identifier for the consensus sequence. • primaryKey • primaryKeyPosition: 1 • required |
| amplicon_id | bigint | Foreign key to the amplicon table, linking this sequence to its source amplicon. |
| is_valid_protein | string | Indicates if the sequence translates to a valid protein without stop codons. 'not translated' means translation was not attempted. • examples: ['yes', 'no', 'not translated'] |
| supporting_read_count | bigint | Number of individual reads used to construct this consensus sequence. |
| read_ratio | decimal | Ratio metric related to read support for the consensus. |
| consensus_sequence_length | bigint | Length of the consensus sequence in base pairs. • examples: [658, 312, 1500] |
| consensus_sequence | string | The actual DNA sequence string (ATCG format). • required |
| consensus_sequence_hash | string | SHA-256 hash of the consensus sequence, used for deduplication and comparison. |
| sequence_generated_datetime | timestamp | Timestamp when the consensus sequence was generated by the sequencing pipeline. |
| source_id | string | The ID of this record in the source system. • required |
| source | string | The name of the source system used to create this record in the table. • required • examples: ['nanopore'] |
| inserted_ts_utc | timestamp | UTC timestamp when record was inserted. |
| updated_ts_utc | timestamp | UTC timestamp when record was last updated. |
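To make the field descriptions above concrete, here is a hedged sketch of a record check in plain Python. It covers the four fields marked required and the consistency of `consensus_sequence_length` and `consensus_sequence_hash` with the sequence itself. Several details are assumptions rather than contract rules: the hash is taken to be the lowercase hex SHA-256 digest of the UTF-8 encoded sequence exactly as stored, "ATCG format" is read strictly as only the characters A, T, C and G, and the names `sequence_hash` and `check_record` are hypothetical.

```python
import hashlib

# Strict reading of "ATCG format"; ambiguity codes such as N are an open question.
VALID_BASES = frozenset("ATCG")
# Fields marked "required" in the schema table.
REQUIRED_FIELDS = ("consensus_sequence_id", "consensus_sequence", "source_id", "source")


def sequence_hash(sequence: str) -> str:
    """Assumed hashing scheme: hex SHA-256 of the UTF-8 encoded sequence."""
    return hashlib.sha256(sequence.encode("utf-8")).hexdigest()


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")

    sequence = record.get("consensus_sequence") or ""
    unexpected = sorted(set(sequence) - VALID_BASES)
    if unexpected:
        problems.append(f"unexpected characters in sequence: {unexpected}")

    # Optional fields are only checked when they are present.
    length = record.get("consensus_sequence_length")
    if length is not None and length != len(sequence):
        problems.append("consensus_sequence_length does not match the sequence")

    stored_hash = record.get("consensus_sequence_hash")
    if stored_hash is not None and stored_hash.lower() != sequence_hash(sequence):
        problems.append("consensus_sequence_hash does not match the sequence")
    return problems
```

A record that passes returns an empty list, for example `check_record({"consensus_sequence_id": 1, "consensus_sequence": "ATCG", "consensus_sequence_length": 4, "source_id": "abc", "source": "nanopore"})`. If the pipeline normalizes case or strips whitespace before hashing, `sequence_hash` would need to mirror that step.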