curated_metadata_to_bold
Info
Data contract for the metadata_to_bold table in the curated layer of the Biocloud. Contains specimen metadata transformed to BOLD's schema format for export, including taxonomic, geographic, and collection information.
- name: Metadata to BOLD
- version: 0.0.1
- status: active
Terms of Use
Purpose
Data contract for the metadata_to_bold table in the curated layer of the Biocloud. Contains specimen metadata transformed to BOLD's schema format for export, including taxonomic, geographic, and collection information.
Servers
| Name | Type | Attributes |
|---|---|---|
| production | databricks | • environment: production • roles: Admins (Access to all the data and settings) • catalog: dna_production • host: dbc-2030845a-6c3b.cloud.databricks.com • schema: curated |
| development | databricks | • environment: development • roles: Admins (Access to all the data and settings) • catalog: dna_development • host: dbc-03ed8bbb-c0ec.cloud.databricks.com • schema: curated |
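As an illustration of how the server entries might be used, the sketch below builds a fully qualified table name per environment. It assumes the usual Databricks three-level `catalog.schema.table` naming; the contract does not spell out the full table name, and `qualified_table_name` is a hypothetical helper, not part of the contract tooling.

```python
# Catalog and schema values copied from the Servers table above.
SERVERS = {
    "production": {"catalog": "dna_production", "schema": "curated"},
    "development": {"catalog": "dna_development", "schema": "curated"},
}

TABLE_NAME = "metadata_to_bold"


def qualified_table_name(environment: str) -> str:
    """Return catalog.schema.table for an environment (hypothetical helper).

    Assumes three-level naming; raises KeyError for unknown environments.
    """
    server = SERVERS[environment]
    return f"{server['catalog']}.{server['schema']}.{TABLE_NAME}"


print(qualified_table_name("production"))
print(qualified_table_name("development"))
```

Keeping the environment lookup in one place means queries and exports can switch between production and development without hard-coding catalog names.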
Schema
metadata_to_bold
Contains specimen metadata transformed to BOLD's upload template schema.
| Field | Type | Attributes |
|---|---|---|
| SAMPLEID | string | Sample identifier, derived from catalog_number. Primary key for BOLD metadata records. • primaryKey • primaryKeyPosition: 1 • required |
| FIELDID | string | Field identifier. Currently empty placeholder. |
| MUSEUMID | string | Museum identifier, derived from catalog_number. • required |
| COLLECTION_CODE | string | Collection code. Currently empty placeholder. |
| INST | string | Institution name, always "Naturalis Biodiversity Center". • required |
| FUNDING_SRC | string | Funding source, always "Dutch Research Council (NWO)". |
| PHYLUM | string | Phylum name from resolved taxonomy. • required |
| CLASS | string | Class name from resolved taxonomy. |
| ORDER | string | Order name from resolved taxonomy. |
| FAMILY | string | Family name from resolved taxonomy. |
| SUBFAMILY | string | Subfamily name. Currently empty placeholder. |
| TRIBE | string | Tribe name. Currently empty placeholder. |
| GENUS | string | Genus name from resolved taxonomy. |
| SPECIES | string | Species name from verbatim identification (only species-level matches included). • required |
| SUBSPECIES | string | Subspecies name. Currently empty placeholder. |
| IDENTIFIED_BY | string | Person or entity who identified the specimen. |
| IDENTIFICATION_METHOD | string | Method used for identification (e.g. Morphology). |
| TAXONOMY_NOTES | string | Additional taxonomy notes, including the original verbatim identification from the collector. |
| SEX | string | Sex of the specimen. • examples: ['male', 'female', 'hermaphrodite', 'mixed'] |
| REPRODUCTION | string | Reproduction information. Currently empty placeholder. |
| LIFE_STAGE | string | Life stage of the specimen. • examples: ['larve', 'pupa', 'juvenile', 'adult', 'imago'] |
| SHORT_NOTE | string | Short notes. Currently empty placeholder. |
| NOTES | string | Additional notes. Currently empty placeholder. |
| VOUCHER_TYPE | string | Type of voucher, always "Vouchered:Registered Collection". • required |
| TISSUE_TYPE | string | Type of tissue sample. Currently empty placeholder. |
| SPECIMEN_LINKOUT | string | External link to specimen record. Currently empty placeholder. |
| ASSOCIATED_TAXA | string | Associated taxa from synecology data, formed by concatenating type and name. |
| ASSOCIATED_SPECIMENS | string | Associated specimens. Currently empty placeholder. |
| COLLECTORS | string | Name(s) of the collector(s), derived from recorded_by field. |
| COLLECTION_DATE_START | string | Start date of collection event in ISO 8601 format. |
| COLLECTION_DATE_END | string | End date of collection event in ISO 8601 format. |
| COUNTRY/OCEAN | string | Country or ocean where specimen was collected. • required |
| PROVINCE/STATE | string | Province or state. Currently empty placeholder. |
| REGION | string | Geographic region. Currently empty placeholder. |
| SECTOR | string | Geographic sector. Currently empty placeholder. |
| SITE | string | Collection site, derived from locality or locality_verbatim. |
| COORD:LAT | string | Latitude in decimal degrees. |
| COORD:LON | string | Longitude in decimal degrees. |
| ELEV | string | Elevation in meters, represented as single value or range. |
| DEPTH | string | Depth in meters, represented as single value or range. |
| ELEV_ACCURACY | string | Elevation accuracy. Currently empty placeholder. |
| DEPTH_ACCURACY | string | Depth accuracy. Currently empty placeholder. |
| COORD_SOURCE | string | Source of coordinates. Currently empty placeholder. |
| COORD_ACCURACY | string | Coordinate accuracy/uncertainty in meters. |
| COLLECTION_TIME | string | Collection time, represented as single value or range. |
| HABITAT | string | Habitat description where specimen was collected. • examples: ['inclosure dune-area', 'exclosure dune-area', 'sandy beach', 'forest'] |
| SAMPLING_PROTOCOL | string | Sampling protocol or method used. • examples: ['malaise trap', 'pitfall trap', 'leaf litter'] |
| COLLECTION_NOTES | string | Additional collection notes. Currently empty placeholder. |
| SITE_CODE | string | Site code identifier. Currently empty placeholder. |
| COLLECTION_EVENT_ID | string | Collection event identifier. Currently empty placeholder. |
| verbatim_kingdom | string | Kingdom name in lowercase, used to partition exports and to group or filter records in ClickHouse. Not included in the BOLD export file. • required |
| exported_at | string | Timestamp at which the record was marked for export, in the format yyyy-MM-ddTHH-mm-ss. Used to group or filter records in ClickHouse. Not included in the BOLD export file. • required |
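
The rules in the table above (required fields, constant values, the `exported_at` format, and the two columns excluded from the BOLD export file) can be sketched as a small record check. This is a minimal illustration, not the pipeline's actual implementation: `validate` and `to_bold_row` are hypothetical helpers, the sample values are made up, and treating an empty string as "missing" is an assumption.

```python
from datetime import datetime

# Fields marked "required" in the schema table.
REQUIRED_FIELDS = [
    "SAMPLEID", "MUSEUMID", "INST", "PHYLUM", "SPECIES",
    "VOUCHER_TYPE", "COUNTRY/OCEAN", "verbatim_kingdom", "exported_at",
]

# Fields the contract describes as always holding a fixed value.
CONSTANT_FIELDS = {
    "INST": "Naturalis Biodiversity Center",
    "FUNDING_SRC": "Dutch Research Council (NWO)",
    "VOUCHER_TYPE": "Vouchered:Registered Collection",
}

# Columns kept for grouping/filtering only, not part of the BOLD export file.
INTERNAL_FIELDS = ("verbatim_kingdom", "exported_at")

# Python strptime equivalent of yyyy-MM-ddTHH-mm-ss.
EXPORTED_AT_FORMAT = "%Y-%m-%dT%H-%M-%S"


def validate(record: dict) -> list:
    """Return a list of human-readable problems; empty if the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):  # assumption: empty string counts as missing
            problems.append(f"missing required field: {field}")
    for field, expected in CONSTANT_FIELDS.items():
        value = record.get(field)
        if value and value != expected:
            problems.append(f"{field} should be {expected!r}, got {value!r}")
    kingdom = record.get("verbatim_kingdom")
    if kingdom and kingdom != kingdom.lower():
        problems.append("verbatim_kingdom should be lowercase")
    exported_at = record.get("exported_at")
    if exported_at:
        try:
            datetime.strptime(exported_at, EXPORTED_AT_FORMAT)
        except ValueError:
            problems.append("exported_at does not match yyyy-MM-ddTHH-mm-ss")
    return problems


def to_bold_row(record: dict) -> dict:
    """Drop the internal columns before writing a BOLD export row."""
    return {k: v for k, v in record.items() if k not in INTERNAL_FIELDS}


# Hypothetical sample record; identifiers and values are illustrative only.
sample = {
    "SAMPLEID": "EXAMPLE.0001",
    "MUSEUMID": "EXAMPLE.0001",
    "INST": "Naturalis Biodiversity Center",
    "PHYLUM": "Arthropoda",
    "SPECIES": "Apis mellifera",
    "VOUCHER_TYPE": "Vouchered:Registered Collection",
    "COUNTRY/OCEAN": "Netherlands",
    "verbatim_kingdom": "animalia",
    "exported_at": "2024-01-31T12-00-00",
}

print(validate(sample))
print(sorted(to_bold_row(sample)))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a batch before an export is attempted.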