Uploading data to SquiDBase
Before uploading data to SquiDBase, you need to create an account. This ensures that each dataset is linked to a submitter and a principal investigator (PI) or responsible person.
You can sign up directly here or visit the SquiDBase website, click "Sign in" in the top-right corner, and select "Sign up here".
Checklist before upload
Before proceeding with your upload, please ensure that your data meets the following requirements:
- The data must be of microbial or viral origin and must not contain human data to comply with privacy regulations. Metadata must also be free of patient-identifiable information.
- Each upload should originate from a single Nanopore sequencing chemistry. This information is automatically extracted from your POD5 files during upload.
- Metadata should be as complete as possible. Once uploaded, metadata cannot be modified, except for the remarks field, which remains editable.
Once your data meets these criteria, you have two options:
1. If each POD5 file contains only one biological species, proceed as described under "Upload when data is already filtered".
2. If your dataset contains a mix of viral or microbial species, first split the POD5 reads per species with SquiDPipe, a dedicated pipeline designed for this purpose. These steps are explained under "Upload of mixed datasets – running SquiDPipe".
Upload when data is already filtered
If your data has already been filtered (i.e., each POD5 file contains reads from only one microbial or viral species), you only need to upload:
- The POD5 files
- A CSV file with the corresponding metadata
Metadata CSV File Format
Each dataset uploaded to SquiDBase must be accompanied by a metadata/samplesheet CSV file. This file provides essential information about the samples and ensures proper indexing and retrieval.
General guidelines:
- The metadata file should be in CSV format (.csv).
- Missing values should be filled with "NA" (not left empty).
- Both comma (,) and semicolon (;) delimiters are accepted.
- Each row in the file represents one POD5 file and its corresponding metadata.
- Metadata cannot be modified after upload, except for the remarks field.
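The guidelines above can be checked locally before uploading. The following is a minimal sketch, not SquiDBase's own validation: the function names `read_samplesheet` and `find_empty_cells` are hypothetical, and the delimiter is guessed simply by counting commas and semicolons in the header line.

```python
import csv
import io


def read_samplesheet(text):
    """Parse samplesheet text that uses either ',' or ';' as delimiter."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    # Pick whichever accepted delimiter occurs more often in the header line.
    delimiter = ";" if header.count(";") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def find_empty_cells(rows):
    """Return (row_number, column) pairs for cells that are empty instead of 'NA'."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        for column, value in row.items():
            if column is None:
                # DictReader stores surplus cells under the key None.
                problems.append((number, "<extra cells>"))
            elif value is None or value.strip() == "":
                problems.append((number, column))
    return problems


sample = "filename;species_taxid;remarks\nfile1.pod5;11053;\nfile2.pod5;11676;NA\n"
print(find_empty_cells(read_samplesheet(sample)))
```

Here the first data row leaves remarks empty, so it is reported; replacing the empty cell with "NA" clears the report.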
Metadata Fields
The table below details the required fields in the metadata file, including their descriptions, expected data types, and example values.
Column Name | Description | Data Type | Example Value |
---|---|---|---|
filename | The filename of the POD5 sequencing data to be uploaded. | String | 37124_CHIKV-1.pod5 |
species_taxid | NCBI taxonomy identifier for the microbial species in the sample. | NCBI taxonomy | 37124 |
year_of_isolation | The year the pathogen or microbial/viral species was isolated. | Integer | 2014 |
country_of_isolation | The ISO 3166 country code indicating where the pathogen was isolated. | ISO 3166 country code | BE |
geographic_origin | The geographic origin of the pathogen, if available. Can differ from country_of_isolation (e.g., in cases of imported infections). | ISO 3166 country code | ET |
strain_lineage | The specific strain, lineage, or sequence type of the uploaded pathogen data. | String | BA.5 |
source_id | A unique identifier for the source of the pathogen sample, using the UBERON ontology. | UBERON ontology | UBERON:_0000178 |
host_taxid | NCBI taxonomy identifier for the host species from which the pathogen was isolated, if applicable. | NCBI taxonomy | 9606 |
internal_lab_id | Internal identification code assigned to the sample by the laboratory. | String | PLAS-ETH-2023-0147 |
diagnostic_method_id | The diagnostic method used to detect the pathogen, using the OBI ontology. | OBI ontology | OBI:_0003045 |
remarks | Additional notes or remarks about the sample, often for internal collection records. | Text field | "Strain donated by institute X." |
A template CSV file is available for download: Download Template.
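As an illustration of the field table, the sketch below checks one samplesheet row against the listed columns. It is not SquiDBase's validator. The name `check_row` is hypothetical, and the patterns are assumptions inferred from the example values: numeric taxonomy IDs, a four-digit year, and two-letter uppercase country codes. It also assumes "NA" is acceptable in any of the pattern-checked fields.

```python
import re

# Column names taken from the metadata field table.
EXPECTED_COLUMNS = [
    "filename", "species_taxid", "year_of_isolation", "country_of_isolation",
    "geographic_origin", "strain_lineage", "source_id", "host_taxid",
    "internal_lab_id", "diagnostic_method_id", "remarks",
]

# Assumed formats, inferred from the example values in the table.
PATTERNS = {
    "species_taxid": re.compile(r"\d+"),
    "host_taxid": re.compile(r"\d+"),
    "year_of_isolation": re.compile(r"\d{4}"),
    "country_of_isolation": re.compile(r"[A-Z]{2}"),
    "geographic_origin": re.compile(r"[A-Z]{2}"),
}


def check_row(row):
    """Return a list of problems for one samplesheet row (dict of strings)."""
    problems = [f"missing column: {name}" for name in EXPECTED_COLUMNS if name not in row]
    if not str(row.get("filename") or "").endswith(".pod5"):
        problems.append("filename should end in .pod5")
    for name, pattern in PATTERNS.items():
        # Missing columns are already reported above, so default them to "NA".
        value = row.get(name, "NA")
        if value != "NA" and not pattern.fullmatch(value):
            problems.append(f"{name}: unexpected value {value!r}")
    return problems
```

A check like this catches shifted columns early, for example a year that ends up in the country column.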
Example Metadata Format
filename | species_name | species_taxid | year_of_isolation | country_of_isolation | geographic_origin | strain_lineage | source_id | host_taxid | internal_lab_id | diagnostic_method_id | remarks |
---|---|---|---|---|---|---|---|---|---|---|---|
file1.pod5 | DENV | 11053 | 2014 | NA | NA | ECSA | NA | NA | NA | NA | ITM collection |
file2.pod5 | HIV | 11676 | 2015 | BE | NI | ECSA | NA | NA | NA | NA | ITM collection |
file3.pod5 | ZIKV | 64320 | 2015 | BE | ID | NA | NA | NA | NA | NA | NA |
file4.pod5 | SARS-CoV-2_A | 2697049 | 2018 | BE | PE | NA | NA | NA | NA | NA | NA |
file5.pod5 | SARS-CoV-2_B | 2697049 | 2018 | BE | PE | NA | NA | NA | NA | NA | NA |
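A samplesheet of this shape can be started from a folder of POD5 files. The sketch below is one possible approach, with a hypothetical helper `write_skeleton`: it writes one row per POD5 file and fills every other field with "NA", to be completed by hand. The column list follows the metadata field table and is an assumption; if the downloaded template contains additional columns (the example above also shows species_name), adjust `COLUMNS` accordingly.

```python
import csv
from pathlib import Path

# Assumed column order, following the metadata field table.
COLUMNS = [
    "filename", "species_taxid", "year_of_isolation", "country_of_isolation",
    "geographic_origin", "strain_lineage", "source_id", "host_taxid",
    "internal_lab_id", "diagnostic_method_id", "remarks",
]


def write_skeleton(pod5_dir, out_path):
    """Write a samplesheet with one row per *.pod5 file; return the row count."""
    names = sorted(path.name for path in Path(pod5_dir).glob("*.pod5"))
    with open(out_path, "w", newline="") as handle:
        # restval fills every column that is absent from the row dict with "NA".
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="NA")
        writer.writeheader()
        for name in names:
            writer.writerow({"filename": name})
    return len(names)
```

Calling `write_skeleton("pod5_folder", "samplesheet.csv")` (both paths are placeholders) produces a file in which only the filename column is filled in.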
Upload Interface in SquiDBase
After logging in to SquiDBase, you can access the Submit page.
On this page, you will need to:
1. Fill in your contact details
2. Provide a short description of your dataset
3. Upload a Samplesheet CSV file containing metadata for your POD5 files
Figure: Dataset submission page in SquiDBase.
Uploading POD5 Files
Below the metadata submission section, you will find an area to upload your POD5 files, as shown in the image below.
Once both the CSV metadata file and the POD5 files are uploaded, SquiDBase will perform basic validation checks, such as:
- Ensuring that all filenames in the CSV match the uploaded POD5 files
- Checking for potential formatting issues
If no errors are detected, you can proceed by clicking the Upload button.
Figure: POD5 upload section.
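The filename check described above can also be run locally before opening the Submit page. This is a minimal sketch of such a pre-check, not the validation SquiDBase itself performs; `compare_filenames` is a hypothetical helper that takes the filename column of the samplesheet and the folder holding the POD5 files.

```python
from pathlib import Path


def compare_filenames(samplesheet_names, pod5_dir):
    """Compare filenames listed in the samplesheet with *.pod5 files in a folder."""
    listed = set(samplesheet_names)
    present = {path.name for path in Path(pod5_dir).glob("*.pod5")}
    return {
        "missing_files": sorted(listed - present),   # in the CSV, not on disk
        "unlisted_files": sorted(present - listed),  # on disk, not in the CSV
    }
```

Both lists should be empty before uploading; a non-empty list usually points to a typo in the filename column or a file left out of the samplesheet.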
Making Your Data Public
Once your data is uploaded, you will have the option to:
- Immediately make your dataset publicly available
- Keep the dataset private and enable the "public" toggle later when you are ready to share it (e.g., once your manuscript is under peer review)
Upload of mixed datasets – running SquiDPipe
Many Nanopore sequencing runs, particularly those involving barcoded samples with multiple microbial or viral species, contain mixed species per POD5 file.
To handle this, use SquiDPipe, a Nextflow pipeline designed to process barcoded runs, including FASTQ files. It will:
- Classify reads based on the basecalled sequences
- Automatically separate POD5 files by species
- Output a metadata CSV for upload to SquiDBase whose filenames match the generated POD5 files
For instructions on running SquiDPipe, visit:
SquiDPipe GitHub Repository