Uploading data to SquiDBase

Before uploading data to SquiDBase, you need to create an account. This ensures that each dataset is linked to a submitter and a principal investigator (PI) or responsible person.

You can sign up directly here or visit the SquiDBase website, click "Sign in" in the top-right corner, and select "Sign up here".

Checklist before upload

Before proceeding with your upload, please ensure that your data meets the following requirements:

The data must be of microbial or viral origin and must not contain human data to comply with privacy regulations. Metadata must also be free of patient-identifiable information.
Each upload should originate from a single Nanopore sequencing chemistry. This information is automatically extracted from your POD5 files during upload.
Metadata should be as complete as possible. Once uploaded, metadata cannot be modified, except for the info field, which remains editable.

Once your data meets these criteria, you have two options: 1. If each POD5 file contains only one biological species, you can proceed with the upload process filtered data.
2. Alternatively, if your dataset contains a mix of viral or microbial species, you can filter out POD5 reads per species using SquiDPipe, a dedicated pipeline designed for this purpose. These steps are explained under upload process mixed data

Upload when data is already filtered

If your data has already been filtered (i.e., each POD5 file contains reads from only one microbial or viral species), you only need to upload:

The POD5 files
A CSV file with the corresponding metadata

Metadata CSV File Format

Each dataset uploaded to SquiDBase must be accompanied by a metadata/samplesheet CSV file. This file provides essential information about the samples and ensures proper indexing and retrieval.

General guidelines:

The metadata file should be in CSV format (.csv).
Missing values should be filled with "NA" (not left empty).
Both comma (,) and semicolon (;) delimiters are accepted.
Each row in the file represents one POD5 file and its corresponding metadata.
Metadata cannot be modified after upload, except for the remarks field.

Metadata Fields

The table below details the required fields in the metadata file, including their descriptions, expected data types, and example values.

Column Name	Description	Data Type	Example Value
`filename`	The filename of the POD5 sequencing data to be uploaded.	String	`37124_CHIKV-1.pod5`
`species_taxid`	NCBI taxonomy identifier for the microbial species in the sample.	NCBI taxonomy	`37124`
`year_of_isolation`	The year the pathogen or microbial/viral species was isolated.	Integer	`2014`
`country_of_isolation`	The ISO 3166 country code indicating where the pathogen was isolated.	ISO 3166 country code	`BE`
`geographic_origin`	The geographic origin of the pathogen, if available. Can differ from `country_of_isolation` (e.g., in cases of imported infections).	ISO 3166 country code	`ET`
`strain_lineage`	The specific strain, lineage, or sequence type of the uploaded pathogen data.	String	`BA.5`
`source_id`	A unique identifier for the source of the pathogen sample, using the UBERON ontology.	UBERON ontology	`UBERON:_0000178`
`host_taxid`	NCBI taxonomy identifier for the host species from which the pathogen was isolated, if applicable.	NCBI taxonomy	`9606`
`internal_lab_id`	Internal identification code assigned to the sample by the laboratory.	String	`PLAS-ETH-2023-0147`
`diagnostic_method_id`	The diagnostic method used to detect the pathogen, using the OBI ontology.	OBI ontology	`OBI:_0003045`
`remarks`	Additional notes or remarks about the sample, often for internal collection records.	Text field	`"Strain donated by institute X."`

A template CSV file is available for download: Download Template.

Example Metadata Format

filename	species_name	species_taxid	year_of_isolation	country_of_isolation	geographic_origin	strain_lineage	source_id	host_taxid	internal_lab_id	diagnostic_method_id	remarks
file1.pod5	DENV	11053	2014	NA	NA	ECSA	NA	NA	NA	NA	ITM collection
file2.pod5	HIV	11676	2015	BE	NI	ECSA	NA	NA	NA	NA	ITM collection
file3.pod5	ZIKV	64320	11053	2015	BE	ID	NA	NA	NA	NA	NA
file4.pod5	SARS-CoV-2_A	2697049	11060	2018	BE	PE	NA	NA	NA	NA	NA
file5.pod5	SARS-CoV-2_B	2697049	11060	2018	BE	PE	NA	NA	NA	NA	NA

Upload Interface in SquiDBase

After logging in to SquiDBase, you can access the Submit page.

On this page, you will need to:
1. Fill in your contact details
2. Provide a short description of your dataset
3. Upload a Samplesheet CSV file containing metadata for your POD5 files

Uploading POD5 Files

Below the metadata submission section, you will find an area to upload your POD5 files, as shown in the image below.

Once both the CSV metadata file and the POD5 files are uploaded, SquiDBase will perform basic validation checks, such as:

Ensuring that all filenames in the CSV match the uploaded POD5 files
Checking for potential formatting issues

If no errors are detected, you can proceed by clicking the Upload button.

Making Your Data Public

Once your data is uploaded, you will have the option to:

Immediately make your dataset publicly available
Keep the dataset private and enable the "public" toggle later when you are ready to share it (e.g., after your publication is under peer review)

Upload of mixed datasets – running SquiDPipe

Many Nanopore sequencing runs, particularly those involving barcoded samples with multiple microbial or viral species, contain mixed species per POD5 file.

To handle this, use SquiDPipe, a Nextflow pipeline designed to process barcoded runs, including FASTQ files. It will:

Classify reads based on the basecalled sequences
Automatically separate POD5 files by species
Output a CSV for uploading to SquiDBase that matches the outputted POD5 filenames

For instructions on running SquiDPipe, visit:
SquiDPipe GitHub Repository