SpacerDB Documentation
This page documents how to access and use the "Global CRISPR Spacer Database" (or "SpacerDB"). This database of CRISPR spacers was established at the DOE Joint Genome Institute by mining publicly available metagenomes from NCBI SRA (up to Dec. 2023) with the SpacerExtractor tool. It also includes results from the analysis of this global CRISPR spacer dataset, including hits to IMG/VR (v4) and IMG/PR (v1), and enables analysis of spacer diversity within and across samples.
This page links to downloads of the entire database, documentation of the database content, and notebooks illustrating how to extract spacers associated with a given taxon or ecosystem, identify spacer hits to new potential targets (e.g., new virus or plasmid sequences), and extract relevant information about specific spacers. It is a companion to the manuscript describing the identification and analysis of these spacers, now available on bioRxiv.
Getting Started
- Quick Start - Data access, file structure, and basic usage examples
Documentation
- Database Overview - Database structure overview and relationships
- Example Notebooks - Interactive Jupyter notebooks with analysis examples
Connected Resources
- SpacerExtractor tool: Available on GitLab and through Bioconda
- Code Archive: Code originally used in the Global Spacer manuscript, available in a separate repository for archival purposes only
Citation
If you use SpacerDB in your research, please cite: