Spatial transcriptomic technologies were designed to interrogate gene expression across different regions of tissues. When a genetic alteration occurs in an expressed gene, it can be detected in RNA-sequencing data. Therefore, we developed software to detect and visualize somatic alterations in spatial transcriptomics data.

Chen L, Chang D, Tandukar B, Deivendran D, Pozniak J, Cruz-Pacheco N, Cho R, Cheng J, Yeh I, Marine C, Bastian BC, Ji AL, Shain AH. Visualizing somatic alterations in spatial transcriptomics data of skin cancer. Genome Biology. December 2023. Github. Tutorial on Youtube. Available Here.


CNVkit was designed to infer copy number information from next-generation sequencing data. To infer copy number information from DNA-sequencing data, CNVkit recognizes common sources of bias (e.g., GC content) and removes their effects on sequencing depth; the remaining variability in sequencing depth is assumed to be driven by copy number alterations. Another innovative feature of CNVkit is its use of off-target reads to infer copy number information.
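To make the bias-removal idea concrete, here is a minimal sketch in Python. It is not CNVkit's implementation: the function name `correct_gc_bias`, the fixed-width GC strata, and the toy values are all assumptions chosen for illustration. Each genomic bin is grouped by GC fraction, and the median log2 depth of its group is subtracted, so that whatever variability remains is a candidate copy number signal.

```python
from statistics import median


def correct_gc_bias(log2_depths, gc_fractions, n_strata=10):
    """Subtract the median log2 depth of each GC stratum from its bins.

    Hypothetical helper for illustration only; a simplified stand-in
    for the bias correction described above.
    """
    # Assign each genomic bin to a GC stratum (0 .. n_strata - 1).
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]

    # Collect the log2 depths observed in each stratum.
    by_stratum = {}
    for stratum, depth in zip(strata, log2_depths):
        by_stratum.setdefault(stratum, []).append(depth)

    # The stratum median estimates the depth shift attributable to GC content.
    shift = {s: median(values) for s, values in by_stratum.items()}

    # What remains after removing the shift is the residual signal.
    return [depth - shift[s] for s, depth in zip(strata, log2_depths)]


# Toy data (made up): low-GC bins are under-covered, high-GC bins over-covered,
# and the last bin carries extra depth beyond what its GC content explains.
gc = [0.31, 0.33, 0.35, 0.61, 0.63, 0.65]
log2_depth = [-0.5, -0.4, -0.6, 0.3, 0.4, 1.3]
corrected = correct_gc_bias(log2_depth, gc)
print([round(x, 2) for x in corrected])
```

In this toy example the first five bins end up near zero after correction, while the last bin keeps a residual of roughly 0.9, which is the kind of leftover variability that would be attributed to a copy number gain.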

Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Computational Biology. April 2016. Github. Tutorial on Youtube. Available Here.

We later added functionality to infer copy number information from RNA-sequencing data. A preprint and tutorial describing these features are linked below.

Talevich E, Shain AH. CNVkit-RNA: Copy Number Inference from RNA-Sequencing Data. BioRxiv. September 2018. Github. Tutorial on Youtube. Available Here.


Sequencing coverage can vary dramatically within a BAM file. To determine the exact number of basepairs that reach a user-defined threshold of coverage, we developed the Footprints software. We typically use Footprints to help calculate mutation burdens. Specifically, Footprints can provide the megabases of DNA with sufficient sequencing coverage to call a mutation.
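The arithmetic behind this use case can be sketched in a few lines of Python. This is not the Footprints code itself: the function names, the `(start, end, depth)` interval format, and the numbers below are assumptions for illustration. The sketch sums the basepairs whose depth meets the threshold, converts them to megabases, and divides a mutation count by that footprint.

```python
def callable_megabases(intervals, min_depth):
    """Megabases covered at or above min_depth.

    `intervals` is a hypothetical list of (start, end, depth) tuples with
    half-open coordinates, similar to a run-length coverage summary.
    """
    covered_bp = sum(end - start for start, end, depth in intervals
                     if depth >= min_depth)
    return covered_bp / 1_000_000


def mutation_burden(n_mutations, intervals, min_depth):
    """Mutations per megabase of sufficiently covered sequence."""
    megabases = callable_megabases(intervals, min_depth)
    if megabases == 0:
        raise ValueError("no basepairs reach the coverage threshold")
    return n_mutations / megabases


# Toy coverage summary (made up): 2 Mb at 30x and 1 Mb at 5x.
coverage = [
    (0, 2_000_000, 30),
    (2_000_000, 3_000_000, 5),
]
print(callable_megabases(coverage, min_depth=20))   # 2.0
print(mutation_burden(10, coverage, min_depth=20))  # 5.0
```

Normalizing by the covered footprint rather than the full target size matters here: dividing the same 10 mutations by all 3 Mb would understate the burden, because mutations could not have been called in the poorly covered megabase.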

Tang J, Fewins E, Chang D, Zeng H, Liu S, Jorapur A, Belote RL, McNeal AS, Yeh I, Arron ST, Judson-Torres RL, Bastian BC, Shain AH. The genomic landscapes of individual melanocytes from human skin. Nature. March 2020. Github. Tutorial on Youtube. Available Here.


The mutation burden across the genome is not uniform, and therefore some genes can accumulate an excess of mutations in high-mutation-burden cancers even if they are not under selection by the tumor. LOFsigRank performs a permutation-based statistical analysis to determine which genes have an enrichment of deleterious mutations (as opposed to an excess of any mutations) to identify tumor suppressor gene candidates.
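The general shape of such a permutation test can be illustrated with a short Python sketch. This is not LOFsigRank's actual procedure, which is not described here; the function name, the boolean labels, the choice of background, and the toy cohort are all assumptions. For a gene with a given number of mutations, the sketch repeatedly draws the same number of mutations from the cohort-wide pool and asks how often the draw contains at least as many deleterious mutations as the gene itself.

```python
import random


def lof_enrichment_pvalue(gene_labels, cohort_labels,
                          n_permutations=10_000, seed=0):
    """Empirical p-value for an excess of deleterious mutations in one gene.

    Hypothetical helper for illustration. Each label is True when a
    mutation is deleterious (e.g., loss-of-function) and False otherwise.
    `cohort_labels` is the pool of all mutations used as the background.
    """
    rng = random.Random(seed)
    n_gene = len(gene_labels)
    observed = sum(gene_labels)

    # Count permutations that match or exceed the observed deleterious count.
    as_extreme = 0
    for _ in range(n_permutations):
        drawn = rng.sample(cohort_labels, n_gene)
        if sum(drawn) >= observed:
            as_extreme += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (as_extreme + 1) / (n_permutations + 1)


# Toy cohort (made up): 10% of 1,000 mutations are deleterious.
cohort = [True] * 100 + [False] * 900

# Gene A: 8 of 10 mutations deleterious. Gene B: 1 of 10.
gene_a = [True] * 8 + [False] * 2
gene_b = [True] * 1 + [False] * 9

p_a = lof_enrichment_pvalue(gene_a, cohort)
p_b = lof_enrichment_pvalue(gene_b, cohort)
print(p_a, p_b)
```

Because the test conditions on the number of mutations in each gene, a gene that is simply mutated often (gene B) does not score well; only a skew toward deleterious mutations (gene A) yields a small p-value. Genes could then be ranked by these p-values to nominate tumor suppressor candidates.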

Shain AH, Garrido M, Botton T, Talevich E, Yeh I, Sanborn JZ, Chung J, Wang NJ, Kakavand H, Mann GJ, Thompson JF, Wiesner T, Roy R, Olshen AB, Gagnon A, Gray JW, Huh N, Hur JS, Busam KJ, Scolyer RA, Cho RJ, Murali R, Bastian BC. Exome sequencing of desmoplastic melanoma identifies recurrent NFKBIE promoter mutations and diverse activating mutations in the MAPK pathway. Nature Genetics. October 2015. Github. Available Here.