Translational research hinges on the ability to make observations in model systems and to translate those findings into clinical applications, such as the development of diagnostic tools or targeted therapeutics. Tumor cell lines are commonly used to model carcinogenesis. The same tumor cell line can be studied simultaneously in multiple research laboratories throughout the world, theoretically generating results that are directly comparable. One important assumption in this paradigm is that researchers are working with the same cells. However, recent work using high-throughput genomic analyses questions the accuracy of this assumption. Observations by our group and others suggest that experiments reported in the scientific literature may contain pre-analytic errors due to inaccurate identities of the cell lines employed. To address this problem, we developed a simple approach that enables an accurate determination of cell line and sample identity. We described (Demichelis F, et al., Nucleic Acids Research, 2008) the empirical development of a SNP panel identification assay (SPIA) compatible with routine use in the laboratory setting to ensure the identity of tumor cell lines and human tumor samples throughout the course of long-term research use.
SPIA comes in two versions.
The R package SPIAssay can be installed directly from the CRAN project website.
The standalone script SPIA can be downloaded and works under Linux.
SPIA requires an R installation (>= 1.8.0).
The standalone script includes a file SPIA.R, which must be executable, and a file SPIAfunctions.R, which contains all the functions used by SPIA.R. SPIA searches for Rscript in the environment and can be run from the command line with the command:
SPIA.R <config_file>
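The prerequisites above (Rscript available in the environment, SPIA.R executable) can be checked before launching a run. The following Python wrapper is only a minimal sketch: the function names `check_prerequisites`, `build_command`, and `run_spia` are hypothetical helpers introduced for illustration and are not part of SPIA.

```python
import os
import shutil
import subprocess


def check_prerequisites(spia_script):
    """Return a list of problems that would stop SPIA.R from starting."""
    problems = []
    # SPIA.R looks up Rscript in the environment, so it must be on PATH.
    if shutil.which("Rscript") is None:
        problems.append("Rscript not found on PATH")
    if not os.path.isfile(spia_script):
        problems.append(f"{spia_script} does not exist")
    elif not os.access(spia_script, os.X_OK):
        problems.append(f"{spia_script} is not executable (try chmod +x)")
    return problems


def build_command(spia_script, config_file):
    """Argument list equivalent to typing: SPIA.R <config_file>."""
    return [spia_script, config_file]


def run_spia(spia_script, config_file):
    """Run SPIA.R on one config file, failing early on missing prerequisites."""
    problems = check_prerequisites(spia_script)
    if problems:
        raise RuntimeError("; ".join(problems))
    return subprocess.run(build_command(spia_script, config_file), check=True)
```

Calling `run_spia("./Bin/SPIA.R", "./Example/SPIA.configFile.R")` from the unpacked package folder would then correspond to the command-line call.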
The <config_file> follows R syntax and contains four sections:
## Location of SPIAfunctions.R
SPIAfunctions_location = "path to SPIAfunctions.R file"
## List of VCF files. Each VCF file must have at least one genotype column. If two VCF files contain the genotype of the same sample (identical sample ID), only the last one is used. If the list of SNPs in a VCF file does not match the list of SNPs of the first VCF, it will be ignored.
vcfFileList = "path to VCF file list"
## Probability that two matching samples (e.g. biological/technical replicates, normal and tumor from same individual) have different genotypes
Pmm = 0.1
## Given N SNPs, the maximum allowed distance between two matching samples is Pmm + N * nsigma
nsigma = 2
## Probability that two unrelated samples have different genotypes (e.g. with ideal SNPs close to 0.6)
Pmm_nonM = 0.6
## Given N SNPs, the minimum allowed distance between two unrelated samples is Pmm_nonM - N * nsigma_nonM
nsigma_nonM = 5
## Minimum percentage of valid SNPs genotypes required to perform the SPIA statistical test (out of N SNPs)
PercValidCall = 0.7
## SPIA table
outSPIAtable_file = "path of the SPIA output table"
## SPIA can optionally plot a graphical representation of the test
saveSPIAplot = T
## SPIA plot file name
SPIAplot_file = "path of the SPIA output graph"
## Save SPIA genotype (for debugging)
saveGenotype = F
## SPIA genotype table file name
genotypeTable_file = ""
## Print verbose information (for debugging purposes)
verbose = F
## Print output on screen (if F, a log file is created)
print_on_screen = T
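To see how Pmm, nsigma, Pmm_nonM, nsigma_nonM, and PercValidCall might interact, the sketch below classifies one pair of samples from its count of discordant genotypes. It rests on an assumption that is not stated in the config comments: the limits are taken to lie nsigma binomial standard deviations from the expected distance, that is Pmm + nsigma * sqrt(Pmm * (1 - Pmm) / N) and Pmm_nonM - nsigma_nonM * sqrt(Pmm_nonM * (1 - Pmm_nonM) / N), with N set to the number of valid SNPs. The function names and the labels "match", "different", "uncertain", and "not tested" are hypothetical and chosen only for this illustration; SPIA itself may compute and report these differently.

```python
import math


def match_limit(n_snps, pmm=0.1, nsigma=2):
    """Assumed upper distance limit for two matching samples."""
    return pmm + nsigma * math.sqrt(pmm * (1 - pmm) / n_snps)


def non_match_limit(n_snps, pmm_nonm=0.6, nsigma_nonm=5):
    """Assumed lower distance limit for two unrelated samples."""
    return pmm_nonm - nsigma_nonm * math.sqrt(pmm_nonm * (1 - pmm_nonm) / n_snps)


def classify_pair(n_discordant, n_valid, n_total, perc_valid_call=0.7):
    """Classify a sample pair from its genotype counts.

    n_discordant: SNPs with different genotypes in the two samples
    n_valid:      SNPs with a valid genotype call in both samples
    n_total:      SNPs in the panel (N in the config comments)
    """
    # Too few valid calls: the test is not performed (PercValidCall).
    if n_valid < perc_valid_call * n_total:
        return "not tested"
    distance = n_discordant / n_valid
    if distance <= match_limit(n_valid):
        return "match"
    if distance >= non_match_limit(n_valid):
        return "different"
    # Between the two limits neither hypothesis is supported.
    return "uncertain"
```

Under this assumption, increasing N narrows both bands around Pmm and Pmm_nonM, so a larger SNP panel leaves fewer pairs in the intermediate zone.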
SPIA output table has a row for each possible pair of samples analyzed. Each line includes the following information:
The package SPIA contains a directory Bin with the SPIA.R script and the file SPIAfunctions.R with the SPIA functions. The package also comes with a ready-to-use example folder, Example. The folder contains a SPIA config file SPIA.configFile.R, a VCF file CEU.exon.2010_03.genotypes.143SNPs.vcf, and a list of one VCF file named 1000G.CEU.exon.vcfList.txt. To test SPIA, unzip the spia.zip package, enter the SPIA folder, and type:
./Bin/SPIA.R ./Example/SPIA.configFile.R
If SPIA successfully completes the analysis, you will find two more files within the Example folder:
1000G.CEU.exon.vcfList.SPIAtest.csv
1000G.CEU.exon.vcfList.SPIAplot.pdf
that represent the tabular and graphical output of SPIA, respectively.
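A quick way to confirm that the test run finished is to check that both output files exist and are not empty. The helper below is a minimal sketch; `missing_outputs` is a hypothetical name, and the two file names are the ones listed above for the Example folder.

```python
from pathlib import Path

# Output files named in the text for the bundled example.
EXPECTED_OUTPUTS = (
    "1000G.CEU.exon.vcfList.SPIAtest.csv",
    "1000G.CEU.exon.vcfList.SPIAplot.pdf",
)


def missing_outputs(example_dir):
    """Return the expected output files that are absent or empty."""
    folder = Path(example_dir)
    missing = []
    for name in EXPECTED_OUTPUTS:
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list from `missing_outputs("./Example")` indicates that both the table and the plot were written.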