SNP panel identification assay (SPIA) user manual

SPIA requires an R installation (>= 1.8.0) and comes in two versions.
The R package SPIAssay can be installed directly from the CRAN project website.
The standalone SPIA script can be downloaded and works under Linux.

Standalone SPIA

The standalone script includes a file SPIA.R, which must be executable, and a file SPIAfunctions.R, which contains all the functions used by SPIA.R. SPIA searches for Rscript in the environment and can be run from the command line with the command

SPIA.R <config_file>

The <config_file> follows R syntax and contains four sections:

  1. Input files
  2. Parameters for the SPIA statistical test
  3. Output files
  4. Other parameters

1. Input files

## Location of SPIAfunctions.R
SPIAfunctions_location = "path to SPIAfunctions.R file"

## List of VCF files. Each VCF file must have at least one genotype column. If two VCF files contain the genotype of the same sample (identical sample ID), only the last one is used. If the list of SNPs in a VCF file does not match the list of SNPs in the first VCF file, that file is ignored.
vcfFileList = "path to VCF file list"
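
The two rules above (the last VCF wins for a repeated sample ID, and a VCF whose SNP list differs from the first is ignored) can be checked before running SPIA. The following Python sketch is not part of SPIA. It assumes that "matching" means the same (CHROM, POS) pairs in the same order, and the helper names `read_sites_and_samples`, `genotype_sources`, and `make_vcf` are hypothetical.

```python
def read_sites_and_samples(vcf_lines):
    """Return (sites, sample_ids) parsed from the text lines of one VCF file."""
    sites, sample_ids = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            sample_ids = fields[9:]  # genotype columns follow the 9 fixed VCF columns
        else:
            sites.append((fields[0], fields[1]))  # (CHROM, POS)
    return sites, sample_ids


def genotype_sources(vcfs):
    """Map each sample ID to the name of the VCF whose genotype would be used.

    vcfs is a list of (name, lines) pairs in the order of the VCF file list.
    """
    reference_sites = None
    source = {}
    for name, lines in vcfs:
        sites, sample_ids = read_sites_and_samples(lines)
        if reference_sites is None:
            reference_sites = sites  # the first VCF defines the SNP list
        elif sites != reference_sites:
            continue  # SNP list differs from the first VCF: file is ignored
        for sample_id in sample_ids:
            source[sample_id] = name  # a later file replaces an earlier one
    return source


def make_vcf(sample_ids, positions):
    """Build a tiny in-memory VCF (hypothetical data, for illustration only)."""
    header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                        "FILTER", "INFO", "FORMAT"] + sample_ids)
    rows = ["\t".join(["1", str(p), ".", "A", "G", ".", ".", ".", "GT"]
                      + ["0/1"] * len(sample_ids)) for p in positions]
    return ["##fileformat=VCFv4.0", header] + rows


result = genotype_sources([
    ("a.vcf", make_vcf(["S1", "S2"], [100, 200])),
    ("b.vcf", make_vcf(["S2"], [100, 200])),   # same SNPs, repeats sample S2
    ("c.vcf", make_vcf(["S3"], [100, 300])),   # different SNP list
])
print(result)
```

In this toy input, sample S2 is taken from b.vcf because it appears later in the list, and c.vcf contributes nothing because its SNP list differs from that of a.vcf.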

2. Parameters for the SPIA statistical test

## Probability that two matching samples (e.g. biological/technical replicates, normal and tumor from same individual) have different genotypes
Pmm = 0.1

## Given N SNPs, the maximum allowed distance between two matching samples is Pmm + N * nsigma
nsigma = 2

## Probability that two unrelated samples have different genotypes (close to 0.6 with ideal SNPs)
Pmm_nonM = 0.6

## Given N SNPs, the minimum allowed distance between two unrelated samples is Pmm_nonM - N * nsigma_nonM
nsigma_nonM = 5

## Minimum fraction of SNPs with a valid genotype call (out of N SNPs) required to perform the SPIA statistical test
PercValidCall = 0.7
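
To make the role of these five parameters concrete, here is a minimal Python sketch of how a pair of samples could be classified. It is not the SPIA implementation and rests on several assumptions: the distance is taken to be the fraction of valid SNPs whose genotypes differ, and the two limits are read as binomial bounds, that is, the expected mismatch rate plus or minus a number of standard deviations, sqrt(P * (1 - P) / N). This is one interpretation of the terse formulas in the comments above. The function names, the inclusive comparisons, and the `None` return value for pairs with too few valid calls are hypothetical.

```python
import math


def spia_limits(n_valid, pmm=0.1, nsigma=2, pmm_nonm=0.6, nsigma_nonm=5):
    """Return (upper limit for matching pairs, lower limit for unrelated pairs).

    Assumption: limits are binomial, mean +/- nsigma standard deviations.
    """
    sd_match = math.sqrt(pmm * (1 - pmm) / n_valid)
    sd_nonm = math.sqrt(pmm_nonm * (1 - pmm_nonm) / n_valid)
    return pmm + nsigma * sd_match, pmm_nonm - nsigma_nonm * sd_nonm


def classify_pair(n_mismatch, n_valid, n_total, perc_valid_call=0.7):
    """Classify one sample pair as Similar, Different, or Uncertain."""
    if n_valid < perc_valid_call * n_total:
        return None  # assumption: too few valid calls, so no test is performed
    distance = n_mismatch / n_valid  # assumption: fraction of discordant SNPs
    upper_similar, lower_different = spia_limits(n_valid)
    if distance <= upper_similar:
        return "Similar"
    if distance >= lower_different:
        return "Different"
    return "Uncertain"


# With 143 valid SNPs and the default values above, the assumed limits are
# roughly 0.15 (matching pairs) and 0.40 (unrelated pairs).
print(spia_limits(143))
print(classify_pair(10, 143, 143))   # few mismatches
print(classify_pair(40, 143, 143))   # between the two limits
print(classify_pair(80, 143, 143))   # many mismatches
print(classify_pair(10, 90, 143))    # fewer than 70% valid calls
```

Under this reading, a larger nsigma widens the range of distances accepted as Similar, and a larger nsigma_nonM widens the range accepted as Different; distances that fall between the two limits are reported as Uncertain.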

3. Output files

## SPIA table
outSPIAtable_file = "path of the SPIA output table"

## SPIA can optionally plot a graphical representation of the test
saveSPIAplot = T

## SPIA plot file name
SPIAplot_file = "path of the SPIA output graph"

## Save SPIA genotype (for debugging)
saveGenotype = F

## SPIA genotype table file name
genotypeTable_file = ""

4. Other parameters

## Print verbose information (for debugging)
verbose = F

## Print output on screen (if F, a log file is created instead)
print_on_screen = T

SPIA output table

The SPIA output table has one row for each possible pair of analyzed samples. Each row includes the following information:

  • Sample_1 and Sample_2: identifiers of the two samples analyzed
  • Distance: genotype distance computed by SPIA
  • SPIA_Score: indicates whether Sample_1 and Sample_2 are Similar, Different, or Uncertain
  • SNP_available: number of valid SNPs used for computing genotype distance
  • Total_SNP: total number of SNPs provided
  • One_SNP_NA: number of SNPs without genotype information in exactly one sample
  • Bot_SNP_NA: number of SNPs without genotype information in both samples
  • Diff_AvsB_or_BvsA: number of SNPs with genotype AA in Sample_1 and genotype BB in Sample_2, or vice versa
  • Diff_AorBvsAB_or_vic: number of SNPs with genotype AA or BB in Sample_1 and genotype AB in Sample_2, or vice versa
  • DiffABvsAorB: number of SNPs with genotype AB in Sample_1 and genotype AA or BB in Sample_2
  • counterBothHomoz: number of SNPs homozygous in both Sample_1 and Sample_2
  • counterBothHeter: number of SNPs heterozygous in both Sample_1 and Sample_2
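
The columns listed above can be processed with any tool that reads delimited text. As a minimal sketch, the Python snippet below extracts all pairs with a given SPIA_Score. It assumes the table is comma-separated with a header row that uses the column names listed above; the function name `pairs_by_score` and the sample rows are hypothetical and are not real SPIA output.

```python
import csv
import io


def pairs_by_score(table_file, score="Similar"):
    """Return (Sample_1, Sample_2, Distance) for every row with the given score."""
    reader = csv.DictReader(table_file)
    return [
        (row["Sample_1"], row["Sample_2"], float(row["Distance"]))
        for row in reader
        if row["SPIA_Score"] == score
    ]


# Hypothetical table content, reduced to a few of the columns for brevity.
example_table = (
    "Sample_1,Sample_2,Distance,SPIA_Score,SNP_available,Total_SNP\n"
    "sampleA,sampleB,0.55,Different,140,143\n"
    "sampleA,sampleC,0.02,Similar,141,143\n"
    "sampleB,sampleC,0.30,Uncertain,139,143\n"
)

similar_pairs = pairs_by_score(io.StringIO(example_table), score="Similar")
print(similar_pairs)
```

To read a real output file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`, and adjust the delimiter if the table is not comma-separated.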

Example

The SPIA package contains a directory Bin with the script SPIA.R and the file SPIAfunctions.R, which holds the SPIA functions. The package also comes with a ready-to-use example folder, Example. The folder contains a SPIA config file SPIA.configFile.R, a VCF file CEU.exon.2010_03.genotypes.143SNPs.vcf, and a list file 1000G.CEU.exon.vcfList.txt that names this single VCF file. To test SPIA, unzip the spia.zip package, enter the folder SPIA, and type

./Bin/SPIA.R ./Example/SPIA.configFile.R

If SPIA successfully completes the analysis, you will find two more files within the Example folder:
1000G.CEU.exon.vcfList.SPIAtest.csv
1000G.CEU.exon.vcfList.SPIAplot.pdf
These files contain the tabular and graphical output of SPIA, respectively.