Differences

This shows you the differences between two versions of the page.

public:spia [2013/04/20 15:34]
f.demichelis@unitn.it
public:spia [2014/04/18 09:35] (current)
davide.prandi@unitn.it
Line 9: Line 9:
  
 | {{ :spia_distrib.jpg?300 }} Schematic illustration of probabilistic test settings. The figure shows the binomial distributions of real match pair population (red dots) and of non-pair population (blue dots) for N equal to 30 and PM and Pnon-M, for the real match pair population and the non-pair population, equal to 0.9 and 0.4. The red, blue and green bars define regions of ‘different’ (mnon-M set equal to 1), ‘uncertain’ and ‘similar’ (mM set equal to 2) SPIA test calls. The smaller the number of SNPs is, the narrower the region of uncertainty and the higher the probability of making an incorrect call. |  {{ :spia_schema.jpg?300 }} Schema of SNP panel identification assay (SPIA) applicability and use modality. |
 +
 +==== USER MANUAL ====
 +
 +SPIA comes in two versions.\\ 
 +The R package SPIAssay can be installed directly from the [[http://cran.r-project.org/|CRAN]] project website.\\ 
 +The standalone script can be downloaded as {{:spia.zip|SPIA}} and runs under Linux.
 +
 +=== System Requirements ===
 +
 +SPIA requires an R installation (version >= 1.8.0). \\
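Both requirements can be checked from an R session. The lines below are a minimal sketch using base R only; the variable names are illustrative and not part of SPIA.

```r
# Compare the installed R version with the documented minimum (1.8.0).
r_ok <- getRversion() >= "1.8.0"

# The standalone script looks for Rscript in the environment;
# Sys.which() returns "" when the executable is not on the PATH.
rscript_path <- Sys.which("Rscript")
rscript_ok <- nzchar(rscript_path)

cat("R version OK:", r_ok, "\n")
cat("Rscript found:", rscript_ok, "\n")
```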
 +
 +
 +=== Standalone SPIA ===
 +
 +The standalone version consists of two files: SPIA.R, which must be executable, and SPIAfunctions.R, which contains all the functions used by SPIA.R. SPIA searches for Rscript in the environment and can be run from the command line with the command\\ 
 +  SPIA.R <config_file>
 +The ''<config_file>'' follows R syntax and contains four sections:
 +  - Input files
 +  - Parameters for the SPIA statistical test
 +  - Output files
 +  - Other parameters
 +
 +== 1. Input files ==
 +
 +## Location of SPIAfunctions.R\\ 
 +SPIAfunctions_location = "path to SPIAfunctions.R file"
 +
 +## List of VCF files. Each VCF file must have at least one genotype column. If two VCF files contain genotypes for the same sample (identical sample ID), only the last one is used. A VCF file whose list of SNPs does not match that of the first VCF file is ignored.\\
 +vcfFileList = "path to VCF file list"
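The two rules for the VCF list (last file wins for a repeated sample ID; files with a different SNP list are ignored) can be illustrated as follows. This is a sketch of the stated behaviour, not SPIA's own code: the sample IDs, file names and SNP IDs are made up, and treating "match" as an exact, ordered comparison of SNP IDs is an assumption.

```r
# Hypothetical sample IDs collected from three VCF files,
# in the order the files appear in the list.
sample_ids <- c("NA06984", "NA06985", "NA06984")
source_vcf <- c("a.vcf", "b.vcf", "c.vcf")

# Keep only the last occurrence of each sample ID.
keep <- !duplicated(sample_ids, fromLast = TRUE)
used <- data.frame(sample = sample_ids[keep], vcf = source_vcf[keep],
                   stringsAsFactors = FALSE)
print(used)

# Hypothetical SNP lists: a file is kept only if its SNPs match the first file.
snps_first <- c("rs1", "rs2", "rs3")
snps_other <- c("rs1", "rs2", "rs4")
same_snps <- identical(snps_first, snps_other)
cat("Second file kept:", same_snps, "\n")
```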
 +
 +
 +== 2. Parameters for the SPIA statistical test ==
 +
 +## Probability that two matching samples (e.g. biological/technical replicates, normal and tumor from same individual) have different genotypes\\ 
 +Pmm = 0.1 
 +
 +## Given N SNPs, the maximum allowed distance between two matching samples is Pmm + N * nsigma\\ 
 +nsigma = 2 
 +
 +## Probability that two unrelated samples have different genotypes (e.g. with ideal SNPs close to 0.6)\\ 
 +Pmm_nonM = 0.6
 +
 +## Given N SNPs, the minimum allowed distance between two unrelated samples is Pmm_nonM - N * nsigma_nonM\\ 
 +nsigma_nonM = 5
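The comments above express both limits in terms of N and the nsigma parameters. Taken literally, Pmm + N * nsigma exceeds 1 for typical values, so the sketch below assumes a different reading that this page does not confirm: nsigma and nsigma_nonM count binomial standard deviations around the expected mismatch fractions. The function names ''spia_limits'' and ''spia_call'' are hypothetical.

```r
# Assumed reading (not confirmed by this page): each limit lies
# nsigma binomial standard deviations away from its mismatch probability.
spia_limits <- function(N, Pmm, nsigma, Pmm_nonM, nsigma_nonM) {
  sigma_match    <- sqrt(Pmm * (1 - Pmm) / N)
  sigma_nonmatch <- sqrt(Pmm_nonM * (1 - Pmm_nonM) / N)
  list(
    max_match    = Pmm + nsigma * sigma_match,
    min_nonmatch = Pmm_nonM - nsigma_nonM * sigma_nonmatch
  )
}

# Hypothetical classifier built on those limits.
spia_call <- function(distance, limits) {
  if (distance < limits$max_match) {
    "Similar"
  } else if (distance > limits$min_nonmatch) {
    "Different"
  } else {
    "Uncertain"
  }
}

lim <- spia_limits(N = 143, Pmm = 0.1, nsigma = 2,
                   Pmm_nonM = 0.6, nsigma_nonM = 5)
print(lim)
print(spia_call(0.05, lim))
print(spia_call(0.50, lim))
```

Under this reading, a smaller N widens both standard deviations, so the two limits move towards each other and the uncertain region shrinks.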
 +
 +## Minimum fraction of valid SNP genotype calls (out of N SNPs) required to perform the SPIA statistical test\\ 
 +PercValidCall = 0.7
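As an illustration of the PercValidCall threshold, the sketch below counts the SNPs that can be compared for one pair of samples. The genotype coding and the rule that a SNP is valid only when both samples have a call are assumptions, and the genotype vectors are made up.

```r
# Hypothetical genotype calls for two samples over N = 10 SNPs
# (0 = AA, 1 = AB, 2 = BB, NA = no call); the coding is an assumption.
g1 <- c(0, 1, 2, NA, 1, 0, 2, 2, NA, 1)
g2 <- c(0, 1, 1, 2, 1, 0, 2, 2, NA, 1)

PercValidCall <- 0.7

# Assume a SNP is valid for a pair when both samples have a call.
valid <- !is.na(g1) & !is.na(g2)
frac_valid <- sum(valid) / length(g1)

# The test is performed only if enough SNPs are valid.
run_test <- frac_valid >= PercValidCall
cat("Valid SNPs:", sum(valid), "of", length(g1),
    "-> run test:", run_test, "\n")
```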
 +
 +
 +== 3. Output files ==
 +
 +## SPIA table\\ 
 +outSPIAtable_file = "path of the SPIA output table"
 +
 +## SPIA can optionally plot a graphical representation of the test\\ 
 +saveSPIAplot = T
 +
 +## SPIA plot file name\\ 
 +SPIAplot_file = "path of the SPIA output graph"
 +
 +## Save SPIA genotype (for debugging)\\ 
 +saveGenotype = F
 +
 +## SPIA genotype table file name\\ 
 +genotypeTable_file = ""
 +
 +== 4. Other parameters ==
 +
 +## Print verbose information (for debugging purposes)\\ 
 +verbose = F
 +
 +## Print output on screen (if F, a log file is created instead)\\ 
 +print_on_screen = T
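Because the config file follows R syntax, it can be evaluated in its own environment and checked before SPIA is run. The sketch below writes a config with the parameters described above to a temporary file and verifies that the required variables are defined. The paths are placeholders, the set of variables treated as required is an assumption, and this check is not part of SPIA.

```r
# Write a config with the documented parameters to a temporary file.
# All paths are placeholders.
cfg_file <- tempfile(fileext = ".R")
writeLines(c(
  'SPIAfunctions_location = "Bin/SPIAfunctions.R"',
  'vcfFileList = "Example/1000G.CEU.exon.vcfList.txt"',
  'Pmm = 0.1',
  'nsigma = 2',
  'Pmm_nonM = 0.6',
  'nsigma_nonM = 5',
  'PercValidCall = 0.7',
  'outSPIAtable_file = "Example/out.csv"',
  'saveSPIAplot = T',
  'SPIAplot_file = "Example/out.pdf"',
  'saveGenotype = F',
  'genotypeTable_file = ""',
  'verbose = F',
  'print_on_screen = T'
), cfg_file)

# Evaluate the config in a separate environment.
cfg <- new.env()
sys.source(cfg_file, envir = cfg)

# Variables assumed to be required for a run.
required <- c("SPIAfunctions_location", "vcfFileList", "Pmm", "nsigma",
              "Pmm_nonM", "nsigma_nonM", "PercValidCall",
              "outSPIAtable_file")
missing_vars <- setdiff(required, ls(cfg))
if (length(missing_vars) > 0) {
  stop("Missing config variables: ", paste(missing_vars, collapse = ", "))
}

# Basic sanity check: matching samples should mismatch less often.
stopifnot(cfg$Pmm < cfg$Pmm_nonM)
```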
 +
 +
 +=== SPIA output table ===
 +
 +The SPIA output table has one row for each possible pair of samples analyzed. Each row includes the following information:
 +  * Sample_1 and Sample_2: identifiers of the two samples analyzed
 +  * Distance: genotype distance computed by SPIA
 +  * SPIA_Score: indicates whether Sample_1 and Sample_2 are Similar, Different, or Uncertain
 +  * SNP_available: number of valid SNPs used for computing genotype distance
 +  * Total_SNP: total number of SNPs provided 
 +  * One_SNP_NA: number of SNPs without genotype information in exactly one sample
 +  * Bot_SNP_NA: number of SNPs without genotype information in both samples
 +  * Diff_AvsB_or_BvsA: number of SNPs with genotype AA in Sample_1 and genotype BB in Sample_2, or vice versa
 +  * Diff_AorBvsAB_or_vic: number of SNPs with genotype AA or BB in Sample_1 and genotype AB in Sample_2, or vice versa
 +  * DiffABvsAorB: number of SNPs with genotype AA or BB in Sample_1 and genotype AB in Sample_2
 +  * counterBothHomoz: number of SNPs homozygous in both Sample_1 and Sample_2
 +  * counterBothHeter: number of SNPs heterozygous in both Sample_1 and Sample_2
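The columns listed above lend themselves to a quick summary in R. The sketch below builds a small data frame with made-up rows that use a subset of those column names, then counts the calls per category and lists the pairs that were not called Similar. The values are illustrative only and do not come from a real SPIA run.

```r
# Made-up rows using some of the column names listed above.
spia <- data.frame(
  Sample_1      = c("S1", "S1", "S2"),
  Sample_2      = c("S2", "S3", "S3"),
  Distance      = c(0.04, 0.52, 0.30),
  SPIA_Score    = c("Similar", "Different", "Uncertain"),
  SNP_available = c(140, 138, 141),
  Total_SNP     = c(143, 143, 143),
  stringsAsFactors = FALSE
)

# Count calls per category.
print(table(spia$SPIA_Score))

# List the pairs that need a closer look.
flagged <- subset(spia, SPIA_Score != "Similar")
print(flagged[, c("Sample_1", "Sample_2", "Distance", "SPIA_Score")])
```

A real output table could be loaded into the same structure with ''read.csv'', assuming the file is comma-separated as its .csv extension suggests.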
 +
 +
 +=== Example ===
 +
 +The package {{spia.zip|SPIA}} contains a directory Bin with the script SPIA.R and the file SPIAfunctions.R, which holds the SPIA functions. The package also comes with a ready-to-use example folder, Example. The folder contains a SPIA config file SPIA.configFile.R, a VCF file CEU.exon.2010_03.genotypes.143SNPs.vcf, and a list containing that single VCF file, named 1000G.CEU.exon.vcfList.txt. To test SPIA, unzip the spia.zip package, enter the folder SPIA, and type\\ 
 +
 +  ./Bin/SPIA.R ./Example/SPIA.configFile.R
 +
 +If SPIA completes the analysis successfully, you will find two more files in the Example folder:\\ 
 +1000G.CEU.exon.vcfList.SPIAtest.csv\\ 
 +1000G.CEU.exon.vcfList.SPIAplot.pdf\\ 
 +that represent the tabular and graphical output of SPIA, respectively.
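Whether the run produced both files can be checked from R. This is a small sketch that assumes the R session's working directory is the unzipped SPIA folder.

```r
# Output files named on this page, relative to the SPIA folder.
expected <- file.path("Example", c(
  "1000G.CEU.exon.vcfList.SPIAtest.csv",
  "1000G.CEU.exon.vcfList.SPIAplot.pdf"
))

# file.exists() returns one logical value per path.
found <- file.exists(expected)
names(found) <- basename(expected)
print(found)
```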
 +
  
 ==== DOWNLOADS ====
Line 15: Line 118:
   * [[http://cran.r-project.org/web/packages/SPIAssay/SPIAssay.pdf|Reference Manual ]]
   * {{:public:nar_2008-demichelis-2446-56-1.pdf|Original Manuscript, NAR 2008}}
 +  * {{:SPIA_v1.1.zip|SPIA}} 
 +