public:ethseq_v1 [2017/04/08 21:38] (current; previous revision [2017/04/08 21:35]), alessandro.romanel@unitn.it
  
<html>
<span style="color:gray;font-size:200%;">EthSEQ: ethnicity annotation from whole-exome sequencing data</span>
</html>
  
----

=== BASIC INFO ===
EthSEQ is an R script that infers the ethnicity of a set of samples with available whole-exome sequencing (WES) data from their differential SNP genotype profiles. It combines:
  - [[http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/|1,000 Genomes Project]] genotype data, used to generate reference models for specific WES platforms;
  - [[public:aseq|ASEQ]], used to genotype the input samples of unknown ethnicity, and
  - [[http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm|EIGENSTRAT]], used to perform principal component analysis on the aggregated genotype data. \\
  
The package is organized as follows:\\
  EthSEQ
   -> EthSEQ.R
   -> Functions.R
   -> CreateReferenceModel.R
   -> MultiStepRefinementAnalysis.R
   -> Models
   -> VCF
   -> Example
  
  * EthSEQ.R is the R script that infers ethnicity; Functions.R contains utility functions.\\
  * CreateReferenceModel.R is the R script that generates a new reference model from the genotype data of a set of ethnic groups and a specified set of captured regions.\\
  * MultiStepRefinementAnalysis.R is the R script that implements the multi-step refinement analysis.\\
  * The Models folder contains reference models for a set of WES platforms (Agilent Haloplex, Roche Nimblegen version 3, and Agilent SureSelect versions 2 and 4 are currently available); models are built from 1,000 Genomes Project genotype data. **The Models folder is shipped empty: download the models separately from the links at the bottom of this page and uncompress them into the Models folder.**\\
  * The VCF folder contains the **lists of SNPs** (in VCF format) used to generate the reference models for the available WES platforms.\\
  * The Include folder contains the ASEQ binaries and the EIGENSTRAT software folder with pre-compiled binaries.\\
  * The Example folder contains an example configuration file, an example BAM input list of individuals with unknown ethnicity, and an example output report.

----

=== INFER THE ETHNICITY OF A SET OF INDIVIDUALS ===

To infer the ethnicity of a set of individuals, run the EthSEQ.R script as follows: \\
  Rscript EthSEQ.R <ConfigurationFile.R>
The configuration file has the following structure:\\ \\
##########################################################################\\
# Basic folders\\
source.dir = "EthSEQ path"\\
bam.list = "path to a text file containing the list of BAM files to be analyzed"\\
out.dir = "path to the output folder"\\
eigenstrat.path = "path to EIGENSTRAT binaries folder"\\
\\
# Models available \\
# SS2 = Agilent Sure Select version 2\\
# SS4 = Agilent Sure Select version 4\\
# HALO = Agilent Haloplex\\
# NimblegenV3 = Roche Nimblegen V3\\
model = "HALO"\\
\\
-## ASEQ parameters\\+To run the analysis with your own reference model uncomment the following line and specify the needed variables\\ 
 +# model = "" # keep this empty\\ 
 +# vcf.file = "path to VCF file"\\ 
 +# sif.file = "path to file with ethnical annotations"\\ 
 +# model.ped = "path to PED file with genotype information"\\ 
 +# model.map = "path to MAP file with data of variant specified in the PED file" 
 +\\  
 +\\ 
 +# ASEQ parameters\\
 ASEQ.path = "path to ASEQ binary"\\ ASEQ.path = "path to ASEQ binary"\\
 mbq=20 # minimum base quality\\ mbq=20 # minimum base quality\\
cores=10 # number of cores to be used\\
\\
# analysis options\\
run.genotype=TRUE\\
reduce.composite.model = TRUE\\
composite.model.call.rate = 1\\
\\
# output details\\
verbose=FALSE\\
##########################################################################
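
Because the configuration file is plain R code, it can also be generated from an R session. The sketch below is a hypothetical helper, not part of EthSEQ: the function name write_ethseq_config, every path, and the one-path-per-line layout of the BAM list are assumptions, and the ASEQ options not listed on this page are left out.

```r
# Hypothetical helper (not part of EthSEQ): writes the BAM list and a
# configuration file with the variables described above. ASEQ options not
# listed on this page are not written.
write_ethseq_config <- function(config.file, source.dir, bam.files, out.dir,
                                eigenstrat.path, aseq.path,
                                model = "HALO", cores = 1) {
  dir.create(out.dir, recursive = TRUE, showWarnings = FALSE)
  dir.create(dirname(config.file), recursive = TRUE, showWarnings = FALSE)

  # Assumption: the BAM list is a plain text file with one BAM path per line.
  bam.list <- file.path(out.dir, "bam_list.txt")
  writeLines(bam.files, bam.list)

  # deparse() quotes and escapes each string so the file is valid R code.
  q <- function(x) deparse(x)
  lines <- c(
    "# Basic folders",
    paste("source.dir =", q(source.dir)),
    paste("bam.list =", q(bam.list)),
    paste("out.dir =", q(out.dir)),
    paste("eigenstrat.path =", q(eigenstrat.path)),
    "# Models available: SS2, SS4, HALO, NimblegenV3",
    paste("model =", q(model)),
    "# ASEQ parameters",
    paste("ASEQ.path =", q(aseq.path)),
    "mbq = 20 # minimum base quality",
    paste("cores =", as.integer(cores), "# number of cores to be used"),
    "# analysis options",
    "run.genotype = TRUE",
    "reduce.composite.model = TRUE",
    "composite.model.call.rate = 1",
    "# output details",
    "verbose = FALSE"
  )
  writeLines(lines, config.file)
  invisible(config.file)
}

# Example with placeholder paths; everything is written to a temporary folder.
tmp <- tempfile("ethseq_")
cfg <- write_ethseq_config(file.path(tmp, "config.R"),
                           source.dir = "/opt/EthSEQ",
                           bam.files = c("/data/S1.bam", "/data/S2.bam"),
                           out.dir = tmp,
                           eigenstrat.path = "/opt/EthSEQ/Include/EIG",
                           aseq.path = "/opt/EthSEQ/Include/ASEQ",
                           cores = 4)

# Sourcing the file into an environment confirms that it parses as R code.
cfg.env <- new.env()
source(cfg, local = cfg.env)
```

The resulting file can then be passed to Rscript EthSEQ.R as described above.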

----

=== MULTI-STEP REFINEMENT METHOD ===

To infer the ethnicity of a set of individuals with the multi-step refinement method, run the MultiStepRefinementAnalysis.R script as follows: \\
  Rscript MultiStepRefinementAnalysis.R <ConfigurationFile.R>

The configuration file extends the one described above with the specification of the ethnic group sets:\\
# Subsets specification\\
subsets = list(c("AFR","SAS","EAS","EUR","ASH"),c("EUR","ASH"))\\
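
A hand-written subsets list is easy to get wrong. As a hypothetical sanity check (not part of EthSEQ), the sketch below assumes that every refinement step must compare at least two distinct groups and that every label must be an ethnic annotation of the reference model in use, passed in as known.groups.

```r
# Hypothetical check for the "subsets" option (not part of EthSEQ).
# Assumptions: each step compares >= 2 distinct groups, and every label is an
# annotation available in the reference model (known.groups).
check_subsets <- function(subsets, known.groups) {
  stopifnot(is.list(subsets), length(subsets) >= 1)
  for (i in seq_along(subsets)) {
    s <- subsets[[i]]
    if (!is.character(s) || length(unique(s)) < 2)
      stop(sprintf("step %d: at least two distinct groups are needed", i))
    unknown <- setdiff(s, known.groups)
    if (length(unknown) > 0)
      stop(sprintf("step %d: unknown groups %s", i,
                   paste(unknown, collapse = ", ")))
  }
  invisible(TRUE)
}

# The example from this page: all five groups first, then EUR vs ASH.
subsets <- list(c("AFR","SAS","EAS","EUR","ASH"), c("EUR","ASH"))
check_subsets(subsets, known.groups = c("AFR","SAS","EAS","EUR","ASH"))
```

EthSEQ itself may validate this option differently, or not at all.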

----

=== CREATE A REFERENCE MODEL ===

To create a new reference model, run the CreateReferenceModel.R script as follows: \\
  Rscript CreateReferenceModel.R

after setting the following variables in the script code:\\
\\
# Parameters\\
sif.file = "specify_path_to_file"\\
vcf.file = "specify_path_to_file"\\
phased = FALSE # TRUE if the genotypes in the VCF file are phased\\
out.dir = "specify_path_to_dir"\\
model.name = "specify_model_name"\\
call.rate = 1 # fraction of samples with genotype calls for a specific SNP\\
\\
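
To make the call.rate definition concrete, the sketch below computes the per-SNP call rate on a toy genotype matrix. It assumes that missing calls are stored as NA and that SNPs are retained when their call rate is at least the threshold. It illustrates the definition only and is not EthSEQ's internal code.

```r
# Illustration of call.rate: the call rate of a SNP is the fraction of samples
# with a genotype call. Assumptions: missing calls are NA, and SNPs are kept
# when their call rate is >= the threshold.
filter_by_call_rate <- function(geno, call.rate = 1) {
  rate <- rowMeans(!is.na(geno))        # SNPs in rows, samples in columns
  geno[rate >= call.rate, , drop = FALSE]
}

# Toy genotype matrix with made-up values (0/1/2 alternative alleles).
geno <- matrix(c(0,  1, 2,
                 1, NA, 0,
                 2,  2, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("rs1", "rs2", "rs3"), c("S1", "S2", "S3")))

rowMeans(!is.na(geno))                    # rs1 = 1, rs2 = 2/3, rs3 = 1
rownames(filter_by_call_rate(geno))       # call.rate = 1 keeps "rs1" "rs3"
rownames(filter_by_call_rate(geno, 0.5))  # keeps all three SNPs
```

With the default call.rate = 1, a single missing call is enough to exclude a SNP.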
The sample information file (sif.file) should have the following tab-separated format:\\
//Sample\tRace\tGender\\
S1\tEUR\tmale\\
S2\tAFR\tfemale\\
...//\\ \\
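
A sample information file in this layout can be produced with base R. The sample IDs and labels below are made up. The sketch assumes a header row and tab separators, as in the example above.

```r
# Made-up reference samples in the Sample / Race / Gender layout shown above.
sif <- data.frame(Sample = c("S1", "S2", "S3"),
                  Race   = c("EUR", "AFR", "EAS"),
                  Gender = c("male", "female", "female"),
                  stringsAsFactors = FALSE)

# Tab-separated, header row, no quotes and no row names.
sif.file <- tempfile(fileext = ".sif")
write.table(sif, sif.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and count reference samples per ethnic group.
sif.in <- read.delim(sif.file, stringsAsFactors = FALSE)
stopifnot(identical(colnames(sif.in), c("Sample", "Race", "Gender")))
table(sif.in$Race)
```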
The VCF file should report the MAF of every variant in the INFO column (e.g. MAF=0.32).\\
Genotypes should be specified using either phased notation (e.g. 0|0, 1|0, 0|1, 1|1) or unphased notation (e.g. 0/0, 0/1, 1/1).
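
Both VCF requirements can be checked before building a model. The function below is a hypothetical sketch, not EthSEQ's validator. It assumes that GT is the first FORMAT field (as the VCF specification requires when GT is present) and treats missing calls such as ./. as invalid.

```r
# Hypothetical check (not EthSEQ's validator) of the two VCF requirements:
# - INFO (column 8) contains a MAF=<number> entry for every variant;
# - each sample genotype (first FORMAT field, columns 10 onward) uses phased
#   (0|1) or unphased (0/1) notation. Missing calls ("./.") count as invalid.
check_vcf_for_model <- function(vcf.file) {
  lines <- readLines(vcf.file)
  body <- lines[!startsWith(lines, "#")]          # drop meta and header lines
  fields <- strsplit(body, "\t", fixed = TRUE)
  has.maf <- vapply(fields, function(f)
    grepl("(^|;)MAF=[0-9.eE+-]+(;|$)", f[8]), logical(1))
  gt.ok <- vapply(fields, function(f) {
    gt <- sub(":.*$", "", f[-(1:9)])              # keep only the GT subfield
    all(grepl("^[01][|/][01]$", gt))
  }, logical(1))
  data.frame(id = vapply(fields, function(f) f[3], character(1)),
             has.maf = has.maf, gt.ok = gt.ok, stringsAsFactors = FALSE)
}

# Toy VCF with made-up variants: rs2 lacks MAF, rs3 has a missing call.
vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
  "1\t1000\trs1\tA\tG\t.\tPASS\tMAF=0.32\tGT\t0|1\t1|1",
  "1\t2000\trs2\tC\tT\t.\tPASS\tAF=0.1\tGT\t0/0\t0/1",
  "1\t3000\trs3\tG\tA\t.\tPASS\tDP=50;MAF=0.05\tGT\t./.\t0/1"
), vcf)
res <- check_vcf_for_model(vcf)
res   # rs1: TRUE/TRUE, rs2: FALSE/TRUE, rs3: TRUE/FALSE
```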
----

=== REQUIREMENTS ===
EthSEQ requires Linux kernel >= 2.6.15.\\
EthSEQ requires R >= 2.7 and the package "rgeos".\\
EthSEQ requires absolute (global) folder paths.\\
EthSEQ requires ASEQ (version 1.1.11, available in the Tools folder), also available [[public:aseq|here]].\\
EthSEQ requires EIGENSTRAT (version 5.0.2, available in the Tools folder), also available [[http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm|here]].\\
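
The kernel, R and package requirements can be checked from an R session before running EthSEQ. The function name and report format below are hypothetical. The function only reports TRUE/FALSE for each item and does not check the ASEQ or EIGENSTRAT binaries.

```r
# Hypothetical pre-flight check (not shipped with EthSEQ). It reports one
# TRUE/FALSE per requirement and installs nothing.
check_ethseq_requirements <- function(min.kernel = "2.6.15", min.r = "2.7",
                                      pkgs = "rgeos") {
  info <- Sys.info()
  kernel.ok <- FALSE
  if (!is.null(info) && info[["sysname"]] == "Linux") {
    # Keep only the leading numeric part, e.g. "5.15.0" from "5.15.0-91-generic".
    rel <- info[["release"]]
    v <- regmatches(rel, regexpr("^[0-9]+(\\.[0-9]+)+", rel))
    kernel.ok <- length(v) == 1 && numeric_version(v) >= min.kernel
  }
  c(linux.kernel.ok = kernel.ok,
    r.version.ok = getRversion() >= min.r,
    vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
}

req <- check_ethseq_requirements()
req
```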
----
=== DOWNLOADS ===
== Tool versions ==
  * {{:EthSEQ_v1_0.zip|EthSEQ_v1_0.zip}}
== Reference Models ==
  * {{:1000GP_HALO.zip|Reference model for Agilent HaloPlex WES design}}
  * {{:1000GP_SS2.zip|Reference model for Agilent SureSelectV2 WES design}}
  * {{:1000GP_SS4.zip|Reference model for Agilent SureSelectV4 WES design}}
  * {{:1000GP_NimblegenV3.zip|Reference model for Roche Nimblegen V3 WES design}}