Differences

This shows you the differences between two versions of the page.

public:ethseq [2016/09/22 17:10]
alessandro.romanel@unitn.it
public:ethseq [2017/04/10 09:37] (current)
alessandro.romanel@unitn.it
Line 1: Line 1:
  
 <html> <html>
- <span style="color:gray;font-size:200%;">EthSEQ: Ethnicity inference from whole-exome sequencing data</span>+ <span style="color:gray;font-size:200%;">EthSEQ: ethnicity annotations from whole exome sequencing data</span>
 </html> </html>
  
 ---- ----
  
-=== BASIC INFO === +=== DESCRIPTION ===
-EthSEQ is an R script that infers the ethnicity of a set of samples with whole exome sequencing (WES) data from differential SNP genotype profiles. It combines: +
-  - [[http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/|1,000 Genomes Project]] genotype data, used to generate reference models for specific WES platforms; +
-  - [[public:aseq|ASEQ]], used to genotype the input samples with unknown ethnicity, and +
-  - [[http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm|EIGENSTRAT]], used to perform principal component analysis on the aggregated genotyped data. \\+
  
-The package is organized as follows:\\ +Whole exome sequencing (WES) is widely used both in translational cancer genomics studies and in precision medicine. Stratifying individuals by ethnicity is fundamental for correctly interpreting the impact of personal genomic variation. We implemented EthSEQ to provide reliable and rapid ethnicity annotation from an individual's whole exome sequencing data. EthSEQ can be integrated into any WES-based processing pipeline and exploits multi-core capabilities.
-  EthSEQ +
-   -> EthSEQ.R  +
-   -> Functions.R  +
-   -> CreateReferenceModel.+
-   -> MultiStepRefinementAnalysis.+
-   -> Models +
-   -> VCF +
-   -> Include +
-   -> Example+
  
-  * EthSEQ.R is the R script required to infer ethnicity while Functions.R contains utility functions.\\ +EthSEQ requires genotype data at SNP positions for a set of individuals with known ethnicity (the reference model) and either a list of BAM files or genotype data (in VCF format) for individuals with unknown ethnicity. EthSEQ annotates the ethnicity of each individual using an automated procedure and returns detailed information about each individual's inferred ethnicity, including aggregated visual reports.
-  * CreateReferenceModel.R is the R script required to generate a new reference model given genotype data of ethnic groups and a set of specified captured regions.\\ +
-  * MultiStepRefinementAnalysis.R is the R script that implements the multi-step refinement analysis.\\ +
-  * Models folder contains reference models for a set of WES platforms (Agilent Haloplex, Roche Nimblegen version 3, Agilent SureSelect version 2 and version 4 are currently available); models are built from 1,000 Genomes Project genotype data. **Models folder is empty; models should be downloaded separately from the bottom links and uncompressed in the Models folder.**\\ +
-  * The VCF folder contains the **lists of SNPs** (in VCF format) used to generate the reference models for the available WES platforms.\\ +
-  * The Include folder contains ASEQ binaries and EIGENSTRAT software folder with pre-compiled binaries available.\\ +
-  * The Example folder contains an example configuration file, an example BAM input list of individuals with unknown ethnicity, and an example output report.+
  
-----+=== INSTALLATION ===
  
-=== INFER THE ETHNICITY OF A SET OF INDIVIDUALS ===+You can install EthSEQ v2 either from the GitHub repository using the devtools package or directly from the CRAN repository.
  
-To infer the ethnicity of a set of individuals run the EthSEQ.R script in the following way: \\ +[[https://github.com/aromanel/EthSEQ|EthSEQ on github]]\\ 
-  Rscript EthSEQ.R <ConfigurationFile.R>+[[https://cran.r-project.org/web/packages/EthSEQ/index.html|EthSEQ on CRAN]]
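
The two installation routes could look like the following in an R session. This is a minimal sketch, not taken from the EthSEQ documentation: it assumes the CRAN package name is `EthSEQ` and the GitHub repository path is `aromanel/EthSEQ`, as the links above suggest, and that network access is available. Use one option or the other.

```r
# Option 1: install the released version from CRAN
# (package name assumed from the CRAN link above).
install.packages("EthSEQ")

# Option 2: install from GitHub through the devtools package
# (repository path assumed from the GitHub link above).
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("aromanel/EthSEQ")

# Load the package to check that the installation succeeded.
library(EthSEQ)
```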
  
-The configuration file has the following structure:\\ \\ +=== REFERENCE === 
-##########################################################################\\ +
-# Basic folders\\ +
-source.dir = "EthSEQ path"\\ +
-bam.list = "path to a text file containing the list of BAM files to be analyzed"\\ +
-out.dir = "path to the output folder"\\ +
-eigenstrat.path = "path to EIGENSTRAT binaries folder"\\ +
-\\ +
-# Models available \\ +
-# SS2 = Agilent Sure Select version 2\\ +
-# SS4 = Agilent Sure Select version 4\\ +
-# HALO = Agilent Haloplex\\ +
-# NimblegenV3 = Roche Nimblegen V3\\ +
-model = "HALO"\\ +
-\\ +
-# To run the analysis with your own reference model uncomment the following line and specify the needed variables\\ +
-# model = "" # keep this empty\\ +
-# vcf.file = "path to VCF file"\\ +
-# sif.file = "path to file with ethnicity annotations"\\ +
-# model.ped = "path to PED file with genotype information"\\ +
-# model.map = "path to MAP file with data of variant specified in the PED file" +
-\\  +
-\\ +
-# ASEQ parameters\\ +
-ASEQ.path = "path to ASEQ binary"\\ +
-mbq=20 # minimum base quality\\ +
-mrq=20 # minimum read quality\\ +
-mdc=20 # minimum depth of coverage\\ +
-cores=10 # number of cores to be used\\ +
-\\ +
-# analysis options\\  +
-run.genotype=TRUE\\ +
-reduce.composite.model = TRUE\\ +
-\\ +
-# output details\\ +
-verbose=FALSE\\ +
-##########################################################################+
  
-----+Alessandro Romanel, Tuo Zhang, Olivier Elemento, Francesca Demichelis.** EthSEQ: ethnicity annotation from whole exome sequencing data**. //Bioinformatics// 2017 btx165.
  
-=== MULTI-STEP REFINEMENT METHOD  ===+=== OLD VERSIONS ===
  
-To infer the ethnicity of a set of individuals using the multi-step refinement method run the MultiStepRefinementModel.R script in the following way\\ +[[https://demichelislab.unitn.it/doku.php?id=public:ethseq_v1|EthSEQ 1.0]]
-  Rscript MultiStepRefinementModel.R <ConfigurationFile.R> +
- +
-The configuration file extends the previous specification by adding the ethnic group sets specifications:\\ +
-##########################################################################\\ +
-# Basic folders\\ +
-subsets = list(c("AFR","SAS","EAS","EUR","ASH"),c("EUR","ASH"))\\ +
-########################################################################## +
- +
----- +
- +
-=== CREATE A REFERENCE MODEL  === +
- +
-To create a reference model run the CreateReferenceModel.R script in the following way: \\ +
-  Rscript CreateReferenceModel.R +
- +
-by specifying in the code the following variables:\\ +
-\\ +
-## Parameters +
-sif.file = "specify_path_to_file"\\ +
-vcf.file = "specify_path_to_file"\\ +
-phased = FALSE # TRUE if genotypes in VCF format are phased\\ +
-out.dir = "specify_path_to_dir"\\ +
-model.name = "specify_model_name"\\ +
-call.rate = 1 # fraction of samples with genotype calls for a specific SNP\\ +
-\\ +
-Sample information file should have the following format:\\ +
-//Sample\tRace\tGender\\ +
-S1\tEUR\tmale\\ +
-S2\tAFR\tfemale\\ +
-...\\ +
-The VCF file should specify the MAF information in the INFO column (e.g. MAF=0.32) for all variants.\\ +
-Genotypes should be specified using phased notation (e.g. 0|0,1|0,0|1,1|1,...) or unphased notation (e.g. 0/0,0/1,1/1,...).\\ +
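
The input formats described above can be checked before building a model. The sketch below is illustrative only and is not part of EthSEQ: `read_sif` and `genotype_notation` are hypothetical helper names. It assumes a tab-separated sample information file with the header `Sample`, `Race`, `Gender`, and biallelic genotype strings in either phased or unphased notation.

```r
# Hypothetical helper (not part of EthSEQ): read a tab-separated sample
# information file and check that the expected columns are present.
read_sif <- function(path) {
  sif <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)
  expected <- c("Sample", "Race", "Gender")
  missing_cols <- setdiff(expected, colnames(sif))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(sif$Sample) > 0) {
    stop("Duplicated sample identifiers")
  }
  sif
}

# Hypothetical helper: label genotype strings as "phased" (e.g. 0|1) or
# "unphased" (e.g. 0/1); any other string is returned as NA.
genotype_notation <- function(gt) {
  ifelse(grepl("^[01]\\|[01]$", gt), "phased",
         ifelse(grepl("^[01]/[01]$", gt), "unphased", NA_character_))
}

# Usage on a small file written to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sample\tRace\tGender",
             "S1\tEUR\tmale",
             "S2\tAFR\tfemale"), tmp)
sif <- read_sif(tmp)
notation <- genotype_notation(c("0|1", "0/1", "./."))
```

Matching the notation found in the VCF against the `phased` parameter before running CreateReferenceModel.R is one way to catch a mismatch early.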
- +
----- +
- +
-=== REQUIREMENTS === +
-EthSEQ requires Linux kernel >= 2.6.15.\\ +
-EthSEQ requires R >= 2.7 and the package "rgeos".\\ +
-EthSEQ requires global folder names.\\ +
-EthSEQ requires ASEQ (version 1.1.11 available in Tools folder) available also [[public:aseq|here]]\\ +
-EthSEQ requires EIGENSTRAT (version 5.0.2 available in Tools folder) available also [[http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm|here]]\\ +
----- +
-=== COPYRIGHT  === +
-Code by Alessandro Romanel\\ +
-Laboratory of Computational Oncology (F. Demichelis)\\ +
-Centre for Integrative Biology, University of Trento, Italy\\ +
-email contacts: romanel@science.unitn.it; demichelis@science.unitn.it\\ +
- +
-EthSEQ is distributed under the MIT Licence. +
- +
----- +
-=== DOWNLOADS === +
-== Tool versions == +
-  * {{:EthSEQ_v1.0.zip|EthSEQ_v1.0.zip}} +
-== Reference Models == +
-  * {{:1000GP_HALO.zip|1,000 Genomes Project reference model for Agilent HaloPlex WES design}} +
-  * {{:1000GP_SS2.zip|1,000 Genomes Project reference model for Agilent SureSelectV2 WES design}} +
-  * {{:1000GP_SS4.zip|1,000 Genomes Project reference model for Agilent SureSelectV4 WES design}} +
-  * {{:1000GP_NimblegenV3.zip|1,000 Genomes Project reference model for Roche Nimblegen V3 WES design}}+