EthSEQ version 1.0


BASIC USAGE

EthSEQ is an R script that infers the ethnicity of a set of samples with available whole exome sequencing (WES) data from differential SNP genotype profiles. It combines:

  1. HapMap data, used to generate reference models for specific WES platforms;
  2. ASEQ, used to genotype the input samples, and
  3. EIGENSTRAT, used to perform principal component analysis on the aggregated genotyped data.
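To illustrate the third step, the following is a minimal sketch of principal component analysis on a genotype matrix. It uses base R's prcomp on simulated genotypes as a stand-in for EIGENSTRAT, so it does not reproduce what EthSEQ actually runs. The sample sizes, allele frequencies and variable names are assumptions made for illustration only.

```r
# Toy illustration only: simulated genotypes, base R prcomp instead of EIGENSTRAT.
set.seed(1)
n.snps <- 50
n.per.group <- 10

# Genotypes coded as the number of alternative alleles (0, 1 or 2).
# The two hypothetical populations differ in allele frequency.
pop.a <- matrix(rbinom(n.per.group * n.snps, size = 2, prob = 0.1), nrow = n.per.group)
pop.b <- matrix(rbinom(n.per.group * n.snps, size = 2, prob = 0.9), nrow = n.per.group)
genotypes <- rbind(pop.a, pop.b)
rownames(genotypes) <- c(paste0("A", 1:n.per.group), paste0("B", 1:n.per.group))

# Centre each SNP and project the samples onto the principal components.
pca <- prcomp(genotypes, center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Mean position of each simulated population along the first component.
print(round(tapply(pc1, rep(c("A", "B"), each = n.per.group), mean), 2))
```

In this toy setting the two simulated populations fall on opposite sides of the first component, which is the kind of separation a reference model relies on when new samples are projected next to reference samples.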


The EthSEQ script has the following syntax:

EthSEQ.R <ConfigurationFile.R>

Folders are organized as follows:

EthSEQ
 -> EthSEQ.R 
 -> Functions.R 
 -> Models
 -> VCF
 -> Include
 -> Example

EthSEQ.R is the R script that infers ethnicity; Functions.R contains the utility functions it uses.
The Models folder contains HapMap models for specific WES platforms (Haloplex, SureSelect version 2 and SureSelect version 4 are currently available).
The VCF folder contains the lists of SNPs (in VCF format) used to generate the HapMap models for the different WES platforms.
The Include folder contains the ASEQ binaries and the EIGENSTRAT software folder with pre-compiled binaries.
The Example folder contains an example configuration file.
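
Before a run, it can help to confirm that an installation matches the layout above. The following is a minimal sketch: missing_ethseq_entries is a hypothetical helper that is not part of EthSEQ, and the temporary demo folder exists only to show the output.

```r
# Hypothetical helper, not part of EthSEQ: report which expected entries are
# missing from an EthSEQ installation folder.
missing_ethseq_entries <- function(source.dir) {
  expected <- c("EthSEQ.R", "Functions.R", "Models", "VCF", "Include", "Example")
  present <- file.exists(file.path(source.dir, expected))
  expected[!present]
}

# Example on a temporary folder that contains only part of the layout.
demo.dir <- file.path(tempdir(), "EthSEQ_demo")
dir.create(file.path(demo.dir, "Models"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo.dir, "EthSEQ.R"))
print(missing_ethseq_entries(demo.dir))
```

An empty result means every expected file and folder was found.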

The configuration file has the following structure:

##########################################################################
## Basic folders
source.dir = "EthSEQ path"
bam.list = "path to a text file containing the list of BAM files to be analyzed"
out.dir = "path to the output folder"
eigenstrat.path = "path to EIGENSTRAT binaries folder"

## Models available
## SS2 = SureSelect version 2
## SS4 = SureSelect version 4
## HALO = Haloplex
model = "HALO"

## ASEQ parameters
ASEQ.path = "path to ASEQ binary"
mbq=20 # minimum base quality
mrq=20 # minimum read quality
mdc=20 # minimum depth of coverage
cores=10 # number of cores to be used

## output details
verbose=FALSE
##########################################################################
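
Because the configuration file is plain R code, it can be loaded and checked before starting a run. The following is a minimal sketch: check_ethseq_config is a hypothetical helper that is not part of EthSEQ, the demo paths are placeholders, and the one-path-per-line layout of the BAM list file is an assumption.

```r
# Hypothetical helper, not part of EthSEQ: load a configuration file into its
# own environment and return a character vector describing any problems found.
check_ethseq_config <- function(config.file) {
  cfg <- new.env()
  source(config.file, local = cfg)

  required <- c("source.dir", "bam.list", "out.dir", "eigenstrat.path",
                "model", "ASEQ.path", "mbq", "mrq", "mdc", "cores", "verbose")
  found <- vapply(required, exists, logical(1), envir = cfg, inherits = FALSE)
  if (!all(found)) {
    return(paste("missing setting:", required[!found]))
  }

  problems <- character(0)
  if (!cfg$model %in% c("SS2", "SS4", "HALO")) {
    problems <- c(problems, "model must be one of SS2, SS4, HALO")
  }
  for (name in c("mbq", "mrq", "mdc", "cores")) {
    value <- get(name, envir = cfg)
    if (!is.numeric(value) || length(value) != 1 || value < 0) {
      problems <- c(problems, paste(name, "must be a single non-negative number"))
    }
  }
  if (!file.exists(cfg$bam.list)) {
    problems <- c(problems, "bam.list file not found")
  }
  problems
}

# Demo with placeholder paths; one BAM path per line is an assumption.
bam.list.file <- tempfile(fileext = ".txt")
writeLines(c("/data/sample1.bam", "/data/sample2.bam"), bam.list.file)

config.file <- tempfile(fileext = ".R")
writeLines(c(
  'source.dir = "/opt/EthSEQ"',
  paste0("bam.list = ", deparse(bam.list.file)),
  'out.dir = "/tmp/EthSEQ_out"',
  'eigenstrat.path = "/opt/eigenstrat/bin"',
  'model = "HALO"',
  'ASEQ.path = "/opt/aseq/ASEQ"',
  "mbq = 20", "mrq = 20", "mdc = 20", "cores = 10", "verbose = FALSE"
), config.file)

print(check_ethseq_config(config.file))
```

An empty character vector means none of the checks failed. The sketch does not verify that the ASEQ or EIGENSTRAT binaries exist or work.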


REQUIREMENTS

EthSEQ requires Linux kernel >= 2.6.15.
EthSEQ requires R >= 2.7 and the package SDMTools.
EthSEQ requires global (absolute) folder paths.
EthSEQ requires ASEQ (version 1.1.8, available in the Tools folder and also here).
EthSEQ requires EIGENSTRAT (version 5.0.2, available in the Tools folder and also here).


Code by Alessandro Romanel
Laboratory of Computational Oncology (F. Demichelis)
Centre for Integrative Biology, University of Trento, Italy
email contacts: romanel@science.unitn.it; demichelis@science.unitn.it

EthSEQ is distributed under the MIT Licence.


DOWNLOADS