Sequence analysis of Genome-Wide Human SNP Array 6.0 markers


Duplicated regions, important for CNV formation mediated by non-allelic homologous recombination (NAHR), are represented on common oligonucleotide platforms. We observed that such regions require special attention due to possible cross-hybridization problems. Probes that align to many locations in the human genome can be difficult to interpret, since a detectable variation in probe log2 intensity ratio may reflect copy number variation at one or several indistinguishable loci. Additionally, if a significant proportion of a probe's raw signal comes from cross-hybridization to off-target sequences, the probe's response to a copy number change at its intended target is diminished. As a quality control step prior to analyzing genomic data generated using the Genome-Wide Human SNP Array 6.0 platform, we flagged CN and SNP probes that align to multiple regions of the genome or that align to four or more locations with a single base pair mismatch (NCBI36/hg18). SNP genotype reproducibility was also tested.
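The flagging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `ProbeHits` record, its field names, and the probe identifiers are hypothetical, and "align to multiple regions" is read here as more than one perfect-match location.

```python
from dataclasses import dataclass


@dataclass
class ProbeHits:
    """Alignment summary for one probe (hypothetical record layout)."""
    probe_id: str
    perfect_hits: int        # exact-match locations, including the intended target
    one_mismatch_hits: int   # locations matching with a single base pair mismatch


def is_flagged(probe: ProbeHits, min_mismatch_hits: int = 4) -> bool:
    """Return True if the probe should be flagged for possible cross-hybridization.

    Assumed reading of the rule: flag when the probe matches more than one
    genomic location perfectly, or when it matches four or more locations
    with a single mismatch.
    """
    return probe.perfect_hits > 1 or probe.one_mismatch_hits >= min_mismatch_hits


# Made-up probes for illustration only
probes = [
    ProbeHits("probe_A", perfect_hits=1, one_mismatch_hits=0),  # unique
    ProbeHits("probe_B", perfect_hits=3, one_mismatch_hits=0),  # multiple perfect hits
    ProbeHits("probe_C", perfect_hits=1, one_mismatch_hits=4),  # many near matches
    ProbeHits("probe_D", perfect_hits=1, one_mismatch_hits=3),  # below the threshold
]

flagged_ids = [p.probe_id for p in probes if is_flagged(p)]
```

Flagged probes are kept in the list rather than deleted, so a downstream analysis can choose whether to exclude them or simply down-weight them.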

Evaluation of Marker Sensitivity

Affymetrix SNP 6.0 marker sensitivity, measured as the regression coefficient obtained from linear regression of marker intensity ratio against copy number calls, for a subset of CNVs reported in McCarroll et al. (1) across 270 HapMap samples. 202 CNVs are included, for a total of 5,955 probes/regressions. The x-axis bins represent the total number of off-target perfect and single base pair mismatch hits after alignment. Marker sensitivity decreases as the number of off-target hits increases: markers with 50 or more off-target hits exhibit, on average, 15% of the response sensitivity of probes with no off-target hits. The inset shows the linear dependence of intensity ratio on underlying copy number for 13 markers overlapping Variant_38811 (chr1:12,768,450-12,805,683). For each marker, the median intensity ratio is plotted across all copy number classes for a set of HapMap individuals. The color gradient represents the number of off-target perfect match or 1 bp mismatch hits detected by aligning each marker sequence to the whole genome.
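The sensitivity measure used in this figure, the slope of intensity ratio regressed on copy number, can be illustrated with a small Python sketch. The function name and the input values below are hypothetical toy numbers chosen for the example, not data from the study, and the plain least-squares fit is an assumption about how the regression was set up.

```python
def regression_slope(copy_numbers, intensity_ratios):
    """Ordinary least-squares slope of intensity ratio against copy number."""
    n = len(copy_numbers)
    if n != len(intensity_ratios) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(copy_numbers) / n
    mean_y = sum(intensity_ratios) / n
    # Sum of squared deviations in copy number
    sxx = sum((x - mean_x) ** 2 for x in copy_numbers)
    if sxx == 0:
        raise ValueError("copy number must vary across samples")
    # Sum of cross-products between copy number and intensity ratio
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(copy_numbers, intensity_ratios))
    return sxy / sxx


# Toy values for illustration only (not measurements from the paper)
copy_numbers = [1, 2, 3, 4]
unique_marker = [0.5, 1.0, 1.5, 2.0]          # ratio tracks copy number closely
cross_hyb_marker = [0.925, 1.0, 1.075, 1.15]  # response flattened by off-target signal

unique_slope = regression_slope(copy_numbers, unique_marker)
cross_hyb_slope = regression_slope(copy_numbers, cross_hyb_marker)

# Sensitivity of the cross-hybridizing marker relative to the unique one
relative_sensitivity = cross_hyb_slope / unique_slope
```

Dividing one slope by the other gives a relative sensitivity, which is how a statement such as "15% of the response sensitivity of probes with no off-target hits" can be read.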



Oldridge DA et al. Optimizing Copy Number Variation Analysis using Genome-Wide Short Sequence Oligonucleotide Arrays. Nucleic Acids Res. 2010 Jun;38(10):3275-86.