GMAP Microarray

Web supplement to
"A comprehensive platform for highly multiplexed mammalian functional genetic screens"

Troy Ketela, Lawrence E Heisler, Kevin R Brown, Ron Ammar, Dahlia Kasimer, Anuradha Surendra, Elke Ericsson, Kim Blakely, Dina Karamboulas, Andrew W Smith, Tanja Durbic, Anthony Arnoldo, Kahlin Cheung-Ong, Judice LY Koh, Shuba Gopal, Glenn S Cowley, Xiaoping Yang, Jennifer K Grenier, Guri Giaever, David E Root, Jason Moffat and Corey Nislow

BMC Genomics

GMAP Extraction and Analysis

Imaging the GMAP chip through AGCC generates one CEL file per chip. This binary file contains feature intensities organized by position on the chip. The notes here discuss how to extract the appropriate information from the chip and perform basic analysis of these intensities. In addition, we have created a number of utilities to assist in these operations.

Data can be extracted from the CEL files generated from the GMAP chip with standard Affymetrix tools, using the library files posted on this site. A number of approaches are available:

1. Use of the standard Affymetrix Expression Console software to extract feature intensities and summarize probeset signal. This is available as a free software download directly from Affymetrix (www.affymetrix.com). [Requires registration and login]

2. The Affymetrix Power Tools (APT) are a set of command-line executables provided by Affymetrix, available on a variety of platforms, that allow extraction and manipulation of data stored in the CEL files. A number of accessory files are needed, including

The APT collection is available through affymetrix. Below are some sample operations that can be used to extract information from these arrays.

3. We have developed a series of R scripts for analyzing the data on the GMAP chip. Starting from a single file containing probe signal intensities for a subset of the GMAP probes, the scripts can apply normalization procedures, generate quality control plots and summarize replicate probe intensities. TODO: package these into a distributable R library.

4. We have developed a Java application that extracts data from the CEL files using several resources as a backend. These include the Affymetrix Power Tools, R and Bioconductor libraries, and the R scripts we have created for analysis. Our own R code is embedded in the jar file. Other components must be downloaded and installed separately, then identified in the Java application.
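
The replicate summarization step mentioned in item 3 can be illustrated with a short sketch. This is not the authors' R code; it is a minimal Python illustration that assumes the extracted data are available as (probe identifier, intensity) pairs, that replicate features share a probe identifier, and that a log2 median is an acceptable summary. The probe names, the function names and the median-centering normalization are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict
from statistics import median


def summarize_replicates(features):
    """Collapse replicate features to one log2 median value per probe.

    `features` is an iterable of (probe_id, intensity) pairs; replicate
    features are assumed to share the same probe_id. Non-positive
    intensities are skipped because log2 is undefined for them.
    """
    by_probe = defaultdict(list)
    for probe_id, intensity in features:
        if intensity > 0:
            by_probe[probe_id].append(math.log2(intensity))
    return {pid: median(vals) for pid, vals in by_probe.items()}


def median_center(summary):
    """Shift log2 values so the array-wide median is zero.

    This is an assumed, deliberately simple normalization, not
    necessarily the procedure used in the published scripts.
    """
    centre = median(summary.values())
    return {pid: value - centre for pid, value in summary.items()}


# Hypothetical input: two hairpin probes, each replicated 3X on the array.
example = [
    ("hp_A", 256.0), ("hp_A", 512.0), ("hp_A", 1024.0),
    ("hp_B", 64.0), ("hp_B", 64.0), ("hp_B", 128.0),
]
summary = summarize_replicates(example)
centered = median_center(summary)
print(summary)
print(centered)
```

The median is used here because it tolerates a single outlying replicate among three; a mean or a trimmed mean would be equally easy to substitute.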

Library Files

The library files are used to extract the intensities from the CEL files, to identify the probe at each feature location (and the subset of probes to which it belongs), and to indicate the probeset to which it belongs so that probeset summarization methods can be applied. There are two types of library: the cdf file, or the pgf/clf combination. Additional information on the probes can be found in the annotation files.

Individual Probe files (generated from the pgf file using apt-pgf-extract)

 

hairpin features :

  • hairpin probes (22mers, replicated 3X on the array)
  • hairpin controls (22mers, replicated 33X on the array)
  • hairpin spike-in controls (22mers, replicated 25X on the array)
  • all 22mer hairpin features, in a single file
  • the TMM hairpin set (17-25mer variants of the hairpin probes)

barcode features

  • barcodes (20mers, replicated 3X on the array)

human ORF features

  • newly created probes, organized into probesets (25mers)
  • probes derived from the hugene 1.0 array, organized into probesets (25mers)
  • all horf probes (25mers)

yeast ORF features

  • organized as probesets, 2 probes per ORF (25mers)

background features : These are collections of GC-binned probes that can be used to assess background on the array, to remove background signal from size-matched probes, and to calculate DABG statistics for size-matched probes. There are two distinct sets
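
As an illustration of how GC-binned background probes might be used, the sketch below groups background intensities by probe length and GC count, subtracts the bin median from a size-matched probe, and reports the fraction of background probes in that bin that are at least as bright. That fraction is a simplified empirical p-value in the spirit of DABG; it is not Affymetrix's implementation, and the sequences, intensities and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GC" for base in seq.upper())


def build_background_bins(background):
    """Group background intensities by (probe length, GC count).

    `background` is an iterable of (sequence, intensity) pairs.
    """
    bins = defaultdict(list)
    for seq, intensity in background:
        bins[(len(seq), gc_count(seq))].append(intensity)
    return bins


def correct_and_detect(seq, intensity, bins):
    """Return (background-corrected intensity, empirical p-value).

    The background estimate is the median of the size- and GC-matched
    bin. The p-value is the fraction of background probes in that bin
    whose intensity is at least as high as the probe's. Returns None
    when no matching bin exists.
    """
    matched = bins.get((len(seq), gc_count(seq)))
    if not matched:
        return None
    corrected = intensity - median(matched)
    p_value = sum(b >= intensity for b in matched) / len(matched)
    return corrected, p_value


# Hypothetical background probes (8mers for brevity; real probes are longer).
background = [
    ("ACGTACGT", 100.0), ("TGCATGCA", 120.0), ("ACGTTGCA", 140.0),
    ("GGCCGGCC", 300.0),
]
bins = build_background_bins(background)
print(correct_and_detect("CATGCATG", 500.0, bins))  # clearly above background
print(correct_and_detect("GTACGTAC", 110.0, bins))  # within the background range
```

Matching on both length and GC count reflects the "size-matched" and "GC-binned" wording above. With very small bins the empirical p-value is coarse, so in practice a bin would need many background probes to be informative.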

 

 

 
Subset Annotations

hairpins : maps probe identifiers to the gene against which each hairpin was designed, along with design considerations

orfeome : maps probes to the gene against which each probe was designed, with information about the gene

 

 

 
R Scripts

These can be sourced into R to provide access to functions for analysis of the data.

Java Application

This Java application (Windows-based) can be used, with the proper backend support, to

  1. extract data from cel files and store in an annotated rawsummary file
  2. do downstream analysis on a previously generated rawsummary file
 
     

 

 

Multiplexed Barcode Sequencing

Inquiries can be addressed to guri.giaever@utoronto.ca or corey.nislow@utoronto.ca