GMAP Extraction and Analysis
Scanning the GMAP chip with AGCC (Affymetrix GeneChip Command Console) generates one CEL file per chip. This binary file contains feature intensities organized by position on the chip. These notes describe how to extract the relevant information from the chip and how to perform basic analysis of these intensities. We have also created a number of utilities to assist with these operations.
Data can be extracted from GMAP CEL files with standard Affymetrix tools, using the library files posted on this site. Several approaches are available:
1. Use the standard Affymetrix Expression Console software to extract feature intensities and summarize probeset signal. It is available as a free download directly from Affymetrix (www.affymetrix.com) [requires registration and login].
- Required library file (CDF): GMAP-Uts520601.cdf
- Standard analysis protocols center on probeset summarization algorithms. Features designated as part of a probeset (the horfeome, yorfeome and hugene probesets) are summarized using the algorithm selected in the software. Other features are not assigned to probesets; for these, individual feature intensities are extracted. Expression Console allows the analysis protocols to be modified for alternative analyses, but we tend to use the Affymetrix Power Tools collection for this kind of work.
2. The Affymetrix Power Tools (APT) are a set of command-line executables provided by Affymetrix for a variety of platforms, used to extract and manipulate the data stored in CEL files. Several accessory files are needed, including:
- either the CDF file or the PGF/CLF files
- probe files, to extract subsets of probes from the chip
- background (bg) probe files, if background correction is required
The APT collection is available from Affymetrix. Below are some sample operations that can be used to extract information from these arrays:
- convert cel files to text (human-readable) format
- extract all feature intensities from a set of chips into a single file
- extract feature intensities from a set of chips for all the 22mer hairpin probes
- extract feature intensities from a set of chips for the hugene probes, applying a background correction based on the 25mer GC-bin probes
3. We have developed a series of R scripts for analysis of GMAP chip data. Starting from a single file containing probe signal intensities for a subset of the GMAP probes, the scripts can be used to apply normalization procedures, generate quality-control plots and summarize replicate probe intensities. TODO: package these into a distributable R library.
4. We have developed a Java application that extracts data from the CEL files using several backend resources: the Affymetrix Power Tools, R and Bioconductor libraries, and the R scripts we have written for analysis. Our own R code is embedded in the jar file. The other components must be downloaded and installed separately, and their locations then specified in the Java application.
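The R scripts in item 3 are not reproduced here, but the two core steps they perform, normalization and replicate summarization, can be illustrated. The following minimal Python sketch assumes quantile normalization across chips (a common choice; the actual scripts may normalize differently) and a median across replicate probes. The in-memory layout (one list of intensities per chip, in a shared probe order) and the `replicate_groups` mapping are hypothetical.

```python
from statistics import median


def quantile_normalize(columns):
    """Quantile-normalize equal-length intensity columns (one list per chip).

    Each value is replaced by the mean, across chips, of the values that share
    its rank. Ties are broken by probe position (sorted() is stable), which is
    a simplification of full tie handling.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all chips must have the same number of probes")
    # Sort each chip, then average the chips rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Probe indices in ascending order of intensity on this chip.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def summarize_replicates(probe_ids, column, replicate_groups):
    """Median intensity of each group of replicate probes on one chip.

    replicate_groups is a hypothetical mapping: group name -> list of probe IDs.
    """
    value = dict(zip(probe_ids, column))
    return {group: median(value[p] for p in members)
            for group, members in replicate_groups.items()}


# Toy example: three probes on two chips; p1 and p2 are replicates of target "A".
probe_ids = ["p1", "p2", "p3"]
chips = [[5.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
norm = quantile_normalize(chips)
summary = summarize_replicates(probe_ids, norm[0], {"A": ["p1", "p2"], "B": ["p3"]})
```

Quantile normalization forces every chip onto the same intensity distribution, which suits replicate chips; the median is a robust summary for replicate probes because a single outlier feature cannot move it far.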
Library Files
The library files are used to extract the intensities from the CEL files and to identify the probe at each feature location (and the subset of probes to which it belongs), as well as the probeset to which it belongs for probeset summarization. There are two types of library: the CDF file, or the PGF/CLF combination. Additional information on probes can be found in the annotation files.
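To illustrate how a PGF file records probeset membership, here is a minimal Python sketch that builds a probe-to-probeset lookup. It assumes the standard PGF layout, in which probeset, atom and probe records are distinguished by zero, one and two leading tabs, and it reads only the first (ID) column at each level. The toy file contents are made up for the example and are not taken from the GMAP library.

```python
def parse_pgf(lines):
    """Map probe_id -> probeset_id from the lines of a PGF file."""
    probe_to_probeset = {}
    current_probeset = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, "#%" header lines and comments
        depth = len(line) - len(line.lstrip("\t"))  # number of leading tabs
        first_field = line.lstrip("\t").split("\t")[0]
        if depth == 0:
            current_probeset = first_field  # probeset record
        elif depth == 2:
            if current_probeset is None:
                raise ValueError("probe record found before any probeset record")
            probe_to_probeset[first_field] = current_probeset  # probe record
        # depth == 1 is an atom record; atoms are not needed for this lookup
    return probe_to_probeset


# Toy PGF content (hypothetical IDs).
pgf_text = (
    "#%chip_type=toy\n"
    "#%header0=probeset_id\ttype\n"
    "#%header1=\tatom_id\n"
    "#%header2=\t\tprobe_id\ttype\n"
    "1001\tmain\n"
    "\t1\n"
    "\t\t501\tpm:st\n"
    "\t\t502\tpm:st\n"
    "1002\tmain\n"
    "\t2\n"
    "\t\t601\tpm:st\n"
)
mapping = parse_pgf(pgf_text.splitlines())
```

For a real file, the same function can be called on an open file handle, since iterating over it yields lines.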
Individual Probe Files (generated from the PGF file using apt-pgf-extract)
- hairpin features
- barcode features
- human ORF features
- yeast ORF features
- background features: a collection of GC-binned probes that can be used to assess background on the array, to remove background signal from size-matched probes, and to calculate DABG (detection above background) statistics for size-matched probes. There are two distinct sets.
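The background features can be used roughly as follows. This Python sketch groups background probes by GC count, subtracts the median of the matching GC bin from a size-matched probe, and computes a per-probe empirical p-value: the fraction of GC-matched background probes at least as bright as the probe. The median subtraction and the `floor` value are illustrative choices, not necessarily what APT does; the DABG method in APT also combines the per-probe p-values across each probeset, which is not shown. All intensities are toy values.

```python
from bisect import bisect_left
from statistics import median


def bin_background(bg_probes):
    """Group (gc_count, intensity) pairs into {gc_count: sorted intensities}."""
    bins = {}
    for gc, intensity in bg_probes:
        bins.setdefault(gc, []).append(intensity)
    return {gc: sorted(vals) for gc, vals in bins.items()}


def gc_background_correct(intensity, gc, bins, floor=1.0):
    """Subtract the median background of the probe's GC bin, floored at `floor`.

    Raises KeyError if no background probe shares the GC count; a real
    pipeline would need a fallback (e.g. a neighboring bin).
    """
    return max(intensity - median(bins[gc]), floor)


def dabg_like_pvalue(intensity, gc, bins):
    """Fraction of GC-matched background probes with intensity >= the probe's."""
    bg = bins[gc]  # sorted ascending, so bisect finds the first value >= intensity
    n_at_least = len(bg) - bisect_left(bg, intensity)
    return n_at_least / len(bg)


# Toy background set: (GC count, intensity) pairs.
bg = [(10, 50.0), (10, 60.0), (10, 70.0), (10, 80.0), (12, 90.0), (12, 110.0)]
bins = bin_background(bg)
corrected = gc_background_correct(75.0, 10, bins)  # 75 - median(50, 60, 70, 80)
pvalue = dabg_like_pvalue(75.0, 10, bins)          # only 80 is >= 75
```

Matching on GC count matters because probe brightness from nonspecific binding rises with GC content, so a background estimate drawn from probes of different composition would be biased.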
Subset Annotations
- hairpins: maps probe identifiers to the gene against which each hairpin was designed, along with design considerations
- orfeome: maps probes to the gene against which each probe was designed, with information about the gene
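Combining an individual probe file with a subset annotation can be sketched in Python as below: keep only the rows of an intensity table whose probe is in the probe list, then attach the annotated gene. The tab-delimited layouts (a `probe_id` header column in the probe list and intensity table, and `probe_id`/`gene` columns in the annotation) are assumptions for illustration; check the headers of the actual files before adapting it.

```python
import csv
import io


def read_probe_ids(handle):
    """Read a probe list with a 'probe_id' header column (assumed layout)."""
    return {row["probe_id"] for row in csv.DictReader(handle, delimiter="\t")}


def read_annotation(handle, key="probe_id", value="gene"):
    """Map probe ID -> gene from a tab-delimited annotation (hypothetical columns)."""
    return {row[key]: row[value] for row in csv.DictReader(handle, delimiter="\t")}


def subset_and_annotate(intensity_handle, probe_ids, annotation):
    """Keep rows whose probe_id is in probe_ids and add a 'gene' field.

    Intensities are left as strings, exactly as read from the file.
    """
    rows = []
    for row in csv.DictReader(intensity_handle, delimiter="\t"):
        if row["probe_id"] in probe_ids:
            row["gene"] = annotation.get(row["probe_id"], "NA")
            rows.append(row)
    return rows


# Toy inputs standing in for an intensity table, a probe list and an annotation.
intensities = io.StringIO("probe_id\tchip1\tchip2\n11\t120\t130\n12\t80\t95\n13\t400\t410\n")
hairpin_list = io.StringIO("probe_id\n11\n13\n")
hairpin_annot = io.StringIO("probe_id\tgene\n11\tTP53\n")
subset = subset_and_annotate(intensities, read_probe_ids(hairpin_list),
                             read_annotation(hairpin_annot))
```

Probes missing from the annotation are kept and marked "NA" rather than dropped, so the subset always matches the probe list.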
R Scripts
These can be sourced into R to provide functions for analysis of the data.
Java Application
This Java application (Windows-based) can be used with the proper backend support to