BarSeq Analysis

Web supplement to
"BarSeq: Identification and Counting of Short Molecular Barcodes in HT-Seq data"

The BarSeq analysis involves the following steps:

  1. Filtering of reads based on basecall quality scores
  2. Clustering of reads around sequences that, within the dataset, predominate over their single-base-mismatch variants
  3. Counting of reads associated with each cluster
  4. Matching of cluster core sequences to the barcodes expected in the sample. The barcode sequences must be provided in the same orientation as the sequenced reads.
  5. Identification of unmatched clusters by sequence alignment to barcodes expected in the sample
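
The five steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the BarSeq implementation: the function names, the Phred+33 quality encoding, the per-base quality threshold of 20, the greedy clustering rule and the use of Levenshtein edit distance (with a cutoff of 2) in place of a full sequence alignment are all assumptions. Barcodes are compared to cluster cores exactly as supplied, which is why they must be given in the read orientation.

```python
from collections import Counter

# Hypothetical thresholds: the supplement does not state the values BarSeq uses.
MIN_BASE_QUALITY = 20   # minimum Phred score for every base (assumes Phred+33 FASTQ)
MAX_EDIT_DISTANCE = 2   # largest edit distance accepted when rescuing a cluster


def passes_quality(qual: str, min_q: int = MIN_BASE_QUALITY) -> bool:
    """Step 1: keep a read only if every base call reaches the threshold."""
    return all(ord(c) - 33 >= min_q for c in qual)


def one_mismatch(a: str, b: str) -> bool:
    """True if a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def cluster(seqs: Counter) -> dict:
    """Steps 2-3 (assumed greedy rule): the most abundant remaining sequence becomes
    a core and absorbs every less abundant sequence one mismatch away from it.
    Returns {core sequence: total reads in the cluster}."""
    ordered = sorted(seqs, key=lambda s: (-seqs[s], s))
    assigned, clusters = set(), {}
    for core in ordered:
        if core in assigned:
            continue
        assigned.add(core)
        clusters[core] = seqs[core]
        for other in ordered:
            if (other not in assigned and seqs[other] < seqs[core]
                    and one_mismatch(core, other)):
                assigned.add(other)
                clusters[core] += seqs[other]
    return clusters


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in for sequence alignment."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def assign(clusters: dict, barcodes: dict) -> dict:
    """Steps 4-5: exact match of each core, else a unique closest barcode within
    MAX_EDIT_DISTANCE. Returns {barcode name: {"direct": n, "aligned": n}}."""
    by_seq = {seq: name for name, seq in barcodes.items()}
    counts = {name: {"direct": 0, "aligned": 0} for name in barcodes}
    for core, n in clusters.items():
        if core in by_seq:
            counts[by_seq[core]]["direct"] += n
            continue
        hits = sorted((edit_distance(core, seq), name) for name, seq in barcodes.items())
        if not hits:
            continue
        best_d, best_name = hits[0]
        ambiguous = len(hits) > 1 and hits[1][0] == best_d  # tie: leave unassigned
        if best_d <= MAX_EDIT_DISTANCE and not ambiguous:
            counts[best_name]["aligned"] += n
    return counts


def run(reads, barcodes):
    """reads: iterable of (sequence, quality string); barcodes: {name: sequence}."""
    kept = Counter(seq for seq, qual in reads if passes_quality(qual))
    return assign(cluster(kept), barcodes)


reads = [
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTACGA", "IIIIIIII"),  # one mismatch from the core above: joins its cluster
    ("TTTTGGGG", "IIIIIIII"),  # no exact barcode match: rescued by edit distance
    ("TTTTGGGC", "II#IIIII"),  # '#' is Phred 2, so this read is filtered out
]
print(run(reads, {"bc1": "ACGTACGT", "bc2": "TTTTGGGGA"}))
# {'bc1': {'direct': 3, 'aligned': 0}, 'bc2': {'direct': 0, 'aligned': 1}}
```

Clusters whose closest barcode is tied with another are left unassigned in this sketch rather than split, which keeps every read counted at most once.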

The final output of this process is a list of the barcodes expected in the sample, each with its count in the data file. These counts are annotated to indicate what proportion of the total was assigned by approaches other than direct identification of the cluster core sequence.
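
The annotated final output could be assembled along the following lines. This is a sketch under assumptions: the per-barcode input shape {"direct": n, "aligned": n}, the column names and the tab-delimited layout are hypothetical and are not the actual BarSeq file format.

```python
import csv
import io


def annotate(counts: dict) -> list:
    """Build one row per expected barcode: (name, total reads, fraction of the
    total not assigned by a direct core-sequence match)."""
    rows = []
    for name in sorted(counts):
        direct = counts[name]["direct"]
        indirect = counts[name]["aligned"]
        total = direct + indirect
        fraction = indirect / total if total else 0.0
        rows.append((name, total, round(fraction, 3)))
    return rows


def to_tsv(rows: list) -> str:
    """Render the rows as tab-delimited text with a (hypothetical) header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["barcode", "count", "fraction_not_direct"])
    writer.writerows(rows)
    return buf.getvalue()


counts = {"bc1": {"direct": 3, "aligned": 1}, "bc2": {"direct": 0, "aligned": 0}}
print(to_tsv(annotate(counts)), end="")
```

Every expected barcode gets a row, so bc2 is listed with a count of 0 even though no reads were assigned to it, while bc1's fraction of 0.25 means one of its four reads came from a non-direct assignment.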

In addition to the final output, intermediate files are generated that can be reviewed to identify unassigned sequences. These include:

  1. A list of the clusters generated from the data, with the counts and the sequences associated with each cluster
  2. A list of clusters with the assigned barcode, if any, and details on the assignment
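
Reviewing the cluster assignment list for unassigned sequences amounts to a filter. In this sketch the record type ClusterAssignment, its fields and the method labels are hypothetical; the actual intermediate files may be laid out differently.

```python
from typing import List, NamedTuple, Optional


class ClusterAssignment(NamedTuple):
    """One row of a hypothetical cluster-assignment table."""
    core: str                 # cluster core sequence
    reads: int                # reads counted in the cluster
    barcode: Optional[str]    # assigned barcode name, or None if unassigned
    method: str               # hypothetical labels: "direct", "aligned" or "none"


def unassigned(rows: List[ClusterAssignment]) -> List[ClusterAssignment]:
    """Clusters with no barcode, largest first, for manual review."""
    return sorted((r for r in rows if r.barcode is None),
                  key=lambda r: r.reads, reverse=True)


rows = [
    ClusterAssignment("ACGTACGT", 3, "bc1", "direct"),
    ClusterAssignment("GGGGCCCC", 2, None, "none"),
    ClusterAssignment("TTTTGGGG", 1, "bc2", "aligned"),
    ClusterAssignment("CATCATCA", 5, None, "none"),
]
for r in unassigned(rows):
    print(r.core, r.reads)
```

Sorting by read count surfaces the unassigned clusters that account for the most reads.
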

Inquiries can be addressed to guri.giaever@utoronto.ca or corey.nislow@utoronto.ca.