BarSeq Analysis

Web supplement to
"BarSeq : Identification and Counting of Short Molecular Barcodes in HT-Seq data"





Binary usage

A single compiled binary is available. To run the pipeline, call each function in turn and redirect its output to a file, which then serves as input for the next step. Alternatively, the output of one call can be piped directly into the next call without capturing it. In general, the former is preferred, since intermediate results are stored and remain available for review and troubleshooting.
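The file-by-file approach described above can be sketched as a small shell function. The option values and file naming follow the per-step examples on this page; the helper name `run_barseq_pipeline` and its three arguments are assumptions made for illustration and are not part of the binary.

```sh
# Sketch: run the four steps in order, keeping every intermediate file.
# run_barseq_pipeline is a hypothetical helper, not something barseq provides.
run_barseq_pipeline() {
    reads=$1      # qseq or fastq read file
    barcodes=$2   # barcode file, e.g. HET.txt
    prefix=$3     # prefix for the intermediate and final output files

    # Each step reads the file written by the previous step; stop at the first failure.
    barseq count -c -i1:1 -u18 "$reads" > "${prefix}_counts.txt" || return 1
    barseq cluster -f -n 10 -c 90 "${prefix}_counts.txt" > "${prefix}_clusters.txt" || return 1
    barseq map -c 5 -n 10 "${prefix}_clusters.txt" "$barcodes" > "${prefix}_map.txt" || return 1
    barseq tabulate "${prefix}_map.txt" "$barcodes" > "${prefix}_tabulate.txt" || return 1
}
```

Calling `run_barseq_pipeline het_test_qseq.txt HET.txt het_test` would leave het_test_counts.txt, het_test_clusters.txt, het_test_map.txt and het_test_tabulate.txt on disk, so any step can be inspected or rerun on its own.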

Four functions can be called, listed here in pipeline order:
  1. barseq count: input is a qseq or fastq read file, output is a list of unique trimmed & truncated reads after quality filtering.
    Options:
    -i X:Y trims X bases on the 5' end and Y bases on the 3' end
    -u N truncates reads to N bases
    -f force input to be parsed as fastq file
    -q force input to be parsed as qseq file
    -o FILE additionally prints results to FILE
    -m S|X|I|J filters based on Sanger, Solexa, Illumina 1.3+, or Illumina 1.5+ quality scores (respectively)
    -c only unfiltered results will be reported in output

    eg. sh: barseq count -c -i1:1 -u18 het_test_qseq.txt > het_test_counts.txt
    Trims the first and last base of every entry, truncates it to 18 bases, and reports only unfiltered entries.

  2. barseq cluster: input is a count file from barseq count. Clusters sequences based on mismatch scoring.
    Options:
    -s FILE calculates additional statistics and outputs them to FILE
    -a FILE stores alignment results in FILE
    -f forces entries to be clustered to, at most, one core; otherwise, non-core sequences will split their counts between matching cores, weighted by core count
    -c N align score must meet cutoff N for a sequence to cluster to a core
    -n N only attempts to align the top N candidates during alignment; lower numbers speed up alignment, but may result in non-optimal alignments; defaults to 0, which skips aligning
    -m N a minimum of N counts are required for a sequence to be considered as a core
    -o FILE additionally outputs results to FILE

    eg. sh: barseq cluster -f -n 10 -c 90 het_test_counts.txt > het_test_clusters.txt
    Clusters sequences to at most one core. Unmatched entries will then attempt to align to the top 10 candidate cores. Alignment will only succeed if align score is at least 90.

  3. barseq map: input is a cluster file from barseq cluster and a barcode file. This will map clusters to barcodes.
    Options:
    -c N clusters require a total count of at least N to be considered for alignment
    -n N only attempts to align the top N candidates during alignment; lower numbers speed up alignment, but may result in non-optimal alignments
    -o FILE additionally outputs results to FILE
    -s skips aligning clusters

    eg. sh: barseq map -c 5 -n 10 het_test_clusters.txt HET.txt > het_test_map.txt
    Maps clusters with combined count of at least 5 to the top 10 candidate barcodes.

  4. barseq tabulate: input is a map file from barseq map and a barcode file. This will list the total counts contributing towards each barcode as well as a breakdown of where the counts come from (eg. mismatches, alignments).
    Options:
    -c N align score needs to be at least N to match a barcode, defaults to 70

    eg. sh: barseq tabulate het_test_map.txt HET.txt > het_test_tabulate.txt
    Tabulates all the counts for each barcode in HET.txt from het_test_map.txt.
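
As a worked illustration of the -i and -u options of barseq count in step 1, the following sketch applies the same arithmetic to a single read. It is not how the binary is implemented: the helper name `trim_truncate` and the example read are made up, and it assumes that truncation keeps the first N bases that remain after trimming.

```sh
# Illustration only: mimic -i X:Y and -u N on one read with awk.
# Assumption: truncation keeps the first N bases left after trimming.
trim_truncate() {
    # $1 = read, $2 = bases trimmed at 5' end, $3 = bases trimmed at 3' end, $4 = length to keep
    printf '%s\n' "$1" | awk -v x="$2" -v y="$3" -v n="$4" '{
        trimmed = substr($0, x + 1, length($0) - x - y)  # drop x leading and y trailing bases
        print substr(trimmed, 1, n)                      # keep at most n bases
    }'
}

# Equivalent of -i1:1 -u18 on a made-up 24-base read.
trim_truncate GATCGATCGATCGATCGATCGATC 1 1 18
# prints ATCGATCGATCGATCGAT
```

The 24-base read loses one base at each end, leaving 22 bases, of which the first 18 are kept.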

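The count splitting described for barseq cluster in step 2 (when -f is not given) can be illustrated with a small calculation. This is a sketch of proportional weighting only: the helper name `split_counts`, the core names and the counts are invented for the example, and whether the binary reports fractional counts or rounds them is not stated on this page.

```sh
# Illustration only: split a non-core sequence's count across its matching
# cores in proportion to each core's own count. All names and numbers are made up.
split_counts() {
    # $1 = count of the non-core sequence; remaining arguments = core:count pairs
    seq_count=$1
    shift
    printf '%s\n' "$@" | awk -F: -v c="$seq_count" '
        { name[NR] = $1; n[NR] = $2; total += $2 }
        END {
            if (total == 0) exit              # nothing to weight by
            for (i = 1; i <= NR; i++)
                printf "%s\t%.2f\n", name[i], c * n[i] / total
        }'
}

# A non-core sequence seen 30 times matches two cores with counts 100 and 200.
split_counts 30 coreA:100 coreB:200
```

Here coreA receives 30 * 100 / 300 = 10 counts and coreB receives 30 * 200 / 300 = 20. With -f, the sequence would instead be assigned to at most one core.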

Inquiries can be addressed to guri.giaever@utoronto.ca or corey.nislow@utoronto.ca