Chemical Probe Prioritization Methods

Web supplement to
"Compound prioritization methods increase rates of chemical probe discovery in model organisms"

Iain M Wallace, Malene L Urbanus, Genna M Luciani, Andrew R Burns, Mitchell Han, Hao Wang, Kriti Arora, Lawrence E Heisler, Michael Proctor, Robert P St. Onge, Terry Roemer, Peter J Roy, Carolyn L Cummins, Gary D Bader, Corey Nislow, Guri Giaever

Submitted for Publication


Supplementary Datasets

Dataset Title Files Description
Dataset S1 NovaCore compounds NovaCore_compounds.sdf This file contains all of the compounds from the NovaCore SAR library that were screened at 200 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 49,218 compounds in total, of which 4,965 were active.
Dataset S2 ChemDiv compounds ChemDiv_Compounds.sdf This file contains all of the compounds from the ChemDiv Diverse library that were screened at 200 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 27,680 compounds in total, of which 2,120 were active.
Dataset S3 NovaCore Diverse compounds NovaCore_Diverse_compounds.sdf This file contains all of the compounds from the NovaCore Diverset library that were screened at 200 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 4,422 compounds in total, of which 391 were active.
Dataset S4 Spectrum Compounds Spectrum_Compounds.sdf This file contains all of the compounds from the Spectrum library that were screened at 50 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 1,998 compounds in total, of which 68 were active.
Dataset S5 S. pombe compounds screened_pombe_compounds.sdf This file contains all of the compounds that were screened in S. pombe. Compounds that were scored as active in S. pombe were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 3,707 compounds were screened and 2,776 were active.
Dataset S6 B. subtilis compounds screened_bsub_compounds.sdf This file contains all of the compounds that were screened in B. subtilis. Compounds that were scored as active in B. subtilis were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 4,255 compounds were screened and 1,697 were active.
Dataset S7 E. coli compounds screened_ecoli_compounds.sdf This file contains all of the compounds that were screened in E. coli. Compounds that were scored as active in E. coli were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 4,852 compounds were screened and 247 were active.
Dataset S8 Mammalian compounds screened_mammalian_compounds.sdf This file contains all of the compounds that were screened in the human lung cancer cell line A549. Compounds are defined as active if cell viability is below 50% in the presence of the compound. In total, 167 compounds were screened and 116 were found to be active.
Dataset S9 C. elegans compounds screened_celegans_compounds.sdf This file contains all of the compounds that were screened in a C. elegans phenotype assay. Compounds that caused a gross phenotype were scored as active in C. elegans and marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 5,899 compounds were screened and 809 were active.
Dataset S10 C. albicans compounds screened_calbicans.sdf This file contains all of the compounds that were screened in C. albicans. Compounds that were scored as active in C. albicans were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 835 compounds were screened and 130 were active.
Dataset S11 C. neoformans compounds screened_neoformans.sdf This file contains all of the compounds that were screened in C. neoformans. Compounds that were scored as active in C. neoformans were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 804 compounds were screened and 373 were active.
Dataset S12 NIH Molecular Library Screening Network Compounds MLSCR_bioavailability_filter_model.ta This is a tab-delimited file containing the entire Molecular Libraries screening collection, downloaded from PubChem. The final column is the score assigned to the compound by the Naive Bayes model for growth inhibition. The higher the score, the more likely the compound is to inhibit yeast growth. A value of 1 in the second-to-last column indicates that the compound passes the bioavailability filter.
Dataset S13 Zinc Purchasable compounds all_purchasable_compounds_zinc_6_4_10.tab This is a tab-delimited file containing all purchasable compounds as defined by the Zinc database. It contains a column indicating how many of the Lipinski parameters the molecule passes, as well as a column indicating whether it passes the 2-property filter and a score from the yeast model. As the yeast model was built on drug-like compounds that pass all 4 of Lipinski's criteria, these are the compounds that the model is more likely to predict correctly.
Dataset S14 Raw and analyzed microarray data for the 20 compounds screened by HIP profiling

HIP_ratios.xlsx
Celfile_Archive.zip
Celfiles_index.xlsx
Chip_Spot_Tag4.txt

All raw data are available as Affymetrix cel files (version 4), containing coordinates and intensity values, in the file Celfiles_Archive.zip. The information needed to translate cel file information for tag4 arrays into orf_tag information can be found in the Chip_Spot_Tag4.txt file. The following tag types are present: affy expression repaired repaired-bad tag3 tag3-bad. For HIP profiling, spots of type tag3 and repaired are used.

The cel file descriptions (cel file name, compound, concentration value and units) can be found in the file Celfile_index.xlsx.

The sensitivity of all strains, including homozygous deletion strains, to each compound is available in the file HIP_ratios.xlsx. Log2 ratios were calculated as described in the Supplemental Methods.

Dataset S15 Validation of screening Library compounds

Mass-spec_Dalton_Pharma.pdf
Chembridge Library data.zip
Chembridge_reorder_LC_MS.pdf


Thirty reordered compounds and 30 additional compounds, chosen at random from four hit plates, were analyzed using liquid chromatography and mass spectrometry (LC-MS) to verify that the compound of interest was present. The data and methods are available in the file Mass-spec_Dalton_Pharma.pdf and in the Supplementary Methods.

LC-MS data for the 60 compounds from the library and the 30 reordered compounds, supplied by Chembridge at the time of purchase, are available in the files Chembridge Library data.zip and Chembridge_reorder_LC_MS.pdf.
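The SDF datasets above flag each compound through a property named active (1 for active, 0 for inactive). As a minimal sketch of how those flags could be tallied, the Python below reads the standard SDF data items without any cheminformatics library. The function names, the demo records, and the assumption that the tag is spelled exactly `active` are illustrative and are not taken from the supplement.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def iter_sdf_properties(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one {tag: value} dict per SDF record (a record ends with '$$$$')."""
    props: Dict[str, str] = {}
    tag = None
    in_data = False  # becomes True once the molfile block has ended
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):
            yield props
            props, tag, in_data = {}, None, False
        elif line.startswith("M  END"):
            in_data = True
        elif in_data and line.startswith(">") and "<" in line:
            # Data header such as '>  <active>': keep the text between < and >.
            tag = line[line.index("<") + 1 : line.rindex(">")]
            props[tag] = ""
        elif in_data and tag is not None:
            if line == "":
                tag = None  # a blank line terminates the data item
            else:
                props[tag] = f"{props[tag]}\n{line}" if props[tag] else line


def count_actives(handle: TextIO, tag: str = "active") -> Tuple[int, int]:
    """Return (total records, records whose tag equals '1')."""
    total = active = 0
    for props in iter_sdf_properties(handle):
        total += 1
        if props.get(tag, "").strip() == "1":
            active += 1
    return total, active


# Three made-up, atom-free records used only to exercise the parser.
_RECORD = "cmpd{n}\n  demo\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n>  <active>\n{flag}\n\n$$$$\n"
DEMO_SDF = "".join(_RECORD.format(n=n, flag=f) for n, f in [(1, 1), (2, 0), (3, 1)])

if __name__ == "__main__":
    total, active = count_actives(io.StringIO(DEMO_SDF))
    print(f"{active} of {total} active ({active / total:.1%})")
```

To apply it to one of the files, open the file in text mode, for example `with open("NovaCore_compounds.sdf") as fh: print(count_actives(fh))`. With the counts reported for Dataset S1 (4,965 of 49,218), the corresponding hit rate is about 10.1%.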

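For the tab-delimited file of Dataset S12, the description states that the final column holds the model score and that a 1 in the second-to-last column marks compounds passing the bioavailability filter. The sketch below shows one way to keep the passing compounds and rank them by score. The presence of a header row, the other column contents, and the function name are assumptions made for illustration; rows whose last column is not numeric are simply skipped.

```python
import csv
import io
from typing import List, TextIO


def rank_filtered(handle: TextIO, top_n: int = 10) -> List[List[str]]:
    """Return up to top_n rows that pass the filter, highest score first."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # not enough columns to hold a flag and a score
        try:
            score = float(row[-1])  # final column: model score
        except ValueError:
            continue  # header line or malformed score
        if row[-2].strip() == "1":  # second-to-last column: filter flag
            kept.append((score, row))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in kept[:top_n]]


# Hypothetical rows; the real file's identifier columns may differ.
DEMO_TAB = (
    "id\tsmiles\tpasses_filter\tscore\n"
    "cmpd_a\tCCO\t1\t-3.2\n"
    "cmpd_b\tc1ccccc1\t0\t5.0\n"
    "cmpd_c\tCCN\t1\t1.7\n"
)

if __name__ == "__main__":
    for row in rank_filtered(io.StringIO(DEMO_TAB)):
        print("\t".join(row))
```

Because the whole screening collection is large, a streaming variant that keeps only the current top rows (for example with `heapq.nlargest`) may be preferable to sorting everything in memory.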

ACCESS, an automated platform for chemogenomic screening

Inquiries can be addressed to guri.giaever@utoronto.ca or corey.nislow@utoronto.ca