Chemical Probe Prioritization Methods

Web supplement to
"Compound prioritization methods increase rates of chemical probe discovery in model organisms"

Iain M Wallace, Malene L Urbanus, Genna M Luciani, Andrew R Burns, Mitchell Han, Hao Wang, Kriti Arora, Lawrence E Heisler, Michael Proctor, Robert P St. Onge, Terry Roemer, Peter J Roy, Carolyn L Cummins, Gary D Bader, Corey Nislow, Guri Giaever

Submitted for Publication


Supplementary Datasets

Dataset	Title	Files	Description
Dataset S1	NovaCore compounds	NovaCore_compounds.sdf	This file contains all of the compounds from the NovaCore SAR library that were screened at 200 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 49,218 compounds in total, of which 4,965 were active.
Dataset S2	ChemDiv compounds	ChemDiv_Compounds.sdf	This file contains all of the compounds from the ChemDiv Diverse library that were screened at 200 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 27,680 compounds in total, of which 2,120 were active.
Dataset S3	NovaCore Diverse compounds	NovaCore_Diverse_compounds.sdf	This file contains all of the compounds from the NovaCore Diverset library that were screened at 200 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 4,422 compounds in total, of which 391 were active.
Dataset S4	Spectrum compounds	Spectrum_Compounds.sdf	This file contains all of the compounds from the Spectrum library that were screened at 50 uM in yeast. Compounds that were scored as active were marked with the property active set to 1. Inactive compounds have the same property set to 0. There were 1,998 compounds in total, of which 68 were active.
Dataset S5	S. pombe compounds	screened_pombe_compounds.sdf	This file contains all of the compounds that were screened in S. pombe. Compounds that were scored as active in S. pombe were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 3,707 compounds were screened and 2,776 were active.
Dataset S6	B. subtilis compounds	screened_bsub_compounds.sdf	This file contains all of the compounds that were screened in B. subtilis. Compounds that were scored as active in B. subtilis were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 4,255 compounds were screened and 1,697 were active.
Dataset S7	E. coli compounds	screened_ecoli_compounds.sdf	This file contains all of the compounds that were screened in E. coli. Compounds that were scored as active in E. coli were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 4,852 compounds were screened and 247 were active.
Dataset S8	Mammalian compounds	screened_mammalian_compounds.sdf	This file contains all of the compounds that were screened in the human lung cancer cell line A549. A compound is defined as active if cell viability in its presence is below 50%. In total, 167 compounds were screened and 116 were found to be active.
Dataset S9	C. elegans compounds	screened_celegans_compounds.sdf	This file contains all of the compounds that were screened in a C. elegans phenotype assay. Compounds that caused a gross phenotype were scored as active and marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 5,899 compounds were screened and 809 were active.
Dataset S10	C. albicans compounds	screened_calbicans.sdf	This file contains all of the compounds that were screened in C. albicans. Compounds that were scored as active in C. albicans were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 835 compounds were screened and 130 were active.
Dataset S11	C. neoformans compounds	screened_neoformans.sdf	This file contains all of the compounds that were screened in C. neoformans. Compounds that were scored as active in C. neoformans were marked with the property active set to 1. Inactive compounds have the same property set to 0. In total, 804 compounds were screened and 373 were active.
Dataset S12	NIH Molecular Library Screening Network Compounds	MLSCR_bioavailability_filter_model.ta	This is a tab-delimited file containing the entire Molecular Libraries screening collection, downloaded from PubChem. The final column gives each compound's score from the naive Bayes model for growth inhibition; the higher the score, the more likely the compound is to inhibit yeast growth. A value of 1 in the second-to-last column indicates that the compound passes the bioavailability filter.
Dataset S13	ZINC purchasable compounds	all_purchasable_compounds_zinc_6_4_10.tab	This is a tab-delimited file containing all purchasable compounds as defined by the ZINC database. It contains a column indicating how many of the Lipinski criteria each molecule passes, as well as a column indicating whether it passes the 2-property filter and a score from the yeast model. Because the yeast model was built on drug-like compounds that pass all four of Lipinski's criteria, its predictions are most likely to be correct for compounds that also pass all four.
Dataset S14	Raw and analyzed microarray data for the 20 compounds screened by HIP profiling	HIP_ratios.xlsx Celfile_Archive.zip Celfiles_index.xlsx Chip_Spot_Tag4.txt	All raw data are available as Affymetrix cel files (version 4), containing coordinates and intensity values, in the file Celfiles_Archive.zip. The information needed to translate cel file data for Tag4 arrays into orf_tag information is in the file Chip_Spot_Tag4.txt. The following tag types are present: affy, expression, repaired, repaired-bad, tag3, and tag3-bad. For HIP profiling, spots of type tag3 and repaired are used. The cel file descriptions (cel file name, compound, concentration value, and units) are in Celfile_index.xlsx. The sensitivity of every strain, including homozygous deletion strains, to each compound is given in the file HIP_ratios.xlsx. Log2 ratios were calculated as described in the Supplemental Methods.
Dataset S15	Validation of screening library compounds	Mass-spec_Dalton_Pharma.pdf Chembridge Library data.zip Chembridge_reorder_LC_MS.pdf	Thirty reordered compounds and 30 additional compounds, chosen at random from four hit plates, were analyzed by liquid chromatography-mass spectrometry (LC-MS) to verify whether the compound of interest was present. The data and methods are available in the file Mass-spec_Dalton_Pharma.pdf and in the Supplementary Methods. LC-MS data for the 60 compounds from the library and for the 30 reordered compounds, supplied by ChemBridge at the time of purchase, are available in the files Chembridge Library data.zip and Chembridge_reorder_LC_MS.pdf.
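Most of the SD files above mark each compound with a property named active (1 for active, 0 for inactive). As a minimal sketch, the Python below counts records and actives without a cheminformatics toolkit. It assumes the flag is written as a standard SD data item, a header line such as "> <active>" followed by a line holding 1 or 0; that layout is an assumption, not something stated in the dataset descriptions, so check it against the files. The function names iter_active_flags and count_actives are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def iter_active_flags(lines: Iterable[str]) -> Iterator[Optional[int]]:
    """Yield the value of the 'active' data item (1 or 0) for each SD record.

    Assumes the data-item layout "> <active>" followed by a value line;
    yields None for a record that has no such item.
    """
    flag: Optional[int] = None
    expect_value = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if expect_value:
            value = line.strip()
            flag = int(value) if value else None
            expect_value = False
        elif line.startswith(">") and "<active>" in line:
            # In SD files, a data item's value starts on the line after its header.
            expect_value = True
        elif line.strip() == "$$$$":
            # "$$$$" terminates one molecule record.
            yield flag
            flag = None


def count_actives(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of records, number of records with active == 1)."""
    flags = list(iter_active_flags(lines))
    return len(flags), sum(1 for flag in flags if flag == 1)
```

An open file handle can be passed directly as lines. If the assumed layout holds, NovaCore_compounds.sdf should give the totals listed for Dataset S1: 49,218 compounds and 4,965 actives, a hit rate of about 10.1%.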
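The description of Dataset S12 fixes only two columns: the last holds the naive Bayes growth-inhibition score and the second-to-last holds the bioavailability flag. A minimal sketch of prioritizing compounds from such a file, keeping those that pass the filter and sorting by descending score, might look like this. Treating the first column as the compound identifier, and skipping any row with non-numeric values in the last two columns as a header, are assumptions; rank_passing_compounds is a hypothetical helper.

```python
import csv
from typing import Iterable, List, Tuple


def rank_passing_compounds(lines: Iterable[str], top_n: int = 10) -> List[Tuple[str, float]]:
    """Return (identifier, score) for the top_n highest-scoring compounds
    that pass the bioavailability filter.

    Last column: naive Bayes score (higher = more likely to inhibit growth).
    Second-to-last column: filter flag, 1 = passes.
    First column as identifier is an assumption.
    """
    ranked: List[Tuple[str, float]] = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # blank or malformed line
        try:
            score = float(row[-1])
            passes = float(row[-2]) == 1
        except ValueError:
            continue  # header row or non-numeric entry
        if passes:
            ranked.append((row[0], score))
    # Highest score first, so the most likely growth inhibitors lead the list.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Reading line by line avoids loading the whole file into a table, although the ranked list still holds every compound that passes the filter before it is truncated to top_n.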
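Dataset S13 reports how many of Lipinski's four criteria each molecule passes, and notes that the yeast model is most reliable for compounds passing all four. The sketch below counts the criteria from precomputed descriptors using the conventional rule-of-five thresholds: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The descriptor definitions and exact cut-offs behind the ZINC column are not given here, so these thresholds, the Descriptors type, and the helper names are assumptions; computing the descriptors themselves would need a cheminformatics toolkit. The 2-property filter is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    molecular_weight: float  # Da
    logp: float              # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_passes(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria a molecule satisfies."""
    criteria = (
        d.molecular_weight <= 500,
        d.logp <= 5,
        d.h_bond_donors <= 5,
        d.h_bond_acceptors <= 10,
    )
    return sum(criteria)


def in_model_domain(d: Descriptors) -> bool:
    """True when all four criteria pass, i.e. the compound resembles the
    drug-like compounds the yeast model was built on."""
    return lipinski_passes(d) == 4
```

Returning a count rather than a single pass/fail value matches the column described for Dataset S13, and in_model_domain then reduces it to the all-four case the description recommends.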

ACCESS, an automated platform for chemogenomic screening
Inquiries can be addressed to guri.giaever@utoronto.ca or corey.nislow@utoronto.ca