Warning Your browser does not support the latest version of HTML for you to be able to use SGAtools. Please update your browser.

For SGAtools, We recommend the latest versions of Chrome (download here) or Firefox(download here).

Help / Documentation

Image analysis

File names
should have the following format: {username}_{screentype}_{query}_{arrayplateid}....
  • username is the name of the person who carried out the experiment,
  • screentype is the type of screen carried, where wt or ctrl denotes a control screen, and any other value represents a case screen (double mutant, mutant / drug combination, ...)
  • query is the name of the query used in the experiment
  • arrayplateida value indicating the array plate id, used to map the file to its plate layout file. In a typical SGA yeast screen, this value ranges from 1-14
Note: it is not required that your file names follow this format. However, in order to map the query/array coordinates to their ORF name and group control/case experiments by array plate id for scoring, correct file naming must be used.
Due to security restrictions on web applications, all files must be selected in a single dialog. We recommend moving the relevant images to one directory for easy selection.
Image analysis requires that your plate to have defined plate edges. Plates must also have at least one colony in each row and column for proper segmentation. If both these requirements are met, and segmented images still look weird, please contact the developers with your image.

Below are examples of the two known modes of failure for segmentation. We are working on improving SGAtools to handle these cases as well.
  • No defined plate edges
  • Missing rows/columns
SGAtools was developed primarily for pinned colonies. Depending on your plate, it might work. However, we recommend analyzing your spotted colony images with a separate image analysis software and normalizing/scoring with SGAtools
Yes. Images must first be uploaded to the web server, which can take some time depending on the size of your images. Image analysis will also take some time processing each image.

Normalization & scoring

File names should have the same format as images, which
Each file in your directory will has a 9 columns tab-delimited format:
    You can also find this information in the normalization & scoring tab of the help section or the README file provided in the output.
    SGAtools needs to know what genes are in the different positions on different plates to include the gene names in output. To achieve this, you need to 1. have the file names in the correct format and 2. specify the plate layout in the Normalization / Scoring page
    The content of the plates can be specified using the "Plate layout" input in the normalization / scoring page. You can select from a list of commonly used arrays that SGAtools knows about, or upload your own. Read more about how it is done here
    To score strains, SGAtools needs to compare a control plate to a case (double mutant / drug / ...) one. The information about which plates correspond to cases or controls is taken from the file name. Thus, to score the plates, the filenames must adhere to a format (see here for the specification). If SGAtools cannot parse the filenames, the scoring option is disabled.
    We recommend the default value of 200 kb. For standard reference strain crosses, over 1 crossover event on average takes place in this interval, thus linkage should not skew the results. It can still be useful to manually examine the scores within the linkage region to observe unexpected deviations.
    The scores measure the difference between expected and observed fitness of the strain. A negative score indicates that the treatment (extra mutation, drug, or other) induces a more severe fitness defect than expected. Conversely, a positive scores corresponds to an unexpectedly large fitness increase. See Baryshnikova et al (2010) for a full description of the underlying methodology

    The larger the magnitude of the score, the larger the effect. For reference, scores of about -0.3 can be reliably spotted by eye, and a cutoff of -0.18 was used in a large scale SGA screen. Examples of case-control pairs for different magnitudes are given below:

    The p-values give a one-tailed probability of observing the scores in the replicates if the true effect size is zero. This should not be interpreted as the effect strength. The best practice is to not trust the score if the p-value is not very low (as this would indicatelarge variability between replicates). Low p-value by itself is not an indicator of biological significance, as it can be obtained from minute changes in colony sizes that happen to be very concordant.

    Data analysis

    SGAtools can use g:Profiler to perform GO enrichment analysis if it knows what genes are on the plate. This information can be provided in the Normalization and Scoring step to enable this. Check the "Define plate layout" box, select the array used in the screen or upload your own, and rerun the analysis to enable GO enrichment.


    Simply share the link in your address bar after each step of analysis. For example, after image analysis, you would share a link which looks something like

    Note: data is only saved on our server for 30 days. After that time period, results will no longer be available
    SGAtools relies on new web technologies for layout and visualization, as well as data pre-processing and validation. Some of the required features are only available in the modern browser. We have tested SGAtools extensively on Google Chrome Version 26.0.1410.43 and Mozilla Firefox Version 19.0.2. Please update your browser to at least one of the recommended versions to use SGAtools.
    SGAtools normalization and scoring was developed in R. You can download R from here. The SGAtools code can be found in SGAtools.R and has sufficient commenting to get you started.

    There are three main parts to the SGAtools pipeline. Image analysis, Normalization & scoring, and Data analysis & visualization. There are two possible start points in the pipeline shown as A, starting out with plate images and B, starting out with quantified colony sizes generated by an alternate image analysis software. It is highly recommended that input files at either start point are named using a certain convention in order to maximize functionality

    The tabs on the right contain further details on each of the steps of the SGAtools pipeline.

    SGAtools workflow

    For analyzing the data from your plates, SGAtools needs additional information about their contents. First, to score the plates after image analysis, the "case" plates (double mutant, drug treatment, etc) are compared against the control, or wild type plates, and have to be labeled as such. Second, to filter out genes linked to the query strain, the query gene name is needed. Finally, if an entire array of plates is screened, the number of the plate in the array has to be provided to link the right case and control plate, as well as to determine which genes are in each plate. extra information can be given in the filename, which should have the format:

    • Screen type ― control vs case
    • Query name / ORF
    • Array plate id

    This extra information can be given in the filename, which should have the format:

    {username}_{screentype}_{query}_{arrayplateid} …

    such that the screen type name is prefixed by wt or ctrl to indicate a control plate. Anything else will define the plate as a case

    You can select which array was used in the screen on the normalization / scoring page.


    Indicates a case screen carried out by michael with the screen having the query YDL108W and an array plate id of 1


    Indicates a control screen carried out by charlie with the screen having the query YOR341W and an array plate id of 9

    • It is not mandatory to name the files in the mentioned format. Files named with a format other than the one mentioned above will not me mapped a query/array ORF and will not be scored
    • The first part of the file name before plate type definition does not need to be the user name, but can be any string that does not contain underscores.

    Image analysis is the first step of the SGAtools pipeline and involves fitting a grid onto your plate images, then using the bounds of the grid to quantify the colony size using pixel intensities


    1. Plate images

    Images of your screen in .jpg format. Images are expected to be an approximate resolution of 160 dots per inch (dpi) and should either be cropped to the size of the experimental plate (i.e. must have plate edges) or have a black background outside of the plate. Note: the higher resolution the image is, the more accurate the quantified colony size is.

    2. Plate format

    SGAtools supports 4 core screen formats

    Screen format Number of rows Number of columns Example
    1536 32 48
    768 -diagonal replicates 16 24
    384 16 24
    96 8 12

    3. Crop option

    • Automatically choose method (recommended): The program will automatically choose between the other options.
    • Always autodetect plate edges: The program will locate the edges of the plate by finding the edge of the black background surrounding the plate.
    • Images are already cropped to plate edges: The program will assume that the images have already been cropped to the edges of the agar plate.

    3. Noise removal

    Flag indicating if noise/speckles should be removed from the thresholded image prior to analysis.

    4. Autorotate

    Flag indicating if image should be auto-rotated prior to processing. Only select this option if image is extremely rotated. gitter is able to handle small variations in rotations (1-2 degrees) without auto-rotating.

    5. Inverse

    Flag indicating if input image is inverted, meaning colonies are darker compared to their background.

    Type Example image
    Bright colonies on darker background.
    Dark colonies on lighter background


    After your images have been processed, you will be directed to a page containing a summary of the analysis. This page allows you to:

    Download the quantified colony sizes
    The resulting files will be space-delimited and will have the following columns
    1. Row
    2. Column
    3. Quantified colony size
    4. Circularity
    5. Median
    # This is a comment line and is ignored	
    # row	column	size-1	circ-1	MedInt-1
    1 1 277 0.925 228
    1 2 171	0.944 229
    1 3 127 0.929 234
    1 4 156 0.959 238

    Review the gridding applied to your images
    The gridded image will be displayed. Hover over the output image for a zoom in on specific colonies to ensure correct gridding

    Proceed to normalization & scoring
    To proceed to normalization, select the desired plates you would like normalized and/or scored and click normalize and score. This will direct you to the normalization & scoring page with your analyzed image data preloaded


    1. Plate files

    The main input to normalization & scoring is the quantified colony sizes in a space-delimited file with 3 or more columns as follows:

    1. Row
    2. Column
    3. Quantified colony size
    # This is a commented line and is ignored
    # rows	columns	size	circularity: 
    1	1	2205	0.981750  
    1	2	1734	1.065585  
    1	3	1996	1.057621  
    1	4	1704	1.032656  
    1	5	1755	1.109302  
    Note: In the example above, the circularity column and any column that follows it are ignored

    2. Plate layout file

    This file allows a mapping from row and columns to a gene name. If an array layout file is not selected the array column in results will contain numbers, such that each number represents a group of replicates.
    There are two ways of selecting an array layout file:

    Selecting a predefined plate layout
    You can select between commonly used arrays from the drop down menu. The number of the plate detected from the file name (e.g. plate 4 for user_ctrl_YOR101W_4_130301.jpg) is matched to the corresponding plate in the array, and the output is populated with the names of the strains in the plate.
    Uploading a custom plate layout file
    Select the "Upload custom" option for plate definition, and upload your file. The format for the plate definition file is three tab-delimited columns:
    1. Column
    2. Row
    3. Gene name
    A name is assigned to each replicate group. Below an example shows the head section of an array layout file along with a graphical representation.
    # This is a comment line and is ignored		    			
    # c	r	Gene
    1	1	HIS3
    1	2	HIS3
    1	3	HIS3
    2	1	HIS3
    2	2	RKM3
    2	3	YPK3
    3	1	HIS3
    3	2	TPS1
    3	3	GAL1

    3. Replicates

    This is the number of replicates in the experiment.
    A value of 4 indicates replicas are in quadruples
    A value of 1 indicates there is only one replicate

    4. Linkage correction

    This step ignores interactions between genes within a specific proximity to one another on the same chromosome from the analysis as it is considered an artifact. The proximity is provided in kilobases (KB) as the linkage cutoff

    5. Score results

    If this option is selected, a score will be computed for non-control plates with a corresponding control plate of the same array plate id. There are two ways SGAtools can score screens:
    Such that Cij represents the fitness of the double mutant, Ci the single mutant fitness of the query, Cj the single mutant fitness of the array


    After your data has been normalized and/or scored, you will be directed to a page containing a summary of the analysis. You will be able to:

    Download your processed data files
    These files will be tab-delimited with 9 columns as follows:
    1. Row: the row of the colony
    2. Column: the column of the colony
    3. Raw colony size: the size of the colony as quantified by the image analysis software
    4. Plate id: unique id for this plate, set as file name
    5. Query gene name/ORF: Name of the query ORF if image/dat files follow conventional file naming (see file naming in help). If they do not, a value of '1' is placed as the query ORF
    6. Array gene name/ORF: Name of the array ORF if plate layout file supplied. If not, a unique value is assigned to each group of replicate arrays
    7. Normalized colony size: the raw colony size after normalization. The size is relative to plate median colony size, and a proxy for fitness. Normalized value of 1 is as fit as the average strain, 1.3 means it is 30% fitter than the average strain, and 0.4 that it's 40% as fit as the average strain.
    8. Score: the colony fitness score computed using the normalized colony size (7) and the corresponding normalized colony size in the control screen
    9. Additional information as key-value pairs
    # This is a comment line and is ignored
    # row	col	size	plateid	query	array	norm	score		kvp
    3	6	196	file-name	Y8835	YBR138C	523.39	-0.008	NA
    3	7	173	file-name	Y8835	YBR028C	448.31	-0.0975	NA
    3	8	205	file-name	Y8835	YBR028C	526.61	0.0557	NA
    3	9	181	file-name	Y8835	YBR137W	NA	NA	status=JK
    3	10	198	file-name	Y8835	YBR137W	520.88	0.0647	NA
    3	11	186	file-name	Y8835	YBR027C	489.08	-0.044	NA
    3	12	191	file-name	Y8835	YBR027C	501.44	-0.020	NA
    3	13	172	file-name	Y8835	YBR134W	NA	NA	status=JK
    3	14	204	file-name	Y8835	YBR134W	534.83	0.0286	NA

    Additional information returned from SGAtools includes status codes for colonies that did not meet a filter. The codes and their corresponding descriptions are listed below:

    Status code Description
    SD Standard deviation of scores (in the combined file)
    PV P-value gives a measure of reproducibility of the effect across the replicate colonies (in the combined file)
    LK Linkage correction: The array exists too close to the query on the chromosome
    JK Jackknife filter: This colony induces too much variance in the sizes of other colonies in the replicate group
    BG Big replicates: At least three colonies of this replicate are too large. The whole replicate is excluded
    CP Cap: Normalized colony size was too large (> 1000) and was capped at 1000

    Visualize your results
    To proceed to data analysis, click data analysis. This will direct you to the data analysis page

    This is the final step of the SGAtools pipeline and involves visualizing your processed data. There are two visualizations available:

    1. Heatmap: If you analyzed your images with SGAtools, your plate will appear adjacent to a heatmap of your processed data. Hover over the heatmap for details of colonies

    2. Histogram: An interactive histogram of the data is shown. Click and drag the histogram to create a window of data you would like to inspect

    There are 3 types of processed data you can visualize:

    1. Raw colony sizes: The colony sizes as quantified by the image analysis software
    2. Normalized colony sizes: The colony sizes normalized by SGAtools
    3. Score: Colony score against control plate (this is only available if the plate is not a control and sufficient data was provided to score)

    Select a specific plate from the drop down and the type of data you would like to visualize. You can also adjust the range of values to be displayed.