Image analysis

File names should have the following format: `{username}_{screentype}_{query}_{arrayplateid}...`.
• `username` is the name of the person who carried out the experiment,
• `screentype` is the type of screen carried out, where `wt` or `ctrl` denotes a control screen, and any other value represents a case screen (double mutant, mutant / drug combination, ...)
• `query` is the name of the query used in the experiment
• `arrayplateid` is a value indicating the array plate id, used to map the file to its plate layout file. In a typical SGA yeast screen, this value ranges from 1 to 14
Note: it is not required that your file names follow this format. However, in order to map the query/array coordinates to their ORF name and group control/case experiments by array plate id for scoring, correct file naming must be used.
Due to security restrictions on web applications, all files must be selected in a single dialog. We recommend moving the relevant images to one directory for easy selection.
Image analysis requires your plates to have clearly defined plate edges. Plates must also have at least one colony in each row and column for proper segmentation. If both requirements are met and the segmented images still look wrong, please contact the developers with your image.

Below are examples of the two known modes of failure for segmentation. We are working on improving SGAtools to handle these cases as well.
• Missing rows/columns
SGAtools was developed primarily for pinned colonies. Depending on your plate, it might work. However, we recommend analyzing your spotted colony images with a separate image analysis software and normalizing/scoring with SGAtools
Yes. Images must first be uploaded to the web server, which can take some time depending on the size of your images. Image analysis will also take some time processing each image.

Normalization & scoring

File names should have the same format as images, which
Each file in your directory will have a 9-column tab-delimited format:
You can also find this information in the normalization & scoring tab of the help section or the `README` file provided in the output.
SGAtools needs to know what genes are in the different positions on different plates to include the gene names in output. To achieve this, you need to 1. have the file names in the correct format and 2. specify the plate layout in the Normalization / Scoring page
The content of the plates can be specified using the "Plate layout" input in the normalization / scoring page. You can select from a list of commonly used arrays that SGAtools knows about, or upload your own. Read more about how it is done here
To score strains, SGAtools needs to compare a control plate to a case (double mutant / drug / ...) one. The information about which plates correspond to cases or controls is taken from the file name. Thus, to score the plates, the filenames must adhere to a format (see here for the specification). If SGAtools cannot parse the filenames, the scoring option is disabled.
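As a rough illustration of how case and control plates could be matched from their file names, here is a minimal Python sketch. It is not the SGAtools implementation: the function name `pair_cases_with_controls` is hypothetical, and it assumes one control file per array plate id (a later control with the same id replaces an earlier one).

```python
import os
from collections import defaultdict


def pair_cases_with_controls(filenames):
    """Group case files with the control file sharing their array plate id.

    Hypothetical helper for illustration only; not part of SGAtools.
    """
    controls = {}
    cases = defaultdict(list)
    for name in filenames:
        stem = os.path.splitext(os.path.basename(name))[0]
        parts = stem.split("_")
        if len(parts) < 4:
            continue  # name does not follow the convention, so it cannot be scored
        screentype, plate_id = parts[1], parts[3]
        if screentype.startswith(("wt", "ctrl")):
            controls[plate_id] = name
        else:
            cases[plate_id].append(name)
    # Keep only case plates that have a control with the same array plate id.
    return {pid: (controls[pid], names)
            for pid, names in cases.items() if pid in controls}


files = [
    "a_ctrl_YOR341W_1_x.jpg",
    "a_dm_YDL108W_1_x.jpg",
    "a_dm_YDL108W_2_x.jpg",  # no control for plate 2, so it is left unpaired
]
pairs = pair_cases_with_controls(files)
```

Case plates without a matching control are dropped from the result, mirroring the rule that a plate can only be scored against a control of the same array plate id.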
We recommend the default value of 200 kb. For standard reference strain crosses, over 1 crossover event on average takes place in this interval, thus linkage should not skew the results. It can still be useful to manually examine the scores within the linkage region to observe unexpected deviations.
The scores measure the difference between expected and observed fitness of the strain. A negative score indicates that the treatment (extra mutation, drug, or other) induces a more severe fitness defect than expected. Conversely, a positive score corresponds to an unexpectedly large fitness increase. See Baryshnikova et al. (2010) for a full description of the underlying methodology.

The larger the magnitude of the score, the larger the effect. For reference, scores of about -0.3 can be reliably spotted by eye, and a cutoff of -0.18 was used in a large scale SGA screen. Examples of case-control pairs for different magnitudes are given below:

The p-values give a one-tailed probability of observing the scores in the replicates if the true effect size is zero. This should not be interpreted as the effect strength. The best practice is to not trust the score if the p-value is not very low (as this would indicate large variability between replicates). A low p-value by itself is not an indicator of biological significance, as it can be obtained from minute changes in colony sizes that happen to be very concordant.
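To combine both criteria, one might filter on effect size and p-value together. The sketch below is only an illustration: the record layout (dicts with `score` and `p_value` keys), the function name `select_negative_hits`, and the p-value threshold of 0.05 are assumptions, not SGAtools defaults. The score cutoff of -0.18 is the value quoted above.

```python
def select_negative_hits(records, score_cutoff=-0.18, max_p_value=0.05):
    """Keep records with a strong negative score AND a low p-value.

    `records` is a list of dicts with 'score' and 'p_value' keys
    (a hypothetical layout chosen for this example).
    """
    return [
        r for r in records
        if r["score"] <= score_cutoff and r["p_value"] <= max_p_value
    ]


records = [
    {"array": "YBR138C", "score": -0.30, "p_value": 0.001},  # strong and reproducible
    {"array": "YBR028C", "score": -0.25, "p_value": 0.40},   # strong but noisy
    {"array": "YBR137W", "score": -0.02, "p_value": 0.0001}, # reproducible but tiny
]
hits = select_negative_hits(records)
```

Requiring both conditions discards scores that vary widely between replicates as well as tiny but highly concordant changes.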

Data analysis

SGAtools can use g:Profiler to perform GO enrichment analysis if it knows what genes are on the plate. This information can be provided in the Normalization and Scoring step to enable this. Check the "Define plate layout" box, select the array used in the screen or upload your own, and rerun the analysis to enable GO enrichment.

General

Simply share the link in your address bar after each step of analysis. For example, after image analysis, you would share a link which looks something like
`http://www.sgatools.ccbr.utoronto.ca/imageanalysis/48f3b5f6-5f0e-494f-b528-e48ca57d06ae`

Note: data is only saved on our server for 30 days. After that time period, results will no longer be available
SGAtools relies on new web technologies for layout and visualization, as well as data pre-processing and validation. Some of the required features are only available in modern browsers. We have tested SGAtools extensively on Google Chrome Version 26.0.1410.43 and Mozilla Firefox Version 19.0.2. Please update your browser to at least one of the recommended versions to use SGAtools.
SGAtools normalization and scoring was developed in R. You can download R from here. The SGAtools code can be found in `SGAtools.R` and has sufficient commenting to get you started.
There are several software suites available for quantifying colony sizes from images and performing analyses on them. SGAtools combines the commonly used steps into a single web-based workflow, and implements the applicable normalization methods from Baryshnikova et al. (2010). For further reading and alternative methods, follow the links below:

There are three main parts to the SGAtools pipeline: Image analysis, Normalization & scoring, and Data analysis & visualization. There are two possible start points in the pipeline: A, starting out with plate images, and B, starting out with quantified colony sizes generated by an alternate image analysis software. It is highly recommended that input files at either start point are named using a certain convention in order to maximize functionality.

The tabs on the right contain further details on each of the steps of the SGAtools pipeline.

For analyzing the data from your plates, SGAtools needs additional information about their contents. First, to score the plates after image analysis, the "case" plates (double mutant, drug treatment, etc.) are compared against the control, or wild type, plates, and have to be labeled as such. Second, to filter out genes linked to the query strain, the query gene name is needed. Finally, if an entire array of plates is screened, the number of the plate in the array has to be provided to link the right case and control plates, as well as to determine which genes are on each plate. In summary, SGAtools needs:

• Screen type ― control vs case
• Query name / ORF
• Array plate id

This extra information can be given in the filename, which should have the format:

`{username}_{screentype}_{query}_{arrayplateid} …`

such that the screen type name is prefixed by `wt` or `ctrl` to indicate a control plate. Anything else will define the plate as a case

You can select which array was used in the screen on the normalization / scoring page.

`michael_double-mutant_YDL108W_1_boone_15-12-12.jpg`

Indicates a `case` screen carried out by `michael` with the screen having the query `YDL108W` and an array plate id of `1`

`charlie_ctrl_YOR341W_9_sga_3-10-11.jpg`

Indicates a `control` screen carried out by `charlie` with the screen having the query `YOR341W` and an array plate id of `9`

Note:
• It is not mandatory to name the files in the mentioned format. Files named with a format other than the one mentioned above will not be mapped to a query/array ORF and will not be scored
• The first part of the file name before plate type definition does not need to be the user name, but can be any string that does not contain underscores.
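
The naming convention above can be checked with a short script before uploading. This is a minimal Python sketch, not the parser SGAtools uses: the function name `parse_plate_filename` is hypothetical, and it assumes the array plate id is an integer and that everything after the fourth underscore-separated field is free text.

```python
import os


def parse_plate_filename(filename):
    """Split a plate file name into its naming-convention fields.

    Returns None when the name does not fit the convention.
    Hypothetical helper for illustration only.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("_")
    if len(parts) < 4:
        return None
    username, screentype, query, plate = parts[:4]
    if not plate.isdigit():  # assumption: plate ids are whole numbers
        return None
    return {
        "username": username,
        "screentype": screentype,
        # A screen type prefixed by "wt" or "ctrl" marks a control plate.
        "is_control": screentype.startswith(("wt", "ctrl")),
        "query": query,
        "arrayplateid": int(plate),
    }


case = parse_plate_filename("michael_double-mutant_YDL108W_1_boone_15-12-12.jpg")
control = parse_plate_filename("charlie_ctrl_YOR341W_9_sga_3-10-11.jpg")
```

Here `case["is_control"]` is `False` with query `YDL108W` on plate 1, and `control["is_control"]` is `True` with query `YOR341W` on plate 9, matching the two examples above.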

Image analysis is the first step of the SGAtools pipeline and involves fitting a grid onto your plate images, then using the bounds of the grid to quantify the colony size using pixel intensities

Input

1. Plate images

Images of your screen in `.jpg` format. Images are expected to have a resolution of approximately 160 dots per inch (dpi) and should either be cropped to the size of the experimental plate (i.e. must have plate edges) or have a black background outside of the plate. Note: the higher the image resolution, the more accurate the quantified colony size.

2. Plate format

SGAtools supports 4 core screen formats

| Screen format | Number of rows | Number of columns |
|---|---|---|
| 1536 | 32 | 48 |
| 768 (diagonal replicates) | 16 | 24 |
| 384 | 16 | 24 |
| 96 | 8 | 12 |

3. Crop option

• Automatically choose method (recommended): The program will automatically choose between the other options.
• Always autodetect plate edges: The program will locate the edges of the plate by finding the edge of the black background surrounding the plate.
• Images are already cropped to plate edges: The program will assume that the images have already been cropped to the edges of the agar plate.

4. Noise removal

Flag indicating if noise/speckles should be removed from the thresholded image prior to analysis.

5. Autorotate

Flag indicating if the image should be auto-rotated prior to processing. Only select this option if the image is extremely rotated. gitter is able to handle small variations in rotation (1-2 degrees) without auto-rotating.

6. Inverse

Flag indicating if input image is inverted, meaning colonies are darker compared to their background.

Type Example image
Bright colonies on darker background.
Dark colonies on lighter background

Output

After your images have been processed, you will be directed to a page containing a summary of the analysis. This page allows you to:

The resulting files will be space-delimited and will have the following columns
1. Row
2. Column
3. Quantified colony size
4. Circularity
5. Median
```
# This is a comment line and is ignored
# row	column	size-1	circ-1	MedInt-1
1 1 277 0.925 228
1 2 171 0.944 229
1 3 127 0.929 234
1 4 156 0.959 238
...
```
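
If you want to inspect these files outside SGAtools, they are straightforward to read. The following Python sketch assumes only what is described above (whitespace-separated fields, `#` comment lines, and the five listed columns); the function name `read_colony_file` and the dictionary keys are choices made for this example.

```python
import io


def read_colony_file(handle):
    """Read quantified colony data, skipping blank and '#' comment lines."""
    records = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # splits on any run of spaces or tabs
        record = {
            "row": int(fields[0]),
            "column": int(fields[1]),
            "size": float(fields[2]),
        }
        # Circularity and median intensity are present in SGAtools output
        # but may be missing in files produced by other tools.
        if len(fields) > 3:
            record["circularity"] = float(fields[3])
        if len(fields) > 4:
            record["median_intensity"] = float(fields[4])
        records.append(record)
    return records


sample = io.StringIO(
    "# This is a comment line and is ignored\n"
    "1 1 277 0.925 228\n"
    "1 2 171 0.944 229\n"
)
colonies = read_colony_file(sample)
```

For a file on disk, pass an open file object instead of the `io.StringIO` sample.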

Review the gridding applied to your images
The gridded image will be displayed. Hover over the output image to zoom in on specific colonies and confirm that the gridding is correct.

Proceed to normalization & scoring
To proceed to normalization, select the desired plates you would like normalized and/or scored and click normalize and score. This will direct you to the normalization & scoring page with your analyzed image data preloaded

Input

1. Plate files

The main input to normalization & scoring is the quantified colony sizes in a space-delimited file with 3 or more columns as follows:

1. Row
2. Column
3. Quantified colony size
```
# This is a commented line and is ignored
# rows	columns	size	circularity:
1	1	2205	0.981750
1	2	1734	1.065585
1	3	1996	1.057621
1	4	1704	1.032656
1	5	1755	1.109302
...
```
Note: In the example above, the circularity column and any column that follows it are ignored

2. Plate layout file

This file maps each row and column position to a gene name. If no array layout file is selected, the array column in the results will contain numbers, each representing a group of replicates.
There are two ways of selecting an array layout file:

Selecting a predefined plate layout
You can select between commonly used arrays from the drop down menu. The number of the plate detected from the file name (e.g. plate 4 for `user_ctrl_YOR101W_4_130301.jpg`) is matched to the corresponding plate in the array, and the output is populated with the names of the strains in the plate.
Uploading a custom plate layout
Select the "Upload custom" option for plate definition, and upload your file. The format for the plate definition file is three tab-delimited columns:
1. Column
2. Row
3. Gene name
A name is assigned to each replicate group. The example below shows the first lines of an array layout file.
```
# This is a comment line and is ignored
# c	r	Gene
1	1	HIS3
1	2	HIS3
1	3	HIS3
2	1	HIS3
2	2	RKM3
2	3	YPK3
3	1	HIS3
3	2	TPS1
3	3	GAL1
...
```
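
To sanity-check a custom layout file before uploading, it can be loaded into a lookup table. This Python sketch is an illustration only; the function name `read_plate_layout` is hypothetical. Note that the file lists the column first and the row second, while the lookup below is keyed by (row, column).

```python
import io


def read_plate_layout(handle):
    """Map (row, column) positions to gene names from a layout file."""
    layout = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        # File order is: column, row, gene name (tab-delimited).
        column, row, gene = line.rstrip("\n").split("\t")[:3]
        layout[(int(row), int(column))] = gene.strip()
    return layout


sample = io.StringIO(
    "# c\tr\tGene\n"
    "1\t1\tHIS3\n"
    "2\t2\tRKM3\n"
    "2\t3\tYPK3\n"
)
layout = read_plate_layout(sample)
```

With this sample, `layout[(3, 2)]` is `YPK3`: row 3, column 2.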

3. Replicates

This is the number of replicates in the experiment.
A value of `4` indicates replicates are in quadruplicate
A value of `1` indicates there is only one replicate

4. Linkage correction

This step excludes from the analysis any interaction between genes that lie within a given distance of one another on the same chromosome, since such interactions are considered artifacts of linkage. The distance is provided in kilobases (kb) as the linkage cutoff.
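
The idea behind the linkage cutoff can be sketched in a few lines of Python. This is not the SGAtools implementation: the function name `is_linked`, the representation of a gene as a (chromosome, base-pair position) pair, and the use of a single coordinate per gene are all assumptions made for illustration. The 200 kb default is the value recommended in this help.

```python
def is_linked(query_pos, array_pos, cutoff_kb=200):
    """Return True when two genes are on the same chromosome within the cutoff.

    Each position is a hypothetical (chromosome, base_pair_coordinate) tuple.
    """
    query_chrom, query_bp = query_pos
    array_chrom, array_bp = array_pos
    if query_chrom != array_chrom:
        return False  # genes on different chromosomes are never linked
    return abs(query_bp - array_bp) <= cutoff_kb * 1000


near = is_linked(("IV", 100_000), ("IV", 250_000))       # 150 kb apart
far = is_linked(("IV", 100_000), ("IV", 400_000))        # 300 kb apart
other_chrom = is_linked(("IV", 100_000), ("V", 120_000)) # different chromosome
```

Only the first pair falls inside the 200 kb window, so only it would be excluded from scoring under these assumptions.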

5. Score results

If this option is selected, a score will be computed for non-control plates with a corresponding control plate of the same array plate id. There are two ways SGAtools can score screens:
1. $C_{ij} - C_{i}C_{j}$
2. $C_{ij} / C_{i}C_{j}$
where $C_{ij}$ represents the fitness of the double mutant, $C_{i}$ the single mutant fitness of the query, and $C_{j}$ the single mutant fitness of the array.
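
As a worked example of the two scoring options, here is a minimal Python sketch. The function names are hypothetical, the second formula is read as the double mutant fitness divided by the product of the single mutant fitnesses, and the fitness values are made up. How SGAtools estimates the individual fitness terms from the plates is not shown here.

```python
def score_difference(c_ij, c_i, c_j):
    """Option 1: observed double mutant fitness minus the expected product."""
    return c_ij - c_i * c_j


def score_ratio(c_ij, c_i, c_j):
    """Option 2: observed double mutant fitness relative to the expected product."""
    return c_ij / (c_i * c_j)


# Made-up fitness values: the expected double mutant fitness is 0.8 * 0.9 = 0.72.
c_ij, c_i, c_j = 0.5, 0.8, 0.9
diff = score_difference(c_ij, c_i, c_j)  # about -0.22 (negative interaction)
ratio = score_ratio(c_ij, c_i, c_j)      # about 0.69 (below 1, same direction)
```

Under the difference model a score of zero means no interaction; under the ratio model the neutral value is one.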

Output

After your data has been normalized and/or scored, you will be directed to a page containing a summary of the analysis. You will be able to:

These files will be tab-delimited with 9 columns as follows:
1. Row: the row of the colony
2. Column: the column of the colony
3. Raw colony size: the size of the colony as quantified by the image analysis software
4. Plate id: unique id for this plate, set as file name
5. Query gene name/ORF: Name of the query ORF if image/dat files follow conventional file naming (see file naming in help). If they do not, a value of '1' is placed as the query ORF
6. Array gene name/ORF: Name of the array ORF if plate layout file supplied. If not, a unique value is assigned to each group of replicate arrays
7. Normalized colony size: the raw colony size after normalization. The size is relative to plate median colony size, and a proxy for fitness. Normalized value of 1 is as fit as the average strain, 1.3 means it is 30% fitter than the average strain, and 0.4 that it's 40% as fit as the average strain.
8. Score: the colony fitness score computed using the normalized colony size (7) and the corresponding normalized colony size in the control screen
9. Additional information as key-value pairs
```
# This is a comment line and is ignored
# row	col	size	plateid	query	array	norm	score	kvp
3	6	196	file-name	Y8835	YBR138C	523.39	-0.008	NA
3	7	173	file-name	Y8835	YBR028C	448.31	-0.0975	NA
3	8	205	file-name	Y8835	YBR028C	526.61	0.0557	NA
3	9	181	file-name	Y8835	YBR137W	NA	NA	status=JK
3	10	198	file-name	Y8835	YBR137W	520.88	0.0647	NA
3	11	186	file-name	Y8835	YBR027C	489.08	-0.044	NA
3	12	191	file-name	Y8835	YBR027C	501.44	-0.020	NA
3	13	172	file-name	Y8835	YBR134W	NA	NA	status=JK
3	14	204	file-name	Y8835	YBR134W	534.83	0.0286	NA
...
```
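
A line of this output can be parsed as follows. This Python sketch is illustrative only: the function name `parse_result_line` and the dictionary keys are choices made here, and it assumes the last column holds either `NA` or a single `key=value` pair as in the example above (the separator for multiple pairs is not specified, so it is not handled).

```python
def parse_result_line(line):
    """Parse one tab-delimited line of normalization & scoring output."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-delimited columns, got %d" % len(fields))
    row, col, size, plate_id, query, array, norm, score, kvp = fields

    def number(text):
        # 'NA' marks a missing value, e.g. a colony removed by a filter.
        return None if text == "NA" else float(text)

    info = {}
    if kvp != "NA":
        key, _, value = kvp.partition("=")  # assumption: a single key=value pair
        info[key] = value
    return {
        "row": int(row), "column": int(col), "size": number(size),
        "plate_id": plate_id, "query": query, "array": array,
        "normalized_size": number(norm), "score": number(score), "info": info,
    }


scored = parse_result_line("3\t6\t196\tfile-name\tY8835\tYBR138C\t523.39\t-0.008\tNA")
filtered = parse_result_line("3\t9\t181\tfile-name\tY8835\tYBR137W\tNA\tNA\tstatus=JK")
```

The second line was removed by the jackknife filter, so its normalized size and score come back as `None` and its `info` holds `{"status": "JK"}`.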

Additional information returned from SGAtools includes status codes for colonies that did not meet a filter. The codes and their corresponding descriptions are listed below:

| Status code | Description |
|---|---|
| SD | Standard deviation of scores (in the combined file) |
| PV | P-value gives a measure of reproducibility of the effect across the replicate colonies (in the combined file) |
| LK | Linkage correction: The array exists too close to the query on the chromosome |
| JK | Jackknife filter: This colony induces too much variance in the sizes of other colonies in the replicate group |
| BG | Big replicates: At least three colonies of this replicate are too large. The whole replicate is excluded |
| CP | Cap: Normalized colony size was too large (> 1000) and was capped at 1000 |

To proceed to data analysis, click data analysis. This will direct you to the data analysis page

This is the final step of the SGAtools pipeline and involves visualizing your processed data. There are two visualizations available:

1. Heatmap: If you analyzed your images with SGAtools, your plate will appear adjacent to a heatmap of your processed data. Hover over the heatmap for details of colonies

2. Histogram: An interactive histogram of the data is shown. Click and drag the histogram to create a window of data you would like to inspect

There are 3 types of processed data you can visualize:

1. Raw colony sizes: The colony sizes as quantified by the image analysis software
2. Normalized colony sizes: The colony sizes normalized by SGAtools
3. Score: Colony score against control plate (this is only available if the plate is not a control and sufficient data was provided to score)

Select a specific plate from the drop down and the type of data you would like to visualize. You can also adjust the range of values to be displayed.