MCW

Tutorial

IPDR is designed to help researchers quickly design primers and probes for identifying specific subsets of influenza strains. This tutorial will guide you through the entire process of using IPDR so that you may get the most out of all the website has to offer.

Searching the Database

To start using the website, search the database for the specific subset of sequences you are looking for. You can search using any combination of the following parameters: virus species, host, region/country, segment, H/N type, subtype, years, length, and name. For any parameter presented as a list, you can select multiple values by holding down CTRL while clicking. To deselect a value, hold down CTRL and click it again. Leaving any parameter blank automatically selects all values for that parameter.
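The rule that a blank parameter selects all values can be sketched as a small filter. This is only an illustration of the behavior described above, not IPDR's actual code; the field names and records are hypothetical.

```python
def matches(record, criteria):
    """Return True when the record satisfies every non-empty criterion.

    criteria maps a field name to a list of accepted values.
    An empty list stands for a parameter left blank, which accepts anything.
    """
    for field, accepted in criteria.items():
        if accepted and record.get(field) not in accepted:
            return False
    return True


# Hypothetical records and field names, for illustration only.
records = [
    {"host": "Human", "segment": "HA", "country": "USA"},
    {"host": "Avian", "segment": "HA", "country": "China"},
    {"host": "Avian", "segment": "NA", "country": "USA"},
]

# Segment is left blank, so it places no restriction on the results.
criteria = {"host": ["Avian"], "segment": [], "country": ["USA", "China"]}
hits = [r for r in records if matches(r, criteria)]
print(len(hits))  # 2
```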

[Figure: Search page]

H/N Type and Subtypes
These two parameters both search the subtype of the influenza virus but they do it in slightly different ways. The H/N Type parameter is a list of all possible H and N types. If you select just an H type you will get all of the sequences for that H type no matter what the N type. The same is true if you select only an N type. If you select both an H type and an N type you will get only the sequences for that subtype.

Example:

H type | N type | Subtype
H3 | N2 | H3N2

If you select multiple H types and multiple N types you will get every possible combination of the two types.

Example:

H type | N type | Subtype
H3, H5 | N1, N2 | H3N1, H3N2, H5N1, H5N2
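The way multiple H and N selections expand into subtypes amounts to a Cartesian product of the two lists. The sketch below illustrates that with a hypothetical helper named `expand_subtypes`; it does not cover the case where only an H type or only an N type is selected.

```python
from itertools import product


def expand_subtypes(h_types, n_types):
    """Return every H/N combination, in the order the lists were given."""
    return [h + n for h, n in product(h_types, n_types)]


combos = expand_subtypes(["H3", "H5"], ["N1", "N2"])
print(combos)  # ['H3N1', 'H3N2', 'H5N1', 'H5N2']
```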

If you want to select only a few specific subtypes, then it is better to type them into the Subtypes box. When using the Subtypes box you will only get the specific subtypes that you entered.

Years and Length
Both of these parameters accept a beginning and an ending value. You can enter both values or just one of them. If only the beginning value is entered, it acts as a minimum (year or length). If only the ending value is entered, it acts as a maximum.
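A minimal sketch of this optional-bound behavior follows. It assumes both bounds are inclusive, which the tutorial does not state, and the helper name `in_range` is hypothetical.

```python
def in_range(value, start=None, end=None):
    """Return True when value satisfies whichever bounds were supplied.

    Assumption: both bounds are inclusive.
    """
    if start is not None and value < start:
        return False
    if end is not None and value > end:
        return False
    return True


years = [1999, 2005, 2009, 2012]
print([y for y in years if in_range(y, start=2005)])            # [2005, 2009, 2012]
print([y for y in years if in_range(y, end=2005)])              # [1999, 2005]
print([y for y in years if in_range(y, start=2005, end=2009)])  # [2005, 2009]
```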

Name
This parameter searches the name of the virus strain. Use it only if you know the name of the specific strain you are looking for. Be careful when searching by name, because the host and city names that usually appear in a strain name are commonly abbreviated (e.g., New York = NY).

The Search Results

After submitting your search you will be taken to the search results page. This page lists the search parameters used, the number of results found, and a table of your search results with the following information about each sequence: Accession, Subtype, Segment, Host, Country, Year, Length, and Strain. You may analyze all of the results or only some of them by checking the box next to each record you are interested in. By default, if none of the boxes are checked, all of the results will be used in any further analysis. You can also download the results as shown in the table, with the sequence added, as a CSV file, or download just the sequences in FASTA format. To continue the analysis, click Setup Alignment.
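If you download the sequences in FASTA format, they can be read with a few lines of code. The parser below is a minimal sketch that handles only the basic FASTA layout (a '>' header line followed by sequence lines); the headers and sequences in the example are made up and do not reflect IPDR's actual header format.

```python
def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header] += line
    return records


# Hypothetical FASTA content, for illustration only.
example = """>seq1 hypothetical header
ATGGCA
TTAG
>seq2 hypothetical header
ATGC
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```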

[Figure: Search Results page]

Setting up the Alignment Analysis

On the next page, you set up the parameters used to determine the consensus sequence for the alignment and to run the Primer3 analysis.

Consensus Sequence Parameters
For calculating the consensus sequence from the alignment there are three parameters that can be adjusted.

[Figure: Consensus Sequence Parameters form]

The Percent Conservation is used to determine whether a specific nucleotide is conserved at each position in the alignment. For example, if the percent conservation is 90% and 95% of the sequences in the alignment have a 'G' at the position being analyzed, then there will be a 'G' at that position in the consensus sequence. If instead no nucleotide is present in more than 90% of the sequences at that position, the consensus sequence will contain an 'N', meaning that the position is unconserved. You may enter multiple percent conservation values separated by the '+' sign.
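The thresholding rule can be sketched for a single alignment column as follows. This is an illustration under stated assumptions, not IPDR's implementation: the column contains no gaps, a nucleotide that exactly meets the threshold counts as conserved, and the function name `consensus_base` is hypothetical.

```python
from collections import Counter


def consensus_base(column, percent_conservation):
    """Return the conserved nucleotide for one alignment column, or 'N'.

    Assumptions: the column has no gaps, and meeting the threshold
    exactly is enough to count as conserved.
    """
    base, count = Counter(column).most_common(1)[0]
    percent = 100.0 * count / len(column)
    return base if percent >= percent_conservation else "N"


print(consensus_base("G" * 19 + "A", 90))      # G  (95% of sequences have G)
print(consensus_base("G" * 17 + "A" * 3, 90))  # N  (only 85% have G)

# Several thresholds entered in one field, separated by '+'.
thresholds = [float(p) for p in "90+95".split("+")]
print(thresholds)  # [90.0, 95.0]
```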

The Gap Cutoff is used to determine if a position in the alignment should be considered a gap in the alignment, which is represented by a '-'. Frequently when performing alignments there will be a small percentage of sequences (usually 1-5%) that contain insertions (real or artificial) relative to the majority of sequences in the alignment, which creates a gap in the alignment. The Gap Cutoff is used to prevent these insertions from becoming part of the consensus sequences. Additionally the cutoff can be used to eliminate other regions of the alignment that have low amounts of sequence information from the consensus sequence, like the extreme ends of the gene segments which are often incompletely sequenced. The way the Gap Cutoff works is that its value represents the minimum percentage of sequences required to have a nucleotide at a position in the alignment in order for that position to not be considered a gap in the consensus sequence. For example, if the Gap Cutoff was 20% (the default) and at the position being analyzed only 15% of the sequences had any nucleotide then that position would be considered a gap in the consensus sequence.
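The Gap Cutoff test for one alignment column can be sketched like this. The function name `is_gap_position` is hypothetical, and the sketch assumes that a column meeting the cutoff exactly is not treated as a gap, since the cutoff is described as a minimum.

```python
def is_gap_position(column, gap_cutoff=20.0):
    """Return True when too few sequences carry a nucleotide in this column.

    gap_cutoff is the minimum percentage of sequences that must have a
    nucleotide for the position not to be treated as a gap.
    """
    with_nucleotide = sum(1 for ch in column if ch != "-")
    percent_with_nucleotide = 100.0 * with_nucleotide / len(column)
    return percent_with_nucleotide < gap_cutoff


print(is_gap_position("A" * 3 + "-" * 17))  # True  (15% have a nucleotide)
print(is_gap_position("A" * 5 + "-" * 15))  # False (25% have a nucleotide)
```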

In order to understand the last consensus sequence parameter, it is necessary to first look at how the percent conservation is calculated at each position in the alignment when determining the consensus sequence. The following formula is used by default:

Percent Conservation = (# of Sequences with Nucleotide) / (Total # of Sequences - # of Sequences with Gap)

This formula excludes the sequences without a nucleotide at the position being analyzed (# of Sequences with Gap) because they artificially reduce the percent conservation of the nucleotides. Here is an example of why this is important. Let's say that we are looking for a 90% consensus sequence, and at the position we are looking at only 80% of the sequences have a nucleotide and all of them are a 'G'. Using the formula above, the percent conservation at that position would be 100% 'G', and there would be a 'G' in the consensus sequence. In most cases this is what a person would expect and want to happen. However, if the sequences without a nucleotide were not subtracted, the percent conservation would be 80% 'G' and there would be an 'N' in the consensus sequence. We call this unadjusted value the Absolute Percent Conservation. To have the program use it when determining the consensus sequence, check the box next to Use Absolute Percent Conservation.
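The difference between the default and the absolute calculation can be checked with a short sketch that reproduces the 80% example. The function name and its `absolute` flag are hypothetical and stand in for the Use Absolute Percent Conservation checkbox.

```python
def percent_conservation(column, base, absolute=False):
    """Percent of sequences with `base` at this column.

    By default, sequences with a gap ('-') are left out of the denominator.
    With absolute=True, all sequences are counted.
    """
    with_base = column.count(base)
    gaps = column.count("-")
    denominator = len(column) if absolute else len(column) - gaps
    if denominator == 0:
        return 0.0
    return 100.0 * with_base / denominator


# 10 sequences: 8 have 'G' and 2 have a gap at this position.
column = "G" * 8 + "-" * 2
print(percent_conservation(column, "G"))                 # 100.0
print(percent_conservation(column, "G", absolute=True))  # 80.0
```

With a 90% threshold, the first value yields a 'G' in the consensus sequence and the second yields an 'N'.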

Primer3 Parameters
For each consensus sequence that will be generated a Primer3 analysis will also be performed. Primer3 is a program that is commonly used to find primers and probes and is useful because it allows you to specify a variety of parameters that will allow the program to select primers that are specific to your interests. You can get more information about Primer3's parameters at the program's website: http://frodo.wi.mit.edu/primer3/input-help.htm
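For readers who want to run Primer3 themselves on a consensus sequence, the sketch below builds an input record in Primer3's Boulder-IO format (one TAG=VALUE per line, ending with a line containing only '='). The tag names follow the current Primer3 manual and may differ from the version used by IPDR. The sequence, identifier, and values are illustrative assumptions, not IPDR's defaults.

```python
def primer3_record(seq_id, template, settings):
    """Build one Boulder-IO record as text.

    settings maps Primer3 tag names to values. Tag names are taken from
    the current Primer3 manual; older releases used different names.
    """
    lines = [f"SEQUENCE_ID={seq_id}", f"SEQUENCE_TEMPLATE={template}"]
    lines += [f"{tag}={value}" for tag, value in settings.items()]
    lines.append("=")  # record terminator
    return "\n".join(lines) + "\n"


# Illustrative values only.
record = primer3_record(
    "consensus_90",
    "ATGAAGGCAATACTAGTAGTTCTGCTATATACATTTGCAACCGCAAATGCAGACACATTATG",
    {
        "PRIMER_PICK_INTERNAL_OLIGO": 1,  # also pick a probe
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_MAX_SIZE": 25,
        "PRIMER_PRODUCT_SIZE_RANGE": "75-150",
    },
)
print(record)
```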

[Figure: Primer3 Parameters form]

The IPDR Results Page

When IPDR is done with your analysis you will receive an email with a link to your results. On this page will be three different sets of results.

Sequence Conservation Results
The first set of results shows the sequence conservation as determined from the alignment, which may be downloaded by clicking on the Download Alignment link. The results table provides several kinds of information. First, there is a consensus sequence for each of the percent conservation values that were entered. With each consensus sequence there is also a conserved sequence, which is produced by replacing regions of the consensus sequence that have many unconserved residues with a '-'. It is designed to make it easier to find conserved regions in the consensus sequence. Next, you will find the most conserved sequence, which displays the most conserved residue at each position in the alignment. This is followed by a bar graph of the absolute percent conservation of the most conserved nucleotide at each position. Underneath the graph is the absolute percent conservation for each nucleotide. The final information in the table is the gap fraction of each position, which shows the percentage of sequences that do not have a nucleotide at that position in the alignment.
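Two of the rows described above, the most conserved sequence and the gap fraction, can be sketched per alignment column as follows. This is an illustration only: the alignment is made up, the function name `column_summary` is hypothetical, and how IPDR breaks ties between equally common residues is not stated in this tutorial.

```python
from collections import Counter


def column_summary(alignment):
    """Return (most_conserved_sequence, gap_fraction_per_position).

    alignment is a list of equal-length aligned sequences, with '-' for gaps.
    Gap fraction is the percentage of sequences lacking a nucleotide.
    """
    most_conserved = []
    gap_fraction = []
    for column in zip(*alignment):
        bases = [ch for ch in column if ch != "-"]
        gap_fraction.append(100.0 * (len(column) - len(bases)) / len(column))
        if bases:
            most_conserved.append(Counter(bases).most_common(1)[0][0])
        else:
            most_conserved.append("-")
    return "".join(most_conserved), gap_fraction


# Made-up alignment of four sequences.
alignment = ["ATG-", "ATGC", "ACG-", "ATG-"]
sequence, gaps = column_summary(alignment)
print(sequence)  # ATGC
print(gaps)      # [0.0, 0.0, 0.0, 75.0]
```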

[Figure: Sequence Conservation Results table]

Primer/Probe Database BLAST Results
For the second analysis, each of the consensus sequences is BLASTed against a database of published influenza primers and probes. This database also includes several primers and probes that we have used in our laboratory, which have Lab listed as their reference. This analysis is useful for identifying primers or probes that have already been experimentally tested and that you may be able to use for the sequences you are interested in. The results are displayed in both graphical and table form. In the graphical form, the consensus sequence is displayed with the matching primers aligned to it. Each primer is represented by its name, a colored bar, and the sequence of the matching portion of the primer. The color of the bar represents the score of the BLAST hit: the darker the blue, the higher the score.

[Figure: BLAST results, graphical view]

The summary table provides both BLAST information and published information about what the primer was originally used for and how it was designed. When looking at the sequence in the summary table, you may notice that sometimes part of the sequence is in capital letters and the rest is in lower case. Because in some cases only part of the sequence will match in BLAST, the upper case part of the sequence corresponds to the BLAST hit. It is also important to understand that the values in the fields under "Published Primer Information" only represent information from the publication in which the primer was originally described, and in many instances will not match your specific set of sequences. For example, a primer designed for the NS gene segment may match a set of HA sequences. However, all of the "BLAST Information" does correspond directly to your consensus sequence.
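If you copy a primer sequence out of the summary table, the matched portion can be pulled out by looking for the upper case run. The sketch below assumes a single contiguous hit and returns the longest upper case run; the primer sequence shown is made up.

```python
import re


def blast_hit_portion(primer_sequence):
    """Return the longest run of upper case letters, or '' if there is none."""
    runs = re.findall(r"[A-Z]+", primer_sequence)
    return max(runs, key=len) if runs else ""


# Made-up primer: lower case ends did not match, upper case middle did.
print(blast_hit_portion("agctGGATCCAAGTTcga"))  # GGATCCAAGTT
```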

[Figure: BLAST results, summary table]

Primer3 Results
The Primer3 results are displayed in a similar manner with both a graphical view and a summary table for each consensus sequence. In the graphical view each set of oligos is displayed aligned with the consensus sequence. In the summary table each oligo is listed with its start position, length, melting temperature, GC percent, self complementarity, 3' self complementarity, and sequence. Also for each set the product size, pair complementarity, and pair 3' complementarity are listed.
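Of the values listed in the summary table, length and GC percent are simple to recompute for any oligo, which can be a useful sanity check when copying sequences elsewhere. The sketch below uses an illustrative oligo and a hypothetical helper name. Melting temperature and the complementarity scores depend on Primer3's own models and are not reproduced here.

```python
def gc_percent(oligo):
    """Return the percentage of G and C bases in the oligo."""
    oligo = oligo.upper()
    gc = oligo.count("G") + oligo.count("C")
    return 100.0 * gc / len(oligo)


oligo = "AGCAAAAGCAGG"  # illustrative 12-mer
print(len(oligo))         # 12
print(gc_percent(oligo))  # 50.0
```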

[Figure: Primer3 results, graphical view]

[Figure: Primer3 results, summary table]