Unipro UGENE User Manual
Version 1.16.0
About Unipro
Established in 1992, the Unipro company has its headquarters in Novosibirsk Akademgorodok (the home of the Siberian Branch of the Russian
Academy of Sciences). The company's primary activity is IT outsourcing solutions. To learn more about the company, please visit the
company website.
About UGENE
Unipro UGENE is a free cross-platform genome analysis suite. It is distributed under the terms of the GNU General Public License.
To learn more about UGENE, visit the UGENE website.
It works on Windows, Mac OS X or Linux and requires only a few clicks to install.
Key Features
User Interface
High Performance Computing
Cooperation
Key Features
Creating, editing and annotating nucleic acid and protein sequences
Search through online databases: NCBI, ENSEMBL, PDB, SWISS-PROT, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, UniProt (DAS),
Ensembl Human Genes (DAS)
Multiple sequence alignment: ClustalW, ClustalO, MUSCLE, Kalign, MAFFT, T-Coffee
Online and local BLAST and BLAST+ search
Restriction analysis with integrated REBASE restriction enzyme database
Integrated Primer3 package for PCR primers design
Search for direct, inverted and tandem repeats in DNA sequences
Constructing dotplots for nucleic acid sequences
Search for transcription factor binding sites (TFBS) with weight matrix and SITECON algorithms
Aligning short reads with Bowtie, Bowtie 2, BWA, BWA-SW and UGENE Genome Aligner
Contig assembly with CAP3
Search for ORFs
Cloning in silico
3D structure viewer for files in PDB and MMDB formats, anaglyph view support
Protein secondary structure prediction with GOR IV and PSIPRED algorithms
HMMER2 and HMMER3 packages integration
Building (using integrated PHYLIP and MrBayes packages) and viewing phylogenetic trees
Local sequence alignment with optimized Smith-Waterman algorithm
Combining various algorithms into custom workflows with UGENE Workflow Designer
Search for a pattern of various algorithms' results in a nucleic acid sequence with UGENE Query Designer
Visualization of next generation sequencing data (BAM files) using UGENE Assembly Browser
PCR in silico
SPAdes de novo assembler
User Interface
Visual and interactive genome browsing including circular plasmid view
Multiple alignment editor
Chromatograms visualization
3D viewer for files in PDB and MMDB formats with anaglyph stereo mode support
Phylogenetic tree viewer
Easy to use Workflow Designer for custom computational workflows
Easy to use Query Designer for analyzing a nucleotide sequence using different algorithms at the same time
Assembly Browser for visualizing and efficiently browsing large next generation sequence assemblies
Cooperation
Can be used for education purposes in schools and universities
Features to be included into the next release are initiated by users
UGENE team is ready for collaboration in related projects, both free and commercial
System Requirements
The system requirements for UGENE are these:
Operating system (32 or 64 bit):
Windows XP, Windows Vista, Windows 7, Windows 8
Using a zip package it is possible to use UGENE without administrative rights on Windows
Linux
Ubuntu 12.04 or later
Fedora 19 or later
If you have another Linux system, you may use a universal binary package
RAM:
512 MB RAM required
2 GB RAM recommended
Disk space:
Minimum required disk space depends on the UGENE package
Standard package: 200-300 MB
Full package: 500-900 MB
NGS package: 21-24 GB
Display:
It is recommended to set the screen resolution to a value greater than 1280x720.
Internet:
Internet connection is required for some tasks like loading data from online databases.
UGENE takes advantage of your system's capabilities: the more RAM and cores you have, the faster you will get the results of
your calculations.
Also, if you have an OpenCL-capable video card, you can use GPU-optimized versions of the following tools:
Smith-Waterman Search
UGENE Genome Aligner
UGENE Packages
Besides selecting an appropriate package for your operating system (Windows, Mac OS X, or Linux; 32 or 64 bit), you should take into
account the following considerations.
Should I download standard, full, or NGS package?
In most cases the full package is the best choice. Exceptions are:
Use the standard package, if:
If you have administrative rights on Windows, use the installer package. It integrates UGENE more tightly with your Windows system. For
example, it adds associations for bioinformatics formats supported by UGENE, so that the corresponding files are opened in it by default.
I have Linux. Which package should I use?
If your Linux is not Ubuntu or Fedora, then the universal binary package is the only choice. Otherwise, for tighter integration with the system,
you can install UGENE from the corresponding repositories, following these guides:
Native installation on Ubuntu
Native installation on Fedora
Please note that the repositories may be updated a little later than the official UGENE release date.
Installation on Windows
To install UGENE on Windows:
Download UGENE Windows installation package:
Launch the downloaded *.exe file and follow the Unipro Setup wizard:
Be sure that you launch the installer with an administrative Windows account. If you have a problem with the installation, try the
following: right-click on the installer .exe file and select the Run as administrator item.
Alternatively, to use UGENE without installing:
Download UGENE zip package:
Unpack it.
Launch the ugeneui.exe file.
Installation on Mac OS X
Download the Mac OS X disk image file using the appropriate link on the download page:
Launch the *.dmg file and accept the GNU license agreement. The following window will appear:
To start UGENE click on the ugeneui icon. You can also copy UGENE to the Applications folder by dragging it.
Installation on Linux
Download the appropriate version of the installation package (32-bit or 64-bit). The downloaded file has the *.tar.gz extension:
./ugene -ui
./ugene
Several native packages for specific Linux distributions are also available. UGENE is a part of the Ubuntu and Fedora Linux
distributions. See the next chapter.
Now, as a one-off, you should tell your system to pull down the latest list of software from ugene archive it knows about, including
the PPA:
Basic Functions
UGENE Terminology
UGENE Window Components
Welcome Page
Project View
Task View
Log View
Notifications
Main Menu Overview
Creating New Project
Creating Document
Opening Document
Opening for the First Time
Advanced Dialog Options
Opening Document Present in Project
Opening Several Documents
Opening Containing Folder
Exporting Documents
Locked Documents
Using Objects and Object Views
Exporting Objects
Exporting Sequences to Sequence Format
Exporting Sequences as Alignment
Exporting Alignment to Sequence Format
Exporting Nucleic Alignment to Amino Translation
Export Sequences Associated with Annotation
Using Bookmarks
Exporting Project
Options Panel
Adding and Removing Plugins
Searching NCBI Genbank
Fetching Data from Remote Database
UGENE Application Settings
General
Resources
Network
File Format
Directories
Logging
Alignment Color Scheme
External Tools Settings
Genome Aligner
Workflow Designer Settings
OpenCL
UGENE Terminology
Project
Storage for a set of data files and visualization options.
Document
A single file (can be stored on a local hard drive or be a remote web page). Each document contains a set of objects.
Object
A minimal and complete model of biological data. For example: a single sequence, a set of annotations, a multiple sequence
alignment.
Task
A process, usually asynchronous, that works in background. For example: some computations, loading and writing files.
Plugin
A dynamically loaded module that adds new functionality to UGENE.
Object View
A graphical view for a single or a set of objects.
Project View
A visual component used to manage the active project.
Welcome Page
The Welcome Page is the first page that will appear when UGENE has been launched.
From the Welcome Page you can open files, create a sequence, create a workflow, open the Quick Start Guide, and open recent files directly.
To return to the Welcome Page go to the Window->Start Page main menu item.
Project View
The Project View shows documents and bookmarks of the current project. The documents are files added to the project. And the bookmarks
are visual view states of the documents. Read Using Bookmarks to learn more about bookmarks.
To show/hide the Project View, click the Project button in the main UGENE window:
You can also use the Alt+1 hotkey to show/hide the Project View.
To create a new project, refer to Creating New Project. Note that if no project exists when you open a file with a sequence, an
alignment, or any other biological data, a new anonymous project is created automatically.
Task View
The Task View shows active tasks, for example, algorithm computations.
To show/hide the Task View, click the Tasks button in the main UGENE window:
Log View
The Log View shows the program log information.
To show/hide the Log View click the Log button in the main UGENE window:
Notifications
The Notifications component shows notifications for task reports.
If a task has finished without errors, the notification is blue. If an error has occurred during the task execution, the notification is red. If a
warning has occurred during the task execution, the notification is yellow.
To open a task report, click on the corresponding notification. See an example of a task report below:
To remove a notification from the Notifications popup window, click the notification cross button.
Note that you can click on the clip button of the Notifications popup window to show the window always on top.
Description
File
Actions
Settings
Tools
Window
Help
The menus can be dynamically populated with new actions added by plugins. Check the Plugins documentation to learn how each plugin
affects global and context menus.
Here you need to specify the visual name for the project and the directory and file to store it.
After you click the Create button the Project View window is opened.
Creating Document
To create a new sequence file from text, select the File->New document from text main menu item:
Enter the sequence in the Paste data here field:
The following Custom settings are available:
Alphabet: here you can select the alphabet:
Opening Document
UGENE stores information about documents you are working with in a project. Once a document has been opened, the information about it
is saved in the current project.
Opening for the First Time
Advanced Dialog Options
Opening Document Present in Project
Opening Several Documents
Here you can choose how to interpret the data stored in the file. The format is detected automatically, but you can select it manually.
Exporting Documents
If a document has a format that supports writing in UGENE (see the Supported File Formats chapter), you can export the document to a new
document in a required format.
To do it use the Export document item in the context menu:
Here you may select the name of the output file in the Save to file field and, optionally, choose the format of the output file in the File format
field. Use the Compress file checkbox to compress the file. The Add to project checkbox, checked by default, adds the output file to the
current project. After choosing all parameters, click the Export button.
Locked Documents
The lock icon in the document element indicates that the document cannot be modified:
UGENE does not allow modification of documents in some formats that were not created by UGENE.
If UGENE is able only to read a document (see the Supported File Formats chapter), you can export the document objects to a file. To do it
use the built-in export utilities.
Also, you can export the document objects of unlocked documents.
Below is the list of object types supported by the current version of UGENE.
Object types:
Symbol
Icon
Description
[3d]
A 3D model.
[a]
[as]
An assembly.
[c]
Chromatogram data.
[i]
[m]
[s]
[t]
A plain text.
[tr]
A phylogenetic tree.
You can edit names of particular objects, such as sequence objects, by selecting them in the Project View and then pressing F2. To be able
to do so, the document containing the target object must be unlocked.
To see the list of all available views for a given object select the object and activate the context menu inside the Project View window and
select the Open view submenu:
The picture above illustrates an option to visualize the selected DNA sequence object using the Sequence View, a complex and extensible
Object View that focuses on visualization of sequence objects in combination with different kinds of related data: sequence annotations,
graphs, chromatograms, sequence analysis algorithms. Note that the Sequence View is described in more detail in a separate
documentation section.
Exporting Objects
The document objects can be exported into a new document. For more details see the following chapters:
Exporting Sequences to Sequence Format
Exporting Sequences as Alignment
Exporting Alignment to Sequence Format
Exporting Nucleic Alignment to Amino Translation
Export Sequences Associated with Annotation
Here you can select the location of the result file and a sequence file format. You can choose to add the newly created document to the current
project and to use a custom sequence name. To do so, check the corresponding checkboxes.
Use the Conversion options to choose a strand for saving the sequence(s). You can also translate the sequence(s) to the amino alphabet.
It is also possible to specify whether to merge the exported sequences into a single sequence or store them as separate sequences. If you
merge the sequences, you can set the gap length between them. This is the length of the insertion region between
sequences, which is filled with N symbols for nucleic sequences or X symbols for protein sequences.
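The merge option described above can be pictured with a small Python sketch. It is only an illustration, not UGENE's implementation: the function name merge_sequences and its parameters are hypothetical, and it assumes the gap is filled with N for nucleic and X for protein sequences, as the text states.

```python
def merge_sequences(sequences, gap_length, is_protein=False):
    """Join sequences into one, inserting gap_length filler symbols between them.

    Hypothetical helper for illustration only.
    """
    filler = ("X" if is_protein else "N") * gap_length
    return filler.join(sequences)


merged = merge_sequences(["ACGT", "GGCC", "TTAA"], gap_length=3)
print(merged)  # ACGTNNNGGCCNNNTTAA
```

With a gap length of 0 the sequences are simply concatenated.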
Export sequence with annotations
To export a sequence with annotations, choose the Genbank or GFF format. The Export with annotations checkbox will become available. Check the
checkbox, and the sequence will be exported with its annotations.
The Export Sequences as Alignment dialog will appear, where you can specify the location of the result alignment file, select a multiple alignment
file format, use Genbank SOURCE tags as sequence names for Genbank sequences, and optionally add the created document to
the current project:
Here it is possible to specify the result file location, select a sequence file format, define whether to keep or remove gaps ('-' chars) in
the aligned sequences, and optionally add the created document to the current project.
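As a rough picture of the keep-or-remove-gaps choice, the following Python sketch converts alignment rows to plain sequences. It is an assumption-laden illustration rather than UGENE code: the function name alignment_to_sequences is hypothetical, and the gap character is assumed to be '-'.

```python
def alignment_to_sequences(rows, keep_gaps=False, gap_char="-"):
    """Return a dict of sequence name -> sequence, optionally stripping gap characters.

    rows maps each row name to its aligned sequence (hypothetical input layout).
    """
    result = {}
    for name, aligned in rows.items():
        result[name] = aligned if keep_gaps else aligned.replace(gap_char, "")
    return result


alignment = {"seq1": "AC--GT", "seq2": "A-CGGT"}
print(alignment_to_sequences(alignment))
# {'seq1': 'ACGT', 'seq2': 'ACGGT'}
```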
Here it is possible to specify the result file location, select a file format and an amino translation, export the whole alignment or only the selected
rows, and optionally add the created document to the current project.
Here you can select the location of the result file and a sequence file format. You can choose to add the newly created document to the current
project and to use a custom sequence name. To do so, check the corresponding checkboxes.
Use the Conversion options to choose a strand for saving the sequence(s). You can also translate the sequence(s) to the amino alphabet.
It is also possible to specify whether to merge the exported sequences into a single sequence or store them as separate sequences. If you
merge the sequences, you can set the gap length between them. This is the length of the insertion region between
sequences, which is filled with N symbols for nucleic sequences or X symbols for protein sequences.
Export sequence with annotations
To export a sequence with annotations, choose the Genbank or GFF format. The Export with annotations checkbox will become available. Check the
checkbox, and the sequence will be exported with its annotations.
Using Bookmarks
One of the most important features supported by most Object Views is the ability to save and restore the visual view state. Saving and restoring
the visual state of an Object View enables rapid switching between different data regions and is similar to bookmarks used in Web browsers.
Initially an Object View is created as transient, which means that its state is not saved. To save the current state of a view, select the item with the view
name in the Bookmarks part of the Project View window and select the Add bookmark item in the context menu:
For every persistent view UGENE automatically saves the state of the view in the Auto saved bookmark when the view is closed.
Now, by activating bookmarks you can restore the original view state. For example for the Sequence View bookmarks you can store a visual
position and zoom scale for the sequence region.
Use the F2 keyboard shortcut to rename a bookmark. To remove a bookmark press the Delete key.
UGENE has a limited set of built-in Object Views. Extension modules or plugins can be used to adjust the existing views or to add new views
to the tool.
Exporting Project
All the opened documents and bookmarks (along with the corresponding view states) can be saved within a project file. To do so, select
File->Export Project. It will invoke the Export project dialog, where you can select the destination folder and the project file name.
To load a saved project later, select File->Open and specify the path to the project file.
Options Panel
The Options Panel is available in the Sequence View and in the Assembly Browser. By default, it is closed. To open a tab of the Options
Panel click on the corresponding icon at the right side of a Sequence View or Assembly Browser window. To close the tab click again on the
tab icon.
More detailed information about different Options Panel tabs can be found in the following chapters:
Options Panel in Sequence View
Information about Sequence
Search in Sequence
Highlighting Annotations
Options Panel in Assembly Browser
Navigation in Assembly Browser
Assembly Browser Settings
Assembly Statistic
When you select the Remove plugin item for a plugin, the plugin's status is changed to the to remove after restart value. The Remove plugin
item is no longer available in the context menu of the plugin. Instead, the Enable plugin item appears in the context menu:
If you select this item, the plugin will be enabled again, i.e. it will not be removed after restart. Otherwise, the plugin will not be available after
UGENE restarts.
To search the nucleotide or protein databases, enter a general text query in the search field, select the database, and click the
Search button. You can use a protein name, gene name, or gene symbol directly. Searching with a submitter or author name in the following
format will produce the best results.
Use the boolean operator AND to find records that contain every one of your search terms, the intersection of search results.
Use the boolean operator OR to find records that include one of several search terms, the union of search results.
Use the boolean operator NOT to exclude records matching a search term.
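The boolean operators above can be combined into a single query string. The Python sketch below shows one way to assemble such a string; the helper build_query and the search terms are hypothetical, and how the database groups the operators is an assumption, so treat it as an illustration of the idea only.

```python
def build_query(all_of=(), any_of=(), none_of=()):
    """Compose a text query from terms joined with AND, OR and NOT.

    Hypothetical helper: all_of terms must all match, any_of terms are
    alternatives, none_of terms are excluded.
    """
    parts = []
    if all_of:
        parts.append(" AND ".join(all_of))
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {term}"
    return query.strip()


print(build_query(all_of=["human", "insulin"], none_of=["partial"]))
# human AND insulin NOT partial
print(build_query(all_of=["human"], any_of=["BRCA1", "BRCA2"]))
# human AND (BRCA1 OR BRCA2)
```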
To limit results use the Result limit field.
After you click the Search button, UGENE searches for the biological objects and shows them in the Results field. You can then download the object(s):
select one or several objects (hold the Ctrl key to select several) and click the Download button. The dialog will appear:
After you click the OK button, UGENE downloads the biological objects and adds them to the current project.
Here you need to enter the unique ID of the biological object and choose a database. The following databases are available: NCBI Genbank
(DNA sequence), NCBI protein sequence database, ENSEMBL, PDB, SWISS-PROT, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, UniProt
(DAS), Ensembl Human Genes (DAS). Unique identifiers differ between databases. For example, for NCBI GenBank the unique ID
could be an Accession Number or an NCBI GI number. If you select the UniProt (DAS) or Ensembl Human Genes (DAS) database, you can select
the DAS features. For example:
Optionally, you can browse for a directory to save the fetched file to.
After you click the OK button, UGENE downloads the biological object (DNA sequence, protein sequence, 3d model, etc.) and adds it to the
current project.
If something goes wrong, check the Log View; it will help you diagnose the problem.
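For readers who want to see what fetching a record by its unique ID looks like outside the dialog, here is a minimal Python sketch that builds a request URL for the public NCBI E-utilities EFetch service. This is not a description of how UGENE performs the download; the helper name build_fetch_url is hypothetical and the accession is only an example value.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records by ID.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fetch_url(database, record_id, return_type="fasta"):
    """Build an EFetch URL for one record (hypothetical helper, no network access)."""
    params = {
        "db": database,          # e.g. "nuccore" for nucleotide, "protein" for protein
        "id": record_id,         # accession number or other unique ID
        "rettype": return_type,  # requested record format
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


print(build_fetch_url("nuccore", "NC_001363"))
```

The sketch only constructs the URL, so it can be inspected without an Internet connection.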
General
Resources
On the Resources tab you can set resources that can be used by the application: Optimize for CPU count, Tasks memory limit and Threads
limit.
Network
On the Network settings tab of the dialog you can specify Proxy server parameters, select SSL settings and configure the Remote request
timeout.
Preferred Web browser: you can use either the System default browser or specify some other browser.
File Format
The Sequence Annotations setting allows you to use upper/lower case annotations when reading files.
Format options:
1. Don't use case annotations (default mode): usual sequence reading and writing.
2. Use lower case annotation: sequences are read and annotations named lower_case are added. When these sequences are
written to a file, the case of the original file is preserved.
3. Use upper case annotation: the behavior is similar, but upper_case annotations are added.
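To make the idea of case annotations concrete, the Python sketch below finds runs of lower-case or upper-case letters in a sequence and reports them as annotated regions. The function name case_annotations and the 1-based inclusive coordinates are assumptions made for the illustration; only the annotation names lower_case and upper_case come from the text.

```python
import re


def case_annotations(sequence, mode="lower"):
    """Return (name, start, end) tuples for runs of lower- or upper-case letters.

    Coordinates are 1-based and inclusive (an assumption for this sketch).
    """
    pattern = "[a-z]+" if mode == "lower" else "[A-Z]+"
    name = "lower_case" if mode == "lower" else "upper_case"
    return [(name, m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]


print(case_annotations("ACGTacgtTTgg"))
# [('lower_case', 5, 8), ('lower_case', 11, 12)]
print(case_annotations("ACGTacgtTTgg", mode="upper"))
# [('upper_case', 1, 4), ('upper_case', 9, 10)]
```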
Directories
Logging
On the Logging tab you can select the type of log information (ERROR, INFO, DETAILS, TRACE) for each Category that will be output to the
Log View.
You can select the format of each log message by checking the Show date, Show log level and Show log category options.
On the Alignment Color Scheme tab you can create, change and delete custom color schemes.
Genome Aligner
Use this tab to configure the Genome Aligner settings:
OpenCL
If you have a video card that supports OpenCL you can use it to speed up some calculations in UGENE.
To do so, install the latest video card driver and check the corresponding check box:
Now you can, for example, use OpenCL optimization for the Smith-Waterman algorithm.
Sequence View
Sequence View Components
Global Actions
Sequence Toolbar
Sequence Overview
Sequence Zoom View
Managing Zoom View Rows
Sequence Details View
Information about Sequence
Manipulating Sequence
Going To Position
Toggling Views
Capturing Screenshot
Zooming Sequence
Creating New Ruler
Selecting Amino Translation
Showing and Hiding Translations
Selecting Sequence
Copying Sequence
Search in Sequence
Load Patterns from File
Search Algorithm
Search in
Other Settings
Annotations Settings
Editing Sequence
Exporting Selected Sequence Region
Exporting Sequence of Selected Annotations
Locking and Synchronize Ranges of Several Sequences
Multiple Sequence Opening
Annotations Editor
Automatic Annotations Highlighting
The "db_xref" Qualifier
Manipulating Annotations
Creating Annotation
Selecting Annotations
Editing Annotation
Highlighting Annotations
Annotations Color
Annotations Visibility
Show on Translation
Captions on Annotations
Creating and Editing Qualifier
Adding Column for Qualifier
Copying Qualifier Text
Finding Qualifier
Deleting Annotations and Qualifiers
Importing Annotations from CSV
Exporting Annotations
After the view is opened, you can see a set of new buttons in the toolbar area. The actions provided by these buttons are available for all
sequences opened in the view. In the picture below these buttons are indicated by the Global actions arrow.
Below the toolbar there is an area for a single or several sequences. For each sequence a smaller toolbar with actions for the sequence and
the following areas are available:
You can change the focus by clicking on the corresponding sequence area. All sequences that are not in focus have the sequence name and
icon disabled.
The bottom area of the Sequence View is the Annotations editor. It contains a tree-like structure of all annotations available for all sequences
shown in the Sequence View and can be used to perform various actions on annotations: create a new annotation, modify the existing one,
group, sort, etc.
Global Actions
The global actions toolbar lets you go to a specified position (in all sequences at the same time).
It also allows you to lock or adjust the ranges of sequences in the same Sequence View. See this paragraph for details.
Sequence Toolbar
A brief description of the sequence toolbar buttons is shown on the picture below:
See also:
Sequence Overview
The Sequence overview is an area of the Sequence View below the sequence toolbar. It shows the sequence in whole and provides handy
navigation in the Sequence zoom view and the Sequence details view.
When the sigma button (in the right part of the Sequence overview) is pressed, density of annotations in the sequence is shown. For example
in the picture below there are annotations in the parts of the sequence that are marked with dark grey color:
See also:
Sequence Zoom View
Sequence Details View
Below the annotation rows there is a ruler to show coordinates in the sequence.
Managing Zoom View Rows
When the Show All Rows item is checked all available annotations are always shown. You can also add rows by selecting the +5 Rows and
+1 Row items and remove rows by selecting the -5 Rows and -1 Row items. To restore the default number of rows select the Reset Rows
Number item.
See also:
Navigating Sequence zoom view using Sequence overview
Zooming Sequence
Creating New Ruler
Manipulating Annotations
To copy the statistical information about a sequence select it on the Options Panel and choose the copy item in the context menu, or use the
Ctrl+C shortcut.
Manipulating Sequence
Going To Position
Toggling Views
Capturing Screenshot
Zooming Sequence
Creating New Ruler
Selecting Amino Translation
Showing and Hiding Translations
Selecting Sequence
Copying Sequence
Going To Position
To go to a position, use the global actions toolbar:
Or use the Go to position context menu or the Actions main menu item.
Toggling Views
It is possible to switch the Sequence overview, Sequence zoom view and the Sequence details view visibility using the rightmost button in
the toolbar:
The sequence can be removed from the view using the same menu. Once you remove the last sequence in the view, the view is
automatically closed.
Capturing Screenshot
Use the Capture screen button on the sequence toolbar to save a screenshot of the sequence:
Zooming Sequence
To zoom a sequence in the Sequence zoom view, you can use one of the zoom buttons on the sequence toolbar:
There are standard Zoom In and Zoom Out buttons. Additionally you can zoom to a selected region using the Zoom to Selection button. To
restore the default view of the Sequence zoom view (when the sequence is not zoomed) use the Zoom to Whole Sequence button.
The new ruler will be shown right above the default one:
The numbering of the genetic codes corresponds to the NCBI Genbank database numbering.
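As an illustration of what a numbered genetic code means, the Python sketch below translates DNA using two NCBI translation tables: table 1 (the standard code) and table 2 (the vertebrate mitochondrial code). The tables are written in the compact NCBI layout with bases ordered T, C, A, G. The function translate and the dictionary name are hypothetical, and the sketch is not taken from UGENE.

```python
BASES = "TCAG"

# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG, keyed by NCBI table number.
GENETIC_CODES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


def translate(dna, table_id=1):
    """Translate DNA in frame 1 with the given table; unknown codons become X."""
    amino_acids = GENETIC_CODES[table_id]
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        if all(base in BASES for base in codon):
            index = (16 * BASES.index(codon[0])
                     + 4 * BASES.index(codon[1])
                     + BASES.index(codon[2]))
            protein.append(amino_acids[index])
        else:
            protein.append("X")
    return "".join(protein)


print(translate("ATGGCCTGA", table_id=1))  # MA*
print(translate("ATGTGA", table_id=2))     # MW
```

The second call shows why the table number matters: TGA is a stop codon in table 1 but codes for tryptophan in table 2.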
Selecting Sequence
You can use different items from the Select submenu of the context menu to select a sequence.
Selecting the Sequence region context menu item opens the Select range dialog:
Here you can specify the sequence range you would like to select.
You can open the same dialog using the Select sequence region button on a sequence toolbar or using the Ctrl+A shortcut.
To use the Sequence between selected annotations item, select two annotations in the Annotations editor (holding the Ctrl key at the same
time):
Then choose the Select->Sequence between selected annotations item in the context menu.
The Sequence around selected annotations item selects the selected annotations and the sequences between these annotations.
Another way to select a sequence around annotations is to hold Shift and Ctrl keys while clicking on the annotations either in the Sequence
details view or in the Sequence zoom view.
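The difference between the two selection items can be sketched in Python as follows. This reflects one reading of the text, namely that "between" covers only the region separating the two annotations while "around" also includes the annotations themselves; the function names and the 1-based inclusive coordinates are assumptions for the illustration.

```python
def region_between(first, second):
    """Region strictly between two annotations, or None if they touch or overlap."""
    (_, end_a), (start_b, _) = sorted([first, second])
    if end_a + 1 > start_b - 1:
        return None
    return (end_a + 1, start_b - 1)


def region_around(first, second):
    """Region spanning both annotations and everything between them."""
    return (min(first[0], second[0]), max(first[1], second[1]))


gene_a = (10, 20)   # (start, end), 1-based inclusive, made-up coordinates
gene_b = (50, 60)
print(region_between(gene_a, gene_b))  # (21, 49)
print(region_around(gene_a, gene_b))   # (10, 60)
```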
Copying Sequence
The selected sequence region, an annotation sequence or their amino translations can be copied to clipboard:
By pressing the corresponding buttons in the global toolbar.
Search in Sequence
To search for one or more patterns in a sequence, go to the Search in Sequence tab of the Options Panel in the Sequence View.
Enter the value you want to search for in the text field and click the Search button. To search for multiple patterns, enter the patterns separated by a
new line in the pattern text field. To add a new line, use Ctrl+Enter. You can enter the value as a plain sequence or in the FASTA format,
that is, the name of the sequence followed by the sequence.
By default, misc_feature annotations are created for regions that exactly match the pattern. The available settings are described
below.
settings.
Load Patterns from File
Search Algorithm
Search in
Other Settings
Annotations Settings
Use this checkbox to load patterns from file. When this option is active the Search for field is disabled.
Search Algorithm
This group specifies the algorithm that should be used to search for a pattern. The algorithm can be one of the following:
InsDel: there could be insertions and/or deletions, i.e. a pattern and the searched region can vary in their length. You can specify
the percentage of the pattern and a searched region match in the field nearby. Note that this value also depends on the pattern
length and is disabled when the pattern hasn't been specified.
Substitute: a pattern may contain characters different from the characters in the searched region. When this algorithm is
selected, you can also specify the match percentage, and additionally it is possible to take into account ambiguous bases.
Regular expression: a regular expression may be specified instead of a pattern. For example, the character . matches any character, and
.* matches zero or more of any characters. There is also the Limit result length option that specifies the maximum length of a result.
Exact: find places where one or several patterns occur exactly within a larger sequence.
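Two of the algorithms above are easy to picture in code. The Python sketch below implements a substitution-tolerant search with a minimum match percentage and a regular expression search with an optional limit on result length. It is a simplified illustration under stated assumptions, not UGENE's algorithm: the function names are hypothetical, ambiguous bases are not handled, and the way the length limit is applied (discarding longer matches) is a guess.

```python
import re


def substitute_search(sequence, pattern, min_match_percent=100):
    """Return 0-based start positions where at least min_match_percent
    of the pattern characters equal the sequence characters."""
    n, m = len(sequence), len(pattern)
    if m == 0 or m > n:
        return []
    hits = []
    for start in range(n - m + 1):
        window = sequence[start:start + m]
        matches = sum(1 for a, b in zip(window, pattern) if a == b)
        if matches * 100 >= min_match_percent * m:
            hits.append(start)
    return hits


def regex_search(sequence, expression, max_result_length=None):
    """Return (start, end) spans of regex matches, dropping matches longer
    than max_result_length when a limit is given."""
    spans = []
    for match in re.finditer(expression, sequence):
        if max_result_length is None or len(match.group()) <= max_result_length:
            spans.append((match.start(), match.end()))
    return spans


print(substitute_search("ACGTACGA", "ACGT"))       # [0]
print(substitute_search("ACGTACGA", "ACGT", 75))   # [0, 4]
print(regex_search("AAATTTGGG", "T+"))             # [(3, 6)]
```

Lowering the match percentage from 100 to 75 allows one mismatch in a four-letter pattern, which is why the second call also reports position 4.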
Search in
In this group you can specify where to search for a pattern: in what region and in which strand (for nucleotide sequences). Also for nucleotide
sequences it is possible to search for a pattern on the sequence translations.
Strand: for nucleotide sequences only. Specifies on which strand to search for a pattern: Direct, Reverse-complementary or Both strands.
Search in: for nucleotide sequences you can select the Translation value for this option. In this case the input pattern will be searched for in
the amino acid translations.
Region: specifies the sequence range in which to search for a pattern. You can search in the whole sequence, specify a custom region, or
search in the selected region.
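The Strand and Region settings can be sketched in Python as shown below. Searching the reverse-complementary strand is done here by searching the direct strand for the reverse complement of the pattern, which is equivalent. The function names, the strand labels used as strings, and the 0-based half-open region tuple are assumptions of this sketch, not UGENE's API.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Return all (possibly overlapping) 0-based positions of pattern in text."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def find_on_strands(sequence, pattern, strand="both", region=None):
    """Search within region (0-based, end exclusive) on the chosen strand(s)."""
    start, end = region if region else (0, len(sequence))
    window = sequence[start:end]
    hits = []
    if strand in ("direct", "both"):
        hits += [(start + i, "direct") for i in find_all(window, pattern)]
    if strand in ("reverse-complementary", "both"):
        rc = reverse_complement(pattern)
        hits += [(start + i, "reverse-complementary") for i in find_all(window, rc)]
    return sorted(hits)


print(find_on_strands("AAGCTTTTCGAA", "AAGC"))
# [(0, 'direct'), (2, 'reverse-complementary')]
```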
Other Settings
Annotations Settings
In the Save annotation(s) to group of parameters you can choose where to store the annotations. It could be either an existing annotation table
object or a new document (file).
In the Annotation parameters group you can specify:
A group that the found annotations will be stored in (this affects the name of the folder in the Annotations Editor)
The way name(s) of the found annotations are assigned (see below)
After that click the Create annotations button. The annotations will be created. You can also see the result statistics and navigation under the
Search for: field:
Searching for one or several patterns and names of the result annotations
If you search for one pattern only, then input the required name into the Annotation name field and leave the Use pattern name check box
unchecked.
You can also search for several patterns at a time by:
Inputting several patterns into the search field (press Ctrl+Enter to insert a new line):
Editing Sequence
If the document is not locked, it is possible to edit the sequence:
The Edit sequence submenu is available in the Actions main menu and in the Sequence View context menu. Also you can use the
corresponding shortcuts.
When you press the Ctrl+I shortcut or select the Insert subsequence context menu item the following dialog is opened:
It is also possible to remove a selected subsequence from a sequence. When you select the corresponding item (in the context menu or in the
Actions menu), the Remove subsequence dialog appears:
The Export Selected Sequence Region dialog will appear which is similar to the Export Selected Sequences dialog described here.
The Export Sequence of Selected Annotations dialog will appear which is similar to the Export Selected Sequences dialog described here.
Annotations Editor
The Annotations editor contains tools to manipulate annotations for a sequence. It provides a convenient way to organize, view and modify a
single annotation as well as annotation groups.
An annotation for a sequence consists of:
Name (or key) indicates the biological nature of the annotated feature.
Location coordinates in the sequence.
The list of qualifiers: qualifiers are the general mechanism for supplying information about an annotation. Qualifiers are stored as
pairs of (name, value) strings.
Below is the default layout of the Annotations editor with an extra column for the note qualifier added:
There are usually several objects with annotations in the Annotations editor. A special Auto-annotations object is always present for each
sequence opened. It contains annotations automatically calculated for the sequence (see below for details).
An object contains groups of annotations used by UGENE for logical organization of the annotations. An annotation must always belong to
some group.
For documents created not by UGENE annotations are grouped by their names. For annotations created in UGENE it is possible to use
arbitrary group names.
Groups can contain both annotations and other groups. The numbers in the brackets after a group name in the Annotations editor are the
count of subgroups and annotations in the current group.
A single annotation may be present in several groups simultaneously. An annotation is physically removed from the document
when it does not belong to any group.
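The annotation model described above can be sketched as a small data structure. This is only an illustration, not UGENE's internal representation: the names `Annotation`, `Group`, `counts` and `is_orphan` are hypothetical, and the sketch assumes that the bracketed numbers count only the direct children of a group.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One annotation: a name (key), a location and (name, value) qualifiers."""
    name: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class Group:
    """A group holds annotations and nested subgroups."""
    name: str
    annotations: list[Annotation] = field(default_factory=list)
    subgroups: list["Group"] = field(default_factory=list)

    def counts(self) -> tuple[int, int]:
        """Return (subgroups, annotations) held directly by this group."""
        return len(self.subgroups), len(self.annotations)


def is_orphan(annotation: Annotation, root: Group) -> bool:
    """True if the annotation belongs to no group at or below root."""
    if any(a is annotation for a in root.annotations):
        return False
    return all(is_orphan(annotation, g) for g in root.subgroups)
```

In this sketch the same `Annotation` object can be appended to several groups, and `is_orphan` returns True once it has been removed from all of them, which corresponds to the point at which the annotation would be removed from the document.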
Automatic Annotations Highlighting
The "db_xref" Qualifier
To disable/enable the automatic annotations calculations use the Automatic Annotations Highlighting menu button on the Sequence View
toolbar:
To create a permanent annotation click on the Make auto-annotations persistent context menu item and choose the annotation parameters in
the Create Permanent Annotation dialog.
When you click on the value, the web page specified in the reference is opened or the specified file is loaded. The loaded file is added to the current
project.
Manipulating Annotations
Creating Annotation
Selecting Annotations
Editing Annotation
Highlighting Annotations
Annotations Color
Annotations Visibility
Show on Translation
Captions on Annotations
Creating and Editing Qualifier
Adding Column for Qualifier
Copying Qualifier Text
Finding Qualifier
Deleting Annotations and Qualifiers
Importing Annotations from CSV
Exporting Annotations
Creating Annotation
To create a new annotation for the active sequence press the Ctrl-N key sequence, select the New annotation toolbar button or use the
Add->New annotation or New annotation context menu item:
The dialog asks where to save the annotation. It could be either an existing annotation table object or a new document (file).
You can also specify the name of the group and the name of the annotation. If the group name is set to <auto> UGENE will use the
annotation name as the name for the group. You can use the / characters in this field as a group name separator to create subgroups.
The Location field contains the annotation coordinates. The coordinates must be provided in the GenBank or EMBL format. If you want to
annotate the complementary strand, surround the coordinates with the complement() word or press the last button in the Location row to
do it automatically.
Note that by default the Location field contains the coordinates of the selected sequence region.
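To illustrate the location syntax mentioned above, here is a minimal sketch of a parser for simple GenBank-style locations. It is not UGENE code: the function names `parse_location` and `wrap_complement` are hypothetical, and only plain ranges such as 10..20, join(...) and an outer complement(...) are handled.

```python
import re

_RANGE = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_location(text: str) -> tuple[list[tuple[int, int]], bool]:
    """Parse a simple GenBank-style location.

    Returns (regions, is_complement); coordinates are 1-based and inclusive.
    """
    text = text.replace(" ", "")
    complement = False
    if text.startswith("complement(") and text.endswith(")"):
        complement = True
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    regions = []
    for part in text.split(","):
        match = _RANGE.match(part)
        if match is None:
            raise ValueError(f"unsupported location: {part!r}")
        start, end = int(match.group(1)), int(match.group(2))
        if start > end:
            raise ValueError(f"start is after end: {part!r}")
        regions.append((start, end))
    return regions, complement


def wrap_complement(location: str) -> str:
    """Surround a location with complement(), as for the complementary strand."""
    return f"complement({location})"
```

For example, `wrap_complement("10..20")` produces the string `complement(10..20)`, which `parse_location` reads back as one region on the complementary strand.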
Once the Create button is pressed the annotation is created and highlighted both in the Sequence overview and the Sequence details view
areas:
Selecting Annotations
To select one annotation click on it. To select several annotations hold Ctrl key while clicking on the annotations. To invert the selection use
the Invert annotation selection item in the Annotations editor context menu.
Editing Annotation
If the document is not locked, it is possible to edit an annotation or an annotation group using the Rename item context menu from the
Annotation Editor or from the Sequence View, or with the F2 key in the Annotation Editor. The result of pressing F2 for an annotation:
Highlighting Annotations
To configure settings of annotation types go to the Annotation Highlighting tab in the Options Panel.
By default the tab shows the annotation types of the opened Sequence View.
If you want to see all annotation types, click the Show all annotation types link. The Previous annotation and Next annotation buttons move to
the previous or to the next annotation of the view respectively.
Find below information about the annotation type properties that you can configure.
Annotations Color
Annotations Visibility
Show on Translation
Captions on Annotations
Annotations Color
To change the color of all annotations of a certain type click on the corresponding color box in the annotation types table and select the
required color in the Select Color dialog that appears.
Annotations Visibility
You can show/hide annotations of a certain type by selecting the type in the annotation types table and checking/unchecking the Show
annotations of this type check box.
Show on Translation
This option is available for nucleotide sequences only. It specifies that the annotation is shown on the corresponding amino acid sequence instead of
the original nucleotide sequence in the Sequence Detailed View, for example:
You can enable/disable this option by checking/unchecking the Show on translation checkbox.
Captions on Annotations
It is possible to show the value of a qualifier of an annotation instead of the annotation type name in the Sequence Zoom View. To enable this
option for an annotation type check the Show value of qualifier check box and enter the names of the required qualifiers in the text field
next to this check box. See the image below.
If you enter several qualifier names (separated by commas), then the first found qualifier is taken into account and shown on the annotation.
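The caption rule can be sketched as follows. This is an assumption-laden illustration rather than UGENE's implementation: the function `caption_for` is hypothetical, "first found" is interpreted as the first entered name that exists on the annotation, and falling back to the annotation name when nothing matches is also an assumption.

```python
def caption_for(annotation_name, qualifiers, wanted):
    """Pick the caption for an annotation.

    qualifiers is a list of (name, value) pairs; wanted is the comma-separated
    text entered by the user. Assumed rule: the first entered name that exists
    on the annotation wins; otherwise the annotation name is used.
    """
    names = [n.strip() for n in wanted.split(",") if n.strip()]
    for name in names:
        for q_name, q_value in qualifiers:
            if q_name == name:
                return q_value
    return annotation_name
```

With qualifiers `[("product", "kinase"), ("note", "draft")]` and the entered text `gene, note, product`, this sketch returns `draft`, because `gene` is absent and `note` is the next name in the list.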
Here you can specify the name and the value of the qualifier.
You can use the F2 key to rename a qualifier:
To edit a qualifier, select the qualifier and press the F4 key or use the Edit qualifier context menu item:
Finding Qualifier
To find a qualifier select annotation(s) or group(s) of annotations and use the Find qualifier context menu.
Here you can specify the name and the value of the qualifier and select the searching parameter: Exact match or Contains substring.
Basically you need to specify the file to read annotations table from (required):
And the format of and the path to the file to write the annotations table into (required):
Check Add result file to project to link the annotations to the currently opened sequence.
To use a separator to split the table, check the Column separator item and specify the separator symbols. Also you can press Guess to try to
detect the separator from the input file.
Alternatively, you can press Edit and edit the script which will specify the separator for each parsed line. It is possible to use the line number in
the script.
Using the arrows, you can exclude the necessary number of lines at the beginning of the document from parsing. You can also skip all lines that
start with the specified text.
By pressing Preview one can bring up the view of the current annotations table (which is produced from the input file with the specified
parameters values). The input file contents will also be shown at the bottom part of the dialog.
The preview table headline indicates the types of the information contained in the corresponding columns. By default the values are
[ignored]. To specify a column role, click on the corresponding headline element:
The annotation start and end positions must be specified. It is possible to add an offset to every read start position by checking the Add offset
checkbox, and to shorten annotations by one from the end by unchecking the Inclusive checkbox.
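The import options described above (column separator, skipped lines, column roles, offset and inclusive end) can be sketched in a few lines. This is a hypothetical illustration and not the code behind the dialog: `read_annotations` and its parameter names are invented here, the separator is assumed to be a single character, and the offset is applied to the start position only, as the text states.

```python
import csv


def read_annotations(text, separator=",", skip_lines=0, skip_prefix=None,
                     roles=("name", "start", "end"), offset=0, inclusive=True):
    """Turn delimited text into a list of annotation dicts.

    roles gives the role of each column: "name", "start", "end" or None
    (ignored). skip_lines drops leading lines; skip_prefix drops every
    remaining line that starts with the given text.
    """
    lines = text.splitlines()[skip_lines:]
    if skip_prefix:
        lines = [ln for ln in lines if not ln.startswith(skip_prefix)]
    annotations = []
    for row in csv.reader(lines, delimiter=separator):
        if not row:
            continue
        record = {role: value.strip() for role, value in zip(roles, row) if role}
        start = int(record["start"]) + offset
        end = int(record["end"])
        if not inclusive:
            end -= 1  # shorten the annotation by one from the end
        annotations.append({"name": record["name"], "start": start, "end": end})
    return annotations
```

A call such as `read_annotations(table, separator=";", skip_lines=1, skip_prefix="#")` would skip one title line and all comment lines starting with `#` before reading name, start and end from the first three columns.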
When all the roles are specified, press Run. With the Add to project checkbox specified and a Sequence View opened, on success you will
see the Sequence View with annotations linked:
Exporting Annotations
Open the Sequence View with a document that contains annotations. Select one or several annotations or annotation groups in the
Annotations editor and select the Export->Export annotations context menu item.
The Export Annotations dialog will appear:
Here you can set the path to the file, choose the file format and optionally for CSV format you can save the sequence along with annotations
and save sequence names.
The 3D Structure Viewer adds 3D visualization for PDB and MMDB files:
The Chromatogram Viewer adds support for chromatograms visualization and editing:
The Dotplot provides a tool to build dotplots for DNA or RNA sequences.
A number of other instruments add graphical interface for popular sequence analysis methods:
Circular Viewer
The Circular Viewer plugin provides capability to show the circular view of a nucleotide sequence.
Usage example:
Open a nucleotide sequence object in the Sequence View. The Show circular view button is available on the sequence toolbar:
Pressing the button will show the circular view of the sequence:
If you work with a file that contains many sequences, the button closes the circular views if some of them are opened; if all circular views are
closed, it opens all of them.
Also you can mark sequences as circular in UGENE by the Mark as circular sequence context menu item. When the sequences are marked
as Circular, the Circular View is automatically opened for them in all opened Sequence View windows.
The Restriction Sites Map will appear automatically. To show restriction sites the Show Restriction Sites menu should be checked. To hide
the map click on the following button:
The Circular Viewer is opened automatically when the Sequence View is opened for a plasmid.
The inner circle represents the sequence clockwise and the scale marks show the corresponding sequence positions. The sequence
annotations are represented as curved colored regions at the outer side of the circle.
The Circular Viewer helps to navigate within the sequence. You can select an annotation on the circular view and the annotation will also be
focused and highlighted in all Sequence View areas: Sequence overview, Sequence zoom view, Sequence details view and Annotations
editor.
You can also select a sequence region:
This will also affect the Sequence View. You can select a sequence region with Ctrl and the selection will be inverted.
Note that the circular view is zoomed automatically when the Circular Viewer area is resized:
Here you can browse for the file name, select the width, height and resolution of the image as well as its
format: svg, ps, pdf, bmp, jpeg, jpg, png, ppm, tif or tiff. Also you can include position and selection markers
to the image by the corresponding checkboxes.
Note, that if a sequence file contains several sequences it is possible to view the circular views of the
sequences in the same Circular Viewer area.
You can work with these circular views at the same time.
In the title section you can show or hide title and length, change font, size and attribute.
In the ruler section you can show or hide ruler line and coordinates and change the label font size.
In the annotation section you can select the label position and change the label size.
The following label positions are available:
inside - all labels are inside of the annotations
outside - all labels are outside of the annotations
inside/outside - if the label can fit the annotation and it is not auto-annotation, it's located inside. Otherwise outside.
none - no labels at all
3D Structure Viewer
The 3D Structure Viewer is intended for visualization of 3D structures of biological molecules.
Using the 3D Structure Viewer you can work with data from the Protein Data Bank (PDB) - a repository for the 3D structural data of large
biological molecules, such as proteins and nucleic acids, maintained by the Worldwide Protein Data Bank (wwPDB).
You can work as well with data from the NCBI Molecular Modeling DataBase (MMDB), also known as Entrez Structure, a database of
experimentally determined structures obtained from the RCSB Protein Data Bank.
Find the description of the 3D Structure Viewer features below.
Opening 3D Structure Viewer
Changing 3D Structure Appearance
Selecting Render Style
Selecting Coloring Scheme
Calculating Molecular Surface
Selecting Background Color
Selecting Detail Level
Notice the Links button on the toolbar. When you click the button the menu appears with quick links to online resources with detailed
information about the molecule opened:
PDB Wiki
RCSB PDB
PDBsum
NCBI MMDB
Note that if you're online, you can access the Protein Data Bank directly from UGENE and load a required file by its PDB ID (see Fetching
Data from Remote Database for details).
Hint
Don't forget to select the correct database (PDB) while fetching.
You can also overview the whole structure by spinning it automatically. Select the Spin item either in the 3D Structure Viewer context menu
or in the Display menu on the toolbar to do it.
To stop the spinning uncheck the Spin item.
To show all the models check the All item. To show only one model check its item and click the OK button. To show several models select them
and click the OK button. To show the inverted selection click the Invert button and then the OK button.
Structural Alignment
To use the structural alignment call the Structural alignment->Align with context menu item. The following dialog will appear:
Here you can change reference and mobile settings. After that click on the OK button. To reset structural alignment call the Structural
alignment->Reset context menu item.
Here you can browse for the file name, select the width and height of the image as well as its format: svg, png, ps, jpg, jpeg, tiff, tif, pdf, bmp
or ppm. For jpg, jpeg formats the quality score parameter is available.
Press the Add button on the toolbar. The Select Item dialog will appear. Select [3d] objects to add.
Hint
Use the Ctrl keyboard button to select several objects.
Below you can see the 3D Structure Viewer with two views:
To select an active view click on the view area or select an appropriate value in the Active view combo box on the toolbar.
To synchronize the views press the Synchronize 3D Structure Views sticky button on the toolbar (see the image above). While the button
is pressed the 3D structures are moved, zoomed and spun synchronously. Press the button again to stop the views
synchronization.
The views that are no longer required can be closed by selecting the Close button in the 3D Structure Viewer context menu.
Also you can hide/show views for a while. Use the menu of the green arrow button on the toolbar to do it:
Notice that the 3D Structure Viewer can be closed from this menu.
Chromatogram Viewer
The Chromatogram Viewer plugin brings DNA chromatogram data viewing and editing capabilities into UGENE.
Currently supported chromatogram file formats are ABIF and SCF.
To view a chromatogram, just open the file of interest in UGENE by standard means (e.g. drag&drop the file or press the Ctrl-O shortcut).
The Chromatogram Viewer is automatically embedded into the generic Sequence View if chromatogram data are found, as on the
screenshot below:
To edit sequence data, right-click on the chromatogram view and select the Edit new sequence item in the context menu that appears. The
following dialog will appear:
Select new document format and location and click on the Create button.
The original DNA sequence is not allowed to be changed; however you can add and modify a new sequence stored in a separate file.
The sequence being edited is displayed right above the original one. Symbols can be changed by clicking on the value of interest; modifications
are shown in bold.
Also you can show/hide different signals of the chromatogram with the help of the Show/hide trace context menu item:
After clicking on the item, the Export chromatogram file dialog will appear:
Check the Reversed and Complemented options if you want to create a reverse and complement chromatogram. Press the Export button.
The exported file will be opened in the Sequence View.
You can also use the Lock scales and Adjust scales global actions for the chromatograms.
For example if you lock the scales you are able to scroll the sequences simultaneously. Also when you select a sequence region in one
sequence, the same region is selected in the second sequence.
To see a graph select the corresponding graph item in the popup menu. A new area with the graph appears right above the Sequence zoom
view:
Each point on a graph is calculated for a window of a specified size. The window is moved along the sequence by a step. See Graph
Settings for instructions on how to modify these parameters.
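The window and step idea can be sketched as follows, using the percentage of G and C bases as the per-window value. This is a minimal sketch, not UGENE's implementation: the helper names `windowed_values` and `gc_content` are hypothetical, and reporting each point at the 1-based start of its window is an assumption.

```python
def windowed_values(sequence, window, step, func):
    """Apply func to each full window, moving by step.

    Returns (position, value) pairs, where position is the 1-based start of
    the window (an assumption made for this sketch).
    """
    sequence = sequence.upper()
    points = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        points.append((start + 1, func(chunk)))
    return points


def gc_content(chunk):
    """(G+C)/(A+G+C+T)*100 for one window; 0.0 if it has no A, C, G or T."""
    a, c, g, t = (chunk.count(x) for x in "ACGT")
    total = a + c + g + t
    return (g + c) / total * 100 if total else 0.0
```

For the sequence `GGCCAATT` with a window of 4 and a step of 2, the sketch evaluates the windows `GGCC`, `CCAA` and `AATT`. Any other per-window formula can be passed in place of `gc_content`.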
It is possible to get information about each point of a graph. When the mouse is moved in the Graphs area, a small circle is shown on the graph
and a coordinates hint is shown above it. When you hold Shift and click on a graph, the circle and the hint are locked:
To remove it click on the hint. Also you can delete all labels by Graph->Delete all labels context menu. To select all extremum points use the
Graph->Select all extremum points context menu item.
All graphs are always aligned to the range shown in the Sequence zoom view. It means that if you change the visible range in the overview
(either by zooming or scrolling) the graph will also be updated. The minimum and maximum values of the visible range are shown at the right
lower and upper corners of the graph.
To close a graph, uncheck its item in the popup menu.
Description of Graphs
Graph Settings
Saving Graph Cutoffs as Annotations
Description of Graphs
Find below the detailed description of each graph. Note that the characters A, C, G and T in the formulas denote the number of the corresponding
nucleotides in a window.
DNA Flexibility searches for regions of high DNA helix flexibility in a DNA sequence. The average Threshold is calculated for each
window.
GC Content (%) shows the percentage of nitrogenous bases (either guanine or cytosine) in a window. It is calculated by
the following formula:
(G+C)/(A+G+C+T)*100
AG Content (%) shows the percentage of nitrogenous bases (either adenine or guanine) on a DNA molecule. It is calculated by
the following formula:
(A+G)/(A+G+C+T)*100
GC Frame Plot - this graph is similar to the GC content graph but shows the GC content of the first, second and third position
independently. It is most effective in organisms with GC rich genomic sequence but it also works on all microbial sequences.
GC Deviation (G-C)/(G+C) shows the difference between the G content of the forward strand and the reverse strand. GC
Deviation is calculated by the following formula:
(G-C)/(G+C)
AT Deviation (A-T)/(A+T) shows the difference between the A content of the forward strand and the reverse strand. AT
Deviation is calculated by the following formula:
(A-T)/(A+T)
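Karlin Signature Difference - dinucleotide absolute relative abundance difference between the whole sequence and a sliding
window. Let p_seq(XY) and p_win(XY) denote the relative abundance of the dinucleotide XY in the whole sequence and in the window, respectively.
The Karlin Signature Difference for a window is calculated by the following formula:
sum(p_seq(XY) - p_win(XY)) / 16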
Informational Entropy is calculated from a table of overlapping DNA triplet frequencies. The use of overlapping triplets smooths
the frame effect. Informational Entropy is calculated by the following formula:
Graph Settings
To change settings of a graph, select the Graph->Graph settings item in the graph context menu. The Graph Settings dialog appears:
Dotplot
The Dotplot plugin provides a tool to build dotplots for DNA or RNA sequences. This allows comparing these sequences graphically. Using a
dotplot, you can easily identify such differences between sequences as mutations, inversions, insertions, deletions and low-complexity
regions.
Also the plugin provides advanced features: comparing multiple dotplots, navigation in a dotplot, dotplots synchronization, saving and loading
a dotplot, etc.
An example of a dotplot view:
The Dotplot plugin uses the Repeat Finder plugin to build a dotplot, so make sure you have the Repeat Finder plugin installed.
The Dotplot features are described in more detail below.
Creating Dotplot
Navigating in Dotplot
Zooming to Selected Region
Selecting Repeat
Interpreting Dotplot: Identifying Matches, Mutations, Inversions, etc.
Editing Parameters
Filtering Results
Saving Dotplot as Image
Saving and Loading Dotplot
Building Dotplot for Currently Opened Sequence
Comparing Several Dotplots
Creating Dotplot
To create a dotplot select the Tools->Build dotplot main menu item. The Build dotplot from sequences dialog will appear:
Here you should specify the File with first sequence. Also you should either check the Compare sequence against itself option or select the
File with second sequence.
Optionally you can select to Join all sequences found in the file (for the first and/or for the second file). If you select to join the sequences you
can also select the Gap size. The gap of the specified size will be inserted between the joined sequences.
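Joining sequences with a gap of fixed size can be pictured with the short sketch below. It is illustrative only: the function `join_sequences` is hypothetical, and the character used to fill the gap is an assumption, since the text does not say how the gap is represented.

```python
def join_sequences(sequences, gap_size, gap_char="-"):
    """Join sequences, inserting gap_size gap characters between neighbours.

    Returns the joined string and the 0-based offset of each input sequence
    inside it, so that positions in the joined sequence can be mapped back.
    """
    gap = gap_char * gap_size
    joined = gap.join(sequences)
    offsets = []
    position = 0
    for seq in sequences:
        offsets.append(position)
        position += len(seq) + gap_size
    return joined, offsets
```

The returned offsets show where each original sequence starts in the joined one, which is what makes it possible to tell which sequence a region of the joined sequence came from.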
After you press the Next button, the dialog to configure the dotplot parameters will appear:
The specified algorithm is provided to the Repeat Finder plugin as an input parameter. In most cases the Auto value is appropriate.
Minimum repeat length restricts the drawing to matches between the sequences that are continuous
and long enough. For example, if it equals 3 bp, then only repeats that contain 3 or more
base symbols will be found.
Press the 1k button to automatically adjust the Minimum repeat length value. The value will be set so that about 1000 repeats are
found.
Repeats identity specifies the required identity of the repeats, in percent.
Press the 100 button to set the identity to 100%.
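To make the Minimum repeat length parameter concrete, here is a naive sketch that finds exact (100% identity) diagonal matches between two sequences. It is not the algorithm used by the Repeat Finder plugin: the function `direct_matches` is hypothetical, it runs in time proportional to the product of the sequence lengths, and it ignores inverted repeats and identity below 100%.

```python
def direct_matches(seq_x, seq_y, min_length):
    """Find maximal exact matches along each diagonal of the dotplot.

    Returns (x_start, y_start, length) triples with 0-based starts; only
    runs of at least min_length identical symbols are reported.
    """
    matches = []
    for diag in range(-(len(seq_y) - 1), len(seq_x)):
        x = max(diag, 0)
        y = max(-diag, 0)
        run = 0
        while x < len(seq_x) and y < len(seq_y):
            if seq_x[x] == seq_y[y]:
                run += 1
            else:
                if run >= min_length:
                    matches.append((x - run, y - run, run))
                run = 0
            x += 1
            y += 1
        if run >= min_length:  # a run that reaches the end of the diagonal
            matches.append((x - run, y - run, run))
    return matches
```

Each reported triple corresponds to one diagonal line segment on the plot. Raising `min_length` removes the shorter segments, which is the effect the Minimum repeat length setting has on the picture.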
After the parameters are set, press the OK button. The dotplot will appear in the Sequence View:
Navigating in Dotplot
To zoom in / zoom out a dotplot you can:
Rotate the mouse wheel.
Press corresponding zoom buttons located on the left:
Hold the middle mouse button and move the mouse cursor over the zoomed region of the dotplot.
Click on the desired region of the minimap in the right bottom corner.
Activate the Scroll tool, hold the left mouse button and move the mouse cursor over the zoomed region:
hold down the left mouse button and drag the mouse cursor over the dotplot.
When you select a region on a dotplot the corresponding region is also selected in other Sequence View areas (Sequence details view, Sequ
Selecting Repeat
To select a repeat activate the Select tool:
To deselect the repeat either click on other repeat or hold Ctrl and click somewhere on the dotplot.
3. Inverted repeats
The Dotplot plugin allows searching for inverted repeats as well. Inverted repeats are shown contrary to the direct repeats.
Use the Search direct repeats and Search inverted repeats options of the Dotplot parameters dialog to select which repeats to draw
(the dialog is described here).
4. Low-complexity regions
A low-complexity region is a region produced by redundancy in a particular part of the sequence. It is represented on a plot as a
rectangular area filled with the matches.
Hint
Compare sequence with itself to easily find low-complexity regions in it.
Editing Parameters
It is possible to edit parameters of a built dotplot. Right-click on the dotplot and select the Dotplot Parameters context menu item:
The parameters dialog will be re-opened. See description of the available parameters here.
Filtering Results
It is possible to find feature intersections and filter dotplot results. Right-click on the dotplot and select the Dotplot->Filter results context
menu item. The following dialog will appear:
Select the features and click the OK button. The filtered dotplot will appear.
Available formats are *.png, *.jpg, *.bmp, *.jpeg, *.ppm, *.tif, *.tiff, *.xbm and *.xpm.
The Save Dotplot dialog will appear. A dotplot is saved in a file with the *.dpt extension.
Later the dotplot can be loaded using the Dotplot->Save/Load->Load context menu item.
If you need to compare a sequence with itself, you can activate the menu from a single Sequence View.
Alignment Editor
Overview
Alignment Editor Features
Alignment Editor Components
Navigation
Coloring Schemes
Creating Custom Color Scheme
Highlighting Alignment
Zooming and Fonts
Searching for Pattern
Consensus
Export Consensus
Alignment Overview
Working with Alignment
Undo/Redo Framework
Selecting Subalignment
Moving Subalignment
Editing Alignment
Removing Selection
Filling Selection with Gaps
Replacing with Reverse-Complement
Replacing with Reverse
Replacing with Complement
Removing Columns of Gaps
Removing All Gaps
Saving Alignment
Aligning Sequences
Pairwise Aligning
Working with Sequences List
Adding New Sequences
Copying Sequences
Sorting Sequences
Shifting Sequences
Collapsing Rows
Exporting in Alignment
Extracting Selected as MSA
Exporting Sequence from Alignment
Exporting Alignment as Image
Statistics
Distance Matrix
Grid Profile
Advanced Functions
Building HMM Profile
Building Phylogenetic Tree
PHYLIP Neighbor-Joining
MrBayes
PhyML Maximum Likelihood
Overview
This chapter gives an overview of the Alignment Editor components and explains basic concepts of browsing an alignment.
Alignment Editor Features
Alignment Editor Components
Navigation
Coloring Schemes
Creating Custom Color Scheme
Highlighting Alignment
Zooming and Fonts
Searching for Pattern
Consensus
Export Consensus
Alignment Overview
Navigation
The Sequence area provides several flexible ways to navigate through an alignment. The simplest way is to use the mouse and the
scrollbars.
Alternatively you can use arrow keys on the keyboard to navigate.
The list of hot keys for quick navigation:
PageUp to move one screen left.
PageDown to move one screen right.
Home to center the starting columns of the alignment.
End to move to the trailing columns of the alignment
Hint
Enter the column number (base coordinate) and the view will be centered to the corresponding base.
Coloring Schemes
There are various coloring schemes for DNA and amino alphabets available.
To change the scheme, activate the Colors context menu:
Select the new scheme name, alphabet and click on the Create button. The next dialog will appear for nucleotide extended mode:
Here you can select a color for each element. To do so, click on the element. The new scheme will be created after clicking the OK button. The
new custom scheme will be available in the Colors->Custom schemes context menu.
Highlighting Alignment
Select file to export, exported area and click on the Export button.
By default, the base characters are visible when zooming. But for rather long sequences there is another zoom mode available. In this mode
the bases are not shown. This allows viewing very large sequence regions (up to 500 bp).
You can zoom to the selected region by clicking the Zoom to selection button. This is a very convenient operation when the alignment is
rather large. For example, you can zoom out to some percentage, select an interesting region and then zoom to the selection.
You can change font by clicking the Change font button.
To reset zoom and font click the Reset zoom button.
Press the right arrow to search in the direction From left to right, from top to bottom. Press the left arrow to search in the direction From
right to left, from bottom to top. If the pattern is found, the result will be focused and highlighted in the Sequence area. You can continue the
search in any direction from this position.
Consensus
Each base of a consensus sequence is calculated as a function of the corresponding column bases. There are different methods to calculate
the consensus. Each method reveals unique biological properties of the aligned sequences. The Alignment Editor allows switching between
different consensus modes. To switch the consensus mode go to the General tab of the Options Panel, or select the Consensus mode item in the
context menu (opened with the right mouse button) or in the Actions menu; the General tab will then be opened automatically.
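As an illustration of a consensus being a function of the column bases, the sketch below implements one simple majority rule. It is not one of UGENE's consensus modes: the function `majority_consensus`, the threshold parameter and the use of N for columns without a clear majority are all assumptions made for the example.

```python
from collections import Counter


def majority_consensus(rows, threshold=0.5, gap="-", unknown="N"):
    """Build a consensus string from equally long aligned rows.

    For each column the most frequent non-gap symbol is used if it occurs in
    at least `threshold` of all rows; otherwise `unknown` is written. A
    column consisting only of gaps yields a gap.
    """
    if not rows:
        return ""
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all rows must have the same length")
    consensus = []
    for column in zip(*rows):
        counts = Counter(ch.upper() for ch in column if ch != gap)
        if not counts:
            consensus.append(gap)
            continue
        symbol, count = counts.most_common(1)[0]
        consensus.append(symbol if count / len(rows) >= threshold else unknown)
    return "".join(consensus)
```

Changing the threshold changes which columns receive a definite symbol, which gives a feel for why different consensus methods highlight different properties of the same alignment.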
Export Consensus
Alignment Overview
The alignment overview is shown automatically in the Alignment Editor. To close the overview click on the Overview toolbar button. To show
the simple alignment overview use the Show simple overview context menu item of the overview.
Undo/Redo Framework
The editor tracks all modifications of the aligned sequences.
When a modification happens the current state of the multiple sequence alignment object is recorded.
You can apply any previous state and redo the modifications using the corresponding buttons on the toolbar:
Selecting Subalignment
While in the Sequence area, if you hold the left mouse button and move the cursor, you will activate the selection mode. By moving the
cursor you can adjust the size of the selection. Also you can use the Shift modifier for selecting. Select a first row, hold Shift and select a last
row. All the rows between the first and the last row will be selected.
Releasing the mouse button will result in exiting the selection mode.
The selection mode is available in the Sequence list and the Consensus area too. The difference between these areas and the Sequence
area is that here you can add to selection the whole rows or columns respectively.
To cancel the selection, press the Esc key.
Moving Subalignment
There are different ways to move a subalignment:
1. Select a subalignment and drag and drop it. The subalignment will be moved.
2. Press Space to move the subalignment to the right by the size of the selection. Press Backspace to return the
subalignment to the initial state.
3. Press Ctrl+Space to move the subalignment to the right by one column. Press Ctrl+Backspace to return the
subalignment to the initial state.
Editing Alignment
Select the Edit submenu in the Alignment Editor context menu:
Removing Selection
To remove a subalignment select it and choose the Edit->Remove selection item in the context menu or press the Delete key. On Mac OS
use the Fn+Delete key instead of the Delete key.
To remove columns containing a certain number of gaps select the Edit->Remove columns of gaps item in the context menu. The dialog appears:
Saving Alignment
To save the current alignment click the Save alignment button; to save the alignment into another file click the Save alignment as button.
Aligning Sequences
The Alignment Editor integrates several popular multiple sequence alignment algorithms. Below is the list of available algorithms and links to
the documentation:
Port of the popular MUSCLE3 algorithm.
KAlign plugin: effective work with huge alignments.
ClustalW and MAFFT: these algorithms appeared in the version 1.7.2 of UGENE with the External Tools plugin.
T-Coffee: this alignment algorithm is available since version 1.8.1 of UGENE with the External Tools plugin.
To align sequences choose a preferred alignment method in the Actions main menu, in the context menu or with the Align main toolbar button.
You may also find useful the following video tutorials devoted to multiple sequence alignment:
Making a multiple sequence alignment from FASTA file
Working with large alignments in UGENE
Performing profile-to-profile and profile-to-sequence MUSCLE alignments
Running remote MUSCLE task
Pairwise Aligning
To align two sequences go to the Pairwise Alignment tab of the Options Panel:
Select two sequences from the original alignment, select the parameters and click on the Align button. The following parameters are available:
Algorithm - algorithm of the pairwise alignment. There are two algorithms:
Hirschberg (KAlign) - algorithm has the following parameters:
Gap open penalty - indicates the penalty applied for opening a gap. The penalty must be negative.
Gap extension penalty - indicates the penalty applied for extending a gap.
Terminate gap penalty - the penalty to extend gaps from the N/C terminal of protein or 5'/3' terminal of nucleotide
sequences.
Bonus score - a bonus score that is added to each pair of aligned residues.
Smith-Waterman - the following parameters are available:
Algorithm version - version of the algorithm implementation. Non-classic versions produce the same results as the classic one, but
much faster. The available versions are OPENCL, SSE2 and SW_classic; to use an optimized version, your system must support the corresponding capability.
Scoring matrix - scoring matrix.
Gap open penalty - penalty for opening a gap.
Gap extension penalty - penalty for extending a gap.
Output settings - settings of the output file.
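To show how the gap open and gap extension penalties interact, here is a minimal Python sketch of the classic Smith-Waterman score computation with affine gaps. It is not the UGENE implementation. The simple match/mismatch scores stand in for a scoring matrix, and the convention that a gap of length k scores gap_open + (k - 1) * gap_extend (both negative) is an assumption of the sketch.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of `a` and `b` with affine gaps.

    Assumed convention: a gap of length k scores gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending with a gap in `a`; F: ending with a gap in `b`.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment never drops below zero: it restarts instead.
            H[i][j] = max(0, H[i - 1][j - 1] + pair, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Seven matches (7 * 2) and one opened gap (-3).
    print(smith_waterman_score("ACGTACGT", "ACGTCGT"))
```

A stronger (more negative) gap open penalty makes the algorithm prefer ungapped local matches, while the gap extension penalty controls how cheaply an already opened gap can grow.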
You will see the Project View tree filtered to show only appropriate sequences. Select the items to add and press the Ok button.
Copying Sequences
To copy the current selection, select the Copy->Copy selection item in the Actions main menu or the context menu. The hotkey for this action is
Ctrl-C.
To copy one or several sequences do the following:
Select the sequences in the Sequence list area;
Select the Copy->Copy selection context menu item in the Sequence area or use the hotkey. Note that if you activate the
context menu in the Sequence list area you will lose your current selection.
Sorting Sequences
To sort sequences by name in alphabetical order, choose the View->Sort sequences by name item from the Actions main menu or the
context menu.
Shifting Sequences
To change the order of sequences in a multiple sequence alignment do the following:
Collapsing Rows
Sequential rows can be collapsed. To collapse rows, click the Switch on/off collapsing main toolbar button:
A triangle will appear near the collapsed sequences. Click on the triangle to show the whole tree of the collapsed rows.
To update the collapsed groups, click the corresponding main toolbar button.
Exporting in Alignment
Extracting Selected as MSA
Exporting Sequence from Alignment
Exporting Alignment as Image
Specify the name and format of the new MSA file in the File name and File format to use fields. The currently selected region is extracted by
default when you press the Extract button.
You can change the columns to be extracted using the From and to fields. And change the rows to be extracted by checking / unchecking
required sequences in the Selected sequences list.
Use buttons:
Invert selection to invert the selection of the sequences.
Select all to select all sequences.
Clear selection to clear the selection of all sequences.
The Add to project check box specifies to add the MSA file created from the subalignment to the active project.
Here it is possible to specify the result file location, to select a sequence file format, to define whether to keep or remove gaps ('-' chars) in
the sequence and optionally to add the created document to the current project.
The file save dialog will appear where you should set name, location, export settings and format of the picture:
UGENE supports export to the BMP, JPEG, JPG, PNG, PPM, TIF, TIFF, XBM, and XPM image formats. You can export whole alignment or
custom region. To select the custom region click on the Select button.
Statistics
To show statistics use the Statistic tab of the Options Panel:
Here you need to select a reference sequence. Also you can change the distance algorithm, select the profile mode and exclude gaps. To
generate distance matrix and grid profile see the documentation below:
Distance Matrix
Using the Alignment Editor you can also create a distance matrix of a multiple sequence alignment.
To create a distance matrix, use the Statistics->Generate distance matrix item in the Actions main menu or in the context menu.
The dialog will appear:
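As an illustration of what a distance matrix of an alignment contains, the following Python sketch computes pairwise distances as the number of differing columns. This simple count of mismatches is only one possible distance algorithm. The `exclude_gaps` switch mirrors the idea of excluding gaps, but its exact behaviour here (columns with a gap in either row are skipped) is an assumption of the sketch, not a description of UGENE.

```python
def mismatch_distance(a, b, exclude_gaps=True):
    """Count the columns in which two aligned rows differ."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    distance = 0
    for x, y in zip(a, b):
        if exclude_gaps and "-" in (x, y):
            continue  # assumed rule: ignore columns that contain a gap
        if x != y:
            distance += 1
    return distance


def distance_matrix(rows, exclude_gaps=True):
    """Return the symmetric matrix of pairwise distances between rows."""
    return [[mismatch_distance(a, b, exclude_gaps) for b in rows] for a in rows]


if __name__ == "__main__":
    for line in distance_matrix(["ACGT", "AC-T", "TCGA"]):
        print(line)
```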
Grid Profile
Using the Alignment Editor you can create a statistic profile of a multiple sequence alignment.
The alignment grid profile shows positional amino acid or nucleotide counts highlighted according to the frequency of symbols in a row.
To create a grid profile, use the Statistics->Generate grid profile item in the Actions main menu or in the context menu.
To learn more about this feature, refer to the DNA Statistics plugin documentation.
Advanced Functions
This chapter is devoted to the advanced functions of the Alignment Editor. You will learn how to build a grid profile, export a picture of an
alignment and build HMM profiles.
Building HMM Profile
Also you can use Tree Settings tab of the Options Panel:
PHYLIP Neighbor-Joining
The Building Phylogenetic Tree dialog for the PHYLIP Neighbour-Joining method has the following view:
MrBayes
The Building Phylogenetic Tree dialog for the MrBayes method has the following view:
Perform bootstrap - the support of the data for each internal branch of the phylogeny can be estimated using non-parametric
bootstrap.
Tree searching parameters - selection of the tree topology searching algorithm:
Make initial tree automatically - initial tree automatically.
Type of tree improvement - type of tree improvement.
Set number of random starting tree - number of random starting tree.
Optimize topology - the tree topology is optimised in order to maximise the likelihood.
Optimize branch lengths - optimize branch lengths.
Display tree in new window - displays tree in new window.
Display tree with alignment editor - displays tree with alignment editor.
Synchronize alignment with tree - synchronize alignment and tree.
Save tree to - file to save the built tree.
Press the Build button to run the analysis with the parameters selected and build a consensus tree.
Assembly Browser
The UGENE Assembly Browser project, started in 2010, was inspired by the Illumina iDEA Challenge 2011 and multiple requests from UGENE
users. The main goal of the Assembly Browser is to let a user visualize and efficiently browse large next-generation sequence assemblies.
Currently supported formats are SAM (Sequence Alignment/Map) and BAM, which is a binary version of the SAM format. Both formats are
produced by SAMtools and described in the following specification: SAMtools. Support of other formats is also planned, so please send us a
request if you're interested in a certain format.
To browse assembly data in UGENE, a BAM or SAM file should be imported to a UGENE database file. After that you can convert the
UGENE database file into a SAM file. The import to a UGENE database file has both advantages and disadvantages. The disadvantages are
that the import may take time for a large file and there should be enough disk space to store the database file.
On the other hand, this allows one to overview the whole assembly and navigate in it rather rapidly. In addition, during the import you can
select contigs to be imported from the BAM/SAM file. So, there is no need to import the whole file if you're going to work only with some
contigs. Note that in the future there are plans to support the other approach as well, namely, when a BAM/SAM file is opened directly.
The Assembly Browser has been tested on different BAM/SAM files from the 1000 Genomes Project and other sources.
Read the documentation below to learn more about the Assembly Browser features.
Import BAM/SAM File
Import ACE File
Browsing and Zooming Assembly
Opening Assembler Browser Window
Assembly Browser Window
Assembly Browser Window Components
Reads Area Description
Assembly Overview Description
Ruler and Coverage Graph Description
Go to Position in Assembly
Using Bookmarks for Navigation in Assembly Data
Getting Information About Read
Short Reads Visualization
Reads Highlighting
Reads Shadowing
Associating Reference Sequence
Associating Variations
Consensus Sequence
Exporting
Exporting Reads
Exporting Visible Reads
Exporting Coverage
Exporting Consensus
Exporting Consensus Variations
Exporting Assembly as Image
Options Panel in Assembly Browser
Navigation in Assembly Browser
Assembly Statistics
Assembly Browser Settings
Assembly Browser Hotkeys
Assembly Overview Hotkeys
Reads Area Hotkeys
The Source URL field in the dialog specifies the file to import. The Info button nearby can be used to obtain additional information about the
file.
There is a list of contigs below the Source URL. Check the contigs that you want to import to the database. You can use the Select All,
Deselect All and Invert Selection buttons to manage the selection.
The Destination URL field specifies the output database file.
If you check the Import unmapped reads check box, then all unmapped reads in the assembly (i.e. reads with the unmapped flag or without CIGAR) are
imported. Note, however, that they are not visualized in the current UGENE version.
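The two conditions named above can be checked on a single SAM record with a few lines of Python. This is a sketch that follows the SAM format (the FLAG bit 0x4 marks an unmapped segment, and an unavailable CIGAR is written as "*"). The function name is hypothetical, and the sketch does not reproduce the UGENE import code.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: the segment is unmapped


def is_unmapped(sam_line):
    """Tell whether a SAM alignment line has the unmapped flag or no CIGAR."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 mandatory fields")
    flag = int(fields[1])   # second column: FLAG
    cigar = fields[5]       # sixth column: CIGAR
    return bool(flag & UNMAPPED_FLAG) or cigar == "*"


if __name__ == "__main__":
    mapped = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
    unmapped = "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
    print(is_unmapped(mapped), is_unmapped(unmapped))
```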
To start the import, click the Import button in the dialog. You can see the progress of the import in the Task View. To export a UGENE
database file into the SAM format, select the Actions->Export assembly to SAM format item in the main menu.
If you choose the first option the file will be opened in the Alignment Editor as a multiple sequence alignment. If you choose the second option
the following dialog will appear:
Select the Source URL and Destination URL and click the OK button.
The Source URL field in the dialog specifies the file to import.
The Destination URL field specifies the output database file.
Each [as] object corresponds to an imported contig. When you double-click on an [as] object a new Assembly Browser window with the
assembly data is opened. A window for the first assembly object in the list is opened automatically after the import.
Note that for large assemblies it may take some time to calculate the overview and the well-covered regions.
To see the reads, either select a region from the list or zoom in, for example, by clicking the link above the well-covered regions or by rotating
the mouse wheel.
You can also use the hotkeys. Tips about hotkeys are shown under the list of well-covered regions. To learn about available hotkeys refer to
Assembly Browser Hotkeys.
Assembly Overview - by default, shows the overview of the whole assembly. It can be resized to provide an overview of a part of the assembly.
Reference Area - shows the reference sequence.
Consensus Area - shows the consensus sequence.
Ruler - shows the coordinates in the Reads Area.
Reads Area - displays the reads.
Coverage Graph
To scroll the resized overview, drag the mouse while pressing down the mouse wheel.
To learn about available hotkeys refer to Assembly Browser Hotkeys.
To show/hide the coordinates on the ruler you can click the following button on the toolbar:
To show/hide the coverage on the ruler you can click the following button on the toolbar:
Alternatively, you can use the Show coordinates and Show coverage under cursor check boxes located on the Assembly Browser Settings
tab of the Options Panel.
Go to Position in Assembly
To go to the required position in an assembly use the following field located on the Assembly Browser toolbar.
Input the location and click the Go! button. A similar Go! field is also available on the Navigation tab of the Options Panel.
Or uncheck the Show pop-up hint check box on the Assembly Browser Settings tab of the Options Panel.
The hint shows the following information about the read:
Read name
Location
Length
Cigar
Strand
Read sequence
The operations in the Cigar parameter are described as follows:
M - alignment match (can be a sequence match or mismatch).
I - insertion to the reference. It is skipped when the read is aligned to the reference, i.e. it is not shown in the Reads Area, but is
present in the read sequence.
D - deletion from the reference. Gaps are inserted to the read when the read is aligned to the reference. For example:
N - skipped region from the reference. It behaves as D, but has a different biological meaning: for mRNA-to-genome alignment it
represents an intron.
S - soft clipping (clipped sequences are present in the read sequence, i.e. it behaves as I).
H - hard clipping (clipped sequences are not present in the read sequence).
P - padding (silent deletion from padded reference).
= - exact match to the reference.
X - reference sequence mismatch.
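The operations listed above determine how many bases of the read and how many positions of the reference an alignment covers. The following Python sketch parses a CIGAR string and sums both lengths according to the SAM format rules (M, I, S, = and X consume the read; M, D, N, = and X consume the reference). The function names are hypothetical and the code is not part of UGENE.

```python
import re

CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_READ = set("MIS=X")       # operations present in the read sequence
CONSUMES_REFERENCE = set("MDN=X")  # operations that span reference positions


def parse_cigar(cigar):
    """Split a CIGAR string such as '3S10M2I' into (length, operation) pairs."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return [(int(length), op) for length, op in CIGAR_PATTERN.findall(cigar)]


def cigar_lengths(cigar):
    """Return (read length, reference span) described by a CIGAR string."""
    read_length = reference_span = 0
    for length, op in parse_cigar(cigar):
        if op in CONSUMES_READ:
            read_length += length
        if op in CONSUMES_REFERENCE:
            reference_span += length
    return read_length, reference_span


if __name__ == "__main__":
    print(parse_cigar("3S10M2I5M3D4M"))
    print(cigar_lengths("3S10M2I5M3D4M"))
```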
To copy the information about the read to the clipboard, select the Copy read information to clipboard item in the Reads Area context menu.
Now you can paste it in any text editor.
To copy the current position of the read select the Copy current position to clipboard item in the Reads Area context menu.
Reads Highlighting
To apply a reads highlighting mode, select it in the Reads highlighting menu of the Reads Area context menu or on the Assembly Browser
Settings tab of the Options Panel. The following modes are available:
Nucleotide shows all nucleotides in different colors. It is used by default.
Strand direction highlights reads located on the direct strand in blue and reads on the complement strand in green.
Paired reads highlights all paired reads in green. Note that the information about the pair is shown in the hint.
Reads Shadowing
Various modes of column highlighting are available from the Reads shadowing item in the context menu of the Reads Area:
Disabled highlights all columns of nucleotides.
Free highlights all reads that intersect a given column. In this mode you can lock a position. Click the Lock here item in the
context menu to do it. To return to a locked position, select the Jump to locked base item in the context menu.
Centered highlights all reads that intersect the column in the center of the screen.
To remove the association, select the Unassociate item in the Reference Area context menu.
Associating Variations
To associate variations with the assembly, open the sequence (the sequence must be loaded) and drag it to the Assembly Reference Area:
To remove the association, select the Remove track from the view item in the Variations Area context menu.
Consensus Sequence
A consensus sequence can be found in the Consensus Area under a reference sequence. It refers to the most common nucleotide at a
particular position.
To choose a consensus algorithm, select the Consensus algorithm item either in the context menu of the Consensus Area, in the context
menu of the Reads Area or on the Assembly Browser Settings tab of the Options Panel.
The following algorithms are currently available:
Default shows the most common nucleotide at each position. When several nucleotides are equally common at a position,
the resulting consensus nucleotide is selected randomly from these nucleotides.
SAMtools uses an algorithm from the SAMtools Text Alignment Viewer to build the consensus sequence. The algorithm takes into
account quality values of reads and nucleotides and works with the extended nucleotide alphabet.
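The rule of the Default algorithm can be sketched in Python as follows. The input representation (one string per position that holds the nucleotides of all reads covering that position) and the '-' emitted for an uncovered position are assumptions of the sketch. The code illustrates the described rule only and is not the UGENE implementation.

```python
import random
from collections import Counter


def default_consensus(columns, rng=None):
    """Pick the most common nucleotide per position, breaking ties randomly.

    `columns` holds, for each position, the nucleotides of the covering reads.
    """
    rng = rng or random.Random()
    consensus = []
    for column in columns:
        if not column:
            consensus.append("-")  # assumption: no coverage yields a gap
            continue
        counts = Counter(column)
        top = max(counts.values())
        # All nucleotides that share the highest count are candidates.
        candidates = sorted(base for base, n in counts.items() if n == top)
        consensus.append(rng.choice(candidates))
    return "".join(consensus)


if __name__ == "__main__":
    print(default_consensus(["AAAC", "GGT", "T"]))
```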
To leave only differences between the reference and the consensus sequences highlighted on the consensus sequence, select the Show
difference from reference item in the context menu of the Consensus Area or the Difference from reference item on the Assembly Browser
Settings tab of the Options Panel:
To export a consensus sequence, right-click on it in the Consensus Area and select the Export->Export consensus item in the context menu.
For more information about consensus exporting see Exporting Consensus.
Exporting
Exporting Reads
Exporting Visible Reads
Exporting Coverage
Exporting Consensus
Exporting Consensus Variations
Exporting Assembly as Image
Exporting Reads
To export a read, right-click on it in the Reads Area and select the Export->Current read item in the context menu.
The Export Reads dialog appears:
Select a file to export the read to and the file format. The read can be exported either to a FASTA or FASTQ file.
When the parameters are set click the Export button.
The read is exported to the file and, if the Add to project check box has been checked, it is added to the current project from where you can
open it.
Exporting Coverage
To export the coverage of the assembly, select the Export coverage item in the Consensus Area context menu.
The Export Coverage dialog appears:
Select a file, threshold and export option: Export coverage or Export bases count or both of them.
When all the parameters are set click the Export button.
Exporting Consensus
To export a consensus sequence of the assembly, select either the Export consensus item in the Consensus Area context menu or the
Export Consensus item in the Reads Area context menu.
The Export Consensus dialog appears:
Select a file and the file format. The consensus can be exported to a FASTA, FASTQ, GFF or GenBank file.
Modify, if required, the exported sequence name and choose the consensus algorithm.
The consensus is exported with gaps if the Keep gaps check box has been checked.
Also you can select the exporting region. It can be either a Whole sequence, a Visible region, or a Custom region.
When all the parameters are set click the Export button.
The consensus sequence is exported to the file and if the Add to project check box has been checked it is added to the current project and
opened.
Select a file, mode and the file format. The following modes are available: Variations, Similar and All. Variations can be exported to a
SimpleSNP or VCFv4 file.
Modify, if required, the consensus algorithm.
The consensus is exported with gaps if the Keep gaps check box has been checked.
Also you can select the exporting region. It can be either a Whole sequence, a Visible region, or a Custom region.
When all the parameters are set click the Export button.
The consensus sequence is exported to the file and if the Add to project check box has been checked it is added to the current project and
opened.
The Export consensus variations feature is available when a reference sequence is associated with the assembly.
In the dialog you can select the image file name and its format (bmp, jpeg, png, etc.). For some file formats the Quality parameter also
becomes available.
When the parameters are set click the OK button.
To learn more about well-covered regions refer to the Assembly Browser Window chapter.
To learn more about searching required position refer to the Go to Position in Assembly chapter.
Assembly Statistics
The Assembly Statistics tab includes the following Assembly Information:
Name - the name of the opened assembly.
Length - the length of the assembly.
Reads - the number of reads in the assembly.
Also the tab can include the Reference Information if it is available in the assembly file. For example:
MD5
Species
URI
To learn more about Reads Area settings refer to the Reads Area Settings chapter.
Action
Ctrl + wheel
Alt + click
Action
wheel
double-click
+/-
arrow
Ctrl + arrow
Home / End
Ctrl+G
To load a tree from a file follow the instruction described in the Opening Document paragraph or use the Tree settings tab of the Options
Panel. For example, you may open the $UGENE\data\samples\Newick\COI.nwk sample file provided within UGENE package.
To build a tree from a multiple sequence alignment see the Building Phylogenetic Tree paragraph.
To learn what you can do with a tree using UGENE Phylogenetic Tree Viewer read the documentation below.
Tree Settings
Selecting Tree Layout and View
Modifying Labels Appearance
Showing/Hiding Labels
Aligning Labels
Changing Labels Formatting
Adjusting Branch Settings
Zooming Tree
Working with Clade
Selecting Clade
Collapsing/Expanding Branches
Swapping Siblings
Zooming Clade
Adjusting Clade Settings
Changing Root
Exporting Tree Image
Printing Tree
Tree Settings
To adjust the tree settings, select either the Tree Settings toolbar button or the Tree settings tab of the Options Panel. The Tree settings tab:
Showing/Hiding Labels
When you open a tree all the labels are shown by default.
To hide the taxon (sequence name) labels select the Show Labels toolbar button or in the Tree settings Options Panel tab uncheck the Show
Names item.
To hide the distance labels uncheck the Show Distances item.
To show the labels again check an appropriate item.
Labels settings in the Options Panel:
Aligning Labels
To align the tree labels, press the Align labels toolbar button or, in the Tree settings Options Panel tab, check the Align label item.
See the example of aligning labels below:
Here you can select color, font, size and attributes (bold, italic, etc.) of the labels.
Note that when a clade has been selected the labels formatting settings are applied to the clade only.
Here you can select the color and the line width of the tree branches.
Note that when a clade has been selected the branch settings are applied to the clade only.
Zooming Tree
To change the size of a tree use the Zoom In and Zoom Out toolbar buttons. You can use the Restore Zooming toolbar button to restore the
default size.
Or use the corresponding items in the Actions main menu.
See also: Zooming Clade.
Selecting Clade
To select a clade click on its root node:
Collapsing/Expanding Branches
You can hide branches of a clade by selecting the Collapse item in the context menu of the clade's root node:
To show the collapsed clade select the Expand item in the node's context menu.
Swapping Siblings
To interchange the locations of the two branches of a clade select the Swap Siblings item in the context menu of the root node of the
clade.
Zooming Clade
In addition to the other zooming options you can use the Zoom In item in the context menu of the root node of a clade.
Changing Root
To change the root of a tree select the root and call the Reroot tree context menu item:
Printing Tree
To print a tree select either the Print Tree toolbar button or the Actions->Print Tree item in the main menu.
The standard print dialog will appear where you can select a printer to use and specify other settings.
Extensions
Workflow Designer
DNA Annotator
DNA Flexibility
Configuring Dialog Settings
Result Annotations
DNA Statistics
DNA Generator
ORF Marker
Remote BLAST
Exporting BLAST Results to Alignment
Fetching Sequences from Remote Database
BLAST/BLAST+
Creating Database
Making Request to Database
Fetching Sequences from Local BLAST Database
Repeat Finder
Repeats Finding
Tandem Repeats Finding
Tandem Repeats Search Result
Restriction Analysis
Selecting Restriction Enzymes
Using Custom File with Enzymes
Filtering by Number of Hits
Excluding Region
Circular Molecule
Results
Molecular Cloning in silico
Digesting into Fragments
Creating Fragment
Constructing Molecule
Available Fragments
Fragments of the New Molecule
Changing Fragments Order in the New Molecule
Removing Fragment from the New Molecule
Editing Fragment Overhangs
Reverse Complement a Fragment
Other Construction Options
Output
Creating PCR Product
In Silico PCR
Primers Details
Primer Library
Secondary Structure Prediction
SITECON
SITECON Searching Transcription Factors Binding Sites
Types of SITECON Models
Eukaryotic
Prokaryotic
Building SITECON Model
Smith-Waterman Search
HMM2
Building HMM Model (HMM Build)
Calibrating HMM Model (HMM Calibrate)
Searching Sequence Using HMM Profile (HMM Search)
HMM3
Building HMM Model (HMM3 Build)
Searching Sequence Using HMM Profile (HMM3 Search)
Searching Sequence Against Sequence Database (Phmmer Search)
uMUSCLE
MUSCLE Aligning
Aligning Profile to Profile with MUSCLE
Aligning Sequences to Profile with MUSCLE
ClustalW
MAFFT
T-Coffee
Bowtie
Bowtie Aligning Short Reads
Building Index for Bowtie
Bowtie 2
Bowtie 2 Aligning Short Reads
Building Index for Bowtie 2
BWA
Aligning Short Reads with BWA
Building Index for BWA
BWA-SW
Aligning Short Reads with BWA-SW
Workflow Designer
The Workflow Designer allows a molecular biologist to create and run complex computational workflow schemas even if he or she is not
familiar with any programming language.
The workflow schemas comprise reproducible, reusable and self-documented research routines, with a simple and unambiguous visual
representation suitable for publications.
The workflow schemas can be run both locally and remotely, either using the graphical interface or from the command line.
The elements that a schema consists of correspond to the bulk of algorithms integrated into UGENE. Additionally you can create custom
workflow elements.
To learn more about the Workflow Designer read the Workflow Designer Manual (follow the link on the UGENE documentation page).
DNA Annotator
The DNA Annotator plugin provides an algorithm to search for sequence regions that contain a predefined set of annotations.
Usage example:
Open the Sequence View for a sequence that has annotations. A good candidate here could be any file in Genbank format with a rich set of
annotations.
Select the Analyze->Find annotated regions item in the context menu. The dialog will appear:
Using this dialog you can search for DNA sequence regions that contain every annotation from the list on the left side. The found regions are
displayed on the right side of the dialog.
Use the Save regions as annotations button to store the regions as new annotations to the sequence.
DNA Flexibility
To search for regions of high DNA helix flexibility in a DNA sequence, open the sequence in the Sequence View and select the Analyze->Find
high DNA flexibility regions item in the context menu. Note that only the standard DNA alphabet is supported, i.e. the sequence should consist of
the characters A, C, G, T and N.
The following dialog appears:
The calculation is made for overlapping windows along a given sequence. If there are two or more consecutive windows with an average
flexibility threshold (in each window) greater than the specified Threshold parameter, such area is marked by an annotation.
The average threshold in a window is calculated by the following formula:
Dinucleotide   Angle   Dinucleotide   Angle
AA             7.6     CA             14.6
AC             10.9    CC             7.2
AG             8.8     CG             11.1
AT             12.5    CT             8.8
GA             8.2     TA             25
GC             8.9     TC             8.2
GG             7.2     TG             14.6
GT             10.9    TT             7.6
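The following Python sketch shows how the dinucleotide angles from the table could be turned into per-window values. The formula used here (the sum of the angles of all dinucleotide steps in a window divided by the number of steps, i.e. the window size minus one) is an assumption of the sketch, as are the function names. Dinucleotides that contain N are not in the table, so the sketch simply rejects them.

```python
# Flexibility angles of the dinucleotides, copied from the table.
FLEXIBILITY_ANGLES = {
    "AA": 7.6, "CA": 14.6, "AC": 10.9, "CC": 7.2,
    "AG": 8.8, "CG": 11.1, "AT": 12.5, "CT": 8.8,
    "GA": 8.2, "TA": 25.0, "GC": 8.9, "TC": 8.2,
    "GG": 7.2, "TG": 14.6, "GT": 10.9, "TT": 7.6,
}


def average_flexibility(window):
    """Average angle over the dinucleotide steps of one window (assumed formula)."""
    if len(window) < 2:
        raise ValueError("a window needs at least two nucleotides")
    steps = [window[i:i + 2] for i in range(len(window) - 1)]
    unknown = [s for s in steps if s not in FLEXIBILITY_ANGLES]
    if unknown:
        raise ValueError("no angle for dinucleotide %r" % unknown[0])
    return sum(FLEXIBILITY_ANGLES[s] for s in steps) / len(steps)


def window_averages(sequence, window_size, step):
    """Average flexibility of each overlapping window along the sequence."""
    sequence = sequence.upper()
    return [average_flexibility(sequence[start:start + window_size])
            for start in range(0, len(sequence) - window_size + 1, step)]


if __name__ == "__main__":
    print(window_averages("AATATTACG", window_size=4, step=1))
```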
Once the Search button has been pressed, the annotations for the regions of the high DNA flexibility are created.
Result Annotations
Each annotation has the following qualifiers:
area_average_threshold - average window threshold in the area (i.e. total_threshold / windows_number)
total_threshold - sum of all window thresholds in the area
windows_number - number of windows in the area
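The relation between the three qualifiers can be sketched in Python. The function below takes a list of per-window values and a threshold, groups two or more consecutive windows whose value exceeds the threshold into an area, and computes the qualifiers for each area. The function name and the extra `first_window` key are hypothetical; the sketch illustrates the description above and is not the UGENE code.

```python
def flexible_areas(window_values, threshold):
    """Group runs of two or more consecutive windows above the threshold."""
    areas = []
    run = []  # (window index, value) pairs of the current run
    # A trailing None closes a run that reaches the end of the list.
    for index, value in enumerate(list(window_values) + [None]):
        if value is not None and value > threshold:
            run.append((index, value))
            continue
        if len(run) >= 2:
            total = sum(v for _, v in run)
            areas.append({
                "first_window": run[0][0],
                "windows_number": len(run),
                "total_threshold": total,
                "area_average_threshold": total / len(run),
            })
        run = []
    return areas


if __name__ == "__main__":
    values = [10, 15, 16, 9, 20, 8, 14, 14.5, 15]
    for area in flexible_areas(values, threshold=13.7):
        print(area)
```

In this example the single window with the value 20 does not form an area, because an area requires at least two consecutive windows above the threshold.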
Using the DNA Graphs Package you can see the flexibility graph of a DNA sequence.
DNA Statistics
The DNA Statistics plugin provides exportable statistic reports.
In the current UGENE version the DNA Statistics plugin provides only Alignment Grid Profile report. The Alignment Grid Profile shows
positional amino acid or nucleotide counts highlighted according to the frequency of symbols in a row.
The original idea of the MSA Grid Profile is described in the following paper:
Alberto Roca, Albert Almada and Aaron C Abajian: ProfileGrids as a new visual representation of large multiple sequence alignments: a
case study of the RecA protein family, BMC Bioinformatics 2008, 9:554
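The idea of positional counts can be illustrated with a short Python sketch. It assumes that the alignment is a list of equal-length strings with '-' as the gap character, and that percents are computed relative to the number of rows. These conventions and the function name are assumptions of the sketch; the report generated by the plugin may differ in details.

```python
from collections import Counter


def grid_profile(rows, percents=False, count_gaps=False):
    """Return, for every column, a mapping from symbol to count (or percent)."""
    profile = []
    for column in zip(*rows):
        counts = Counter(s for s in column if count_gaps or s != "-")
        if percents:
            # Assumption: percents are relative to the number of rows.
            total = len(column)
            profile.append({s: 100.0 * n / total for s, n in counts.items()})
        else:
            profile.append(dict(counts))
    return profile


if __name__ == "__main__":
    for position, scores in enumerate(grid_profile(["ACG", "A-G", "TCG"]), start=1):
        print(position, scores)
```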
Usage example:
Open a sequence alignment in the Alignment Editor and use the Statistics->Generate grid profile context menu item.
Here is a brief description of the options that can be set in the dialog:
Profile mode: Counts/Percents - select Percents to have scores shown as percents in the report.
Show scores for gaps - check this item if you want statistics for gap characters ('-') to be shown in the report.
Show scores for symbols not used in alignment - if a symbol is not used in the alignment at all, it won't be shown in the report. Check this
item to report all symbols of the alignment alphabet.
Skip gaps in consensus position increments - consensus ruler configuration. If checked, the gaps in the consensus will not lead to ruler
increments.
Save profile to file - allows you to save the profile to a file in the HTML or CSV format. The CSV format is convenient for further processing in
spreadsheet editors like Excel.
The result profile in the HTML mode:
DNA Generator
DNA sequence generator is a tool that generates a random DNA sequence with specified nucleotide content. To generate a random DNA
sequence select the Tools->Generate sequence item in the main menu. The dialog will appear:
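What "a random DNA sequence with specified nucleotide content" means can be shown with a few lines of Python. In this sketch the content is given as percentages of A, C, G and T that must add up to 100, and each nucleotide is drawn independently with these probabilities. The function name, the parameters and the sampling scheme are assumptions of the sketch, not a description of the UGENE generator.

```python
import random


def generate_dna(length, content, seed=None):
    """Generate a random DNA sequence from per-nucleotide percentages.

    `content` maps nucleotides to percentages, e.g. {"A": 25, "C": 25, ...}.
    """
    bases = list(content)
    weights = [content[base] for base in bases]
    if abs(sum(weights) - 100) > 1e-9:
        raise ValueError("nucleotide percentages must add up to 100")
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    return "".join(rng.choices(bases, weights=weights, k=length))


if __name__ == "__main__":
    print(generate_dna(60, {"A": 30, "C": 20, "G": 20, "T": 30}, seed=1))
```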
ORF Marker
From this chapter you can learn how to search for Open Reading Frames (ORF) in a DNA sequence. The ORFs found are stored as
automatic annotations. This means that if the automatic annotations highlighting has been enabled then ORFs are searched and highlighted
for each sequence opened. Refer to Automatic Annotations Highlighting to learn more.
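For readers who are new to the concept, the following Python sketch shows a very simple ORF search. It scans the three reading frames of the direct strand for regions that start with ATG and end with one of the stop codons TAA, TAG or TGA. The function name, the start and stop codon sets and the `min_length` parameter are assumptions of the sketch. It does not reproduce the search options of the ORF Marker.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(sequence, min_length=0):
    """Return (start, end) pairs of ORFs on the direct strand.

    Coordinates are 0-based and `end` is exclusive; the stop codon is included.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # the first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```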
To open the ORF Marker dialog, select the Analyze->Find ORFs item in the context menu.
Clicking on a codon name redirects you to Wikipedia to give you a brief description of the corresponding amino acid. Cells of the table are
colored according to classes of amino acids.
Remote BLAST
The Remote BLAST plugin provides a capability to annotate sequences with information stored in the NCBI BLAST remote database.
To perform a remote database search, open a Sequence View, select a sequence region to analyze and click the Analyze->Query NCBI
BLAST database context menu item. If a region is not selected, the whole sequence will be analyzed.
The following dialog will appear where you can choose the search options:
The view of the Advanced options tab depends on the selected search. For the blastn search it looks as shown in the picture above.
Word size - the size of the subsequence parameter for the initiated search.
Gap costs - costs to create and extend a gap in an alignment. Increasing the Gap costs will result in alignments with fewer
gaps introduced.
Match scores - reward and penalty for matching and mismatching bases.
Entrez query - a BLAST search can be limited to the result of an Entrez query against the chosen database. This restricts the search
to a subset of entries from that database that fit the requirement of the Entrez query. Examples are given below:
protease NOT hiv1[organism] - this limits a BLAST search to all proteases, except those in HIV 1.
1000:2000[slen] - this limits the search to entries with lengths from 1000 to 2000 bases for nucleotide entries, or from 1000 to
2000 residues for protein entries.
Mus musculus[organism] AND biomol_mrna[properties] - this limits the search to mouse mRNA entries in the database. For
common organisms, one can also select from the pulldown menu.
10000:100000[mlwt] - this is yet another usage example, which limits the search to protein sequences with a calculated
molecular weight from 10 kD to 100 kD.
src specimen voucher[properties] - this limits the search to entries that are annotated with a /specimen_voucher qualifier on the
source feature.
all[filter] NOT environmental sample[filter] NOT metagenomes[orgn] - this excludes sequences from metagenome studies and
uncultured sequences from anonymous environmental sample studies.
For help in constructing Entrez queries see the Entrez Help document.
Filters - filters for regions of low compositional complexity and repeat elements of the human genome.
Masks for lookup table only - this option masks only for purposes of constructing the lookup table used by BLAST, so that no hits are
found based upon low-complexity sequence or repeats (if the repeat filter is checked).
Mask lower case letters - with this option selected you can cut and paste a FASTA sequence in upper case characters and denote
the areas you would like filtered with lower case.
Filter by - filters results by accession, by definition of annotations or by id.
Select result by - selects results by EValue or by score.
When the blastp search is selected in the general options, the view of the Advanced options tab is the following:
As you can see there is no Match scores option, but there are Matrix and Service options.
Matrix - a key element in evaluating the quality of a pairwise sequence alignment is the substitution matrix, which assigns a score for
aligning any possible pair of residues.
Service - the blastp service to be performed: plain, psi or phi.
The Advanced options tab is not available when the cdd search is selected.
Exporting BLAST Results to Alignment
Fetching Sequences from Remote Database
BLAST/BLAST+
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or
protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify members of gene families.
BLAST+ is a new version of the BLAST package from the NCBI.
From UGENE you can use the following tools of the old BLAST package:
blastall - the old program developed and distributed by the NCBI for running BLAST searches.
formatdb - formats protein or nucleotide source databases before these databases can be searched by blastall.
And the following tools of the new BLAST+ package:
blastn - searches a nucleotide database using a nucleotide query.
Creating Database
To format a BLAST database do the following:
If you're using BLAST, open Tools->BLAST->FormatDB.
If you're using BLAST+, open Tools->BLAST->BLAST+ make DB.
The Format database dialog appears:
Here you must select the input files. If all the files you want to use are located in one directory, you can simply select the directory with the
files. By default, only files with the *.fa and *.fasta extensions are taken into account. You can change this by specifying either the Include files
filter or the Exclude files filter.
You can choose either protein or nucleotide type of the files.
Then you must select the path to save the database file and specify a Base name for BLAST files and a Title for database file.
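The default *.fa and *.fasta selection and the Include files filter and Exclude files filter can be pictured as wildcard matching on file names. The Python sketch below illustrates that idea only. The function name select_input_files is hypothetical, and the way the two filters are combined (a file must match an include pattern and no exclude pattern) is an assumption, not a description of UGENE internals.

```python
from fnmatch import fnmatch


def select_input_files(names, include=("*.fa", "*.fasta"), exclude=()):
    """Keep names matching any include pattern and no exclude pattern."""
    selected = []
    for name in names:
        included = any(fnmatch(name, pattern) for pattern in include)
        excluded = any(fnmatch(name, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(name)
    return selected


# Hypothetical directory listing used only for illustration.
files = ["chr1.fa", "chr2.fasta", "notes.txt", "draft_chr3.fa"]
default_choice = select_input_files(files)
without_drafts = select_input_files(files, exclude=("draft_*",))

print(default_choice)
print(without_drafts)
```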
Word size - the length of the subsequence (word) used to initiate a search.
Gap costs - costs to create and extend a gap in an alignment. Increasing the gap costs results in alignments with fewer
gaps.
Match scores - reward and penalty for matching and mismatching bases.
Filters - filters for regions of low compositional complexity and repeat elements of the human genome.
Masks for lookup table only - this option masks only for purposes of constructing the lookup table used by BLAST so that no hits are
found based upon low-complexity sequence or repeats (if repeat filter is checked).
Mask lower case letters - with this option selected you can cut and paste a FASTA sequence in upper case characters and denote
areas you would like filtered with lower case.
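To make the interplay of Match scores and Gap costs concrete, here is a minimal Python sketch that scores an already aligned pair of strings. It is an illustration, not BLAST code: the function name alignment_score is hypothetical, the default numbers are illustrative values rather than UGENE defaults, and a gap of length k is assumed to cost gap_open + k * gap_extend.

```python
def alignment_score(query, subject, reward=1, penalty=-3, gap_open=5, gap_extend=2):
    """Score two aligned strings of equal length; '-' marks a gap position.

    Simplification: adjacent gap positions are treated as one gap even if
    they switch from one string to the other.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    score = 0
    in_gap = False
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            if not in_gap:
                score -= gap_open      # charged once per gap
            score -= gap_extend        # charged for every gap position
            in_gap = True
        else:
            score += reward if q == s else penalty
            in_gap = False
    return score


gapped = alignment_score("ACGTACGT", "ACG--CGT")
cheap_gaps = alignment_score("ACGTACGT", "ACG--CGT", gap_open=2, gap_extend=1)
print(gapped, cheap_gaps)
```

With the higher gap costs the same gapped alignment scores lower, which is why raising the costs tends to favor alignments with fewer gaps.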
The view of the Advanced options tab depends on the selected search. For the blastn search it looks as shown in the picture above. When the
blastx search is selected in the general options, the view of the Advanced options tab is the following:
For gapped alignment - X dropoff value (in bits) for gapped alignment.
For ungapped alignment - X dropoff value (in bits) for ungapped alignment.
Here you need to select a query ID, a database, the type of file(s) and an output path. After that click on the Fetch button.
Repeat Finder
The Repeat Finder plugin provides a tool to search for direct and inverted repeats in a DNA sequence. It also allows you to search for tandem
repeats.
Repeats Finding
Tandem Repeats Finding
Tandem Repeats Search Result
Repeats Finding
Usage example:
Open a DNA sequence in the Sequence View and select the Analyze->Find repeats... context menu item:
A dialog appears that allows you to specify the repeat parameters and the annotations table document to save the results into:
The dialog's status line displays the approximate number of repeats that will be found with the current settings.
The Advanced tab provides additional repeat finding options:
The found repeats are saved and displayed as annotations to the DNA sequence:
Min period, Max period - the minimum and maximum acceptable repeat length measured in base symbols.
Region to process - specify the region to search in: the whole sequence, a custom region or the region of the current selection (if
any).
Save annotation(s) to - specify the existing or new annotations table file to save the resulting annotations into.
Annotation parameters - you can change the default group name and annotation(s) name values of the resulting annotation(s).
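The role of the Min period and Max period parameters can be illustrated with a naive scan for exact tandem repeats. This Python sketch is not the algorithm used by the plugin; the function name find_tandem_repeats and the min_copies parameter are hypothetical, and only exact copies on the direct strand are considered.

```python
def find_tandem_repeats(seq, min_period=1, max_period=6, min_copies=3):
    """Return (start, period, copies) tuples for exact tandem repeats in seq."""
    if min_period < 1:
        raise ValueError("min_period must be at least 1")
    results = []
    i = 0
    n = len(seq)
    while i < n:
        best = None
        for period in range(min_period, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for this or longer periods
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                # Prefer the repeat covering the longest stretch.
                if best is None or copies * period > best[1] * best[2]:
                    best = (i, period, copies)
        if best:
            results.append(best)
            i += best[1] * best[2]  # continue after the reported repeat
        else:
            i += 1
    return results


repeats = find_tandem_repeats("GGACACACACTTT", min_period=1, max_period=3)
print(repeats)
```

Each tuple gives the 0-based start, the period and the number of copies; narrowing the period range removes the corresponding entries from the result.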
Restriction Analysis
From this chapter you can learn how to search for restriction sites on a DNA sequence.
The restriction sites found are stored as automatic annotations. This means that if the automatic annotations highlighting is enabled then the
restriction sites are searched and highlighted for each nucleotide sequence opened. Refer to Automatic Annotations Highlighting to learn more.
Open a DNA sequence and click the following button on the Sequence View toolbar:
Alternatively, select either the Actions->Analyze->Find restriction sites item in the main menu or the Analyze->Find restriction sites item in the
context menu.
The Find restriction sites dialog appears:
You can see the list of restriction enzymes that can be used to search for restriction sites. The information about enzymes was obtained from
the REBASE database. For each enzyme in the list a brief description is available (the accession ID in the database, the recognition
sequence, etc.). If you're online you can get more detailed information about a selected enzyme by clicking the REBASE Info button.
Selecting Restriction Enzymes
Using Custom File with Enzymes
Filtering by Number of Hits
Excluding Region
Circular Molecule
Results
Excluding Region
To exclude a sequence region from the search, check the Exclude region check box and input the start and the end positions of the region. If
a subsequence has been selected before opening the dialog you can click the Selected button to automatically fill the values with the
selected subsequence's start and end positions.
Circular Molecule
To treat the sequence as circular and search for restriction sites that span the end and the beginning of the sequence, check
the Circular molecule option.
Example: Let's consider the following:
The sequence is CTGC ... CAC.
The AarI restriction enzyme (with recognition sequence CACCTGC) has been checked.
In this case, if the Circular molecule option has been checked, the restriction site will be found. If it hasn't been checked, the restriction site
won't be found (in this position).
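The AarI example can be reproduced with a short Python sketch. It is a simplified illustration, not UGENE code: the function name find_sites is hypothetical, only the direct strand is searched, and the bases between CTGC and CAC in the sample sequence are invented filler.

```python
def find_sites(sequence, site, circular=False):
    """Return 0-based start positions of site on the direct strand."""
    n = len(sequence)
    # For a circular molecule, append the first len(site) - 1 bases so that
    # matches crossing the end/start junction become visible.
    text = sequence + sequence[:len(site) - 1] if circular else sequence
    positions = []
    start = text.find(site)
    while start != -1 and start < n:
        positions.append(start)
        start = text.find(site, start + 1)
    return positions


# The sequence starts with CTGC and ends with CAC, as in the example;
# the middle part is made up for illustration.
sequence = "CTGC" + "AAAATTTT" + "CAC"
aar1_site = "CACCTGC"

linear_hits = find_sites(sequence, aar1_site)
circular_hits = find_sites(sequence, aar1_site, circular=True)
print(linear_hits, circular_hits)
```

The site is reported only in the circular case, at the position where the match starts near the end of the sequence and wraps around to its beginning.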
Results
When at least one enzyme has been selected and the OK button has been pressed in the dialog, auto-annotation is enabled. In
the Annotations editor the Restriction Sites annotations can be found in the Auto-annotations\enzyme group.
The direct and complement cut site positions are visualized as triangles on an annotation in the Sequence details view:
On the Restriction Sites tab of the dialog you can see the name of the molecule, the list of restriction enzymes found during the restriction
analysis that can cut the molecule and the list of enzymes selected to perform the digestion.
To digest the sequence into fragments you should select at least one enzyme.
To move an enzyme to the Selected enzymes list click on it in the Available enzymes list and press the Add button. Note that you can select
several items in a list by holding the Ctrl key while clicking on the items.
To select all available enzymes press the Add All button.
To remove enzymes from the Selected enzymes list select them in the list and press the Remove button.
To remove all items from the Selected enzymes list press the Clear Selection button.
On the Conserved Annotations tab of the dialog you can select the annotations that must not be disrupted during cloning.
On the Output tab of the dialog you can select the file to save the new molecule to.
As soon as the required parameters are selected press the OK button. The fragments will be saved as annotations.
Also all the generated fragments are available in the task report:
Creating Fragment
To create a DNA fragment from a sequence region, activate the Sequence View window and select either the Actions->Cloning->Create
Fragment item in the main menu or the Cloning->Create Fragment item in the context menu.
The Create DNA Fragment dialog appears:
If a region has been selected you can choose to create the fragment from this region. Otherwise you can either choose to create the
fragment from the whole sequence or choose the Custom item and input the custom region.
To add a 5' overhang to the direct strand, check the Include Left Overhang check box and input the required nucleotides. To add a 5'
overhang to the reverse strand, in addition to the described steps select the Reverse-complement item in the same group box.
Similarly, to add a 3' overhang, check the Include Right Overhang check box, input the required overhang and select either the direct or the
reverse-complement strand.
On the Output tab of the dialog you can optionally modify the annotations output settings.
Finally, press the OK button to create the fragment. The fragment will be saved as an annotation.
Constructing Molecule
To construct a new molecule from fragments select the Tools->Cloning->Construct Molecule item in the main menu.
If a Sequence View window is active you can also select either the Actions->Cloning->Construct Molecule item in the main menu or the
Cloning->Construct Molecule item in the context menu.
The Construct Molecule dialog appears:
Available Fragments
Fragments of the New Molecule
Changing Fragments Order in the New Molecule
Removing Fragment from the New Molecule
Editing Fragment Overhangs
Reverse Complement a Fragment
Other Construction Options
Output
Available Fragments
All the fragments available in the current project are shown in the Available fragments list.
You can automatically create a fragment from a DNA molecule from the current UGENE project. Click the From Project button to do so. The
Select Item dialog appears with the sequence objects available. Select a sequence and press the OK button. After that create a fragment in
the Create DNA Fragment dialog that appears, as described in the Creating Fragment paragraph. The fragment created from the sequence
appears in the list of available fragments.
Here you can select the type of each DNA end and even input a custom overhang.
The changes you've made are shown in the Preview area of the dialog.
To confirm the changes and close the dialog click the OK button.
Output
If a primer has been selected you can choose to create the PCR product from this primer. Otherwise you can either choose to create the
PCR product from the whole sequence or choose the Custom item and input the custom region.
To add a 5' overhang to the direct strand, check the Include Left Overhang check box and input the required nucleotides. To add a 5'
overhang to the reverse strand, in addition to the described steps select the Reverse-complement item in the same group box.
Similarly, to add a 3' overhang, check the Include Right Overhang check box, input the required overhang and select either the direct or the
reverse-complement strand.
On the Output tab of the dialog you can optionally modify the annotations output settings.
Finally, press the OK button to create the PCR product. The PCR product will be saved as an annotation.
In Silico PCR
In Silico PCR Overview
In silico PCR is used to calculate theoretical polymerase chain reaction (PCR) results using a given set of primers (probes) to amplify DNA
sequences.
UGENE provides the In silico PCR feature only for nucleotide sequences. To use it in UGENE open a DNA sequence and go to the In silico
PCR tab of the Options Panel:
Choosing primers
Type two primers for running In silico PCR. If the primer pair is invalid for running the PCR process, a warning is shown. Primers
for In silico PCR can also be chosen from the primer library. Click the following button to choose a primer from the primer library:
The table consists of the following columns: name, GC-content (%), Tm, Length (bp) and sequence. Select a primer in the table and click the
Choose button.
Click the Reverse-complement button to replace a primer sequence with its reverse complement:
Click Show primers details to see statistical details about the primers.
When you run the process, the predicted PCR products appear in the products table.
Products table
There are three columns in the table:
region of product in the sequence
product length
preferred annealing temperature
Click a product to navigate to its region in the sequence.
Click the Extract product(s) button to export the product(s) to a file, or double-click a product.
Primers Details
Primer Library
Primers Details
Click Show primers details to see statistical details about the primers.
This is a dialog with statistical details about the primers: melting temperature, GC content, dimers, self-dimers, etc. If a value does not meet its
criterion, it is colored red.
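Two of the quantities mentioned in this chapter, the reverse complement of a primer and its GC content, are simple to compute by hand. The Python sketch below shows one way to do it; the function names are hypothetical, only the unambiguous bases A, C, G and T are handled, and the criteria UGENE applies to decide whether a value is acceptable are not reproduced here.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(primer):
    """Complement every base, then reverse the result."""
    return primer.translate(COMPLEMENT)[::-1]


def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    if not primer:
        raise ValueError("primer must not be empty")
    upper = primer.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


primer = "ATGCCG"  # made-up primer used only for illustration
print(reverse_complement(primer))
print(round(gc_content(primer), 1))
```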
Primer Library
The primer library stores user primers. Added primers are kept between UGENE sessions.
Select the Tools->Primer->Primer library menu item to configure the primer library. The following window will appear:
Click the New primer button to add a new primer. The following dialog will appear:
Input the primer sequence and primer name and click the OK button.
To remove primers, select them in the table (you can use Ctrl and Shift for multiple selection) and click the Remove primer(s) button.
To edit a primer, select it and click the Edit primer button.
Save as annotation - select this button to save the results as annotations of the current protein sequence.
SITECON
SITECON is a program package for recognition of potential transcription factor binding sites based on data about conservative
conformational and physicochemical properties revealed by analysis of sets of binding sites.
To cite SITECON use the following article:
Oshchepkov D.Y., Vityaev E.E., Grigorovich D.A., Ignatieva E.V., Khlebodarova T.M. SITECON: a tool for detecting conservative
conformational and physicochemical properties in transcription factor binding site alignments and for site recognition. // Nucleic Acids Res.
2004 Jul 1;32(Web Server issue):W208-12.
The UGENE version of SITECON provides a tool for recognition of potential binding sites for over 90 types of transcription factors. It also
provides a tool for recognition of potential binding sites based on a site alignment provided by the user. For the detailed method
description see the original SITECON site.
Data about used context-dependent conformational and physicochemical properties are available in the PROPERTY Database.
SITECON Searching Transcription Factors Binding Sites
Types of SITECON Models
Eukaryotic
Prokaryotic
Building SITECON Model
The regions found by the SITECON algorithm can be saved as annotations to the DNA sequence in the Genbank format.
Every SITECON profile supplied with UGENE contains complete information about calibration settings provided to the UGENE team by the
author of SITECON.
The original TFBS alignments used to calculate profiles can be requested directly from the author of SITECON.
Eukaryotic
Name            Description
CEBP_a          CCAAT-enhancer-binding protein_alpha
CEBP_all        CCAAT-enhancer-binding proteins
CLOCK
cMyc_can
CRE
E2F1
E2F1/DP1sel1
EGR1
EKLf
ER2
GATA_all
GATA-1          GATA-binding factor 1
GATA-2          GATA-binding protein 2
GATA-3
HMG-1
HNF-1
HNF-3
HNF-4
IRF
isre
MyoD
MyOGsel3        Myogenin
NF-1            Neurofibromin 1
NF-E2
NFATp
NFkB_all
NFkB_hetero
NFkB_homo
Nfy
Nrf2
Oct-1
Oct_all
p53             Protein 53
PPRF
Pu1
setCREB
setCREBzag
SRE_san
SRF
STAT1
STAT
TTF1
USF
yy1
Prokaryotic
Name            Description
AgaR
AgaC
ArcA
ArgR
CpxR
Crp
CysB            Cysteine B
CytR            Cytidine Regulator
DeoR            Deoxyribose Regulator
DnaA
FadR
fis
FlhDC
Fnr
Frur            Fructose repressor
FUR
GALR            Galactose repressor
GALS            Galactose isorepressor
GLPR            sn-Glycerol-3-phosphate repressor
GNTP
HNS
ICLR
IHF
ISCR1
ISCR3
LEXA
Lrp
MALT            Maltose regulator
MARA
MELR            Melibiose regulator
MEtJ
MetR1
MLC
MODE
NAC
NAGC_new2       N-acetylglucosamine
NANR
NARL2
NARL
NARP
NIRC
OmpC
OxyR
PHOB
PHOP
PurR
RcsB_1
RcsB_2
Rob2
ROB
soxS
TORR
TRPR
TyrR            Tyrosine repressor
Here you need to select a nucleotide alignment and an output model. Optionally, you can change other parameters. After that click on the
Build button.
Smith-Waterman Search
The Smith-Waterman Search plugin adds a complete implementation of the Smith-Waterman algorithm to UGENE.
To use the plugin open a nucleotide or protein sequence in the Sequence View and select the Analyze->Find pattern [Smith-Waterman] item
in the context menu. The Smith-Waterman Search dialog appears:
First of all you need to specify the pattern to search for. The remaining parameters are optional:
Search in - select whether to search in the sequence or in its translation.
Strand - select the strand to search in: direct, complementary or both strands.
Region - specifies the region of the sequence that will be used to search for the pattern. By default, if a subsequence was
selected when the dialog was opened, then the selected subsequence is searched for the pattern. Otherwise, the whole
sequence is used. You can also input a custom range.
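For readers unfamiliar with the underlying method, the following Python sketch shows a textbook Smith-Waterman local alignment with a linear gap penalty. It is not the optimized implementation shipped with UGENE; the function name smith_waterman and the scoring values (match, mismatch, gap) are illustrative assumptions.

```python
def smith_waterman(pattern, sequence, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_pattern, aligned_sequence)."""
    rows, cols = len(pattern) + 1, len(sequence) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def substitution(i, j):
        return match if pattern[i - 1] == sequence[j - 1] else mismatch

    # Fill the matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(0,
                              score[i - 1][j - 1] + substitution(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            top.append(pattern[i - 1])
            bottom.append(sequence[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(pattern[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(sequence[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


result = smith_waterman("ACGT", "TTACGTTT")
print(result)
```

The pattern is found inside the longer sequence even though the flanking bases do not match, which is the defining property of a local alignment.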
Here you can select where to save the alignment (the Alignment files directory path parameter).
Using the Set advanced options checkbox you can select the saving options.
You can set different templates for file names: create your own or use the following tags: [E] adds a subsequence end
position, [hms] adds a time, [MDY] adds a date, [S] adds a subsequence start position, [L] adds a subsequence length, [SN]
adds a reference sequence name prefix, [PN] adds a pattern sequence name prefix, [C] adds a counter.
You can create templates for alignment file names, reference subsequence names, pattern subsequence names and the pattern sequence
name.
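The tag-based templates can be pictured as a simple text substitution. The Python sketch below expands the tags listed above in one pass. It is only an illustration: the function name expand_template is hypothetical, the exact layout UGENE uses for the date and time tags is not stated in this manual and is assumed here, and the length is computed assuming inclusive start and end positions.

```python
import re
from datetime import datetime

# Matches any of the tags described in the text, e.g. [S] or [MDY].
TAG = re.compile(r"\[(S|E|L|SN|PN|C|hms|MDY)\]")


def expand_template(template, start, end, ref_name, pattern_name, counter, now):
    """Replace each known tag in template with its value in a single pass."""
    values = {
        "S": str(start),
        "E": str(end),
        "L": str(end - start + 1),       # assumes inclusive positions
        "SN": ref_name,
        "PN": pattern_name,
        "C": str(counter),
        "hms": now.strftime("%H%M%S"),   # time layout is an assumption
        "MDY": now.strftime("%m%d%Y"),   # date layout is an assumption
    }
    return TAG.sub(lambda match: values[match.group(1)], template)


stamp = datetime(2015, 1, 2, 3, 4, 5)
name = expand_template("[SN]_[PN]_[S]-[E]_[C]", 10, 25, "chr1", "probe", 3, stamp)
print(name)
```

Text outside the square-bracket tags is copied unchanged, so separators such as underscores can be placed freely in a template.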
HMM2
The HMM2 plugin is a toolkit based on Sean Eddy's HMMER2 package.
While working on this plugin we were guided by the following principles:
Make the HMMER2 tools accessible to a wider user audience by providing a graphical interface for all supported utilities on most
platforms.
Be compatible with the original HMMER2 package.
Create a high-performance solution utilizing modern multi-core processors and SIMD instructions.
The current version of UGENE provides a user interface for three HMM2 tools: HMM build, HMM calibrate and HMM search.
In the original program the corresponding commands are: hmmbuild, hmmcalibrate and hmmsearch.
To access these tools select the Tools->HMMER2 tools submenu of the program main menu:
We highly recommend reading the original HMMER2 documentation to learn how to use the utilities provided by the plugin.
The SSE2 algorithm was implemented by Leonid Konyaev, Novosibirsk State University. Using the SSE2-optimized version of the HMM
search algorithm on a quad-core CPU gives a >30x performance boost compared with the original single-threaded algorithm
(single sequence mode).
The HMM build tool does not automatically calibrate a profile. Use the HMM calibrate tool to calibrate the profile.
The search results are stored as sequence annotations in the Genbank file format.
All HMM2 UGENE tools work only with files that contain a single HMM model.
HMM3
The HMM3 plugin is a toolkit based on Sean Eddy's HMMER3 package.
While working on this plugin we were guided by the following principles:
Make the HMMER3 tools accessible to a wider user audience by providing a graphical interface for all supported utilities on most
platforms.
Be compatible with the original HMMER3 package.
Create a high-performance solution utilizing modern multi-core processors.
The current version of UGENE provides a user interface for three HMM3 tools: HMM3 build, HMM3 search and Phmmer search.
In the original program the corresponding commands are: hmmbuild, hmmsearch and phmmer.
To access these tools select the Tools->HMMER3 tools submenu of the program main menu:
We highly recommend reading the original HMMER3 documentation to learn how to use the utilities provided by the plugin.
Building HMM Model (HMM3 Build)
Searching Sequence Using HMM Profile (HMM3 Search)
Searching Sequence Against Sequence Database (Phmmer Search)
The HMM3 configuration dialog provides an easy way to set appropriate search parameters.
For example, reporting thresholds options can be configured using the dialog:
The search results are stored as sequence annotations in the Genbank file format.
The HMM3 search works only with files that contain a single HMM model.
You can set options of the Phmmer search by choosing the needed dialog tab. Here you can see the e-value calibration options:
The results are stored as sequence annotations in the Genbank file format.
uMUSCLE
UGENE contains a graphical port of Robert C. Edgar's MUSCLE tool for multiple alignment.
MUSCLE4 has not been supported since UGENE version 1.7.2.
The package is fully integrated, so no extra files are needed to use it. It is possible to run several multiple alignment tasks in
parallel, check their progress and cancel running tasks safely.
The k-mer clustering part of the MUSCLE algorithm was optimized for multicore systems by Timur Tleukenov, Novosibirsk State
Technical University.
MUSCLE Aligning
Aligning Profile to Profile with MUSCLE
Aligning Sequences to Profile with MUSCLE
MUSCLE Aligning
To run the classic MUSCLE use the Align->Align with MUSCLE context menu item in the Alignment Editor.
By default UGENE does not rearrange sequence order in an alignment, but the original MUSCLE package does. To enable
sequence rearrangement uncheck the Do not re-arrange sequences (-stable) option in the dialog.
One of the improvements over the original MUSCLE package is the ability to align only a part of the model. When the Column range item is
selected, only the region of the specified columns is passed to the MUSCLE alignment engine. The resulting alignment is inserted into the
original one with gaps added or removed on the region boundaries.
To visually select the column range to align, make a selection in the alignment editor first. Then invoke the MUSCLE plugin. Its
column range boundary values will automatically match the given selection.
There are two gap columns inserted into the source profile, and two gap columns inserted into the added one. Therefore the profiles' columns
are kept intact and the alignments haven't been changed.
By aligning a profile to the active alignment you modify the original alignment file, since it will contain 2 profiles after the operation
is completed.
The original alignment is not modified; only columns of gap characters can be inserted.
The second profile was considered as a set of sequences and therefore is modified.
Note that if a file with another alignment is used as a source of unaligned sequences, the gap characters are removed and each input
sequence is processed independently.
This method is quite fast: for example, an alignment of 3000 sequences (1000 bases each) to the existing profile takes about 5 minutes on
a typical Core 2 Duo computer.
ClustalW
Clustal is a widely used multiple sequence alignment program. It is used for both nucleotide and protein sequences. ClustalW is a
command-line version of the program.
MAFFT
Originally, MAFFT is a multiple sequence alignment program for unix-like operating systems. However, currently it is available for Mac OS X,
Linux and Windows. It is used for both nucleotide and protein sequences.
MAFFT home page: http://mafft.cbrc.jp/alignment/software
To make MAFFT available from UGENE:
Install the MAFFT program on your system.
Set the path to the MAFFT executable on the External tools tab of UGENE Application Settings dialog.
For example, on Windows you need to specify the path to the mafft.bat file.
To use MAFFT open a multiple sequence alignment file and select the Align with MAFFT item in the context menu or in the Actions main
menu. The following dialog appears:
T-Coffee
T-Coffee is a multiple sequence alignment package.
T-Coffee home page: T-Coffee
To make T-Coffee available from UGENE see the External Tools.
To use T-Coffee open a multiple sequence alignment file and select the Align with T-Coffee item in the context menu or in the Actions main
menu. The following dialog appears:
Bowtie
Bowtie is a popular short read aligner. Click this link to open the Bowtie homepage. Bowtie is embedded as an external tool into UGENE.
Open the Tools->DNA Assembly submenu of the main menu.
Select the Align short reads item to align short reads to a DNA sequence using Bowtie. Or select the Build index item to build an index for a
DNA sequence which can be used to optimize aligning of the short reads to the sequence.
Bowtie Aligning Short Reads
Building Index for Bowtie
Bowtie 2
Bowtie 2 is a popular ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
Click this link to
Select the Align short reads item to align short reads to a DNA sequence.
Or select the Build index item to build an index for a DNA sequence which can be used to optimize aligning of
the short reads to the sequence:
Bowtie 2 Aligning Short Reads
Building Index for Bowtie 2
By default, Bowtie 2 performs end-to-end read alignment. That is, it searches for alignments involving
all of the read characters. This is also called an "untrimmed" or "unclipped" alignment.
When the --local option is specified, Bowtie 2 performs local read alignment. In this mode, Bowtie 2
BWA
BWA is a fast, lightweight tool that aligns relatively short reads to a reference sequence. Click this link to open the BWA homepage. BWA is
embedded as an external tool into UGENE.
Select the Align short reads item to align short reads to a DNA sequence using BWA. Or select the Build index item to build an index for a
DNA sequence which can be used to optimize aligning of short reads.
Aligning Short Reads with BWA
Building Index for BWA
BWA-SW
BWA is a fast, lightweight tool that aligns relatively short reads to a reference sequence. Click this link to open the BWA homepage. BWA-SW
shares similar features such as long-read support and split alignment. BWA-SW is embedded as an external tool into UGENE.
Open the Tools->Align to reference submenu of the main menu.
Select the Align short reads item to align short reads to a DNA sequence using BWA-SW. Or select the Build index item to build an index for
a DNA sequence which can be used to optimize aligning of short reads.
BWA-MEM
BWA is a fast, lightweight tool that aligns relatively short reads to a reference sequence. Click this link to open the BWA homepage.
BWA-MEM is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than
BWA-backtrack for 70-100bp Illumina reads.
Open the Tools->Align to reference submenu of the main menu.
Select the Align short reads item to align short reads to a DNA sequence using BWA-MEM. Or select the Build index item to build an index
for a DNA sequence which can be used to optimize aligning of short reads.
Aligning Short Reads with BWA-MEM
Building Index for BWA-MEM
Select the Align short reads item to align short reads to a DNA sequence, or the Build index item to build an index for a DNA sequence which can
be used to optimize aligning short reads to the sequence.
Aligning Short Reads with UGENE Genome Aligner
Building Index for UGENE Genome Aligner
Converting UGENE Assembly Database to SAM Format
Select assembly and result files and click on the Convert button.
CAP3
CAP3 (CONTIG ASSEMBLY PROGRAM Version 3) is a sequence assembly program for small-scale assembly with or without quality
values. Click this link to open the CAP3 homepage. CAP3 is embedded as an external tool into UGENE.
Open the Tools->DNA assembly submenu of the main menu.
Select the Contig assembly with CAP3 item to use CAP3.
The Contig Assembly With CAP3 dialog appears.
SPAdes
SPAdes is the St. Petersburg genome assembler. Click this link to open the SPAdes homepage. SPAdes is embedded as an external tool into
UGENE.
Open Tools->DNA assembly.
Weight Matrix
The Weight Matrix plugin is a tool for sequence annotation. As with SITECON, the main use case of the
plugin is recognition of potential transcription factor binding sites on the basis of data about conservative conformational and physicochemical
In the search dialog you must specify a file with a PWM or PFM. You can do so by pressing the browse button and selecting the file.
You can also use the special interface to choose a JASPAR matrix by pressing the Search JASPAR database button.
An alternative way to specify the position weight/frequency matrix is to create one from an alignment or a file with several sequences
using the build a new matrix tool.
After the profile (the matrix) is loaded, you can adjust the threshold value. The threshold sets the minimal identity score for a result to pass.
The higher the score of a result, the more closely it matches the aligned region. By changing the threshold you can filter out low-scoring
results.
If the loaded matrix is a position frequency matrix, you must also specify the algorithm used to build the corresponding position weight matrix, which
will represent the transcription factor. There are four algorithms available.
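The relationship between a position frequency matrix, a position weight matrix and a score threshold can be sketched in Python as follows. This is a generic log-odds conversion with a pseudocount, chosen for illustration; it is not claimed to be any of the four algorithms offered in the dialog. The function names, the pseudocount, the uniform background of 0.25 and the raw log-odds threshold are all assumptions, and only uppercase A, C, G and T are supported.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    length = len(pfm["A"])
    pwm = {base: [] for base in BASES}
    for pos in range(length):
        total = sum(pfm[base][pos] for base in BASES) + 4 * pseudocount
        for base in BASES:
            freq = (pfm[base][pos] + pseudocount) / total
            pwm[base].append(math.log2(freq / background))
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    length = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# A made-up frequency matrix built from eight copies of the site "ACG".
pfm = {"A": [8, 0, 0], "C": [0, 8, 0], "G": [0, 0, 8], "T": [0, 0, 0]}
pwm = pfm_to_pwm(pfm)
hits = scan("TTACGTT", pwm, threshold=4.0)
print(hits)
```

Raising the threshold keeps only windows that agree with the matrix at nearly every position; lowering it admits weaker matches.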
You can also add a selected matrix with the specified Minimal score and Algorithm to the matrices list. To do so, select the matrix and
other options and press the Add to queue button. The plugin will search with all matrices specified in the list.
You can use the Save list button to export the list of matrices to a *.csv file. Later the list can be loaded from the file using the Load list button.
The remaining options are standard sequence search options: the strand and the sequence region to search for matches in.
After specifying the necessary options press the Search button. The results found will appear in the dialog table. The corresponding
identity scores are shown in the Score column.
Also you can see the matrix by using the View matrix button:
The regions found by the weight matrix algorithm can be saved as annotations to the DNA sequence in the Genbank format by pressing the
Save as annotations button.
After saving, the file with resulting annotations will be automatically added to the current project, and the annotations will be added to the
original sequence.
Note that in case of selecting JASPAR or UNIPROBE matrix, the resulting annotations will contain the given matrix properties.
See also:
Searching JASPAR Database
Building New Matrix
Here the matrices are divided into categories, and you can read detailed information about a matrix, represented by its properties. This can
help you choose an appropriate matrix.
The matrices provided with UGENE are located in the $UGENE/data/position_weight_matrix folder.
Primer3
The Primer3 plugin is a port of the Primer3 tool. It is intended to pick primers from a DNA sequence.
To use Primer3, open a DNA sequence and select the Analyze->Primer3 context menu item. The dialog will appear:
To design primers for your mRNA sequence, go to the RT-PCR tab of the Primer Designer dialog:
In the list of sequences select the corresponding mRNA sequence and click OK. The resulting alignment will be saved as an annotation with
the name "exon":
External Tools
The External Tools plugin allows one to launch an external tool from UGENE.
To use an external tool from UGENE, the tool needs to be installed on the system and the path to it should
be properly configured. However, no additional configuration is needed if you've installed the
UGENE Full Package, as it already contains all the tools by default.
Otherwise, if you've installed the UGENE Standard Package, you need to configure an external tool in order to use it. Note that in this
case you can download the package with all the external tools from this page.
To learn how to configure an external tool, read below.
Configuring External Tool
Query Designer
The Query Designer allows a molecular biologist to analyze a nucleotide sequence using different algorithms (Repeats finder, ORF finder,
Weight matrix matching, etc.) at the same time imposing constraints on the positional relationship of the results obtained from the algorithms.
A user-friendly interface is used to create a schema of the algorithms and constraints.
To activate Plasmid Auto Annotation for your sequence, use the Analyze->Annotate plasmid and custom features menu item. In the appeare
The detected plasmid features are stored as automatic annotations and can be controlled through
the corresponding menu. Refer to Automatic Annotations Highlighting to learn more.
The database containing features and their sequences is located in a subfolder of the UGENE data folder:
data/custom_annotations/plasmid_features.txt.
ClustalO
Clustal is a widely used multiple sequence alignment program. It is used for both nucleotide and protein sequences. Clustal Omega is the
latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of
sequences to be aligned in only a few hours. It also makes use of multiple processors, where present.
Clustal home page: http://www.clustal.org
If you are using Windows OS, no additional configuration steps are required, as the ClustalO executable file is included in the UGENE
distribution package. Otherwise:
Install the Clustal Omega program on your system.
Set the path to the ClustalO executable on the External tools tab of the UGENE Application Settings dialog.
Now you are able to use ClustalO from UGENE.
Open a multiple sequence alignment file and select the Align with ClustalO item in the context menu or in the Actions main menu. The Align
with ClustalO dialog will appear (see below), where you can adjust the following parameters:
Number of iterations
Number of processors to use
Kalign Aligning
Kalign is a fast and accurate multiple sequence alignment package designed to align large numbers of protein sequences.
Kalign home page: KAlign
To use Kalign open a multiple sequence alignment file and select the Align with Kalign item in the context menu or in the Actions main menu.
The following dialog appears:
DAS Annotating
The DAS annotator finds similar protein sequences using remote BLAST. Using the IDs of the sequences found, it loads annotations from DAS sources.
Nucleotide sequences are skipped. To annotate with DAS, use the DAS Annotations tab of the Options Panel:
Expert Discovery
The ExpertDiscovery system applies an original knowledge discovery approach (Relational Data Mining) [Scientific Discovery Web Site; Vityaev,
2006; Vityaev, Kovalerchuk, 2008; Vityaev, Kovalerchuk, 2004; Kovalerchuk, Vityaev, 2000]. The approach was used in the Discovery system,
which has been successfully applied to particular problems in the fields of psychophysics, cancer diagnostics and securities
rates prediction. The heart of the system is semantic probabilistic inference [Vityaev, 2006].
The idea of new knowledge discovery is to sequentially increase the accuracy of hypotheses so that at each step the hypotheses have a
higher probability and definition level. The significance level of the results is also tested by statistical criteria.
The Discovery system implements semantic probabilistic inference, with knowledge discovered as a set of probability laws, the strongest probability
laws and maximally specific laws.
ExpertDiscovery is an adaptation of the Discovery system configured for knowledge discovery in sets of nucleotide sequences;
following semantic probabilistic inference, the knowledge is extracted as complex signals with specified parameters.
The ExpertDiscovery plugin in UGENE has the following advantages:
1. Cross-platform support
2. A unified system
a. Many algorithms within one project give more possibilities than many separate narrow
applications. This approach simplifies the user's work: all that is needed is to launch UGENE, which gives access to a
wide range of algorithms, instead of launching different unrelated programs.
b. UGENE plugins have a unified interface and work logic, so a user who is already familiar with UGENE can master a new
module faster. Thus, ExpertDiscovery uses the reliable interface and visualization solutions (sequence view, annotation view,
task manager, etc.) of UGENE.
c. Results can be extended and combined. For example, ExpertDiscovery markups can be results of UGENE
algorithms (SITECON, Weight Matrix, Query Designer, etc.).
d. Data formats. ExpertDiscovery can read sequences in any format supported by UGENE (FASTA, FASTQ,
Genbank, GFF, EMBL, etc.).
To open ExpertDiscovery, go to the Tools->Expert Discovery main menu item. More detailed information about ExpertDiscovery can be
found below:
Loading Sequences
Mapping Sequences
Markup Sequences
Creating Signals
Generating Signals
Complex Signals Recognition on a Sequence
Loading Sequences
To load sequences into ExpertDiscovery, click the New ExpertDiscovery Document toolbar button:
Load the sequences you want to analyze by choosing any file with a sequence or multiple sequences. The positive sequence base contains the
regulation object you are interested in. The negative sequence base does not contain it. You may also generate negative sequences automatically.
ExpertDiscovery will extract complex signals which reflect the structure of your regulation object. The more sequences you provide, the better
the result will be. Click the Next button. The following dialog will appear:
Mapping Sequences
You can show loaded sequences in different ways:
1. Using the Positive, Negative and Control context menus:
2. Using the sequence context menu, you can show one sequence, add a sequence to the displayed ones or clear the displayed sequences area:
Markup Sequences
To mark up sequences, go to the Markup context menu:
Creating Signals
To create a Complex Signal manually, use the context menu of the Complex signals item:
By definition, a Complex Signal (CS) is represented as a hierarchical tree in which the operations are nodes and markup items or words are leaves.
When a CS is created and selected, its structure can be changed and its parameters can be viewed in the parameters area. The available types
of nodes are the distance operation (binary), the repetition operation, the interval operation, markup items and words. A CS is fully
determined when all its leaves are terminal symbols (words or markup items).
Generating Signals
Using the training set (positive and negative sets, markups), the system can construct the structure of a regulatory region as a Complex Signal.
The extraction wizard is launched by the Extract signals button on the toolbar:
In the first dialog window the extraction parameters (see below) are set. The next windows are for setting the operations which will be nodes of the CS and
for choosing a folder to store the CS.
To see the location of a CS in a sequence, pick the sequences to display using the popup menu of the sequence. Then
choose any CS, and it will be shown as auto-annotations on each displayed sequence. Moreover, it is possible to observe several signals at
once on a sequence: to do this, check the signals for group representation using the popup menu. The same operation is used to choose
signals for recognition.
The dialog shows errors of the first and the second type (false positives and false negatives) to help choose the value.
For convenience, an HTML recognition report can also be generated. The report includes statistical parameters and a recognition result for
each sequence.
Shared Database
The rational storage of biological data is an ever-present issue. It is not only about large data sizes, but also about the requirement of
simultaneous access to them by several scientists. For instance, a few researchers from a lab may need to work on the same data, like a set
of primers or data produced by sequencing. That information has to be updated and synchronized between different users and kept in a
common storage. That is what UGENE Shared Database is intended for.
To start sharing data via UGENE, you need to deploy a public database server. MySQL servers are currently supported. See this paragraph
for details about the required server configuration.
After that, any UGENE user who knows the correct login and password can connect to the database. The connected database is
shown in the Project View as a document, exactly as if the data were located on the local computer.
As described in this paragraph, users can have read-only access to the database or be able to modify its content. A user with
read-only access can:
Browse the data in the database
Open the data in the UGENE views
Export the data to the local computer
Users with write access, in addition, can:
Add new objects to the database
Create new folders to order the data in the database
Modify the folders hierarchy inside the database (using drag'n'drop)
Rename objects and folders
Delete existing objects
Delete folders
All UGENE instances connected to a database constantly monitor the state of the database and show changes made by other users.
UGENE accesses large remote data, such as NGS assemblies, so that only the viewed part of the data is loaded to a client computer.
So, if you store the assembly data on a server, the data can be browsed in the UGENE Assembly Browser on a local computer
almost instantly, without the need to copy the data to the computer or use its hard disk space.
Configuring Database
To add a new connection, click the Add button. The following dialog appears:
Here you need to specify the Host (IP address of the server), Port (number of the port used by the MySQL server) and Database (name of the
database). You may also fill in the Login and Password fields. Otherwise, you are asked to enter them every time you establish this
connection, until you check the Remember me box. Click the OK button; the connection is created and the appropriate item appears
in the previous Shared Database Connections dialog.
If you want to use an existing connection, choose the appropriate item in the Shared Database Connections dialog and press the Connect
button. This can also be done by double-clicking the item. If the specified database is empty, UGENE has to initialize it. This routine is
done only once. In this case you get a message box asking whether to initialize the database. If you choose Yes, the
database is populated with UGENE data structures; if you choose No, it remains empty and UGENE does not connect to it.
If you want to delete a connection, select it in the Shared Database Connections dialog and click the Delete button. You may also edit
connection parameters using the Edit button.
An established connection can be terminated by pressing the Delete button. The same effect is produced by removing the database
document item from Project View.
Here you can add files, folders or other objects from the current Project View to the database. To do this, use the corresponding buttons. After
specifying your data, click the Import button. The data will be imported and will appear in the database data tree. You can also change the import
settings. To do this, click the General options button. The following dialog will appear:
You can add a new folder to the database tree. To do that, use the Add->Add folder database context menu item. To add a subfolder to an
existing folder, use the Add->Add folder folder context menu item. To delete an object or a folder, press the Delete button or drag'n'drop it to
the Recycle bin.
In this version of UGENE, objects in the database are read-only. Nevertheless, there is a workaround to edit them. First, export the
objects to files on your computer using the Export/Import object context menu. Then change those files locally, upload them to the
database and, finally, delete the originals.
If new data are added to the database by another user or removed from it, UGENE detects this and shows the updates automatically in the Project
View.
Deleting Data
To remove an object or a folder select it and press the Delete button or drag it to the Recycle bin folder.
All removed items are located in the Recycle bin folder.
To delete all items from the Recycle bin, click the Empty recycle bin context menu item of the Recycle bin.
To restore objects from the Recycle bin, select them and call the Restore selected items context menu item.
When the database is updated externally, UGENE shows these changes on your computer automatically.
You cannot delete an object from the Recycle bin if it is opened on another computer. This situation can occur if the object was
being viewed by another user when you moved it to the Recycle bin.
2. Choose the predefined "UGENE public database" item and click the Connect button.
Here:
task_name - task to execute. It can be one of the predefined tasks or a task you have created.
task_parameter - parameter of the specified task. Some parameters of a task are required, like the in and out parameters of some tasks.
option - one of the CLI options.
See the example below:
CLI Options
CLI Predefined Tasks
Format Converting Sequences
Converting MSA
Extracting Sequence
Finding ORFs
Finding Repeats
Finding Pattern Using Smith-Waterman Algorithm
Adding Phred Quality Scores to Sequence
Local BLAST Search
Local BLAST+ Search
Remote NCBI BLAST and CDD Requests
Annotating Sequence with UQL Schema
Building Profile HMM Using HMMER2
Searching HMM Signals Using HMMER2
Aligning with MUSCLE
Aligning with ClustalW
Aligning with ClustalO
Aligning with Kalign
Aligning with MAFFT
Aligning with T-Coffee
Building PFM
Searching for TFBS with PFM
Building PWM
Searching for TFBS with Weight Matrices
Building Statistical Profile for SITECON
Searching for TFBS with SITECON
Fetching Sequence from Remote Database
Annotating with DAS
Gene-by-Gene Report
Reverse-Complement Converting Sequences
Variants Calling
Generating DNA Sequence
Creating Custom CLI Tasks
CLI Options
--help | -h [<option_name> | <task_name>]
Shows help information. For example:
ugene --help
ugene -h
ugene --help=<option_name>
ugene -h <option_name>
ugene --help=<task_name>
ugene -h <task_name>
--log-no-task-progress
Task progress is shown by default while a task is running. This option suppresses the progress output.
--log-level="[<category1>=]<level1> [, ...]"
Sets the log level per category. If a category is not specified, the log level is applied to all categories.
The following categories are available:
Algorithms
Console
Core Services
Input/Output
Performance
Remote Service
Scripts
Tasks.
The following log levels are available: TRACE, DETAILS, INFO, ERROR or NONE.
By default, loglevel=ERROR.
For example:
ugene --log-level=NONE
ugene --log-level="Tasks=DETAILS, Console=DETAILS"
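The rules above for composing a --log-level value can be sketched in Python. This is a minimal illustration, not part of UGENE: the helper name log_level_option is hypothetical, and the category and level names are copied from the lists in this section. The helper returns a single argument string suitable for passing in an argument list, so no shell quoting is included.

```python
# Categories and levels exactly as listed in this section of the manual.
CATEGORIES = ("Algorithms", "Console", "Core Services", "Input/Output",
              "Performance", "Remote Service", "Scripts", "Tasks")
LEVELS = ("TRACE", "DETAILS", "INFO", "ERROR", "NONE")


def log_level_option(default=None, per_category=None):
    """Build a --log-level argument (hypothetical helper, not a UGENE API).

    default      -- level applied to all categories, or None
    per_category -- mapping of category name to level, or None
    """
    parts = []
    if default is not None:
        if default not in LEVELS:
            raise ValueError(f"unknown log level: {default}")
        parts.append(default)
    for category, level in (per_category or {}).items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown log category: {category}")
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level}")
        parts.append(f"{category}={level}")
    if not parts:
        raise ValueError("at least one level must be given")
    return "--log-level=" + ", ".join(parts)


if __name__ == "__main__":
    print(log_level_option("NONE"))
    print(log_level_option(per_category={"Tasks": "DETAILS", "Console": "DETAILS"}))
```

Validating names before launching UGENE catches misspelled categories early; when the value contains spaces and is typed in a shell, it still needs quoting as in the examples above.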
--log-format="<format_string>"
Specifies the format of a log line.
Use the following notations: L - level, C - category, YYYY or YY - year, MM - month, dd - day, hh - hour, mm - minutes, ss - seconds,
zzz - milliseconds.
By default, logformat=[L][hh:mm].
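To make the notation concrete, the following Python sketch expands a format string written with the notations listed above. It is an illustration only and does not reproduce UGENE's own parser: the function name render_log_prefix is hypothetical, zero-padding of numeric fields is an assumption, and every occurrence of a notation (including a bare L or C inside literal text) is substituted.

```python
import re
from datetime import datetime

# Longer notations come first so that "YYYY" is not read as two "YY" tokens.
_TOKEN = re.compile(r"YYYY|YY|MM|dd|hh|mm|ss|zzz|L|C")


def render_log_prefix(fmt, level, category, when):
    """Expand the log-format notations in fmt (illustrative, not UGENE code)."""
    values = {
        "L": level,
        "C": category,
        "YYYY": f"{when.year:04d}",
        "YY": f"{when.year % 100:02d}",
        "MM": f"{when.month:02d}",
        "dd": f"{when.day:02d}",
        "hh": f"{when.hour:02d}",
        "mm": f"{when.minute:02d}",
        "ss": f"{when.second:02d}",
        "zzz": f"{when.microsecond // 1000:03d}",
    }
    return _TOKEN.sub(lambda match: values[match.group(0)], fmt)


if __name__ == "__main__":
    stamp = datetime(2014, 5, 6, 7, 8, 9)
    # The default format from this section.
    print(render_log_prefix("[L][hh:mm]", "ERROR", "Tasks", stamp))
```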
--license
Shows license information.
--lang=language_code
Specifies the language to use (e.g. for the log output). The following values are available:
CS (Czech)
ugene --session-db=D:/session.ugenedb
--version
Shows version information.
--tmp-dir=<path_to_file>
Path to the temporary folder.
--ini-file=<path_to_file>
Loads configuration from the specified .ini file. By default the UGENE.ini file is used.
--genome-aligner
UGENE Genome Aligner is an efficient and fast tool for short read alignment. It has two modes: build index and align short reads
(default mode).
If no index is available for the reference sequence, it will be built on the fly.
Usage: ugene --genome-aligner { --option[=argument] }
The following options are available:
--build-index Use this flag to only build index for reference sequence.
--reference Path to reference genome sequence
--short-reads Path to short-reads data in FASTA or FASTQ format
--index Path to a prebuilt index (base file name or with the .idx extension). If not set, the index is searched for in the system temporary directory. If
the --build-index option is applied, the index will be saved to the specified path.
--result Path to the output alignment in UGENEDB or SAM format (see --sam)
--memsize Memory size (in MB) reserved for short reads. The larger the value, the faster the algorithm works. The default value depends on
available system memory.
--ref-size Index fragmentation size (in MB). Small fragments fit better into RAM, allowing more short reads to be loaded. Default value is 10.
--n-mis Absolute number of allowed mismatches per short read (mutually exclusive with --pt-mis). Default value is 0.
--pt-mis Percentage of allowed mismatches per short read (mutually exclusive with --n-mis). Default value is 0.
--rev-comp Use both the read and its reverse complement during the alignment.
--best Report only the best alignments (in terms of mismatches).
--omit-size Omit reads with qualities lower than the specified value. Reads which have no qualities are not omitted. Default value is 0.
--sam Output aligned reads in SAM format. Default value is false.
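The option list above can be assembled programmatically. The Python sketch below builds an argument list following the usage line ugene --genome-aligner { --option[=argument] } and enforces the documented mutual exclusion of --n-mis and --pt-mis. The function name is hypothetical, the file names are placeholders, and passing --rev-comp, --best and --sam as bare flags is an assumption based on their descriptions.

```python
def genome_aligner_args(reference, short_reads, result, *, index=None,
                        n_mis=None, pt_mis=None,
                        rev_comp=False, best=False, sam=False):
    """Return an argument list for the UGENE Genome Aligner (illustrative helper).

    Assumes the `ugene` executable is on PATH and that boolean options
    are given as bare flags.
    """
    if n_mis is not None and pt_mis is not None:
        raise ValueError("--n-mis and --pt-mis are mutually exclusive")
    args = [
        "ugene", "--genome-aligner",
        f"--reference={reference}",
        f"--short-reads={short_reads}",
        f"--result={result}",
    ]
    if index is not None:
        args.append(f"--index={index}")
    if n_mis is not None:
        args.append(f"--n-mis={n_mis}")
    if pt_mis is not None:
        args.append(f"--pt-mis={pt_mis}")
    for flag, enabled in (("--rev-comp", rev_comp), ("--best", best), ("--sam", sam)):
        if enabled:
            args.append(flag)
    return args


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run() to launch UGENE.
    print(" ".join(genome_aligner_args("ref.fa", "reads.fastq", "out.sam",
                                       n_mis=2, sam=True)))
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.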
For example:
Example:
Converting MSA
Task Name: convert-msa
Converts a multiple sequence alignment file from one format to another.
Parameters:
in input multiple sequence alignment file. [String, Required]
out name of the output file. [String, Required]
format format of the output file. [String, Optional]
The following values are available:
clustal (default)
fasta
mega
msf
nexus
phylip-interleaved
phylip-sequential
stockholm
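As a sketch of how the convert-msa parameters fit together, the Python helper below checks the requested format against the values listed above and returns an argument list. The helper name and file names are hypothetical, and the --name=value spelling of task parameters is an assumption modelled on the option syntax shown elsewhere in this section.

```python
# Output formats exactly as listed for the convert-msa task.
MSA_FORMATS = ("clustal", "fasta", "mega", "msf", "nexus",
               "phylip-interleaved", "phylip-sequential", "stockholm")


def convert_msa_args(in_file, out_file, fmt="clustal"):
    """Return an argument list for the convert-msa task (illustrative helper).

    Assumes `ugene` is on PATH and that task parameters are written
    as --name=value.
    """
    if fmt not in MSA_FORMATS:
        raise ValueError(f"unsupported MSA format: {fmt}")
    return ["ugene", "convert-msa",
            f"--in={in_file}", f"--out={out_file}", f"--format={fmt}"]


if __name__ == "__main__":
    # Placeholder file names.
    print(" ".join(convert_msa_args("alignment.sto", "alignment.msf", "msf")))
```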
Example:
Extracting Sequence
Task Name: extract-sequence
Extracts annotated regions from an input sequence.
Parameters:
in semicolon-separated list of input files. [String, Required]
out output file. [String, Required]
annotation-names list of annotation names which will be accepted or filtered. [String, Required]
accumulate - accumulates all incoming data in one file or creates separate files for each input. In the latter case, an incremental
numerical suffix is added to the file name (using 'True' by default). [Boolean]
accept-or-filter if set to true, accepts only the specified annotations, if set to false, accepts all annotations except the specified ones.
[Boolean, Optional]
complement complements the annotated regions if the corresponding annotation is located on the complement strand. [Boolean,
Optional]
extend-left extends the resulting regions to the left for the specified number of base symbols. [Number, Optional]
extend-right extends the resulting regions to the right for the specified number of base symbols. [Number, Optional]
gap-length inserts a gap of the specified length between the merged annotations.
transl - translates the annotated regions. [Boolean, Optional]
Example:
Finding ORFs
Task Name: find-orfs
Searches for Open Reading Frames (ORFs) in nucleotide sequences and saves the regions found as annotations.
Parameters:
in semicolon-separated list of input files. [String, Required]
out output file with the annotations. [String, Required]
name name of the annotated regions. [String, Optional, Default: ORF]
min-length ignores ORFs shorter than the specified length. [String, Optional, Default: 100]
require-stop-codon ignores boundary ORFs that last beyond the search region (i.e. have no stop codon within the range). [Boolean,
Optional, Default: false]
require-init-codon allows ORFs starting with any codon other than terminator. [Boolean, Optional, Default: true]
allow-alternative-codons allows ORFs starting with alternative initiation codons, according to the current translation table.
[Boolean, Optional, Default: false]
Example:
Finding Repeats
Task Name: find-repeats
Searches for repeats in sequences and saves the regions found as annotations.
Parameters:
in semicolon-separated list of input files. [String, Required]
out output file with the annotations. [String, Required]
name name of the annotated regions. [String, Optional, Default: repeat_unit]
min-length minimum length of the repeats. [Number, Optional, Default: 5]
identity percent identity between repeats. [Number, Optional, Default: 100]
min-distance minimum distance between the repeats. [Number, Optional, Default: 0]
max-distance maximum distance between the repeats. [Number, Optional, Default: 5000]
inverted if true, searches for the inverted repeats. [Boolean, Optional, Default: false]
Example:
Example:
ugene align
ugene align-clustalw
ugene align-clustalw
ugene align-tcoffee
Building PFM
Task Name: pfm-build
Builds a position frequency matrix from a multiple sequence alignment file.
Parameters:
in semicolon-separated list of input MSA files. [String, Required]
out output file. [String, Required]
type type of the matrix. [Boolean, Optional, Default: false]
The following values are available:
true (dinucleic type)
false (mononucleic type)
Dinucleic matrices are more detailed, while mononucleic ones are more useful for small input data sets.
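The difference between the two matrix types can be illustrated with a short Python sketch that counts symbols per alignment column (mononucleic) or adjacent symbol pairs (dinucleic). This is a generic textbook construction, not UGENE's implementation: the function name build_pfm is hypothetical, and gap handling is deliberately left out.

```python
from collections import Counter


def build_pfm(alignment, dinucleic=False):
    """Count symbols (or adjacent symbol pairs) at each alignment position.

    alignment -- list of equal-length sequence strings
    Returns one Counter per position. A dinucleic matrix has one position
    fewer, because each entry spans two neighbouring columns.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    width = 2 if dinucleic else 1
    return [
        Counter(row[pos:pos + width].upper() for row in alignment)
        for pos in range(length - width + 1)
    ]


if __name__ == "__main__":
    rows = ["ACGT", "ACGA", "TCGT"]
    for position, counts in enumerate(build_pfm(rows)):
        print(position, dict(counts))
```

A dinucleic matrix has up to 16 cells per position instead of 4, which is why it needs more input sequences to be populated reliably.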
Example:
Building PWM
Task Name: pwm-build
Builds a position weight matrix from a multiple sequence alignment file.
Parameters:
in semicolon-separated list of input MSA files. [String, Required]
out output file. [String, Required]
type type of the matrix. [Boolean, Optional, Default: false]
The following values are available:
true (dinucleic type)
false (mononucleic type)
Dinucleic matrices are more detailed, while mononucleic ones are more useful for small input data sets.
algo algorithm used to build the matrix. [String, Optional, Default: Berg and von Hippel]
The following values are available:
Berg and von Hippel
Log-odds
Match
NLG
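For orientation, the sketch below shows a generic log-odds weight matrix computed from per-position counts, together with a scoring function for a candidate site. It is a textbook formulation under stated assumptions (a pseudocount of 1, a uniform background of 0.25, base-2 logarithms) and is not claimed to match the exact formula UGENE uses for its Log-odds option. The function names are hypothetical.

```python
import math


def log_odds_pwm(counts, pseudocount=1.0, background=0.25, alphabet="ACGT"):
    """Convert per-position symbol counts into log-odds weights.

    counts -- list of dicts mapping a symbol to its count at that position
    weight = log2( ((count + pseudocount) / total) / background )
    """
    pwm = []
    for column in counts:
        total = sum(column.get(s, 0) for s in alphabet) + pseudocount * len(alphabet)
        pwm.append({
            s: math.log2((column.get(s, 0) + pseudocount) / total / background)
            for s in alphabet
        })
    return pwm


def score_site(pwm, site):
    """Sum the weights of the symbols of `site`, one per matrix position."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix length")
    return sum(column[symbol] for column, symbol in zip(pwm, site))


if __name__ == "__main__":
    counts = [{"A": 3}, {"C": 2, "G": 1}]
    matrix = log_odds_pwm(counts)
    print(score_site(matrix, "AC"), score_site(matrix, "TT"))
```

Symbols seen more often than the background expectation get positive weights, so sites resembling the training alignment score higher than unrelated ones.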
Example:
Alias
genbank
genbank-protein
pdb
SwissProt
swissprot
Uniprot
uniprot
Parameters:
db database alias to read from. [String, Required]
id semicolon-separated list of resource IDs in the database. [String, Required]
save-dir directory to store sequence files loaded from the database. [String, Optional]
Example:
ugene das_annotation
Gene-by-Gene Report
Task Name: gene-by-gene
Suppose you have genomes and you want to characterize them. One way to do that is to build a table of which genes are present in each
genome and which are not.
1. Create a local BLAST db of your genome sequence/contigs. One db per genome.
2. Create a file with the sequences of the genes you want to explore. This file will be the input file for the scheme.
3. Set up the location and name of the BLAST db you created for the first genome.
4. Set up the output files: the report location and the output file with the annotated (with BLAST) sequence. You might want to delete the "Write
Sequence" element if you do not need output sequences.
5. Run the scheme.
5*. Run the scheme on the same input and output files, changing the BLAST db for each genome that you have.
As a result, you will get the report file with "Yes" and "No" fields. A "Yes" answer means that the gene is in the genome. A "No" answer MIGHT
mean that the gene is not in the genome. It is a good idea to analyze all
the "No" sequences using the annotated files. Just open a file and find a sequence with the name of a gene that has a "No" result.
Parameters:
in - Input sequence file [Url datasets]
final-name - Annotation name used to compare genes and reference genomes (using 'blast_result' by default) [String]
exist-file - If a target report already exists, you should specify how to handle that: Merge two tables into one, Overwrite or Rename the existing
file (using 'Merge' by default) [String]
ident - Identity between gene sequence length and annotation length in percent. BLAST identity (if specified) is checked after (using
'90.0' percent by default) [Number]
out - Output report file [String]
blast-out - Location of BLAST output file [String]
search-type - Type of BLAST searches (using 'blastn' by default) [String]
db-name - Name of BLAST DB [String]
blast-path - Path to BLAST DB [String]
expected-value - This setting specifies the statistical significance threshold for reporting matches against database sequences (using
'10.0' by default) [Number]
gapped-aln - Perform gapped alignment (using 'use' by default) [Boolean]
blast-name - Name for annotations (using 'blast_result' by default) [String]
tmpdir - Directory for temporary files (using UGENE temporary directory by default) [String]
toolpath - External tool path (using the path specified in UGENE by default) [String]
out-type - Type of BLAST output file (using 'XML (-m 7)' by default) [String]
Example:
Variants Calling
Task Name: snp
Calls variants for an input assembly and a reference sequence using SAMtools mpileup and bcftools.
Parameters:
bam - Input sorted BAM file(s) [Url datasets]
ref - Input reference sequence [Url datasets]
wout - Out file with variations [String]
bN - A/C/G/T only [Boolean]
bI - List of sites [String]
ml - BED or position list file [String]
bg - Per-sample genotypes [Boolean]
mC - Mapping quality downgrading coefficient [Number]
bT - Pair/trio calling [String]
mB - Disable BAQ computation [Boolean]
me - Gap extension error [Number]
mE - Extended BAQ computation [Boolean]
bF - Indicate PL [Boolean]
APPENDIXES
Appendix A. Supported File Formats
Specific File Formats
UGENE Native File Formats
Other File Formats
File format    File extension      Comment
ABIF                               A chromatogram file format.
ACE            *.ace
Bairoch        *.bairoch
BAM            *.bam
ClustalW       *.aln               A multiple sequence alignments (MSA) file format. See also: Alignment Editor
EBWT           *.ebwt
EMBL
FASTA
FASTQ          *.fastq
Genbank
GFF            *.gff               The Gene Finding Format (GFF) format is used to store features and annotations. See also: Sequence View
HMM            *.hmm
MMDB           *.prt
MSF            *.msf               A multiple sequence alignments file format. See also: Alignment Editor
Mega           *.meg, *.meg.gz     A multiple sequence alignments file format. See also: Alignment Editor
Newick         *.nwk, *.newick
Nexus          *.nex *.nxs
PDB            *.pdb
pDRAW32        *.pdw
PFM            *.pfm
Phylip         *.phy
PWM            *.pwm
Raw            *.seq
SAM            *.sam               The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences. See also: Assembly Browser, Bowtie, UGENE Genome Aligner
SCF            *.scf               It is a Standard Chromatogram Format. See also: Chromatogram Viewer
SITECON        *.sitecon
Stockholm      *.sto               A multiple sequence alignments file format. See also: Alignment Editor
Swiss-Prot     *.txt *.sw          An annotated protein sequence in format of the UniProtKB/Swiss-Prot database. See also: Sequence View
VCF            *.vcf
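As an illustration of how the extensions in this appendix map to formats, the Python sketch below guesses a format name from a file name. The mapping contains only extensions that appear in the list above; the function name guess_format is hypothetical, the ambiguous *.txt extension is left out on purpose, and ignoring a trailing .gz for every format is an assumption (the appendix lists it only for Mega). UGENE itself may also inspect file contents, which this sketch does not do.

```python
from pathlib import Path

# Extensions taken from the format list in this appendix.
EXTENSION_TO_FORMAT = {
    ".bairoch": "Bairoch", ".bam": "BAM", ".aln": "ClustalW", ".ebwt": "EBWT",
    ".fastq": "FASTQ", ".gff": "GFF", ".hmm": "HMM", ".prt": "MMDB",
    ".msf": "MSF", ".meg": "Mega", ".nwk": "Newick", ".newick": "Newick",
    ".nex": "Nexus", ".nxs": "Nexus", ".pdb": "PDB", ".pdw": "pDRAW32",
    ".pfm": "PFM", ".phy": "Phylip", ".pwm": "PWM", ".seq": "Raw",
    ".sam": "SAM", ".scf": "SCF", ".sitecon": "SITECON", ".sto": "Stockholm",
    ".sw": "Swiss-Prot", ".vcf": "VCF",
}


def guess_format(file_name):
    """Return the format name for a file name, or None if it is not listed."""
    suffixes = [suffix.lower() for suffix in Path(file_name).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()  # assumption: look at the extension under the .gz
    if not suffixes:
        return None
    return EXTENSION_TO_FORMAT.get(suffixes[-1])


if __name__ == "__main__":
    for name in ("tree.nwk", "alignment.meg.gz", "reads.SAM", "unknown.xyz"):
        print(name, guess_format(name))
```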
File format                       File extension       Comment
Dotplot                           *.dpt                Stores a dotplot of a sequence. See also: Dotplot
                                  *.ugenedb
                                  *.srfa, *.srfasta    A multiple sequence alignments file format. See also: Alignment Editor
UGENE Workflow Designer schema    *.uwl                Human-readable format to store UGENE Workflow Designer schemas. See also: Workflow Designer
UGENE Query Designer schema       *.uql                Human-readable format to store UGENE Query Designer schemas. See also: Query Designer
                                  *.etc                Format for storing workflow elements that can launch an external command line tool. See also: Workflow Designer
Comment
*.csv
*.html
*.txt
Tutorials
Using BioMart with UGENE
Environment requirements
Installing UGENE extension on Mozilla Firefox
Opening data found using BioMart in UGENE
Opening BioMart data in UGENE by ID
Opening selected data in UGENE
Environment requirements
Currently, the UGENE extension is available for the Mozilla Firefox web browser only. Please make sure to launch UGENE before using the
extension!
Follow the instructions below to install the extension.
In the browse dialog, select the ugene.xpi file, which you can find in the Firefox directory of the UGENE Web Browsers Extensions Package
available on the Download page.
Click, for example, on the Proceed to Bio Portal link. The following page will appear:
Notice that an example Ensembl ID below the search bar is highlighted (it has a light blue background).
The current version of the UGENE extension can detect the following types of identification numbers:
1. Ensembl Gene ID
2. Ensembl Protein ID
3. PDB ID
Right-click on the ID and select Open in UGENE item in the context menu: