To visualize an alignment, load your file by clicking
Upload alignment
file.
Make sure that the alignment consists of at least two
sequences.
We recommend replacing spaces in sequence names with
underscores (_),
because Biopython, the library used to process the files,
tends to truncate names at whitespace.
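The renaming suggested above can be done before upload. The following is a minimal sketch for FASTA input only, using the Python standard library; the function name underscore_fasta_names is hypothetical and is not part of Sequence Flow or Biopython.

```python
import re


def underscore_fasta_names(fasta_text: str) -> str:
    """Replace runs of whitespace in FASTA header lines with underscores.

    Hypothetical helper for illustration; sequence lines are left untouched.
    """
    fixed_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Trim the name, then turn each whitespace run into one underscore.
            name = line[1:].strip()
            line = ">" + re.sub(r"\s+", "_", name)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    example = ">Homo sapiens HBB\nMVHLTPEEK\n>Mus musculus Hbb\nMVHLTDAEK"
    print(underscore_fasta_names(example))
```

Other formats store names differently, so this sketch would need to be adapted for them.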
Sequence Flow accepts five MSA formats: fasta, msf, clustal, phylip, and
stockholm.
It allows the following file extensions: .msf, .fasta,
.fna, .ffn, .faa, .frn, .fa, .clustal, .aln, .phy, .phylip, .stk, .sto, .sth, .stockholm.
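To check a file name against this list before uploading, a small lookup table is enough. The sketch below is illustrative only: the grouping of extensions under each format is an assumption based on common usage, and guess_msa_format is a hypothetical name, not a Sequence Flow function.

```python
from pathlib import Path

# Assumed grouping of the accepted extensions by format (not confirmed by the app).
EXTENSION_TO_FORMAT = {
    ".msf": "msf",
    ".fasta": "fasta", ".fna": "fasta", ".ffn": "fasta",
    ".faa": "fasta", ".frn": "fasta", ".fa": "fasta",
    ".clustal": "clustal", ".aln": "clustal",
    ".phy": "phylip", ".phylip": "phylip",
    ".stk": "stockholm", ".sto": "stockholm",
    ".sth": "stockholm", ".stockholm": "stockholm",
}


def guess_msa_format(filename: str) -> str:
    """Return the assumed MSA format for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"unsupported extension: {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


if __name__ == "__main__":
    print(guess_msa_format("globins.aln"))
```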
After the file is loaded, select the alignment type.
It is important to choose the right type because
it determines the order of the nodes in the graph and cannot be
changed later.
Then, click on the Visualize
button.
It is possible to attach a file containing a tree in the
Newick format to the visualization.
To do so, check the Include tree file checkbox, and
then
load your file by clicking Upload Newick file.
You can also explore available examples by clicking Show example
alignment.
Your multiple sequence alignment is parsed and converted to the FASTA
format.
It is then represented as a graph inspired by the Partial
Order Alignment model.
The visualization is an interactive Sankey diagram.
Read more about this concept in the ABOUT
THE PROJECT
section.
The app provides many features that will help you explore
the alignment, including:
Note: Although the application lets you select any range for alignment, we recommend keeping the size within a reasonable limit. Large ranges can reduce readability, slow down browser performance, and may cause rendering issues with smaller elements. Displaying more than several hundred columns simultaneously can impact usability.
A Partial Order Alignment (POA) is a directed acyclic graph whose nodes are the residues of the aligned
sequences and whose links connect successive residues in these sequences. In addition, the set of nodes is
divided into clusters, which are equivalent to the columns in the standard alignment model, with identical
symbols from aligned sequences in each cluster combined into a single vertex. As a result, the aligned
sequences are represented by paths in the graph, and the common nodes of these paths are their identical
residues. The POA model has proved useful in several applications (e.g. sequencing read assembly and
pangenome structure exploration).
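The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code Sequence Flow runs: the function name build_poa is hypothetical, gaps are assumed to be written as "-" or ".", and a node is represented as a (column, symbol) pair so that identical symbols in one column merge into one vertex. Each link also counts how many sequences traverse it.

```python
from collections import Counter

GAP_SYMBOLS = {"-", "."}  # assumed gap characters


def build_poa(aligned):
    """Build POA nodes and weighted links from gapped, equal-length sequences.

    aligned maps a sequence name to its aligned (gapped) string.
    Returns (nodes, links): nodes is a set of (column, symbol) pairs and
    links counts the sequences passing through each (source, target) pair.
    """
    if len({len(seq) for seq in aligned.values()}) > 1:
        raise ValueError("aligned sequences must have equal length")

    nodes = set()
    links = Counter()
    for seq in aligned.values():
        previous = None
        for column, symbol in enumerate(seq):
            if symbol in GAP_SYMBOLS:
                continue  # gaps contribute no node; the path skips the column
            node = (column, symbol)
            nodes.add(node)
            if previous is not None:
                links[(previous, node)] += 1
            previous = node
    return nodes, links


if __name__ == "__main__":
    nodes, links = build_poa({"s1": "ACG-T", "s2": "A-GCT", "s3": "ATG-T"})
    for (source, target), count in sorted(links.items()):
        print(source, "->", target, "sequences:", count)
```

Because links always go from a lower column to a higher one, the resulting graph is acyclic, and each input sequence corresponds to one path through it.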
A Sankey diagram is a type of flow chart in which the width of the links is proportional to the flow rate; it was first used in 1898 to depict the energy efficiency of a steam engine. Sankey diagrams provide a more intuitive alignment visualization than the commonly used graphical representations of nodes and links.
Sequence Flow is a tool for browsing interactive visualizations of POAs as Sankey diagrams. It accepts multiple sequence alignment files in the most common formats, including fasta, clustal, msf, phylip, and stockholm. Sequence Flow provides an integrated visualization of the classic alignment, the POA, and the phylogenetic tree. To visualize alignments, Sequence Flow uses SanKEY.js, a JavaScript library we developed that is tailored for huge Sankey diagrams. Check out the MANUAL section for more information about the Sequence Flow features for interactive alignment exploration.
The service provides a couple of alignment examples that can be interactively explored:
The application's source code can be found in the official GitHub repository.
We appreciate all feedback. If you have experienced any issues with the app or have other suggestions, please write to dojer@mimuw.edu.pl or open a new issue in the repository.
Project created by Krzysztof Zdąbłasz under the supervision of Norbert Dojer, supported by Anna Lisiecka.
This work was supported by the IDUB grant of the Polish Ministry of Science and Higher Education no 01/IDUB/2019/04.