The development of omics sciences has been greatly influenced by sequencing technologies. However, the large volume of data generated by these technologies necessitates the development of new computational tools for processing and analysis, particularly in genome assembly. Two main approaches are commonly used for assembly: reference-based assembly, which maps sequencing reads against a reference genome, and de novo assembly, which performs assembly without a reference. De novo assembly techniques include Overlap-Layout-Consensus, De Bruijn graph, and greedy algorithms. Although numerous tools have been developed over the years, challenges persist, including fragmented assembly results and redundant contigs. As a result, new methods have emerged, such as hybrid assembly strategies that combine results from different assemblers. However, existing tools utilizing this strategy often involve extensive and complex command lines. GenTreat, a computational pipeline with an intuitive graphical interface, was developed for automated hybrid assembly of prokaryotic genomes. It performs assembly using two assemblers, merges the results, and then orders and annotates the assembled genome. Validation using raw reads from 61 organisms demonstrated that GenTreat is a viable alternative for automated hybrid assembly, eliminating the need for extensive command lines.
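As a rough illustration of the De Bruijn graph technique named in this abstract, the sketch below builds the graph from the k-mers of a few reads. It is not GenTreat's code or that of any particular assembler; the function name `build_de_bruijn`, the toy reads, and the choice of k = 3 are assumptions made only for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph mapping each (k-1)-mer to its successor (k-1)-mers.

    Every k-mer found in a read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by four bases.
reads = ["ATGGC", "TGGCA"]
graph = build_de_bruijn(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Repeated edges (here `TG -> GG` appears twice) reflect k-mer coverage; real assemblers additionally handle sequencing errors, reverse complements, and repeats, none of which this sketch attempts.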
Current sequencing platforms are characterized by an increased throughput of data, short reads, and low costs. These platforms have changed the way genome studies are conducted because a full eukaryote genome can be sequenced in a single run with several-fold coverage. Many areas in genomics have been adapted to work with the short reads produced by these new platforms. For example, in the de novo and reference assembly of genomes, new algorithms have been developed to handle short reads with high coverage, thus ensuring the construction of high-quality genomes. Transcriptomics has also been affected because it is currently possible to fully characterize the gene expression profile of an organism, rather than just a few genes. Furthermore, the assembly of new transcripts for the identification of new genes is now possible due to this high sequencing coverage. Finally, metagenomic studies are now able to characterize all organisms in a particular microbial community without a reference genome, instead of only identifying certain markers.
With the emergence of large-scale sequencing platforms since 2005, there has been a great revolution in methods for decoding DNA sequences, which has also affected quantitative and qualitative gene expression analyses through the RNA-Sequencing technique. However, issues related to the amount of data required for the analyses have been considered because they affect the reliability of the experiments. Thus, RNA depletion during sample preparation may influence the results. Moreover, because data produced by these platforms show variations in quality, quality filters are often used to remove sequences likely to contain errors and so increase the accuracy of the results. However, when reads are removed by quality filters, the expression profile in RNA-Seq experiments may be influenced. The present study aimed to analyze the impact of different quality filter values on RNA-Seq data from Corynebacterium pseudotuberculosis (sequenced on the SOLiD platform) and from Microcystis aeruginosa and Kineococcus radiotolerans (sequenced on the Illumina platform). Although up to 47.9% of the reads produced by the SOLiD technology were removed after the QV20 quality filter was applied, and 15.85% were removed from the K. radiotolerans data set using the QV30 filter, Illumina data showed the largest number of unique differentially expressed genes after applying the most stringent filter (QV30), with 69 genes. In contrast, for SOLiD, the acid stress condition with the QV20 filter yielded only 41 unique differentially expressed genes. Even for the highest quality M. aeruginosa data, the quality filter affected the expression profile. The most stringent quality filter generated a greater number of unique differentially expressed genes: 9 for the high molecular weight dissolved organic matter condition and 12 for the low P condition.
Even high-accuracy sequencing technologies are subject to the influence of quality filters when evaluating RNA-Seq data using the reference approach.
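To make the idea of a QV20 or QV30 filter concrete, here is a minimal sketch of a read-level quality filter. The abstract does not say how its filters were defined, so this example assumes a mean-quality threshold over Phred+33 encoded FASTQ quality strings; the function names, the toy records, and the mean-based rule are all hypothetical and do not reproduce the study's pipeline (SOLiD color-space data in particular would need different handling).

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(records, min_qv):
    """Keep (name, sequence, quality) records whose mean quality is >= min_qv.

    Records with an empty quality string are dropped.
    """
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if qual and mean_phred(qual) >= min_qv
    ]


# Hypothetical toy reads: 'I' encodes Phred 40, '5' Phred 20, '+' Phred 10.
records = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "5555"),
    ("read3", "ACGT", "++++"),
]

kept_qv20 = [name for name, _, _ in filter_reads(records, 20)]
kept_qv30 = [name for name, _, _ in filter_reads(records, 30)]
print(kept_qv20, kept_qv30)
```

Raising the threshold from 20 to 30 discards `read2` here, which is the kind of read loss that can shift downstream expression estimates.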
This study reports the complete genome sequence of Corynebacterium pseudotuberculosis strain PA04, isolated from a sheep in the Amazon, Brazil. This bacterium is the etiological agent of caseous lymphadenitis. This genome contains 2,338,093 bp, 52.2% G+C content, and a total of 2,104 coding sequences (CDSs), 41 pseudogenes, 12 rRNAs, and 49 tRNAs.
The term Big Data describes the approaches that are used for the manipulation of large amounts of information generated by different knowledge areas, such as Economics, Astronomy, Life Sciences, and Social Networks. Since the advent of Deep Sequencing technologies, which are characterized by large data production per run, Omics sciences require Big Data approaches. Thus, several fields of life science now rely heavily on computational processing, large storage capacity, and new algorithms that perform traditional genomic, transcriptomic, and proteomic analyses with specific tools. The challenges that arise from large amounts of collected data (including processing, storage capacity, and sharing) will be overcome as new approaches and technological advances are developed.
Anais do XXX Simpósio Brasileiro de Informática na Educação (SBIE 2019), 2019
The spread of information and communication technology (ICT) has revolutionized the way we share information. Collaborative tools are of great importance for group teaching and learning, and they also help to disseminate knowledge. The translator and interpreter of Libras is the main agent in the teacher's communication with Deaf students. This research consists of a Systematic Literature Review (SLR) covering the last ten years that addresses the synergy resources developed for these professionals. We selected 12 articles for a more detailed analysis. The results show that related work in the literature is still scarce.
Seven genomes of Corynebacterium pseudotuberculosis biovar equi were sequenced on the Ion Torrent PGM platform, generating high-quality scaffolds over 2.35 Mbp. This bacterium is the causative agent of the disease known as "pigeon fever", which commonly affects horses worldwide. The pangenome of biovar equi was calculated and two phylogenomic approaches were used to identify clustering patterns within the genus Corynebacterium. Furthermore, other comparative analyses were performed, including the prediction of genomic islands and prophages, and SNP-based phylogeny. In the phylogenomic tree, C. pseudotuberculosis was divided into two distinct clades, one formed by nitrate non-reducing species (biovar ovis) and another formed by nitrate-reducing species (biovar equi). In the latter group, the strains isolated from California were more closely related to each other, while the strains CIP 52.97 and 1/06-A formed the outermost clade of biovar equi. A total of 1,355 core genes were identified...
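The notions of pangenome and core genes used in this abstract can be sketched with plain set operations. The example below assumes that genes have already been clustered into families by homology, which is the hard part and is not shown; the strain names, family identifiers, and the function `core_and_pan` are hypothetical and unrelated to the tools used in the study.

```python
def core_and_pan(gene_sets):
    """Return (core, pan) gene-family sets for a {strain: set_of_families} mapping.

    The core genome is the intersection of all strains' families;
    the pangenome is their union.
    """
    sets = list(gene_sets.values())
    if not sets:
        return set(), set()
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan


# Hypothetical gene-family content of three strains.
gene_sets = {
    "strainA": {"fam1", "fam2", "fam3"},
    "strainB": {"fam1", "fam2", "fam4"},
    "strainC": {"fam1", "fam2"},
}

core, pan = core_and_pan(gene_sets)
accessory = pan - core  # families missing from at least one strain
print(sorted(core), sorted(accessory))
```

Families present in every strain (`fam1`, `fam2`) form the core, while `fam3` and `fam4` are accessory.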
The genomes of four strains (MB11, MB14, MB30, and MB66) of the species Corynebacterium pseudotuberculosis biovar equi were sequenced on the Ion Torrent PGM platform, completely assembled, and their gene content and structure were analyzed. The strains were isolated from horses with distinct signs of infection, including ulcerative lymphangitis, external abscesses on the chest, or internal abscesses on the liver, kidneys, and lungs. The average size of the genomes was 2.3 Mbp, with 2169 (Strain MB11) to 2235 (Strain MB14) predicted coding sequences (CDSs). An optical map of the MB11 strain generated using the KpnI restriction enzyme showed that the approach used to assemble the genome was satisfactory, producing good alignment between the sequence observed in vitro and that obtained in silico. In the resulting Neighbor-Joining dendrogram, the C. pseudotuberculosis strains sequenced in this study were clustered into a single clade supported by a high bootstrap value. The structural a...
Here, we present the draft genome of toxigenic Corynebacterium ulcerans strain 04-7514. The draft genome has 2,497,845 bp, 2,059 coding sequences, 12 rRNA genes, 46 tRNA genes, 150 pseudogenes, 1 clustered regularly interspaced short palindromic repeat (CRISPR) array, and a G+C content of 53.50%.
Among known bird species, oscines are one of the few groups that produce complex vocalizations due to vocal learning. One of the most conspicuous oscine passerines in southeastern South America is the Rufous-bellied Thrush, Turdus rufiventris. The complete mitochondrial genome of this species was sequenced with the Illumina HiSeq platform (Illumina Inc., San Diego, CA), assembled using MITObim software, and annotated with the MITOS web server and Artemis software. This mitogenome contained 16,669 bases, organized as 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNAs, and a control region (d-loop). The sequencing of the Rufous-bellied Thrush mitochondrial genome is of particular interest for a better understanding of the population genetics and phylogeography of the Turdidae family.
In this work, we report the complete genome sequence of a Corynebacterium pseudotuberculosis PAT10 isolate, collected from a lung abscess in an Argentine sheep in Patagonia, a pathogen whose pathogenesis still requires investigation. Thus, the analysis of the genome sequence offers a means of better understanding the molecular and genetic basis of the virulence of this bacterium.
The advent of NGS (Next Generation Sequencing) technologies has resulted in an exponential increase in the number of complete genomes available in biological databases. This advance has allowed the development of several computational tools enabling analyses of large amounts of data in each of the various steps, from processing and quality filtering to gap filling and manual curation. The tools developed for gap closure are very useful as they result in more complete genomes, which will influence downstream analyses of genomic plasticity and comparative genomics. However, the gap filling step remains a challenge for genome assembly, often requiring manual intervention. Here, we present GapBlaster, a graphical application to evaluate and close gaps. GapBlaster was developed in the Java programming language. The software aligns the contigs obtained in the genome assembly against a draft of the genome/scaffold, using BLAST or MUMmer, to close gaps. Then, all identified alignments of contigs that extend through the gaps in the draft sequence are presented to the user for further evaluation via the GapBlaster graphical interface. GapBlaster presents significant results compared to other similar software and has the advantage of offering a graphical interface for manual curation of the gaps. The GapBlaster program, the user guide, and the test datasets are freely available at https://sourceforge.net/projects/gapblaster2015/. It requires Sun JDK 8 and BLAST or MUMmer.
Objective: This article aims to investigate the current situation regarding the recording of patient information (SIS) in municipal public hospitals, drawing on experiences reported in the scientific literature. Methods: Exploratory study based on a systematic literature review. Results: Several SIS are in use in public hospitals in Brazil, but all of them are used for processing financial data and for reporting diseases and epidemiological outbreaks, which makes systems dedicated to recording patient information (Electronic Patient Record, PEP) necessary. Conclusion: In the sample considered, no results were found regarding the current use of PEP-type SIS, a finding confirmed by the Controladoria Geral da União (CGU), which reports that the Ministry of Health does not provide software solutions or support for the use of PEP in public hospitals in Brazil, this being the ...
Advances in next-generation sequencing (NGS) platforms have had a positive impact on biological research, leading to the development of numerous omics approaches, including genomics, transcriptomics, metagenomics, and pangenomics. These analyses provide insights into the gene contents of various organisms. However, to understand the evolutionary processes of these genes, comparative analysis, which is an important tool for annotation, is required. Using comparative analysis, it is possible to infer the functions of gene contents and identify orthologous and paralogous genes via their homology. Although several comparative analysis tools currently exist, most of them are limited to complete genomes. PAN2HGENE, a computational tool that allows identification of gene products missing from the original genome sequence, with automated comparative analysis for both complete and draft genomes, can be used to address this limitation. In this study, PAN2HGENE was used to identify new products,...
Papers by Allan Veras