Short Abstract: The search for extremozymes with useful functional characteristics can be burdensome using classical techniques. Recent advances in metagenomics have facilitated the exploration of extremophiles and extremozymes from previously unexplored environments. In this study, we searched for novel thermostable antibiotic resistance (AR) genes in the hot, pristine Atlantis II Deep Red Sea brine pool. These genes may allow a better understanding of AR evolution and could enrich the repertoire of thermophilic selection marker genes. To this end, we assembled 4,184,386 reads from 454 sequencing, generating 43,555 contigs from which open reading frames (ORFs) were aligned to polypeptides from the Comprehensive Antibiotic Resistance Database (CARD) using BLASTX. We selected two ORFs that showed: 1) relatively low identity to known AR enzymes (55% and 59%), 2) rather short sequences (999 and 804 bp), 3) full-length sequences, 4) signatures of thermal stability, and 5) the presence of active sites and other essential domains. These ORFs putatively coded for a beta-lactamase and an aminoglycoside phosphotransferase. To verify these activities, the genes were synthesized, cloned and expressed. Both proteins showed activity in vitro, while only the aminoglycoside phosphotransferase conferred AR in vivo, against both kanamycin and neomycin. Interestingly, the aminoglycoside phosphotransferase proved to be thermostable (Tm = 61.7 °C and ~40% activity after 30 min at 65 °C). The beta-lactamase, on the other hand, was less thermostable (Tm = 43 °C). In conclusion, we have discovered two novel AR enzymes with potential application as thermophilic selection markers.
Short Abstract: Finding an interpretable data representation for machines is an important step in any machine-learning task. Continuous distributed vector representations have proved to be one of the most successful deep learning approaches to data representation. We introduce such an approach for biological sequences, called bio-vectors (BioVec) for biological sequences in general and protein-vectors (ProtVec) for proteins (amino-acid sequences). In the present work, we focused on protein-vectors, which can be utilized in a wide array of bioinformatics investigations such as family classification, structure prediction, disordered protein identification, and visualization. We trained representations of sequence segments using skip-gram neural networks on large sequence databases. Then, we used the trained neural network to encode any sequence of interest. To evaluate the approach, we applied it to the classification of 324,018 protein sequences belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% was obtained. In addition, we used the ProtVec representation to distinguish disordered proteins from structured proteins. Two databases of disordered sequences were used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using ProtVec features in support vector machine classifiers, FG-Nup sequences were distinguished from structured protein sequences found in the Protein Data Bank with 99.8% accuracy, and unstructured DisProt sequences were differentiated from structured DisProt sequences with 100.0% accuracy. Our results suggest that ProtVec can characterize protein sequences in terms of biochemical and biophysical interpretations of the underlying patterns. The related data are available at http://dx.doi.org/10.7910/DVN/JMFHTN.
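To make the idea of training on sequence segments with a skip-gram model concrete, the sketch below shows how (target, context) training pairs could be generated from a protein sequence. The abstract does not state the segment length or the context window, so `k = 3`, the shifted non-overlapping splitting, and `window = 2` are assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple


def kmer_segments(sequence: str, k: int = 3) -> List[List[str]]:
    """Split a sequence into k shifted lists of non-overlapping k-mers.

    Assumption: k = 3 and the shifted splitting are illustrative choices,
    not details given in the abstract.
    """
    segments = []
    for offset in range(k):
        kmers = [sequence[i:i + k]
                 for i in range(offset, len(sequence) - k + 1, k)]
        segments.append(kmers)
    return segments


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every token within the window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Toy sequence, not taken from any database.
    for tokens in kmer_segments("MKTAYIAKQR"):
        print(tokens, skipgram_pairs(tokens, window=1))
```

Pairs of this kind would then be fed to a skip-gram network, so that each segment ends up with a dense vector and a whole sequence can be encoded from the vectors of its segments.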
Short Abstract: Microbial communities of aquatic environments change along gradients of nutrients, temperature, salinity and UV irradiance, and under artificial disturbances. Understanding microbial adaptive processes in response to environmental conditions is a fundamental issue in microbial ecology. We present a temporal analysis (including dry and wet seasons from 2012-2013) of the taxonomic and functional repertoire of the microbial community from the Carajás (PA/Brazil) chalcopyrite tailings dam.
We collected water (surface, 15 m depth) and sediment samples. Prokaryotic biomass from water was concentrated by sequential filtration and DNA was extracted. For each sample, the V3-V4 region of the 16S rDNA was amplified. Shotgun libraries were sequenced on Ion Torrent and 16S rDNA on MiSeq. Illumina reads were filtered and assembled with Trimmomatic and FLASH. USEARCH was used for the chimera checking, OTU picking (97% similarity) and dereplication steps. Diversity analyses were performed using QIIME. Contigs were obtained with SPAdes and coding sequences were predicted with MetaGeneMark. Proteins were classified (BLAST) into UniProt families and KEGG orthology groups and mapped to pathway modules.
The observed dissimilarities of the dominant taxonomic groups across the compared strata were due to differences in physicochemical parameters. Nitrogen metabolism markers (genes of nitrate reduction and denitrification) decreased towards the bottom. Iron/sulfate-reducing bacteria and microorganisms resistant to heavy metals were found in sediment samples. We also observed high occurrences of genes related to regulation and cell signaling, cell wall/capsule, motility/chemotaxis and stress response in all layers, suggesting that these functions might be important in the adaptive responses of the microorganisms to changes in environmental conditions.
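The dereplication and 97% OTU picking steps named in the methods of this abstract can be illustrated with a simplified greedy clustering. This is a sketch of the general idea only, not USEARCH's algorithm: it assumes pre-aligned reads of equal length and measures identity as the fraction of matching positions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions.

    Assumption: sequences are pre-aligned and of equal length;
    real tools compute identity from an alignment.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus(reads: List[str], threshold: float = 0.97) -> Dict[str, int]:
    """Greedy clustering: map each OTU centroid to its total read count."""
    # Dereplication: collapse identical reads and keep their abundance.
    counts = Counter(reads)
    otus: Dict[str, int] = {}
    # Visit the most abundant sequences first (ties broken alphabetically),
    # so that abundant sequences become centroids.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # join the first matching OTU
                break
        else:
            otus[seq] = n  # no centroid is close enough: start a new OTU
    return otus


if __name__ == "__main__":
    a = "A" * 100
    b = "A" * 98 + "CC"        # 98% identical to a
    c = "A" * 90 + "C" * 10    # 90% identical to a
    print(len(pick_otus([a, a, a, b, c])))
```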
Short Abstract: Characterization of the microbial diversity of a microbiome begins with the sequencing of its DNA complement, namely, the metagenome. Methods for the taxonomic profiling of metagenomic sequences rely on the microbial sequence database, which, however, represents a tiny fraction of the microbes dwelling on Earth. Therefore, attempts are made to profile the metagenomic sequences systematically, beginning at the species level and moving towards higher taxonomic levels until a “match” with the sequence of interest is found. Construction of signature models of sequenced microbial genomes underlies the current state of the art in the field. We posit that these “static” models are inherently limited in capturing the microbial dynamism that shapes genomes, which results in chimeras with segments of different ancestries or origins. We therefore propose a segmental genome model, where a genome is represented by an ensemble of signatures derived from segments of apparently different ancestries or origins. By incorporating segmental signature models within a variable-order Markov model framework for scoring metagenomic sequences, we could achieve more robust metagenome profiling. The proposed method was assessed on synthetic and real metagenomes and compared with popular methods for metagenome classification. This new approach to metagenome characterization will be presented at this meeting, with a focus on future directions.
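As a rough illustration of scoring a read against segment-level signatures with a variable-order Markov model, the sketch below trains one small model per genome segment and backs off to shorter contexts when a long context was rarely observed. The abstract gives no implementation details, so the maximum order, the back-off rule, the add-one smoothing, and taking the best segment score per genome are all assumptions, and every name is hypothetical.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"
Model = Dict[str, Dict[str, int]]


def train(segment: str, max_order: int = 2) -> Model:
    """Count next-base occurrences for every context of length 0..max_order."""
    counts: Model = {}
    for i, base in enumerate(segment):
        for order in range(max_order + 1):
            if i - order >= 0:
                context = segment[i - order:i]
                nxt = counts.setdefault(context, {})
                nxt[base] = nxt.get(base, 0) + 1
    return counts


def log_score(read: str, counts: Model, max_order: int = 2,
              min_count: int = 2) -> float:
    """Log-likelihood of a read under one segment model."""
    total = 0.0
    for i, base in enumerate(read):
        # Variable order: use the longest context observed at least
        # min_count times, falling back to the empty context.
        for order in range(min(max_order, i), -1, -1):
            context = read[i - order:i]
            seen = sum(counts.get(context, {}).values())
            if seen >= min_count or order == 0:
                break
        hits = counts.get(context, {}).get(base, 0)
        # Add-one smoothing (assumption) avoids log(0).
        total += math.log((hits + 1) / (seen + len(ALPHABET)))
    return total


def classify(read: str, genomes: Dict[str, List[Model]]) -> str:
    """Assign the read to the genome whose best segment model scores highest."""
    best = {name: max(log_score(read, m) for m in models)
            for name, models in genomes.items()}
    return max(best, key=best.get)


if __name__ == "__main__":
    # Toy "genomes": genome_a is modelled by two compositionally distinct segments.
    genomes = {
        "genome_a": [train("ACACACACACAC"), train("GGGGGGGGGGGG")],
        "genome_b": [train("TTTTTTTTTTTT")],
    }
    print(classify("GGGGGG", genomes))
```

In this toy setup a single whole-genome model for `genome_a` would blend the two compositions, whereas the per-segment models keep them apart, which is the intuition behind representing a genome as an ensemble of segment signatures.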
Short Abstract: We developed the bioBakery as a comprehensive platform for the analysis of shotgun meta’omic sequencing data, incorporating fast, accurate methods for taxonomic profiling of bacteria, archaea, eukaryotic microbes, and viruses (MetaPhlAn2), organism-specific functional profiling and metabolic reconstruction (HUMAnN2), and identification/tracking of microbial strains (StrainPhlAn). Here, we applied these methods to the expanded NIH Human Microbiome Project (HMP): one of the broadest overviews of normal microbiome diversity and function, which in a second wave of data (HMP1-II) now includes >2,000 metagenomes sampled from 7 major body sites (niches) in a population of >100 individuals at three time points each. Analysis with bioBakery methods revealed key properties of the human microbiome, including 1) common eukaryotic and viral members (in addition to bacteria and archaea); 2) distinguishing molecular functions, including niche-specific metabolic pathways; 3) three distinct ecological mechanisms of niche-specific functional conservation; 4) strain-level niche specialization in the oral cavity; and 5) longitudinal dynamics of species, strains, and functions. HMP1-II metagenomic profiles of the baseline adult microbiome thus provide a uniquely deep and detailed view of human microbial diversity, aiming to eventually contextualize diagnoses and develop treatments for functional disorders of the microbiome. These results were enabled by the bioBakery, which includes open source tool distributions and a virtual environment that are broadly applicable to the analysis of shotgun meta’omic sequencing data from host- or environmentally-associated microbial communities. The bioBakery and its component methods are available for download (including documentation, tutorials, and demonstration data) from http://huttenhower.sph.harvard.edu/biobakery.
Short Abstract: Alignment-based approaches for metagenomic profiling are able to detect genomes at a very high resolution using whole-genome high-throughput DNA sequencing data. Part of the data preprocessing involves generating fragment counts for each genome in a reference database of organisms. These counts represent the number of sequencing reads that map (uniquely, and non-uniquely) to each reference genome.
The increasing number, size, and complexity of metagenomic samples create a computational challenge for researchers, as analyses take considerably longer to complete, and new strategies that leverage distributed systems need to be developed. Such tools will provide researchers with efficient platforms for processing large amounts of data, and facilitate the analysis of integrative datasets that contain data from many sources.
Here we present Flint, a fast and scalable tool for efficiently generating genomic counts and data reports from metagenomic whole-genome sequencing samples. Flint is built on top of Apache Spark, and is able to handle multiple samples comprising billions of sequencing reads. Flint provides a fast, scalable interface for preprocessing complex metagenomic samples that contain not only bacterial genomes but also viral, archaeal, and other organisms. Our implementation is designed to leverage the Spark framework and maximize the use of the underlying compute resources to reduce analysis time. To demonstrate its capabilities, we apply our method to multiple microbiome samples from the Human Microbiome Project (HMP).
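The fragment counts described in this abstract (reads mapping uniquely and non-uniquely to each reference genome) can be sketched in plain Python as a group-then-count computation. This is not Flint's implementation and does not use Spark; the input format of `(read_id, genome_id)` pairs and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def fragment_counts(
    alignments: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Count uniquely and non-uniquely mapped reads per genome.

    Assumption: `alignments` yields one (read_id, genome_id) pair per
    reported alignment hit.
    """
    # Step 1: group hits by read to see how many genomes each read maps to.
    genomes_per_read: Dict[str, Set[str]] = {}
    for read_id, genome_id in alignments:
        genomes_per_read.setdefault(read_id, set()).add(genome_id)

    # Step 2: a read hitting exactly one genome is unique; otherwise it is
    # counted as non-unique for every genome it hits.
    unique: Counter = Counter()
    shared: Counter = Counter()
    for genomes in genomes_per_read.values():
        target = unique if len(genomes) == 1 else shared
        for genome_id in genomes:
            target[genome_id] += 1

    return {
        genome_id: {"unique": unique[genome_id], "non_unique": shared[genome_id]}
        for genome_id in sorted(set(unique) | set(shared))
    }


if __name__ == "__main__":
    hits = [("r1", "gA"), ("r2", "gA"), ("r2", "gB"), ("r3", "gB"), ("r3", "gB")]
    print(fragment_counts(hits))
```

Both steps are keyed aggregations, which is the kind of computation that distributes naturally across a cluster when the number of reads grows into the billions.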
Short Abstract: The basis of metagenomic research is the assignment of sequenced reads to taxonomic units. This is usually done using a classifier based either on read alignments or on k-mer spectra, together with one of four taxonomies: NCBI, RDP, SILVA or Greengenes. The outcome of this crucial step depends on the chosen taxonomy as well as on the tool. We present a comparative analysis of the four taxonomies together with the recently published Open Tree of Life classification. We also contrast the specificity of different tools that are commonly used in 16S rRNA analysis.
Short Abstract: Whilst abundant bioinformatics resources have been developed to analyse the taxonomic and functional composition of microbial metagenomic data, their applicability to viral metagenomics is limited. HoloVir is a computational workflow designed to process large viral metagenomics datasets, facilitating thorough taxonomic and functional characterisation of viral communities. HoloVir performs taxonomic assignment using pairwise comparisons to the viral RefSeq genomic resource, and incorporates marker analysis to identify potential cellular contaminants and estimate viral taxonomic composition. HoloVir incorporates single read analysis as well as metagenome assembly and gene prediction, and has been validated using simulated viral community datasets. Broad functional classification of predicted genes is facilitated by the assignment of COG microbial functional categories using EggNOG, and higher resolution functional analysis is performed using SwissProt keyword assignment. The workflow has been shown to be flexible enough to accommodate taxonomically diverse hosts, yet specific enough to identify differences within the associated viral assemblages. HoloVir has been successfully applied to investigate viral communities within several key holobionts from the Great Barrier Reef and has facilitated the comparison of viral communities across species, health states and life history stages. Visualisation of output data can be specifically tailored to complement the scientific focus. The HoloVir workflow is available to download via GitHub (https://github.com/plaffy/HoloVir).