Short Abstract: Previous studies have suggested using biological measurement diversity, derived from multiplexed profiling, both to construct libraries for cell-based screens and to understand compounds' mechanisms of action. One example is the combined study of high-dimensional image-based cell morphology and gene expression profiles. Integrating these two types of analysis can improve our understanding of a compound's fundamental mechanism of action (MoA). Our current study integrates compound-induced gene expression data with cellular morphological features of the U-2 OS osteosarcoma cell line and compound target prediction scores, which together provide insight into MoA. We piloted this approach using the LINCS (Library of Integrated Cellular Signatures) imaging dataset to analyse high-dimensional image-based cell morphology (811 cellular morphology features) and gene expression profiles (L1000) induced by exposure to ~30,000 compounds. Principal Component Analysis (PCA) was performed to identify the top seven important and independent cell morphological features. For a given morphological feature, compounds were clustered into two groups, and each cluster was linked to significant sets of up/down-regulated genes using a hypergeometric distribution test. Using the significant gene sets, ClueGO, a Cytoscape plug-in, was used to integrate Gene Ontology (GO) terms and KEGG pathways into a functionally organized GO/pathway term network. For the selected 8,378 compounds, protein targets were predicted using an in-house machine learning algorithm trained on activity and inactivity data of compounds against 1,080 human protein targets.
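The enrichment step in the abstract above can be illustrated with a one-sided hypergeometric test. The following is a minimal sketch, not the authors' implementation: the function name, argument names, and the numbers in the usage line are all hypothetical, and the abstract does not state which tail or multiple-testing correction was used.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, drawn, overlap):
    """Return P(X >= overlap) for a hypergeometric random variable X.

    population: total number of genes measured (hypothetical parameter)
    annotated:  genes in the gene set of interest, e.g. up-regulated genes
    drawn:      genes associated with the compound cluster
    overlap:    genes that are both annotated and drawn
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 10 genes, 4 in the set, 3 drawn, 2 shared.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

A small p-value suggests that the cluster shares more genes with the set than expected by chance. In practice the p-values for many gene sets would need correcting for multiple testing.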
Short Abstract: Diarrhea is one of the leading causes of death among children under five globally. More than one in ten child deaths (about 800,000 each year) is due to diarrhea. Today, only 44% of children with diarrhea in low-income countries receive the recommended treatment, and limited trend data suggest that there has been little progress since 2000. Cryptosporidium is a microscopic parasite that is the second leading cause of diarrheal cases globally. For Cryptosporidium, there is no fully effective drug treatment or vaccine, and the basic research tools and infrastructure needed to discover, evaluate and develop interventions for this parasite are mostly lacking. My project uses comparative genomics to assess the suitability of a related, easy-to-culture, bird-infecting species as a possible model for study. However, before I perform my comparative analysis, I must first determine whether the Illumina assembly or the hybrid (PacBio, Illumina) assembly is of higher quality. To test my hypothesis about suitability, I have annotated both assemblies of C. baileyi. Determining the higher-quality annotation will allow me to compare the C. baileyi genome sequence to those of other Cryptosporidium and apicomplexan species. I anticipate that my work will reveal novel information on conserved Cryptosporidium genes as well as species-specific evolutionary changes that may warrant further investigation into features that are found in human-infecting species.
Short Abstract: Many de novo genome assembly projects have been performed using high-throughput sequencers, but assembling highly complex genomes remains a big scientific challenge. We present an assembly pipeline that simplifies the entire genome assembly process by automating these steps, integrating several previously published algorithms with new algorithms for quality control and automated assembly. To provide a tool for automated genome annotation, we created the GAS system, which uses an automatic genome annotation pipeline that can be used to annotate plant genomes. The pipeline involves five main steps: pre-processing, building contigs, scaffolding contigs, gene prediction and annotation. We separate building contigs into three steps and annotation into two steps. The first step in pre-processing is read error correction. After contigs are built, paired/mated libraries can be used to scaffold contigs together using the scaffolder. After assembly, gene prediction is performed on the scaffold sequences, that is, the regions of genomic DNA that encode genes are identified. Gene prediction software such as MAKER is used, which produces a set of predicted protein-coding genes. Functional annotation is then performed with Blast2GO. The results are in GFF3 format and can be easily visualized in a genome browser. Here, we demonstrate that we are able to produce assemblies and gene information of high quality, without any prior knowledge of the particular genome and without extensive parameter tuning. Our pipeline will assist researchers in selecting a well-suited genome annotation module and offer essential information.
Short Abstract: ChIP-seq enables the identification of regulatory regions that govern gene expression at genome scale. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often correlate poorly with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF) binding in vivo. A more comprehensive biophysical representation of TF binding will improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that impacts TF binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility leads to a model that explains 69% of the ChIP-seq profile variance, while a model based on motif conservation alone explains only 46% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and is useful for inferring TF-binding peaks in new experimental conditions, reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that increased accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysical model provides a more comprehensive description of TF binding in vivo from first principles, towards a better representation of gene regulation in silico, with promising applications in systems biology.
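The abstract above does not give the form of the thermodynamic model. As a rough illustration of how accessibility and binding affinity could combine, the sketch below uses a generic two-state occupancy expression scaled by an accessibility factor. This functional form, the function name, and all parameters are assumptions made for illustration, not the authors' model.

```python
import math


def predicted_occupancy(accessibility, binding_energy, chemical_potential=0.0):
    """Illustrative TF occupancy at one genomic position.

    accessibility:      value in [0, 1], how exposed the site is (assumed scale)
    binding_energy:     site binding energy in units of kT (lower binds tighter)
    chemical_potential: in units of kT, standing in for TF concentration
    """
    # Standard two-state (bound/unbound) occupancy for a single site.
    bound_fraction = 1.0 / (1.0 + math.exp(binding_energy - chemical_potential))
    # Assumption: an inaccessible site cannot be bound, so accessibility
    # scales the bound fraction multiplicatively.
    return accessibility * bound_fraction


# Same affinity, different accessibility: the exposed site scores higher.
open_site = predicted_occupancy(0.9, binding_energy=-2.0)
closed_site = predicted_occupancy(0.1, binding_energy=-2.0)
```

Under this toy form, two sites with identical motifs can show very different predicted coverage, which is the qualitative point the abstract makes about affinity alone being insufficient.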
Short Abstract: The spatial organization of the genome plays a critical role in regulating gene expression. Recent chromatin interaction mapping studies have revealed that the Topologically Associating Domain (TAD) is a megabase-sized, fundamental building block of the three-dimensional genome. Thus, identifying TADs in different cell types and conditions is a critical step towards understanding the 3D structure-function relationship of the genome. Existing computational algorithms for predicting TADs rely on strong assumptions about the observed chromatin contact frequency, lack a principled strategy for selecting algorithmic parameters, and are computationally inefficient for high-resolution Hi-C data. In this work, we propose a novel algorithm, Gaussian Mixture model And Proportion tests (GMAP), to address these issues in predicting TADs. Furthermore, GMAP can also identify hierarchical structures within individual TADs, which enables studies of the role of sub-TAD domains in gene regulation. Using simulated and published Hi-C data, we show that GMAP can identify TADs more accurately and efficiently than existing methods.
Short Abstract: Background: Chromosome Conformation Capture (3C) is a biological technique that can determine if two genomic regions are in close spatial proximity. This proximity is often referred to as an "interaction". Known interactions, also called interaction-controls, are needed to validate 3C results. Unfortunately, there are no identified interaction-controls that can be used across tissues or species. Therefore, when 3C is performed in a new cell line, extensive work must be done to find new interaction-controls. Potential interaction-controls could be identified by searching existing whole-genome contact maps (produced by the 3C derivative Hi-C). Unfortunately, this is currently not done due to map size and the lack of computational tools.
Objective: Develop a tool for the detection of potential interaction-controls from whole-genome contact maps.
Methodology: An R program was developed that performs a modified matrix search to identify n (user-specified) interaction-controls from a whole-genome contact map. The program checks each unique cell of the contact map to identify the pairs of genomic regions with the highest interaction frequencies. It is hypothesized that interactions with high frequencies will occur consistently, making them ideal candidates for interaction-controls. Users can select optional filters to restrict the search to interactions occurring within chromosomes, between chromosomes, or between regions at least k base pairs apart.
Conclusion: The developed program identified the 10 most likely interaction-controls from the GSM1379427 yeast contact map (1258x1258) in 2.36 seconds. It is possible that the analysis of additional whole-genome contact maps with this program may assist in the identification of standardized interaction-controls across tissues and species.
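The search described in the Methodology can be sketched as a scan over the upper triangle of a symmetric contact map. The original tool is an R program that is not reproduced here. The Python sketch below is an independent illustration, and its function name, argument names, and filter semantics are assumptions (in particular, it skips the diagonal and applies the distance filter only to pairs on the same chromosome).

```python
import heapq


def top_interactions(contact_map, bin_chrom, bin_start, n,
                     mode="all", min_distance=0):
    """Return the n highest-frequency bin pairs as (frequency, i, j) tuples.

    contact_map:  square, symmetric list of lists of contact frequencies
    bin_chrom:    chromosome label of each bin
    bin_start:    start coordinate (bp) of each bin
    mode:         "all", "intra" (within chromosomes) or "inter" (between)
    min_distance: minimum separation in bp for same-chromosome pairs
    """
    candidates = []
    size = len(contact_map)
    for i in range(size):
        # Upper triangle only, so every unique pair is visited once.
        for j in range(i + 1, size):
            same_chrom = bin_chrom[i] == bin_chrom[j]
            if mode == "intra" and not same_chrom:
                continue
            if mode == "inter" and same_chrom:
                continue
            if same_chrom and abs(bin_start[j] - bin_start[i]) < min_distance:
                continue
            candidates.append((contact_map[i][j], i, j))
    # nlargest returns the candidates sorted from highest to lowest frequency.
    return heapq.nlargest(n, candidates)


# Toy 4-bin map: three bins on chromosome "I", one on chromosome "II".
toy_map = [[0, 9, 5, 2],
           [9, 0, 7, 8],
           [5, 7, 0, 1],
           [2, 8, 1, 0]]
toy_chrom = ["I", "I", "I", "II"]
toy_start = [0, 1000, 2000, 0]
best = top_interactions(toy_map, toy_chrom, toy_start, n=2)
```

For a map of the size mentioned in the Conclusion (1258x1258), this scan visits roughly 790,000 unique pairs, so a plain double loop is adequate. Larger maps would likely call for a vectorized or sparse representation.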
Short Abstract: Owing to recent advances in transcriptome assembly algorithms, de novo transcripts assembled from RNA-Seq data have become a valuable resource for genome annotation. In practice, there is a wide range of relationships between the existence and reliability of structural and functional gene annotation, on the one hand, and the expected quality and exhaustiveness of the transcriptome assembly, on the other. Combining RNA- and DNA-level information for a variety of use cases requires the ability to tune the merging parameters and alter the relative weight of DNA- and RNA-level information in the context of a particular project. Manually curated functional annotations, if available, need to be reasonably preserved in the re-annotation process. We present annTrans, a software tool that takes a genome sequence, a transcriptome assembly, a structural annotation (possibly pre-corrected on the basis of the transcriptome assembly using already available tools) and a curated functional annotation and, with the help of a de novo annotation tool, merges these pieces of information to generate an annotation that includes known transcripts with curated functional annotation, novel transcripts with predicted functional annotation, and antisense transcripts. The user can set rules (controlled separately for coding and predicted antisense transcripts) for calling or dropping novel sense and antisense transcripts and for deciding whether to transfer a curated functional annotation on a feature-by-feature basis. The tool is written in C++ and allows fast experimentation with the merging parameters. The code is publicly available under GPL2.0.
Short Abstract: As mounting evidence indicates, each cell in the human body has its own genome, a phenomenon called somatic mosaicism. Such somatic variations include single nucleotide variants (SNVs), small insertions and deletions (indels), transposable element insertions, large copy-number variations (CNVs), and structural variations. Although somatic mosaicism may have functional and pathological implications, there has been no comprehensive estimate of the number and allelic frequency of genomic variations in normal somatic cells in various tissues of the human body, as it remains difficult to detect somatic mosaic variants given their limited presence in tissue, which at times amounts to less than a fraction of a percent. To circumvent that problem, we sequenced the genomes of clonal cell populations derived from single brain progenitor cells to identify genomic variations present in the founder cell and manifested in each clone at 50% allele frequency. Unlike single-cell sequencing, our approach avoids amplification artifacts. For data analysis, we developed a workflow to integrate calls from several variant calling programs: MuTect, SomaticSniper, Strelka, and VarScan for SNVs; Scalpel, Strelka, and VarScan for indels; CNVnator for CNVs. By applying the workflow to compare germline genomes of different individuals, we performed a data-driven estimation of workflow sensitivity. Using real data for six clones from an individual healthy brain, we detected per clone 200–500 SNVs at >75% sensitivity, 10–30 indels at >40% sensitivity, and 1–5 CNVs. Orthogonal experimental validation revealed ~100% specificity of the generated calls. Thus, our analysis has revealed extensive somatic mosaicism within the human brain.
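The abstract above does not say how calls from the different programs were combined. One common and simple possibility is to keep variants reported by at least a minimum number of callers, and to measure sensitivity as the fraction of a known variant set that is recovered. The sketch below illustrates that idea only. The voting rule, function names, and variant tuples are hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_support=2):
    """Keep variants reported by at least min_support callers.

    calls_by_caller maps a caller name to an iterable of variants, each
    represented here as a (chrom, position, ref, alt) tuple.
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # set() so a caller that lists a variant twice still counts once.
        support.update(set(variants))
    return {variant for variant, count in support.items() if count >= min_support}


def sensitivity(detected, truth):
    """Fraction of known (truth) variants present in the detected set."""
    if not truth:
        raise ValueError("truth set must not be empty")
    return len(set(detected) & set(truth)) / len(truth)


# Made-up example calls from three callers.
example_calls = {
    "MuTect": [("chr1", 100, "A", "T"), ("chr2", 5, "G", "C")],
    "Strelka": [("chr1", 100, "A", "T")],
    "VarScan": [("chr1", 100, "A", "T"), ("chr2", 5, "G", "C"),
                ("chr3", 7, "C", "A")],
}
agreed = consensus_calls(example_calls, min_support=2)
```

Raising `min_support` trades sensitivity for specificity, which is why an independent estimate of sensitivity, such as the germline comparison the abstract mentions, is useful when choosing a rule.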
Short Abstract: The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes in eukaryotic genomes, but transcriptome maps are still incomplete, partly because some RNA-seq reads lack orientation information (known as unstranded reads) and because the boundaries of assembled transcripts are uncertain. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads, and that precisely determine the boundaries of assembled transcripts, could significantly improve the quality of transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline called CAFE, which significantly improves original assemblies built from stranded and/or unstranded RNA-seq data by orienting unstranded reads using maximum likelihood estimation and by integrating information on transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising twenty-four billion RNA-seq reads from the ENCODE and Human Body Map projects, CAFE predicted the directions of about seven billion unstranded reads, which led to the construction of accurate transcriptome maps, comparable to the manually curated one, and a comprehensive lncRNA catalogue including thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps in complex genomes but also to expand the universe of non-coding genomes.
Short Abstract: Because of the ongoing decrease in the cost of high-throughput sequencing (HTS), genome assembly and noncoding gene annotation studies have become popular not only for model organisms but also for non-model organisms. In this study, to understand livestock-specific characteristics within one species, we sequenced whole genomes and transcriptomes of Ogye, a Korean traditional Gallus gallus breed with the phenotypic characteristics of black leather, skin, fascia, and sclera. In particular, Ogye is well known in the Korean poultry industry for its strong immune resistance against some specific diseases, such as Marek's disease and avian influenza. To support genome-level studies of these phenotypic characteristics, we first assembled a draft genome of Ogye from paired-end libraries with two different insert sizes (60X coverage), mate-pair libraries with five different insert sizes (172X coverage), and PacBio data (11X coverage) using our hybrid genome assembly pipeline, which includes ALLPATHS-LG. The resulting draft genome of Ogye displayed a high N50 (133K for contigs and 21.2M for scaffolds), making it a better assembly than that of Gallus gallus. We also constructed noncoding transcriptome maps on the draft genome and profiled their expression across 20 different tissues, including the skin, fascia, and eye, by RNA-seq and small RNA-seq. We found 23 microRNAs (miRNAs) and 316 long intervening ncRNAs (lincRNAs) specifically expressed in the black tissues. We expect that our genomic and transcriptomic resources could provide insights into genomic evolution during Gallus gallus subspeciation and into the medical implications for viral infection and immune-related diseases.
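The N50 statistic quoted above is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of the computation follows. The function name and the toy lengths are illustrative and are not taken from the Ogye assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example: total length 20, so the half-way point is 10.
toy_n50 = n50([2, 3, 4, 5, 6])
```

Because N50 depends only on the length distribution, it measures contiguity rather than correctness, so it is usually reported alongside other quality measures.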
Short Abstract: Plant growth and development processes are known to be regulated by genes that respond to auxin. The expression of these genes is controlled by transcription factors of the ARF (Auxin Response Factor) family. Through a B3-type DNA-binding domain, they recognize AuxRE (Auxin Response Element) sequences in plant genomes, which are thought to be highly conserved. To date, no systematic analysis has been conducted to explore the actual space of DNA sequences that could potentially be recognized by these factors. We hypothesize the existence of putative non-canonical AuxRE elements in the Arabidopsis genome that have not yet been identified. We started from DNase-seq experiments in Arabidopsis to detect exposed regions across the genome. A structure-based prediction of ARF binding sites was performed using comparative models of protein-DNA complexes of three representative members of the ARF family, applied through a sliding window along the regions identified by DNase-seq. An energy function was used to assess the protein-DNA complex models and select putative binding regions. Both the sequence-based and the structure-based strategies were able to identify all previously described ARF targets within the A. thaliana genome. However, our method was also able to predict novel targets that are not detected by sequence-based approaches. We conclude that a structure-based approach can provide novel targets in the identification of DNA-binding sequences for TFs; however, experimental validation of these results is pending. FONDECYT Postdoctorado 3140531.