Attention Conference Presenters - please review the Speaker Information Page available here.
If you need assistance please contact submissions@iscb.org and provide your poster title or submission ID.
Short Abstract: In a typical lab, many bench investigators come and go over the years, and the lab accumulates a large collection of samples. Traditionally, labeling and tracking these samples is left to the individual bench investigators, and it often amounts to no more than a few cryptic handwritten marks on the side of a tube and some notes in a notebook or, at best, an Excel spreadsheet. When people leave a lab, it can become very difficult for those remaining to pick up the pieces of their research, often because of idiosyncratic and confusing sample documentation. The value of overcoming this problem is therefore enormous. However, a more sophisticated system, with barcoding and full sample tracking, has traditionally been too expensive for all but the most highly funded labs because of the significant overhead and expert systems support required. To overcome this limitation, we have developed a lightweight system called COLLOS (COLLection Of Samples), together with detailed documentation that allows any lab to achieve relatively sophisticated sample labeling and tracking without excessive overhead. In particular, we give specific instructions on hardware and software that reflect several years of research and development, including testing of various labels under many harsh conditions. The entire system can be installed for under $5000. This poster describes the basic functionality of the system and the most relevant implementation details.
Short Abstract: BioStudies is a new database at EBI that aims to address the current limitations of the traditional structured data archives available to scientists.
It can accept and store data from new and emerging technologies that produce formats not supported by the current EBI data resources. BioStudies can also link to data in other databases; this is particularly advantageous in multi-omic studies where data has been deposited in a number of repositories with no central link to tie everything together. Due to the flexible nature of its data model, BioStudies can also store the supplementary data associated with publications.
A simple tab-delimited text format, PAGE-TAB, has been developed to capture all the information described. PAGE-TAB allows the submitter to describe files and external links associated with a study, organise information in hierarchies, and attach annotation as appropriate. Extra functionality can be added for specific purposes, such as a compound view in the ‘Data Infrastructure for Chemical Safety (diXa)’ project.
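The exact PAGE-TAB syntax is defined by the BioStudies team; purely as an illustration of the tab-delimited idea, the sketch below parses a hypothetical, simplified layout (the field names and structure are invented and do not follow the real specification).

```python
# Toy parser for a *hypothetical*, simplified PAGE-TAB-like layout.
# Each line is "Key<TAB>Value"; "File" and "Link" lines collect the
# files and external links of a study, everything else is an attribute.
# This is NOT the real PAGE-TAB specification.
def parse_pagetab_like(text):
    study = {"attributes": {}, "files": [], "links": []}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        key = fields[0]
        if key == "File":
            study["files"].append(fields[1])
        elif key == "Link":
            study["links"].append(fields[1])
        elif len(fields) > 1:
            study["attributes"][key] = fields[1]
    return study

example = (
    "Title\tExample study\n"
    "ReleaseDate\t2016-03-01\n"
    "File\traw_counts.txt\n"
    "Link\thttp://example.org/dataset"
)
parsed = parse_pagetab_like(example)
```

A real PAGE-TAB document additionally supports nested sections, which a parser like this would track with a section stack.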
Users can submit studies through a new online tool that allows input of metadata (including the data release date), direct upload of files, links to already deposited data, and associated publication information. The tool also enables users to maintain and edit their own BioStudies records.
As of March 2016, BioStudies contains 578,167 studies that are free to browse, download, and reuse. The user interface offers ontology-driven query expansion, enabling powerful searching across thousands of datasets.
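As a rough sketch of how ontology-driven query expansion works in general (the toy ontology and function below are invented for illustration and are not BioStudies' actual implementation), a search term can be widened with its synonyms and narrower terms before matching:

```python
# Minimal sketch of ontology-driven query expansion: a search term is
# expanded with its synonyms and narrower (child) terms, so a query for
# "cancer" also matches records annotated with subtypes.
# The ontology content is invented for illustration.
ONTOLOGY = {
    "cancer": {"synonyms": ["neoplasm"], "children": ["carcinoma", "leukemia"]},
    "carcinoma": {"synonyms": [], "children": []},
}

def expand_query(term):
    entry = ONTOLOGY.get(term)
    if entry is None:
        return [term]  # unknown terms are searched as-is
    return [term] + entry["synonyms"] + entry["children"]

terms = expand_query("cancer")
```

A production system would walk the full ontology graph transitively rather than one level deep.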
Short Abstract: Broad open access to complete clinical research study data is on the rise. Public access to raw clinical research data creates tremendous opportunities to evaluate research hypotheses that were not originally formulated in the studies, by reanalyzing data from a single study or by performing cross-analysis of multiple studies. The Immunology Database and Analysis Portal (ImmPort: www.immport.org) warehouses immunology clinical study data generated by researchers supported by the National Institute of Allergy and Infectious Diseases (NIAID) / Division of Allergy, Immunology and Transplantation (DAIT). Currently, over 160 studies are publicly available in ImmPort. We have developed RImmPort (bioconductor.org/packages/RImmPort/), which prepares ImmPort data for analysis in the open-source R statistical environment. RImmPort comprises three main components: 1) a specification of R study classes that encapsulate study data, leveraging study data standards from the Clinical Data Interchange Standards Consortium (CDISC); 2) foundational methods to load data for a specific study in ImmPort; and 3) generic methods to slice and dice data across different dimensions of study data. Thus, RImmPort hides the complexities and idiosyncrasies of the ImmPort data repository model and provides easy access to study data in a structure conducive to analysis. By basing RImmPort on open formalisms such as CDISC standards and by making it available on open-source bioinformatics platforms such as Bioconductor, we ensure that clinical study data in ImmPort is openly accessible and ready for analysis, enabling innovative bioinformatics research in immunology.
Short Abstract: Installing software is at best a tedious experience and often a distressing one. The system's package manager may require administrator access to the machine, which is typically not available on high-performance computing clusters. Compiling software from source can range from difficult to impossible, for example when the compiler and libraries provided by the operating system are a decade old. Manually navigating the recursive dependency chain of a tool and its dependencies, and their dependencies in turn, can feel like a labyrinth with no end.
Linuxbrew is a package manager for Linux derived from Homebrew, the Mac OS package manager. It is a cross-platform utility, compatible with any distribution of Linux and version of Mac OS released in the last decade, allowing you to use the same package manager on both your Linux server and your Mac laptop. It can be installed in your home directory, and does not require administrator access. Using Linuxbrew, challenging tasks are made easy; for example installing a modern compiler in your home directory takes a few minutes, even on an ancient distribution of Linux.
Homebrew-Science is a collection of scientific software packages installable by either Linuxbrew or Homebrew. Of the nearly 600 packages available, over 200 are bioinformatics tools. Packages are kept up to date by a fervent community of over 400 contributors.
Linuxbrew eliminates the hassle of installing software, and enables reproducible science by facilitating reproducible installations of the software used for an analysis.
Short Abstract: It is clear that the amount of data being generated worldwide cannot be curated and annotated by any individual or small group. Even those generating the data tend to curate it only with information directly relevant to their own interests and goals, perhaps neglecting information that could be important for someone re-using the data in an unrelated study.
There is a general realization that the only way to provide the necessary depth and breadth of annotation is to harness the power of the community, or as the saying goes, “many hands make light work”. To achieve this, we first required user-friendly tools and apps that non-expert curators would be comfortable with and capable of using. Such tools are now in place, including iCLiKVAL (http://iclikval.riken.jp) and Hypothes.is (https://hypothes.is).
For these tools to become the powerhouse behind community curation, they need a kick-start: something to seed them with useful information that will allow users to realize their utility and begin to both use them habitually and add information to them.
To this end, GigaScience created and ran the first “Giga-Curation Challenge” at the BioCuration2016 meeting in April this year. We created “The Annotometer” app (http://annotometer.com) to track and measure curations made over the duration of the conference.
Here we present both the results from the first challenge, as well as the Annotometer code, which we make freely available to anyone wishing to run a similar challenge in the future.
Visit our poster to find out how the first Giga-Curation challenge went.
Short Abstract: Analysis of high-throughput biological data often involves the use of many software packages and in-house code. For each analysis step there are multiple software tools available, each with its own capabilities, limitations, and assumptions about the input and output data. Developing bioinformatics pipelines involves a great deal of experimentation with different tools and parameters, considering how each fits into the big picture and the practical implications of its use. Keeping data analysis organized can prove challenging. In this work we present a set of methods and tools that enable the user to experiment extensively while keeping analyses reproducible and organized. We present a framework based on simple principles that allow data analyses to be structured in a way that emphasizes reproducibility, organization, and clarity, while remaining simple and intuitive, so that adding and modifying analysis steps can be done naturally with little extra effort. The framework supports git-based version control of code, documentation, and data, enabling collaboration between users.
Short Abstract: Data containing redundant and/or missing values often cause significant performance degradation in statistical tests and machine learning methods. In mass spectrometry data sets especially, we often face dirty-data problems such as duplicated records originating from an identical molecule and frequent missing values in measured intensity. These problems have various causes, such as batch effects in sample preparation and differences in machine sensitivity. To solve these issues, we developed a data preprocessor for mass spectrometry data in three-dimensional contour map format (mass-to-charge ratio, retention time, and intensity). First, candidate duplicated records are extracted by comparing the similarity of their mass-to-charge ratios and retention times. Second, the candidates are tested for pairwise intensity similarity under a user-specified threshold, and similar duplicated records are merged. Our tool is implemented in R and depends on the C-based R package data.table as well as dplyr. On test datasets, we obtained a reduced number of records and verified that a log file for the merging process was written correctly. Moreover, we confirmed an overall reduction in missing values: records with many missing values (> 80%) made up a lower proportion of the data after preprocessing. Our mass spectrometry preprocessor provides users with duplication-free data; if additional preprocessing is desired, a missing-value imputation tool such as MICE can be adopted. Our tool will be deposited on GitHub under an open-source license.
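The tool itself is implemented in R; purely as an illustration of the merging logic described above, here is a minimal Python sketch. The tolerances, the greedy pairing strategy, and averaging as the merge rule are illustrative assumptions, not the tool's actual algorithm.

```python
# Minimal sketch of duplicate-record merging for mass spectrometry
# features. Each record is (mz, rt, intensity). Two records are
# duplicate candidates when their m/z and retention time fall within
# tolerances, and they are merged (here: averaged) when their
# intensities are also similar. All thresholds are made up.
def merge_duplicates(records, mz_tol=0.01, rt_tol=0.1, int_ratio=0.2):
    merged = []
    for mz, rt, inten in sorted(records):
        for i, (m_mz, m_rt, m_int) in enumerate(merged):
            close = abs(mz - m_mz) <= mz_tol and abs(rt - m_rt) <= rt_tol
            similar = abs(inten - m_int) <= int_ratio * max(inten, m_int)
            if close and similar:
                merged[i] = ((m_mz + mz) / 2, (m_rt + rt) / 2,
                             (m_int + inten) / 2)
                break
        else:
            merged.append((mz, rt, inten))
    return merged

records = [
    (300.1000, 5.00, 1000.0),
    (300.1005, 5.02, 1050.0),  # near-duplicate of the first record
    (450.2000, 7.50, 2000.0),
]
result = merge_duplicates(records)  # the first two records collapse into one
```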
Short Abstract: MFAPipe is open source software for Flux Balance Analysis (FBA) and parallel labeling, metabolic and isotopic steady state Metabolic Flux Analysis (MFA). Distributed as a stand-alone executable with a permissive, open source license, the software has no proprietary dependencies, e.g., MATLAB or GAMS. Using the stoichiometric paradigm and Elementary Metabolite Units (EMU) method, the software supports the investigation of the flow of metabolites through biological systems, i.e., the determination of production and consumption rates of metabolites (referred to as "metabolic fluxes"). The software is engineered for efficiency and flexibility. Biological systems are modeled as biochemical reaction networks of exchange and transport reactions and intra- and extracellular metabolites. Biochemical reaction networks may be arbitrarily complex, featuring any number and configuration of metabolites, each with any number and configuration of isotopic atoms and tracer elements. Using the Levenberg-Marquardt algorithm (weighted nonlinear least-squares method) with analytic Jacobian matrices, metabolic fluxes are fitted to a combination of stoichiometry matrices, user-specified box and linear constraints, e.g., fixed flux values, upper and lower bounds and weights, and arbitrary algebraic expressions of Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) spectroscopy data, e.g., isotopomer, cumomer and mass fractions.
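MFAPipe's own EMU-based, Levenberg-Marquardt fitting is far richer than any short example can show; as a toy linear analogue only, the sketch below reconciles noisy flux measurements with a single stoichiometric balance using an unweighted least-squares projection (the network and all numbers are made up).

```python
# Toy analogue of steady-state flux fitting. Hypothetical network:
#   v1: A -> B,  v2: B -> C,  v3: B -> D
# Steady state for internal metabolite B requires v1 - v2 - v3 = 0.
# With unit weights, the least-squares fit subject to s . v = 0 is the
# orthogonal projection of the measured fluxes onto the constraint plane.
def fit_fluxes(s, v_meas):
    dot = sum(si * vi for si, vi in zip(s, v_meas))
    norm2 = sum(si * si for si in s)
    return [vi - (dot / norm2) * si for si, vi in zip(s, v_meas)]

s = [1.0, -1.0, -1.0]      # stoichiometric balance row for metabolite B
v_meas = [10.0, 6.0, 4.5]  # inconsistent measured fluxes (made up)
v_fit = fit_fluxes(s, v_meas)
residual = sum(si * vi for si, vi in zip(s, v_fit))  # ~0 at steady state
```

The real problem is nonlinear because isotopomer/cumomer fractions depend nonlinearly on the fluxes, which is why MFAPipe uses Levenberg-Marquardt with analytic Jacobians rather than a closed-form projection like this.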
Short Abstract: To overcome the challenges of moving massive amounts of data among remote storage/compute nodes and mastering complicated scientific workflows, we built a federated CyVerse platform at CSHL, which comprises several storage and compute systems located at CSHL and is fully integrated with CyVerse’s cyberinfrastructure through the Agave API and iRODS. On top of the federated system, we designed and implemented a web portal, SciApps.org, using ReactJS; it automatically populates the interfaces of Agave apps, handles data upload and browsing in our data store, supports job submission and monitoring, and provides various ways to visualize analysis results. For example, we integrated the genome browsers Biodalliance and JBrowse for visualizing alignments, variants, and genome annotation results. The platform allows large amounts of data to be processed locally with existing CyVerse apps and pipelines, thus handling massive amounts of data efficiently (by reducing nationwide data transfer) and easing the exchange of apps/pipelines across systems. The platform is designed to facilitate cloud computing and the sharing of data, apps, storage, and computing systems, and it supports fast exchange of data with the CSHL-hosted Gramene/Ensembl Plants database.