dc.contributor.author: Sztromwasser, Paweł
dc.date.accessioned: 2019-10-29T08:00:38Z
dc.date.available: 2019-10-29T08:00:38Z
dc.date.created: 2018-09-05 01:20
dc.date.issued: 2014-04-11
dc.identifier: oai:bora.uib.no:1956/7906
dc.identifier: 978-82-308-2967-7
dc.identifier: http://hdl.handle.net/1956/7906
dc.identifier.uri: http://hdl.handle.net/20.500.12424/2581062
dc.description.abstract: <p>The post-genomic era has been heavily influenced by the rapid development of high-throughput
 molecular-screening technologies, which has enabled genome-wide analysis
 approaches on an unprecedented scale. The constantly decreasing cost of producing
 experimental data resulted in a data deluge, which has led to technical challenges
 in distributed bioinformatics infrastructure and computational biology methods. At the
 same time, the advances in deep-sequencing allowed intensified interrogation of human
 genomes, leading to prominent discoveries linking our genetic makeup with numerous
 medical conditions. The fast and cost-effective sequencing technology is expected to
 soon become instrumental in personalized medicine. The transition of the methodology
 related to genome sequencing and high-throughput data analysis from the research
 domain to a clinical service is challenging in many aspects. One of them is providing
 medical personnel with accessible, robust, and accurate methods for analysis of
 sequencing data.</p><p>The computational protocols used for analysis of the sequencing data are complex,
 parameterized, and in continuous development, making results of data analysis sensitive
 to factors such as the software used and the parameter values selected. However,
 the influence of parameters on results of computational pipelines has not been systematically
 studied. To fill this gap, we investigated the robustness of a genetic variant
 discovery pipeline against changes of its parameter settings. Using two sensitivity
 screening methods, we evaluated parameter influence on the identified genetic variants,
 and found that the parameters have irregular effects and are inter-dependent. Only a
 fraction of parameters were identified to have considerable impact on the results, suggesting
 that screening parameter sensitivity can lead to simpler pipeline configuration.
 Our results showed that, although a simple metric can be used to examine parameter
 influence, more informative results are obtained using a criterion related to the accuracy
 of pipeline results. Using the results of sensitivity screening, we have shown that
 the influential pipeline parameters can be adjusted to effectively increase the accuracy
 of variant discovery. Such information is invaluable for researchers tuning pipeline parameters,
 and can guide the search for optimal settings for computational pipelines in
 a clinical setting. Contrasting the two applied screening methods, we learned more
 about specific requirements of robustness analysis of computational methods, and were
 able to suggest a more tailored strategy for parameter screening. Our contributions
 demonstrate the importance and the benefits of systematic robustness analysis of bioinformatics
 pipelines, and indicate that more efforts are needed to advance research in
 this area.</p><p>Web services are commonly used to provide interoperable, programmatic access to bioinformatics resources, and consequently, they are natural building blocks of bioinformatics
 analysis workflows. However, in the light of the data deluge, their usability
 for data-intensive applications has been questioned. We investigated applicability of
 standard Web services to high-throughput pipelines, and showed how throughput and
 performance of such pipelines can be improved. By developing two complementary approaches
 that take advantage of established and proven optimization mechanisms, we
 were able to enhance Web service communication in a non-intrusive manner. The first
 strategy increases throughput of Web service interfaces by a stream-like invocation pattern.
 This additionally allows for data-pipelining between consecutive steps of a workflow.
 The second approach facilitates peer-to-peer data transfer between Web services
 to increase the capacity of the workflow engine. We evaluated the impact of the enhancements
 on genome-scale pipelines, and showed that high-throughput data analysis
 using standard Web service pipelines is possible when the technology is used sensibly.
 However, considering the contemporary data volumes and their expected growth,
 methods capable of handling even larger data should be sought.</p><p>Systematic analysis of pipeline robustness requires intensive computations, which are
 particularly demanding for high-throughput pipelines. Providing more efficient methods
 of pipeline execution is fundamental for enabling such examinations on a large scale.
 Furthermore, the standardized interfaces of Web services facilitate automated
 executions, and are perfectly suited for coordinating large computational experiments.
 I speculate that, given wide adoption of Web service technology in bioinformatics
 pipelines, large-scale quality control studies, such as robustness analysis, could be
 automated and performed routinely on newly published computational methods. This
 work contributes to realizing such a conception, providing a technical basis for building
 the necessary infrastructure and suggesting methodology for robustness analysis.</p>
dc.language.iso: eng
dc.publisher: The University of Bergen
dc.relation.ispartof: Paper I: Paweł Sztromwasser, Pål Puntervoll, and Kjell Petersen. Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows. Journal of Integrative Bioinformatics, 8(2):163, 2011. The article is available at: <a href="http://hdl.handle.net/1956/7904" target="blank">http://hdl.handle.net/1956/7904</a>
dc.relation.ispartof: Paper II: Sattanathan Subramanian, Paweł Sztromwasser, Pål Puntervoll, and Kjell Petersen. Direct data transfer between SOAP web services in Orchestration. In the International Conference on Information Integration and Web-based Applications & Services (iiWAS). ACM, 2012. The article is available at: <a href="http://hdl.handle.net/1956/7905" target="blank">http://hdl.handle.net/1956/7905</a>
dc.relation.ispartof: Paper III: Sattanathan Subramanian, Paweł Sztromwasser, Pål Puntervoll, and Kjell Petersen. Pipelined Data-flow Delegated Orchestration for Data-Intensive eScience Workflows. International Journal of Web Information Systems, 9(3):204-218, 2013. The article is not available in BORA due to publisher restrictions. The published version is available at: <a href="http://dx.doi.org/10.1108/ijwis-05-2013-0012" target="blank">http://dx.doi.org/10.1108/ijwis-05-2013-0012</a>
dc.relation.ispartof: Paper IV: Paweł Sztromwasser, Kjell Petersen, and Inge Jonassen. Sensitivity screening reveals influential parameters of a variant calling pipeline. The article is not available in BORA.
dc.rights: Copyright the author. All rights reserved
dc.title: Throughput and robustness of bioinformatics pipelines for genome-scale data analysis
dc.type: Doctoral thesis
ge.collectioncode: OAIDATA
ge.dataimportlabel: OAI metadata object
ge.identifier.legacy: globethics:15234384
ge.identifier.permalink: https://www.globethics.net/gel/15234384
ge.lastmodificationdate: 2018-09-05 01:20
ge.lastmodificationuser: admin@pointsoftware.ch (import)
ge.submissions: 0
ge.oai.exportid: 149801
ge.oai.repositoryid: 4505
ge.oai.streamid: 2
ge.setname: GlobeEthicsLib
ge.setspec: globeethicslib
ge.link: http://hdl.handle.net/1956/7906

