Throughput and robustness of bioinformatics pipelines for genome-scale data analysis

Author(s)
Sztromwasser, Paweł

URI
http://hdl.handle.net/20.500.12424/2581062
Online Access
http://hdl.handle.net/1956/7906
Abstract
<p>The post-genomic era has been heavily influenced by the rapid development of high-throughput
 molecular-screening technologies, which has enabled genome-wide analysis
 approaches on an unprecedented scale. The constantly decreasing cost of producing
 experimental data has resulted in a data deluge, which has led to technical challenges
 in distributed bioinformatics infrastructure and computational biology methods. At the
 same time, the advances in deep-sequencing allowed intensified interrogation of human
 genomes, leading to prominent discoveries linking our genetic makeup with numerous
 medical conditions. The fast and cost-effective sequencing technology is expected to
 soon become instrumental in personalized medicine. The transition of the methodology
 related to genome sequencing and high-throughput data analysis from the research
 domain to a clinical service is challenging in many aspects. One of them is providing
 medical personnel with accessible, robust, and accurate methods for analysis of
 sequencing data.</p><p>The computational protocols used for analysis of the sequencing data are complex,
 parameterized, and in continuous development, making results of data analysis sensitive
 to factors such as the software used and the parameter values selected. However,
 the influence of parameters on results of computational pipelines has not been systematically
 studied. To fill this gap, we investigated the robustness of a genetic variant
 discovery pipeline against changes of its parameter settings. Using two sensitivity
 screening methods, we evaluated parameter influence on the identified genetic variants,
 and found that the parameters have irregular effects and are inter-dependent. Only a
 fraction of parameters were found to have a considerable impact on the results, suggesting
 that screening parameter sensitivity can lead to simpler pipeline configuration.
 Our results showed that, although a simple metric can be used to examine parameter
 influence, more informative results are obtained using a criterion related to the accuracy
 of pipeline results. Using the results of sensitivity screening, we have shown that
 the influential pipeline parameters can be adjusted to effectively increase the accuracy
 of variant discovery. Such information is invaluable for researchers tuning pipeline parameters,
 and can guide the search for optimal settings for computational pipelines in
 a clinical setting. Contrasting the two applied screening methods, we learned more
 about specific requirements of robustness analysis of computational methods, and were
 able to suggest a more tailored strategy for parameter screening. Our contributions
 demonstrate the importance and the benefits of systematic robustness analysis of bioinformatics
 pipelines, and indicate that more efforts are needed to advance research in
 this area.</p><p>Web services are commonly used to provide interoperable, programmatic access to bioinformatics resources, and consequently, they are natural building blocks of bioinformatics
 analysis workflows. However, in the light of the data deluge, their usability
 for data-intensive applications has been questioned. We investigated the applicability of
 standard Web services to high-throughput pipelines, and showed how throughput and
 performance of such pipelines can be improved. By developing two complementary approaches
 that take advantage of established and proven optimization mechanisms, we
 were able to enhance Web service communication in a non-intrusive manner. The first
 strategy increases the throughput of Web service interfaces through a stream-like invocation pattern.
 This additionally allows for data pipelining between consecutive steps of a workflow.
 The second approach facilitates peer-to-peer data transfer between Web services
 to increase the capacity of the workflow engine. We evaluated the impact of the enhancements
 on genome-scale pipelines, and showed that high-throughput data analysis
 using standard Web service pipelines is possible when the technology is used sensibly.
 However, considering the contemporary data volumes and their expected growth,
 methods capable of handling even larger data should be sought.</p><p>Systematic analysis of pipeline robustness requires intensive computations, which are
 particularly demanding for high-throughput pipelines. Providing more efficient methods
 of pipeline execution is fundamental for enabling such examinations on a large scale.
 Furthermore, the standardized interfaces of Web services facilitate automated
 executions, and are perfectly suited for coordinating large computational experiments.
 I speculate that, given wide adoption of Web service technology in bioinformatics
 pipelines, large-scale quality control studies, such as robustness analysis, could be
 automated and performed routinely on newly published computational methods. This
 work contributes to realizing such a vision, providing a technical basis for building
 the necessary infrastructure and suggesting methodology for robustness analysis.</p>
Date
2014-04-11
Type
Doctoral thesis
Identifier
oai:bora.uib.no:1956/7906
978-82-308-2967-7
http://hdl.handle.net/1956/7906
Copyright/License
Copyright the author. All rights reserved
Collections
OAI Harvested Content
