Genesis: A Framework for Accelerating Genomic Data Analytics

Conceptualize genomic data as a very large relational database and accelerate genomic analytics using a genomic hardware library built from SQL and SQL-extension operators.

We envision accelerating genomic data analysis with a framework called Genesis (genome analysis). Genesis consists of an interface and an implementation of a system that processes genomic data efficiently. The framework can be deployed in the cloud, where it can exploit the FPGAs-as-a-service paradigm to provide cost-efficient secondary DNA analysis.

We propose conceptualizing genomic reads and their associated attributes as a very large relational database and using extended SQL as a domain-specific language to construct queries that express various data manipulation operations. To accelerate such queries, we design a Genesis hardware library of composable primitive hardware modules that can be combined to form a specialized dataflow architecture.
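To make the relational view concrete, the following is a minimal software sketch, not the Genesis interface itself. It stores a handful of made-up reads in an in-memory SQLite table and runs one standard SQL query over them. The table name `reads`, its columns, the sample rows, and the mapping-quality threshold are all assumptions chosen for illustration; the SQL extensions that Genesis defines are not shown here.

```python
import sqlite3

# Hypothetical schema: one row per aligned read, with a few read attributes.
SAMPLE_READS = [
    ("r1", "chr1", 100, 60, "ACGT"),
    ("r2", "chr1", 150, 10, "TTGA"),
    ("r3", "chr2", 20, 45, "GGCA"),
    ("r4", "chr2", 75, 37, "CATG"),
]


def build_reads_db(rows):
    """Load reads into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reads ("
        "read_id TEXT, chrom TEXT, pos INTEGER, mapq INTEGER, seq TEXT)"
    )
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def high_quality_counts(conn, min_mapq=30):
    """Count reads per chromosome whose mapping quality meets the threshold."""
    cur = conn.execute(
        "SELECT chrom, COUNT(*) FROM reads "
        "WHERE mapq >= ? GROUP BY chrom ORDER BY chrom",
        (min_mapq,),
    )
    return cur.fetchall()


conn = build_reads_db(SAMPLE_READS)
print(high_quality_counts(conn))
```

The point of the sketch is only that a read-level data manipulation step (here, a filter followed by a grouped count) can be written declaratively as a query over a table of reads.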

As a proof of concept for the Genesis framework, we present the architecture and hardware implementation of several genomic analysis stages in the secondary analysis pipeline corresponding to the GATK4 workflow proposed by the Broad Institute. We express genomic data analysis operations as sequences of SQL-style queries and show how Genesis hardware library modules can be composed into accelerated hardware pipelines. Our accelerated system, deployed on a cloud FPGA, performs up to 19.3× better than GATK4 running on a commodity multi-core Xeon server and achieves up to 15× cost savings.
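As a rough software analogy for composing primitive modules into a pipeline, the sketch below chains two streaming operators so that rows flow from one stage to the next. It is not the Genesis hardware library: the operator names `filter_op`, `project_op`, and `compose`, the row layout, and the sample data are all hypothetical, and real hardware modules would stream data between physical units instead of Python generators.

```python
from functools import reduce


def filter_op(predicate):
    """Primitive stage: pass through only the rows that satisfy the predicate."""
    def stage(stream):
        return (row for row in stream if predicate(row))
    return stage


def project_op(*fields):
    """Primitive stage: keep only the named fields of each row."""
    def stage(stream):
        return ({f: row[f] for f in fields} for row in stream)
    return stage


def compose(*stages):
    """Chain stages so the output stream of one feeds the next."""
    def pipeline(stream):
        return reduce(lambda s, stage: stage(s), stages, stream)
    return pipeline


# Made-up reads used only to exercise the pipeline.
reads = [
    {"read_id": "r1", "chrom": "chr1", "pos": 100, "mapq": 60},
    {"read_id": "r2", "chrom": "chr1", "pos": 150, "mapq": 10},
    {"read_id": "r3", "chrom": "chr2", "pos": 20, "mapq": 45},
]

# Roughly: SELECT read_id, pos FROM reads WHERE mapq >= 30
pipeline = compose(
    filter_op(lambda r: r["mapq"] >= 30),
    project_op("read_id", "pos"),
)
result = list(pipeline(reads))
print(result)
```

Each stage consumes and produces a stream of rows, so stages can be rearranged or extended without changing the others, which is the property that makes a library of primitives composable.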

Publications

Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U Gyeong Song, Young H. Oh, Krste Asanović, Jae W. Lee, and Lisa Wu Wills, "Genesis: A Hardware Acceleration Framework for Genomic Data Analysis". International Symposium on Computer Architecture (ISCA) 2020.