Genome compression progresses toward standardization

At its 114th meeting, MPEG moved its exploration of genome compression toward formal standardization. The meeting included a seminar to gather additional perspectives on genome data standardization and a review of the technologies submitted in response to a Call for Evidence (CfE). That CfE, issued at the 113th meeting, aimed to assess whether new technologies could achieve better compression efficiency than the formats currently in use.

In all, 22 tools were evaluated. The results show that by integrating several of these tools, compression can be improved by up to 27% relative to the best state-of-the-art tool. On the strength of this evidence, MPEG has issued a Draft Call for Proposals (CfP) on Genomic Information Representation. The Draft CfP targets technologies for compressing raw and aligned genomic data and metadata for efficient storage and analysis.

As the results of the Call for Evidence demonstrate, lossless compression of genomic data beyond the current state of the art is achievable by combining existing tools and developing them further. The call also addresses quantized compression of the metadata, which makes up most of the volume of the compressed data. The Draft CfP seeks quantized compression technologies that deliver higher compression without affecting the accuracy of the results of analysis applications. Responses to the Genomic Information Representation CfP will be evaluated ahead of the 116th MPEG meeting, to be held in Chengdu, China, in October 2016. An ad hoc group co-chaired by Martin Golobiewski, convenor of Working Group 5 of ISO TC 276 (the ISO committee for biotechnology), and Dr. Marco Mattavelli (for MPEG) will coordinate the receipt and pre-analysis of submissions received in response to the call.
