Accelerating Biological Imaging: Real Time Differential Image Compression
Experimental biology has driven the need for high-resolution, high-speed imaging systems. While modern CMOS sensors provide both of these characteristics, many sensors must be combined to cover a large field of view. Combining sensors introduces a problem, however: the bandwidth available to move data from the imaging device to persistent storage is limited. We solve this problem by leveraging the large amount of inter-image redundancy present in high-speed biological imaging to perform compression with minimal computational complexity. Our system yields up to 60x compression with relatively low losses and will enable the development of higher-throughput imaging systems, accelerating biological imaging.
Problem Statement
Biological imaging is becoming an increasingly important tool within neurobiology, as it provides insight into the inner workings of organisms using high-resolution, high-speed microscopes. As imaging technology has improved, researchers have been able to image larger and larger fields of view. These imaging systems are limited, however, not by the sensors or optics, but by the bandwidth available to move data from these sensors to persistent storage devices. As a result, current cutting-edge imaging systems do not take full advantage of their hardware capabilities, imaging at lower frame rates or with reduced fields of view to stay under the bandwidth limit.
Figure above shows the original and differential images for drosophila (fruit flies) taken at 1 Hz.
Figure below shows the original and differential images for zebra fish taken at 1 Hz.
In this work we outline one method of overcoming this bandwidth limit: real-time compression. Although video compression has advanced rapidly in recent years, the computational complexity and memory usage of modern compression algorithms limit the feasibility of including them in real-time image processing pipelines. To address this problem, we introduce a simple compression scheme that has a low computational footprint. We use differential imaging, taking advantage of the large amount of inter-image redundancy between subsequent frames. By compressing only the difference between frames, we are able to achieve up to 60x compression using low-complexity, low-loss compression algorithms.
Critically, we relate compression rate to overall system performance. By reducing the amount of data required to represent an image, we allow more images to be transferred across a connection of the same bandwidth. This allows a higher-throughput system to be realized, either by increasing the frame rate of the imaging sensors or by integrating more sensors into the imaging system.
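The relationship between compression and throughput can be illustrated with a short calculation. The following is a minimal sketch, not part of the described system: the function name `max_frame_rate` and the link and frame sizes are hypothetical values chosen only for illustration, and the model assumes the link is the sole bottleneck.

```python
def max_frame_rate(link_bytes_per_s: float, frame_bytes: int,
                   compression_ratio: float = 1.0) -> float:
    """Frames per second that fit through a link of fixed bandwidth.

    Assumes the link is the only bottleneck and that every frame
    compresses by the same ratio (a simplification).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    compressed_frame_bytes = frame_bytes / compression_ratio
    return link_bytes_per_s / compressed_frame_bytes


# Hypothetical numbers, for illustration only (not from the report).
LINK_BYTES_PER_S = 1_000_000_000  # assumed 1 GB/s link
FRAME_BYTES = 4_000_000           # assumed 4 MB per raw frame

baseline = max_frame_rate(LINK_BYTES_PER_S, FRAME_BYTES)
compressed = max_frame_rate(LINK_BYTES_PER_S, FRAME_BYTES, compression_ratio=60)
print(f"speed-up from 60x compression: {compressed / baseline:.1f}x")
```

Under this simple model the achievable frame rate (or, equivalently, the number of sensors sharing the link) scales linearly with the compression ratio.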
Proposed Solution
We propose using a series of simple and well-established techniques to reduce the amount of information that needs to be transmitted. The first observation is that most pixels change very little or not at all between frames. Many of the pixels that do change vary imperceptibly due to ambient conditions and noise in the camera sensor. To leverage this, we compress only the difference between frames, applying a small threshold to the difference to reduce noise. Values below the threshold are set to zero. This turns irregular, difficult-to-compress pixel values into data that is much easier to compress, because most of the values that need to be sent are zero. Figures 1 and 2 show the result of applying this process to sample image sequences.
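The thresholded difference described above can be sketched in a few lines. This is an illustrative software model only, not the hardware implementation. The name `threshold_diff` is hypothetical, frames are modelled as flat lists of 8-bit pixel values, and the difference is assumed to be taken against the previous raw frame and compared by magnitude; the text does not specify these details.

```python
def threshold_diff(prev, curr, threshold):
    """Signed per-pixel difference between two frames.

    Differences whose magnitude is below `threshold` are treated as
    noise and set to zero (assumed interpretation of the threshold).
    """
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    out = []
    for p, c in zip(prev, curr):
        d = c - p
        out.append(d if abs(d) >= threshold else 0)
    return out


# Toy frames: small fluctuations are noise, the large changes are "motion".
prev_frame = [12, 12, 13, 200, 201, 12]
curr_frame = [13, 12, 11, 120, 240, 12]
diff = threshold_diff(prev_frame, curr_frame, threshold=10)
print(diff)  # [0, 0, 0, -80, 39, 0]
```

Raising the threshold zeroes more pixels and improves compression at the cost of fidelity, which is one way to picture the bandwidth-quality trade-off.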
Next, we observed that changes in the frame are not randomly distributed; they tend to occur in contiguous blocks of pixels corresponding to the movement of the specimens. This means that we are likely to see long runs of zeroes before encountering short bursts representing the specimen, which makes the data an ideal candidate for run-length encoding (RLE). The final key insight concerns the data after RLE has been performed. The length values tend to be strongly skewed towards one (the minimum value, which appears frequently where a specimen has moved) or 255 (the maximum run length in our testing). The pixel values tend to be zero, since most of the image remains unchanged. This makes the output of the RLE well suited to Huffman encoding, which provides additional compression gains. Finally, since the output of the Huffman encoders has a variable bit width, a final bit-packing step allows us to send the variable-width data over a fixed-width bus.
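The three stages described above (RLE, Huffman encoding, bit packing) can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the FPGA design: the function names are hypothetical, separate Huffman tables for run lengths and pixel values are an assumption, the 32-bit word width is an arbitrary choice, and the code tables are built from the data itself rather than fixed in advance.

```python
import heapq
from collections import Counter
from itertools import count

MAX_RUN = 255  # maximum run length mentioned in the text


def rle_encode(values, max_run=MAX_RUN):
    """Return (run_length, value) pairs, splitting runs longer than max_run."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][1] == v and pairs[-1][0] < max_run:
            pairs[-1] = (pairs[-1][0] + 1, v)
        else:
            pairs.append((1, v))
    return pairs


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from observed frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def pack_bits(bitstrings, word_bits=32):
    """Concatenate variable-width codes and cut them into fixed-width words."""
    stream = "".join(bitstrings)
    stream += "0" * (-len(stream) % word_bits)  # zero-pad the final word
    return [int(stream[i:i + word_bits], 2)
            for i in range(0, len(stream), word_bits)]


# Toy thresholded difference: long zero runs around a short burst of change.
diff = [0] * 600 + [-80, 39] + [0] * 300
pairs = rle_encode(diff)
length_code = huffman_code([n for n, _ in pairs])
value_code = huffman_code([v for _, v in pairs])
codes = []
for n, v in pairs:
    codes.append(length_code[n])
    codes.append(value_code[v])
words = pack_bits(codes, word_bits=32)
print(len(diff), "values ->", len(pairs), "RLE pairs ->", len(words), "words")
```

In this toy input the value zero dominates, so it receives the shortest code, mirroring the skew described in the text. A decoder would need the same code tables and the original value count to discard the padding bits.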
System Design
There were several key goals for our design. Accomplishing these would allow the design to operate in real time and allow researchers to tune the system for their specific application:
- Use streaming dataflow to enable real-time processing (and avoid data loss)
- Configurable bandwidth-quality trade-off
- Configurable pipeline width to improve throughput
- FPGA resource utilisation should be low enough to allow the design to be replicated many times on a single FPGA (to handle multiple cameras)
Evaluation
We evaluated the performance of our system on two experimental species: zebra fish and drosophila. With a noise threshold cutoff of 10 for 8-bit pixels, we measured a compression rate of 60x for zebra fish and 40x for drosophila. These high compression rates lower the bandwidth requirement of such systems, allowing 40-60x higher frame rates, provided the image sensors are capable of them. The hardware design was able to run at the full 125 MHz clock speed it was compiled for, indicating that it would be able to match the rate of an incoming AXI4 stream. In addition, a hardware run on the FPGA with just 10 images achieved a net compression ratio of roughly 9.3x, producing 70518 bytes of output from 655360 bytes of input.
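The compression ratio of the 10-image hardware run follows directly from the reported byte counts. The short check below uses only those figures. The per-image size it derives would be consistent with, for example, a 256 x 256 frame of 8-bit pixels, but that is an inference, as the frame dimensions are not stated here.

```python
input_bytes = 655_360   # reported input size for the 10-image hardware run
output_bytes = 70_518   # reported compressed output size
num_images = 10

ratio = input_bytes / output_bytes
bytes_per_image = input_bytes // num_images  # 65536 bytes per raw image

print(f"compression ratio: {ratio:.2f}x")  # compression ratio: 9.29x
print(f"raw bytes per image: {bytes_per_image}")
```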
Conclusion
The need to capture vast amounts of image data in biological research drives the need for a configurable accelerator to help overcome bandwidth constraints. In this report, we have outlined a technique that leverages the inter-frame redundancy especially present in imaging of biological specimens to create an efficient design that significantly reduces the required bandwidth.
We have demonstrated that the proposed scheme can provide extremely high compression rates of over 40x on real biological data with minimal loss, allowing up to a 40x increase in frame rate. In addition, the necessary hardware can be implemented on an FPGA, and the proposed pipeline is able to handle the incoming data stream in real time.