Design and optimization of DSP architectures for multi-context FPGA with dynamic reconfiguration
Rakesh Vijayakumara Warrier
Date of Issue: 2016
School of Computer Science and Engineering
Centre for High Performance Embedded Systems
Field Programmable Gate Arrays (FPGAs) are now widely adopted as hardware accelerators due to their inherent parallel processing capability. However, the sub-optimal logic utilization and large reconfiguration latency of conventional single-context FPGAs constrain their use in applications such as adaptive control systems in vehicles and software-defined radio, where frequent context switching or resource sharing between tasks is required. Multi-context FPGAs with dynamic reconfiguration capability have therefore been introduced to allow rapid reconfiguration of the FPGA and hence increase the effective logic density. The current generation of multi-context FPGAs typically uses a dynamically reconfigurable architecture based on static Random Access Memory (RAM) to implement multiple configuration planes that enable fast switching between contexts. The main challenges of these multi-context FPGAs are limited on-chip storage and relatively long reconfiguration latency (of the order of milliseconds). With technology scaling down to the nanoscale, new generations of hybrid multi-context FPGA architectures have been developed, such as the CMOS/NAnoTechnology reconfigURablE architecture (NATURE), which uses on-chip nano RAMs to store multiple configurations and enable extremely fast runtime reconfiguration (of the order of picoseconds). This type of FPGA enables cycle-by-cycle reconfiguration and temporal logic folding, improving logic density and area-delay product by more than an order of magnitude compared to traditional FPGAs. However, the fine granularity of this type of architecture limits its use as a high-performance hardware accelerator for compute-intensive arithmetic operations. This research work explores and presents how DSP architectures can be designed for the hybrid multi-context NATURE platform in order to fully exploit its advantages and possibilities.
The performance of various compute-intensive signal processing kernels is used in the study to benchmark the improvements achieved by the proposed DSP architectures. A full-block dynamically reconfigurable DSP architecture is first presented, which can be reconfigured at runtime to implement different arithmetic functions in different clock cycles. To fully exploit the capability of temporal logic folding in NATURE, the DSP block is then extended to support pipeline-level reconfiguration, which allows individual pipeline stages to be reconfigured independently. To enable efficient implementation of mixed-precision applications, the capability to dynamically fracture the internal compute path of the DSP block is also incorporated into the design. The design automation tool for the NATURE platform is extended to map compute-intensive kernels efficiently onto the proposed DSP architecture(s) by exploring optimum resource sharing and area/power reduction. A design space exploration algorithm is developed and incorporated into the mapping tool to determine the optimal configuration for a given input circuit, based on the design requirements and user constraints. The proposed technique automatically explores the different folding levels and DSP modes (configurations), evaluates their area/power trade-off, and determines the most efficient mapping for the chosen configuration, which is subsequently fed to the mapping flow to generate the bitstream. The contributions of this work allow system designers to design and map compute-intensive arithmetic kernels onto next-generation hybrid multi-context FPGA platforms with ease, while providing the high computational performance and energy efficiency required by many modern applications.
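The exploration step described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the cost model, the `DSP_MODES` table and its numbers, and the names `estimate` and `explore` are all hypothetical and chosen only for illustration. The sketch assumes that a folding level `p` means `p` logic levels are executed between reconfigurations, so a circuit of depth `D` needs `ceil(D / p)` folding cycles.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil

# Hypothetical (relative area, relative power) per DSP mode; the mode names
# echo the abstract, but the numbers are invented for illustration only.
DSP_MODES = {
    "full_block": (4.0, 3.0),
    "pipeline_level": (3.0, 3.5),
    "fractured": (2.5, 4.0),
}


@dataclass(frozen=True)
class Candidate:
    folding_level: int
    dsp_mode: str
    area: float
    power: float
    cycles: int


def estimate(depth: int, folding_level: int, dsp_mode: str) -> Candidate:
    """Toy cost model (an assumption, not the thesis's model)."""
    # Number of folding cycles needed to cover the whole circuit depth.
    cycles = ceil(depth / folding_level)
    mode_area, mode_power = DSP_MODES[dsp_mode]
    # More levels resident at once means more area; more folding cycles
    # means more reconfigurations, modeled here as a small power overhead.
    area = folding_level * mode_area
    power = mode_power * (1 + 0.05 * cycles)
    return Candidate(folding_level, dsp_mode, area, power, cycles)


def explore(depth, folding_levels, max_cycles, area_weight=0.5):
    """Exhaustively score every (folding level, DSP mode) pair.

    Candidates that exceed the user's cycle budget are discarded; the
    remaining one with the lowest weighted area/power cost is returned,
    or None if no candidate satisfies the constraint.
    """
    candidates = [
        estimate(depth, level, mode)
        for level, mode in product(folding_levels, DSP_MODES)
    ]
    feasible = [c for c in candidates if c.cycles <= max_cycles]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda c: area_weight * c.area + (1 - area_weight) * c.power,
    )


if __name__ == "__main__":
    best = explore(depth=12, folding_levels=[1, 2, 4], max_cycles=6)
    print(best)
```

In this sketch the user constraint is a cycle budget and the objective is a weighted sum of area and power; a real mapping tool would take these estimates from the architecture and the mapped netlist rather than from a fixed table.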