Application Driven Memory - Circuits and Architecture
Abstract
As smartphones and portable devices have become mainstream, the battery-life demands of end users continue to drive research into power-efficient designs. Given the limited improvement in battery technology over the years, restricting power consumption while maintaining performance continues to challenge the electronics and computer industry. In particular, computation and storage demands have far exceeded the resources available in mobile and deeply embedded devices. Moreover, as we move toward sub-10 nm technologies, large memories are becoming part of processor chips. Memory architectures therefore offer opportunities for improving the power and speed of computing systems. In this research, we build upon our work in memory circuits and architectures toward novel applications, such as machine learning, with the goal of an improved user experience.

Our initial work has focused on improved SRAM, register files, and content-addressable memories: (i) an ultra-low-voltage embedded SRAM for low-power mobile video applications, (ii) a novel clock-biased local bit-line scheme for high-performance, energy-efficient register files, (iii) a circuit/architecture co-design technique for power-efficient register files, (iv) a content-addressable memory (CAM) with emphasis on pipelining, and (v) a new CAM technique that significantly improves the performance of the CAM structure. We integrate these studies to develop reliable, low-power, high-performance memory systems, register files, and content-addressable memory designs, with a specific focus on their impact on in-memory computation architectures for different machine learning models.

We have investigated how in-memory computation architectures can improve the speed and reduce the power consumed by modern machine learning (ML) applications. The challenge for today's ML applications is that state-of-the-art ML models are complex.
Traditional ML computation requires a nontrivial set of paths to access and transfer data. Because of the large number of data accesses per calculation, the energy spent transferring data from memory is usually orders of magnitude higher than the energy spent on computation itself. Managing or preventing data movement can therefore significantly increase the speed and energy efficiency of many ML tasks. Toward this end, we have developed an energy-efficient in-memory computing kernel for an ML linear classifier. Compared with conventional discrete systems, this scheme achieves a more than sixfold power improvement while improving reliability by about 55%. A split-data-aware technique is introduced to manage process, voltage, and temperature variations and thereby address the reliability issue. A tri-modal architecture and a hierarchical tree structure further restrict power consumption. Our scheme provides a fast, energy-efficient, and competitively accurate binary classification kernel.
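To make the workload concrete, the computation that such an in-memory kernel accelerates is, at its core, a weighted sum followed by a threshold. The minimal sketch below is purely illustrative (it is not the in-memory hardware kernel itself, and the function name and toy values are assumptions); it shows the binary linear classification that the memory array would evaluate in place, avoiding the per-element data transfers a conventional processor would incur.

```python
def linear_classify(weights, features, bias=0.0):
    """Binary linear classifier: return +1 if w . x + b >= 0, else -1.

    In a conventional system, every weight and feature is fetched from
    memory to the processor; an in-memory kernel evaluates this sum
    inside the array, eliminating most of that data movement.
    """
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1

# Toy 3-feature example: score = 0.5*2.0 + (-1.0)*1.0 + 0.25*4.0 = 1.0
label = linear_classify([0.5, -1.0, 0.25], [2.0, 1.0, 4.0])  # -> +1
```

Each classification touches every weight once, which is why the energy cost is dominated by memory accesses rather than by the multiply-accumulate arithmetic.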