Analogue Solution to the Power-Hungry Field of Machine Learning

Originally published here.

Machine learning methods—although performing many cognitive tasks successfully—are very time- and power-consuming and, consequently, bad for the environment. For example, training a large neural network can emit as much carbon dioxide as five cars do in their lifetimes [1]. Given the ever-increasing use of machine learning techniques (such as neural networks), it is important to consider whether they can be made more efficient. The main roadblock to further rapid development might be the hardware platforms we currently use to implement machine learning. Alternatives, such as physically implemented neural networks, have been suggested to better match the hardware to the task.

Currently, machine learning algorithms—like most others—are implemented in digital computing systems, which store information in binary format and process it using binary logic. In such digital computers, computations are well controlled, predictable and accurate. However, even simple tasks are not trivial to realise and require a lot of space—storing an 8-bit number requires tens of transistors, while performing mathematical operations (e.g. multiplication) on such numbers requires thousands. Furthermore, conventional computers use an architecture that separates memory from computation. This separation means that large amounts of information have to be shuffled between the two modules during data-intensive tasks, which presents an enormous challenge and a bottleneck for improving power efficiency.

Using an analogue approach, as well as merging the memory and computing modules, has been suggested in the past. An analogue implementation would employ devices capable of encoding numbers on a continuous range, thus increasing the density of encoded information. It is often feared that such systems would not be accurate enough for practical purposes. Nevertheless, they could increase the speed and power efficiency of various computing operations by orders of magnitude, so it is important to consider whether they can be employed, at least in some contexts, without sacrificing accuracy.

Artificial neural networks (ANNs) seem to be one of the machine learning models that can tolerate the kind of inaccuracy associated with analogue computing. ANNs are structures consisting of two main elements: neurons and synapses (see diagram in Figure 1). Synapses scale the signals, while neurons add and transform them; as a whole, they can perform tasks like classification (more information on how ANNs operate can be found here).

Figure 1: Architecture of artificial neural networks with neurons represented using coloured circles and synapses as connections between those circles. Input (e.g. an image of a handwritten digit) is fed into the leftmost neuronal layer, after which a signal propagates to the right. A prediction (e.g. guessing what digit the image depicts) is made by the rightmost neuronal layer of the network.
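To complement Figure 1, here is a rough illustrative sketch (not code from the original post; the layer sizes and random weights are arbitrary) of this division of labour: the weight matrix plays the role of the synapses, and each neuron simply sums its weighted inputs and applies a non-linear transformation.

```python
import numpy as np

def layer(inputs, weights, biases):
    """One fully connected layer: synapses scale and combine the signals,
    neurons sum them and apply a non-linear transformation (ReLU here)."""
    z = weights @ inputs + biases
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.random(784)                                  # e.g. a flattened 28x28 image of a digit
h = layer(x, 0.01 * rng.standard_normal((100, 784)), np.zeros(100))
scores = 0.01 * rng.standard_normal((10, 100)) @ h   # one score per possible digit
prediction = int(np.argmax(scores))                  # the network's guess (untrained here)
```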

Analogue implementations of ANNs could be realised using devices called memristors, whose resistance can be easily varied—a key property of synaptic behaviour. Memristors could thus implement the synapses of neural networks: the devices would be arranged in two-dimensional structures, called crossbar arrays, that resemble the way synapses are arranged in ANNs. Most importantly, this would make it easy to perform matrix-vector multiplications, the costliest operations in these networks. In this way, ANNs could be made orders of magnitude more efficient.
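To see why a crossbar is such a natural fit, consider a highly idealised sketch (an illustration under simplifying assumptions, ignoring wire resistance and other parasitics): if each device's conductance stores a weight, then applying input voltages to the rows yields, via Ohm's law and Kirchhoff's current law, column currents equal to a matrix-vector product, computed in a single step right where the weights are stored.

```python
import numpy as np

def crossbar_mvm(conductances, voltages):
    """Output currents of an ideal crossbar: I[j] = sum_i G[i, j] * V[i]."""
    return conductances.T @ voltages

G = np.array([[1e-4, 2e-4],
              [3e-4, 1e-4],
              [2e-4, 4e-4]])      # conductances in siemens: 3 input rows x 2 output columns
V = np.array([0.10, 0.20, 0.05])  # input voltages applied to the rows
I = crossbar_mvm(G, V)            # currents read out at the columns, all at once
```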

Resistive random-access memory (RRAM) devices—a type of memristor—are among the most promising candidates for such implementations. They can be manufactured using conventional semiconductor materials and are relatively easy to integrate with current computing systems. Although ANNs can tolerate some amount of inaccuracy due to their highly parallel and interconnected nature, the more precisely one can program RRAM devices, the better the performance of an ANN will be. Thus, many groups working on RRAM are trying to optimise their various properties: dynamic range, chance of failure, programming non-linearities, current/voltage non-linearities, device-to-device variability, and others. However, there is little prioritisation of some properties over others because the way each of these non-idealities affects ANNs is not fully understood.

Our Approach

We decided to look at each of these properties (shown in Figure 2) separately and evaluate its effect on network performance. We simulated many different configurations of ANNs and then disturbed their weights to reflect the effects of these RRAM properties. These randomised disturbances were applied multiple times to obtain a reliable estimate of the average effect they have on physically implemented neural networks.

Figure 2: Properties of RRAM devices that have an influence on ANN accuracy.
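To make the procedure concrete, here is a schematic sketch of the general idea (the disturbance model and the evaluate_accuracy helper are hypothetical stand-ins, not the exact code behind [2]): perturb the trained weights to mimic a given non-ideality, measure inference accuracy, and repeat many times to average over the randomness.

```python
import numpy as np

def disturb_weights(weights, rel_std, rng):
    """Apply a random multiplicative disturbance to every weight,
    e.g. to mimic device-to-device variability."""
    return [w * rng.normal(1.0, rel_std, size=w.shape) for w in weights]

def average_disturbed_accuracy(weights, evaluate_accuracy, rel_std, repeats=100, seed=0):
    """Average inference accuracy over many random disturbances of the weights."""
    rng = np.random.default_rng(seed)
    accuracies = [evaluate_accuracy(disturb_weights(weights, rel_std, rng))
                  for _ in range(repeats)]
    return float(np.mean(accuracies)), float(np.std(accuracies))
```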

In our analysis [2], we discovered that different non-idealities have very different effects on ANN performance. We found that, in realistic scenarios, a small proportion of devices failing, and programming or current/voltage curves being non-linear, are tolerable—the decrease in inference accuracy is relatively small. The range of resistances that the devices can be set to, i.e. the dynamic range, can have a much more detrimental effect. However, this can be mitigated by employing a different mapping scheme: it is possible to represent the synaptic weights using resistances in such a way that accuracy is not affected, even if the range of those resistances is small. We found that the most important factor affecting accuracy is device-to-device variability. When the manufactured devices do not respond to the inputs in the same way, the accuracy can drop considerably. Moreover, this non-ideality is more difficult to deal with, as it cannot be avoided by simply using a different mapping or programming scheme.
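One common way of coping with a small dynamic range (sketched below as an assumption, not necessarily the exact scheme used in [2]) is to encode each signed weight as the difference between a pair of conductances confined to a narrow interval, so that the weight is proportional to G_plus - G_minus.

```python
import numpy as np

def weights_to_conductance_pairs(weights, g_min, g_max):
    """Map signed weights to (G_plus, G_minus) pairs within [g_min, g_max],
    so that each weight is proportional to G_plus - G_minus."""
    scale = (g_max - g_min) / np.max(np.abs(weights))   # siemens per unit weight
    g_plus = g_min + scale * np.maximum(weights, 0.0)
    g_minus = g_min + scale * np.maximum(-weights, 0.0)
    return g_plus, g_minus, scale

w = np.array([[0.5, -1.2], [2.0, 0.0]])
g_plus, g_minus, scale = weights_to_conductance_pairs(w, g_min=1e-5, g_max=1e-4)
w_back = (g_plus - g_minus) / scale                      # recovers the original weights
```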

Although some qualitative trends were observed, at this moment it is difficult to make generalised quantitative conclusions applicable to all RRAM devices. The nature of the non-idealities differs not only with different materials, but also with different physical dimensions of the devices. Although the effects of properties like dynamic resistance range apply to many different variants of RRAM devices, other non-idealities, such as device-to-device variability, can vary a lot between differently manufactured devices. Moreover, non-idealities can manifest themselves differently in different network architectures. This is thus just the beginning of an exploration to build a more complete picture of the non-idealities of RRAM devices and the effect they have on the neural networks they constitute.

Where are we at?

Analogue devices, such as RRAM, could help solve the problem of ANNs' high power consumption. Although several competing types of analogue devices are being considered, a focused approach to device optimisation will be key to their successful integration into mainstream computing systems. Instead of optimising device properties that are important in conventional memory technology, it will be crucial to understand what role each of them plays in the specific context of neural networks.

Despite the rapid development of RRAM technology, the discussion about the relative importance of the various non-idealities in physically implemented ANNs has been very limited. It is therefore still difficult to take a structured approach to the optimisation of these devices. Our simulation results show that, of all RRAM device non-idealities, device-to-device variability can have the largest effect. Although the nature of this and other non-idealities can differ between types of RRAM devices, the systematic approach we take provides a more comprehensive understanding of how various device properties affect ANN accuracy. We hope that this analysis will inform researchers trying to optimise RRAM devices for physical implementations of neural networks and that it will accelerate the growth of this exciting field even further.

Thanks to Dr Sunny Bains for reading the drafts of this post and for helping me organise my thoughts.

References

  1. E. Strubell, A. Ganesh, and A. McCallum, Energy and policy considerations for deep learning in NLP, 2019. [Online]. Available: http://arxiv.org/abs/1906.02243
  2. A. Mehonic, D. Joksas, W. Ng, M. Buckwell, and A. Kenyon, Simulation of inference accuracy using realistic RRAM devices, Frontiers in Neuroscience, vol. 13, p. 593, 2019. doi:10.3389/fnins.2019.00593