Motivation

In science fiction, communications networks allow characters to talk to each other over videophones, and robots navigate through and manipulate their environment based entirely on visual data. In reality, these tasks are currently impossible. The main reason is that, although algorithms for compressing or interpreting scene information are numerous and well documented, the vast amount of data present in every scene, and the speed with which it changes, quickly overwhelm any computer's processing ability.

There are two main reasons why computer-based algorithms struggle to keep up. Firstly, the data must be converted from an analogue to a digital representation. This dramatically increases the cost of transmitting the information from one part of a circuit to another. If, say, a 16-bit representation of pixel intensity is used, then either a 16-bit bus must be used to carry the data, increasing the cost of the interconnect by a factor of 16, or the data must be multiplexed onto a single line, reducing the speed by a factor of 16.
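The trade-off can be put into numbers with a short sketch. It assumes each wire carries one bit per clock cycle; the function name, the frame size and the clock rate are illustrative choices and do not come from any particular system.

```python
def transfer_options(bits_per_pixel, pixels, clock_hz):
    """Compare a parallel bus with a single multiplexed line.

    Assumption: every wire carries exactly one bit per clock cycle.
    """
    # Parallel bus: one wire per bit, one pixel transferred per clock.
    parallel = {"wires": bits_per_pixel,
                "seconds": pixels / clock_hz}
    # Serial line: one wire, one bit transferred per clock.
    serial = {"wires": 1,
              "seconds": pixels * bits_per_pixel / clock_hz}
    return parallel, serial


# Illustrative numbers only: one 640 x 480 frame, 16 bits per pixel, 10 MHz clock.
parallel, serial = transfer_options(16, 640 * 480, 10e6)
print(parallel["wires"], "wires:", parallel["seconds"], "s per frame")
print(serial["wires"], "wire:", serial["seconds"], "s per frame")
```

Whichever option is chosen, the factor of 16 appears either in the wire count or in the transfer time.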

Secondly, and more importantly, every computation must pass through a single CPU, in a serial manner. Although each computation is relatively straightforward, the bottleneck of the CPU greatly limits the size of problem that can be solved [7].

A solution to both these problems is to break with the whole notion of digital processing, and perform all computations with analogue circuitry at the point of image capture. The data can be processed in a much more "natural" form at the point of capture, utilising the natural mathematical properties of the circuit elements on silicon chips. It turns out that some operations, such as image smoothing, can be implemented by trivially simple circuitry, whereas on a computer they may require thousands of operations for each pixel. Furthermore, every pixel can have its own dedicated circuitry, all operating in parallel. Digital computational circuitry is so complicated that it cannot possibly be used in this fashion with current technology; there is no way a general purpose CPU can be affixed to every pixel in a video camera.
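To give a feel for the digital cost of smoothing, the following is a minimal sketch of a serial box-filter smoother that counts its own arithmetic operations. The box filter, the edge handling and the name `box_smooth` are assumptions made for illustration; the text does not say which smoothing operation is intended.

```python
def box_smooth(image, radius):
    """Smooth a 2-D list of intensities with a (2*radius+1)^2 box filter.

    Returns the smoothed image and the number of arithmetic operations
    (one addition per sample plus one division per pixel) performed.
    Edge pixels are handled by clamping coordinates to the image border.
    """
    height, width = len(image), len(image[0])
    window = (2 * radius + 1) ** 2
    ops = 0
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):          # every pixel is visited in turn,
        for x in range(width):       # as on a single serial processor
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
                    ops += 1
            out[y][x] = total / window
            ops += 1
    return out, ops


# An 8 x 8 test image containing a single bright pixel.
image = [[0.0] * 8 for _ in range(8)]
image[4][4] = 9.0
smoothed, ops = box_smooth(image, 1)
print("operations per pixel:", ops // 64)
```

In this sketch the work per pixel grows as the square of the window width: a radius of 1 costs 10 operations per pixel, while a radius of 15 costs 31 × 31 + 1 = 962, and every one of them is executed in sequence for every pixel.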

A useful model for this kind of system is the retina of the human eye. Much of the processing required for human vision is performed at the retina and not within the bulk of the brain. The retina has the structure shown in Figure 1.1. At the rear of the retina are the rods and cones, which are light-sensitive nerves. The signals from the rods and cones are then passed through a number of layers of processing cells that perform operations such as spatial differentiation. Although the properties of the individual cells have been investigated thoroughly [9], their function as a neural network is still very poorly understood [16]. By the time the signal has propagated to the optic nerve, the image is represented as a high-level description of the scene, which requires a far lower density of nerve fibres for transmission to the brain.

One limit on the performance of the human eye is that neural activity is slow compared with electronic systems. This has led to the proposal of the "100 step rule" [3]: it should be possible to construct a system that processes an image in fewer than 100 steps, because that is what the human eye achieves. Any digital system will obviously take far more than 100 steps. Analogue processing of the type investigated in this thesis should help in achieving a 100 step system.

**Figure 1.1:** Structure of the human retina.

The main disadvantage of analogue processing of this type is that it is far less flexible. In a digital system, extra operations can easily be added to the processing system. In an analogue system this is generally impossible. Furthermore, in a digital system, parameters are easily adjustable, whereas in an analogue system, many parameters are often fixed by the fabrication process. If they have to be adjustable, this usually requires additional circuitry, which adds to the complexity and cost of the system. Therefore, the best application for this circuitry is embedded systems that have a clearly defined, limited purpose.

Matthew Exon 2004-05-23