So far we have looked at how the Google car uses GPS and mapping to figure out where it is and where it is going, LIDAR to scan and map its environment, and RADAR to detect objects in its immediate vicinity. There are, however, a few more things the car needs to know about before it can operate safely and reliably in an environment as messy and complex as a suburban street.
Getting the Picture
To begin with, the car needs some means of determining whether it is coming up to a stop sign or a red traffic light. The stop sign could be included in the mapping data used by GPS, but the map won’t show temporary traffic signs or the constantly changing state of the traffic lights, and there is no system – yet – to tell the car whether a traffic light is red, green or yellow. The car has to figure that out for itself.
At a school crossing, or in an area where road works are in progress, the car needs to be able to tell the difference between a crossing guard holding up a sign that says “stop” (stop and wait) and one holding up a sign that says “slow” (move ahead slowly).
A camera on board the car could be used – and is used – to solve this problem. With the sophisticated high speed digital image processing available from companies such as Nvidia and AMD, the car is able to isolate pertinent parts of the image – such as traffic lights and street signs – to determine whether it is approaching a traffic light and what state the light is in.
It is interesting to note that the kind of processing required to find a traffic light in a street scene is not that much different to the kind of processing required to turn a software model of your favorite first person video game into an image on your computer monitor.
Image processing can be done by ordinary CPUs (Central Processing Units) such as those found in your computer, but CPUs are not well adapted to image processing and are a bit slow at it. Graphic card GPUs, on the other hand, are very good at the kinds of maths that are used for image processing and manipulation.
So What’s the Difference?
Let’s have a look at the way that GPUs differ from CPUs, and how they normally operate. Most image processing operations are done using a type of mathematics known as matrix transformations.
Matrix transformations work with numbers that have multiple dimensions. An example is a point in space, which can be defined with three numbers – x, y and z – known as the point’s coordinates. The three numbers aren’t separate; all three of them are needed together to define our point. So if you do something like move the point, you change all three of its coordinates at the same time.
In order to calculate where the point has moved to, a normal CPU would have to calculate the coordinates one at a time: first the x, then the y, and then the z. A GPU, on the other hand, can calculate all three coordinates at the same time because it has a lot of computation units, known as processor cores.
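To make this concrete, here is a minimal sketch in Python using NumPy – an illustration only, nothing like the car’s actual software – of moving a point with a matrix transformation. A single matrix multiplication rotates the point, and all three coordinates change together in one operation.

```python
import numpy as np

# A 3-D point given by its x, y and z coordinates.
point = np.array([1.0, 0.0, 0.0])

# A matrix that rotates a point 90 degrees around the z axis.
theta = np.pi / 2
rotate_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

# One matrix multiplication moves the point; all three
# coordinates are updated together.
moved = rotate_z @ point
print(np.round(moved, 6))  # [0. 1. 0.]
```

The same matrix can move any point, which is exactly why graphics hardware is built around this kind of calculation.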
The type of processing performed by a CPU is known as Single Instruction Single Data, or SISD, which means that it can perform one operation (such as addition) on one piece of data at a time before it goes and gets the next piece of data to work on.
By comparison, GPUs can perform the same operation on many different pieces of data simultaneously. This type of processing is known as Single Instruction Multiple Data (SIMD for short), and it is much better suited to doing the matrix transformations needed for image processing.
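The contrast can be illustrated with NumPy – purely as an analogy, since this code runs on a CPU, but NumPy’s vectorized operations mimic the SIMD idea of one instruction applied across many pieces of data at once.

```python
import numpy as np

points = np.random.rand(1000, 3)       # 1000 points, each with x, y, z
offset = np.array([1.0, 2.0, 3.0])     # how far to move every point

# SISD style: one operation on one piece of data at a time.
moved_loop = np.empty_like(points)
for i in range(len(points)):
    for j in range(3):
        moved_loop[i, j] = points[i, j] + offset[j]

# SIMD style: the same addition applied to every coordinate
# of every point in a single vectorized step.
moved_simd = points + offset

# Both approaches give identical results; the second just
# does the work in parallel rather than piece by piece.
assert np.allclose(moved_loop, moved_simd)
```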
That said, most CPUs in use today have more than one processor core and actually use Multiple Instruction Multiple Data (MIMD) processing because they are able to do different operations on different data at the same time.
As an example, a top-end Intel Core i7 Extreme processor has 10 cores, but the operating system uses those cores to do lots of very different things at once rather than putting them all on the same problem. That way, your computer can be reading and writing to the hard drive, sending email, recalculating your spreadsheet, downloading a file and playing Cookie Jam, among other things, all at the same time. In fact, your computer is doing a lot of things in the background without you being aware of it.
Compare this with the Nvidia Quadro M6000 GPU, which has an astonishing 3072 unified shader cores, or the AMD Radeon Pro WX 7100 with 2304. The large number of cores allows these GPUs to do matrix calculations on hundreds or thousands of points simultaneously. They can only do this, however, because the calculations performed by the cores are almost all the same.
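This is the kind of workload those cores excel at: the same transformation applied to thousands of points in one go. The sketch below again uses NumPy on the CPU as a stand-in for what the GPU does in hardware.

```python
import numpy as np

# Ten thousand points -- the sort of batch a GPU would
# spread across its shader cores.
points = np.random.rand(10000, 3)

# One rotation matrix, 45 degrees around the z axis.
theta = np.pi / 4
rotate_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

# Every point undergoes the identical calculation in one call --
# the "single instruction, multiple data" pattern.
moved = points @ rotate_z.T

assert moved.shape == (10000, 3)
```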
Even though it is, in theory, possible to run an operating system such as Linux on a GPU, it would not be very fast because the GPU just isn’t good at doing the kind of general purpose tasks that a CPU is. Similarly, trying to make a CPU do image processing would not work well because it isn’t good at doing lots of the same thing on a large number of similar objects, such as pixels.
The Google car, of course, is not so much interested in moving points around as it is in finding a green traffic light in a street scene. Matrix mathematics can be used to tell if a pixel (or a group of pixels) belongs to part of the background (a tree, for example), or something more interesting like a traffic light.
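A real system would use a trained detector, but a toy thresholding sketch shows the idea of classifying pixels by their color values. The tiny frame and the thresholds below are entirely made up for illustration.

```python
import numpy as np

# A tiny "camera frame": height x width x RGB, values 0-255.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1, 2] = [30, 220, 40]   # a bright green pixel (our "light")
frame[0, 0] = [200, 30, 30]   # a red pixel (background)

# Split out the color channels as signed integers.
r = frame[..., 0].astype(int)
g = frame[..., 1].astype(int)
b = frame[..., 2].astype(int)

# Call a pixel "green" if its green channel clearly dominates.
# The same test runs on every pixel of the frame at once.
green_mask = (g > 150) & (g > 2 * r) & (g > 2 * b)

print(np.argwhere(green_mask))  # [[1 2]]
```

The interesting part is that the test is applied to every pixel simultaneously – exactly the pattern a GPU’s many cores are built for.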
Of course, just knowing that the light is green is not enough. A human driver will be on the lookout for other vehicles that are about to encroach into their space – running a red light, for example – and take appropriate evasive action.
The Google car is no different. By the time the traffic light has turned green, the system has already spotted potential hazards such as other cars and pedestrians using its LIDAR and RADAR, and is tracking their movements. The car will only move on at a green light if the system is satisfied that there are no other cars, pedestrians or other objects moving into its path.
Next time, we’ll discuss how these cars know how fast to go once they do take off from that green light.