77.4 - Summer 2004
Electronic Eyes and Silicon Brains
The search to make robots that can see
By Gregory Jordan
“[This] marks the beginning of what I hope will be a long and fruitful collaboration,” began Steven Zucker, professor of computer science and biomedical engineering, “between the Departments of Computer Science and Psychology, and more specifically, mine and Dr. [Brian] Scholl’s research groups.”
The time is ripe for coordination between Yale’s vision research groups. Such interdepartmental collaboration could lead to many useful innovations, such as robots that think and see with superhuman acuity, medical diagnostic techniques that automatically map the heart and brain, and widespread, affordable bionic security devices. Vision is a vital capability for any autonomous system, and the current research at Yale shows promise of becoming the basis for advanced technologies that will shape the way our society’s electronic eyes and silicon brains perceive the world around them.
For the past 25 years, Zucker has been working on the problem of computational vision. His Computational Vision Group takes a multidisciplinary approach, drawing from mathematics, computer science, and neurobiology. Ultimately, their goal is to build a “theory of computational vision from basic principles.” Unlike the Scholl lab, which uses psychological experiments to determine how the higher-level “visual mind” works (see “Out of Sight, Not Out of Mind,” p. 20), Zucker and his team are working from the ground up, building mathematical models and algorithms that simulate the very first levels of processing in the first visual area of the brain, known as the primary visual cortex, or V1.
Levels of Abstraction
The key idea in computer vision is abstraction. Zucker’s area of expertise is discovering mathematical functions that describe the activity of cells in the visual cortex: an abstraction from biological to mathematical terms. The neurons in the V1 area of the visual cortex show what Zucker calls a “curious but beautifully structured” arrangement. Abstractly, they are stacked on top of each other into tiny columns. Using a mathematical approach, Zucker showed that one can view the columns as if they are working together to maximize a function, the output of which serves as the basis for the later visual processing in the brain.
Individually, each neuron is just acting like a local device, giving a signal output only if its multiple inputs “add up” to a certain threshold. The neuron turns either on or off like a simple switch. However, the highly complex structure of the connections between neurons allows them to perform some sophisticated mathematical operations.
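The switch-like behavior described above can be sketched in a few lines of Python. This is only a textbook-style threshold unit, not Zucker's model of V1; the function name `threshold_unit`, the per-input weights, and the numbers used are all assumptions made for illustration.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 (on) if the weighted inputs add up to the threshold, else 0 (off)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Hypothetical unit with three equally weighted inputs and a threshold of 2.
weights = [1.0, 1.0, 1.0]
print(threshold_unit([1, 0, 0], weights, threshold=2.0))  # one active input
print(threshold_unit([1, 1, 0], weights, threshold=2.0))  # two active inputs
```

With a single active input the sum stays below the threshold and the unit remains off; with two it switches on. Any sophistication comes from how many such units are wired together, which this sketch does not attempt to show.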
“Imagine a thousand people in a room,” Zucker explained, “and each of them can only see the ground that he is standing on. How should they communicate with each other to develop an understanding of the entire room’s landscape?” By talking to one another. “That’s exactly what the visual cortex is doing — our brains process millions of local solutions in order to solve global problems.” This new way of looking at V1 caused a bit of a stir when it was first published, because it was previously thought that V1 was not “smart” enough to do advanced processing. It is now known to be even more visually adept, with an ability to make inferences about boundaries, shading, color, and even basic stereo information.
Three Steps to Computer Vision
Once a secure mathematical understanding of each component of the visual system is obtained, the next step is its translation into computer programs. The first step in this endeavor consists of primary processes such as edge filtering, noise reduction, and thresholding, all of which clean up an image, saving only the information that is useful for what the computer is trying to do. The next step usually involves identification and localization of some sort, and the last step is to produce a sensible reaction to the input.
Let us say you have an image from a digital video camera and want your security system to be able to distinguish a tree branch from a burglar. First, you must turn the colorful, noisy image into something a bit more computer friendly — namely, a binary image (see Figure 1).
Figure 1. A photograph of Zucker (top) and the same photo after being binarized by a special operator called an edge detector (bottom). (Credit: Steven Zucker)
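The simplest way to turn a color image into a binary one is a plain brightness threshold. The sketch below shows that idea only; it is not the edge detector used to produce Figure 1. The image is assumed to be a nested list of (R, G, B) tuples with values from 0 to 255, and the names `to_grayscale`, `binarize`, and the cutoff of 128 are hypothetical choices.

```python
def to_grayscale(rgb_image):
    """Average the R, G, B channels of each pixel into one brightness value."""
    return [[sum(pixel) / 3 for pixel in row] for row in rgb_image]


def binarize(gray_image, cutoff=128):
    """Map each brightness value to 1 (at or above the cutoff) or 0 (below it)."""
    return [[1 if value >= cutoff else 0 for value in row] for row in gray_image]


# A tiny 2x2 "photo": white, black, a mid-tone orange, and a near-black pixel.
photo = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 100, 60), (10, 20, 30)],
]
print(binarize(to_grayscale(photo)))
```

A fixed cutoff is fragile under changing light, which is one reason real systems tend to rely on edges rather than raw brightness.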
In order to separate the edges within an image from its smooth areas, an edge filter keeps only the parts of the image that have quick changes in brightness. Picture a two-dimensional topography map of a mountain, with the height determined by the brightness at each pixel. While the entire edge-finding process is more complicated, on a basic level the computer finds the edges by keeping only the steepest slopes of the mountain.
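The mountain picture translates fairly directly into code: treat brightness as height, estimate the slope at each pixel, and keep only the steep places. The following is a minimal sketch under that assumption, using central differences and a fixed slope cutoff. It leaves out the smoothing and line-following that practical edge detectors add, and the names `gradient_magnitude`, `edge_map`, and `min_slope` are illustrative, not taken from Zucker's software.

```python
def gradient_magnitude(gray):
    """Estimate the steepness of the brightness 'terrain' at each interior pixel.

    Border pixels are left at 0 because they lack a neighbor on one side.
    """
    height, width = len(gray), len(gray[0])
    slopes = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = (gray[y][x + 1] - gray[y][x - 1]) / 2  # east-west slope
            dy = (gray[y + 1][x] - gray[y - 1][x]) / 2  # north-south slope
            slopes[y][x] = (dx * dx + dy * dy) ** 0.5
    return slopes


def edge_map(gray, min_slope):
    """Keep only pixels whose slope is at least min_slope (1 = edge, 0 = smooth)."""
    return [[1 if s >= min_slope else 0 for s in row]
            for row in gradient_magnitude(gray)]


# A dark region on the left meeting a bright region on the right.
image = [[0, 0, 100, 100, 100] for _ in range(4)]
for row in edge_map(image, min_slope=25):
    print(row)
```

In this toy image the only steep slope is the vertical boundary between the dark and bright halves, so that is the only place the interior rows are marked.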
Next, the object might be identified by comparing it to a database of known objects. The computer matches the object to the database entry that has the highest correlative response, and it now has a representation of where the object is, how fast it is moving, and what it is.
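Matching against a database by "highest correlative response" might look something like the sketch below, which scores an observed shape against each stored template with a Pearson correlation and returns the best-scoring label. It covers only the "what it is" part, not position or speed. The tiny 3-by-4 silhouettes, the labels, and the function names `correlation` and `best_match` are invented for illustration.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat pattern carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(observed, database):
    """Return the label whose template correlates most strongly with the observation."""
    return max(database, key=lambda label: correlation(observed, database[label]))


# Hypothetical database of flattened 3x4 binary silhouettes.
database = {
    "human":       [0, 1, 0,  1, 1, 1,  0, 1, 0,  1, 0, 1],
    "tree branch": [1, 0, 0,  0, 1, 0,  0, 1, 1,  0, 0, 1],
}
observed = [0, 1, 0,  1, 1, 1,  0, 1, 0,  1, 0, 0]  # a human shape with one pixel missing
print(best_match(observed, database))
```

Here the observation correlates positively with the "human" template and negatively with the "tree branch" template, so the first label is chosen.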
Suppose the computer identified the object as a human — a potential burglar. The final step, effecting a response, might be turning on a light that gets brighter as the intruder gets closer, or an alarm that gets triggered once he gets within a certain distance of the house. The possibilities are limitless, and this is part of the fun — and difficulty — of computer vision.
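The response step described above is easy to sketch once a label and a distance are available. The version below assumes a light that brightens linearly as the intruder approaches and an alarm with a fixed radius. The 20-meter range, the 5-meter radius, and the function names are hypothetical values chosen only to make the example concrete.

```python
def light_brightness(distance_m, max_range_m=20.0):
    """Brightness from 0.0 (off) to 1.0 (full), rising as the intruder gets closer."""
    if distance_m >= max_range_m:
        return 0.0
    return 1.0 - max(distance_m, 0.0) / max_range_m


def alarm_triggered(distance_m, alarm_radius_m=5.0):
    """The alarm goes off once the intruder is within the alarm radius."""
    return distance_m <= alarm_radius_m


def respond(label, distance_m):
    """Choose a reaction: ignore anything that is not identified as a human."""
    if label != "human":
        return {"light": 0.0, "alarm": False}
    return {
        "light": light_brightness(distance_m),
        "alarm": alarm_triggered(distance_m),
    }


print(respond("tree branch", 3.0))
print(respond("human", 10.0))
print(respond("human", 5.0))
```

A swaying branch three meters away produces no reaction, while a person at ten meters gets a half-bright light and, at five meters, the alarm as well.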
Figure 2. The colorful output from the advanced edge operator illustrates a beautiful complexity contained within. Bright lines, dark lines, and edges must be identified and separated in a logical manner. (Credit: Steven Zucker)
One well-established application of computational vision is medical imaging. Back when computed tomography (CT) was a fledgling technology, Zucker created the first software to compute the tangent planes to surfaces from CT scans. Today, he still works closely with the Yale Image Processing and Analysis Group (IPAG) at the Medical School. Gang Li GRD ’06 recently published a technique for deriving the three-dimensional structure of branched trees from a stereo image (picture how a monkey might process the stereo image while navigating through the trees in a forest). As it turns out, brain surgeons face a very similar visual problem when navigating the blood vessels of patients’ brains; a potential solution could make use of Li’s new approach, greatly increasing the utility of computer vision in surgical preparation.
Robotics is probably the most promising prospect for the Computational Vision Group. Brian Scassellati, assistant professor of computer science, is working on a sociable humanoid robot whose current visual processing ability could be greatly enhanced by computational vision. After all, social communication between humans occurs as often through sight as through sound. In order for our robots to interact with people, they must first learn to see.
Some say that human sensation and perception define human experience. If this is true, then to create humanoid robots, we must equip them with the ability to see, think, and feel. By bringing together Zucker’s computations, Scassellati’s social robots, and Scholl’s cognitive research, a Titan of a robot could be in the future of Yale Computer Science. Three worlds are slowly but surely converging into one, and the focal point is unavoidably vision.
Copyright 2013 Yale Scientific Publications, Inc.