MIT researchers develop technique for driverless cars and robots to spot objects amid clutter

Researchers at MIT say that they have developed a technique that allows robots to quickly identify objects hidden in a three-dimensional cloud of data.

According to the researchers, sensors that collect and translate a visual scene into a matrix of dots help robots “see” their environment. Conventional techniques for picking objects out of such clouds of dots, or point clouds, can offer either speed or accuracy, the researchers note, but not both.

With the new technique developed by the MIT researchers, a robot needs just seconds from receiving the visual data to accurately pick out an object, such as a small animal, that is otherwise obscured within a dense cloud of dots. The researchers say the technique could improve a range of applications in which machine perception must be both fast and accurate, including driverless cars and robotic assistants in the factory and the home.

“The surprising thing about this work is, if I ask you to find a bunny in this cloud of thousands of points, there’s no way you could do that,” says Luca Carlone, assistant professor of aeronautics and astronautics and a member of MIT’s Laboratory for Information and Decision Systems (LIDS).

“But our algorithm is able to see the object through all this clutter. So we’re getting to a level of superhuman performance in localizing objects.”

Currently, robots attempt to identify objects in a point cloud by comparing a template object, i.e. a 3-D dot representation of an object such as a rabbit, with a point cloud representation of the real world that may contain that object.

The template includes collections of dots, known as features, that indicate characteristic curvatures or angles of the object, such as the bunny’s ear or tail. Existing algorithms first extract similar features from the real-life point cloud, then attempt to match those features to the template’s features, and ultimately rotate and align the matched features to determine whether the point cloud contains the object in question.
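The matching step can be illustrated with a minimal sketch. The descriptors, data, and nearest-neighbor rule below are illustrative assumptions, not the researchers’ pipeline; real systems compute much richer geometric descriptors before matching.

```python
import numpy as np

def match_features(template_desc, scene_desc):
    """Pair each template feature descriptor with its nearest
    scene descriptor (a simple nearest-neighbor matcher).

    template_desc: (m, d) array of template feature descriptors
    scene_desc:    (n, d) array of scene feature descriptors
    Returns a list of (template_index, scene_index) pairs.
    """
    # Pairwise Euclidean distances between all descriptor pairs
    dists = np.linalg.norm(
        template_desc[:, None, :] - scene_desc[None, :, :], axis=2
    )
    return [(i, int(np.argmin(dists[i]))) for i in range(len(template_desc))]

# Toy example: 3 template descriptors, 4 scene descriptors
template = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
scene = np.array([[0.9, 0.1], [0.1, 0.05], [5.0, 5.0], [0.05, 1.1]])
print(match_features(template, scene))  # → [(0, 1), (1, 0), (2, 3)]
```

Note that this greedy matcher happily pairs every template feature with *something*, which is exactly how the wrong associations described below arise.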

The point cloud data streaming into a robot’s sensor contains errors, though: dots may be in the wrong position or incorrectly spaced, which can confuse feature extraction and matching. As a result, robots make many wrong associations, which researchers call “outliers,” between point clouds, ultimately misidentifying objects or missing them entirely.

According to Carlone, state-of-the-art algorithms can separate the bad associations from the good once features have been matched, but doing so can take an “exponential” amount of time. While accurate, such techniques are “impractical” for analyzing larger, real-life datasets containing dense point clouds, the researchers say.

According to the researchers, other algorithms that quickly identify features and associations “do so hastily,” creating many outliers or misdetections in the process, without being aware of these errors.

“That’s terrible if this is running on a self-driving car, or any safety-critical application,” Carlone says. “Failing without knowing you’re failing is the worst thing an algorithm can do.”

With this in mind, Carlone and graduate student Heng Yang have developed a technique that “prunes away” outliers in “polynomial time,” which means that it can do so quickly, even for increasingly dense clouds of dots. As a result, the technique can quickly and accurately identify objects hidden in cluttered scenes.

The researchers first used conventional techniques to extract features of a template object from a point cloud. They then developed a three-step process to match the size, position, and orientation of the object in the point cloud with the template object, while simultaneously separating good feature associations from bad ones.

To prune outliers and match an object’s size and position, researchers developed an “adaptive voting scheme” algorithm. For size, the algorithm makes associations between template and point cloud features, and then compares the relative distance between features in a template and corresponding features in the point cloud. If, for instance, the distance between two features in the point cloud is five times that of the corresponding points in the template, the algorithm assigns a “vote” to the hypothesis that the object is five times larger than the template object.

The algorithm does this for every feature association, then keeps the associations that fall under the size hypothesis with the most votes as the correct ones, pruning away the rest.

“In this way, the technique simultaneously reveals the correct associations and the relative size of the object represented by those associations. The same process is used to determine the object’s position,” the researchers explain.
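The article does not give the voting scheme’s implementation details, so the following is a rough sketch of the scale case under stated assumptions: each pair of matched features votes for a quantized scale hypothesis via its distance ratio, and the winning bin yields both the scale estimate and the surviving inlier associations. The function name, binning, and toy data are all illustrative, not the authors’ code.

```python
import numpy as np
from collections import Counter

def vote_scale(template_pts, scene_pts, matches, bin_width=0.1):
    """Estimate object scale by voting on pairwise distance ratios.

    matches: list of (template_index, scene_index) associations.
    Returns (estimated_scale, inlier_associations).
    """
    votes = Counter()
    by_bin = {}
    for i in range(len(matches)):
        for j in range(i + 1, len(matches)):
            (ti, si), (tj, sj) = matches[i], matches[j]
            d_t = np.linalg.norm(template_pts[ti] - template_pts[tj])
            d_s = np.linalg.norm(scene_pts[si] - scene_pts[sj])
            if d_t < 1e-9:
                continue  # degenerate pair, no ratio to vote with
            # Quantize the distance ratio into a scale-hypothesis bin
            b = int(round(float(d_s / d_t) / bin_width))
            votes[b] += 1
            by_bin.setdefault(b, set()).update([matches[i], matches[j]])
    best = votes.most_common(1)[0][0]  # hypothesis with the most votes
    return best * bin_width, sorted(by_bin[best])

# Toy scene: the template scaled by 2, plus one outlier association
template_pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
scene_pts = np.array([[0.0, 0, 0], [2, 0, 0], [0, 2, 0], [9, 9, 9]])
matches = [(0, 0), (1, 1), (2, 2), (3, 3)]  # (3, 3) is the outlier

scale, inliers = vote_scale(template_pts, scene_pts, matches)
print(scale, inliers)  # → 2.0 [(0, 0), (1, 1), (2, 2)]
```

The outlier association (3, 3) produces ratios inconsistent with the majority, so it never joins the winning bin and is pruned automatically.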

For rotation, the researchers developed a separate algorithm, which finds the orientation of the template object in three-dimensional space.

The researchers say this “is an incredibly tricky computational task.”

“Imagine holding a mug and trying to tilt it just so, to match a blurry image of something that might be that same mug,” the researchers say. “There are any number of angles you could tilt that mug, and each of those angles has a certain likelihood of matching the blurry image.”

To handle this problem, current techniques assign each possible tilt or rotation of the object a “cost”: the lower the cost, the more likely that rotation produces an accurate match between features. Each rotation and its associated cost can be pictured as a topographic map of sorts, consisting of multiple hills and valleys, with lower elevations corresponding to lower costs.
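The cost-landscape idea can be made concrete with a small sketch. This is not the researchers’ method; it simply evaluates the alignment cost over a grid of rotations about a single axis, so that the “valley” sits at the true rotation angle.

```python
import numpy as np

def rotation_cost(angle, template, scene):
    """Sum of squared point distances after rotating the template by
    `angle` about the z-axis -- one point on the cost landscape."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return float(np.sum((template @ R.T - scene) ** 2))

# Scene is the template rotated 60 degrees about the z-axis
template = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1]])
c, s = np.cos(np.radians(60)), np.sin(np.radians(60))
scene = template @ np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]).T

# Sample the landscape in 1-degree steps and find its lowest point
angles = np.linspace(0, 2 * np.pi, 361)
costs = [rotation_cost(a, template, scene) for a in angles]
best = angles[int(np.argmin(costs))]
print(round(float(np.degrees(best))))  # → 60
```

With a full three-dimensional rotation the landscape has three degrees of freedom and, in noisy data, many local valleys, which is exactly the confusion described next.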

According to Carlone, though, this can easily confuse an algorithm, especially if there are a number of different valleys and no discernible lowest point representing the “true, exact match between a particular rotation of an object and the object in a point cloud.” So the MIT team has developed a “convex relaxation” algorithm that simplifies the topographic map, with one single valley representing the optimal rotation.

“In this way, the algorithm is able to quickly identify the rotation that defines the orientation of the object in the point cloud,” the researchers say.
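The paper’s convex relaxation itself involves machinery beyond a short sketch, but the flavor of “one valley, solved directly” can be shown with the classic closed-form SVD solution (the Kabsch algorithm) for the best-fit rotation once outlier associations have been pruned. This is a standard textbook technique used here for illustration, not the authors’ algorithm.

```python
import numpy as np

def best_rotation(template_pts, scene_pts):
    """Closed-form least-squares rotation (Kabsch algorithm).

    Finds the rotation R minimizing sum ||R @ t_i - s_i||^2
    over centered, matched point sets -- no landscape search needed."""
    t = template_pts - template_pts.mean(axis=0)
    s = scene_pts - scene_pts.mean(axis=0)
    H = t.T @ s                              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# Toy check: rotate a template 90 degrees about z, recover the rotation
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
template = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)
scene = template @ Rz.T
R = best_rotation(template, scene)
print(np.allclose(R, Rz))  # → True
```

The closed form works only after the bad associations are gone; the point of the pruning and relaxation steps above is to get the data into that state reliably.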

With its approach, the team says that it was able to quickly and accurately identify three different objects hidden in point clouds of increasing density. The team was also able to identify objects in real-life scenes, including a living room, in which the algorithm was able to quickly spot a cereal box and a baseball hat.

Because the approach runs in “polynomial time,” it can be scaled up to analyze even denser point clouds, approaching the complexity of the sensor data from a driverless car, for example.

“Navigation, collaborative manufacturing, domestic robots, search and rescue, and self-driving cars is where we hope to make an impact,” Carlone says.

Photo below: Robots currently attempt to identify objects in a point cloud by comparing a template object — a 3-D dot representation of an object, such as a rabbit — with a point cloud representation of the real world that may contain that object. Image: Christine Daniloff, MIT