How Uber AI Labs used filters to fix a ConvNet deficiency

Can ConvNets learn new tricks? An Uber AI Labs research scientist unpacks how her team identified and fixed a major deficiency in its convolutional neural networks.

Convolutional neural networks can perform impressive feats of image processing, but they can also get a little lost, as scientists from Uber AI Labs recently discovered.

The scientists were trying to generate images of objects in motion, but there were problems with the output, and when they drilled down they found that the convolutional neural network (CNN) could not reliably mark a single pixel on a two-dimensional field.

"ConvNets really don't know how to paint a pixel," Rosanne Liu, a research scientist at Uber told an audience at the Global Artificial Intelligence Conference in Boston on Thursday.

Even though ConvNets can create realistic images and outline objects more deftly than the average person, they kept missing the correct spot. A person would have little difficulty with the same task of making a mark at a particular point on a grid, according to Liu.
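To make the task concrete, the sketch below sets up the kind of toy problem Liu describes: the input is a single coordinate, and the target is a canvas that is blank except for that one pixel. The 64-by-64 canvas size and the function name are illustrative assumptions rather than details from the Uber AI Labs experiments.

import numpy as np

def make_single_pixel_example(row, col, size=64):
    # One training pair for the toy task: the input is a (row, col)
    # coordinate, and the target is a size x size canvas that is zero
    # everywhere except the one pixel the network must "paint".
    target = np.zeros((size, size), dtype=np.float32)
    target[row, col] = 1.0
    coords = np.array([row, col], dtype=np.float32)
    return coords, target

# Ask for the pixel at row 10, column 37 on a 64 x 64 canvas.
coords, target = make_single_pixel_example(10, 37)
print(coords, target.sum(), target[10, 37])  # [10. 37.] 1.0 1.0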

Rosanne Liu, a research scientist at Uber AI Labs, talks about how her team fixed a ConvNet's deficiency at the Global Artificial Intelligence Conference.

"Some really simple tasks like this one is where it breaks and that's surprising," Liu said after her talk. She said, "We were trying to do something else, and then we felt like this should be an easy thing to do, and somehow it wasn't able to do it. So we reduced the problem little by little until we reached that task."

Uber is one of several technology companies experimenting with autonomous vehicles that might one day become a primary means of transportation, and ConvNets now represent a key aspect of how self-driving cars see the streetscape around them. If developers succeed in supplanting human-operated automobiles, that would have major implications for supply chains, mobility and the look and feel of cities built around the car.

After identifying the problem, the Uber scientists set about solving it, and in the process they enhanced the ability of the ConvNets to perform a number of other tasks, according to Liu.

The scientists added two filters with coordinates that allow the nodes, or "artificial neurons," in the network to know where they fit in the bigger picture. Without them, each one behaved like a traveler who had zoomed in on a map of an intersection without knowing whether the crossroads was in Dallas or Detroit or somewhere in between, according to Liu.

"[The filters provide] a way of zooming out and telling you, 'Actually in this big map you are in this northeast corner of the states,'" Liu said.

ConvNet evolves to CoordConv

Learn more about "An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution" at the Uber AI Labs website, where the scientists break down the problem and solution in detail.

With the two filters added, the new system -- dubbed CoordConv -- nailed the single-pixel task with a fraction of the training the less accurate ConvNet required, according to Liu. CoordConv could also detect objects on a canvas more quickly, and it generated images of moving objects that looked smoother, without the fuzz and other imperfections produced when the filters were absent. CoordConv could paint a moving image of a room's interior without the glitchy appearance of objects fading in and out of the frame that occurred without the filters, according to Liu's presentation.

The new approach, however, was only marginally better at classifying images, and it had mixed success when applied to playing old Atari games. CoordConv excelled at Ms. Pac-Man -- where navigating the maze determines success -- but it was slightly worse than the CNN at Asteroids -- where a spaceship blasts rocks coming at it from all directions.


With additional coordinate filters, ConvNets could also gain a better understanding of three-dimensional spaces, according to Liu.

Indeed, ConvNets have a number of applications, but one obvious one for the ride-hailing company that sponsored the research is autonomous vehicles. As with human drivers, autonomous vehicles must know who and what is around them to operate safely.

"Part of it is vision -- to look at the road and figure out this is the road; this is a pedestrian. For that part, definitely, [convolutional neural networks] are being used, at least for now," Liu said after her talk.

Neural networks in general are a "black box," according to Liu and other attendees, who said the particular machinations at work in the networks remain a mystery. The vastness of neural networks adds to their inscrutability, according to Liu.

Identifying limitations within ConvNets and potential solutions to them is a positive development, according to Nikhil Dighe, who attended the talk and works in risk management for a financial institution.

"There is a typical thing in the industry where, when enough people say something works, people start blindly using it, but over a bigger problem we find that it doesn't work," Dighe said after the talk, noting that the "tweaks" seem to have fixed ConvNets' deficiency in placing a single pixel. Not understanding how the technology works can put a business at risk. "If you don't really understand the basis of your decision, can you really trust it?"

Liu had a similar take on the importance of the research.

"To be able to use anything, you should understand that thing, and the key thing to understanding anything is to understand its failures -- not only just the good parts but also where it fails," she said.

The Uber AI Labs paper on CoordConv will be presented at the Neural Information Processing Systems conference in December, Liu said.
