Neurala claims 'lifelong deep neural nets' don't forget
Boston startup Neurala says it has developed deep neural networks that can learn on the fly. Neurala's COO Heather Ames explains.
Can deep learning be done at the edge of the network, in real time, without a team of data scientists in attendance? That's the promise of Boston-based startup Neurala Inc. and its twist on deep learning, a technology it has dubbed lifelong deep neural networks, or L-DNNs.
L-DNNs are designed to overcome the "catastrophic forgetting" problem encountered with traditional deep neural nets, a technology that uses a hierarchy of algorithms and layers of processing to produce an outcome.
Deep neural nets learn sequentially. To teach a deep neural net to recognize a new object, data scientists have to start the entire training process over, which requires time and computational power, typically in the cloud. L-DNNs, according to their inventors, learn incrementally: they hold on to what they've learned while adding new knowledge on the fly -- hence the "L" for "lifelong."
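Neurala has not published the details of its algorithm, but one common way to get this kind of incremental behavior -- and a useful mental model for readers -- is to keep a pretrained feature extractor frozen and represent each object class as a prototype vector that can be updated one example at a time. The sketch below illustrates that generic idea in Python; it is a stand-in under those assumptions, not Neurala's actual method.

```python
# A minimal sketch of ONE generic way to add classes without retraining
# (Neurala's actual algorithm is proprietary and unpublished): keep a
# pretrained feature extractor frozen and store a running-mean "prototype"
# per class. Adding knowledge just updates one prototype, so nothing
# already learned is overwritten.
import numpy as np

class PrototypeClassifier:
    def __init__(self):
        self.prototypes = {}   # label -> running-mean feature vector
        self.counts = {}       # label -> number of examples folded in

    def learn(self, label, feature):
        """Fold one feature vector into the class's running mean."""
        n = self.counts.get(label, 0)
        proto = self.prototypes.get(label, np.zeros_like(feature))
        self.prototypes[label] = (proto * n + feature) / (n + 1)
        self.counts[label] = n + 1

    def predict(self, feature):
        """Return the label whose prototype is most similar (cosine)."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        return max(self.prototypes, key=lambda lbl: cos(self.prototypes[lbl], feature))
```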
Neurala COO Heather Ames recently sat down with SearchCIO to talk about L-DNNs and use cases. In late 2016, Neurala announced a $14 million Series A round of funding, enabling the startup to move operations from the Boston University campus to the city's Seaport District and hire an additional 20 employees. Recently named one of the top 100 AI companies by Fortune, Neurala will "show people what we've got" this year, Ames said.
Explain for our CIO readers how lifelong deep neural nets differ from traditional DNNs.
Heather Ames: Lifelong DNNs overcome some of the issues you find in traditional deep neural nets. What they allow you to do is add knowledge on the fly, in real time. L-DNNs can update immediately, without any retraining of the system. You don't have to go back and retrain on everything you've already learned; the system keeps that.
The other thing, which we're still working to quantify but which looks promising, is the amount of data needed to [initially] train the system. When you train a DNN, you're looking at tens to hundreds of thousands of images per object type just to get a decent amount of generalization or use from it. We're looking to drop that down significantly in the L-DNN case.
Let's say you trained up a DNN on people and dogs. You port it onto your cellphone and walk around taking video of the people and dogs near you. It does a pretty good job of catching most people, but when it comes to dogs, it misses black Labs because, perhaps, the data set you used to train it didn't include any black Labs.
But there are black Labs everywhere where you are. What a lifelong DNN allows you to do is train on that black Lab right then and there. You aim the device at the dog, put it into your video feed and tell the system, hey, this is also a dog. You have a [user interface] to outline the dog and label it, and now your system can automatically add that new information to its understanding of dogs.
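In terms of the prototype sketch above, that on-the-spot correction amounts to folding one new feature vector into the existing "dog" class, with no retraining pass over the original data set. Again, this is a hypothetical illustration, not Neurala's implementation; the random vectors merely stand in for features from a frozen, pretrained backbone so the snippet executes:

```python
# Continuing the sketch above: the black Lab correction is one call, not a
# retraining run. Random vectors stand in for features from a frozen,
# pretrained backbone, purely so the snippet executes.
import numpy as np

rng = np.random.default_rng(0)
clf = PrototypeClassifier()
clf.learn("person", rng.random(512))   # initial training, heavily simplified
clf.learn("dog", rng.random(512))

black_lab = rng.random(512)            # feature for the user-outlined black Lab
clf.learn("dog", black_lab)            # immediate update; "person" is untouched
print(clf.predict(black_lab))          # typically 'dog' with these stand-ins
```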
Neurala Inc. at a glance
- A deep learning startup
- Co-founded in 2006 by three Ph.D. students at Boston University
- Cut its teeth on a project for NASA and space exploration
Can you elaborate on "object type"?
Ames: Say you have a Photoshop application and you want to be able to find people in your pictures. You need to initially train the system on what people are, because it doesn't know that out of the box. In order to do that, you need between 10,000 and 100,000 images of people that show where the people are in the image -- so they're tagged as people in that image -- and you need to feed that into the system.
The next step is to run the training: you build hundreds if not thousands of different DNNs on that data, re-presenting the data over and over. So it's not just one presentation of that data; it could be 500 presentations of the data, 1,000 presentations of the data. That training can take days to get the system to a place where it can perform.
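For readers who haven't seen it, the retraining cost Ames describes comes from the standard supervised training loop, in which the entire labeled data set is re-presented for many epochs and every weight in the network can shift. A generic PyTorch-style sketch (not Neurala's code; the model and data loader are placeholders):

```python
# A generic PyTorch-style supervised training loop (not Neurala's code;
# model and loader are placeholders). Every epoch re-presents the ENTIRE
# labeled data set, and adding a new object type means rerunning this
# whole loop from scratch.
import torch
import torch.nn as nn

def train(model, loader, epochs=500):       # "500 presentations of the data"
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:       # every pass revisits all the data
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()                 # gradients flow through the whole net
            opt.step()                      # all weights shift; old classes can drift
```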
With L-DNN technology, the system doesn't need that much data to perform, because it does a much more efficient job of learning.
How so?
Ames: It's a matter of the algorithm that we have underneath. It's a proprietary algorithm that essentially allows the addition of information without losing things the system has already learned. When you run these large-scale DNNs, you're doing so many computations to try to find an optimal solution; that's why it takes a long time. [Our] system is able to robustly learn really quickly.
Where is this type of technology being applied?
Ames: We are looking at things like drone inspection. These are cases where people may be flying a drone around a piece of infrastructure, looking for damage. A human operator may see a kind of damage the system hasn't seen before. They've trained the system to look for rust, but they come to a new location and find that a flock of geese has been hanging out, and now there are geese droppings all over that piece of infrastructure. The system needs to learn that right then and there. So the operator could do that and send the drone back up without having to go back to headquarters to retrain the system on this new kind of damage.
It also could be used in situations where you have a set of labeled objects -- maybe it's a supply chain question or a retail question -- such as food, plates and cups, and there's an object your system doesn't understand. L-DNNs can highlight unknown objects and allow a human operator to name them in real time. So if you have a restaurant tray with labeled objects and a kid leaves his toy behind -- this isn't a use case we're doing, I'm just giving you an example -- our system would tag that as unknown, because it's never seen it before, and a human operator could label it "toy," so the system would learn what a toy is.
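One plausible way to implement that flag-then-label flow, continuing the earlier prototype sketch (again an assumption on our part, not Neurala's disclosed design): if an object's features don't sit close enough to any known class, return "unknown" and let a human supply the label.

```python
# Reusing the prototype sketch: if an object's features aren't close enough
# to ANY known class, surface it as "unknown" for a human to name. The
# threshold value is an assumption a real system would have to tune.
import numpy as np

UNKNOWN_THRESHOLD = 0.9

def classify_or_flag(clf, feature):
    """Return a known label, or 'unknown' so an operator can label it."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    best = max(clf.prototypes, key=lambda lbl: cos(clf.prototypes[lbl], feature))
    if cos(clf.prototypes[best], feature) < UNKNOWN_THRESHOLD:
        return "unknown"                    # e.g., the toy left on the tray
    return best

# Once the operator names the object, one call adds the class on the spot:
# clf.learn("toy", toy_feature)
```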