
Wayfair takes a dip into NLP image processing technology

At Wayfair, using computer vision and NLP to understand the meaning behind images and searches is the key to product recommendations, customer satisfaction and easy substitutability.

Boston-based Wayfair LLC is in pursuit of perfecting image processing techniques. As an e-commerce furniture site, Wayfair depends on recognizing searcher intent, product substitutability and likelihood to purchase. By combining computer vision, deep learning and recommendation engines, the data science team is trying to understand, at a deeper level, consumers' aesthetic and financial motivations for purchase.

Dan Wulin, head of data science and machine learning at Wayfair, says his team's road to NLP image processing (adding a deeper level of machine understanding of text to visual search tools) begins with layering open source computer vision software on top of three data sets, then taking advantage of the ways these technologies overlap to achieve complex image understanding. In this interview, Wulin explains how it all works.

What made you choose to utilize NLP with your computer vision technologies?

Dan Wulin: From a business side, why we care about computer vision has to do with our vertical. We're super visually oriented. Folks come in with some idea of what they are looking to buy, but the challenge is how to help them articulate that as a customer. Most people aren't going to know about different styles of furniture, what they're called or the different adjectives to use. Anything that we can do to bridge the gap in an intuitive way -- where people can filter through the catalog and browse in a visually oriented way -- helps us fundamentally as a business.


We have features that just aren't necessarily observable [in photos]. There are questions around material, where it may be hard to tell from the picture whether it's real wood or artificial wood. Price point is another great example of where [NLP works]. If you know somebody is shopping for really expensive things that are going to last a long time, you want to make sure you're aware of that, compared to somebody who's shopping for something less permanent, who just wants it to look good and doesn't need it to last forever.

We started with focusing on our visual search -- given a product, what are other products that look like it? For our visual search, users can take a photo or upload one, and then we find similar looking things in our catalog. There's an engine behind that that doesn't really know about the user experience, it just can find other things similar to an image it's been given. It took us the better part of a year to get from proof of concept to something that works. Since then, we've been able to iterate off of that and apply similar technology throughout the website.
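Wulin doesn't describe the internals of that engine. A common way to build this kind of "find things that look like this" search is to convert every image into an embedding vector (for example, with a pretrained network such as Inception-ResNet) and return the catalog items whose vectors point in the most similar direction as the query's. The minimal Python sketch below assumes that approach; the product names and the tiny four-dimensional embeddings are made up for illustration, and real embeddings would come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def visual_search(query_embedding, catalog, k=3):
    """Return the IDs of the k catalog products whose embeddings best match the query."""
    scored = [(cosine_similarity(query_embedding, emb), pid) for pid, emb in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


# Hypothetical toy embeddings; a real system would compute these from product photos.
catalog = {
    "oak_armchair": [0.9, 0.1, 0.0, 0.2],
    "walnut_armchair": [0.8, 0.2, 0.1, 0.3],
    "glass_coffee_table": [0.0, 0.9, 0.8, 0.1],
    "velvet_sofa": [0.1, 0.2, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05, 0.25]  # embedding of the shopper's uploaded photo
print(visual_search(query, catalog, k=2))  # ['oak_armchair', 'walnut_armchair']
```

At catalog scale, the linear scan would typically be replaced by an approximate nearest-neighbor index, but the ranking idea is the same.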

You're using NLP image processing on the customer-facing side, but how are you using it on the back end?

Wulin: In our NLP recommendation-oriented project, we use the visual information of what people are browsing to try to find things that have a similar aesthetic without user input. Let's say you're browsing chairs, and then you shift over to sofas. We're able to use that visual information from your chair browsing to inform what we're showing you for sofas. Hopefully, we're removing friction from the user, since we're not even asking them to give us this visual information -- it ends up being a pretty natural experience for them.
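One simple way to carry a shopper's taste from chairs over to sofas, assuming every product already has an image embedding, is to average the embeddings of what they browsed into a "style profile" and rank products in the new category by similarity to it. This is a sketch of that general idea, not Wayfair's actual method; the embeddings and product names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def style_profile(browsed_embeddings):
    """Average the embeddings of products a shopper viewed into one taste vector."""
    n = len(browsed_embeddings)
    dim = len(browsed_embeddings[0])
    return [sum(e[i] for e in browsed_embeddings) / n for i in range(dim)]


def rank_by_style(profile, candidates):
    """Order candidate product IDs by visual similarity to the shopper's profile."""
    return sorted(candidates, key=lambda pid: cosine_similarity(profile, candidates[pid]), reverse=True)


# Hypothetical embeddings of two chairs the shopper looked at (no explicit input from them).
chairs_viewed = [[0.9, 0.1, 0.3], [0.7, 0.2, 0.4]]
sofas = {
    "mid_century_sofa": [0.8, 0.1, 0.35],
    "chesterfield_sofa": [0.1, 0.9, 0.2],
    "modular_sectional": [0.3, 0.3, 0.9],
}
profile = style_profile(chairs_viewed)
print(rank_by_style(profile, sofas))
```

Averaging is the simplest choice; a production system might weight recent views more heavily or learn the profile directly.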

At its core, we're using NLP and vision technology as a tool to help merchandise our catalog. One example is training [our technology to] guess which products we think can be successful. Wayfair has dozens of exclusive brands that folks here carefully curate, using our algorithms, to have a certain kind of visual [aesthetic] and price point.

We're constantly adding new products to the catalog, and it's important to figure out the ones that fit within the style that we're selling, and that we think can be successful. So we use NLP visual information for new products to try to guess whether or not we think it can be a winner long term, and to inform how much to invest in it.
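Wulin doesn't say which model scores new products. As one illustrative stand-in, a logistic regression can map a product's features (here, two hypothetical visual-style features plus a normalized price) to an estimated probability of becoming a long-term "winner," trained on past products with known outcomes. Everything in this sketch, including the features, labels and learning settings, is an assumption for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights and bias with batch gradient descent."""
    dim = len(features[0])
    w, b, n = [0.0] * dim, 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Prediction error: predicted probability minus the actual outcome (0 or 1).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_success(w, b, x):
    """Estimated probability that a product with features x becomes a winner."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical history: [style feature 1, style feature 2, normalized price] per product,
# labeled 1 if the product became a long-term winner and 0 otherwise.
past_products = [
    [0.9, 0.1, 0.4], [0.8, 0.2, 0.5], [0.85, 0.15, 0.3],
    [0.1, 0.9, 0.4], [0.2, 0.8, 0.6], [0.15, 0.7, 0.5],
]
outcomes = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(past_products, outcomes)
new_product = [0.88, 0.12, 0.45]
print(f"Estimated success probability: {predict_success(w, b, new_product):.2f}")
```

The resulting probability could then feed the "how much to invest" decision Wulin mentions, for example by ranking incoming products.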

What specific vendors, platforms and technology are you using?

Wulin: We pay a lot of attention to what's required of the use case. From a computer vision and visual search perspective, we tend to start by using a lot of open source packages and libraries. Typically, what we're doing is layering on top of that, using Wayfair's proprietary image data.

We use something called Inception-ResNet, which is publicly available and trained on generally available images. Then, Wayfair gets a lot of images from suppliers and different users, and we're able to render a lot of images from different 3D models of furniture. Using those three sources, we get a lot of data that's not publicly available. But we start from the publicly available model, and then we're able to rapidly improve upon it by layering in home furnishing information.

What's the most exciting current project that you're working on?

Wulin: One that's tied to NLP and computer vision is understanding product substitutability. We have a really big catalog, and we don't want to expose our customers to too many redundant things. [Using this technology,] you can understand the pricing dynamics of what you're doing on the website. If I lower the price on one thing, how much demand is that going to pull from the other things that are similar?
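The question of how much demand a price cut pulls from similar items is what economists measure with cross-price elasticity: the percentage change in demand for product B divided by the percentage change in price of product A. A positive value means the two behave as substitutes. The sketch below uses the midpoint (arc) formula; the prices and order counts are invented for illustration and are not Wayfair data.

```python
def midpoint_pct_change(old, new):
    """Percentage change relative to the midpoint of old and new (arc method)."""
    return (new - old) / ((new + old) / 2)


def cross_price_elasticity(price_a_old, price_a_new, qty_b_old, qty_b_new):
    """Responsiveness of B's demand to a change in A's price.

    Positive: A and B act as substitutes (cutting A's price pulls demand from B).
    Negative: they act as complements.
    """
    return midpoint_pct_change(qty_b_old, qty_b_new) / midpoint_pct_change(price_a_old, price_a_new)


# Hypothetical: sofa A drops from $900 to $810; weekly orders of similar sofa B fall from 50 to 44.
e = cross_price_elasticity(900, 810, 50, 44)
print(round(e, 2))  # 1.21
```

Here a roughly 10.5% price cut on A coincides with a roughly 12.8% drop in B's orders, so B's demand is quite sensitive to A's price.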

The reason it's super tricky is when you think about what it means to be substitutable. If a customer is willing to buy A, [are they] willing to buy B? In what way will they make that decision? For sure they're looking at the image, but they're also looking at the text associated with the product, the price and so on. A lot of our work has been trying to figure out how to take advantage of text information, labels and images to determine which products are substitutable and which are not. We're trying to build something that can do that at scale and serve the different teams.
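As a rough illustration of combining those signals, the sketch below scores a pair of products by blending image-embedding similarity, word overlap between titles (Jaccard similarity) and price closeness. The hand-picked weights, product data and scoring functions are all assumptions; a system like the one Wulin describes would more likely learn how to weigh these signals from data.

```python
import math


def cosine_similarity(a, b):
    """Visual similarity between two image embeddings (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(title_a, title_b):
    """Share of distinct lowercase words the two product titles have in common."""
    a, b = set(title_a.lower().split()), set(title_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def price_similarity(price_a, price_b):
    """1.0 for identical prices, falling toward 0 as the prices diverge."""
    return min(price_a, price_b) / max(price_a, price_b)


def substitutability(prod_a, prod_b, weights=(0.5, 0.3, 0.2)):
    """Weighted blend of image, text and price similarity; the weights are hypothetical."""
    w_img, w_txt, w_price = weights
    return (w_img * cosine_similarity(prod_a["embedding"], prod_b["embedding"])
            + w_txt * jaccard(prod_a["title"], prod_b["title"])
            + w_price * price_similarity(prod_a["price"], prod_b["price"]))


# Hypothetical products with toy embeddings.
sofa_a = {"title": "Gray Linen Three Seat Sofa", "embedding": [0.8, 0.1, 0.3], "price": 899.0}
sofa_b = {"title": "Gray Linen Sofa", "embedding": [0.75, 0.15, 0.35], "price": 849.0}
recliner = {"title": "Leather Recliner Chair", "embedding": [0.1, 0.9, 0.2], "price": 1299.0}

print(round(substitutability(sofa_a, sofa_b), 2))
print(round(substitutability(sofa_a, recliner), 2))
```

The two sofas score far higher than the sofa and the recliner, matching the intuition that a shopper willing to buy one sofa would likely consider the other.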

What factors have been hardest to overcome in developing NLP image processing technology?

Wulin: Being at Wayfair, we're super data-driven, so there's a lot of interest and excitement around things like machine learning. That said, there is a continual conversation about making sure folks understand that things take a long time to develop. We could do something incredibly quick and dirty, and maybe the results look great, but it's not going to work when you roll it out to all our customers. [We've made a point of] keeping realistic timelines and being very transparent about what we need to do to make a really high-quality, successful product.
