SEVANS/TechTarget
Wayfair charts open source components course to growth
Teams at Wayfair mix new open source tools to power customer-facing apps. In such shops, tech leaders like Ben Clark must deftly maneuver an obstacle course of data components.
In a low-lit operations room in the upscale Back Bay neighborhood, technical teams watch monitors that show the electronic heartbeat of Wayfair.com's data flow.
They watch big data move through a pipeline of open source components that Wayfair's developers have assembled. The company's data scientists ultimately analyze that data, with the aim of continually improving the customer experience at its e-commerce site for home furnishings.
Behind this is a new style of technology management. It requires deft handling of open source components for data processing. It is a style marked by the work of Ben Clark, chief architect at Wayfair, the popular Boston-based online furniture retailer.
"What we've seen is a huge change in streaming technology and use of quantitative methods for data analytics," a pensive Clark said in a wide-ranging interview at the headquarters here. Step-by-step, such software has been key to Wayfair teams learning about the customer on the scale and at the depth required by modern e-commerce, he said.
Along the way, Clark indicated, "build" has usually taken a front seat over "buy" in assorted build-versus-buy decisions Wayfair engineers have made.
Online furnishings
Founded in 2002, Wayfair has swapped open source components in and out of its data systems on its way to garnering about $6.8 billion in revenue in 2018.
Wayfair teams have become experts in Kafka, Hadoop, Spark, Kubernetes, Elasticsearch, Redis and many other components used to customize customers' online buying experience. The company's goal is to enable people to easily compare and select furniture, and turn empty houses and apartments into comfortable homes.
Clark's own efforts at Wayfair began in 2011 when he became tech lead for customer recommendations and search technology. He and his colleagues wanted to learn about and improve upon customer click-through rates.
That led to their using open source big data components, such as Hadoop and Spark, and bringing in the first quants -- quantitative analysts -- to study website activity.
The recommendation engine is a mix of advanced artificial intelligence and just plain programming, and it has become a mainstay of e-commerce sites. In practice today, the Wayfair recommenders take advantage of an ever-growing stream of data that clarifies customers' likes and dislikes.
From RabbitMQ to Kafka
Early Wayfair data integrations focused on a RabbitMQ open source message broker, which Wayfair still uses. But the more recent open source Kafka message broker has found increased use at Wayfair in applications that use real-time data to feed analytics jobs, Clark said.
Such Kafka tooling is another mainstay in e-commerce implementations these days. But for developer teams, it means delving deeper into ways of distributed data processing.
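One general way to picture the difference between the two styles of broker is a work queue versus an append-only log. The sketch below is a toy, in-memory illustration of that distinction only. It does not use the RabbitMQ or Kafka client APIs, and the class names, reader names and sample events are hypothetical; the article does not describe how Wayfair divides work between the two.

```python
from collections import deque


class WorkQueue:
    """Toy queue-style broker: a message is handed out once, then it is gone."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Return and remove the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Toy log-style broker: records are retained; each reader tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def append(self, record):
        self._records.append(record)

    def read(self, reader):
        # Return the records this reader has not yet seen, then advance its offset.
        start = self._offsets.get(reader, 0)
        self._offsets[reader] = len(self._records)
        return self._records[start:]


queue = WorkQueue()
log = EventLog()
for event in ["click:sofa", "click:lamp"]:  # hypothetical events
    queue.publish(event)
    log.append(event)

# Each queued message is delivered to exactly one consumer call.
queue_first = queue.consume()
queue_second = queue.consume()
queue_third = queue.consume()

# Two independent readers each see the full stream from the log.
recommender_view = log.read("recommender")
analytics_view = log.read("analytics")
```

In this simplified picture, the queue suits handing a task to a single worker, while the log lets several analytics jobs read the same stream of events independently.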
"RabbitMQ is a good example of early software that was adequate to the needs of the time. Meanwhile, Kafka was rapidly maturing. For a while, there was overlap. Now, we use Kafka and RabbitMQ for different use cases," Clark said.
All that means developers today must have great flexibility. What that means at Wayfair, he said, is "we have built a team over time where people can turn on a dime."
"We have 1,000 flowers blooming of ways to move data around. Today, you get tremendous amounts of data flowing through, and then you see what should be re-engineered or tweaked."
The right tool for the job
Clark said Wayfair teams regularly review the status of components used for data processing. He has even published a Tech Radar report, patterned after a model created by Chicago-based design house ThoughtWorks, which rates software components in terms of adoption status within the organization.
This report is meant to guide developer discussions -- these can become passionate at times -- about using different open source software packages. Developers sometimes put software on trial, and sometimes on hold. They use other software for a time, but may eventually replace it with still other components.
Meanwhile, when it makes sense, Wayfair is prepared to continue with established commercially available software. In that regard, Clark points to Microsoft SQL Server, which has been the go-to relational database throughout Wayfair's history.
The company also opts to buy new best-of-breed commercial software when that makes more sense than building its own open source equivalent for application performance monitoring.
Here, Clark cites InfluxDB from InfluxData as a recent example. This time-series database has taken on some of the work formerly done by open source Graphite software for logging and graphing incoming streams of data metrics.
Time-series and related systems "provide visibility into scaling," said Clark, who marks this as a key factor in supporting growth at a successful web startup.
Using information on how systems are working allows people in the Wayfair tech group to gain what he calls "an intuitive sense of issues -- you don't want staff to have roadblocks."
This particular pipeline shows Wayfair's readiness to adapt software as new needs develop. As part of it, Wayfair developers took the open source StatsD metrics aggregation software, originally written at e-commerce site Etsy in JavaScript, and created a faster program, Statsdcc, by porting it to C++.
In the spirit of open source, Wayfair teams have contributed this port and other software back to the larger developer community. That work can be accessed on GitHub.
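For readers unfamiliar with this kind of metrics software, StatsD-compatible daemons accept short text lines of the form name:value|type, with an optional |@rate suffix for sampled counters. The sketch below is a minimal Python illustration of parsing such lines and summing counters. It is not code from Statsdcc or StatsD, and the metric names and function names are hypothetical.

```python
from collections import defaultdict


def parse_statsd_line(line):
    """Split 'name:value|type' (optionally followed by '|@rate') into its parts."""
    name, rest = line.split(":", 1)
    fields = rest.split("|")
    value = float(fields[0])
    metric_type = fields[1]
    sample_rate = 1.0
    if len(fields) > 2 and fields[2].startswith("@"):
        sample_rate = float(fields[2][1:])
    return name, value, metric_type, sample_rate


def aggregate_counters(lines):
    """Sum counter ('c') metrics, scaling each value up by its sample rate."""
    totals = defaultdict(float)
    for line in lines:
        name, value, metric_type, rate = parse_statsd_line(line)
        if metric_type == "c":
            # A counter sampled at rate 0.5 stands in for two real events.
            totals[name] += value / rate
    return dict(totals)


# Hypothetical metric names, for illustration only.
sample_lines = [
    "checkout.requests:1|c",
    "checkout.requests:1|c|@0.5",
    "checkout.latency:320|ms",
]
counter_totals = aggregate_counters(sample_lines)
```

A real daemon would receive these lines over the network, handle timers and gauges as well, and flush the totals to a back end such as Graphite at regular intervals.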
Striking a balance
Behind all this, Clark said, is the notion of the minimum viable product.
"You have to be right on the line. You have to understand the balance. If you err on the side of the viable, you have over-engineered components," he said. "If you err on the side of minimal, you're not going to give innovative ideas a fair shake."
But there is no perfect formula, veteran Clark cautioned.
"Sometimes the introduction of a distributed system can be gratuitous. Developers must ask if the system really needs to be distributed, and users actually need to get the data faster," he said.