Bloomberg storage engineering team leans on open source, SDS
The Bloomberg Storage Engineering team built an internal storage cloud that has withstood great trading volatility during the pandemic and keeps 'infrastructure out of the way.'
In describing his job as engineering manager of Bloomberg's Storage Engineering team, Matthew Leonard uses the words "challenging" and "fun" a lot. The challenge comes from overseeing storage ranging from cutting-edge NVMe SAN arrays to open source software-defined storage DevOps. That's where fun often comes in, too.
Leonard and his 25-person team oversee more than 100 petabytes of capacity and an internal cloud for 6,000 engineers who develop applications for the Bloomberg Terminal -- the technology that helped make Michael Bloomberg a billionaire. The Storage Engineering team designs, builds and maintains storage for Bloomberg Engineering.
Like all IT pros, Bloomberg's Storage Engineering team members have found 2020 unique, as COVID-19 has forced them to work remotely. Leonard said the pandemic affected his "close-knit team" socially by preventing face-to-face interaction, but the members quickly adjusted to working at home on laptops and holding video meetings.
"Shockingly, I would say it hasn't slowed us down," he said. "There was a bit of adjustment period -- not everyone was set up to work from home. After a week or two, everyone figured it out. We've been able to find ways to make it work to do our hardware purchases and refreshes, and capacity additions to support the company during this. We've had to get creative, but we have not been affected."
Perhaps the biggest challenge came before the full impact of COVID-19 hit. It was caused by market trading volatility due to fears of what the pandemic would do to the world economy. Data hitting Bloomberg terminals from global capital markets nearly doubled, reaching 240 billion pieces of information some days in late March. That severely tested the storage systems.
"When you're doubling storage requirements instantaneously over one day, it does present interesting challenges," Leonard said. "We were able to handle that and ensure that applications teams had the space and the performance they needed. A lot of that has to do with the way we think about our storage systems. We don't build something for today. We don't say, 'Our usage is ABC, so we'll build our system for ABC.' We do what we call 'data budgeting' with our teams to forecast their usage, look at trends of usage and performance, and we build in safety factors. All of that planning and thinking and methodical due diligence up front allows us to take on unexpected bursty loads without breaking a sweat. I wouldn't say I wasn't nervous, but I wasn't super uncomfortable."
Leonard recently spoke at length with SearchStorage about managing storage for a data-driven business. He discussed what it takes to offer a private storage cloud that can give his users features of AWS while keeping all data within Bloomberg's data centers.
When there's no pandemic, what are the challenges of managing storage for Bloomberg's engineers?
Matthew Leonard: We have a lot of needs, and we're pulled in a lot of different directions. So we have to offer multiple types of products at different SLA [service-level agreement] levels, really to help our application engineers focus on solving their domain problems, and not worrying about the underlying storage itself.
What is your storage strategy for doing that?
Leonard: Part of what we try to do is productize storage. Think of the AWS model where an application engineer comes in, pushes a button and then magically 'poof,' they get the type of storage they need that solves their problem.
What does your storage infrastructure look like?
Leonard: Because we have such a diverse ecosystem of application engineers, we can't just offer one product. We have object, file and block. Those are the products, and we have different types of technologies that provide those. For block storage, we have a SAN. We also use software-defined storage that provides a different flavor of block storage with a different set of performance requirements. In terms of file, we leverage NFS. We also use software-defined for object. Some of the block storage and the object store form Bloomberg's internal private cloud for compute and storage.
So you're not using public cloud storage?
Leonard: Correct. Some application teams have approval to use the public cloud. But due to the nature of our business, where data is our business, we like to be more in control in terms of what leaves our four walls. So we have our private clouds, which are completely under our control. This is hardware, sitting in our data center, under our management.
We have a multiple-vendor strategy for every product in our data center. They're the big vendors, but we prefer not to say who. [Editor's note: Bloomberg's policy is to not endorse any vendors.]
Do you use hyper-converged infrastructure to build your private cloud?
Leonard: No. The direction Bloomberg is taking is to avoid moving in the hyper-converged direction. We are trying to separate compute from storage so we can scale them independently of each other. The direction we are going -- especially in terms of our private cloud -- is we like to be able to disaggregate the two. That's because some things are compute-intensive and some things are storage-intensive. If you were to scale them uniformly, you could waste resources, whether that's monetary resources, floor space in data centers or buying capacity you don't need. That's why we like to leverage a common interface between the two so that they are completely different systems and managed by different teams.
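A rough, illustrative calculation shows why uniform scaling can strand resources; the node sizes and demand figures here are invented for the example, not Bloomberg's.

```python
# Illustrative only: scaling compute and storage together forces you to buy
# whichever resource you need more of, stranding the other.

CONVERGED_NODE = {"cores": 32, "tb": 50}     # hypothetical converged node spec
compute_needed_cores = 3200                  # compute-heavy workload
storage_needed_tb = 1000

# Converged: buy enough nodes to satisfy the larger of the two needs.
converged_nodes = max(compute_needed_cores / CONVERGED_NODE["cores"],
                      storage_needed_tb / CONVERGED_NODE["tb"])
wasted_tb = converged_nodes * CONVERGED_NODE["tb"] - storage_needed_tb

print(f"{converged_nodes:.0f} converged nodes, {wasted_tb:.0f} TB unused")
# 100 nodes and 4,000 TB of stranded capacity; disaggregated systems could
# instead buy 100 compute nodes and only the 1,000 TB of storage required.
```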
What obstacles must you overcome to build a private cloud?
Leonard: It's a scale problem. Like most things, the devil's in the details. When you think about how these things operate, how you design them to be fault-tolerant, how you deal with operational burden, how you partner with teams that handle physical assets, then it becomes a bit interesting. The challenge there is finding the way to make this scale and be supportable and be a product that our application developers want to use while being able to enrich the feature set to stay at the forefront of what the public cloud is doing. And marrying that with all the stuff under the hood to keep it going. That is our biggest challenge -- we're pulled in multiple business directions, trying to serve all those needs while not ignoring other needs.
Do you feel like you have to keep up with the latest features from AWS and other public clouds?
Leonard: One of the fun things about S3 is it's a living standard, always changing, with features always being added. It's like a new toy. If somebody sees a new feature release in the wild, they're going to want that. Not every feature of AWS is applicable in our environment, so staying on top of what's important, what's going to help application developers and how to get that in-house is the interesting thing.
What storage hardware technologies do you use?
Leonard: Our hardware is cutting-edge. Our internal private cloud is based on all NVMe flash storage, which makes those systems very performant. It makes our lives a little easier, and it's a nice feature for our application developers because they don't have to worry about the performance of the storage.
What do you use object storage for?
Leonard: We have 6,000 application engineers sitting on top of our infrastructure -- they don't coalesce around a single use case. Any use case you can think of probably exists on object storage. Some teams use us for cold archival storage, some use us for data transfer, we have teams that use us for transactional applications. Those use cases all require different SLAs, so you can see we have all different kinds of traffic, all kinds of needs for different users on our infrastructure. It's not a homogenous use case that sits on top of any of our storage, which adds to the challenge.
How big of a role do containers and Kubernetes play for you, and what is the impact on storage?
Leonard: We're pushing the productization of storage to have a cloudy, anything-as-a-service feel, where it's push-button for developers to accelerate their ingenuity and get infrastructure out of their way.
We have three teams: One team is the storage API team. They build all the programmatic access, programmatic endpoints and deterministic workflows for our clients, who are application engineers at Bloomberg. That team is a full stack web development team -- they're using Node.js, Python, open source technologies like Apache Airflow -- and they're exploring containerization and virtualization.
And we have two storage technical teams that actually manage the bits and bytes. They deal more directly with hardware. The amount of hardware we deal with is quite large. Those teams aren't using virtualization and containers much.
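Bloomberg hasn't published what the storage API team's Airflow-driven workflows look like, but a deterministic provisioning pipeline of the kind Leonard describes might be sketched roughly as follows. The DAG, task names and logic are purely illustrative, not Bloomberg's actual pipeline.

```python
# Hypothetical sketch of a provisioning workflow on Apache Airflow:
# validate a storage request, carve out the resource, notify the requester.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def validate_request(**context):
    # e.g., check the requested size, SLA tier and owning team
    pass


def carve_out_volume(**context):
    # e.g., call the storage backend's API to create the volume or bucket
    pass


def notify_requester(**context):
    # e.g., return connection details to the application team
    pass


with DAG(
    dag_id="provision_storage_request",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,   # triggered per request, not on a timer
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate", python_callable=validate_request)
    provision = PythonOperator(task_id="provision", python_callable=carve_out_volume)
    notify = PythonOperator(task_id="notify", python_callable=notify_requester)

    validate >> provision >> notify
```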
We're trying to be on top of what's going on in the industry. We were looking into the Kubernetes CSI [Container Storage Interface] driver, and we partnered with the team that does the Kubernetes platform at Bloomberg to see if we can provide durable storage through that interface with some of the technologies we have in place, and we were able to do that. We use software-defined storage to back our Kubernetes platform attached to durable storage. We did a successful proof of concept with that technology, and there's an ongoing discussion between the two teams as to how and when we might actually want to make that available to the larger audience at Bloomberg. We were able to show that that is an option.
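The details of that proof of concept weren't disclosed, but the pattern it demonstrates -- an application requesting durable, CSI-provisioned storage from a software-defined backend -- looks roughly like this with the official Kubernetes Python client. The "ceph-rbd" storage class name and sizes are placeholders.

```python
# Minimal sketch: request a persistent volume claim from a CSI-backed
# StorageClass using the official Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()          # or load_incluster_config() inside a pod
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="ceph-rbd",      # hypothetical CSI-provisioned class
        resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
    ),
)

core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```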
What other open source software do you use, particularly for storage?
Leonard: We use Apache Airflow, we leverage HAProxy for traffic-shaping applications. We use the Ceph software-defined storage platform. With Ceph, you can have one system for teams but present multiple interfaces to clients. One of the virtualization platforms we support is OpenStack on our compute side -- a sister team to my team. We have an open source virtualization platform backed by an open source, distributed software-defined storage platform. That's fun.
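One reason Ceph can present multiple interfaces from a single cluster is that its object gateway speaks the S3 protocol, so ordinary S3 tooling works against an in-house endpoint. The endpoint, bucket and credentials in this sketch are placeholders, not Bloomberg's.

```python
# Talking to a Ceph object gateway (RGW) with standard S3 tooling; only the
# endpoint URL changes compared with using AWS itself.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.storage.internal.example",  # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="research-data")
s3.put_object(Bucket="research-data", Key="reports/2020-q1.csv", Body=b"...")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```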
What storage technologies are you looking at for the next two or three years?
Leonard: We're always looking at other interesting new things that have just come into the storage industry. That's a fun part of the job: It's not, 'Here's your SAN, manage this, here's your NFS, manage this.' We get to interact with our clients, our application engineers. We work with them to understand the problems they're trying to solve, and how that impacts Bloomberg's external clients -- the financial clients that use our software. And then we take that back to the storage world to see how we can help them achieve their goal. How can we help them find the right storage technology that meets their SLA or meets what they're trying to do? Because we have such a large number of application engineers doing interesting cool stuff, it's never boring.
One thing we're looking at is how we can get more performance with software-defined storage products that work potentially on white box servers. NVMe over TCP/IP is really interesting, and one of the cool initiatives we're working on. We've been working with key people in the industry and some of our existing vendors to see what their offerings are, what the performance actually looks like, and if we can start to leverage it in production in-house. That opens new doors that weren't open before.
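NVMe over TCP runs over a standard Ethernet network, so attaching a remote namespace looks much like any other host-side operation. This sketch drives the stock nvme-cli commands from Python; the target address and subsystem NQN are placeholders, and it must run as root on a host with the nvme-tcp kernel module available.

```python
# Rough sketch of attaching an NVMe/TCP namespace with nvme-cli.
import subprocess

subprocess.run(["modprobe", "nvme-tcp"], check=True)
subprocess.run(
    [
        "nvme", "connect",
        "-t", "tcp",                          # transport
        "-a", "192.0.2.10",                   # target IP (placeholder)
        "-s", "4420",                         # standard NVMe/TCP service port
        "-n", "nqn.2020-01.example:subsys1",  # subsystem NQN (placeholder)
    ],
    check=True,
)
# The new namespace then appears as /dev/nvmeXnY and can be benchmarked
# like any local block device.
subprocess.run(["nvme", "list"], check=True)
```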