Understanding protocol buffers vs. JSON
Protocol buffers have some compelling advantages over JSON when it comes to sending data between internal services. But are they really a replacement for JSON?
To store or share structured data, it must first be converted into a flat sequence of text or bytes. This process, called data serialization, must be reversible so the original structure can be recovered when required. Common serialization formats include CSV, YAML, XML, JSON and protocol buffers.
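As a minimal sketch of what "reversible" means here, the following Python snippet serializes a small record to JSON text and restores it. The record itself is a hypothetical example, not data from any real service.

```python
import json

# Hypothetical record, used only for illustration.
record = {
    "question": "Which is the smallest ocean in the world?",
    "options": ["Pacific", "Indian", "Arctic", "Atlantic"],
    "answer": "Arctic",
}

encoded = json.dumps(record)   # structured data -> text
decoded = json.loads(encoded)  # text -> structured data

# Serialization is reversible if the round trip restores the original.
round_trip_ok = decoded == record
print(round_trip_ok)
```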
The highly regarded JSON data-interchange format is based on a subset of the JavaScript programming language. It is praised for being lightweight and easy to understand.
Google developed protocol buffers, or Protobuf, as a binary format to serialize data between services. While Protobuf is not yet capable of replacing JSON where services are consumed by a web browser, it has practical use cases.
Major differences between Protobuf and JSON
JSON, short for JavaScript Object Notation, and Protobuf behave differently in several areas, including performance and how they format data.
Data format. JSON is a platform-independent text-data format. We can open a JSON data object with any text viewer to examine the content. An example would look like this:
{
  "Questionnaire": {
    "q1": {
      "question": "Which country has the most islands?",
      "options": [
        "Sweden",
        "Finland",
        "Norway",
        "Canada"
      ],
      "answer": "Sweden"
    },
    "q2": {
      "question": "Which is the smallest ocean in the world?",
      "options": [
        "Pacific",
        "Indian",
        "Arctic",
        "Atlantic"
      ],
      "answer": "Arctic"
    }
  }
}
Protobuf, on the other hand, uses a binary message format whose structure is defined by a schema. It also comes with a set of rules and tools for defining and exchanging these messages. In a schema, every field is given a data type and a unique integer that identifies the field in the encoded message. An example would look like this:
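syntax = "proto2";

// Field numbers must be unique within a message.
message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

// A shape described by two points and an optional label.
message Circle {
  required Point start = 1;
  required Point end = 2;
  optional string label = 3;
}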
Speed and performance. Google developed Protobuf to be smaller and faster than XML. As it turns out, Protobuf also performs better than JSON, as shown in Eishay Smith's study. The difference is most apparent when encoding integers.
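One reason integers encode compactly is that Protobuf's wire format stores them as base-128 variable-length integers (varints), where small values take fewer bytes. The sketch below hand-rolls that encoding for non-negative integers and compares the payload size with the JSON text form. The function names are hypothetical, and the comparison covers only the value itself: it ignores the field tag that Protobuf adds and the key name that JSON adds.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def compare_sizes(value: int) -> tuple[int, int]:
    """Return (varint size, JSON text size) in bytes for one integer."""
    return len(encode_varint(value)), len(json.dumps(value).encode("utf-8"))


for n in (1, 300, 1_000_000):
    print(n, compare_sizes(n))
```

This says nothing about parsing speed, which depends on the implementation; it only illustrates why the encoded size of numeric data tends to be smaller.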
Support for programming languages. Protobuf officially supports a more limited set of common languages, such as Objective-C, Java, C# and Python, through Google's proto3 language version. Nonetheless, protocol buffers are platform-independent and language-neutral. JSON was derived from JavaScript, but almost all languages can now generate and parse data in this format.
What makes Protobuf the better choice?
Structured formats such as protocol buffers are the better choice when browsers and other JavaScript-based applications don't consume the data directly. Other considerations include the following:
Schemas. Data is often at risk because systems rely on inconsistent code at their edges, even when we go to great lengths to maintain data models. With Protobuf, you can encode the semantics of your business objects once, in .proto schema files, which helps ensure that signals don't get lost between applications.
Backward compatibility. Fields in proto definitions are numbered so that code does not need version checks, which was an explicit motivation for the design of Protobuf. Because each field is identified by its number, existing code can usually keep working as a schema evolves, provided field numbers already in use are never changed or reused. When new fields are introduced, intermediate servers that don't know about them can still parse the message and skip the fields they don't recognize. JSON services often suffer from problems related to backward compatibility and evolving schemas.
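To illustrate how numbered fields allow older code to skip what it doesn't recognize, here is a minimal hand-written decoder for two of Protobuf's wire types (varint and length-delimited). It is a sketch, not the real Protobuf library: the names `read_varint`, `decode_known` and `KNOWN_FIELDS` are hypothetical, and it assumes a message where field 1 is x, field 2 is y, and field 3 is a label added later that the "old" reader has never heard of.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one base-128 varint from buf at pos; return (value, new pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


# Assumption: the old reader only knows fields 1 and 2.
KNOWN_FIELDS = {1: "x", 2: "y"}


def decode_known(buf: bytes) -> dict:
    """Decode known fields and skip any field number not in KNOWN_FIELDS."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint value
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited value (strings, bytes)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in KNOWN_FIELDS:
            fields[KNOWN_FIELDS[field_number]] = value
        # Unknown field numbers fall through here and are ignored.
    return fields


# x = 3 (field 1), y = 4 (field 2), label = "p1" (field 3, unknown to this reader)
message = bytes([0x08, 0x03, 0x10, 0x04, 0x1A, 0x02]) + b"p1"
print(decode_known(message))
```

The reader never needs a version number: the wire type in each key tells it how many bytes to skip, even for fields it has no definition for.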
Validations and extensibility. Keywords such as repeated, optional and required are powerful when used in Protobuf definitions. At the schema level, these keywords let developers describe the shape of their data structure, while the generated classes handle the implementation details in every supported language. Note that required is available only in proto2; proto3 removed it.
Less boilerplate code. JSON often relies on hand-written, ad hoc boilerplate code to handle encoding and decoding. With protocol buffers, that code is generated from the schema, so you rarely have to write any of it yourself and can still expect the same functionality you get from JSON.
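The kind of hand-written decoding code this refers to might look like the following sketch, which turns JSON text into a typed object and checks each field by hand. The `Point` class and `point_from_json` function are hypothetical names chosen for illustration; with Protobuf, equivalent parsing and type handling would come from generated code instead.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    x: int
    y: int
    label: Optional[str] = None


def point_from_json(text: str) -> Point:
    """Decode a Point from JSON text, validating each field manually."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name in ("x", "y"):
        value = data.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"field {name!r} must be an integer")
    label = data.get("label")
    if label is not None and not isinstance(label, str):
        raise ValueError("field 'label' must be a string")
    return Point(x=data["x"], y=data["y"], label=label)


point = point_from_json('{"x": 3, "y": 4, "label": "p1"}')
print(point)
```

Every new field means another hand-written check like these, repeated in each language that consumes the data.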
Language interoperability. Exchanging data between applications written in different languages is simpler because protocol buffers are implemented across a range of languages.
Structure validation. Protobuf has a larger set of predefined data types than JSON. Because every field has a declared type, messages serialized with Protobuf are automatically checked against the schema's structure and types.
When is JSON the better choice?
JSON's advantages lie in the fact that it is text-based and human-readable, and it typically performs well. It is usually the better choice when the server-side application is written in JavaScript. Other considerations include the following:
Data analysis. Protobuf uses a binary format, which is not suitable when you need to inspect or analyze the data manually.
Support options. Protobuf still has a much smaller community than JSON, meaning you'll find far fewer support options with Protobuf than with JSON.