Google’s Protocol Buffers is a library to encode and decode messages in a binary format optimised to be compact and portable across platforms.
At the moment the core library can generate code for C/C++, Java and Python, but additional languages can be supported by writing a plugin for the Protobuf compiler.
There is already a list of plugins that support third-party languages; however, they translate the .proto files directly into target-language code, which then makes it possible to add business logic to the generated code.
In our case we wanted more control over what we generate, and to include some logic as well, so we decided to write our own code generation plugin.
This post is a simple example of a plugin written in Python, which can be used as a starting point for any other Google Protocol Buffer plugin.
What we’re going to build
In this post we are going to build and understand step by step:
- an interface between our code and the Protobuf compiler
- a parser for
- the output of our generated code
Before we start writing the plugin, we need to install the Protocol Buffer compiler, to be able to compile any .proto file, and then the Python Protobuf package (published on PyPI as protobuf), to implement our plugin.
Writing the plugin
The interface between the plugin and the protoc compiler is pretty simple: the compiler passes a CodeGeneratorRequest message on stdin and your plugin outputs the generated code in a CodeGeneratorResponse on stdout. So the first step is to write the code which reads the request and writes an empty response.
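A minimal sketch of that skeleton, assuming Python 3 and the plugin_pb2 module that ships with the Python Protobuf package. The main() helper is just a name chosen for this example; generate_code() is deliberately left empty at this stage.

```python
#!/usr/bin/env python3
import sys

from google.protobuf.compiler import plugin_pb2 as plugin


def generate_code(request, response):
    """Fill `response` from `request`; intentionally does nothing yet."""


def main():
    # protoc writes a serialised CodeGeneratorRequest to our stdin.
    # The payload is binary, so read bytes rather than text.
    request = plugin.CodeGeneratorRequest()
    request.ParseFromString(sys.stdin.buffer.read())

    response = plugin.CodeGeneratorResponse()
    generate_code(request, response)

    # protoc expects a serialised CodeGeneratorResponse on our stdout.
    sys.stdout.buffer.write(response.SerializeToString())


if __name__ == "__main__":
    main()
```

Reading and writing through sys.stdin.buffer and sys.stdout.buffer keeps Python 3 from trying to decode the binary messages as text.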
The protoc compiler follows a naming convention for plugins, as stated [here][protobuf-plugin]: you can save the plugin in a file called protoc-gen-custom somewhere in your PATH, or save it under any name you prefer (like my-plugin.py) and pass the plugin’s name and path to the --plugin command line option.
We are choosing the second option, passing the full path of our plugin to the --plugin command line option, because it is much easier than putting the plugin into the PATH and it makes the entire compiler invocation more explicit.
So we’ll save our plugin as my-plugin.py, and the compiler’s invocation will then look like this (assuming that the build directory already exists).
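As a rough sketch, the invocation can be assembled like this. The protoc_command() helper is hypothetical and only exists for this example; the --plugin=protoc-gen-NAME=PATH and --NAME_out=DIR flags are the ones protoc uses for plugins, with "custom" as the plugin name chosen in this post.

```python
import shlex


def protoc_command(plugin_path, out_dir, proto_file, name="custom"):
    """Build the protoc argument list for a plugin registered as `name`."""
    return [
        "protoc",
        # Tell protoc which executable implements the "custom" generator.
        f"--plugin=protoc-gen-{name}={plugin_path}",
        # --<name>_out triggers the plugin and sets its output directory.
        f"--{name}_out={out_dir}",
        proto_file,
    ]


command = protoc_command("./my-plugin.py", "./build", "hello.proto")
print(shlex.join(command))
# To run it for real (requires protoc to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

Since protoc launches the plugin as a separate process, my-plugin.py needs to be executable and start with a shebang line.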
The content of the hello.proto file is simply this:
The command above will not generate any output because our plugin does nothing. Now it’s time to write some meaningful output.
Let’s modify the generate_code() function to generate a JSON representation of the .proto file. First we need a function to traverse the AST (the Abstract Syntax Tree of the input .proto file) and return all the enumerators, messages and [nested types](https://developers.google.com/protocol-buffers/docs/proto#nested).
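One possible shape for such a traversal is sketched below. It is an illustration rather than the only way to do it: it yields every enum and message together with the dotted scope it was found in, and the choice to extend the scope with the parent message name for nested items is an assumption of this sketch.

```python
import itertools

from google.protobuf.descriptor_pb2 import DescriptorProto, FileDescriptorProto


def traverse(proto_file):
    """Yield (item, package) for every enum and message in the file."""

    def _traverse(package, items):
        for item in items:
            yield item, package

            # Only messages can contain further enums and messages.
            if isinstance(item, DescriptorProto):
                nested = f"{package}.{item.name}" if package else item.name
                yield from _traverse(nested, item.enum_type)
                yield from _traverse(nested, item.nested_type)

    return itertools.chain(
        _traverse(proto_file.package, proto_file.enum_type),
        _traverse(proto_file.package, proto_file.message_type),
    )


# Hypothetical descriptor built by hand, standing in for a parsed .proto file.
example = FileDescriptorProto(name="hello.proto", package="demo")
example.enum_type.add(name="Color")
outer = example.message_type.add(name="Outer")
outer.enum_type.add(name="Kind")
outer.nested_type.add(name="Inner")

for item, package in traverse(example):
    print(package, item.name)
```

Top-level enums come first, then each message followed by its own enums and nested messages.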
And now the new generate_code() function. For each .proto file in the request we iterate over all the items (enumerators, messages and nested types). We store the metadata about any messages and enumerators we encounter during the AST traversal in a dictionary-like data structure, which will be used later for generating the output.
We then add a new file to the response and set its filename, in this case the original filename plus the .json extension, and its content, which is the JSON representation of the dictionary.
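The steps just described could be sketched as follows. The JSON layout (the package, filename, name, type, properties and values keys) is an assumption made for this example, and traverse() is repeated here only so that the block runs on its own.

```python
import itertools
import json

from google.protobuf.compiler import plugin_pb2 as plugin
from google.protobuf.descriptor_pb2 import DescriptorProto, EnumDescriptorProto


def traverse(proto_file):
    """Yield (item, package) for every enum and message in the file."""

    def _traverse(package, items):
        for item in items:
            yield item, package
            if isinstance(item, DescriptorProto):
                nested = f"{package}.{item.name}" if package else item.name
                yield from _traverse(nested, item.enum_type)
                yield from _traverse(nested, item.nested_type)

    return itertools.chain(
        _traverse(proto_file.package, proto_file.enum_type),
        _traverse(proto_file.package, proto_file.message_type),
    )


def generate_code(request, response):
    for proto_file in request.proto_file:
        output = []

        # Collect metadata for every enum and message in this file.
        for item, package in traverse(proto_file):
            data = {
                "package": package or "<root>",
                "filename": proto_file.name,
                "name": item.name,
            }
            if isinstance(item, DescriptorProto):
                data["type"] = "Message"
                data["properties"] = [
                    {"name": f.name, "type": int(f.type)} for f in item.field
                ]
            elif isinstance(item, EnumDescriptorProto):
                data["type"] = "Enum"
                data["values"] = [
                    {"name": v.name, "value": v.number} for v in item.value
                ]
            output.append(data)

        # One generated file per input file: <original name>.json
        generated = response.file.add()
        generated.name = proto_file.name + ".json"
        generated.content = json.dumps(output, indent=2)
```

Note that request.proto_file also carries the descriptors of any imported files; a plugin that should only emit output for the files named on the command line can filter on request.file_to_generate.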
If you run the protobuf compiler again, it will output a file named hello.proto.json in the build directory, containing the JSON representation of hello.proto.
In this post we walked through the creation of a Protocol Buffer plugin to compile a .proto file into a simplified representation in JSON format. The core is the interface code that reads a request from stdin, traverses the AST and writes the response on stdout.
The most challenging part was to figure out how the information about the Protobuf data is passed to the plugin and back to the compiler. I was expecting a common data format like JSON or XML, but a custom binary data structure is used instead. This was where I spent most of the time building the first plugin prototype, but thanks to the list of plugin examples I was able to understand the plugin/compiler communication.
You are not limited to transforming the input into another format: you can also use the request to output any code in any language. You can parse a .proto file and output code for a RESTful API in Node.js, convert the message and enum definitions into an XML file, or even generate another .proto file, e.g. one without the deprecated fields.
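As a small illustration of the last idea, the sketch below drops deprecated fields from a copy of a file descriptor. It only covers the filtering step; turning the result back into .proto source text is left out. The strip_deprecated_fields() helper is hypothetical, while the deprecated flag is the standard field option.

```python
from google.protobuf.descriptor_pb2 import FileDescriptorProto


def strip_deprecated_fields(proto_file):
    """Return a copy of `proto_file` without fields marked deprecated."""
    result = FileDescriptorProto()
    result.CopyFrom(proto_file)

    def _strip(message):
        # Delete from the end so the remaining indices stay valid.
        for index in reversed(range(len(message.field))):
            if message.field[index].options.deprecated:
                del message.field[index]
        for nested in message.nested_type:
            _strip(nested)

    for message in result.message_type:
        _strip(message)
    return result


# Hypothetical input: one message with a deprecated and a current field.
original = FileDescriptorProto(name="hello.proto")
greeting = original.message_type.add(name="Greeting")
old = greeting.field.add(name="old_text", number=1)
old.options.deprecated = True
greeting.field.add(name="text", number=2)

cleaned = strip_deprecated_fields(original)
print([f.name for f in cleaned.message_type[0].field])
```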