Sunday, February 16, 2020

All the basics of Protobuf for a script kiddie

Protocol Buffers (Protobuf) are Google's ... hmm, you can find all the development basics in the official docs. More implementation details, such as the serialized binary format, wire types, Varint encoding, and ZigZag encoding, are easy to find on wikis and blogs.
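As a quick refresher on the two encodings named above, here is a minimal sketch of Varint and ZigZag in plain C++. The function names (EncodeVarint, DecodeVarint, ZigZagEncode, ZigZagDecode) are my own and are not part of the protobuf API; the real library has its own implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append v to *out as a base-128 varint: 7 payload bits per byte, least
// significant group first, high bit set on every byte except the last.
void EncodeVarint(uint64_t v, std::string* out) {
  assert(out != nullptr);
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Decode one varint starting at *pos. Returns false on truncated or
// over-long input; on success *pos points just past the varint.
bool DecodeVarint(const std::string& data, size_t* pos, uint64_t* value) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64 && *pos < data.size(); shift += 7) {
    const uint8_t byte = static_cast<uint8_t>(data[(*pos)++]);
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *value = result;
      return true;
    }
  }
  return false;
}

// ZigZag maps signed to unsigned so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
uint64_t ZigZagEncode(int64_t n) {
  return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

int64_t ZigZagDecode(uint64_t n) {
  return static_cast<int64_t>((n >> 1) ^ (~(n & 1) + 1));
}
```

For example, the value 300 encodes to the two bytes AC 02, and a sint field holding -2 is stored as the varint 3.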

This blog will show you how to reverse and analyze protobuf structures rapidly with off-the-shelf tools.

Extract Protobuf Structures

During compilation, the .proto file is parsed into a descriptor message (a FileDescriptorProto, defined in google/protobuf/descriptor.proto). This message is serialized, embedded in the generated code, and finally compiled into the binary. PBTK extracts this descriptor message from the binary (Java class, pyc, etc.) and recovers the .proto file from it.
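To make the idea concrete, here is a rough sketch of how one could scan a binary for embedded descriptors. It is not PBTK's actual implementation. It relies on one fact about the format: a serialized FileDescriptorProto normally starts with field 1 (the file name), so the bytes are 0x0A, a length, then a name ending in ".proto". The names Candidate and FindDescriptorCandidates are hypothetical, and the sketch assumes file names shorter than 128 bytes (a one-byte length).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Candidate {
  size_t offset;          // where the 0x0A tag byte sits in the blob
  std::string file_name;  // the embedded .proto file name
};

// Scan a binary blob for byte patterns that look like the start of a
// serialized FileDescriptorProto: tag 0x0A (field 1, length-delimited),
// a one-byte length, then a printable name ending in ".proto".
std::vector<Candidate> FindDescriptorCandidates(const std::string& blob) {
  static const std::string kSuffix = ".proto";
  std::vector<Candidate> found;
  for (size_t i = 0; i + 2 <= blob.size(); ++i) {
    if (static_cast<uint8_t>(blob[i]) != 0x0A) continue;
    const size_t len = static_cast<uint8_t>(blob[i + 1]);
    // Assumption: single-byte varint lengths only (names under 128 bytes).
    if (len <= kSuffix.size() || len >= 0x80) continue;
    if (i + 2 + len > blob.size()) continue;
    const std::string name = blob.substr(i + 2, len);
    assert(name.size() == len);  // guaranteed by the bounds check above
    if (name.compare(len - kSuffix.size(), kSuffix.size(), kSuffix) != 0) continue;
    bool printable = true;
    for (unsigned char c : name) {
      if (c < 0x20 || c > 0x7E) { printable = false; break; }
    }
    if (printable) found.push_back({i, name});
  }
  return found;
}
```

Each hit is only a candidate: the bytes that follow still have to parse as a FileDescriptorProto before a .proto file can be rebuilt from them.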

The descriptor message in the binary is used for reflection. Reflection is an important feature for implementing an RPC framework. As we know, C++ has no native support for runtime reflection, so protobuf implements it with the descriptor pool and vtables.

In particular, though, most lightweight clients use protobuf-lite. Protobuf-lite does not support descriptors or reflection, so there is no descriptor message in a binary compiled with protobuf-lite.

Fuzz Protobuf is a fuzzer implemented in Python 3, so it's easy to use. But I don't recommend it: it has poor support for some common protobuf structures, such as repeated fields.

libprotobuf-mutator is used as a C++ library.

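#include "src/mutator.h" // from libprotobuf-mutator

MyPbMessage message; // any generated message type
protobuf_mutator::Mutator mutator;
mutator.Mutate(&message, 200); // mutates the message in place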

The variable message can be any protocol message, and it doesn't need to be fully initialized. The first argument of Mutator::Mutate is a pointer to a message instance. The second argument is the maximum size, in bytes, of the mutation result. After each call, you can read the mutation result back through the message pointer.

I think ninja doesn't install the pkg-config files correctly, so I compile with the following command (test.cc is a placeholder for your own source file).

g++ test.cc -o test -I/usr/local/include/libprotobuf-mutator -L/usr/local/lib -lprotobuf-mutator -lprotobuf -pthread

Decode and Edit Raw

In many cases, however, we can't get the .proto definition. We must decode and edit the raw data directly as a generic structure. Protobuf-inspector is a parser implemented in Python. It prints a colored representation of a message's contents along with the wire types. Blackbox-protobuf is a Burp Suite extension for decoding and modifying (but not adding to) arbitrary protobuf messages without the protobuf type definition. It is also written in Python.

Protobuf-inspector and Blackbox-protobuf are easy to use. But I prefer to use the native C++ APIs of protobuf.

protoc --decode_raw < message.pb.bin

This command decodes raw data with the proto compiler. We can find the implementation in the source code under google/protobuf/, in the method TextFormat::Printer::PrintUnknownFields.
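To show what decoding without a schema boils down to, here is a minimal, library-free sketch that walks the top-level fields of a serialized message. It is not protoc's implementation. RawField, ReadVarint, ReadFixed, and DecodeRaw are names I made up, and the sketch rejects group wire types (3 and 4) for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct RawField {
  uint32_t number;     // field number taken from the tag
  uint32_t wire_type;  // 0 varint, 1 fixed64, 2 length-delimited, 5 fixed32
  uint64_t value;      // numeric payload for wire types 0, 1 and 5
  std::string bytes;   // payload for wire type 2
};

static bool ReadVarint(const std::string& in, size_t* pos, uint64_t* out) {
  *out = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    const uint8_t b = static_cast<uint8_t>(in[(*pos)++]);
    *out |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return true;
  }
  return false;  // truncated or over-long varint
}

// Read a little-endian fixed-width integer of `width` bytes (4 or 8).
static bool ReadFixed(const std::string& in, size_t* pos, size_t width, uint64_t* out) {
  if (in.size() - *pos < width) return false;
  *out = 0;
  for (size_t i = 0; i < width; ++i) {
    *out |= static_cast<uint64_t>(static_cast<uint8_t>(in[*pos + i])) << (8 * i);
  }
  *pos += width;
  return true;
}

// Parse the top-level fields of a serialized message. Returns false on
// malformed input or on wire types this sketch does not handle.
bool DecodeRaw(const std::string& in, std::vector<RawField>* fields) {
  assert(fields != nullptr);
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(in, &pos, &tag)) return false;
    RawField f{static_cast<uint32_t>(tag >> 3), static_cast<uint32_t>(tag & 7), 0, ""};
    if (f.number == 0) return false;  // field number 0 is never valid
    switch (f.wire_type) {
      case 0: if (!ReadVarint(in, &pos, &f.value)) return false; break;
      case 1: if (!ReadFixed(in, &pos, 8, &f.value)) return false; break;
      case 5: if (!ReadFixed(in, &pos, 4, &f.value)) return false; break;
      case 2: {
        uint64_t len = 0;
        if (!ReadVarint(in, &pos, &len) || len > in.size() - pos) return false;
        f.bytes = in.substr(pos, static_cast<size_t>(len));
        pos += static_cast<size_t>(len);
        break;
      }
      default: return false;  // groups (3, 4) and invalid wire types
    }
    fields->push_back(f);
  }
  return true;
}
```

Feeding it the bytes 08 96 01 12 07 followed by "testing" yields field 1 as the varint 150 and field 2 as a length-delimited payload. Whether that payload is a string, raw bytes, or a nested message cannot be told from the wire type alone.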

First, to decode raw data without any proto definition, we need to define an empty message.

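DescriptorPool pool;
FileDescriptorProto file;
file.set_name("empty_message.proto");
file.add_message_type()->set_name("EmptyMessage"); // a message type with no fields
GOOGLE_CHECK(pool.BuildFile(file) != NULL);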

Or just define a .proto file with an empty message structure and compile it.

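EmptyMessage message;
message.ParseFromString(raw); // assumption: raw is a std::string holding the serialized bytes
const google::protobuf::Reflection* reflection = message.GetReflection();
google::protobuf::UnknownFieldSet* ufs = reflection->MutableUnknownFields(&message);
google::protobuf::UnknownField* unf = ufs->mutable_field(3); // index in the set, starting from 0
std::cout << unf->type(); // e.g. UnknownField::Type::TYPE_LENGTH_DELIMITED
std::cout << unf->number(); // field number
const std::string foo = "wqwwqeqwewww";
unf->set_length_delimited(foo); // assumption: the edit step; valid only for a length-delimited field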

The above code can be used to decode and edit raw data.

But there is no specific wire type for an embedded message; on the wire it is just a length-delimited field, like a string. We can see how to deal with this kind of field in the method TextFormat::Printer::PrintUnknownFields.

const std::string& value = field.length_delimited();
UnknownFieldSet embedded_unknown_fields;
if (!value.empty() && embedded_unknown_fields.ParseFromString(value)) {
  // This field is parseable as a Message.
  // So it is probably an embedded message.
} else {
  // This field is not parseable as a Message.
  // So it is probably just a plain string.
}

As you can see, the exact types are lost, together with some high-level details such as encoding. But you can use these APIs to operate on each field precisely and fix up those details.
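The same "does it parse?" heuristic can be sketched without the protobuf library, which also shows why the guess is only probable. SkipField, Guess, and GuessLengthDelimited are hypothetical names, and unlike the real parser this sketch treats group wire types as invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Skip one field starting at *pos. Returns false if the bytes are not a
// well-formed field (groups are treated as invalid in this sketch).
static bool SkipField(const std::string& in, size_t* pos) {
  assert(pos != nullptr);
  auto varint = [&](uint64_t* out) {
    *out = 0;
    for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
      const uint8_t b = static_cast<uint8_t>(in[(*pos)++]);
      *out |= static_cast<uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) return true;
    }
    return false;
  };
  uint64_t tag = 0, n = 0;
  if (!varint(&tag) || (tag >> 3) == 0) return false;
  const size_t left = in.size() - *pos;
  switch (tag & 7) {
    case 0: return varint(&n);
    case 1: if (left < 8) return false; *pos += 8; return true;
    case 5: if (left < 4) return false; *pos += 4; return true;
    case 2:
      if (!varint(&n) || n > in.size() - *pos) return false;
      *pos += static_cast<size_t>(n);
      return true;
    default: return false;  // groups (3, 4) and invalid wire types
  }
}

enum class Guess { kEmbeddedMessage, kPlainBytes };

// A non-empty payload that parses cleanly as a sequence of fields is
// guessed to be an embedded message; anything else is plain bytes.
Guess GuessLengthDelimited(const std::string& value) {
  if (value.empty()) return Guess::kPlainBytes;
  size_t pos = 0;
  while (pos < value.size()) {
    if (!SkipField(value, &pos)) return Guess::kPlainBytes;
  }
  return Guess::kEmbeddedMessage;
}
```

The bytes 08 96 01 are guessed to be a message and "hello" is guessed to be plain bytes. But the two-byte string "hi" also happens to parse (field 13, varint 105), so this sketch misclassifies it. Short strings are where a heuristic like this needs a manual check.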

Hook serializing and parsing

If you want to hook the key points of data conversion, look for the functions as close to the base class as possible, such as MergePartialFromCodedStream and SerializePartialToCodedStream. Each message subclass overrides its own internal functions, such as _InternalParse and _InternalSerialize.

If the protobuf library and the messages are statically linked, it is more complex to hook them through generic hook points. So I prefer to hook the functions in the messages' vtables by offset.

Good luck and have fun cracking protobuf ;-)
