This is Ronny Speaking: All the basics of Protobuf for a script kiddie

Protocol Buffers (Protobuf) are Google's …… hmmm, you can find all the development basics in the official docs. Implementation details, such as the serialized binary format, wire types, Varint encoding and ZigZag encoding, are easy to find on wikis and blogs.

This blog will show you how to reverse and analyze protobuf structures rapidly with off-the-shelf tools.

Extract Protobuf Structures

Pbtk: https://github.com/marin-m/pbtk

During compilation, protoc parses the .proto file into a descriptor message (a FileDescriptorProto, defined in google/protobuf/descriptor.proto). This message is serialized and embedded in the generated code, so it ends up compiled into the binary. pbtk extracts this descriptor message from the binary (Java class, pyc, etc.) and recovers the .proto file from it.

The descriptor message in the binary is used for reflection. Reflection is an important feature for implementing an RPC framework. As we know, C++ has no native support for runtime reflection, so protobuf implements it with the descriptor pool and vtables.

Note, however, that most lightweight clients use protobuf-lite. Protobuf-lite does not support descriptors or reflection, so a binary compiled with protobuf-lite contains no descriptor message.
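For binaries built with the full runtime, the descriptor hunt can be sketched like this. It is a rough heuristic under my own assumptions, not pbtk's actual code: a serialized FileDescriptorProto usually starts with field 1 (the file name), so we look for the tag byte 0x0A, a one-byte length, and a printable name ending in ".proto". The names DescriptorHit and FindDescriptorCandidates are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One candidate hit: the offset of the 0x0A tag byte and the file name found after it.
struct DescriptorHit {
  std::size_t offset;
  std::string name;
};

// Scans `data` for 0x0A (field 1, wire type 2), then a one-byte length L (< 128),
// then L printable bytes that end in ".proto". Assumption: names shorter than 128 bytes.
std::vector<DescriptorHit> FindDescriptorCandidates(const std::string& data) {
  static const std::string kSuffix = ".proto";
  std::vector<DescriptorHit> hits;
  for (std::size_t i = 0; i + 2 <= data.size(); ++i) {
    if (static_cast<unsigned char>(data[i]) != 0x0A) continue;
    const std::size_t len = static_cast<unsigned char>(data[i + 1]);
    if (len < kSuffix.size() + 1 || len >= 0x80) continue;  // too short or multi-byte length
    if (i + 2 + len > data.size()) continue;                 // name would run past the end
    const std::string name = data.substr(i + 2, len);
    if (name.compare(len - kSuffix.size(), kSuffix.size(), kSuffix) != 0) continue;
    bool printable = true;
    for (unsigned char c : name) {
      if (c < 0x20 || c > 0x7E) { printable = false; break; }
    }
    if (printable) hits.push_back({i, name});
  }
  return hits;
}
```

A hit only tells you where a descriptor probably begins. Finding where it ends and turning it back into a .proto file is the part you want a real tool for.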

Fuzz Protobuf

https://github.com/trailofbits/protofuzz is a fuzzer written in Python 3, so it's easy to use. But I don’t recommend it: it has poor support for some common protobuf structures, such as repeated fields.

libprotobuf-mutator: https://github.com/google/libprotobuf-mutator

libprotobuf-mutator is used as a C++ library.

MyPbMessage message;
protobuf_mutator::Mutator mutator;
mutator.Seed(123);
mutator.Mutate(&message, 200);

The variable message can be any protobuf message, and it doesn’t need to be fully initialized. The first argument of Mutator::Mutate is a pointer to the message instance. The second argument is a hint for the maximum size in bytes of the mutated message. Mutate changes the message in place, so after each call you read the result from the same object.

I think the ninja install step doesn’t set up pkg-config correctly, so I compile with the following command.

g++ -o test any_message.pb.cc test.cc -I/usr/local/include/libprotobuf-mutator -L/usr/local/lib -lprotobuf -lprotobuf-mutator -pthread

Decode and Edit Raw

In many cases, however, we can't get the .proto definition. Then we must decode and edit the raw data directly, treating it as a generic structure of numbered fields.

https://github.com/mildsunrise/protobuf-inspector is a parser written in Python. It prints a colored representation of a message's contents, along with the wire types.

https://github.com/nccgroup/blackboxprotobuf is a Burp Suite extension for decoding and modifying (except adding) arbitrary protobuf messages without the protobuf type definition. It is written in Python.

Protobuf-inspector and Blackbox-protobuf are easy to use. But I prefer to use the native C++ APIs of protobuf.

protoc --decode_raw < message.pb.bin

That command decodes raw data with the proto compiler. The implementation is in the source file google/protobuf/text_format.cc, in the method TextFormat::Printer::PrintUnknownFields.
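To see what a raw decode has to work with, here is a minimal schema-less dumper written from the wire-format rules. It is a sketch, not protoc's code: ReadVarint and DumpRaw are names I made up, it only lists top-level fields, and it gives up on groups (wire types 3 and 4).

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Reads one base-128 varint starting at `pos`; returns false on truncated or overlong input.
bool ReadVarint(const std::string& buf, std::size_t& pos, std::uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(buf[pos++]);
    out |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // no continuation bit: varint is complete
  }
  return false;
}

// Writes one "number:kind=value" line per top-level field into `text`.
// Returns false if the buffer is not well-formed wire data.
bool DumpRaw(const std::string& buf, std::string& text) {
  std::ostringstream os;
  std::size_t pos = 0;
  while (pos < buf.size()) {
    std::uint64_t key = 0;
    if (!ReadVarint(buf, pos, key)) return false;
    const std::uint64_t number = key >> 3;                      // field id
    const unsigned wire_type = static_cast<unsigned>(key & 7);  // low 3 bits
    if (number == 0) return false;
    os << number << ":";
    switch (wire_type) {
      case 0: {  // varint
        std::uint64_t v = 0;
        if (!ReadVarint(buf, pos, v)) return false;
        os << "varint=" << v;
        break;
      }
      case 1:    // 64-bit fixed, little-endian
      case 5: {  // 32-bit fixed, little-endian
        const std::size_t n = (wire_type == 1) ? 8 : 4;
        if (buf.size() - pos < n) return false;
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < n; ++i)
          v |= static_cast<std::uint64_t>(static_cast<std::uint8_t>(buf[pos + i])) << (8 * i);
        pos += n;
        os << (n == 8 ? "fixed64=" : "fixed32=") << v;
        break;
      }
      case 2: {  // length-delimited: string, bytes, embedded message or packed repeated
        std::uint64_t len = 0;
        if (!ReadVarint(buf, pos, len) || len > buf.size() - pos) return false;
        os << "bytes[" << len << "]";
        pos += static_cast<std::size_t>(len);
        break;
      }
      default:  // groups (3, 4) and invalid wire types are not handled in this sketch
        return false;
    }
    os << "\n";
  }
  text = os.str();
  return true;
}
```

For example, the bytes 08 96 01 come out as the single line 1:varint=150. Field number and wire type are all the wire gives you; names and real types are gone.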

First, to decode raw data without any proto definition, we should define an empty message.

DescriptorPool pool;
FileDescriptorProto file;
file.set_name("empty_message.proto");
file.add_message_type()->set_name("EmptyMessage");
GOOGLE_CHECK(pool.BuildFile(file) != NULL);

Or just define a .proto file with an empty message structure and compile it.

EmptyMessage message;
message.ParsePartialFromZeroCopyStream(&in);
const google::protobuf::Reflection* reflection = message.GetReflection();
google::protobuf::UnknownFieldSet* ufs = reflection->MutableUnknownFields(&message);
google::protobuf::UnknownField* unf = ufs->mutable_field(3); // index in the set, from 0
std::cout << unf->type(); // UnknownField::Type::TYPE_LENGTH_DELIMITED
std::cout << unf->number(); // field id
const std::string foo = "wqwwqeqwewww";
unf->set_length_delimited(foo);

The above code can be used to decode and edit raw data.

But there is no specific type for an embedded message: on the wire, embedded messages, strings and bytes all share the same length-delimited wire type. We can see how to deal with this kind of field in the method TextFormat::Printer::PrintUnknownFields.

const std::string& value = field.length_delimited();
UnknownFieldSet embedded_unknown_fields;
if (!value.empty() && embedded_unknown_fields.ParseFromString(value)) {
  // This field is parseable as a Message.
  // So it is probably an embedded message.
  ……
} else {
  // This field is not parseable as a Message.
  // So it is probably just a plain string.
  ……
}
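The same "try to parse it" trick can be sketched without the protobuf library. This is my own stripped-down version, not what UnknownFieldSet::ParseFromString does internally: TakeVarint and LooksLikeMessage are made-up names, and unlike the real parser it rejects groups.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Reads one base-128 varint at `pos`; returns false on truncated or overlong input.
bool TakeVarint(const std::string& buf, std::size_t& pos, std::uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(buf[pos++]);
    out |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Returns true if `payload` is non-empty and splits cleanly into wire-format fields.
bool LooksLikeMessage(const std::string& payload) {
  if (payload.empty()) return false;  // same guard as the !value.empty() check
  std::size_t pos = 0;
  while (pos < payload.size()) {
    std::uint64_t key = 0, len = 0, ignored = 0;
    if (!TakeVarint(payload, pos, key) || (key >> 3) == 0) return false;
    switch (key & 7) {
      case 0: if (!TakeVarint(payload, pos, ignored)) return false; break;  // varint
      case 1: len = 8; break;                                               // fixed64
      case 5: len = 4; break;                                               // fixed32
      case 2: if (!TakeVarint(payload, pos, len)) return false; break;      // length-delimited
      default: return false;  // groups and invalid wire types: call it "not a message"
    }
    if (len > payload.size() - pos) return false;  // field body runs past the end
    pos += static_cast<std::size_t>(len);
  }
  return true;
}
```

It is only a guess, and it can be wrong in both directions. The two-character string "hi" passes this check (it reads as field 13, varint 105), while "hello world" fails, so short strings deserve a second look before you treat them as embedded messages.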

As you can see, the exact field types are lost, together with some high-level details such as the encoding. But with these APIs you can still operate on each field precisely and correct those details yourself.
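A small example of what "lost encoding" means in practice: the same raw varint can be read three ways, and only the missing .proto says which one is right. VarintReadings and InterpretVarint are names invented for this sketch.

```cpp
#include <cstdint>

// The three plausible readings of one raw varint value.
struct VarintReadings {
  std::uint64_t as_uint64;  // uint32 / uint64 / bool / enum
  std::int64_t as_int64;    // int32 / int64 (two's complement)
  std::int64_t as_sint64;   // sint32 / sint64 (ZigZag)
};

VarintReadings InterpretVarint(std::uint64_t raw) {
  VarintReadings r;
  r.as_uint64 = raw;
  r.as_int64 = static_cast<std::int64_t>(raw);
  // ZigZag decode: (n >> 1) XOR -(n & 1), done in unsigned arithmetic.
  r.as_sint64 = static_cast<std::int64_t>((raw >> 1) ^ (~(raw & 1) + 1));
  return r;
}
```

A raw value of 3 is 3 as an int64 but -2 as a sint64, so when you edit a varint field you have to decide which reading the target program expects.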

Hook serializing and parsing

If you want to hook the key points of data conversion, look for functions as close to the base message class as possible, such as MergePartialFromCodedStream and SerializePartialToCodedStream. Each generated message subclass overrides its own internal functions, such as _InternalParse and _InternalSerialize.

If the protobuf library and the messages are statically compiled into the target, it is harder to hook them through generic hook points. So I prefer to hook the functions in the messages' vtables by offset.
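The idea of hooking a vtable entry by offset can be shown with a toy model. Everything here is hypothetical: ToyMessage uses a hand-rolled table of function pointers instead of a real compiler-generated vtable, and the slot index is arbitrary. In a real target you would take the slot offset from your disassembler, and the vtable usually sits in read-only memory that has to be made writable first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ToyMessage;
using Slot = std::string (*)(const ToyMessage&);

// Toy stand-in for a message object: the first member points to a function table.
struct ToyMessage {
  Slot* vtable;
  std::string payload;
};

// Plays the role of the original serialize function.
std::string RealSerialize(const ToyMessage& m) { return m.payload; }

static Slot g_original = nullptr;        // saved original entry, set when the hook is installed
static std::vector<std::string> g_log;   // what the hook has observed so far

// Hook body: record the payload, then forward to the original function.
std::string HookedSerialize(const ToyMessage& m) {
  g_log.push_back(m.payload);
  return g_original(m);
}

// Replaces the entry at `index` in `table` and returns the previous entry.
Slot HookSlot(Slot* table, std::size_t index, Slot replacement) {
  Slot old = table[index];
  table[index] = replacement;
  return old;
}
```

Installing the hook is then one line, g_original = HookSlot(table, 1, HookedSerialize);, and every call that goes through slot 1 is logged before it reaches the original function.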

End

Good luck, and have fun cracking protobuf ;-)


Sunday, February 16, 2020
