Protocol Buffers (Protobuf) are Google's …… hmm, you can find all the development basics in the official docs. Implementation details, such as the serialized binary format, wire types, varint encoding and ZigZag encoding, are easy to find on wikis and blogs.
This blog post shows how to reverse and analyze protobuf structures quickly with off-the-shelf tools.
Extract Protobuf Structures
When a .proto file is compiled, it is parsed into a descriptor message (a google::protobuf::FileDescriptorProto). This message is serialized and stored in the generated code, so it finally ends up in the compiled binary. PBTK extracts this descriptor message from the binary (Java class, pyc, etc.) and recovers the .proto file from it.
The descriptor message in the binary is used for reflection. Reflection is an important feature for implementing an RPC framework. As we know, C++ has no native support for runtime reflection, so protobuf implements it with the descriptor pool and vtables.
Note, however, that most lightweight clients use protobuf-lite. Protobuf-lite does not support descriptors or reflection, so a binary compiled with protobuf-lite contains no descriptor message.
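To get a feel for why extraction tools can recover these descriptors, here is a minimal sketch of the kind of scan they can perform. It relies on one fact: a serialized FileDescriptorProto normally starts with field 1 (the file name), so the bytes are the tag 0x0A, a length, and then a name that usually ends in ".proto". The name FindDescriptorCandidates and the restriction to names shorter than 128 bytes are my assumptions for illustration; this is not PBTK's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A possible start of an embedded FileDescriptorProto.
struct Candidate {
    std::size_t offset;  // offset of the 0x0A tag byte
    std::string name;    // the .proto file name that follows it
};

// Scan a binary blob for "<0x0A><len><printable name ending in .proto>".
// Assumes the name is shorter than 128 bytes, so its length is a one-byte varint.
std::vector<Candidate> FindDescriptorCandidates(const std::string& blob) {
    static const std::string kSuffix = ".proto";
    std::vector<Candidate> out;
    std::size_t pos = 0;
    while ((pos = blob.find(kSuffix, pos)) != std::string::npos) {
        const std::size_t end = pos + kSuffix.size();  // one past the file name
        // Grow the candidate name backwards until a matching tag/length pair appears.
        for (std::size_t len = kSuffix.size() + 1; len < 128 && len + 2 <= end; ++len) {
            const std::size_t start = end - len;  // first byte of the candidate name
            const unsigned char c = static_cast<unsigned char>(blob[start]);
            if (c < 0x20 || c > 0x7e) break;  // file names are printable ASCII
            if (static_cast<unsigned char>(blob[start - 1]) == len &&
                static_cast<unsigned char>(blob[start - 2]) == 0x0A) {
                out.push_back({start - 2, blob.substr(start, len)});
                break;
            }
        }
        pos = end;
    }
    return out;
}
```

Each candidate offset is only a starting point: a real tool still has to find where the descriptor ends and parse it. A binary built with protobuf-lite simply yields no candidates.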
Fuzz Protobuf
https://github.com/trailofbits/protofuzz is a fuzzer written in Python 3, so it is easy to use. But I don't recommend it: it has poor support for some common protobuf structures, such as repeated fields.
libprotobuf-mutator is used as a C++ library.
MyPbMessage message;
protobuf_mutator::Mutator mutator;
mutator.Seed(123);
mutator.Mutate(&message, 200);
The variable message can be any protocol message, and it does not need to be fully initialized. The first argument of Mutator::Mutate is a pointer to a message instance. The second argument is an upper bound, in bytes, on the size of the mutation result (the library treats it as a hint). The message is mutated in place, so you can read the result through the message pointer after each call.
I suspect the ninja install step does not set up pkg-config correctly, so I compile with the following command.
g++ -o test any_message.pb.cc test.cc -I/usr/local/include/libprotobuf-mutator -L/usr/local/lib -lprotobuf -lprotobuf-mutator -pthread
Decode and Edit Raw
In many cases, however, we can't get the .proto definition. We must then decode and edit the raw data directly as a generic structure. Protobuf-inspector and Blackbox-protobuf are easy to use, but I prefer the native C++ APIs of protobuf.
protoc --decode_raw < message.pb.bin
That command decodes raw data with the proto compiler. Its implementation can be found in the source file google/protobuf/text_format.cc, in the method TextFormat::Printer::PrintUnknownFields.
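The structure that --decode_raw recovers can be approximated in a few dozen lines. The following is a minimal sketch of a flat wire-format walker written against the C++ standard library only, so it is not how protoc does it (protoc goes through UnknownFieldSet). RawField, ReadVarint and DecodeRaw are hypothetical names, and groups (wire types 3 and 4) are rejected for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One decoded top-level field.
struct RawField {
    uint32_t number;     // field id
    uint32_t wire_type;  // 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit
    uint64_t value;      // varint or fixed-width value
    std::string bytes;   // payload of a length-delimited field
};

// Read a base-128 varint at pos and advance pos. Returns false on malformed input.
bool ReadVarint(const std::string& in, std::size_t& pos, uint64_t& out) {
    out = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const uint8_t b = static_cast<uint8_t>(in[pos++]);
        out |= static_cast<uint64_t>(b & 0x7f) << shift;
        if ((b & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return false;
}

// Split a serialized message into its top-level fields without any schema.
bool DecodeRaw(const std::string& in, std::vector<RawField>& fields) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        uint64_t key = 0;
        if (!ReadVarint(in, pos, key)) return false;
        RawField f{static_cast<uint32_t>(key >> 3), static_cast<uint32_t>(key & 7), 0, ""};
        if (f.number == 0) return false;  // field number 0 is never valid
        std::size_t width = 0;
        switch (f.wire_type) {
            case 0: if (!ReadVarint(in, pos, f.value)) return false; break;
            case 1: width = 8; break;
            case 5: width = 4; break;
            case 2: {
                uint64_t len = 0;
                if (!ReadVarint(in, pos, len) || len > in.size() - pos) return false;
                f.bytes = in.substr(pos, static_cast<std::size_t>(len));
                pos += static_cast<std::size_t>(len);
                break;
            }
            default: return false;  // groups are not handled in this sketch
        }
        if (width != 0) {  // fixed-width values are little-endian
            if (width > in.size() - pos) return false;
            for (std::size_t i = 0; i < width; ++i)
                f.value |= static_cast<uint64_t>(static_cast<uint8_t>(in[pos + i])) << (8 * i);
            pos += width;
        }
        fields.push_back(f);
    }
    return true;
}
```

For example, the bytes 08 96 01 decode to field 1 with the varint value 150, and 12 07 followed by "testing" decodes to field 2 with a seven-byte payload.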
First, to decode raw data without any proto definition, we should define an empty message.
DescriptorPool pool;
FileDescriptorProto file;
file.set_name("empty_message.proto");
file.add_message_type()->set_name("EmptyMessage");
GOOGLE_CHECK(pool.BuildFile(file) != NULL);
Or just define a .proto file with an empty message and compile it.
EmptyMessage message;
message.ParsePartialFromZeroCopyStream(&in);
const google::protobuf::Reflection* reflection = message.GetReflection();
google::protobuf::UnknownFieldSet* ufs = reflection->MutableUnknownFields(&message);
google::protobuf::UnknownField* unf = ufs->mutable_field(3); // index in the set, starting from 0
std::cout << unf->type(); // UnknownField::Type::TYPE_LENGTH_DELIMITED
std::cout << unf->number(); // field id
const std::string foo = "wqwwqeqwewww";
unf->set_length_delimited(foo);
The above code can be used to decode and edit raw data. But there is no specific type for an embedded message. We can see how to deal with this kind of field in the method TextFormat::Printer::PrintUnknownFields.
const std::string& value = field.length_delimited();
UnknownFieldSet embedded_unknown_fields;
if (!value.empty() && embedded_unknown_fields.ParseFromString(value)) {
  // This field is parseable as a Message.
  // So it is probably an embedded message.
  ……
} else {
  // This field is not parseable as a Message.
  // So it is probably just a plain string.
  ……
}
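The same heuristic can be sketched without the protobuf library: a length-delimited payload is treated as an embedded message only if it is non-empty and its bytes parse cleanly as a sequence of fields right up to the last byte. LooksLikeMessage is a hypothetical helper, and groups are treated as unparseable here, which is stricter than UnknownFieldSet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if `data` is non-empty and parses as a complete sequence of
// protobuf fields (varint, fixed64, length-delimited, fixed32).
bool LooksLikeMessage(const std::string& data) {
    if (data.empty()) return false;
    std::size_t pos = 0;
    // Reads one varint at pos; fails if it is truncated or longer than 10 bytes.
    auto read_varint = [&](uint64_t& out) -> bool {
        out = 0;
        for (int shift = 0; shift < 64 && pos < data.size(); shift += 7) {
            const uint8_t b = static_cast<uint8_t>(data[pos++]);
            out |= static_cast<uint64_t>(b & 0x7f) << shift;
            if ((b & 0x80) == 0) return true;
        }
        return false;
    };
    while (pos < data.size()) {
        uint64_t key = 0, tmp = 0;
        if (!read_varint(key) || (key >> 3) == 0) return false;  // bad tag or field 0
        uint64_t skip = 0;  // number of payload bytes to step over
        switch (key & 7) {
            case 0: if (!read_varint(tmp)) return false; break;
            case 1: skip = 8; break;
            case 2: if (!read_varint(tmp)) return false; skip = tmp; break;
            case 5: skip = 4; break;
            default: return false;  // groups or invalid wire type
        }
        if (skip > data.size() - pos) return false;  // payload runs past the end
        pos += static_cast<std::size_t>(skip);
    }
    return true;
}
```

Keep in mind that this is only a guess, exactly like the original check. The string "hello" is rejected, but the two-character string "(A" is valid wire data (field 5, varint 65) and would be reported as a message.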
As you can see, the exact types are lost, together with some high-level details such as encoding. But you can use these APIs to operate on each field precisely and to fix up those details yourself.
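To make the loss of type information concrete: one varint on the wire is compatible with several declared types, and only the .proto definition says which one the sender meant. The sketch below reinterprets a decoded varint as int64, sint64 (ZigZag) and bool; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// int32/int64 fields: the varint is the two's-complement value.
int64_t AsInt64(uint64_t raw) { return static_cast<int64_t>(raw); }

// sint32/sint64 fields: the varint is ZigZag-encoded.
int64_t AsSint64(uint64_t raw) {
    return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
}

// Inverse of AsSint64, needed when writing an edited signed value back.
uint64_t ZigZagEncode(int64_t v) {
    return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

// bool fields: any non-zero varint is true.
bool AsBool(uint64_t raw) { return raw != 0; }
```

The raw value 150 is 150 if the field was declared int64 but 75 if it was declared sint64, so when you edit such a field you have to pick the interpretation and re-encode accordingly.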
Hook Serializing and Parsing
If you want to hook the key points of data conversion, you should find the functions as close to the underlying implementation as possible, such as MergePartialFromCodedStream and SerializePartialToCodedStream. Each message subclass overrides its own internal functions, such as _InternalParse and _InternalSerialize.
If the protobuf library and the messages are statically compiled, hooking them at generic hook points becomes more complex. So I prefer to hook the functions in the messages' vtables by offset.
End
Good luck, and have fun cracking protobuf ;-)