Sunday, February 16, 2020

All the basics of Protobuf for a script kiddie


Protocol Buffers (Protobuf) are Google's … hmmm, you can find all the development basics in the official docs. More implementation details, such as the serialized binary format, wire types, varint encoding and ZigZag encoding, are easy to find on wikis and blogs.
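Since varint and ZigZag come up in everything that follows, here is a minimal standalone sketch of both in plain C++. The function names are my own and are not part of the protobuf library; the real implementation lives in the library's CodedInputStream / CodedOutputStream and WireFormatLite classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append v as a base-128 varint: 7 payload bits per byte, lowest group first,
// with the high bit set on every byte except the last one.
inline void AppendVarint(uint64_t v, std::string* out) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Read one varint starting at *pos and advance *pos past it.
// Returns false if the input is truncated or the varint exceeds 10 bytes.
inline bool ReadVarint(const std::string& in, size_t* pos, uint64_t* v) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= in.size()) return false;
    const uint8_t b = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      *v = result;
      return true;
    }
  }
  return false;
}

// ZigZag maps signed values to unsigned ones so that small negative numbers
// stay short as varints: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline uint64_t ZigZagEncode(int64_t n) {
  return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

inline int64_t ZigZagDecode(uint64_t n) {
  return static_cast<int64_t>(n >> 1) ^ -static_cast<int64_t>(n & 1);
}
```

For example, 300 is encoded as the two bytes AC 02, and a sint64 field holding -1 is stored as the varint 1.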

This blog will show you how to reverse and analyze protobuf structures rapidly with off-the-shelf tools.

Extract Protobuf Structures

The .proto file is parsed into a descriptor message (a google::protobuf::FileDescriptorProto) at compile time. This message is serialized and embedded in the generated code, so it finally ends up in the compiled binary. PBTK extracts this descriptor message from the binary (Java class, pyc, etc.) and recovers the .proto file from it.

The descriptor message in the binary is used for reflection. Reflection is an important feature for implementing an RPC framework. As we know, C++ has no native support for runtime reflection, so protobuf uses the descriptor pool and vtables to implement it.

Note, however, that most protobuf code in lightweight clients is built with protobuf-lite. Protobuf-lite does not support descriptors or reflection, so a binary compiled with protobuf-lite contains no descriptor message.

Fuzz Protobuf
https://github.com/trailofbits/protofuzz is a fuzzer written in Python 3, so it's easy to use. But I don't recommend it. It has poor support for some common protobuf structures, such as repeated fields.

libprotobuf-mutator is used as a C++ library.

MyPbMessage message;
protobuf_mutator::Mutator mutator;
mutator.Seed(123);
mutator.Mutate(&message, 200);

The variable message can be any protocol message, and it doesn't need to be fully initialized. The first argument of Mutator::Mutate is a pointer to a message instance. The second argument is a hint for the maximum size, in bytes, of the mutated message. The message is mutated in place, so you can read the result from the same pointer after each call.

I think the ninja install step doesn't set up pkg-config correctly, so I use the following command to compile.

g++  -o test any_message.pb.cc test.cc -I/usr/local/include/libprotobuf-mutator -L/usr/local/lib -lprotobuf -lprotobuf-mutator -pthread

Decode and Edit Raw
In many cases, however, we can't get the .proto definition. We must decode and edit the raw data directly as a generic structure.

https://github.com/mildsunrise/protobuf-inspector is a parser written in Python. It prints a colored representation of a message's contents together with the wire types.

https://github.com/nccgroup/blackboxprotobuf is a Burp Suite extension for decoding and modifying (but not adding to) arbitrary protobuf messages without the protobuf type definition. It's written in Python.

Protobuf-inspector and Blackbox-protobuf are easy to use. But I prefer to use the native C++ APIs of protobuf.

protoc  --decode_raw < message.pb.bin

That command decodes raw data with the proto compiler. We can find the implementation in the source file google/protobuf/text_format.cc, in the method TextFormat::Printer::PrintUnknownFields.
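To make it clearer what a raw decode has to do, here is a small sketch of a top-level wire-format walker in plain C++ with no protobuf dependency. RawField and DecodeRaw are names I made up for this illustration; this is not how protoc implements --decode_raw. It only handles wire types 0, 1, 2 and 5 and does not handle groups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct RawField {
  uint32_t number = 0;     // field id taken from the tag
  uint32_t wire_type = 0;  // 0 varint, 1 fixed64, 2 length-delimited, 5 fixed32
  uint64_t value = 0;      // payload for wire types 0, 1 and 5
  std::string bytes;       // payload for wire type 2
};

static bool ReadVarint(const std::string& in, size_t* pos, uint64_t* v) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= in.size()) return false;
    const uint8_t b = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) { *v = result; return true; }
  }
  return false;
}

// Read an n-byte little-endian integer (n is 4 or 8).
static bool ReadFixed(const std::string& in, size_t* pos, size_t n, uint64_t* v) {
  if (in.size() - *pos < n) return false;
  uint64_t r = 0;
  for (size_t i = 0; i < n; ++i) {
    r |= static_cast<uint64_t>(static_cast<uint8_t>(in[*pos + i])) << (8 * i);
  }
  *pos += n;
  *v = r;
  return true;
}

// Walk the top-level fields of a serialized message without any schema.
// Each tag is a varint holding (field_number << 3) | wire_type.
bool DecodeRaw(const std::string& in, std::vector<RawField>* out) {
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(in, &pos, &tag)) return false;
    RawField f;
    f.number = static_cast<uint32_t>(tag >> 3);
    f.wire_type = static_cast<uint32_t>(tag & 7);
    if (f.number == 0) return false;  // field number 0 is never valid
    switch (f.wire_type) {
      case 0: if (!ReadVarint(in, &pos, &f.value)) return false; break;
      case 1: if (!ReadFixed(in, &pos, 8, &f.value)) return false; break;
      case 5: if (!ReadFixed(in, &pos, 4, &f.value)) return false; break;
      case 2: {
        uint64_t len = 0;
        if (!ReadVarint(in, &pos, &len) || len > in.size() - pos) return false;
        f.bytes = in.substr(pos, static_cast<size_t>(len));
        pos += static_cast<size_t>(len);
        break;
      }
      default: return false;  // groups (3, 4) are not handled in this sketch
    }
    out->push_back(f);
  }
  return true;
}
```

Feeding it the bytes 08 96 01 yields a single field with number 1, wire type 0 and value 150. Everything a schema would add (signedness, string vs. bytes vs. embedded message) is missing at this level.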

First, to decode raw data without any proto definition, we should define an empty message.

DescriptorPool pool;
FileDescriptorProto file;
file.set_name("empty_message.proto");
file.add_message_type()->set_name("EmptyMessage");
GOOGLE_CHECK(pool.BuildFile(file) != NULL);

Or just define a .proto file with an empty message structure and compile it.

EmptyMessage message;
message.ParsePartialFromZeroCopyStream(&in);
const google::protobuf::Reflection* reflection = message.GetReflection();
google::protobuf::UnknownFieldSet* ufs = reflection->MutableUnknownFields(&message);
google::protobuf::UnknownField* unf = ufs->mutable_field(3); // index in the Set, from 0
std::cout << unf->type(); // UnknownField::Type::TYPE_LENGTH_DELIMITED
std::cout << unf->number(); // field id
const std::string foo = "wqwwqeqwewww";
unf->set_length_delimited(foo);

The above code can be used to decode and edit raw data.

But there is no specific type for an embedded message. We can see how to deal with this kind of field in the method TextFormat::Printer::PrintUnknownFields.

const std::string& value = field.length_delimited();
UnknownFieldSet embedded_unknown_fields;
if (!value.empty() && embedded_unknown_fields.ParseFromString(value)) {
// This field is parseable as a Message.
// So it is probably an embedded message.
……
} else {
// This field is not parseable as a Message.
// So it is probably just a plain string.
……
}

As you can see, the exact types are lost, together with some high-level details such as encoding. But you can use these APIs to manipulate each field precisely and fix up those details yourself.
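The same "does it parse as a message?" heuristic can be sketched without the protobuf library. LooksLikeMessage below is a hypothetical helper of mine, not a protobuf API. It only checks that the bytes form a clean sequence of fields with wire types 0, 1, 2 and 5, so it is stricter than the library about groups, and like the original heuristic it can be fooled by a string that happens to be valid wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

static bool ReadVarint(const std::string& in, size_t* pos, uint64_t* v) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= in.size()) return false;
    const uint8_t b = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) { *v = result; return true; }
  }
  return false;
}

// Returns true if `in` is non-empty and consists entirely of well-formed
// fields. A true result means "probably an embedded message"; a false result
// means "treat it as a plain string or bytes".
bool LooksLikeMessage(const std::string& in) {
  if (in.empty()) return false;
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0;
    uint64_t n = 0;  // varint value, or number of payload bytes to skip
    if (!ReadVarint(in, &pos, &tag) || (tag >> 3) == 0) return false;
    const uint64_t wire_type = tag & 7;
    switch (wire_type) {
      case 0: if (!ReadVarint(in, &pos, &n)) return false; break;
      case 1: n = 8; break;
      case 2: if (!ReadVarint(in, &pos, &n)) return false; break;
      case 5: n = 4; break;
      default: return false;  // groups and unknown wire types
    }
    if (wire_type != 0) {
      if (n > in.size() - pos) return false;  // payload runs past the end
      pos += static_cast<size_t>(n);
    }
  }
  return true;
}
```

With this check, the bytes 08 96 01 count as a message, while the ASCII string "testing" does not, because its first byte decodes to an end-group tag.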

Hook serializing and parsing
If you want to hook the key points of data conversion, you should find the functions as close to the base Message interface as possible, such as MergePartialFromCodedStream and SerializePartialToCodedStream. Each message subclass overrides its own internal functions, such as _InternalParse and _InternalSerialize.

If the protobuf library and the messages are statically compiled, it is more complex to hook them at generic hook points. So I prefer to hook the functions in the messages' vtables by offset.

End
Good luck and have fun cracking protobuf ;-)

Saturday, February 8, 2020

Starbucks Account Takeover, from MITM in the Instant App — New Scene New Challenges


It was another boring "hard working" afternoon. My sleepiness was fighting against the boooooring smali code. I opened the takeaways app and wanted to order a cup of coffee.

When I tapped the Starbucks Delivers item in the app, an instant app popped up. Although instant apps embedded in other apps are common in China, it was the first time I had seen one in this takeaways app. I think this is not a public interface for ordinary developers. It's for contractors and business partners.

Business Logic

When I used it, I found an interesting piece of business logic. First, you have to bind your original Starbucks Delivers account in order to use your membership benefits and preferences. If your Starbucks account and the takeaways app use the same phone number, you can bind them with a single tap, because the takeaways app auto-populates the number, without any other authentication. How this logic is implemented is a question worth thinking about.

I analyzed the data transmitted during the process above and worked out the following authentication process.

TAA : the takeaways app
SD : Starbucks Delivers

My first thought was whether some lateral movement is possible at the decision step. That is, I use the same phone number to pass the check but change it during the account binding step. Unfortunately, after capturing the packets, I got a detailed picture of the authentication process, which shows that lateral movement is impossible at this point.

Authentication Process

The "auto-populate" and "input number & send sms" flows use the same server-side API of the takeaways app, /membership/istore/sendVerificationCode. If the phone number in the request body is the same as the number of the takeaways app account, a token is returned directly. Otherwise it returns nothing and sends an SMS to the phone. You then have to call another API of the takeaways app, /membership/istore/bindMember, to verify the code and get the token.

The "token" above is a key field in the whole authorization process. Getting the "token" from ://restapi.TAA.domain is the beginning of the authorization. Then the app calls the API ://restapi.TAA.domain/membership/istore/queryEncryptAuthInfo to encrypt more info about the account, but I think the returned cipher is reserved and not used in the following steps. Next comes the stage of the Starbucks domains. The instant app opens a webview and accesses https://3pph5.starbucks.com.cn/api_path/?token=xxxx&params=xxxx&returnUrl=/pages/join-member-redirect/join-member-redirect

There are three params in the GET request: token is the token above, params is the encrypted auth info returned above, and returnUrl is a location path in the instant app. The URL is a purely static html page. It loads a complex file named main.version_code.js to handle the APIs and communicate with the instant app framework. From that html page, it calls the APIs of https://3ppapi.starbucks.com.cn .

Call https://3ppapi.starbucks.com.cn/external/h5/member/verifyByMobile 

The token is sent as the Bearer token in the Authorization header. Another token is returned, which decodes as a JWT.
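For reference, a JWT in compact form is three base64url segments separated by dots (header, payload, signature), so reading the claims only takes a base64url decode of the middle segment. The sketch below is a generic illustration in plain C++. The helper names are mine, it does not verify the signature, and it is not the code of the Starbucks page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode base64url (RFC 4648, section 5). Padding is optional: decoding
// stops at the first '='. Returns false on any other invalid character.
bool Base64UrlDecode(const std::string& in, std::string* out) {
  uint32_t acc = 0;  // bit accumulator, only the low bits matter
  int bits = 0;      // number of bits in acc not yet emitted
  out->clear();
  for (char c : in) {
    uint32_t v;
    if (c >= 'A' && c <= 'Z') v = static_cast<uint32_t>(c - 'A');
    else if (c >= 'a' && c <= 'z') v = static_cast<uint32_t>(c - 'a') + 26;
    else if (c >= '0' && c <= '9') v = static_cast<uint32_t>(c - '0') + 52;
    else if (c == '-') v = 62;
    else if (c == '_') v = 63;
    else if (c == '=') break;
    else return false;
    acc = (acc << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out->push_back(static_cast<char>((acc >> bits) & 0xFF));
    }
  }
  return true;
}

// Extract the JSON claims (the second segment) from a compact JWT.
// The signature is NOT checked here.
bool JwtPayload(const std::string& jwt, std::string* json) {
  const size_t first = jwt.find('.');
  if (first == std::string::npos) return false;
  const size_t second = jwt.find('.', first + 1);
  if (second == std::string::npos) return false;
  return Base64UrlDecode(jwt.substr(first + 1, second - first - 1), json);
}
```

Calling JwtPayload on a made-up token whose middle segment is eyJzdWIiOiIxMjM0NTY3ODkwIn0 gives the JSON {"sub":"1234567890"}.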

Next, I use the JWT as a parameter to call the API https://3ppapi.starbucks.com.cn/external/h5/member/verifyBind3PP

That returns two fields. The field "sbuxid" is a unique identifier for Starbucks accounts (but I am not sure about its scope). The other field is also named token, but I have no idea what it means. Just ignore it. It plays no part in the rest of the process.

The above is what happens in the embedded webview and the Starbucks domains. As we can see, it is a variant of a JWT authorization process. The first token, obtained from the TAA API, works just like a session id. It is used to get the JWT from the Starbucks API. Next, we get the sbuxid and return it to the outer instant app.

Finally, call the API https://restapi.TAA.domain/membership/istore/bindMember

The parameter "partnerMemberId" in the request body is the field "sbuxid" above. Based on this field, the Starbucks account is bound to the current TAA account directly.

MITM

When I audited the traffic, I found an http redirection at the stage of loading the Starbucks html pages.

When the instant app starts loading the Starbucks domain, the first request uses https. We can't decrypt it even as a MITM. But the response is a 301 redirect to a plain http URL.




That's obviously a route adaptation issue. It leaks the token in a MITM environment, so we can use it to complete the rest of the authorization process. But in practice, there were some problems to solve.

In the first test, as soon as I caught the token in the http request, I blocked the victim's subsequent requests. But when I finally called the bindMember API to bind the victim's sbuxid to the attacker's TAA account, the API returned a system error with an ambiguous message.

More details about implementation

After auditing every API request, I found it was an error about duplicate binding. When you call the API https://3ppapi.starbucks.com.cn/external/h5/member/verifyBind3PP with the JWT, the Starbucks account is bound to the original TAA account directly. I think the original TAA account info is stored in the JWT, maybe in the field customerId or uuid. So this is a bidirectional binding process. To get the sbuxid, you must first bind the original TAA account on the Starbucks server. Then, when you call the API to bind the Starbucks account on the TAA server, the TAA back end communicates with the Starbucks server and binds them. If the current TAA account is not the same as the one that was bound previously, the API returns an error.

The whole process is maybe like the following.

Because the binding information is stored in the JWT, which can't be tampered with, we must unbind the original member before binding the attacker's TAA account. That is possible in the webview under MITM. Back at the MITM stage, the request to the http URL also returns a 301 redirect. We can redirect it to our own website instead and use cross-origin ajax requests to call the /membership/istore/bindMember and /membership/istore/unbindMember APIs, just like a normal CSRF attack. There is one basic detail that needs attention: the Content-Type should be set to application/x-www-form-urlencoded, and the request body should be converted to the corresponding format, like this:

request="{\"uniCrmId\":2634283647,\"brandCode\":2634283647184908,\"extParam\":{…}}"

Otherwise, the browser sends an OPTIONS preflight request first, and that check fails.
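The reason this works is that application/x-www-form-urlencoded is one of the content types a browser allows in a cross-origin "simple request", which is sent without a preflight, while application/json is not. As a rough sketch of the body conversion, here is a generic percent-encoder in plain C++. The helper names are mine; only the parameter name request comes from the captured body above, and a real page would do this in JavaScript.

```cpp
#include <cassert>
#include <string>

// Percent-encode a value for an application/x-www-form-urlencoded body.
// Letters, digits, '-', '_' and '.' are kept, a space becomes '+', and every
// other byte becomes %XX. This is more conservative than strictly required,
// but any server-side form decoder reverses it.
std::string FormEncode(const std::string& s) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : s) {
    const bool keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.';
    if (keep) {
      out.push_back(static_cast<char>(c));
    } else if (c == ' ') {
      out.push_back('+');
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0x0F]);
    }
  }
  return out;
}

// Wrap a JSON document in the single form field the API expects.
std::string BuildRequestBody(const std::string& json) {
  return "request=" + FormEncode(json);
}
```

For example, BuildRequestBody applied to the JSON {"a":1} produces request=%7B%22a%22%3A1%7D.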

Finally, call the bindMember API with the attacker's TAA account and use the `sbuxid` as the field `partnerMemberId` to bind them together.

Ending

As we can see in instant apps, the implementation of the business logic usually involves multi-party authentication and data sync. This is a new and more complex multi-party interaction scenario. The servers and clients communicate, overlap and even embed with each other. But most of the protocols are non-standard, and the implementation sits in a black box without effective audits. The convenience may come at the cost of complete authentication.

There are new challenges in the new mobile app scenario. We need to explore more of the potential attack surface.

Responsible Disclosure And More

I reported this bug to Starbucks on HackerOne. But it's out of scope because it requires MITM. So I am disclosing it as an example to introduce the attack scenario in instant apps.

You can learn more about this new scenario in my talk at BlackHat Asia 2020, "The inside story: there are apps in apps and here is how to break them". It's about compromising instant app frameworks rather than a specific instant app.

2019 Year in Review

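The schedule of the last week of 2019 is still fresh in my memory. On the 29th I spent 25 hours on the road, rushing from Leipzig to Xiamen for my cousin's wedding. First I had to transfer in Warsaw. It was my first time on one of Embraer's small planes, and I had even looked up the accident rate of that model beforehand, lol. What really impressed me, though, was the skill of the LOT Polish Airlines pilots: whether it was the small plane on the connecting leg or the big one back to Beijing, the landings had none of the usual bumping on touchdown. It left me with an inexplicable fondness for Poland. Maybe I can put together a trip with South China's number one Poland traveler 👊.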


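All of my work in 2019 was on Android. In 2018 I was doing web-related work while learning the basics of Android. In 2019, apart from skimming cutting-edge papers and the blogs of the big names, I basically did no hands-on web work. cc says he got PTSD from Android reversing. Honestly, I have a kind of PTSD too, and it is definitely far scarier than Android reversing. After all, I reversed a dozen or so large apps this year, and about Android app reversing all I can say is "yare yare 👴". Doing baseline checks on more than 200 bank websites is the real fucking nightmare. Yes, the kind of bank website that bans you for adding a single quote. Honestly, I like doing penetration testing, but for the past two years I have felt there was something vague that I could never quite grasp. To my mind, penetration testing is not "baseline" checking, and web security is not a pile of tricks. But I was very lost. Master V went to Ant, Teacher Tou left to start a company, and I can't be as down-to-earth about technology as Teacher Tou. My ideas have always been all over the place and I have always been in a rather unstable state, otherwise I wouldn't have insisted on taking the CFA exam a few years ago, lol. I need a very clear technical direction, research roadmap and training method. 2019 was my attempt at guiding myself. Rather than saying I was doing Android, it is more accurate to say I picked the most open client to study, in order to understand the implementations and research methods of languages, systems, kernels, authentication and so on, and to understand what security research is actually researching. That is exactly my biggest takeaway. Over the past twenty years, the systems developers have built have kept being replaced and upgraded, growing in number and in structural complexity, but their design ideas have always interacted and blended with one another.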

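As for Android specifically, from the second half of 2018 to the first half of 2019 I was doing app reversing and vulnerability work. I produced quite a lot both internally and externally, but all of it concerns domestic targets, so it is not convenient to disclose. There were still plenty of fun things. For example, when I was reversing protobuf earlier, I used a sneaky trick of replacing the vtable. Later another project needed to do the same thing, so I sent my source code to a colleague, but with my method his pointer would just fly off into nowhere. In the end he reversed the binary I had compiled earlier and found that, compiled from the same source code, one variable ended up with a different lifetime. I don't remember the details. It was something like a C++ function returning a string pointer. In the second half of the year, for a product project in my group, I spent six months on the Android system, supporting the lower layers of the product. In short, I was frantically patching a certain open source project. Some bugs you can always track down by reading the code, while others truly deserve to be called voodoo bugs. The most magical bug I solved was related to a deadlock during JVM class initialization, and what is even more magical is that although I fixed it, I never understood why the fix worked. Besides patching existing open source projects, when I ran into problems that could not be solved within the existing design, I had to hack the system by force, which gave me a deep understanding of ART's class loading mechanism and OAT's compilation mechanism.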

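Uh, working in a lab, sometimes I can only write blog posts in this hemming, hawing, half-concealed way. Confidentiality still has to be maintained, and I don't really know what I can and cannot say. This is my personal blog.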

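In the second half of 2019 I gradually started shifting my research focus, because research at the application layer is not going to yield new attack surface anymore, and I had already turned this year's app-layer work into some tools, products and talks. If I kept doing Android, the only way forward would be the system and the kernel, but honestly, my interest in the client side is not that strong. It just doesn't excite me enough. For a while after that I did some research on cloud and identity, but it is still in progress, so I won't go into it here. Later I attended an internal lab talk on Android kernel privilege escalation, which gave me a lot of inspiration in other directions.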

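There was another big item in 2019 that took a huge amount of my time, energy and ¥¥¥: fixing my teeth. I grew up in the countryside. In the first few years after I was born, the village had a well in a badly chosen spot whose water had a severely excessive fluoride content, so everyone of my generation there has dental fluorosis: the enamel is poorly developed and the teeth are brittle. During university I had my teeth treated several times at a clinic back home, and this year all of that had to be opened up and the root canals redone. The root canals done back home on my two front teeth were faulty. There had been a fistula for several years that I never dealt with, and it became inflamed. When I went for treatment, the X-ray showed that the abscess cavity had already eroded the bone. The doctor said even an implant would be a problem in this state, so the only option was to remove the crowns, redo the root canals, and then watch how it heals. Over the year I spent tens of thousands of yuan on my teeth, and after every visit my immunity was terribly low and I caught colds very easily.
I even asked my girlfriend whether I could sue the previous dentist. She told me that China has no lawyers specializing in medical disputes, because there is too little money in it. Thinking about it, that makes sense. It's not like the US here: getting a tooth treated back home, crown included, costs only one or two thousand yuan, so how much could you be awarded in a dispute? Put bluntly, the physical, mental, time and financial losses you suffer are dozens of times that price, but what can you do?
At its root this is actually a very deep question, one about how society operates. I'm a layman, after all, and if I say too much I'm afraid I'll be deleting this post in shame in a couple of years. So just one opinion: I am infinitely bullish on the development of China's insurance industry.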

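Uh, before I knew it, it's almost one o'clock. On second thought, let's leave it here. I just felt like writing something on a whim. I haven't written anything for a long, long time.
Oh, by the way, for the past six months I have really enjoyed listening to Eminem. For a while I listened to a lot of rappers, which only made me realize that I probably don't like rap all that much, yet I really do love listening to Eminem. It's rather strange. So is this a particular style of rap? Are there other representative artists? I hope someone can recommend some to me.
Finally, I wish everyone good health in the new year.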