>> Pokémon GO - REVISITING THE "HACKING" SCENE (PART 5)
Oh the joys of implementing heuristics and making sense of all the numbers
as they fly by.
In my
previous entry
I provided the basics for being able to utilize the Unicorn CPU emulator
and Captsone disasssembler to provided the ability to implement heuristics
to help understand the flow of execution and how to detect when things happen.
In this entry, I will demonstrate the code modifications required to detect
loading of constant values into 32-bit registers - which could be useful for
tracking down some of the magic numbers in the Pokémin GO hash function.
I identified earlier a pattern of ARM opcodes where we could detect where
constants were loaded into a 32-bit registers, more specifically a combination
of the movw and movt instructions. This is the most simplest
pattern to detect - so, how would we go about implementing it?
address: 0x01d8d2c6
opcodes: 4d f2 22 55 :: movw r5, #0xd522
opcodes: c1 f6 aa 55 :: movt r5, #0x1daa <-- R05 0x1daad522
We already have the ability to trace the ARM opcodes; the trick now is to
hook into the place where the instruction is shown and start tracking the
use of the movw and movt instructions - if we see that
two happen next to each other on the same register, we can flag it. For
example:
// display the instruction (hack)
{
cs_insn* insn;
int i, cnt;
static int movlo = 0;
static int movhi = 0; // static variables for heuristics
cnt = cs_disasm(cs, &code_buffer[address], 10, address, 1, &insn);
if (cnt > 0)
{
printf("opcodes: ");
for (i=0; isize; i++) printf("%02x ", insn->bytes[i]);
for (i=insn->size; i<4; i++) printf(" ");
printf(":: %s\t%s", insn->mnemonic, insn->op_str);
// detect when movw/movt happens
if (strncmp(insn->mnemonic, "movw", 4) == 0) movlo = 1;
else if (strncmp(insn->mnemonic, "movt", 4) == 0) movhi = 1;
else { movhi = 0; movlo = 0; }
// do we have a movw and movt in succession?
if ((movhi == 1) && (movlo == 1))
{
// YAY - detection of movw/movt in succession
}
cs_free(insn, cnt);
}
}
The basic idea is to use two flags, namely movhi and movlo
in the code that get reset when neither of the two instructions happen - in
the event that both of the flags are set, we know that we have found a case
where the movw and movt instructions happen in sequence
(in either order). Of course; additional checks are needed to make sure
the instructions are on the same register.
Unfortunately, in most cases it will not be as simple as this - here are some
more use-cases:
>> address: 0x01b17d44
opcodes: 4f f2 68 50 :: movw r0, #0xf568
opcodes: 40 f6 80 12 :: movw r2, #0x980
opcodes: c0 f2 1e 10 :: movt r0, #0x11e <-- R00 0x011ef568
opcodes: c5 f2 85 42 :: movt r2, #0x5485 <-- R02 0x54850980
In the above case; we have movw and movt instructions; but
they are not in sequence.
>> address: 0x01b30d3e
opcodes: 46 f2 20 55 :: movw r5, #0x6520
opcodes: 4f f0 06 0a :: mov.w sl, #6
opcodes: c2 e9 00 03 :: strd r0, r3, [r2]
opcodes: c0 f2 f6 15 :: movt r5, #0x1f6 <-- R05 0x01f66520
opcodes: 05 98 :: ldr r0, [sp, #0x14]
opcodes: 2e 46 :: mov r6, r5
opcodes: 44 f6 04 65 :: movw r5, #0x4e04 <-- R05 0x01f64e04
opcodes: 07 9c :: ldr r4, [sp, #0x1c]
opcodes: d0 e9 00 02 :: ldrd r0, r2, [r0]
opcodes: c1 f6 bf 75 :: movt r5, #0x1fbf <-- R05 0x1fbf4e04
Another case; we have two pairs of movw and movt - yet
they yield three different constant cases.
address: 0x01b17e1a
opcodes: 02 20 :: movs r0, #2
Let's not forget there is also another operand that could impact the
detection; specifically movs that sets low bytes (8bit). All of
a sudden; you start to get an idea that implementing some pattern detection
may be a bit of work/complicated - but, isn't impossible if you put your
mind to it.
In order to make sure as much as the code is executed in the hash function;
use the test vector:
:: Hash(buffer, sizeof(buffer) [iOS code]
61 24 7f bf 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 len=133
0x65c88067392184af
done
Using the scripts in the research/0.45.0/ folder; I was able to
verify that all the magic numbers were in fact loaded - I will not demonstrate
that here, you will just need to have faith and trust me. With that said
and done; how many constants were loaded - and how many of them were unique?
$ ./pogohash > x
$ grep LOAD x | cut -f4 -d\ | wc -l
525
$ grep LOAD x | cut -f4 -d\ | sort | uniq | wc -l
361
The results are in. 525 cases were detected; and 361 one of them were unique.
Now; if we were to assume that the hash function will not change in the
next release (there has been no change between 0.41, 0.43
and 0.45) - it would be a lot easier to brute-force with 361 values
instead of 4,294,967,296 of them (232). By
isolating the ones we have detected; we have drastically reduced the amount
of possibilities.
So; what is the magic behind all of this?
Well - it is in fact quite complicated; implemented in around 100 lines of
code. The basic gist of it is that not only do we need to keep track of
the movw/movs and movt instructions; it needs to be
done on a register level. We must also validate that nothing else happens
to the register - for example; it may be modified by another instruction
and then isn't a constant load. Finally, an assumption that the span of
instructions should be limited to filter out false positives.
It definitely is way over the head of most people - not something that
could be explained in a few sentences, but nothing beats looking at the
source code. Of course; you must have a good understanding of the C
programming language - hopefully I have put sufficient documentation in
there to assist with explaining how it all works.
I may finalize the blog post series to show how one would go about
brute-forcing an attack to isolate which, out of the 361 constants found
would be the right magic numbers. Of course; that would only be useful
if Niantic doesn't change the hashing function altogether in the next
update.
I must state that this is purely for educational purposes - it has been
quite an interesting experience.