OCTAGRAM August 20, 2017 at 08:56

Breaking haXe completely: reading machine code like an open book

Tutorial

When haXe is translated to C++ and from there compiled to machine code, reverse engineering may seem hopeless, especially since at first glance the code is full of virtual method calls that are hard to match to the addresses of the method bodies without running a debugger.

But things are not that bad. Even with scripting support (HXCPP_SCRIPTABLE) disabled, strings with the names of methods and fields can be found in the file. Let's look at how to untangle this knot and match method names to their addresses and to their offsets in the virtual method table.

A bit of theory

After translation to C++, all haXe classes inherit from hx::Object (defined in hxcpp/include/hx/Object.h). The following methods are of particular interest:

   virtual Dynamic __Field(const String &inString, hx::PropertyAccess inCallProp);
   virtual Dynamic __SetField(const String &inField,const Dynamic &inValue, hx::PropertyAccess inCallProp);

These methods are overridden in the classes translated to C++, and their implementation looks roughly the same everywhere:

src/openfl/geom/Matrix.cpp

Dynamic Matrix_obj::__Field(const ::String &inName,hx::PropertyAccess inCallProp)
{
	switch(inName.length) {
	case 1:
		if (HX_FIELD_EQ(inName,"a") ) { return a; }
		if (HX_FIELD_EQ(inName,"b") ) { return b; }
		if (HX_FIELD_EQ(inName,"c") ) { return c; }
		if (HX_FIELD_EQ(inName,"d") ) { return d; }
		break;
	case 2:
		if (HX_FIELD_EQ(inName,"tx") ) { return tx; }
		if (HX_FIELD_EQ(inName,"ty") ) { return ty; }
		break;
	case 5:
		if (HX_FIELD_EQ(inName,"clone") ) { return clone_dyn(); }
		if (HX_FIELD_EQ(inName,"scale") ) { return scale_dyn(); }
		if (HX_FIELD_EQ(inName,"setTo") ) { return setTo_dyn(); }
		break;
	case 6:
		if (HX_FIELD_EQ(inName,"concat") ) { return concat_dyn(); }
		if (HX_FIELD_EQ(inName,"equals") ) { return equals_dyn(); }
		if (HX_FIELD_EQ(inName,"invert") ) { return invert_dyn(); }
		if (HX_FIELD_EQ(inName,"rotate") ) { return rotate_dyn(); }
		break;
	case 7:
		if (HX_FIELD_EQ(inName,"__array") ) { return __array; }
		if (HX_FIELD_EQ(inName,"toArray") ) { return toArray_dyn(); }
		break;
	case 8:
		if (HX_FIELD_EQ(inName,"copyFrom") ) { return copyFrom_dyn(); }
		if (HX_FIELD_EQ(inName,"identity") ) { return identity_dyn(); }
		if (HX_FIELD_EQ(inName,"toString") ) { return toString_dyn(); }
		break;
	case 9:
		if (HX_FIELD_EQ(inName,"copyRowTo") ) { return copyRowTo_dyn(); }
		if (HX_FIELD_EQ(inName,"createBox") ) { return createBox_dyn(); }
		if (HX_FIELD_EQ(inName,"translate") ) { return translate_dyn(); }
		break;
	case 10:
		if (HX_FIELD_EQ(inName,"to3DString") ) { return to3DString_dyn(); }
		break;
	case 11:
		if (HX_FIELD_EQ(inName,"copyRowFrom") ) { return copyRowFrom_dyn(); }
		if (HX_FIELD_EQ(inName,"setRotation") ) { return setRotation_dyn(); }
		if (HX_FIELD_EQ(inName,"toMozString") ) { return toMozString_dyn(); }
		if (HX_FIELD_EQ(inName,"__toMatrix3") ) { return __toMatrix3_dyn(); }
		break;
	case 12:
		if (HX_FIELD_EQ(inName,"copyColumnTo") ) { return copyColumnTo_dyn(); }
		if (HX_FIELD_EQ(inName,"__transformX") ) { return __transformX_dyn(); }
		if (HX_FIELD_EQ(inName,"__transformY") ) { return __transformY_dyn(); }
		break;
	case 13:
		if (HX_FIELD_EQ(inName,"__cleanValues") ) { return __cleanValues_dyn(); }
		break;
	case 14:
		if (HX_FIELD_EQ(inName,"copyColumnFrom") ) { return copyColumnFrom_dyn(); }
		if (HX_FIELD_EQ(inName,"transformPoint") ) { return transformPoint_dyn(); }
		break;
	case 16:
		if (HX_FIELD_EQ(inName,"__transformPoint") ) { return __transformPoint_dyn(); }
		break;
	case 17:
		if (HX_FIELD_EQ(inName,"createGradientBox") ) { return createGradientBox_dyn(); }
		break;
	case 19:
		if (HX_FIELD_EQ(inName,"deltaTransformPoint") ) { return deltaTransformPoint_dyn(); }
		if (HX_FIELD_EQ(inName,"__transformInverseX") ) { return __transformInverseX_dyn(); }
		if (HX_FIELD_EQ(inName,"__transformInverseY") ) { return __transformInverseY_dyn(); }
		break;
	case 22:
		if (HX_FIELD_EQ(inName,"__translateTransformed") ) { return __translateTransformed_dyn(); }
		break;
	case 23:
		if (HX_FIELD_EQ(inName,"__transformInversePoint") ) { return __transformInversePoint_dyn(); }
	}
	return super::__Field(inName,inCallProp);
}

As you can see, from the haXe translator's point of view, fields include not only what is usually considered a field but also methods, although the methods come wrapped in dynamic wrappers from which they still have to be extracted.

Preparation

Accordingly, start by finding the __Field method. For example, you can get into it by following a cross-reference to a string with a method name. If you simply browse the strings in the file, the cross-references may instead lead you to, for example, __ToString or RTTI; from there, follow the cross-references further to reach the VMT. If the string is a field name, you may land in the similar __SetField method instead of __Field; it is less useful because it holds no references to the dynamic wrappers of methods. Once in the VMT, open the overridden methods (they stand out by their addresses) and look for the ones that resemble __Field (a large switch is visible at the beginning):

Beginning of __Field

.text:010B3DB8 var_30          = -0x30
.text:010B3DB8 var_2C          = -0x2C
.text:010B3DB8 var_28          = -0x28
.text:010B3DB8 var_20          = -0x20
.text:010B3DB8
.text:010B3DB8                 PUSH.W          {R4-R9,LR}
.text:010B3DBC                 SUB             SP, SP, #0x14
.text:010B3DBE                 MOV             R7, R2
.text:010B3DC0                 MOV             R4, R0
.text:010B3DC2                 LDR             R0, [R7]
.text:010B3DC4                 MOV             R9, R3
.text:010B3DC6                 MOV             R5, R1
.text:010B3DC8                 SUBS            R0, #4  ; switch 28 cases
.text:010B3DCA                 CMP             R0, #0x1B
.text:010B3DCC                 BHI.W           def_10B3DD0 ; jumptable 010B3DD0 default case
.text:010B3DD0                 TBH.W           [PC,R0,LSL#1] ; switch jump
.text:010B3DD0 ; ---------------------------------------------------------------------------
.text:010B3DD4 jpt_10B3DD0     DCW 0x1C                ; jump table for switch statement
.text:010B3DD6                 DCW 0x35

Beginning of __SetField

.text:010B48DC var_38          = -0x38
.text:010B48DC var_30          = -0x30
.text:010B48DC var_28          = -0x28
.text:010B48DC var_24          = -0x24
.text:010B48DC var_20          = -0x20
.text:010B48DC arg_0           =  0
.text:010B48DC
.text:010B48DC                 PUSH.W          {R4-R9,LR}
.text:010B48E0                 SUB             SP, SP, #0x1C
.text:010B48E2                 MOV             R7, R2
.text:010B48E4                 MOV             R8, R0
.text:010B48E6                 LDR             R0, [R7]
.text:010B48E8                 MOV             R6, R3
.text:010B48EA                 LDR             R5, [SP,#0x38+arg_0]
.text:010B48EC                 MOV             R9, R1
.text:010B48EE                 SUBS            R0, #6  ; switch 13 cases
.text:010B48F0                 CMP             R0, #0xC
.text:010B48F2                 BHI.W           def_10B48F6 ; jumptable 010B48F6 default case
.text:010B48F6                 TBH.W           [PC,R0,LSL#1] ; switch jump
.text:010B48F6 ; ---------------------------------------------------------------------------
.text:010B48FA jpt_10B48F6     DCW 0xD                 ; DATA XREF: .text:01329970↓o
.text:010B48FA                                         ; jump table for switch statement
.text:010B48FC                 DCW 0x25

__Field comes before __SetField in the virtual method table, and __SetField usually has fewer switch cases: in this example, 13 versus 28.
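The case counts can be read straight from the switch prologue. The sketch below is only an illustration (decode_switch_range and tbh_target are names made up for this example, not part of IDA or hxcpp). It derives the covered string lengths from the SUBS and CMP immediates and computes where a TBH table entry jumps, assuming Thumb-2 code in which PC reads as the instruction address plus 4.

```cpp
#include <cassert>
#include <cstdint>

// String lengths covered by a "SUBS R0, #bias / CMP R0, #limit / BHI default"
// prologue in front of a TBH jump table.
struct SwitchRange {
    uint32_t min_length;
    uint32_t max_length;
    uint32_t case_count;
};

SwitchRange decode_switch_range(uint32_t subs_imm, uint32_t cmp_imm) {
    // BHI takes the default branch when (length - subs_imm) > cmp_imm
    // (unsigned), so the table holds cmp_imm + 1 entries and the first
    // entry corresponds to length == subs_imm.
    return {subs_imm, subs_imm + cmp_imm, cmp_imm + 1};
}

// Branch target of one TBH table entry. PC reads as the address of the TBH
// instruction plus 4, which is also where the halfword table begins.
uint32_t tbh_target(uint32_t tbh_address, uint16_t table_entry) {
    assert((tbh_address & 1u) == 0);  // Thumb instructions are halfword-aligned
    return tbh_address + 4 + 2u * table_entry;
}
```

For the listings above, decode_switch_range(4, 0x1B) gives lengths 4 to 31 (28 cases) for __Field, and decode_switch_range(6, 0xC) gives lengths 6 to 18 (13 cases) for __SetField. The count includes lengths for which no name exists; their table entries simply lead to the default case. The first __Field table entry, 0x1C, resolves to tbh_target(0x010B3DD0, 0x1C) = 0x010B3E0C.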

The first stage: looking for dynamic wrappers

When both methods are found, go to __Field, look at where each branch leads after 0 == memcmp, and give names to the wrappers. You will come across both ordinary fields and wrappers. Telling them apart is easy; here is an example of an ordinary field, followed by a dynamic wrapper for a method:

.text:010B44B0 loc_10B44B0                             ; CODE XREF: __Field+16A↑j
.text:010B44B0                 LDR             R0, [R5,#0x20]
.text:010B44B2                 B               loc_10B4582
.text:010B44B4 ; ---------------------------------------------------------------------------
.text:010B44B4
.text:010B44B4 loc_10B44B4                             ; CODE XREF: __Field+1B0↑j
.text:010B44B4                 LDR             R2, =(get_error_dyn+1 - 0x10B44BA)
.text:010B44B6                 ADD             R2, PC ; get_error_dyn
.text:010B44B8                 B               loc_10B44D2

Occasionally (though not in this file) pointers to wrappers are not recognized: the operand shows up as an abnormally large integer highlighted in orange. In IDA, convert it to an offset with Ctrl+R.
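The operand looks like a huge integer because the literal stores the wrapper address relative to PC. A minimal sketch of the arithmetic follows, assuming Thumb code where PC reads as the address of the ADD instruction plus 4 and where bit 0 of a code pointer marks Thumb state. CodePointer and resolve_pc_relative are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

struct CodePointer {
    uint32_t address;  // target address with the Thumb bit cleared
    bool thumb;        // true when bit 0 of the raw pointer was set
};

// Resolves the pair "LDR Rd, =literal" / "ADD Rd, PC" found in Thumb code.
// add_address is the address of the ADD instruction itself.
CodePointer resolve_pc_relative(uint32_t literal, uint32_t add_address) {
    assert((add_address & 1u) == 0);  // instruction addresses are even
    // Unsigned arithmetic wraps modulo 2^32, just like the register add.
    uint32_t raw = literal + (add_address + 4);
    return {raw & ~1u, (raw & 1u) != 0};
}
```

In the listing above the ADD sits at 0x10B44B6, so PC reads as 0x10B44BA, which is exactly the constant IDA shows inside the literal expression; the "+1" in that expression is the Thumb bit. For a wrapper at a made-up address 0x010B3000, the stored literal would be 0x010B3001 - 0x010B44BA, a negative number that is displayed as a very large unsigned operand until it is converted to an offset.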

The second stage: the simplest cases

To begin with, let's look at how methods and their wrappers are laid out after translation to C++:

src/openfl/geom/Matrix.cpp

// …
::lime::math::Matrix3 Matrix_obj::__toMatrix3( ){
	HX_STACK_FRAME("openfl.geom.Matrix","__toMatrix3",0xaf6ed17e,"openfl.geom.Matrix.__toMatrix3","openfl/geom/Matrix.hx",480,0xa0d54189)
	HX_STACK_THIS(this)
	HX_STACK_LINE(482)
	Float tmp = this->a;		HX_STACK_VAR(tmp,"tmp");
	HX_STACK_LINE(482)
	Float tmp1 = this->b;		HX_STACK_VAR(tmp1,"tmp1");
	HX_STACK_LINE(482)
	Float tmp2 = this->c;		HX_STACK_VAR(tmp2,"tmp2");
	HX_STACK_LINE(482)
	Float tmp3 = this->d;		HX_STACK_VAR(tmp3,"tmp3");
	HX_STACK_LINE(482)
	Float tmp4 = this->tx;		HX_STACK_VAR(tmp4,"tmp4");
	HX_STACK_LINE(482)
	Float tmp5 = this->ty;		HX_STACK_VAR(tmp5,"tmp5");
	HX_STACK_LINE(482)
	::lime::math::Matrix3 tmp6 = ::lime::math::Matrix3_obj::__new(tmp,tmp1,tmp2,tmp3,tmp4,tmp5);		HX_STACK_VAR(tmp6,"tmp6");
	HX_STACK_LINE(482)
	return tmp6;
}
HX_DEFINE_DYNAMIC_FUNC0(Matrix_obj,__toMatrix3,return )
Void Matrix_obj::__transformInversePoint( ::openfl::geom::Point point){
{
		HX_STACK_FRAME("openfl.geom.Matrix","__transformInversePoint",0xde42fb73,"openfl.geom.Matrix.__transformInversePoint","openfl/geom/Matrix.hx",487,0xa0d54189)
// …

You can see that the method body comes first, then the macro builds its dynamic wrapper, then comes the next method, then its dynamic wrapper, and so on. Since the wrappers were named during the first stage but the methods themselves have not been named yet, IDA's list of subroutines should show a “striped” pattern in which named subroutines alternate with unnamed ones.

This is not entirely accurate, but at this stage only the most obvious cases need to be handled: when there is exactly one subroutine between two dynamic wrappers, it is most likely the method. It gets its name from the wrapper below it.

Caution: there were cases (not in this file) where IDA did not recognize the method body as a subroutine but did recognize some auxiliary code that followed the method. Such a method is tracked down from the VMT.
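The rule for this stage is mechanical enough to sketch in code. The example below is a hypothetical model, not an IDA script: the list of subroutines is represented as a vector of names in address order, with an empty string standing for an unnamed subroutine, and name_single_gaps is a name made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Subroutine names in address order; "" means "not named yet".
using RoutineNames = std::vector<std::string>;

// A dynamic wrapper is recognized here purely by its "_dyn" suffix.
static bool is_wrapper(const std::string &name) {
    const std::string suffix = "_dyn";
    return name.size() > suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

static std::string strip_dyn(const std::string &wrapper) {
    assert(is_wrapper(wrapper));
    return wrapper.substr(0, wrapper.size() - 4);
}

// Names every unnamed subroutine that sits alone between two wrappers,
// taking the name from the wrapper below it. Returns how many were named.
int name_single_gaps(RoutineNames &routines) {
    int named = 0;
    for (std::size_t i = 1; i + 1 < routines.size(); ++i) {
        if (routines[i].empty() && is_wrapper(routines[i - 1]) &&
            is_wrapper(routines[i + 1])) {
            routines[i] = strip_dyn(routines[i + 1]);
            ++named;
        }
    }
    return named;
}
```

Given the list clone_dyn, (unnamed), concat_dyn, (unnamed), (unnamed), copyFrom_dyn, only the second entry is named (it becomes concat); the run of two unnamed subroutines is deliberately left for a later stage.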

The third stage: when there are two subroutines between the wrappers

Dynamic wrappers are created with a macro that looks like this:

#define HX_DEFINE_DYNAMIC_FUNC0(class,func,ret) \
 ::Dynamic __##class##func(hx::Object *inObj) \
{ \
      ret reinterpret_cast<class *>(inObj)->func(); return  ::Dynamic(); \
}; \
 ::Dynamic class::func##_dyn() \
{\
   return hx::CreateMemberFunction0(this,__##class##func); \
}

As you can see, two wrappers are created here at once, a typed one and an untyped one, but the typed one is usually discarded by the C++ compiler as unnecessary. If there are two unnamed subroutines between dynamic wrappers, then most likely the first of them is the desired method and the second is the typed wrapper.

By the start of the third stage most methods should already be named, so when viewed from the VMT the remaining cases show up as single gaps, and this stage eliminates them.
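To see why a single use of the macro can leave more than one extra subroutine in the binary, here is a heavily simplified, self-contained analogue. Object, BoundMethod0 and SKETCH_DEFINE_DYNAMIC_FUNC0 are hypothetical stand-ins invented for this illustration, not hxcpp's real definitions; the sketch also uses static_cast and a std::string return value to keep it portable.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for hx::Object.
struct Object {
    virtual ~Object() = default;
};

// Hypothetical stand-in for the callable that a func_dyn() wrapper returns.
struct BoundMethod0 {
    Object *self;
    std::string (*helper)(Object *);
    std::string call() const {
        assert(self != nullptr && helper != nullptr);
        return helper(self);
    }
};

// Simplified analogue of HX_DEFINE_DYNAMIC_FUNC0: one use emits a free helper
// taking a base pointer plus the member func##_dyn() that binds the helper.
// (hxcpp itself uses reinterpret_cast and a "__" prefix for the helper name.)
#define SKETCH_DEFINE_DYNAMIC_FUNC0(cls, func)                 \
    std::string helper_##cls##_##func(Object *inObj) {         \
        return static_cast<cls *>(inObj)->func();              \
    }                                                          \
    BoundMethod0 cls::func##_dyn() {                           \
        return BoundMethod0{this, helper_##cls##_##func};      \
    }

struct Matrix_obj : Object {
    std::string toString() { return "matrix"; }  // the method body
    BoundMethod0 toString_dyn();                 // the dynamic wrapper
};

// Emits the helper and the wrapper right after the method body.
SKETCH_DEFINE_DYNAMIC_FUNC0(Matrix_obj, toString)
```

Calling Matrix_obj().toString_dyn().call() goes through the helper and reaches toString(). In the compiled file the method, the helper and the wrapper are three distinct candidates for subroutines, which is why a gap between two named wrappers may hold either one or two unnamed entries.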

The fourth stage: closing large gaps in VMT

Sometimes there are large gaps in the VMT, spanning two or more methods. Here again it is more convenient to look from the VMT. If you miss one method while walking through __Field, it shows up as three unnamed subroutines between dynamic wrappers in IDA's list of subroutines; but haXe can also generate extra subroutines for other purposes, which likewise produces three unnamed subroutines between dynamic wrappers.

From the VMT the difference is visible: a gap of two elements means a dynamic wrapper was missed in __Field. Find this gap in the list of subroutines and go to the middle subroutine, which should be the wrapper. Press X to open the list of cross-references; __Field should be among them. Go there and find out the wrapper's name; the gap in the list of subroutines becomes “striped”, and the methods can then be named using the algorithm already described.

hx::Object methods

For completeness, you can open hxcpp/include/hx/Object.h, write out all the virtual methods in order, and thus identify the methods at the beginning of the VMT.
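Spotting such gaps can be modelled in a few lines. This is a sketch under stated assumptions, not an IDA script: the VMT is represented as a vector of slot names in table order, an empty string marks a slot whose target has no name yet, and Gap and find_unnamed_runs are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Gap {
    std::size_t first_slot;  // index of the first unnamed slot in the run
    std::size_t length;      // number of consecutive unnamed slots
};

// Collects runs of unnamed slots in a VMT listed in slot order.
std::vector<Gap> find_unnamed_runs(const std::vector<std::string> &vmt) {
    std::vector<Gap> gaps;
    std::size_t i = 0;
    while (i < vmt.size()) {
        if (!vmt[i].empty()) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < vmt.size() && vmt[i].empty()) {
            ++i;
        }
        assert(i > start);
        gaps.push_back({start, i - start});
    }
    return gaps;
}
```

A run of length 1 is a case for the earlier stages, while a run of length 2 is the signal described above: one of the two slots is probably a wrapper that was overlooked in __Field.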

Defining data types for fields and arguments

When methods are called on fields and arguments (and these calls are virtual, like all method calls), you need to know which VMT to look them up in, and for that you need to know their types. Without running a debugger, dynamic wrappers help here. They receive arguments of the formal types (Dynamic, Dynamic, Dynamic, ...) and, before making the call, cast each Dynamic to the actual type the method expects. That conversion is exactly where the types can be recognized.

For example, if we see in the body of the wrapper:

.text:010B3884                 LDR             R1, =(off_23DE1D4 - 0x10B388E)
.text:010B3886                 MOVS            R3, #0
.text:010B3888                 LDR             R2, =(off_23E04A0 - 0x10B3890)
.text:010B388A                 ADD             R1, PC ; off_23DE1D4
.text:010B388C                 ADD             R2, PC ; off_23E04A0
.text:010B388E                 LDR             R1, [R1] ; hx_Object_ci
.text:010B3890                 LDR             R2, [R2] ; off_22D9DE0
.text:010B3892                 BLX.W           __dynamic_cast

... you can see a cast from hx::Object to something else. If you have not yet identified hx_Object_ci, both classes will be unknown, but that can be resolved: look at whose RTTI the pointers lead to (off_22D9DE0 in this example), assign names, and draw conclusions.

Similarly, __SetField comes in handy for fields: it has to cast from Dynamic to the actual type of the field, which gives a hint.
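In the Itanium C++ ABI, __dynamic_cast takes the object pointer, the source type_info, the destination type_info and an offset hint, which matches R1, R2 and R3 in the listing. Each type_info points to a name string in mangled form. The sketch below decodes only the simplest such names; decode_rtti_name is a hypothetical helper, and the sample strings in the note after it are illustrative guesses at how haXe classes would appear, not strings taken from the analyzed file.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Decodes "<len><id>" and "N<len><id>...E" type names. Templates,
// substitutions and qualifiers are not handled; "" means "not decoded".
std::string decode_rtti_name(const std::string &mangled) {
    std::size_t pos = 0;
    const bool nested = !mangled.empty() && mangled[0] == 'N';
    if (nested) ++pos;

    std::string result;
    std::size_t components = 0;
    while (pos < mangled.size() &&
           std::isdigit(static_cast<unsigned char>(mangled[pos]))) {
        // Read the decimal length prefix of the next identifier.
        std::size_t length = 0;
        while (pos < mangled.size() &&
               std::isdigit(static_cast<unsigned char>(mangled[pos]))) {
            length = length * 10 + static_cast<std::size_t>(mangled[pos] - '0');
            if (length > mangled.size()) return "";
            ++pos;
        }
        if (length == 0 || length > mangled.size() - pos) return "";
        if (components++ > 0) result += "::";
        result += mangled.substr(pos, length);
        pos += length;
    }
    assert(pos <= mangled.size());

    if (nested) {
        // A nested name must be closed by 'E'.
        if (pos == mangled.size() || mangled[pos] != 'E') return "";
        ++pos;
    } else if (components != 1) {
        return "";
    }
    return pos == mangled.size() ? result : "";
}
```

With this helper, "N2hx6ObjectE" decodes to hx::Object and "N6openfl4geom10Matrix_objE" decodes to openfl::geom::Matrix_obj, which is enough to put names on both operands of a cast.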

Static Fields and Methods

If a class has static members, it defines the static methods __GetStatic and/or __SetStatic. For obvious reasons they do not appear in the VMT, but if the class has both static and instance members, the translated code is laid out in the order __Field, __GetStatic, __SetField, __SetStatic, so once you know where __Field and __SetField are, you can locate __GetStatic and __SetStatic next to them. They also begin with a switch on the string length, followed by a comparison.

Screencast

00:00 We find __Field and __SetField
03:00 The first stage: we are looking for dynamic wrappers
21:30 The second stage: the simplest cases
30:48 The third stage: when there are two subroutines between the wrappers
33:15 The fourth stage: we close large gaps in VMT
49:00 hx::Object methods
