Given the recently patched vulnerabilities in Adobe Illustrator (CVE-2012-2023, CVE-2012-2024, CVE-2012-2025, and CVE-2012-2026), it is interesting to analyze the exploitability of the .ai file format. Early versions of the AI file format are true EPS files with a restricted, compact syntax, with additional semantics represented by Illustrator-specific DSC comments that conform to DSC's Open Structuring Convention. Originally, the AI file format was an augmented subset of PostScript/EPS, and its internals up to version 7 are described here. This EPS-based file format can still be opened with modern Adobe software, but nowadays it is embedded in a PDF shell file. As PostScript is itself a programming language with conditionals, loops and everything else, it is worth researching what can be done with it in the different programs that accept this format. For PostScript details see this, this or this.
PostScript Heap Spray
The Illustrator operator 'XI' is used to embed and paste a 'raster' image in an illustration. From the Adobe Illustrator File Format Specification:
[ a b c d tx ty ] llx lly urx ury h w bits ImageType AlphaChannelCount reserved bin-ascii ImageMask XI
Arguments to the XI operator specify the location and size of the image, its pixel bit depth, color type, and other attributes
By playing with the width and height of the image we can easily make Illustrator allocate 1M of controlled data. For example, an image of 128x8190 pixels will consume 1 megabyte of memory. Note that both the width and the height must be less than 32k pixels or the image will hit an implementation limit, so dimensions like 1x1048576 are out of the question. Here is an example of a file which allocates ~1M of "A"s, IllustratorHS1M.ai:
The metadata needed to represent this image in memory consumes nearly 0x100 extra bytes. We need to ask for slightly less pixel data (128x8190 = 1048320 bytes) in order to get the desired round megabyte. A screenshot of a debugging session of Illustrator after opening a file like this follows.
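The size arithmetic above can be checked with a short Python sketch. It assumes one byte per pixel (which is what 128x8190 = 1048320 bytes implies) and reads "32k" as 32768; the names pixel_bytes, METADATA_OVERHEAD and MAX_DIMENSION are illustrative and do not come from the original script.

```python
# Assumptions (not from the original script): one byte per pixel,
# "32k" means 32768, and the metadata overhead is the ~0x100 bytes
# quoted in the text.
MEGABYTE = 0x100000
METADATA_OVERHEAD = 0x100
MAX_DIMENSION = 32 * 1024


def pixel_bytes(width, height, bytes_per_pixel=1):
    """Bytes of pixel data for an image of the given dimensions."""
    return width * height * bytes_per_pixel


width, height = 128, 8190
# Both dimensions must stay under the implementation limit.
assert width < MAX_DIMENSION and height < MAX_DIMENSION

data = pixel_bytes(width, height)
# Pixel data plus the metadata overhead rounds up to exactly 1 MiB.
print(data, hex(MEGABYTE - data))  # 1048320 0x100
```

A 1x1048576 image would give the same byte count but breaks the dimension limit, which is why a squarer shape is used.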
Note that at the beginning (and also at the end) of the VirtualAlloc'ed memory there is a small amount (0x80 bytes) of uncontrolled metadata; this has little effect on a normal heap spraying scenario. A ready-to-try Python script that generates such a file is here, and the example AI/EPS file is here.
The simplest way to spray the Illustrator memory is to repeatedly include one of these images. It could be interesting to analyze the feasibility of using a PostScript 'for' or 'repeat' statement for this task, though we haven't gone that way; instead we use the PDF way.
Structure of the Illustrator PDF shell
Here we discuss the modern PDF-encapsulated AI format and focus on the PDF part. In the current version of the format, the Illustrator pseudo-PostScript is embedded in a PDF shell, specifically in PDF streams. At first glance a .ai file passes for a normal PDF. Even the Unix file program recognizes it as a PDF:
$ file illustration.ai
illustration.ai: PDF document, version 1.5
The PDF /PieceInfo key in a /Page dictionary points to the Illustrator private data. If it is present, Illustrator uses this private data to render the illustration; otherwise it tries to parse the normal PDF page contents.
PDF Reader opens the page content and ignores the Illustrator private data. An exploit for Illustrator doesn't affect Reader, and in fact exploits for one and the other can coexist in the same PDF/AI file.
The minimal structure an AI PDF must follow to be interpreted by Illustrator as a vector graphic is simple. The actual PostScript must be divided into chunks and linked from a normal PDF page like this...
C1...C100 are PDF streams holding chunks of the PostScript illustration. As the data is contained in PDF streams, all the compression facilities of the PDF format are available for free, most notably deflate compression.
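To give a feel for what deflate does to highly repetitive data, here is a minimal Python sketch using the standard zlib module. The 64 KiB filler buffer is only an illustration and is not taken from the original script; the output of zlib.compress is the kind of data a PDF stream would carry under a deflate filter.

```python
import zlib

# Illustrative input: 64 KiB of a single repeated byte.
chunk = b"X" * 0x10000

# Deflate at the highest compression level.
compressed = zlib.compress(chunk, 9)

# Round trip to confirm nothing was lost.
assert zlib.decompress(compressed) == chunk

print(len(chunk), "->", len(compressed), "bytes")
```

Repetitive input like this shrinks by orders of magnitude, so the size of the file on disk says very little about the memory it will occupy once parsed.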
Each private AI chunk must be 64k bytes or less, and the chunks must be linked with sequential keys from the AIPrivate dictionary like this:
The trick
Several /AIPrivateData references can point to the same PDF stream. This way, repetitions in the PostScript data can be expressed as several references to a single stream, which saves a lot of space. The PDFREFX entries here are PDF references to indirect objects/streams like "10 0 R", and we can simply make them point to the same object.
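The idea of the trick can be sketched in Python without touching the PDF syntax: split the data into chunks, store each distinct chunk once, and keep a list of references into that store. This is only a model of the idea; split_chunks and share_streams are hypothetical helpers, and MAX_CHUNK assumes that "64k" means 0x10000 bytes.

```python
MAX_CHUNK = 0x10000  # assumed reading of the "64k bytes" limit


def split_chunks(data, size=MAX_CHUNK):
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def share_streams(chunks):
    """Store each distinct chunk once.

    Returns (streams, refs) where refs[i] is the index in `streams`
    of the i-th chunk, so identical chunks share one stored stream.
    """
    streams, index_of, refs = [], {}, []
    for chunk in chunks:
        if chunk not in index_of:
            index_of[chunk] = len(streams)
            streams.append(chunk)
        refs.append(index_of[chunk])
    return streams, refs


data = b"X" * (MAX_CHUNK * 16)
chunks = split_chunks(data)
streams, refs = share_streams(chunks)
print(len(chunks), "references,", len(streams), "stored stream(s)")
# 16 references, 1 stored stream(s)
```

Following the references in order reproduces the original data, while only the distinct chunks need to be stored in the file.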
The python script
This Python script is used to construct these heap spraying files. It is easily configurable from the command line.
$ python IllustratorHS.py --help
Usage: IllustratorHS.py [options]
Adobe Illustrator HeapSpray PoC
  -h, --help     show this help message and exit
  --verbose      For debugging
  --doc          Print detailed documentation
  --size=size    Size in megabytes to spray. (300M spray means 500k file)
  --chunk=chunk  File containing the data to spray. Shall be less than 0x1000
                 bytes, and it will be padded to that size. The default is to
                 spray with a lot of "X"
The default action is to output an "X" heap spraying .ai file to stdout. The resulting file will fill 300 megabytes of memory with the character 'X'. The script constructs the image so that the 0x1000-byte chunk is repeated until it fills each megabyte. It also accounts for the 0x80 bytes of metadata, so we can predict with high probability what lies at a selected address like 0x18EB0090.
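The padding and metadata handling described for the script can be sketched as follows. This is a guess at the logic, not the script's actual code: it assumes each map starts on a 0x1000-byte boundary and that the pixel data begins right after the 0x80 metadata bytes, so the repeated chunk is started 0x80 bytes in. The helper names pad_chunk, aligned_fill and chunk_offset are hypothetical.

```python
CHUNK_SIZE = 0x1000
METADATA = 0x80  # uncontrolled bytes at the start of each map, per the text


def pad_chunk(chunk, fill=b"X"):
    """Pad a chunk up to CHUNK_SIZE bytes with a one-byte filler."""
    if len(chunk) > CHUNK_SIZE:
        raise ValueError("chunk must be at most 0x1000 bytes")
    return chunk + fill * (CHUNK_SIZE - len(chunk))


def aligned_fill(padded, length):
    """Repeat `padded` for `length` bytes, starting METADATA bytes in.

    Assumption: the data lands METADATA bytes after a 0x1000-aligned
    base, so byte i of the result corresponds to chunk offset
    (i + METADATA) modulo CHUNK_SIZE.
    """
    return bytes(padded[(i + METADATA) % CHUNK_SIZE] for i in range(length))


def chunk_offset(address):
    """Chunk offset expected at `address` under the assumptions above."""
    return address % CHUNK_SIZE


padded = pad_chunk(bytes(range(256)))
fill = aligned_fill(padded, 2 * CHUNK_SIZE)
print(hex(chunk_offset(0x18EB0090)))  # 0x90
```

Under these assumptions the low 12 bits of an address inside the controlled data give the offset into the chunk directly.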
The running example
Here is a screenshot of Adobe Illustrator under the debugger after being sprayed. Note that there are several 0x100000-sized memory maps. Any reasonable OS would try to place each map right after the previous one to prevent fragmentation. Where the OS places the block of 300 separate 1-megabyte maps is the only source of unreliability.
At 0x18EB0090 there is the byte at offset 0x90 of the 0x1000-byte chunk. Question: does the address 0x18EB0000 have the same chance as 0x18EB0090 of being controlled?