Discover New Forensic Evidence with File Structure Analysis

By Andrew Hoog | January 14, 2020

tl;dr extract and search file structure to uncover new evidence with ftree

  1. Recursively extract and save file structure
  2. Search ftree file structures for keywords
  3. Locate where the matched structure resides in the target file
  4. Review new evidence

What is file structure?

Analysts can loosely group files into two categories:

  1. Structured
  2. Unstructured

This is true for binary as well as text files. Unstructured files do not conform to a set structure. While they may contain useful evidence, they generally have to be analyzed individually or searched (if they are text files).

Conversely, structured files follow a specification, which is a huge plus for forensics. This allows us to understand the structure of a file and its data, analyze and search it in a consistent manner, and extract data.

Some structured files are quite simple, such as a plaintext XML or JSON file. These files are both machine and human readable, self-describing and easy to understand.
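
For example, in a trivial (made-up) JSON file like the one below, the structure is the keys and their nesting, while the values are the data:

{
  "camera": "iPhone 6",
  "iso": 32,
  "gps": { "latitude": 41.88, "longitude": -87.62 }
}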

Other structured files are not easily read by humans and range from straightforward formats to very complex ones that support sophisticated processing. Some examples include:

  1. JWT tokens
  2. Binary plist files (see fplist viewer)
  3. JPEG images
  4. Protocol buffers
  5. SQLite databases

File structure in forensic analysis

Today, forensic analysts consistently leverage file structure in their analysis; however, it happens behind the scenes, so many folks don't even think about it.

When you are reviewing text messages, EXIF data in JPEG files, Apple unified logs, etc., the tools you use are leveraging file structures to present the data to you. However, many tools only focus on the most commonly analyzed files, potentially leaving highly responsive evidence undiscovered.

There’s an important distinction to make about file structure: we are not talking about the actual data at this point. The file structure is literally the structure and metadata of a file.

As an example, most readers know that images and movies can contain metadata, often in the EXIF format. The file structure is the combination of how EXIF data is encoded in (and extracted from) images, plus each "key" and sometimes its data type, but not the value itself. So, field names like the following (see the exiftool sketch after the list):

  • Camera Model Name
  • ISO
  • Focal Length
  • GPS Latitude
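
If you want to see these keys on a real image, exiftool prints the descriptive field names shown above (a quick sketch; it assumes exiftool is installed, and IMG_0094.JPG is a stand-in for any photo you have on hand):

$ exiftool -Model -ISO -FocalLength -GPSLatitude IMG_0094.JPG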

File formats like Apple's binary plist specify each key's primitive data type, such as string, integer, array and even data (which is basically a byte stream that can represent any file or value).
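
Here is a minimal, made-up plist (shown in XML form for readability; a binary plist encodes the same key/type/value triples). The keys and declared types are the structure, while the text inside the value tags is the data:

<dict>
  <key>Latitude</key>
  <real>41.8827</real>
  <key>City</key>
  <string>Chicago</string>
</dict>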

One of the most common structured binary file formats forensic analysts come across is the SQLite database file. These files are incredibly sophisticated and include not only a wide range of data types but also support for caching, indexing, crash protection and more.
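
You can see this structure/data split directly with the sqlite3 CLI: .schema prints a database's structure without reading a single row of data. A sketch (example.db and the messages table are hypothetical):

$ sqlite3 example.db '.schema messages'
CREATE TABLE messages (id INTEGER PRIMARY KEY, date INTEGER, text TEXT);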

And let’s not forget the file system itself and Brian Carrier’s amazing File System Forensic Analysis.

How to leverage file structure for forensic analysis

In order to go beyond what commercial tools provide, you need to be able to:

  1. Extract file structure
  2. Recursively run extraction against a set of files
  3. Save results in a searchable format
  4. Explore the data to discover new evidence

This is a key goal of my free ftree tool. It takes care of 1-3 above and I’ll demonstrate how to search the resulting data for new evidence.

Recursively extract and save file structure

First, download and then install ftree. You can read the ftree overview for additional details.

From there, simply run ftree against your evidence directory and choose the output format best suited to your use case (or the one you're most comfortable with): json, sqlite or csv.

ftree currently extracts file structure for the following file types:

  • SQLite
  • Binary property list (bplist)
  • Property list (plist)
  • XML
  • JPEG exif

You can then search for the type of evidence you are looking for. For this example, we're going to see if we can uncover files where GPS data (specifically latitude) might be present in an iPhone 6 backup.

running ftree

$ ftree generate -f sqlite -o ./iphone6-backup.sqlite iphone6-backup
✔ Crawled 8285 files/dirs (0 inaccessible)
✔ No errors accessing files
✔ No files >1GB found
Saved sqlite file to ./iphone6-backup.sqlite

It took about 30 seconds to crawl the 8,000+ files and "destructure" the known file types. If you prefer to save the results in json or csv, just modify the format switch:

ftree - json and csv output

$ ftree generate -f json -o ./iphone6-backup.json iphone6-backup

$ ftree generate -f csv -o ./iphone6-backup.csv iphone6-backup

Note: Excel limits spreadsheets to 32k characters per cell, so the structure of some files is truncated. While I understand the simplicity and interoperability of csv, I tend to use it as a last resort.

With the data in hand, you can quickly see how many files had structure, e.g.:

$ sqlite3 iphone6-backup.sqlite 'select count(id) from ftree where structure_hash is not null'
2919

and out of curiosity you can see the total number of file structure elements extracted:

$ sqlite3 iphone6-backup.sqlite 'select sum(element_count) from ftree where structure_hash is not null'
2506336

So in about 30 seconds, we extracted over 2.5 million file structure elements from nearly 3,000 files. Pretty cool! :-)

Search ftree file structures

Searching the actual file structure data is also fast and easy.

searching in sqlite3

The following query will look for the word latitude (case insensitive) in any of the file structure data:

$ sqlite3 iphone6-backup.sqlite 'SELECT key, magic_base FROM ftree WHERE structure_data like "%latitude%"'
./00/001d122ffe3459755880d03f3b45d0deffb0fbca|JPEG image data
./f3/f30d6ef41c65177e0d949cbbefa7e114bb39a212|Apple binary property list
./20/2041457d5fe04d39d0ab481178355df6781e6858|SQLite 3.x database
<truncated>

The query returned 296 rows in 283ms. Here are the results by file type:

$ sqlite3 iphone6-backup.sqlite 'SELECT magic_base, count(id) FROM ftree WHERE structure_data like "%latitude%" GROUP BY magic_base'
Apple binary property list|2
JPEG image data|288
SQLite 3.x database|6

The two plist files were:

  • Clock app, specifically the World Clock settings (Library/Preferences/com.apple.mobiletimer.plist)
  • Data related to the Find My iPhone app (Library/Preferences/com.apple.mobileme.fmip1.plist)

Obviously, the JPEG files have EXIF properties. If you want to zero in on one of the file types, you can modify the query with:

$ sqlite3 iphone6-backup.sqlite 'SELECT key, magic_base FROM ftree WHERE structure_data like "%latitude%" AND magic_base = "SQLite 3.x database"'
./0d/0d609c54856a9bb2d56729df1d68f2958a88426b|SQLite 3.x database
./20/2041457d5fe04d39d0ab481178355df6781e6858|SQLite 3.x database
./12/12b144c0bd44f2b3dffd9186d3f9c05b917cee25|SQLite 3.x database
./40/4096c9ec676f2847dc283405900e284a7c815836|SQLite 3.x database
./4f/4f98687d8ab0d6d1a371110e6b7300f6e465bef2|SQLite 3.x database
./69/69a2779c0de95b5fb93aeceecc36db61b7746ae3|SQLite 3.x database

which returns 6 results you can then explore.

Locate where the matched structure resides in the target file

You could simply browse the data to locate the evidence; however, there are tons of tables and tons of columns in each table. Here's a quick way to return the structure data (stored as JSON), format it for reading on the terminal with jq, and pipe it through less so you can paginate and search:

$ sqlite3 iphone6-backup.sqlite 'SELECT structure_data FROM ftree WHERE key = "./12/12b144c0bd44f2b3dffd9186d3f9c05b917cee25"' | jq . | less
{
  "ACHANGE": {
    "rowCount": 638899,
    "columns": [
      {
        "name": "Z_PK",
        "type": "INTEGER",
        "constraint": "PRIMARY KEY"
      },
<snip>
  "ZGENERICASSET": {
    "rowCount": 885,
    "columns": [
      {
        "name": "Z_PK",
        "type": "INTEGER",
        "constraint": "PRIMARY KEY"
      },
<snip>
      {
        "name": "ZLASTSHAREDDATE",
        "type": "TIMESTAMP",
        "constraint": ""
      },
      {
        "name": "ZLATITUDE",
        "type": "FLOAT",
        "constraint": ""
      },
      {
        "name": "ZLONGITUDE",
        "type": "FLOAT",
        "constraint": ""
      },
<truncate>
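
If you'd rather skip the paging, a jq filter can pull out just the tables with a latitude-like column. This is a sketch that assumes the structure_data layout shown above (tables keyed by name, each with a columns array):

$ sqlite3 iphone6-backup.sqlite 'SELECT structure_data FROM ftree WHERE key = "./12/12b144c0bd44f2b3dffd9186d3f9c05b917cee25"' | jq 'to_entries[] | select([.value.columns[]?.name] | any(test("LATITUDE"))) | {table: .key, rows: .value.rowCount}'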

For each table, you can see if there are any rows of data. Obviously, the ACHANGE table has tons of records. After a quick search for "LATITUDE", I saw that it appeared in a table called ZGENERICASSET, which has 885 rows. That sounds like a great place to start. Since the table has 89 columns, I've truncated the output of one record:

example latitude data discovered

$ printf '.mode line\nSELECT * FROM ZGENERICASSET LIMIT 1;' | sqlite3 iphone6-backup/12/12b144c0bd44f2b3dffd9186d3f9c05b917cee25

                         Z_PK = 1294
                        Z_ENT = 31
                        Z_OPT = 187
             ZCLOUDLOCALSTATE = 1
                       ZWIDTH = 2448
                   ZADDEDDATE = 569727803.608739
         ZADJUSTMENTTIMESTAMP = 593493605.816124
                 ZDATECREATED = 569727803.355859
       ZFACEADJUSTMENTVERSION = 593493605.816124
              ZLASTSHAREDDATE =
                    ZLATITUDE = 41.8827133333333	
                   ZLONGITUDE = -87.6233066666667
            ZMODIFICATIONDATE = 599373659.983771
                   ZSORTTOKEN = 569727803.355859
                   ZDIRECTORY = DCIM/100APPLE
                    ZFILENAME = IMG_0094.JPG
          ZORIGINALCOLORSPACE = sRGB IEC61966-2.1
       ZUNIFORMTYPEIDENTIFIER = public.jpeg
                        ZUUID = 2B01893B-E75A-40E9-953C-78A3E5FFE0B7
           ZIMAGEREQUESTHINTS =

As you can see, this database tracks precise latitude and longitude along with other useful data (e.g. timestamps).
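
From here, pulling every geotagged record is one query away. A sketch; the 978307200 offset converts Apple's Core Data timestamps (seconds since 2001-01-01) to Unix time:

$ sqlite3 iphone6-backup/12/12b144c0bd44f2b3dffd9186d3f9c05b917cee25 "SELECT ZFILENAME, ZLATITUDE, ZLONGITUDE, datetime(ZDATECREATED + 978307200, 'unixepoch') FROM ZGENERICASSET WHERE ZLATITUDE IS NOT NULL"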

searching in csv

If you’re not comfortable working with databases, you could open the .csv file up in Excel or search it on the command line:

$ grep -i latitude iphone6-backup.csv

which also returned 296 results (whew!).

searching in json

I've still only scratched the surface with jq, so I only have a partial example to share. My initial attempts used select with either contains (no way to do a case-insensitive match) or test (supports regex, but I hit a snag pulling up the whole record). Here's a way to leverage jq and grep, though there's certainly a more effective approach:

$ jq '. | select(.structure_data)' iphone6-backup.json  | grep -i latitude | wc -l
     296

As you can see, this returns the same 296 results. However, it only surfaces the matching "structure_data" lines, and we need more of the record. So for now, I'm just going to leave this here for my future reference!
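
For that future reference, here is one jq-only approach that should work, assuming (as the select above suggests) that the JSON output is a stream of one object per record, with structure_data possibly stored as a string:

$ jq 'select((.structure_data // "" | tostring | ascii_downcase) | contains("latitude")) | {key, magic_base}' iphone6-backup.json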