Finding Hidden Data: Understanding SQL Databases

BlueskyReddit
 Finding Hidden Data: Understanding SQL Databases

co-authored by Jean-Philippe Noat, Specialist, Cellebrite

The world of digital forensics presents many challenges, particularly when uncovering hidden data in structured formats like SQL databases. From mobile devices to computer systems, SQL databases store crucial information that significantly aids investigations. Understanding their structure is essential for forensic investigators to extract valuable data.

The Ubiquity of SQLite Databases

SQLite is a widely used database system found in nearly every digital device, from Android and iOS smartphones to MacOS and Windows computers. Whether investigators are analyzing messages in chat apps like WhatsApp or tracking user activity on browsers such as Firefox and Chrome, they frequently encounter SQLite databases. These databases store a range of information, including SMS logs, call histories and app interactions.

Understanding the structure and behavior of SQLite databases is crucial in digital forensics. For example, investigators often work with cache databases (cache.db), which temporarily store recent user actions. In some cases, these databases can reveal valuable details about a user's location or contacts. Since SQLite databases are global and ubiquitous, forensic examiners must know how to extract and interpret its data effectively.

Identifying SQLite Databases

A challenge in digital forensics is identifying hidden or disguised SQLite databases. These databases might not always have a “.db” extension. For instance, iPhone mail databases use the name envelope index without extension and call logs use the .storedata extension, despite being SQLite databases.

The key to identifying these files lies in their headers. A hex value of 0x53514C69 is helpful in indicating a SQLite database. Specific hex values such as 0x377F0682 or 0x377F0683 indicate Write-Ahead Log (WAL) files, crucial for transaction logging and data recovery of information associated to SQLite databases. Forensic tools like Cellebrite’s Physical Analyzer can assist in identifying and analyzing these files.

Understanding SQLite internal behavior

The first step is to understand the design of the database, including important information like the page size, database size in pages, number of free list pages or the page number of the first freelist trunk page. We also need to understand how the vacuum is configured on this database.

Vacuum could be seen as the process of reclaiming space occupied by data that has been marked for deletion yet has not been physically removed from the database. So, the database is completely rebuilt sometimes - depending on its internal configuration. By default, auto_vacuum value is disabled but it can be set to 1 (full) or 2 (incremental). This means if vacuum is configured, and there are pages in the freelist, an examiner should be able to retrieve some deleted data.

Analyzing the SQlite helps one determine if there is deleted data or not. This is critical before taking a deep diving into the database itself.

The Importance of WAL Files

Older databases have a journal file to keep trace of the transactions, but most recent databases have a WAL file and shm file.

WAL files play a significant role in SQLite databases by serving as transaction logs, which store data before it is committed to the main database. These logs often contain critical evidence, such as deleted records or uncommitted data. WAL files operate by creating frames—data pages that record every transaction. Investigators must examine both the main database and the WAL files to ensure they do not miss any crucial information.

The right forensic tools prevent unintended changes to the database when examining a WAL file. For example, opening a WAL file in a non-forensic tool could trigger a commit command, merge the WAL data with the main database, and potentially overwriting evidence. This is why forensic-grade tools are indispensable in maintaining digital evidence integrity.

The Challenges of Deleted Data

In addition to recovering deleted data, forensic experts face challenges with incremental backups or vacuums, where some pages may be removed entirely while some pages could still be present. Specialized forensic tools can detect these deleted records by analyzing the SQLite database structure and identifying gaps in record numbers. This allows investigators to trace the history of data changes and recover critical information that might otherwise be lost.

Human eyes are critical in retrieving deleted information when some pages are lost yet are still visible and could have a small part of a message. When a trained examiner or investigator knows what they are looking for (first name, dedicated keyword, etc.) it could help to rebuild the message – even partially – which could make a big difference in a case.

One of the most intriguing aspects of SQLite databases in forensics is the ability to recover deleted data. When records are deleted from an SQLite database, their pages are often marked as free and moved to a Free List depending on the vacuum configuration. These pages may still contain valuable information, and trained forensic examiners may be able to recover the data before the pages are overwritten or when the database is vacuumed. What’s more, the WAL files could have several versions of the same messages and could be a gold mine for trained forensic examiners. By default, the WAL file can store up to 1,000 transactions which means that sometimes the WAL file is more interesting to analyze than the database itself.

Best Practices for Evidence Preservation

To maximize data recovery, it is essential to anticipate potential data loss scenarios. One key practice is to perform a full file system extraction before conducting a logical extraction. This ensures that all data, including uncommitted transactions stored in the WAL file, is preserved.

Forensic examiners must also account for ephemeral data, especially in apps like Snapchat or Signal, which may automatically delete messages after a certain period. To prevent losing this data, some investigators have seen success by immediately closing those apps before beginning the extraction process. In some cases, this step has helped preserve ephemeral messages and ensures that critical evidence is not deleted before it can be recovered.

In some cases, placing devices in a Faraday environment (to block GPS signals) can help prevent updated location data from overwriting historical records.

SQL databases, particularly SQLite, hold a wealth of data in digital forensic investigations. From identifying hidden databases to recovering deleted records from WAL files or freelist, mastering SQLite’s structure is essential for forensic investigators. By leveraging forensic tools and adopting best practices, investigators can ensure the integrity of digital evidence and uncover valuable insights that help solve cases efficiently.

About the Authors

Heather Barnhart is the Senior Director of Community Engagement at Cellebrite, a global leader in premier Digital Investigative solutions for the public and private sectors. She educates and advises digital forensic professionals on cases around the globe. For more than 20 years, Heather’s worked on high-profile cases, investigating everything from child exploitation to Osama bin Laden's digital media.

Jean Philippe Noat or J.P. is a Cellebrite specialist who excels at finding deleted data and diving deep into app behavior with extensive knowledge in SQLite and unified logs. His skills helped solve prolific murder, terrorism and child exploitation cases. Knowing every case presents its unique challenges, he regularly shares his knowledge and experience (including his mistakes) to help build a safer world.

 

Subscribe to our e-Newsletters
Stay up to date with the latest news, articles, and products for the lab. Plus, get special offers from Forensice – all delivered right to your inbox! Sign up now!