Bill Buchanan

Bill Buchanan is a Professor in the School of Computing at Edinburgh Napier University, a Fellow of the BCS and the IET, and a Principal Fellow of the HEA. In 2017, Bill received an OBE for services to Cyber Security, and he has published over 280 academic papers and 29 academic books. He currently leads the Centre for Distributed Computing, Networks, and Security and The Cyber Academy.

Well, we’ve all heard of the magic of Christmas, but let’s look at another magic thing … the magic of digital forensics. For this we have the concept of magic numbers, which are short byte sequences, usually at the start of a file, that identify its type. These magic numbers are special gifts for digital investigators, as they make the job of finding things a whole lot easier [here]. So, since it is Christmas, let’s have a bit of fun with 10 trivial facts on these magic numbers:

Trivial Fact 1: There’s an Elf in Linux. Unfortunately it’s not a Christmas Elf, but it is the magic file identifier for a Linux executable. The file starts with the byte 0x7F followed by the characters “ELF” (which a binary viewer typically shows as “.ELF”), and this identifies the Executable and Linkable Format [here]:
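If you want to check for the Elf yourself, a minimal sketch in Python might look like the following. The function names and the sample bytes are made up for illustration (the sample is only the start of a header, not a working executable):

```python
ELF_MAGIC = b"\x7fELF"  # byte 0x7F followed by the ASCII letters E, L, F


def is_elf(data: bytes) -> bool:
    """Return True if the buffer starts with the ELF magic number."""
    return data[:4] == ELF_MAGIC


def file_is_elf(path: str) -> bool:
    """Read only the first four bytes of a file and test them."""
    with open(path, "rb") as f:
        return is_elf(f.read(4))


# Hypothetical header bytes for illustration only.
sample = b"\x7fELF\x02\x01\x01\x00" + bytes(8)
print(is_elf(sample))         # True
print(is_elf(b"MZ\x90\x00"))  # False
```

On a Linux machine, something like file_is_elf("/bin/ls") would normally return True, assuming that path exists on your system.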

Trivial Fact 2: The identifier for ZIP files was named after Phil Katz. At the start of a .ZIP file we will see the characters “PK”, and these are the initials of the creator of the ZIP file format. So what’s so special about PK? Ask any digital forensics investigator, and they will say that the two characters are often used to perform a quick search on a disk for ZIP files. We can see the “PK” magic number in all its glory [here]:
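As a rough sketch of that quick search, the Python below scans a buffer for the ZIP local file header signature. Searching for “PK” on its own would give plenty of false hits, so this sketch assumes we look for the four bytes “PK” 0x03 0x04 that begin each local file header. The “disk image” here is a stand-in built in memory, and find_signature is a name invented for this example:

```python
import io
import zipfile

LOCAL_HEADER = b"PK\x03\x04"  # "PK" plus the two bytes that mark a local file header


def find_signature(data: bytes, signature: bytes = LOCAL_HEADER) -> list[int]:
    """Return every offset at which the signature occurs in the buffer."""
    offsets = []
    pos = data.find(signature)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(signature, pos + 1)
    return offsets


# Stand-in for a disk image: one empty 512-byte sector, then a small ZIP file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("note.txt", "hello")
image = b"\x00" * 512 + buf.getvalue()

print(find_signature(image))  # [512]
```

A real investigation would read the image in chunks rather than loading it all into memory, and would treat each hit only as a candidate until the rest of the header checks out.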

Trivial Fact 3: A Microsoft document is just a ZIP file. The ZIP format is used to compress and package files, but its scope has expanded to cover Microsoft Office documents, which are now just ZIP files with a file extension that identifies the document type [DOCX][XLSX][PPTX]:

If you ever have to change anything to do with the rights of a Microsoft document or extract some content, you can just change the file extension to .ZIP and then open it as a ZIP file.
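You don’t even need to rename the file if you use a ZIP library. Here is a small sketch with Python’s standard zipfile module. Since there is no real document to hand, it builds a stand-in archive with two part names of the kind found inside a DOCX file (the stand-in is not a valid Word document, and list_parts is a name made up for this example):

```python
import io
import zipfile


def list_parts(data: bytes) -> list[str]:
    """Open a document's bytes as a ZIP archive and list the files inside."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


# Stand-in for a DOCX file: placeholder content, illustrative part names.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<document/>")
docx_bytes = buf.getvalue()

print(docx_bytes[:2])          # b'PK'
print(list_parts(docx_bytes))  # ['[Content_Types].xml', 'word/document.xml']
```

With a real file, you would pass its path straight to zipfile.ZipFile and read individual parts with zf.read(name).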

Trivial Fact 4: The identifier for EXE files is named after Mark Zbikowski (“MZ”). Mark was one of the lead developers of MS-DOS, and his initials appear as the first two characters of an EXE file [here]:

Trivial Fact 5: Sometimes it is good to look for TVs. Well, this fact is related to Trivial Fact 4, as the Base64 conversion for “MZ” starts with … “TV” [here]:

And so when an EXE is embedded into an email, it will travel in a Base64 format, such as with [here]:

Thus many network scanners look for the “TV” value within strings, as it might identify a Windows program that has been converted into Base64 format.
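The reason this works is that Base64 encodes six bits at a time, so the first twelve bits of “MZ” always produce the same two output characters, whatever follows. A quick sketch in Python shows it. The header bytes are just an illustrative start of an EXE file, and looks_like_base64_exe is a hypothetical helper, not the rule any particular scanner uses:

```python
import base64


def looks_like_base64_exe(text: str) -> bool:
    """Crude check: does the Base64 text begin with the encoding of 'MZ'?"""
    return text.lstrip().startswith("TV")


# Illustrative first bytes of an EXE file: "MZ" followed by a few header bytes.
header = b"MZ\x90\x00\x03\x00"
encoded = base64.b64encode(header).decode("ascii")

print(base64.b64encode(b"MZ"))         # b'TVo='
print(encoded)                         # TVqQAAMA
print(looks_like_base64_exe(encoded))  # True
```

Of course, plenty of innocent Base64 strings also start with “TV”, so a hit like this is only a prompt to decode the content and look more closely.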

Trivial Fact 6: An Adobe Illustrator file is just a PDF. Adobe has long supported the PDF format as its main way to encapsulate a whole lot of files into a single package. The telltale sign of a PDF file is “%PDF”. Illustrator files are often just PDFs and can be opened in Adobe Reader [here]:

Here is an example of opening an AI file with Adobe Acrobat:

Trivial Fact 7: You don’t need X-ray eyes to see what’s going on in a program. Programs compiled from C++ often do not hide the strings within the program in the executable code. In the following we see a Linux executable and the text in the program is clear to see [link]:

The same thing happens with Microsoft Windows programs [here]:

An investigator can thus often scan across a disk and look for important identifiers, including secret content that has been embedded within an executable program.
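This kind of scan is easy to sketch. The Python below pulls out runs of printable ASCII characters from a block of bytes, in the spirit of the Unix strings tool. The minimum length of four and the sample “program” bytes are assumptions chosen for illustration:

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for part of a compiled program.
blob = (
    b"\x7fELF\x02\x01\x00\x00"
    + b"Enter password:\x00"
    + b"\x01\x02"
    + b"s3cret-key\x00"
)

print(extract_strings(blob))  # ['Enter password:', 's3cret-key']
```

Note that the “ELF” at the start is skipped here only because it is three printable characters long, which is below the minimum length.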

Trivial Fact 8: Many documents just dump images and other content in their raw format. For file formats such as PDF and PPT we see images contained within the file in their original format, so we can carve them out with tools such as Scalpel. In the following we see TIF files and PDFs contained in a single file [here]:

This helps digital forensics investigators as they can search a disk for images, even if they are contained in other files.
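To give a feel for how carving works, here is a much simplified sketch in Python that cuts out everything between a header and the next footer. It uses the “%PDF” header and the “%%EOF” marker that ends a PDF file. This is not how Scalpel itself is implemented, and the container bytes are invented for the example:

```python
def carve(data: bytes, header: bytes, footer: bytes) -> list[bytes]:
    """Cut out each region that starts at a header and ends at the next footer."""
    carved = []
    start = data.find(header)
    while start != -1:
        end = data.find(footer, start + len(header))
        if end == -1:
            break  # header with no matching footer: stop carving
        end += len(footer)
        carved.append(data[start:end])
        start = data.find(header, end)
    return carved


# Invented container: two tiny placeholder "PDFs" surrounded by other bytes.
fake_pdf = b"%PDF-1.4\nplaceholder body\n%%EOF"
container = b"\x00" * 16 + fake_pdf + b"junk" + fake_pdf + b"\xff" * 8

pieces = carve(container, b"%PDF", b"%%EOF")
print(len(pieces))            # 2
print(pieces[0] == fake_pdf)  # True
```

Real files are messier (a PDF can contain more than one “%%EOF”, for example), which is why proper carving tools add size limits and extra checks.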

Trivial Fact 9: An encrypted ZIP file gives away its contents. And so you might think you can hide the contents of a ZIP file if you put a password on it. But the names of the files can be seen in plain text when looking at the headers of the ZIP file with a binary viewer. Here we see that this ZIP file contains the files “PROG2_02.PAS” and “PROG1_2.PAS” [here]:
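We can read those names without any ZIP library at all. The sketch below walks the raw bytes, unpacks each 30-byte local file header, and reports the file name along with bit 0 of the flags field, which marks an encrypted entry. Python’s standard library cannot write password-protected archives, so the sample here is an unencrypted stand-in using the two file names above with placeholder content. With traditional ZIP encryption only the file data is encrypted, so the name field would be read in the same way. The function name is made up for this example:

```python
import io
import struct
import zipfile

# Local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
SIGNATURE = b"PK\x03\x04"


def local_header_names(data: bytes) -> list[tuple[str, bool]]:
    """Return (file name, encrypted flag) for each local file header found."""
    results = []
    pos = data.find(SIGNATURE)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(data):
        fields = LOCAL_HEADER.unpack_from(data, pos)
        flags, name_len = fields[2], fields[9]
        name_start = pos + LOCAL_HEADER.size
        raw_name = data[name_start:name_start + name_len]
        results.append((raw_name.decode("cp437", "replace"), bool(flags & 0x1)))
        pos = data.find(SIGNATURE, name_start + name_len)
    return results


# Unencrypted stand-in archive with placeholder file contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("PROG2_02.PAS", "placeholder")
    zf.writestr("PROG1_2.PAS", "placeholder")

print(local_header_names(buf.getvalue()))
# [('PROG2_02.PAS', False), ('PROG1_2.PAS', False)]
```

Run against a password-protected archive of the traditional kind, the second value in each pair would be expected to come back as True while the names stay readable.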

Trivial Fact 10: RIFFs are used in music files (doh!). No, it’s not that kind of Jimi Hendrix riff, as “RIFF” stands for Resource Interchange File Format, and it is used in WAV files [here]:
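A WAV file starts with twelve bytes: the characters “RIFF”, a little-endian size field counting the bytes that follow it, and then the form type “WAVE”. The sketch below writes a tiny silent WAV into memory with Python’s wave module and reads that header back. The audio settings are arbitrary, and parse_riff_header is a name made up for this example:

```python
import io
import struct
import wave


def parse_riff_header(data: bytes) -> tuple[int, str]:
    """Return (size field, form type) from the first 12 bytes of a RIFF file."""
    if len(data) < 12:
        raise ValueError("too short to hold a RIFF header")
    chunk_id, size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if chunk_id != b"RIFF":
        raise ValueError("RIFF magic number not found")
    return size, form_type.decode("ascii")


# Write 100 frames of silence (mono, 16-bit, 8 kHz) into an in-memory WAV.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 100)
wav_bytes = buf.getvalue()

size, form = parse_riff_header(wav_bytes)
print(wav_bytes[:4], form)         # b'RIFF' WAVE
print(size == len(wav_bytes) - 8)  # True
```

The size check works because the size field excludes the eight bytes taken up by “RIFF” and the field itself.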


So, after you have opened all your presents on Christmas Day, and are bored with the Boxing Day film, here’s a little test for you: Tests: File Forensics