FileAnalyzer

Search in Binary Files

Searching for specific content within text files is comparatively easy, as the bytes of text files can be clearly assigned to specific characters given the encoding used. However, the situation is different when searching within binary files.

Depending on the format, binary files are subject to a specific defined file structure, so that the same byte sequence within the same file can have a completely different meaning in different places. In one place, a byte sequence may be used to specify the size of a chunk, while in another place the exact same sequence of bytes may be interpreted as a string because this place is located within an element that stores metadata, for example.
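To make this ambiguity concrete, here is a minimal Python sketch that reads the same four bytes once as a number and once as text. The byte values and the two interpretations are chosen purely for illustration and are not taken from any particular file or from the FileAnalyzer itself.

```python
import struct

# Four bytes, chosen purely for illustration
raw = b"moov"

# Interpretation 1: a big-endian unsigned 32-bit integer (e.g. a size field)
as_size = struct.unpack(">I", raw)[0]

# Interpretation 2: ASCII text (e.g. a type identifier or a metadata string)
as_text = raw.decode("ascii")

print(as_size)  # 1836019574
print(as_text)  # moov
```

Which of the two readings is correct depends solely on where the bytes are located within the file structure.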

Because of these difficulties, this article looks at the different types of byte searches that can be carried out with the FileAnalyzer and at what needs to be taken into account when performing them. We start with a simple search for byte sequences without considering their meaning, continue with the search for byte sequences based on numbers or strings, and end with the search for paths, properties and values stored within the file structure of binary files.

Invoking the Search Function

All functions presented in this tutorial can be found in the search dialog of the FileAnalyzer. This can be called up using the button "Search" under the file list, using the menu "Tools > Search" or using the keyboard shortcut CTRL + S.

In addition, the search function can also directly be called up using the right mouse button from the file structure, the detail table or the hex view. In this way, bytes from the hex view, data from the table as well as paths from the tree view of the file structure can be included in the search directly without entering anything.

Search for Byte Sequences

The simplest search within binary files looks for individual bytes or longer sequences of bytes without considering the context in which the bytes appear within the file.

However, keep in mind that a byte sequence found this way does not necessarily lie within a single meaningful element of the file structure. A hit may also span multiple elements or chunks. Therefore, if you care about the meaning of the bytes and only want to search within the ordered elements of a binary file, you should prefer the file structure search, which we will discuss later.
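The following Python sketch illustrates how a context-free byte search can produce a hit that spans two elements. The function `find_all` and the two 4-byte fields are hypothetical and only serve as an illustration; they do not describe how the FileAnalyzer is implemented.

```python
def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return the offsets of every occurrence of pattern in data (overlaps included)."""
    offsets = []
    if not pattern:
        return offsets
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        # Continue one byte after the last hit so overlapping hits are found too
        pos = data.find(pattern, pos + 1)
    return offsets

# Two hypothetical 4-byte fields stored back to back
field_a = bytes.fromhex("00 00 12 34")
field_b = bytes.fromhex("56 78 00 00")
data = field_a + field_b

hits = find_all(data, bytes.fromhex("34 56"))
print(hits)  # [3]
```

The hit at offset 3 starts in the last byte of the first field and ends in the first byte of the second field, so it has no meaning within either element.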

If the byte sequence you are looking for is based on a number or a string, you don't have to go to the trouble of calculating the byte sequence required for the number or the string you are looking for yourself. In this case, you can simply use the number or string search, which we will look at in the next two sections.

Search for Numbers

The search for numbers is a more convenient way of entering a search for byte sequences. It works in the same way as the search for byte sequences, except that you do not enter the byte sequence directly. Instead, you enter a number in the search field, and the byte sequence to be searched for is generated automatically according to the selected number format.
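How a number turns into different byte sequences depending on the number format can be sketched in Python as follows. The function name and the four formats listed here are assumptions made for this example; the formats actually offered by the FileAnalyzer may differ.

```python
import struct

def number_to_patterns(value: int) -> dict[str, bytes]:
    """Encode one integer in several illustrative number formats."""
    formats = {
        "uint16 big-endian": ">H",
        "uint16 little-endian": "<H",
        "uint32 big-endian": ">I",
        "uint32 little-endian": "<I",
    }
    return {name: struct.pack(fmt, value) for name, fmt in formats.items()}

patterns = number_to_patterns(1000)
for name, pattern in patterns.items():
    print(f"{name}: {pattern.hex(' ')}")
# uint16 big-endian: 03 e8
# uint16 little-endian: e8 03
# uint32 big-endian: 00 00 03 e8
# uint32 little-endian: e8 03 00 00
```

The same number therefore yields a different search pattern for each width and byte order, which is why the selected number format matters.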

As with the simple search for byte sequences, the format and the file structure are not taken into account when searching for numbers using this function. The search only scans the file for the byte equivalent of the number you entered and does not check whether the bytes found actually represent this number within the file structure. A hit can therefore also lie within a string or even span several meaningful elements.

If you only want to find those numbers that actually occur as a value within the file structure, you should use the function "Search in File Structure" instead. This function allows you to search directly for specified values within the properties actually defined in the file.

Search for Strings

Like the search for numbers, the search for strings can be understood as a simplification of the search for byte sequences. The only difference is that you enter a string or text instead of a number, from which the byte equivalent required for the search is then generated automatically.

As with the search for numbers, a preview of the generated byte sequence is shown below the input fields when you search for strings. An error message is also displayed there if the text you have entered cannot be represented in your chosen encoding (for example, when trying to convert non-ASCII Unicode characters to ASCII encoding).

Furthermore, note that this function, like all three search types presented so far, does not take the format and file structure into account. If you want to search for strings within the file structure of your files and only want your search results to include those occurrences in which the string appears in the file in its actual meaning, please use the search in the file structure, which we will look at in the next section.
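The conversion of a string into a byte sequence, including the error case, could look like the following Python sketch. The function `string_to_pattern` and the sample texts are hypothetical and only illustrate the principle.

```python
def string_to_pattern(text: str, encoding: str) -> bytes:
    """Encode text for a byte search, or fail if the encoding cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"{text!r} cannot be represented in {encoding}") from exc

print(string_to_pattern("Größe", "utf-8").hex(" "))      # 47 72 c3 b6 c3 9f 65
print(string_to_pattern("Größe", "utf-16-le").hex(" "))  # 47 00 72 00 f6 00 df 00 65 00

try:
    string_to_pattern("Größe", "ascii")
except ValueError as err:
    print(err)  # reports that the text cannot be represented in ascii
```

The same text produces different byte sequences in different encodings, and some encodings cannot represent it at all.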

Search in the File Structure

The heart of the FileAnalyzer search dialog is the search in the file structure, which you can activate using the option of the same name in the search criteria. In contrast to the search types presented so far, this search finds numbers, strings and other values and properties that actually occur with exactly this meaning within the file structure, rather than matching at the pure byte level.

The search in the file structure includes the three fields "Path", "Property" and "Value", which can be used for your searches individually, together or in any combination:

Path

If you want to limit your search to a specific part of the file structure, you can specify the path of this part here. It is important to know that the FileAnalyzer treats the parts or chunks of a file like a folder structure, through which each element of the file can be addressed uniquely. You can find out more about this topic in the introduction to the FileAnalyzer path concept. That introduction also explains how to address paths that share the same name and are therefore ambiguous.

To avoid having to enter a path manually into the search dialog, you can simply right-click on an element within the file structure in the main window and select "Search Path" from the context menu. For ambiguous paths, the context menu contains the entries "Search Path at this Position" as well as "Search Path at all Positions" to invoke the search with or without indices in the path.

Property

The next field, "Property", allows you to restrict the search to certain properties. For example, the track header of ISO Base Media files stores properties such as "Width", "Height", "Volume" or "Duration". You can enter one of these names here in order to only see values for this property in your search results.

You do not have to enter the name of the property manually either, since you can simply right-click a row of the detail table to start the search directly from there. In the context menu of the detail table, you have the option of searching only for the property you clicked on (regardless of the path) as well as the option of searching for this property only within a path of the same name (again with the option of taking the path position into account or not).

Value

The "Value" field works in the same way, allowing you to search your files for an arbitrary value. You can use this field for numbers as well as for any strings. Depending on whether the fields "Path" and/or "Property" are also filled out, the search for values is carried out either only within the selected paths or properties or within the entire file, regardless of element and property types.

When searching for decimal numbers, it does not matter which decimal separator you use. For example, you can use the English notation with a dot, such as 1.0, or the German notation with a comma, such as 1,0. Both notations are treated as equivalent and will find the same values within the file structure.

However, please note that if you want to search for decimal numbers, you must also enter a decimal number. If you enter an integer, only integers will be searched. For example, let's say we have properties with the values 0, 0.5, 1, 1.5, 2, 2.5 and 3. If you now search for "greater than 1", you will only get 2 and 3 as a result, since only these are integers without a fractional part. If, on the other hand, you want to include the decimals, you must search for "greater than 1.0". Then your search result will include the numbers 1.5, 2, 2.5 and 3.
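The behavior described above can be reproduced with a short Python sketch. The functions `parse_term` and `search_greater` are hypothetical and only model the rules stated in this section; they assume that integer values are stored as `int` and decimal values as `float`.

```python
def parse_term(term: str):
    """Return an int for integer input and a float for decimal input (dot or comma)."""
    normalized = term.strip().replace(",", ".")
    return float(normalized) if "." in normalized else int(normalized)

def search_greater(values, term: str):
    """Model of a 'greater than' search that distinguishes integer and decimal terms."""
    threshold = parse_term(term)
    if isinstance(threshold, int):
        # Integer search term: only integer values are considered
        candidates = [v for v in values if isinstance(v, int)]
    else:
        # Decimal search term: all numeric values are considered
        candidates = list(values)
    return [v for v in candidates if v > threshold]

values = [0, 0.5, 1, 1.5, 2, 2.5, 3]
print(search_greater(values, "1"))    # [2, 3]
print(search_greater(values, "1.0"))  # [1.5, 2, 2.5, 3]
print(search_greater(values, "1,0"))  # [1.5, 2, 2.5, 3]
```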

As with the other two search fields, you can also start the search for values directly from the detail table by right-clicking on a table row and selecting "Find (this) Value" from the context menu.

Search Operator

Next to both the "Property" field and the "Value" field you will find a selection box that allows you to set the operator for your search.

The available operators include "ENDS_WITH", "MATCHES_REGEX" and "CONTAINS_REGEX" (the value matches a regular expression in full or in part) as well as "GREATER", "GREATER_OR_EQUAL", "SMALLER", "SMALLER_OR_EQUAL" and "BETWEEN". Using the "BETWEEN" operator, you can define a range from one value to another. For example, the search term "2-5" finds the values 2, 3, 4 or 5, but not 1, 6 or 10.

By the way, you can use search operators such as GREATER, GREATER_OR_EQUAL, SMALLER, SMALLER_OR_EQUAL or BETWEEN not only with numbers but also with text, that is, with strings. A search for GREATER w would therefore, for example, find values such as x, y or z (but not a, b or v), while a search for BETWEEN i-o would find j, k, l, m and n, but not a or x.
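The comparison operators can be modeled in Python as shown below. This is only a sketch under two assumptions that are not stated in this article: the range of BETWEEN is written as "low-high" without negative numbers, and both bounds are included for strings as they are for numbers. The function `between` is hypothetical.

```python
def between(value, term: str) -> bool:
    """Model of BETWEEN for a term written as 'low-high' (bounds assumed inclusive)."""
    low, high = term.split("-", 1)
    if isinstance(value, str):
        # Strings are compared character by character in code point order
        return low <= value <= high
    return float(low) <= value <= float(high)

numbers = [1, 2, 3, 4, 5, 6, 10]
print([n for n in numbers if between(n, "2-5")])  # [2, 3, 4, 5]

letters = ["a", "j", "k", "l", "m", "n", "x"]
print([s for s in letters if between(s, "i-o")])  # ['j', 'k', 'l', 'm', 'n']

# GREATER applied to strings
print([s for s in ["a", "b", "v", "x", "y", "z"] if s > "w"])  # ['x', 'y', 'z']
```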

Inverse Search

Under the two fields "Property" and "Value" you will also find the option "Reverse", with which you can invert the search criterion of the respective field. For example, if you search for a value with EQUALS 1 and check the "Reverse" box, all values that are not equal to 1 will be found.

Copy and Save Search Results

After you have generated a search result using one of the search types introduced in the last sections, you have various options for working with it and using the result for other purposes.

If the purpose of your search is to sort (out) files, you can use the buttons "Remove Found" and "Remove Others", which you can find below the results table. The first button removes all files with hits from the file list in the FileAnalyzer, while the second button does the opposite and removes all files from the file list that do not contain any hits.

Automate Searches using FileAnalyzer's Script Control

All functions introduced in this tutorial can be operated not only via the graphical user interface of the FileAnalyzer as shown, but also via the command line. This means that the search in binary files can, for example, also be integrated into scripts and thereby automated.

You can find out more about this topic in the tutorial on script control of the FileAnalyzer, in the sections Search for Byte Sequences, Numbers, Strings and in the File Structure.