Search in Binary Files
Searching for specific content within text files is comparatively easy, as the bytes of text files can be clearly assigned to specific characters given the encoding used. However, the situation is different when searching within binary files.
Depending on the format, binary files are subject to a specific defined file structure, so that the same byte sequence within the same file can have a completely different meaning in different places. In one place, a byte sequence may be used to specify the size of a chunk, while in another place the exact same sequence of bytes may be interpreted as a string because this place is located within an element that stores metadata, for example.
Because of these difficulties, this article looks at the different types of byte searches that can be carried out with the FileAnalyzer and at what needs to be taken into account when running these searches. We start with a search for plain byte sequences without considering their meaning, continue with the search for byte sequences derived from numbers or strings, and end with the search for paths, properties and values stored within the file structure of binary files:
- Invoking the Search Function
- Search for Byte Sequences
- Search for Numbers
- Search for Strings
- Search in the File Structure
- Copy and Save Search Results
- Automate Searches using Script Control
Invoking the Search Function
All functions presented in this tutorial can be found in the search dialog of the FileAnalyzer. This can be called up using the button "Search" under the file list, using the menu "Tools > Search" or using the keyboard shortcut CTRL + S.
In addition, the search function can also directly be called up using the right mouse button from the file structure, the detail table or the hex view. In this way, bytes from the hex view, data from the table as well as paths from the tree view of the file structure can be included in the search directly without entering anything.
Search for Byte Sequences
The simplest search within binary files looks for individual bytes or longer sequences of bytes without considering the context in which the bytes appear within the file.
- You can find this type of search by selecting the option "Search for Byte Sequence" in the Search Criteria. This option gives you the opportunity of entering a free byte sequence into the search field.
- When entering, it does not matter whether you write the bytes in upper or lower case letters or whether you write or omit spaces between the individual bytes. Searching for "0D 0A", "0D0A", "0d0a" or "0d 0a" will therefore produce the same result. Alternatively, you can also mark a byte sequence within the FileAnalyzer's hex view and right-click on this selection to automatically search for this byte sequence without having to enter the byte sequence manually.
- After entering your byte sequence, you can start the search by clicking on the "Search" button. The search covers all files that are currently loaded in the file list in the FileAnalyzer's main window.
- The search result then appears in the search dialog in the table next to the search criteria. Even though the meaning of the individual bytes is initially of secondary importance when just searching for byte sequences, the search result nevertheless shows the byte offset, the path and the property within the file structure of the binary file in which the byte sequence was found. This gives you the possibility to better assess the meaning of the byte sequence based on the context.
However, you should keep in mind that a byte sequence found using this option may of course not only occur within a meaningful element within the file structure, but may also even span multiple elements or chunks. Therefore, if you care about the meaning of the bytes and only want to search within the ordered elements of a binary file, you should always prefer the file structure search instead, which we will discuss later.
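The following Python lines are a minimal sketch of what such a context-free byte search amounts to. They are not the FileAnalyzer's implementation; the helper names parse_hex and find_all as well as the sample data are assumptions made for illustration only.

```python
def parse_hex(text: str) -> bytes:
    """Turn input such as "0D 0A", "0d0a" or "0D0A" into raw bytes."""
    # Remove all whitespace so that spacing is free-form; fromhex ignores case.
    return bytes.fromhex("".join(text.split()))


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return the offset of every occurrence, including overlapping ones."""
    offsets: list[int] = []
    if not pattern:
        return offsets
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        # Continue one byte later so that overlapping hits are not skipped.
        pos = data.find(pattern, pos + 1)
    return offsets


# Hypothetical sample data: two lines terminated by CR LF.
sample = b"line one\r\nline two\r\n"

for spelling in ("0D 0A", "0D0A", "0d0a", "0d 0a"):
    # Every spelling yields the same offsets: [8, 18]
    print(spelling, find_all(sample, parse_hex(spelling)))
```

The sketch only reports offsets. It knows nothing about chunks or properties, which is exactly why a hit found this way can span several elements of the file structure.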
If the byte sequence you are looking for is based on a number or a string, you don't have to go to the trouble of calculating the byte sequence required for the number or the string you are looking for yourself. In this case, you can simply use the number or string search, which we will look at in the next two sections.
Search for Numbers
The search for numbers is an input aid for the search for byte sequences. It works in the same way, except that you do not enter the byte sequence directly. Instead, you enter a number in the search field, and the byte sequence to be searched for is generated automatically according to the selected number format.
- To use the search for numbers, first activate the option "Search for Number" from the search criteria.
- You can then enter the number you want to search for in the field "Number".
- Which byte sequence this number corresponds to depends on the size (1 to 8 bytes: how many bytes should be used to represent the number?), the signedness (signed or unsigned: should the number range include only non-negative numbers or also negative ones?) and the endianness (little endian or big endian: is the least significant or the most significant byte stored first?). You can enter this information into the fields of the same name. With each change you receive a live preview of the generated byte sequence, or an error message if your number cannot be represented with the parameters you have set (for example because the byte size is too small for the number or because a negative number cannot be "unsigned").
- Finally, you can click on the "Search" button again in order to receive your search result.
As with the simple search for byte sequences, again, the format and the file structure are not taken into account when searching for numbers using this function. This means that the search for numbers only scans the file being searched for the byte equivalent of the number to be searched for, but does not take into account whether the bytes found actually represent this number within the file structure. Such a hit can therefore also occur within the file in the meaning of a string or even span several meaningful elements.
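How size, signedness and endianness determine the generated byte sequence can be reproduced with Python's built-in int.to_bytes. This is only a sketch of the conversion, not the FileAnalyzer's own code; the function name number_to_bytes is an assumption.

```python
def number_to_bytes(value: int, size: int, signed: bool, little_endian: bool) -> bytes:
    """Convert an integer into the byte sequence that would be searched for."""
    if not 1 <= size <= 8:
        raise ValueError("size must be between 1 and 8 bytes")
    byteorder = "little" if little_endian else "big"
    # int.to_bytes raises OverflowError if the value does not fit into the
    # given size or if a negative value is requested as unsigned.
    return value.to_bytes(size, byteorder=byteorder, signed=signed)


# 1000 is 0x03E8, so the two byte orders differ only in the order of the bytes.
print(number_to_bytes(1000, 2, signed=False, little_endian=True).hex(" ").upper())   # E8 03
print(number_to_bytes(1000, 2, signed=False, little_endian=False).hex(" ").upper())  # 03 E8
print(number_to_bytes(-1, 2, signed=True, little_endian=True).hex(" ").upper())      # FF FF

# The two error cases mentioned in the text: size too small, negative but unsigned.
for args in ((1000, 1, False, True), (-1, 2, False, True)):
    try:
        number_to_bytes(*args)
    except OverflowError as error:
        print("not representable:", error)
```

The example also shows why a pure byte search cannot tell numbers apart: the sequence FF FF is -1 as a signed 2-byte number, but 65535 as an unsigned one.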
If you only want to find those numbers that actually occur as a value within the file structure, you should use the function "Search in File Structure" instead. This function allows you to search directly for specified values within the properties actually defined in the file.
Search for Strings
Like the search for numbers, the search for strings can be understood as a simplification of the search for byte sequences. The only difference is that you enter a string or text instead of a number, from which the byte equivalent required for the search is then generated automatically.
- To use this function, first activate the option "Search for String" in the search criteria of the search dialog.
- You can then enter any text you want to search for in the field "String".
- It is also important to specify which encoding should be used for translating the string into bytes. You can make this setting using the "Encoding" field. There are numerous encodings available there, such as UTF-8, UTF-16 LE and BE, UTF-32 LE and BE, as well as some Latin and Windows code pages.
- Furthermore, you can use the option "Byte Order Mark (BOM)" to set whether your string should be searched for with or without the initial bytes for the byte order mark of the encoding.
- After you have entered your data, you can start the search again using the "Search" button in order to receive the table with your search results.
As with the search for numbers, you receive a preview of the generated byte sequence below the input fields when searching for strings. At this point you also receive an error message if the text you have entered cannot be represented in your chosen encoding (for example when trying to convert non-ASCII Unicode characters to ASCII encoding).
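The conversion from text to bytes, including the optional byte order mark, can be sketched in Python as follows. The function name string_to_bytes and the BOMS table are assumptions for illustration; the encoding names are those of Python's codecs, not necessarily the labels used in the FileAnalyzer.

```python
import codecs

# Byte order marks for the Unicode encodings; other encodings have none.
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def string_to_bytes(text: str, encoding: str, with_bom: bool = False) -> bytes:
    """Encode text into the byte sequence that would be searched for."""
    # str.encode raises UnicodeEncodeError if a character does not exist
    # in the target encoding.
    payload = text.encode(encoding)
    prefix = BOMS.get(encoding.lower(), b"") if with_bom else b""
    return prefix + payload


print(string_to_bytes("Hi", "utf-8").hex(" ").upper())                     # 48 69
print(string_to_bytes("Hi", "utf-16-le").hex(" ").upper())                 # 48 00 69 00
print(string_to_bytes("Hi", "utf-16-le", with_bom=True).hex(" ").upper())  # FF FE 48 00 69 00

try:
    string_to_bytes("é", "ascii")
except UnicodeEncodeError as error:
    print("not representable:", error)
```

The same two letters produce different byte sequences depending on the encoding, which is why the choice in the "Encoding" field matters for the result.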
Furthermore, you should note that the string search, like the other two search types presented so far, does not take the format and the file structure into account. If you want to search for strings within the file structure of your files, so that your search results only include occurrences in which the string is actually stored as a string, please use the search in the file structure, which we will look at in the next section.
Search in the File Structure
The heart of the FileAnalyzer search dialog is the search in the file structure, which you can activate using the option of the same name in the search criteria. In contrast to the search types presented so far, this search finds numbers, strings and other values and properties only where they occur with exactly this meaning within the file structure, and not wherever the corresponding bytes happen to appear.
The search in the file structure includes the three fields "Path", "Property" and "Value", which can be used for your searches individually, together or in any combination:
- If you only search for a path and leave the other two fields blank, the FileAnalyzer will find any properties and values that are stored in this path.
- The situation is similar if you use only the field "Property". Then, regardless of the value and regardless of which chunk type the property is in, all hits that match the property you are looking for are listed.
- On the other hand, if you only use the field "Value" and leave all other fields blank, the program will find all values that match your search criteria, regardless of the path and properties in which these values are stored.
- Furthermore, any combination of the search fields is possible. For example, if you fill out only the fields "Path" and "Property" and leave the "Value" field blank, you will find all values of this property in the specified path. If you search only for path and value, you will find all properties within the searched path that have the searched value assigned, and so on.
- If you fill out all fields, you will only find the files that exactly match this search pattern, which means that in this case both the path, the property as well as the value must match your search criteria.
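The way the three fields combine can be sketched as a filter in which every blank field matches everything. The Entry type, the search function and the sample paths and values below are hypothetical and only illustrate the principle; the actual path notation of the FileAnalyzer may differ.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    path: str   # position within the file structure
    prop: str   # name of the property
    value: str  # value stored in the property


def search(entries: list[Entry], path: Optional[str] = None,
           prop: Optional[str] = None, value: Optional[str] = None) -> list[Entry]:
    """Return all entries that match every field that was filled out."""
    return [
        entry for entry in entries
        if (path is None or entry.path == path)
        and (prop is None or entry.prop == prop)
        and (value is None or entry.value == value)
    ]


# Hypothetical entries as they might be read from a media file.
entries = [
    Entry("moov/mvhd", "Duration", "5000"),
    Entry("moov/trak/tkhd", "Duration", "5000"),
    Entry("moov/trak/tkhd", "Width", "1920"),
    Entry("moov/trak/tkhd", "Height", "1080"),
]

print(search(entries, path="moov/trak/tkhd"))                # all three properties in this path
print(search(entries, prop="Duration"))                      # both durations, regardless of path
print(search(entries, path="moov/trak/tkhd", prop="Width"))  # only the width of the track header
```

Each additional field narrows the result, so filling out all three fields leaves only the entries in which path, property and value match at the same time.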
Path
If you want to limit your search to a specific part of the file structure, you can specify the path of this part here. It is important to know that the FileAnalyzer treats the parts or chunks of a file like a folder structure, through which each element of the file can be addressed uniquely. You can find out more about this topic in the introduction to the FileAnalyzer path concept. That article also explains how to address paths that share the same name and are therefore ambiguous.
To avoid having to enter a path manually into the search dialog, you can simply right-click on an element within the file structure in the main window and select "Search Path" from the context menu. For ambiguous paths, the context menu contains the entries "Search Path at this Position" as well as "Search Path at all Positions" to invoke the search with or without indices in the path.
Property
The next field, "Property", allows you to restrict the search to certain properties. For example, the track header of ISO Base Media files stores properties such as "Width", "Height", "Volume" or "Duration". You can enter one of these names here in order to see only values of this property in your search results.
You do not have to enter the name of the property manually either, since you can simply right-click a row of the detail table in order to start the search directly from there. In the context menu of the detail table, you have the option of searching only for the property you clicked on (regardless of the path) as well as the option of searching for this property only within a path of the same name (again with the option of taking the path position into account or not).
Value
The "Value" field works in the same way, allowing you to search your files for an arbitrary value. You can use this field for numbers as well as for any strings. Depending on whether the fields "Path" and/or "Property" are also filled out during your search, the search for values is carried out either only within the selected paths or properties or within the entire file, regardless of element and property types.
When searching for decimal numbers, it does not matter which decimal separator you use. For example, you can use the English notation with a dot, such as 1.0, or the German notation with a comma, such as 1,0. Both notations are treated as equivalent within the file structure and will be found accordingly with such searches.
However, please note that if you want to search for decimal numbers, you must also enter a decimal number. If you enter an integer instead, only integers will be searched for. For example, let's say we have properties with the values 0 - 0.5 - 1 - 1.5 - 2 - 2.5 - 3. If you now search for "greater than 1", you will only get 2 and 3 as a result, since only these numbers are integers without a fractional part. If, on the other hand, you want to include the decimals, you must search for "greater than 1.0". Then your search result will include the numbers 1.5 - 2 - 2.5 and 3.
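The behavior described in the last two paragraphs can be modeled with a few lines of Python. This is only a sketch that reproduces the example above. It assumes that a number counts as a decimal number when it is written with a decimal separator; how the FileAnalyzer distinguishes the two internally is not stated here. The names parse_number and greater_than are hypothetical.

```python
def parse_number(text: str) -> tuple[float, bool]:
    """Return the numeric value and whether it was written as a decimal number."""
    # Accept both "1.0" and "1,0" by normalizing the separator.
    normalized = text.strip().replace(",", ".")
    return float(normalized), "." in normalized


def greater_than(values: list[str], term: str) -> list[str]:
    """Model of a GREATER search with an integer or a decimal search term."""
    limit, term_is_decimal = parse_number(term)
    hits = []
    for raw in values:
        number, value_is_decimal = parse_number(raw)
        # Assumption: an integer search term only considers integer values.
        if value_is_decimal and not term_is_decimal:
            continue
        if number > limit:
            hits.append(raw)
    return hits


values = ["0", "0.5", "1", "1.5", "2", "2.5", "3"]
print(greater_than(values, "1"))    # ['2', '3']
print(greater_than(values, "1.0"))  # ['1.5', '2', '2.5', '3']
print(greater_than(values, "1,0"))  # ['1.5', '2', '2.5', '3']
```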
As with the other two search fields, you can also start the search for values directly from the detail table by right-clicking on a table row and selecting "Find (this) Value" from the context menu.
Search Operator
Next to both the "Property" field and the "Value" field you will find a selection box that allows you to set the operator for your search.
- By default, the operator "EQUALS" is used. This means that the property or value being searched for must exactly match the search term. A search for "abc" will therefore only find values that exactly match the string "abc".
- On the other hand, if you use the "STARTS_WITH" operator, a search for "abc" would also find "abcdef" but not "_abcdef".
- With the operator "CONTAINS", on the other hand, a search for "abc" would find both "abcdef" and "_abcdef".
Other operators are "ENDS_WITH", "MATCHES_REGEX" and "CONTAINS_REGEX" (the whole value or a part of the value matches a regular expression) as well as "GREATER", "GREATER_OR_EQUAL", "SMALLER", "SMALLER_OR_EQUAL" and "BETWEEN". Using the "BETWEEN" operator, you can define a range from one value to another. For example, the search term "2-5" finds the values 2, 3, 4 or 5 - but not 1, 6 or 10.
By the way, you can use search operators such as GREATER, GREATER_OR_EQUAL, SMALLER, SMALLER_OR_EQUAL or BETWEEN not only with numbers but also with text or strings. A search for GREATER w would therefore, for example, find values such as x, y or z (but not a, b or v) while a search for BETWEEN i-o would find j, k, l, m and n, but not a or x.
Inverse Search
Under the two fields "Property" and "Value" you will also find the option "Reverse" with which you can reverse the search criteria of the respective field. This means that if, for example, you search for a value with EQUALS 1 and check the "Reverse" box, all values that do not have the value 1 will be found.
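The operators and the "Reverse" option can be summarized in a small Python sketch. It is a model of the behavior described in this section, not the FileAnalyzer's implementation. The function names are assumptions, and so are the details the text leaves open: values are compared numerically when both sides are numbers and as text otherwise, and BETWEEN includes both limits and splits the search term at the first hyphen.

```python
import re


def _compare(a: str, b: str) -> int:
    """Return -1, 0 or 1; compare numerically if both sides are numbers."""
    try:
        x, y = float(a), float(b)
    except ValueError:
        x, y = a, b  # fall back to a plain string comparison
    return (x > y) - (x < y)


def matches(value: str, operator: str, term: str, reverse: bool = False) -> bool:
    """Check one value against a search term using the given operator."""
    if operator == "EQUALS":
        hit = value == term
    elif operator == "STARTS_WITH":
        hit = value.startswith(term)
    elif operator == "ENDS_WITH":
        hit = value.endswith(term)
    elif operator == "CONTAINS":
        hit = term in value
    elif operator == "MATCHES_REGEX":
        hit = re.fullmatch(term, value) is not None  # whole value must match
    elif operator == "CONTAINS_REGEX":
        hit = re.search(term, value) is not None     # a part may match
    elif operator == "GREATER":
        hit = _compare(value, term) > 0
    elif operator == "GREATER_OR_EQUAL":
        hit = _compare(value, term) >= 0
    elif operator == "SMALLER":
        hit = _compare(value, term) < 0
    elif operator == "SMALLER_OR_EQUAL":
        hit = _compare(value, term) <= 0
    elif operator == "BETWEEN":
        low, high = term.split("-", 1)               # "2-5" -> "2" and "5"
        hit = _compare(value, low) >= 0 and _compare(value, high) <= 0
    else:
        raise ValueError(f"unknown operator: {operator}")
    # The "Reverse" option simply inverts the outcome of the comparison.
    return not hit if reverse else hit


print(matches("abcdef", "STARTS_WITH", "abc"))   # True
print(matches("_abcdef", "STARTS_WITH", "abc"))  # False
print(matches("_abcdef", "CONTAINS", "abc"))     # True
print(matches("10", "BETWEEN", "2-5"))           # False
print(matches("x", "GREATER", "w"))              # True
print(matches("2", "EQUALS", "1", reverse=True)) # True
```

Comparing numerically where possible matters for the BETWEEN example: as text, "10" would sort between "1" and "2" and could be mistaken for a hit in a range such as "1-5".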
Copy and Save Search Results
After you have generated a search result using one of the search types introduced in the last sections, you have various options for working with it and using the result for other purposes.
- Copy to Clipboard: Directly under the search result you will find the two buttons "Copy as TSV" as well as "Copy as CSV". With these buttons you can copy the entire results table of your search to the clipboard, for example to paste it into another application. The TSV function uses the tab as a separator between the individual fields of your table, while "CSV" - depending on your language settings - uses the comma or the semicolon.
- Save Result as File: With the third button under the results table "Save as" you can save the search result as a file. The formats available to you are CSV (Comma Separated Values), TSV (Tab Separated Values), XLSX (Microsoft Excel Spreadsheet), ODS (Open Document Spreadsheet), DIF (Data Interchange Format), HTML (web page) as well as TXT (Plain Text). You can select the format in the save dialog under "File Type".
- Show Result in the Original File: By clicking on a table row, you can open the original file in which the byte sequence was found in the FileAnalyzer and mark the relevant byte sequence in the hex view. This gives you an immediate overview of the context in which your search result is embedded in the file.
If the purpose of your search is to sort (out) files, you can use the buttons "Remove Found" and "Remove Others", which you can also find below the results table. The first button removes all files with hits from the file list in the FileAnalyzer, while the second button does the opposite and removes all files from the file list that do not contain any hits.
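If you process a copied or saved result in your own scripts, the difference between the TSV and the CSV variants comes down to the separator. The following sketch uses Python's csv module to write the same table with different separators. The column names and the sample row are hypothetical and do not claim to match the exact columns of the results table.

```python
import csv
import io


def to_delimited(rows: list[list[str]], delimiter: str) -> str:
    """Serialize table rows with the given field separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical result table: a header row and one hit.
rows = [
    ["File", "Offset", "Path", "Property", "Value"],
    ["video.mp4", "132", "moov/trak/tkhd", "Width", "1920"],
]

print(to_delimited(rows, "\t"))  # TSV: tab as separator
print(to_delimited(rows, ","))   # CSV with comma
print(to_delimited(rows, ";"))   # CSV with semicolon
```

Fields that contain the separator themselves are quoted by the csv module, so a value such as "a,b" remains a single field when a comma is used as the separator.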
Automate Searches using Script Control
All functions introduced in this tutorial can be operated not only via the graphical user interface of the FileAnalyzer as shown, but also via the command line. This means that the search in binary files can, for example, also be integrated into scripts and thereby automated.
You can find out more about this topic in the tutorial on script controlling the FileAnalyzer in the sections Search for Byte Sequences, Numbers, Strings and in the File Structure.