Malware analysts routinely use the Strings program during static analysis in order to inspect a binary’s printable characters. However, identifying relevant strings by hand is time consuming and prone to human error. Larger binaries produce upwards of thousands of strings that can quickly evoke analyst fatigue, relevant strings occur less often than irrelevant ones, and the definition of “relevant” can vary significantly among analysts. Mistakes can lead to missed clues that would have reduced overall time spent performing malware analysis, or even worse, incomplete or incorrect investigatory conclusions.
SringSifter is built to sit downstream from the Strings program; it takes a list of strings as input and returns those same strings ranked according to their relevance for malware analysis as output. It is intended to make an analyst’s life easier, allowing them to focus their attention on only the most relevant strings located towards the top of its predicted output. StringSifter is designed to be seamlessly plugged into a user’s existing malware analysis stack. Once its GitHub repository is cloned and installed locally, it can be conveniently invoked from the command line with its default arguments according to:
strings <sample_of_interest> | rank_strings
StringSifter is a machine learning tool that automatically ranks strings based on their relevance for malware analysis.