Researchers from the Microsoft Threat Protection Intelligence Team and Intel Labs have collaborated on a research project that takes a novel approach to detecting and classifying malware.
The project, called STAtic Malware-as-Image Network Analysis (STAMINA), used a new technique to convert malware samples into grayscale images, which were then scanned for textural and structural patterns specific to known malware samples.
During the first part of their collaboration, the researchers built on Intel’s previous work on deep transfer learning for static malware classification and used a real-world dataset from Microsoft to better understand the practical value of approaching malware classification as a computer vision task.
The STAMINA approach argues that malware can be classified at scale by performing static analysis on malware binaries represented as images.
Turning malware into images
The researchers first prepared the malware binaries by converting them into two-dimensional images in three steps: pixel conversion, reshaping and resizing. Each binary was converted into a one-dimensional pixel stream by assigning every byte a value between 0 and 255, which corresponded to pixel intensity. Each pixel stream was then reshaped into a two-dimensional image, with the file size determining the width and height of the image. Finally, the images were resized.
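The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the width table, the zero padding of the last row and the nearest-neighbour resize are all assumptions made for the example, and the function names are hypothetical.

```python
import math

# Hypothetical width table: (file size limit in KiB, image width in pixels).
# Illustrative values only; the article does not give the thresholds used.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]
MAX_WIDTH = 1024


def pick_width(num_bytes):
    """Choose an image width from the file size."""
    size_kib = num_bytes / 1024
    for limit_kib, width in WIDTH_TABLE:
        if size_kib < limit_kib:
            return width
    return MAX_WIDTH


def bytes_to_image(data):
    """Reshape a byte string into rows of grayscale pixels (0-255)."""
    if not data:
        raise ValueError("cannot build an image from an empty file")
    width = pick_width(len(data))
    height = math.ceil(len(data) / width)
    # Each byte is already an integer in 0..255, so it doubles as a
    # pixel intensity. Zero-pad the final row (an assumption).
    pixels = list(data) + [0] * (width * height - len(data))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def resize_nearest(image, out_h, out_w):
    """Resize to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Stand-in for the contents of a real binary: 2,048 bytes.
sample = bytes(range(256)) * 8
image = bytes_to_image(sample)
thumb = resize_nearest(image, 16, 16)
print(len(image), len(image[0]), len(thumb), len(thumb[0]))  # 64 32 16 16
```

In practice, an imaging library with proper interpolation would likely replace the nearest-neighbour step, which is used here only to keep the sketch free of dependencies.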
These resized images were then fed into a pre-trained deep neural network (DNN) that scanned the 2D representations of malware strains and classified them as either clean or infected. To serve as a base for the research, Microsoft provided a sample of 2.2 million infected Portable Executable (PE) file hashes.
Microsoft and Intel researchers used 60 percent of the known malware samples to train the original DNN algorithm, 20 percent to validate it and the remaining 20 percent for the actual testing process. According to the research team, STAMINA achieved an accuracy rate of 99.07 percent in identifying and classifying malware samples, with a false positive rate of 2.58 percent. STAMINA was accurate and fast when working with smaller files, though it wavered when working with larger ones.
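To make the evaluation terms concrete, the sketch below shows one way a 60/20/20 split and the two reported metrics could be computed. It is an illustration under stated assumptions, not the team's pipeline: the function names are hypothetical, labels are assumed to be 1 for malware and 0 for clean, and the split is a simple seeded shuffle.

```python
import random


def split_60_20_20(samples, seed=0):
    """Shuffle, then split into train (60%), validation (20%), test (20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate); 1 = malware, 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs)
    # Share of clean files that were wrongly flagged as malware.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


train, val, test = split_60_20_20(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

The false positive rate is the share of clean files flagged as malware, so it is only meaningful when the test set contains clean samples as well as infected ones.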