Phison already offers on-the-fly encryption on our Opal and FIPS 140-2 SSD products. As mentioned above, this works because encryption operates on data that is already on its way to the SSD. Compression is also easy to accommodate on the SSD and fits the streaming model, but it provides limited benefit because most bulk data (photos, video or music) is already fully compressed. Some large data sets can benefit from compression, but the use-case is relatively uncommon, so it tends to be delegated to dedicated server appliances.
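The limited benefit of compressing already-compressed data is easy to see in a small experiment. The sketch below is only an illustration, not a model of any SSD firmware: it uses Python's standard zlib module, random bytes as a stand-in for compressed media, and a made-up repetitive log line as a stand-in for compressible data.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Assumption: random bytes approximate already-compressed media (photos,
# video, music), since both have very little redundancy left to remove.
media_like = os.urandom(1 << 20)

# Assumption: a repeated, made-up log line approximates the less common
# data sets that still compress well.
log_like = b"status=ok latency_ms=12 path=/index.html\n" * 25_000

print(f"media-like data: {compression_ratio(media_like):.3f}")
print(f"log-like data:   {compression_ratio(log_like):.3f}")
```

The media-like buffer comes out at essentially its original size, while the repetitive buffer shrinks to a small fraction of it.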
Dedupe, on the other hand, breaks the streaming model for two reasons:
1) It requires a huge amount of memory to track the hashes for each sector.
2) SSDs are already fully tasked in datacenter environments, so any work spent searching is taken away from host IO.
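To put a rough number on the memory requirement in the first point, the following back-of-the-envelope sketch estimates the size of a hash index. The 4 KiB sector size and the 32-byte hash (the size of a SHA-256 digest) are assumptions chosen for illustration; the function name is hypothetical and a real dedupe engine may use different parameters or a more compact index.

```python
def dedupe_index_bytes(capacity_bytes: int,
                       sector_bytes: int = 4096,
                       hash_bytes: int = 32) -> int:
    """Memory needed to hold one hash per sector (assumed sizes, see above)."""
    sectors = capacity_bytes // sector_bytes
    return sectors * hash_bytes


TiB = 1 << 40
GiB = 1 << 30

for capacity_tib in (1, 4, 16):
    index = dedupe_index_bytes(capacity_tib * TiB)
    print(f"{capacity_tib:>2} TiB drive -> {index / GiB:.0f} GiB of hashes")
```

Under these assumptions, every TiB of capacity needs 8 GiB just to store the hashes, before any lookup structure is added on top.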
The only real benefit of having the SSD perform the search is a slight reduction in PCIe bus transfer time and a reduced load on the host CPU. Conversely, the SSD has to go up in cost due to higher computational requirements and additional DRAM, and its active power necessarily goes up as well. Dedupe is better implemented using spare system resources, particularly overnight when people are sleeping, instead of adding 10-20% SSD.
One type of computational hybrid device exists today and is very successful: the Smart NIC. It combines a high-speed NIC (typ. 10 GB/s) with a powerful CPU or FPGA. Though this combination works for a NIC, it does not work as well for storage. The reason is fairly straightforward: the smart part of the NIC processes data that is already passing through the NIC to the host. The Smart NIC works well when it can process data as it streams through, or when it can service a request by directly accessing resources within the chassis.
The typical value proposition for Computational Storage is presented as follows: the SSD is closer to the data, it frees up bus bandwidth and it offloads the host CPU. At face value, Computational Storage appears to be an easy sell, but it hasn’t turned out that way.
First, the SSD today is already using 100% of its resources and power budget to service its primary function. In many cases, high-density enterprise SSDs have to limit performance to avoid exceeding their power or cooling budget. Second, SSDs typically use small CPU cores that are nowhere near what the host CPU or a GPU can do. Third, this experiment was already tried before Computational Storage became a buzzword. One company attempted to combine a GPU and an SSD, but the solution ended up degrading both technologies. To meet the GPU requirements, the SSD had to run very fast, which added significant heat load to the GPU. The GPU, which runs much hotter than an SSD, in turn created substantial retention stress on the NAND. Lastly, an SSD is a consumable item with finite write endurance, whereas a GPU can run indefinitely until it becomes obsolete.
Taking a different approach, we could add a more powerful CPU directly on the SSD. Then we run into the RAM problem. Today most enterprise SSDs maintain a 1000:1 NAND-to-DDR ratio. The SSD only needs to pull a few bytes for every 4K LBA translation, so the DDR bandwidth requirement is relatively low. This means the SSD can use a slower grade of DRAM, which lowers the cost of the entire module. Adding a larger guest CPU to the SSD, along with more DDR for applications, decreases the power available for the SSD’s primary role of providing IO to the main host. It also increases the SSD cost, but does not provide a proportional gain in compute power.
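The 1000:1 figure follows from simple arithmetic on the translation table. The sketch below assumes a flat logical-to-physical table with one 4-byte entry per 4 KiB LBA; the 4-byte entry size is an assumption standing in for "a few bytes", the function name is hypothetical, and actual controllers may organize their tables differently.

```python
def mapping_table_bytes(nand_bytes: int,
                        lba_bytes: int = 4096,
                        entry_bytes: int = 4) -> int:
    """DRAM needed for a flat table with one entry per LBA (assumed sizes)."""
    return (nand_bytes // lba_bytes) * entry_bytes


TB = 10**12
GB = 10**9

for capacity_tb in (1, 4, 16):
    nand = capacity_tb * TB
    dram = mapping_table_bytes(nand)
    print(f"{capacity_tb:>2} TB NAND -> {dram / GB:.2f} GB DRAM "
          f"(ratio {nand / dram:.0f}:1)")
```

With these assumed sizes the ratio works out to 1024:1, or roughly 1 GB of DRAM per TB of NAND, which leaves no spare DRAM for guest applications.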
Then there is the question of how storage is deployed today. Data is usually aggregated into multi-unit RAID sets, so no single SSD ever sees the full data set. We could change the way storage is used, ensuring each SSD always sees complete data elements and using full replication for redundancy. This is not likely to take hold, because this model does a poor job of sharing storage bandwidth if one SSD holds more of the data that is currently in demand. RAID stripes address this problem by staggering the accesses so that each subsequent client starts shortly after the current client. We could extend the model where each SSD has a full copy of a data set by implementing replication across multiple units, but then we have to add a lookup and load-sharing mechanism. Duplication also has a much higher storage footprint than simple RAID5 or RAID6. Simply put, the way we use storage today is cost effective, easy to deploy and works well for most scenarios. Completely changing the storage infrastructure for what amounts to adding a few server CPUs is hard to justify.
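The footprint difference between duplication and parity RAID can be made concrete with a short calculation. The sketch below assumes an 8-drive RAID set and two full copies for replication; both numbers, and the function name, are illustrative choices rather than figures from any specific deployment.

```python
def raw_per_usable(scheme: str, drives: int = 8, copies: int = 2) -> float:
    """Raw capacity that must be purchased per unit of usable capacity."""
    if scheme == "raid5":        # one drive's worth of parity per set
        return drives / (drives - 1)
    if scheme == "raid6":        # two drives' worth of parity per set
        return drives / (drives - 2)
    if scheme == "replication":  # every copy is a complete data set
        return float(copies)
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("raid5", "raid6", "replication"):
    print(f"{scheme:<12} {raw_per_usable(scheme):.2f}x raw capacity")
```

For the assumed 8-drive set, RAID5 and RAID6 cost about 14% and 33% extra raw capacity respectively, while even the minimum of two replicas costs 100% extra.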
Despite the downsides of general-purpose Computational Storage, there are specific cases where it does make sense. These occur when the storage use-case mirrors the winning case for the Smart NIC, that is, when the SSD only has to process the data once as it moves through the device. We can associate encryption and compression with Computational Storage, but that’s a stretch. It is more accurate to define these two use-cases as in-line or streaming data processing using a very simple algorithm.
Phison and one of our customers have found a Computational Storage application that is well suited to the SSD. It does not require a large amount of memory or CPU power, and it does not interfere with the primary purpose of the SSD, which is storage IO. We are developing a security product that uses machine learning to look for signs that the data is being attacked. It can identify ransomware and other unauthorized activities with no measurable impact on SSD performance.