Why NVIDIA's AI Servers Must Adopt Liquid Cooled SSDs
We saw a demo of how this works, along with the platform, at GTC 2025
At NVIDIA GTC 2025, the focus was on AI servers, ranging from small ones like the NVIDIA DGX Spark to high-end ones like the 2027 NVIDIA Rubin NVL576. We will start to see liquid cooling of locally installed NVMe SSDs in some NVIDIA GB300 systems this year, and by 2027, we expect the high-end AI server market to use liquid-cooled SSDs. The reason for this shift is quite different from the reason for liquid cooling the AI accelerators.
Solidigm’s “Liquid Cooled SSD” Demo
At NVIDIA GTC 2025, we saw a great demo of liquid-cooled SSDs. Here, Solidigm has the D7-PS1010 in an E1.S form factor. The major innovation is the casing's thermal design, which can easily remove heat from both sides of the PCB carrying the NAND, DRAM, SSD controller, and other components.
Taking a step back, the biggest challenge in liquid cooling SSDs is that they are service items. Although SSD failure rates are often cited as roughly an order of magnitude better than those of hard drives, SSDs can still physically wear out, and servers hold many of them, so they have remained service items.
Although SSD form factors are well standardized at this point, adding inlet and outlet quick-disconnect fittings is not trivial. First, the fittings need physical space. Second, each removal and insertion brings with it the opportunity for a drop of fluid to escape. Third, simply ensuring that everything aligns when an SSD is inserted or removed can be a challenge due to chassis flex. As a result, folks are thinking differently.
Instead of bringing liquid to the SSD, the liquid circulates through a cold plate that is pressed against the SSD's metal housing. That way, fluid never has to circulate through the SSD assembly itself. The result is less a liquid-cooled SSD than an SSD designed to be cooled in a liquid-cooled server slot.
There is another challenge, however. This design has no thermal interface material, such as a pad or paste, between the metal surfaces of the server cold plate and the SSD housing. As a result, the cold plate must be kept under pressure so that it makes even contact with the SSD.
The approach that appears to be in focus for some GB300 platforms is a design where the cold plate must be moved out of the way to service the SSD. This makes servicing marginally harder, especially in the demo unit. Folks at the Solidigm booth were surprised I could do it with one hand.
Looking ahead to the Rubin NVL576 platforms, we can see this design carried forward, but with a significant change, shown by NVIDIA at GTC 2025, needed just to reach that density level.