30 Comments
esoel_ - Monday, August 18, 2014 - link
http://www.mysqlperformanceblog.com/2014/08/12/ben...
Here are some benchmarks on the IBM ones.
Kristian Vättö - Monday, August 18, 2014 - link
Awesome, thank you. I updated the article with a link to the benchmarks.
nevertell - Monday, August 18, 2014 - link
This type of technology could definitely help with disk caching on consumer products. Whilst going full flash can be great, if this technology gets cheaper than, or as cheap as, traditional 200 gigabyte drives, having a terabyte hard drive with this to augment it would be ideal.
Mobile-Dom - Monday, August 18, 2014 - link
So these are SSDs that fit in RAM slots?
mmrezaie - Monday, August 18, 2014 - link
Does it need special drivers? How does the OS see the SSD?
Kristian Vättö - Monday, August 18, 2014 - link
Yes, it requires a kernel driver. In Linux the driver bypasses SCSI/SATA, whereas in Windows and VMware it emulates SCSI to appear as a storage volume. See this presentation for more details about the operation:
http://www.diablo-technologies.com/wp-content/uplo...
mmrezaie - Monday, August 18, 2014 - link
Thanks. It is still confusing to me. If the memory bus is available like that, what stops others from hosting their co-processors there? For example, a couple of very low-latency TPMs for encryption, or some ASIC for computation? Is there something wrong with the PCIe bus for very low-latency I/O?
mmrezaie - Monday, August 18, 2014 - link
Could this solve the latency problem we have with ASICs for computation? We could have a hUMA-like interface and a low-latency, parallel interface to the device's memory (which has typically been implemented better by Intel).
Kristian Vättö - Monday, August 18, 2014 - link
I think the problem is that ASICs tend to generate more heat and be physically larger, which doesn't make them a prime candidate for the DIMM form factor, since DIMMs need to be passively cooled.
DanNeely - Monday, August 18, 2014 - link
Some overclockers' RAM has offered a fan module (typically 3x40mm or 2x60mm) that clips on top of the RAM bank and plugs into a mobo fan header, so cooling should be doable. I suspect the DIMM socket is probably limited to providing only enough power for DRAM and couldn't feed a power-hungry ASIC. I suppose you could kludge that with a Molex connector on the top edge, but I'd be concerned about durability with that layout.
mmrezaie - Monday, August 18, 2014 - link
Yeah, but I still think there could be some other good ideas left to explore, or maybe some already exist as research projects! But thanks, this is actually very interesting.
ozzuneoj86 - Monday, August 18, 2014 - link
I like how it looks... reminds me of expansion cards from the 80s and 90s with several different shapes and colors of components all crammed onto one PCB.
nathanddrews - Monday, August 18, 2014 - link
I love stuff like this... even if I'll never use it. XD
HollyDOL - Monday, August 18, 2014 - link
Does this need motherboard/CPU (memory controller) support, or does it work in any DDR3 slot?
MrSpadge - Monday, August 18, 2014 - link
To me it seems like better controllers on PCIe with NVMe are still more than sufficient for NAND. They claim a write access time of 5 µs, yet in the benchmarks the latency achieved in the real world is still comparable to the Fusion IO. This tells me that both drives are still pretty much NAND-limited in their performance and the slight overhead reduction from moving from PCIe to DDR3 simply doesn't matter (yet).
And servers still need RAM, usually plenty of it. NAND can never replace it due to its limited write cycles. Putting a sophisticated memory controller into the CPUs and those sockets onto the PCB costs much more than comparatively simple PCIe lanes. It seems like a waste not to use the memory sockets for DRAM.
One can argue that with 200 & 400 GB per ULLtraDIMM impressive capacities are possible. But any machine with plenty of DIMM slots to spare will also have plenty of PCIe lanes available.
Cerb - Monday, August 18, 2014 - link
5 µs is much higher than the NAND itself, at least for the NAND they've given public specs for, so I don't get where that's coming from.
It's not really a matter of whether PCIe is good enough, so much as that this allows 1U and 2U proprietary form factors, with little to no room for cards, to be much more capable and practical. PCIe cards take up a lot of room themselves, and M.2 2280 takes up a lot of board space (compare it to DIMM slots).
Kristian Vättö - Monday, August 18, 2014 - link
IMFT's 64Gbit 20nm MLC has a typical page program latency of 1,300µs, so 5µs is definitely much lower than the average NAND latency. Of course, with efficient DRAM caching, write-combining and interleaving it's possible to overcome the limits of a single NAND die, but ultimately performance is still limited by NAND.
p1esk - Monday, August 18, 2014 - link
This is a step in the right direction - towards eliminating DRAM. As more CPU cache becomes available on die (especially when technologies such as Micron's Memory Cube take off), and as flash becomes faster, it will be possible to load everything from SSDs straight into cache.
Hopefully the next step would be getting rid of the need for virtual memory.
Cerb - Monday, August 18, 2014 - link
Why on Earth would you want to get rid of virtual memory? It's the best thing since sliced cinnamon roll bread, and is a fundamental part of all non-embedded modern software.
p1esk - Monday, August 18, 2014 - link
Why on Earth would you want to have this ugly kludge if you had a choice? Maybe you enjoy all that extra complexity in your processor designs? How about all the overhead of TLBs and additional processing for every single instruction?
The virtual memory concept was introduced to treat a hard drive as an extension of main memory when a program could not fit in DRAM. This is not the case anymore.
It's true that it has also been used for a different purpose - to create separate logical address spaces for each process, which is tricky when you have a limited amount of DRAM. However, it becomes much simpler if you have 1TB of SSD address space available. Now nothing stops you from physically separating address spaces on the drive. Just allocate the first GB to the first process, the second GB to the second, and so on. The process ID can indicate the offset needed to get to that address space. Problem solved, no need for translation. I'm simplifying of course, but when you have terabytes of memory, you simply don't have most of the problems virtual memory was intended to solve, and those that remain can be solved much more elegantly.
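For illustration, the fixed-slice scheme described in this comment can be sketched in a few lines. This is a minimal sketch and not anything posted in the thread: the names `physical_address` and `GIB`, and the choice of exactly 1 GiB per process, are assumptions made for the example.

```python
GIB = 1 << 30  # assumed slice size: one gibibyte per process, as in the comment


def physical_address(pid: int, offset: int, slice_size: int = GIB) -> int:
    """Map (pid, offset) to a flat address, where the base is pid * slice_size."""
    if pid < 0:
        raise ValueError("pid must be non-negative")
    if not 0 <= offset < slice_size:
        # Without this check a process could reach into its neighbour's slice.
        raise MemoryError(f"offset {offset:#x} is outside the {slice_size:#x}-byte slice")
    return pid * slice_size + offset


if __name__ == "__main__":
    print(hex(physical_address(0, 0x10)))  # 0x10
    print(hex(physical_address(2, 0x10)))  # 0x80000010
```

Even in this simplified form, every access still involves adding a base and checking a bound, and a process can never grow past its fixed slice.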
jamescox - Monday, August 18, 2014 - link
I don't think you have an understanding of the actual problems that virtual memory solves. The ability to swap pages out to mass storage is only one of the minor benefits of virtual memory systems compared to "real" memory systems. What you are describing with process IDs is still a virtual memory system, just a very simplistic and limited form. You would still have the overhead of calculating a real address, while losing a significant number of other features. Virtual memory is not going to go away anytime soon.
p1esk - Tuesday, August 19, 2014 - link
The datapath could be designed so there's no overhead of calculating the address if the offset is derived from the process ID.
Please do explain what "the actual problems that virtual memory solves" are, other than the two I mentioned?
hojnikb - Monday, August 18, 2014 - link
Would it be possible to use something like that (if fast enough and latency low enough) as an "instant-on" system?
Cerb - Monday, August 18, 2014 - link
From full off? No. From a hibernate-like state? Quite likely.
willis936 - Tuesday, August 19, 2014 - link
Correct me if I'm wrong but it doesn't look bootable for consumers yet. It's certainly cool and is a good compromise between a SATA SSD and a RAM drive for hosting Minecraft (see: unreasonable amounts of random I/O reads). Still, nothing beats a RAM drive if you just have too much memory, don't care about the high risk of data loss, and are willing to go through the effort of making a proper backup system.
ats - Tuesday, August 19, 2014 - link
So here's the problem...
You assume VirtMem came about to page memory to disk. That's fine. But it isn't what it's been used for for quite some time. It's extremely rare to non-existent for a modern machine to be overcommitted in memory.
The reality is that VM is used for: program correctness, isolation, security, sharing, et al.
Any attempt to replace a modern VM system is just going to end up looking exactly like a modern VM system. There's basically only 1 alternative out there and it is actually significantly more complex and prone to significantly more issues.
And trying to replace DRAM with NAND is pretty much a losing proposition. You will be replacing something with latency on the order of ns with something on the order of µs. It's never going to be a good trade-off, regardless of how much "cache" a processor has.
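The latency point above can be put into rough numbers with the standard average-memory-access-time formula (hit time plus miss rate times miss penalty). This is a sketch under stated assumptions, not a measurement: the 10 ns cache and 100 ns DRAM latencies are illustrative guesses, and the 5 µs flash figure borrows the write access time quoted earlier in the thread as a stand-in miss penalty. The function name `average_access_ns` is hypothetical.

```python
def average_access_ns(hit_rate: float, cache_ns: float, backing_ns: float) -> float:
    """Average access time: every access pays cache_ns, misses also pay backing_ns."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * backing_ns


if __name__ == "__main__":
    CACHE_NS = 10.0     # assumed last-level cache latency
    DRAM_NS = 100.0     # assumed DRAM latency ("order ns")
    FLASH_NS = 5_000.0  # 5 µs, the claimed write access time, used as an assumption
    for hit_rate in (0.90, 0.99, 0.999):
        dram = average_access_ns(hit_rate, CACHE_NS, DRAM_NS)
        flash = average_access_ns(hit_rate, CACHE_NS, FLASH_NS)
        print(f"hit rate {hit_rate:.1%}: DRAM {dram:.1f} ns, flash {flash:.1f} ns")
```

Under these assumptions, a 99% hit rate gives roughly 11 ns on average with DRAM behind the cache and roughly 60 ns with flash behind it, so the miss penalty still dominates unless the hit rate is extremely close to 100%.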
ShieTar - Tuesday, August 19, 2014 - link
NAND flash may not become feasible, but doesn't NOR flash provide both shorter latencies and a generally more RAM-like programming logic? I know that NOR is used in some embedded devices to run software directly from the flash, without copying it to RAM first.
p1esk - Tuesday, August 19, 2014 - link
(I'm assuming you're responding to my comment above.)
I acknowledged that virtmem is being used for program isolation, and I suggested a vastly simpler method to do that. Physical separation of address spaces on disk, as opposed to a hybrid logical separation of RAM/disk, avoids most of the complexity of modern virtual memory implementations. Essentially, address translation becomes as trivial as a few extra logic gates in the program counter. No overhead, no need for complex page tables.
I agree that at present DRAM is much faster. However, first of all, the gap is closing. There's a lot of innovation going on currently in the field of non-volatile memory, so even if not NAND/NOR, some other technology, such as RRAM, might become viable.
Also, you're underestimating the amount of cache that can be placed on die once you start stacking layers of SRAM vertically. In fact, it might be possible in the future to eliminate DRAM by simply having enough SRAM capacity (for example, as 100 layers on top of the CPU).
nofumble62 - Friday, August 22, 2014 - link
I think it is a stupid idea to sacrifice a DDR channel for a storage drive. A DDR channel requires over 200 pins, while 4 PCIe lanes only require 20-30 pins.
A DDR channel should be used for memory to improve computing performance. There aren't many applications that require constantly reading and writing data to storage, except when booting up and shutting down. So this is a very stupid idea.
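As a rough worked example of the pin-count comparison, peak bandwidth per pin can be estimated as follows. This is only a sketch: the thread does not say which memory speed or PCIe generation is meant, so DDR3-1600 (64-bit channel) and PCIe 3.0 (8 GT/s with 128b/130b encoding) are assumptions, the 200 and 30 pin counts are taken from the comment rather than from a connector pinout, and the function names are hypothetical.

```python
def ddr_channel_gb_per_s(transfers_mt_s: float, bus_bytes: int = 8) -> float:
    """Peak GB/s of a DDR channel: mega-transfers per second times bus width in bytes."""
    return transfers_mt_s * bus_bytes / 1000.0


def pcie_gb_per_s(lanes: int, gt_s: float = 8.0, encoding: float = 128 / 130) -> float:
    """Peak GB/s per direction; the defaults model PCIe 3.0 (an assumption)."""
    return lanes * gt_s * encoding / 8.0


if __name__ == "__main__":
    ddr = ddr_channel_gb_per_s(1600)  # assumed DDR3-1600
    pcie = pcie_gb_per_s(4)           # assumed PCIe 3.0 x4
    print(f"DDR channel: {ddr:.1f} GB/s, {ddr / 200:.3f} GB/s per pin (200 pins)")
    print(f"PCIe x4:     {pcie:.2f} GB/s, {pcie / 30:.3f} GB/s per pin (30 pins)")
```

With these assumed figures the DDR channel has the higher total bandwidth, while the PCIe link moves more data per pin, which is the trade-off the comment is pointing at.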
WizardMerlin - Friday, December 12, 2014 - link
In the limited field in which you work that may be the case. I can think of several use cases where this is an improvement and more memory would not help.