## Estimating 3D XPoint Die Size

By now most of you probably know that I'm a sucker for die sizes and since this is information that the DRAM and NAND vendors are unwilling to share, I've gone as far as developing my own method for estimating the die size (well, it's really just primary school geometry, so I can't take too much credit for it). Die size is the key factor in determining cost efficiency because it directly relates to the number of gigabytes each wafer yields and thus it's a vital metric for comparing different technologies and process nodes.

I'm borrowing the above picture from The SSD Review because to be honest my wafer photos (and photos in general) are quite horrible and wafers are far from being the easiest object given all the reflections. Sean is a professional photographer, so he managed to grab this clear and beautiful photo of the production 3D XPoint wafer Intel and Micron had on display, making it easy to estimate the die size.

I calculated 18 dies horizontally and 22 vertically, which yields 227mm^2 with a normal 300mm wafer. When taking die cuts (i.e. the space between dies) into account, we should be looking at 210-220mm^2. Array efficiency is about 90%, which is much higher than planar NAND because most of the peripheral circuitry lies underneath the memory array.

IMFT 20nm 128Gbit MLC NAND die

For comparison, Intel-Micron's 20nm 128Gbit MLC NAND die measures 202mm^2 and has array efficiency of ~75%. From that we can calculate that the 128Gbit memory array in 3D XPoint takes about 190mm^2, while a similar capacity planar NAND array measures ~150mm^2 (since the 128Gbit 3D XPoint die consists of two layers and 128Gbit MLC NAND die stores two bits per cell, the number of layers and bits stored per cell cancel out). It seems like NAND is denser (about 20-25%) from a memory array perspective given a fixed feature size (i.e. lithography), but at this point it's hard to say whether this is due to the cell design itself or something else. Connecting layers of wordlines and bitlines to the intermetal layers likely takes some extra area compared to a 2D process (at least this is the case with 3D NAND), which might partially explain the lower density compared to NAND.

However we will have to wait for some SEM photos to really see what's happening inside the 3D XPoint array and how it compares to NAND in cell size and overall density efficiency. Of course, there is a lot more in total manufacturing cost than just the cell and die size, but I'll leave the full analysis to those with the proper equipment and deeper knowledge of semiconductor manufacturing processes.

## What Happens to 3D NAND

The above analysis already gives a hint that 3D XPoint isn't about to replace 3D NAND, at least not in the foreseeable future. That's also what Intel and Micron clearly stated when asked about 3D XPoint's impact on 3D NAND because it's really a new class of memory that fills a niche that DRAM and NAND cannot. The companies are still looking forward to rolling out 3D NAND next year and have a strong roadmap of future 3D NAND generations.

As I mentioned earlier, the way 3D XPoint array is built is quite different from 3D NAND and my understanding is that it's less economical, which is one of the reasons why the first generation product is a two-layer design at 20nm rather than dozens of layers at a larger lithography with single patterning like 3D NAND is. Unless there's a way to build 3D XPoint arrays more like 3D NAND (i.e. pattern and etch multiple layers at the same time), I don't see 3D XPoint becoming cost competitive with 3D NAND anytime soon, but then again it's not aimed to be a NAND successor in short-term.

What happens in ten year's time is a different question, though. 3D NAND does have some inherent scaling obstacles with vanishing string current likely being the biggest and most well known at this point. Basically, the channel in each 3D NAND "cell tower" (i.e. a stack of layers, currently 32 for Samsung and Intel-Micron) is a single string that the electrons have to flow through to reach every individual cell in the string. The problem is that as the length of the string increases (i.e. more layers are added), it becomes harder to reach the top cells because the cells on the way cause disturbance, reducing the overall string current (hence the name "vanishing string current"). For those who are interested in a more detailed explanation of this issue along with some experimental data, I suggest you head over to 3D Incites and read Andrew Walker's post on the topic.

Since most vendors haven't even started 3D NAND mass production, it's not like the technology is going to hit a wall anytime soon and e.g. Toshiba-SanDisk's 15nm NAND has strings consisting of 128 cells, but like any semiconductor technology 3D NAND will reach a scaling limit at some point. Whether that is in five, ten or twenty years is unknown, but having a mature and scalable technology like what 3D XPoint should be at that point is important.

The Technology Products & Applications

• #### FunBunny2 - Friday, July 31, 2015 - link

If you want to know what's being sold, go back and look up Unity Semiconductor's CMOx tech. Rambus bought them, then Rambus and Micron settled, including a patent sharing arrangement. The last Unity CEO said, just before Rambus bought them, that 2015 was production year. Could be.
• #### nwarawa - Friday, July 31, 2015 - link

I can't wait for this to be a normal conversation:
A:"How much storage do you have?"
B:"256GB"
B:"Yes."
• #### ajp_anton - Friday, July 31, 2015 - link

10^15 P/E cycles for DRAM? How does this work?, as typical DRAM does on the order of 10^16 cycles in a year. I'm assuming a P/E cycle is the same as a clock cycle because of the constant refreshing, is this wrong?
• #### Crazy1 - Saturday, August 1, 2015 - link

I had to look this up, but the DDR3 standard calls for at least 8 refresh commands every 7.8 usec. Rounding down to the nearest 50ns, means to one refresh every 950 ns. When calculated out, that equals roughly 3.32x10^13 cycles/year. That means DDR3 should survive up to 30 years with a 10^15 P/E cycles rating, while never turning off your computer or putting it in hibernate.

In a refresh cycle, the information in a cell is read, then rewritten. There is no erase. I'm not sure the speed a typical P/E cycle occurs when erasing and writing new data is required. If it is significantly quicker than 950ns, there may be a decrease in lifespan from 30 years. However, unless you run intensive programs that delete and write new information to all memory cells every 32ns, you are not going to exceed the 10^15 P/E cycles in a year.
• #### TallestJon96 - Friday, July 31, 2015 - link

Excellent work. Anandtech always has the best information and reviews, even if they are the last.

This is pretty exciting stuff. If storage can become fast enough, then perhaps we will not need memory. Theoretically this would be a massive improvement to efficiency and performance. I would argue that the perfect computer would only have a processor and extremely fast storage. This is not enough to fill the gap, but storage is certainly catching up.

As a gamer, the idea of having my game loaded onto storage that is fast enough to not need to load into the memory is pretty appealing. Zero load time, no texture streaming issues, and potentially larger scale.

I have to wonder about bandwidth with this tech. Latency is clearly between ram and SSDs, but is closer to ram. But I haven't seen any solid bandwidth stats.
• #### Freakie - Friday, July 31, 2015 - link

In the article they mention that gamers already can by-pass slow NAND and HDD speeds by just creating a RAMDisk. If you have 32GB of RAM, you could take 8GB of it for your system memory, turn the other 24GB into a RAM disk, and put all of your game files onto it and then your games will load their resources at the speed of your RAM.

And DDR4 is coming down in price very quickly so it isn't such a crazy idea. The cheapest 32GB DDR4 kit I can find is \$176 which means 64GB will cost you \$350 for games that have 40GB of resources. While not incredibly cheap, it's also not totally unreasonable especially if you're already complaining about SSD's not loading game resources fast enough.
• #### Friendly0Fire - Saturday, August 1, 2015 - link

Sadly, 24GB is a bit short for modern games and 8GB for the OS and the game is also a bit on the low side. Games are finally taking advantage of 64-bit executables (and thus far larger memory cap) and it's showing up as a dramatic increase in asset size, both on disk and in memory.

64GB of RAM might get you there, but I think 32's on the short-ish side. 3D XPoint would side-step the issue by providing far more storage than contemporary games would likely need.
• #### lordken - Sunday, August 2, 2015 - link

As said by Friendly0Fir 24GB is unfortunately nothing today, many games today have 20-50GB disk requirments (not sure if devs are plain lazy to optimize or they really need that much space for stuff)
Plus dont forget that you need to first fetch data into ramdisk after boot, and wait it to flush it out before shutdown. So personally I would not bother with ramdisks, and probably load times doesnt solely depend on read time from storage only. On some games I didnt seen much difference between HDD and SSD load performance (which shows either bad game engine/coding or some other bottleneck, maybe my CPU).
And not to say leaving only 8GB for OS is really not that great.
• #### JKflipflop98 - Monday, August 3, 2015 - link

Not to mention it's a giant pain in the butt to have to create the ram drive, copy all the files over, and then create all the links needed to actually run the game. By the time you're done futzing around with all that crap, you've cost yourself 10x the time you've saved in loading screens.
• #### lordken - Sunday, August 2, 2015 - link

"This is pretty exciting stuff. If storage can become fast enough, then perhaps we will not need memory. "
imho this will "never" be true, RAM will always be faster, no matter how much you make storage faster you can still also improve RAM which in turn will always keep ahead of storage. Plus as shown in article it is much closer to CPU and thus better perf/latencies etc.

Maybe in case when Xpoint v3 reach performance level of DDR3/4 then diminishing returns could start to kick in , but still by that time we will probably have DDR5/6 or HBM3. So I think RAM will stick around, even if it could perhaps shift into CPU L4 like cache with HBM for example.