The Samsung Exynos 7420 Deep Dive - Inside A Modern 14nm SoC

Author: Andrei Frumusanu

by Andrei Frumusanu on June 29, 2015 6:00 AM EST

We move on to our benchmarking sections with the CPU’s performance and power consumption. We have already extensively covered ARM’s A5x CPU architectures in our detailed review of the Exynos 5433, and interested readers should definitely read that piece for a good grasp of how ARM’s CPUs in these SoCs are designed. The Exynos 7420 is identical to the 5433 in terms of CPU configuration: we still have four A53 cores and four A57 cores connected by the CCI-400 interconnect. The only difference is in the clock speeds, as Samsung now pushes the frequencies slightly higher to 1.5GHz and 2.1GHz for the little and big clusters, respectively.

CPU Performance: 64-bit Processing

One interesting benchmark that we weren’t able to run on the Exynos 5433, because it still shipped with a 32-bit software stack, is the AArch64 performance of the CPUs. To look at the impact 64-bit code has on the device, we use SPECint2000 compiled for both 32-bit and 64-bit targets on the Exynos 7420. The scores are estimated results. They should not be considered representative of the device’s overall performance and only offer an architectural view of the CPUs’ performance.

Developed by the Standard Performance Evaluation Corporation, SPECint2000 is the integer component of their larger SPEC CPU2000 benchmark. Designed around the turn of the century, SPEC CPU2000 has officially been retired for PC processors, but mobile processors are roughly a decade behind their PC counterparts in performance. Keeping that in mind, it still provides an excellent benchmark for today's mobile phones and allows us to make single-threaded architectural comparisons between the competing CPU designs out there. The scores we publish are only estimates and should not be taken as officially validated numbers.
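To show roughly how SPEC-style scores are assembled, the sketch below computes a per-benchmark ratio and combines ratios with a geometric mean, which is how SPEC CPU2000 forms its overall figure. The reference and run times in the usage line are hypothetical placeholders, not values from this review. The sketch also says nothing about how the estimated scores in the tables below were produced.

```python
import math

def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """SPEC CPU2000-style ratio: the reference machine scores 100 by definition."""
    return 100.0 * reference_seconds / measured_seconds

def composite_score(ratios: list[float]) -> float:
    """Geometric mean of per-benchmark ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical example: a sub-test with a 1400 s reference time that runs in 350 s.
print(spec_ratio(1400, 350))  # 400.0
```

One useful property of the geometric mean is that a 2x gain on one sub-test and a 2x loss on another cancel out exactly.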

SPECint2000 base - Estimated Scores Little Cores
Sub-test	Exynos 5433 (Cortex A53) AArch32	Exynos 7420 (Cortex A53) AArch32	Exynos 7420 (Cortex A53) AArch64	Exynos 7420 AArch64 vs. AArch32 Advantage
164.gzip	396	432	496	15%
175.vpr	272	290	283	-2%
176.gcc	597	674	2000	197%
181.mcf	291	300	248	-17%
186.crafty	448	492	343	-30%
197.parser	348	373	360	-3%
252.eon	935	1092	1354	24%
253.perlbmk	529	588	3000	410%
254.gap	544	611	1506	146%
255.vortex	529	552	627	14%
256.bzip2	362	395	426	8%
300.twolf	284	306	297	-3%

Starting off with the A53’s performance benefit (or deficit) for AArch64 code, we see a curious phenomenon: the 64-bit results do not always outperform the 32-bit variant of the benchmark. Depending on the sub-test, we see the effect of having to work with 64-bit integers. Tests such as mcf and crafty visibly suffer from the move, as the CPU internally has to deal with larger data sizes. The increased pressure on the caches slows down computation in these tests. On the other hand, other sub-tests such as gcc, perlbmk and gap show very large improvements, as they are able to take advantage of 64-bit registers and other ISA changes. Running such code brings roughly a 2.5-5x speedup on the A53 core.
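The cache-pressure effect described above can be made concrete with a little layout arithmetic. The sketch below models a hypothetical linked-list node holding a 32-bit integer and one pointer under natural alignment, which is how the 32-bit and 64-bit ARM procedure call standards lay out these two types. It then counts how many nodes fit in a 32KiB L1 data cache, a capacity assumed here purely for illustration. This is a simplified model, not an analysis of the actual data structures in mcf or crafty.

```python
def struct_size(field_sizes: list[int]) -> int:
    """Size of a C struct under natural alignment.

    Each field is aligned to its own size, and the total is padded to a
    multiple of the largest field so that arrays of the struct stay aligned.
    """
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # pad up to this field's alignment
        offset += size
    align = max(field_sizes)
    return (offset + align - 1) // align * align

# Hypothetical node: a 32-bit int payload plus one "next" pointer.
INT32 = 4
node32 = struct_size([INT32, 4])  # AArch32: 4-byte pointer
node64 = struct_size([INT32, 8])  # AArch64: 8-byte pointer, int padded to 8

L1D_BYTES = 32 * 1024  # assumed 32KiB L1 data cache, for illustration only
print(node32, node64)                            # 8 16
print(L1D_BYTES // node32, L1D_BYTES // node64)  # 4096 2048
```

Halving the number of nodes that fit in each cache level means a pointer-heavy workload misses more often, even though the algorithm itself is unchanged.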

SPECint2000 base - Estimated Scores Big Cores
Sub-test	Apple A8 (Typhoon) AArch64	Exynos 5433 (Cortex A57) AArch32	Exynos 7420 (Cortex A57) AArch32	Exynos 7420 (Cortex A57) AArch64	Exynos 7420 AArch64 vs. AArch32 Advantage
164.gzip	842	813	909	927	2%
175.vpr	1228	1120	1129	1014	-10%
176.gcc	1810	1549	1617	2000	24%
181.mcf	1420	1192	1276	923	-28%
186.crafty	2021	1149	1282	990	-23%
197.parser	1129	841	904	895	-1%
252.eon	1933	2096	2280	2500	10%
253.perlbmk	1666	1258	1363	4000	193%
254.gap	1821	1466	1506	3437	128%
255.vortex	1716	1652	1596	1681	5%
256.bzip2	1234	1027	1102	1102	0%
300.twolf	1633	1260	1428	1875	31%

Moving on to the A57 numbers, we again see a similar scenario: the 64-bit versions of vpr, mcf and crafty show a significant performance regression compared to their 32-bit variants due to higher memory and cache pressure. Perlbmk and gap are again the largest beneficiaries of 64-bit register usage. While the performance boost for the gcc compiler test was dramatic on the A53 cores, the A57 cores come in at a less impressive but still respectable 24% boost.
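The short script below makes the last column of both tables easy to check. It recomputes the Exynos 7420's AArch64-over-AArch32 advantage from the transcribed scores and also prints the raw speedup factor. Rounding to whole percent is an assumption about how the published column was derived, although it does reproduce every entry.

```python
# Exynos 7420 estimated SPECint2000 scores from the tables above: (AArch32, AArch64).
A53 = {
    "164.gzip": (432, 496), "175.vpr": (290, 283), "176.gcc": (674, 2000),
    "181.mcf": (300, 248), "186.crafty": (492, 343), "197.parser": (373, 360),
    "252.eon": (1092, 1354), "253.perlbmk": (588, 3000), "254.gap": (611, 1506),
    "255.vortex": (552, 627), "256.bzip2": (395, 426), "300.twolf": (306, 297),
}
A57 = {
    "164.gzip": (909, 927), "175.vpr": (1129, 1014), "176.gcc": (1617, 2000),
    "181.mcf": (1276, 923), "186.crafty": (1282, 990), "197.parser": (904, 895),
    "252.eon": (2280, 2500), "253.perlbmk": (1363, 4000), "254.gap": (1506, 3437),
    "255.vortex": (1596, 1681), "256.bzip2": (1102, 1102), "300.twolf": (1428, 1875),
}

def speedup(aarch32: float, aarch64: float) -> float:
    """How many times higher the AArch64 score is than the AArch32 score."""
    return aarch64 / aarch32

def advantage_pct(aarch32: float, aarch64: float) -> int:
    """AArch64 advantage in whole percent; negative means the 64-bit build is slower."""
    return round((speedup(aarch32, aarch64) - 1.0) * 100)

for name in A53:
    a53, a57 = A53[name], A57[name]
    print(f"{name:<12} A53 {advantage_pct(*a53):+4d}% ({speedup(*a53):.2f}x)  "
          f"A57 {advantage_pct(*a57):+4d}% ({speedup(*a57):.2f}x)")
```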

Overall, it’s interesting to see what kind of impact AArch64 has on performance, and it’s clear that the advantages are very architecture- and use-case-dependent. The two most negatively affected benchmarks were 181.mcf and 186.crafty. The former is based on a single-depot vehicle scheduling algorithm with almost exclusively integer arithmetic that doesn’t take advantage of 64-bit data structures, so much of the performance is lost to overhead.

Most notably, the Galaxy S6 still employs a 32-bit native browser. Although I'm not sure whether this was a deliberate decision or a carry-over from existing firmware, it may be a sign that switching over to AArch64-compiled applications is not always worth it.

Memory Latency and Performance

LPDDR4 is one of the major specification upgrades for many high-end 2015 SoCs, and the 7420 is, along with the Snapdragon 810, one of the first mobile SoCs to adopt the new technology. LPDDR4 doubles the operating frequency over LPDDR3, and the Exynos 7420 runs its memory at 1555MHz (3110MT/s). In terms of computational requirements, CPUs are more sensitive to latency, while GPUs require more bandwidth to operate at their best efficiency. As a start, we’ll look at how memory latency has changed on the Exynos 7420. For this review I chose to present the results on a logarithmic scale to better depict the latency differences in the L1 and L2 caches.
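The sketch below shows how the 1555MHz memory clock relates to the 3110MT/s figure, and how a transfer rate becomes a theoretical peak bandwidth. The 64-bit total interface width is an assumption made for illustration only, since this section does not state the 7420's memory bus width. Sustained bandwidth measured by benchmarks is lower than this theoretical peak.

```python
def transfer_rate_mts(io_clock_mhz: float) -> float:
    """DDR memory transfers data on both clock edges: two transfers per clock cycle."""
    return 2 * io_clock_mhz

def peak_bandwidth_gbs(rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return rate_mts * 1e6 * bytes_per_transfer / 1e9

rate = transfer_rate_mts(1555)  # 3110 MT/s, as quoted for the Exynos 7420
ASSUMED_BUS_WIDTH_BITS = 64     # assumption: total DRAM interface width, not from the text
print(rate, round(peak_bandwidth_gbs(rate, ASSUMED_BUS_WIDTH_BITS), 2))  # 3110 24.88
```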

The A53 cores don’t show any significant variation in the L1 and L2 results beyond the expected 15% difference due to the higher clock speed of the Exynos 7420’s little cores. As the transfer size grows beyond 256kB, the benchmark leaves the range covered by L2 caching and prefetching and starts hitting main memory. Here the Exynos 7420 sees latency rise to 206ns, compared with the 5433’s 191ns.
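Latency curves like these are commonly produced with a pointer-chasing test. A buffer is filled with a random chain of indices, and each load depends on the result of the previous one. As a result, the hardware cannot overlap the accesses and stride prefetchers cannot predict the next address.

The sketch below only illustrates that technique; it is not the custom benchmark used for this review. It builds the chain with Sattolo's algorithm so that the chain forms a single cycle through every slot. Python's interpreter overhead dwarfs cache latencies, so its absolute numbers are meaningless. Real tests are written in C or assembly.

```python
import random
import time

def sattolo_cycle(n: int, rng: random.Random) -> list[int]:
    """Random permutation forming one cycle through all n slots (Sattolo's algorithm).

    Following nxt[p] from any start visits every slot before returning, so the
    chase cannot get stuck in a short loop that stays resident in a small cache.
    """
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding j == i guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt: list[int], steps: int) -> int:
    """Follow the chain; each lookup needs the previous result, so accesses serialize."""
    p = 0
    for _ in range(steps):
        p = nxt[p]
    return p

def ns_per_step(n: int, steps: int = 1_000_000, seed: int = 1) -> float:
    """Average wall-clock time per dependent access for a chain of n slots."""
    nxt = sattolo_cycle(n, random.Random(seed))
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps * 1e9

# Sweep working-set sizes, as a latency-vs-transfer-size plot does.
for n in (1 << 10, 1 << 14, 1 << 18, 1 << 20):
    print(n, round(ns_per_step(n, steps=200_000), 1))
```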

The change in main memory latency is also visible in the bandwidth results of the 7420’s little cores, as overall transfer speeds drop by 10% on average compared with what we measured on the 5433.

The latency graphs for the big cores look more interesting, as we see quite a large difference in the L1 cache of the Exynos 7420. The new chip improves L1 latency by 76% compared with the Exynos 5433 (3.36ns / 1.91ns ≈ 1.76x): it holds a very steady 1.91ns versus an average of 3.36ns on the predecessor's A57 implementation. The frequency advantage of the 7420 comes in at only 10%, so Samsung must have made some changes to the cache architecture. I was also able to measure much more consistent latency and bandwidth results in our custom benchmark.
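Because the two chips run at different clocks, converting the nanosecond figures into core cycles helps separate the frequency bump from architectural change. The Exynos 5433's A57 clock of 1.9GHz used below is inferred from the 200MHz difference mentioned in this article. The averaged 3.36ns figure yields a fractional cycle count, which should be read as an average rather than an exact pipeline latency.

```python
def latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds into core clock cycles (1GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz

exynos5433_l1 = latency_cycles(3.36, 1.9)  # A57 at 1.9GHz (inferred), averaged latency
exynos7420_l1 = latency_cycles(1.91, 2.1)  # A57 at 2.1GHz
print(round(exynos5433_l1, 1), round(exynos7420_l1, 1))  # 6.4 4.0
print(round(3.36 / 1.91, 2))                             # 1.76
```

Even with the clock difference factored out, the L1 latency drops by about 2.4 cycles, which clock speed alone cannot explain.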

The bandwidth results on the L1 and L2 caches are equally significant: L1 bandwidth improved on average by 89%, while L2 bandwidth saw a 46% increase over the Exynos 5433. NEON load instructions in particular seem to have received a very large improvement, as we’re able to measure a 2.4-3.1x bandwidth boost on the L2 and L1 caches compared to the Exynos 5433’s A57 cluster.

The latency and bandwidth differences are smaller when hitting main memory. The A57 cluster on the new chip actually does better than on the 5433, as main memory latency improves slightly, by 8ns to 172ns. This is reflected in an average 4% boost in memory bandwidth across various common access methods. The CPUs are certainly not limited by main memory, as they're far from saturating the bus bandwidth on the CCI. As previously mentioned in the SoC layout section, Samsung chooses to limit the CCI to 532MHz instead of going higher to match DRAM speeds. This contrasts with other SoCs such as Qualcomm's Snapdragon 810, which runs its CCI at up to 787MHz.

All in all, it seems Samsung may have made some optimizations to the A57 cores that significantly improve their memory performance. One could reason that any performance improvement exceeding the 10% (200MHz) frequency boost, and not attributable to AArch64 instruction set usage, may be the result of the higher core and cluster cache performance. While that’s hard to verify, we see no other architectural difference between the 7420 and its predecessor.
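The reasoning above amounts to dividing each measured gain by the clock ratio and looking at what is left. Below is a minimal sketch of that normalization. It assumes performance would otherwise scale linearly with core clock, which is an upper bound that holds best for work served from the on-core caches.

```python
def per_clock_gain(measured_gain: float, clock_ratio: float) -> float:
    """Improvement left over after dividing out a clock-speed increase.

    measured_gain: fractional improvement, e.g. 0.89 for +89%.
    clock_ratio:   new clock / old clock.
    """
    return (1.0 + measured_gain) / clock_ratio - 1.0

A57_CLOCK_RATIO = 2.1 / 1.9  # Exynos 7420 vs. Exynos 5433 big cluster (~10%)

for label, gain in [("L1 bandwidth", 0.89), ("L2 bandwidth", 0.46), ("DRAM bandwidth", 0.04)]:
    print(f"{label}: {per_clock_gain(gain, A57_CLOCK_RATIO):+.0%} beyond clock scaling")
```

The negative DRAM result shows where the linear-scaling assumption breaks down. Once accesses go off-chip, latency is set in nanoseconds by the DRAM and interconnect, not by the core clock.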

Off-topic - Galaxy S6 Disassembly Process

Before I get into the power numbers and explain our methodology, I would like to take the opportunity to share my experience dismantling the Galaxy S6 and getting access to the battery, as some readers and eventual device owners might be interested in the feasibility of swapping the battery. By far the most daunting and time-consuming step is removing the glass back cover.

The Gorilla Glass 4 piece is held in place by very heavy-duty glue around the edges of the device. You basically need a very strong suction cup and at least a hair dryer if you don't have access to a heat gun. I used a car GPS mount for the suction cup, as it provided a tight hold and also acted as a lever to pull on. The glue needs to reach a high temperature to soften, and you might need to heat the device (along the edge) until it’s no longer comfortable or possible to hold. One should have some plastic picks ready; I just cut a plastic SIM-card holder into pieces to use as picks. The initial prying should start at the bottom of the device, opposite the speaker. It takes a lot of force before one is able to insert the first pick, and the process is definitely not for the faint of heart. Slowly advancing along the edge of the device with repeated re-heating should eventually let you remove the glass cover from the main body.

Once the back cover is removed, the rest of the process is very easy, as we’re just dealing with ordinary Phillips screws. After removing all visible screws, one should apply moderate heat along the front edges of the display. While keeping pressure on the battery, lift the whole unibody frame of the device away from the screen and motherboard assembly. For the normal Galaxy S6 the process is now almost over: the battery is in direct view and accessible, so one can disconnect the connector and slowly and carefully pry the battery up from the sides to separate it from the display assembly it is glued to. S6 Edge owners will also need to remove the motherboard, as the battery connector wraps around to the back of the PCB.

Once the new battery is in place and properly connected, the re-assembly process becomes straightforward as it is just a reversal of the disassembly steps. One should make sure that the glue strips on the glass back cover don’t have ridges or overlapping pieces as it will cause the back cover to slightly stick out and no longer be level with the metal frame. Once the phone is back together, I would again recommend applying heat along the edges of the device while forcibly squeezing the back glass and whole assembly back in place.

Overall, replacing the battery should take around 30-40 minutes, depending on how much one struggles to remove the back glass. We’ll have to see how Samsung's new battery chemistry holds up after a year of constant usage and fast-charge cycles, but if the battery needs to be swapped, it’s definitely a doable process if one manages to muster up the initial courage.

114 Comments

Andrei Frumusanu - Monday, June 29, 2015 - link
Frankly, I don't know. I tried to ask Samsung a similar question but they refused to comment on customer relations. Meizu so far seems to be the only major vendor consistently using Exynos parts, but the reason we haven't seen other vendors adopt them could be anything from pricing to volume availability. Only the companies themselves know the details of these contracts.
gnx - Monday, June 29, 2015 - link
Thanks! The SoC market is really strange.
id4andrei - Monday, June 29, 2015 - link
This is Samsung's chance to eat Qualcomm's lunch. Close down node manufacturing for others (including Apple) and be like Intel. Either use Exynos or be satisfied with inferior nodes from other fabs.
CiccioB - Monday, June 29, 2015 - link
And that means competing on PP only, like Intel did.
That is, if you force others to go to other foundries, you have to be sure you have the best one. In case TSMC comes up with a better PP (like a 16+nm revision), you have just thrown all your customers to your fab competitors, doing double damage (or total damage). Or just think if Intel tomorrow suddenly opens up to ARM customers in order to saturate its now rusting 14nm machinery. Samsung would be in great trouble after that eventual (and IMHO stupid) move.
Investing in PP is really expensive and there are other foundries capable of doing so. Samsung can't be sure to always be the best one on the market, and it would have to invest billions of dollars every year to make sure it stays number one (for SoCs of course).
ZeDestructor - Wednesday, July 1, 2015 - link
Samsung is part of a common platform alliance/agreement with GloFo, so while they could lock down and close others out, GloFo would not, so there's little commercial benefit from doing so.

They could of course coerce GloFo into doing the same, but that lands them into hot water with regulatory watchdogs like the FTC regarding anti-competitive practices and collusion, which while Samsung wouldn't really mind (no, really), GloFo would.
eh_ch - Monday, June 29, 2015 - link
How will it take for Samsung's process to trickle down to AMD via GloFo? Could it bridge the efficiency gap to nvidia / Intel? Holding out hope that ATI/MD will be competitive once more.
eh_ch - Monday, June 29, 2015 - link
How long will it take, that is
Adding-Color - Monday, June 29, 2015 - link
No, AMD won't have a technology advantage over Nvidia on next-gen GPUs. Currently it looks like Nvidia will choose Samsung for their next node, and as Samsung and GloFo have some kind of alliance and share processes (GloFo licenses some Samsung processes AFAIK), the technology should be very similar for both. Yet AMD should have a small HBM advantage, as they have better relations with Hynix (and helped to develop HBM) than Nvidia.
jjj - Monday, June 29, 2015 - link
There won't be a HBM advantage from a technological point of view, at best AMD could get slightly better pricing but even that is unlikely since Nvidia has much higher volume. The first gen HBM was late and both Nvidia and AMD had plenty of time to prepare for it.
As for the process, we don't really know what foundry each will use and what version of the process. On the GPU side both are more likely to go TSMC or use both. On the CPU side AMD will likely go GloFo but not this early version of the process, and Intel might go 10nm not long after AMD has 14nm. On 10nm TSMC and Samsung do seem to be catching up with Intel, but I doubt AMD will have 10nm early.
fluxtatic - Tuesday, June 30, 2015 - link
Hell, at this point I'd be happy to see AMD at < 28nm
