<?xml version="1.0" encoding="utf-8"?>

<rss version="2.0">
	<channel>
		<title>Anand Lal Shimpi's Weblog</title>
	    <link>http://www.anandtech.com/weblog</link> 
	    <description>The personal weblog for Anand Lal Shimpi</description>
    	<language>en-us</language>
		<copyright>Copyright 2008 - Anand Lal Shimpi</copyright>
		<pubDate>Fri, 16 May 2008 07:58:29 EDT</pubDate>
		
		<item>
			<title>AMD's K10: a "dead" product or not?</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=443</link>
			<description><![CDATA[ <div>A few years ago it was fashionable to bash Intel's Pentium 4 as a braindead architecture. The fact that the Pentium 4 Northwood (533 MHz FSB) was the best performing processor from mid 2002 until late 2003 in many applications, and that the Pentium 4 Northwood remained competitive until early 2004 was conveniently forgotten: nuances do not make good headlines. <br />
</div>
<div>&nbsp;</div>
<div>It is now trendy to bash AMD. One "PC doctor" at ZDNet goes as far as to <a href="http://blogs.zdnet.com/hardware/?p=1811">say that</a>: <br />
</div>
<div>&nbsp;</div>
<div style="margin-left: 40px;">"When I look at <strong>AMD&#8217;s current product line,</strong> <strong>all I see is a forest of
deadness</strong>.&nbsp;Intel has products trump every category of products
going.&nbsp;Server, desktop, mobile, low-end, high-end, dual-core,
quad-core.&nbsp;Intel has all these markets stitched up." <br />
</div>
<div>&nbsp;</div>
<div>Nuances, who needs them when you can make a sensational headline? And indeed, <a href="http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3272">the latest desktop CPU articles</a> here at AnandTech show that Intel's midrange CPUs have a significant lead over the fastest Phenom processors. <br />
</div>
<div>&nbsp;</div>
<div>Like any design, the K10 is a trade-off. And most trade-offs were made in favor of the applications in the server and HPC market, at the expense of games and other desktop applications. </div>
<div>&nbsp;</div>
<div>First take a look <a href="http://www.xbitlabs.com/articles/cpu/display/core2duo-e6420_5.html#sect0">at this page</a>, which compares a Core 2 Duo E4400 (2 GHz, <strong>2 MB L2</strong> and 800 MHz FSB) with a slower 1.86 GHz Core 2 Duo E6320 (<strong>4 MB of L2</strong> and a 1066 MHz FSB). One thing is for sure: games prefer the larger L2 cache. Some of the games were up to 10% faster on the CPU that was clocked 7% lower but had twice the L2 cache. The fact that games prefer a 4 MB L2 is not going to change when you run them on an AMD CPU with an integrated memory controller (IMC). An L2 can deliver the necessary data in 12-20 cycles; an IMC needs about 100 cycles. <br />
</div>
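<div>&nbsp;</div>
<div>As a back-of-the-envelope check on those numbers, the sketch below normalizes the best-case 10% game result for the clock speed difference. The function name is ours and purely illustrative, and the faster FSB of the E6320 also plays a part, so treat the outcome as an upper bound on what the larger L2 alone contributes.</div>
<pre>
```python
# Per-clock comparison of the two Core 2 Duo chips discussed above.
# The 10% figure is the best-case game result quoted in the text.

def per_clock_speedup(perf_ratio, clock_a_ghz, clock_b_ghz):
    """Return how much faster chip A is than chip B per clock cycle.

    perf_ratio is the measured performance of A divided by that of B.
    """
    clock_ratio = clock_a_ghz / clock_b_ghz
    return perf_ratio / clock_ratio

# E6320 (1.86 GHz, 4 MB L2) versus the 2 GHz part with 2 MB L2
speedup = per_clock_speedup(1.10, 1.86, 2.0)
print(f"Per-clock advantage: {(speedup - 1) * 100:.1f}%")
```
</pre>
<div>In other words, in the best case the chip with the larger cache does roughly 18% more work per clock in those games.</div>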
<div>&nbsp;&nbsp; <br />
</div>
<div>Now, take a look at the cache architecture of AMD's K10/Barcelona. If you run a single threaded game on it, it gets a fast 512 KB L2 cache and after that a relatively slow (44-48 cycles!) 2 MB L3. If you know that the same game can benefit from more than 2 MB of cache, it is pretty clear that the 512 KB L2 is not going to cope: you'll end up using the L3 a lot. A dual threaded game might need a little less per thread, but the same problem will happen again: it needs to go to that slow L3 cache all too often. Run that same game on an Intel Core 2 CPU and each thread of your dual threaded game gets a low latency 4 MB (or 6 MB) L2. </div>
<div>&nbsp;</div>
<div>Now let us imagine that we run 4 threads of an HPC workload on Barcelona. Each thread has a very limited number of instructions, which fit perfectly in each of the L2 caches. You get 4 threads that together get 4x the bandwidth of a single L2. In Intel's case, each pair of threads has to share the available bandwidth of one L2. The amount of data is huge, so caching the data is hardly possible, and the fast IMC does wonders for the K10 chip. Data that is shared between the 4 cores remains in the L3 cache, and all L2 caches are kept coherent by an incredibly fast SRI (System Request Interface). So your cache coherency overhead does not increase with the number of caches; it increases per socket. Going from 2 to 4 sockets means that you double the amount of cache coherency traffic. Compare that to the Intel platform, where all L2 caches need to be kept coherent. <br />
</div>
<div>&nbsp;&nbsp; <br />
</div>
<div>This is just one example of why we could never expect the K10 chip to be a super desktop chip. But how is Barcelona doing in the server world? Is it limited to an HPC niche market? Well, let us see what Intel thinks. First of all, where do most of the 45 nm chips go? Just a few weeks ago, <a href="http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3272&amp;p=4">Anand reported</a> that Intel had no intention of flooding the desktop with 45 nm Core 2 chips quickly. <br />
</div>
<div>&nbsp;</div>
<div><img src="http://images.anandtech.com/iblog/ITgeneral/intelsupply.jpg" alt="" border="0" />&nbsp;</div>
<div>&nbsp;<br />
</div>
<div>Those 45 nm chips are going to the server market. Why? Several reasons. <br />
</div>
<div>&nbsp;</div>
<div>First of all, the server market might be only 20% of Intel's revenue. But look at this:</div>
<div>&nbsp;<blockquote>
<table border="0" cellpadding="2" cellspacing="2" width="500">
    <tbody>
        <tr>
            <td>CPU<br />
            </td>
            <td align="center">ASP <br />
            </td>
            <td align="center">Profit margin (estimate)<br />
            </td>
            <td align="center">Percentage of revenue&nbsp;</td>
        </tr>
        <tr>
            <td>&nbsp;Intel Server CPU</td>
            <td align="center">&nbsp;&gt;$400</td>
            <td align="center">&gt;$300 <br />
            </td>
            <td align="center">&nbsp;+/- 20%<br />
            </td>
        </tr>
        <tr>
            <td>&nbsp;AMD Server CPU<br />
            </td>
            <td align="center">$300-$400 <br />
            </td>
            <td align="center">$220-$330 <br />
            </td>
            <td align="center">&nbsp;+/- 16%<br />
            </td>
        </tr>
        <tr>
            <td>&nbsp;Intel Mobile/Desktop CPU<br />
            </td>
            <td align="center">$100 <br />
            </td>
            <td align="center">$40-$50 <br />
            </td>
            <td align="center">&nbsp;+/- 80%<br />
            </td>
        </tr>
        <tr>
            <td>&nbsp;AMD Mobile/Desktop CPU<br />
            </td>
            <td align="center">$50-$65 <br />
            </td>
            <td align="center">$5-$30 <br />
            </td>
            <td align="center">&nbsp;&gt;80%</td>
        </tr>
    </tbody>
</table>
<div>&nbsp;</div>
</blockquote></div>
<div>Secondly, Intel needs <a href="http://www.intel.com/performance/server/xeon/hpcapp3.htm">those 45 nm chips to be competitive in the HPC market</a>. A 2 GHz Barcelona is capable of keeping up with the best 65 nm Xeons in those applications. <br />
</div>
<div>&nbsp;&nbsp;</div>
<div>It is pretty clear why AMD focused on the server market. Without a complete redesign it is not possible to beat Intel's integer crunching power and its fast and big L2 cache, and that is exactly what a modern game needs. Barcelona built further on the K8 architecture and inherited the relatively inflexible integer pipeline. While Core 2 has sophisticated reordering of loads and stores, Barcelona does only a limited reordering of loads. While Core 2 offers a 32 entry queue to the integer units, Barcelona has 3 rather inflexible, separate 8 entry queues. </div>
<div>&nbsp;</div>
<div>So the right way forward for AMD was to focus on HPC and server applications where it could leverage its strong points. We can bash AMD for being so late and for coming up with relatively low clocked CPUs, but even a 2.8 GHz Phenom would not have raised AMD's ASP significantly in the desktop market. <br />
</div>
<div>&nbsp;</div>
<div>We are almost done with our first round of quad socket benchmarking and we can tell you that we are having a lot more fun than Anand: it is a good old exciting fight between AMD and Intel. Don't believe us? Let <a href="http://www.intel.com/performance/server/xeon_mp/server.htm?iid=perf_serv_mp_sum+server">Intel do the talking again</a>:</div>
<div><br />
</div>
<div><img src="http://images.anandtech.com/iblog/ITgeneral/SAP.gif" alt="" border="0" />&nbsp;</div>
<div> </div>
<div><br />
</div>
<div>Yes, projecting the bad performance of the desktop chip to say that "AMD's products are a dead forest" is ... just silly.&nbsp; If you have missed the previous entries of our IT blog, just go to it.anandtech.com</div>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Mon, 12 May 2008 00:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>IBM: Sun Fire x4450 has the best performance per Watt</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=436</link>
			<description><![CDATA[ <div>IBM recently launched a very interesting product: the JS22 blade. The IBM JS22 blade server has two dual-core Power6 CPUs, clocked at an impressive 4 GHz. Even more interesting is the honesty of the IBM marketing team about the <a href="http://www.anandtech.com/tradeshows/showdoc.aspx?i=3106&amp;p=3">Sun x4450 server</a>. IBM took the peak SPECint2006 results of several servers published on spec.org and made a price/performance comparison (using the online price configurators at IBM, Sun and HP) and a performance per watt comparison (source: online power calculators). <br />
</div>
<div>&nbsp;</div>
<div>Now look at the results in IBM's sales presentation:&nbsp;</div>
<div>&nbsp;</div>
<div><img alt="" src="http://images.anandtech.com/iblog/ITgeneral/JS22_x4450.gif" height="504" width="524" /> <br />
</div>
<div>&nbsp;</div>
<div> So according to IBM's (SPEC based) numbers, Sun has a winner here. The calculated power numbers are probably a bit higher than the real ones. Also, the Sun Fire x4450 used the power hungry Intel Xeon X7350 CPUs. With the 7340, you would lose less than 20% performance while probably using 50% less power. So, the numbers above could get even better for Sun's latest server. IBM feels that "security, reliability, operations and less cabling" should convince buyers to go the JS22 blade route anyway. <br />
</div>
<div>&nbsp;</div>
<div><img src="http://images.anandtech.com/iblog/ITgeneral/IBMJs22ovw.jpg" alt="" border="0" />&nbsp;</div>
<div>&nbsp;</div>
<div>As we are currently testing the x4450, we are very curious about your opinion. Do you feel that density is important? With the exception of the UK, most public datacenters here in Western Europe are not charging a lot for rackspace. The reason is that as more and more large companies have their own datacenter, public datacenters have a lot of unused rackspace. It really shows the difference between reality and what the media is reporting. Most reports in the media talk about how datacenters are running out of space and power. The latter seems to be true, but the space problem seems to be highly exaggerated. <br />
</div>
<div>&nbsp;</div>
<div> So what is at the top of your checklist when you shop for a new server? Performance/Watt? Does "less cabling" make an impression on you? What about rather vague statements like "reliability and operation"? Let us know. <br />
</div>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Mon, 28 Apr 2008 00:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>AMD back in the quad socket race</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=426</link>
			<description><![CDATA[ <p>Finally. 7 months after the introduction of Intel's "Tigerton"
Xeon 73xx series, AMD has an answer to the quad socket, quad-core Intel platform.
The importance of the quad socket market cannot be overstated for AMD. The quad
socket market is only 10% of the x86 server CPU market (shipments), but it accounts
for roughly 20% of the revenues! And it has been AMD's stronghold for years
now: at the moment, AMD still holds about 42% of this market. The 4P product line
is probably keeping AMD afloat....</p>
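<p>To put those shipment and revenue shares in perspective, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the inputs are simply the rounded 10% and 20% figures quoted above, so the outcome is only a rough indication.</p>
<pre>
```python
# Relative average revenue per CPU of one market segment versus the rest,
# derived from its share of shipments and its share of revenue.

def relative_asp(shipment_share, revenue_share):
    """Average revenue per CPU in a segment, relative to the rest of the market."""
    segment = revenue_share / shipment_share
    rest = (1 - revenue_share) / (1 - shipment_share)
    return segment / rest

# Quad socket market: about 10% of shipments, about 20% of revenue
ratio = relative_asp(0.10, 0.20)
print(f"A quad socket CPU brings in about {ratio:.2f}x the revenue of other server CPUs")
```
</pre>
<p>With these rounded inputs, every CPU sold into a quad socket server is worth about 2.25 times as much revenue as a CPU sold into the rest of the x86 server market.</p>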
<p>AMD launches the B3 "no-TLB bug" Opterons
today, with clock speeds of 2.3GHz (8356), 2.2GHz (8354) and 2GHz (8350). Hotheaded
(125W) 2.5GHz and 2.4GHz Special Editions will follow. We are preparing a full
AMD vs. Intel 16-core benchmark fest, but the boards and servers that will house
our Opteron 8356 CPUs still haven't arrived.</p>
<div align="center"><img alt="" src="http://images.anandtech.com/iblog/ITgeneral/Opteron8356.jpg" height="644" width="550" /></div>
<br />
<p>Let us take a look at Intel's and AMD's 1K pricing:</p>
<style>
.jwfont
{
font-size:10.0pt;
font-weight:400;
font-family:Arial;
text-align:left;
}
.contentwhite {
font-size:12.0pt;
font-weight:400;
font-family:Arial;
color: #ffffff;
}
</style>
<table align="center" border="1" cellpadding="2" cellspacing="0" width="550">
    <tbody>
        <tr bgcolor="#016a96">
            <td colspan="4" style="text-align: center;" class="contentwhite"><strong>Server CPU
            Pricing</strong></td>
        </tr>
        <tr style="text-align: left;" bgcolor="#cccccc">
            <td><font class="jwfont"><strong>AMD CPU</strong></font></td>
            <td><font class="jwfont"><strong>Price</strong></font></td>
            <td><font class="jwfont"><strong>Intel
            CPU</strong></font></td>
            <td><font class="jwfont"><strong>Price</strong></font></td>
        </tr>
        <tr style="text-align: left;" bgcolor="#f7f7f7">
            <td><font class="jwfont">Opteron 8360
            SE 2.5GHz<br />
            (125W, 4x0.5 MB L2 + 2MB L3)</font></td>
            <td><font class="jwfont">$2149</font></td>
            <td><font class="jwfont">Xeon X7350
            2.93GHz<br />
            (130W, 2x4MB L2)</font></td>
            <td><font class="jwfont">$2301</font></td>
        </tr>
        <tr style="text-align: left;" bgcolor="#eeeeee">
            <td><font class="jwfont">Opteron 8358
            SE 2.4GHz<br />
            (125W, 4x0.5 MB L2 + 2MB L3)</font></td>
            <td><font class="jwfont">$1865</font></td>
            <td><font class="jwfont">Xeon E7340
            2.4GHz<br />
            (80W, 2x4MB L2)</font></td>
            <td><font class="jwfont">$1980</font></td>
        </tr>
        <tr style="text-align: left;" bgcolor="#f7f7f7">
            <td><font class="jwfont">Opteron 8356
            2.3GHz<br />
            (95W, 4x0.5 MB L2 + 2MB L3)</font></td>
            <td><font class="jwfont">$1514</font></td>
            <td><font class="jwfont">Xeon E7330
            2.4GHz<br />
            (80W, 2x3 MB L2)</font></td>
            <td><font class="jwfont">$1391</font></td>
        </tr>
        <tr style="text-align: left;" bgcolor="#eeeeee">
            <td><font class="jwfont">Opteron 8354
            2.2GHz<br />
            (95W, 4x0.5 MB L2 + 2MB L3)</font></td>
            <td><font class="jwfont">$1165</font></td>
            <td><font class="jwfont">Xeon E7320
            2.13GHz<br />
            (80W, 2x2 MB L2)</font></td>
            <td><font class="jwfont">$1177</font></td>
        </tr>
        <tr style="text-align: left;" bgcolor="#f7f7f7">
            <td><font class="jwfont">Opteron 8350
            2.0GHz<br />
            (95W, 4x0.5 MB L2 + 2MB L3)</font></td>
            <td><font class="jwfont">$873</font></td>
            <td><font class="jwfont">Xeon E7310
            1.6GHz<br />
            (80W, 2x2 MB L2)</font></td>
            <td><font class="jwfont">$856</font></td>
        </tr>
    </tbody>
</table>
<br />
<p>The Opteron 8354 and 8350 look like the most
competitive offerings; they have a small clock speed advantage over the
comparable Intel CPUs and about the same amount of cache. As we have discussed
in depth <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3162" target="_blank">in our
2P Opteron 23xx versus Intel Xeon 54xx review</a>, quad-core
Intel is the best processor in all CPU intensive tasks (rendering, chess, SPECint,
financial simulations...). Meanwhile, the quad-core Opteron is best in some
memory and FP intensive workloads (many HPC applications). We don't expect anything
to change with the B3 <em>Barcelona</em> cores, but there are still
two question marks: who will win the server (OLTP, Warehouse) and <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3263" target="_blank">virtualization
benchmarks</a>? We will find out in a few weeks.</p>
<p>AMD also launched their B3 23xx series, but frankly,
we are disappointed that AMD's fastest quad-core is still only at 2.3GHz; AMD promised
2.5GHz months ago! 2.5GHz really is necessary to be competitive with Intel, who
passed the 3GHz quad-core wall back in 2007. Even worse, Intel already has 50W
parts at 2.5GHz. AMD is in defensive mode in the 2P market, and its only remaining
weapon is aggressive pricing.</p>
<p>Things are looking better in the 4P market however.
AMD's platform scales better, at least until Intel's Nehalem arrives - and Xeon
"Nehalem" MP CPUs won't be available until 2009. In addition, AMD's
newest quad-core has to compete with Intel's 65nm CPUs that are limited to 2.4GHz
at 80W TDP for now. AMD has a narrow window to make a good impression in the
quad socket market, ramp up clock speeds, and prepare for Intel's <em>Dunnington</em>
in Q3. With up to 16MB L3 cache and six cores per die, Dunnington looks massive
- but perhaps also a bit expensive.</p>
<p>We're not the only ones that have noticed AMD most
likely has (we're not convinced until we see all our tests :-) )
a competitive quad socket CPU. HP is the most enthusiastic tier-one OEM with
two quad Opteron models:</p>
<ul>
    <li>A "classic" 4U HP <a href="http://h10010.www1.hp.com/wwpc/us/en/en/WF05a/15351-15351-3328412-241644-3328422-3646081.html">ProLiant DL585 G5</a></li>
    <li>A rather amazing quad socket HP <a href="http://h18004.www1.hp.com/products/servers/proliant-bl/c-class/685c-g5/specifications.html">ProLiant
    BL685c G5 blade</a></li>
</ul>
<p>The fast growing blade market seems to like the third
generation Opteron. As the 5000V chipset with DDR2 support is not available for
the Xeon Tigerton, the latter is a bit harder to cool in a cramped blade
environment. However, HP does have a quad Tigerton blade, the <a href="http://h18004.www1.hp.com/products/servers/proliant-bl/c-class/680c/specifications.html" target="_blank">HP ProLiant
BL680c G5 blade</a>.</p>
<p>Eight blades in a 10U blade chassis (two HDs per
blade) is not bad, but HPC specialist Supermicro does even better with 10 x 16
cores in a 7U enclosure (one HD per blade). Rackable, Appro, and Synnex also
launch their newest Opteron models today.</p>
<p>According to AMD and HP, the HP ProLiant DL585 G5
set a new performance record for 4-socket, x86-based systems in TPC-C
Price/tpmC, and the HP ProLiant BL685c G5 server set a new record for
SPECfp_rate2006. HP's 8356 system scored 147 baseline while the best Intel
based result is around 108. However, it should be noted that SPECfp_rate2006 exaggerates
the importance of memory bandwidth. SPECfp2006 already runs with a rather large
footprint, and if you run 16 instances in parallel....</p>
<p>Although the 83xx series will perform excellently
in HPC, we don't believe that the difference will be this large, even in memory
intensive applications. We definitely need some good independent benchmarking.
Stay tuned and add <a href="http://it.anandtech.com/" target="_blank">http://it.anandtech.com</a> to
your bookmarks!</p>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Wed, 09 Apr 2008 00:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>Server CPU news, March 2008</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=422</link>
			<description><![CDATA[ <div><font face="Times New Roman" size="3"></font></div>
<div><strong>AMD's Shanghai</strong> ... <br />
</div>
<div>is really shaping up well. As I told you in my <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3261">Cebit coverage</a>, <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3261&amp;p=5">several people told us</a> that they have already been testing it. Shanghai is an evolutionary improvement over AMD's Barcelona, and includes several IPC improvements and a 6 MB instead of a 2 MB L3 cache. In 2009, AMD plans to improve performance with a better IOMMU. A new RAS feature called "L3 Cache index disable" will also be available. We could not get more information about this feature, which sounds more like a performance-crippling feature than a RAS feature...<br />
</div>
<div>&nbsp;</div>
<div>According to IDC, AMD's overall market share in the server CPU space did not decrease in 2007 (about 13%). AMD's market share grew in the low budget 1 socket server market (from 9% to 14%). It also increased slightly in the lucrative 4 socket market (from 37% to about 42%) but decreased significantly in the high volume 2 socket market (from 14% to 11%). <br />
</div>
<div>&nbsp;</div>
<div>AMD's <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3162">third generation Opteron</a>, now available in the B3 revision, will be launched this quarter at 2.3 GHz, so slightly more conservative than the <a href="http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3272">newly launched Phenom</a> (2.4 GHz, 95 W). A 125 W SE version (2360SE) at 2.5 GHz will be launched late this quarter. The low power version stays at 1.9 GHz, which is a bit disappointing... <br />
</div>
<div>&nbsp;</div>
<div><strong>Low Voltage Xeon</strong> <br />
</div>
<div>... has arrived: Intel launches the L5420, a low power Xeon at 2.5 GHz. This CPU consumes 50 W (TDP), so less than 12.5 W per core, and only 16 W (4 W per core) when running idle. The CPU consumes as little power as the previous 65 nm L5335, but performs about 30% better in, for example, Povray, Sungard and Cinebench. Since Intel has introduced the 5100 chipset, AMD has lost the lower power consumption advantage of DDR2 too. It seems that AMD will lose the performance/watt crown until Shanghai is up and running. &nbsp;</div>
<div>&nbsp;</div>
<div><strong>Nehalem</strong>&nbsp;</div>
<div>Nehalem can access 3 memory channels, which can be run in independent or lockstep mode. Independent is of course the setting for best performance in almost all cases. <br />
</div>
<div>&nbsp;</div>
<div>But the most interesting news is the new TLB architecture of Nehalem. You might remember that we wrote that the TLB architecture can really make a difference when you run a lot of virtual machines on top of your server CPU. Below you see the number of entries in each TLB, with the page size between brackets. Remember that currently all 32-bit OSes use 4 KB pages, but that most 64-bit OSes (Linux and Windows) can use a 4 KB or 2 MB page size. 2 MB pages will become more and more popular (see for example the SPECjbb2005 submissions) for memory intensive applications.<br />
</div>
<div>&nbsp;</div>
<div><font face="Times New Roman" size="3"><span style="font-size: 12pt;" lang="EN-GB">
<table border="0" cellpadding="2" cellspacing="2" width="500">
    <tbody>
        <tr>
            <td>TLB Architecture&nbsp;</td>
            <td>AMD Barcelona <br />
            </td>
            <td>Intel Penryn <br />
            </td>
            <td>Intel Nehalem <br />
            </td>
        </tr>
        <tr>
            <td>&nbsp;L1- Instructions<br />
            </td>
            <td>
            <div>&nbsp;48&nbsp; (4KB)</div>
            <div>&nbsp;48&nbsp; (2MB) <br />
            </div>
            </td>
            <td><span style="font-size: 12pt;">
            <div>128 (4KB)</div>
            <div>&nbsp;&nbsp;&nbsp; 8&nbsp; (2MB)</div>
            </span></td>
            <td>&nbsp;?</td>
        </tr>
        <tr>
            <td>&nbsp;L1- Data<br />
            </td>
            <td>&nbsp;<span style="font-size: 12pt;">
            <div>&nbsp;48&nbsp; (4KB)</div>
            <div>&nbsp;48&nbsp; (2MB) </div>
            </span></td>
            <td>
            <div>&nbsp;16 (4 KB)</div>
            <div>&nbsp;16 (2 MB) <br />
            </div>
            </td>
            <td>&nbsp;?</td>
        </tr>
        <tr>
            <td>&nbsp;L2</td>
            <td>
            <div>&nbsp;512 (4 KB)</div>
            <div>&nbsp;128 (2 MB) <br />
            </div>
            <div>Data + instruc. <br />
            </div>
            </td>
            <td>
            <div>&nbsp;256 (4 KB)</div>
            <div>&nbsp;&nbsp; 32 (2 MB)</div>
            <div>Data only<br />
            </div>
            </td>
            <td>&nbsp;<span style="font-size: 12pt;">
            <div>&nbsp;512 (4 KB)</div>
            <div>&nbsp; 64? (2 MB) <br />
            </div>
            <div>Data + instruc. </div>
            </span></td>
        </tr>
    </tbody>
</table>
&nbsp;</span></font></div>
<div>It will be interesting to see what TLB architecture AMD's 45 nm Opteron (Shanghai) will have. Remember that while the Penryn TLBs might be more than enough for running one machine, with <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3263&amp;p=10">EPT or NPT</a> the TLB is split among a lot of virtual machines. VMware and Dell, for example, report that on average no less than 8 virtual machines are run on top of their 2 socket servers, and 12 to 20 virtual machines per server are no exception. If the TLB is big enough, <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3263&amp;p=10">NPT (also called RVI by AMD) and EPT</a> can offer up to a 20% performance increase. </div>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Thu, 27 Mar 2008 00:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>The revenge of AMD Barcelona's TLB?</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=415</link>
			<description><![CDATA[ <div><span>So far, the
TLBs of Barcelona have brought AMD quite a bit of bad press. But now that the problems
are fixed in revision B3, the TLBs might actually be one of
the main strong points of AMD's newest platform. To understand this, it helps to read <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3263">our latest IT article</a> :-).</span></div>
<div>&nbsp;</div>
<div>Barcelona<span> or AMD's K10 supports 4 KB, 2 MB and
1 GB page sizes. 2 MB pages are getting more popular (especially on Linux
servers) as they significantly reduce the memory management overhead. AMD's
TLB architecture:</span></div>
<ul style="margin-top: 0cm;" type="disc">
    <li><span>Low latency L1 TLB (Data and Instructions): 48
    entries, supporting all page sizes</span></li>
    <li><span>L2 TLB (Data and Instructions):
    512 entries (4 KB) or 128 entries (2 MB)</span></li>
</ul>
<p><span>If you
compare this with the Intel Penryn family:</span></p>
<ul style="margin-top: 0cm;" type="disc">
    <li><span>One instruction TLB: 128
    entries (4 KB) but only 8 entries for 2MB pages. </span></li>
    <li>The Data TLB has 2 levels:</li>
</ul>
<p style="margin-left: 72pt; text-indent: -18pt;"><span><span>&#8211;<span style="font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span></span></span>&nbsp;16 entries&nbsp;(4 KB)</p>
<p style="margin-left: 72pt; text-indent: -18pt;"><span><span>&#8211;<span style="font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span></span></span><span>256
entries (4 KB), but only 32 for larger pages (2 MB)</span></p>
<p><span>You can see
that AMD&#8217;s K10 family has really massive TLBs compared to the Penryn and
previous Intel CPUs, especially if you want to run with large pages. So while
this will certainly not affect anyone behind a desktop or mobile system, it may well
have an impact in the server world. </span></p>
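<p>One way to make those entry counts tangible is to compute the "reach" of each second level TLB, that is, how much memory it can map before a miss forces a page table walk. The sketch below is only an illustration using the entry counts listed above; the helper name is ours.</p>
<pre>
```python
# TLB reach = number of entries x page size.
# Entry counts are the ones listed above for the second level TLBs.

KB = 1024
MB = 1024 * KB

def tlb_reach(entries, page_size):
    """Memory covered by a TLB level when every entry maps one page."""
    return entries * page_size

l2_tlbs = {
    "Barcelona L2 TLB, 4 KB pages": (512, 4 * KB),
    "Barcelona L2 TLB, 2 MB pages": (128, 2 * MB),
    "Penryn L2 data TLB, 4 KB pages": (256, 4 * KB),
    "Penryn L2 data TLB, 2 MB pages": (32, 2 * MB),
}

for name, (entries, page_size) in l2_tlbs.items():
    print(f"{name}: {tlb_reach(entries, page_size) // MB} MB")
```
</pre>
<p>With 2 MB pages, Barcelona's second level TLB covers 256 MB against 64 MB for Penryn's second level data TLB; with 4 KB pages it is 2 MB against 1 MB.</p>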
<p><span>VMware ESX 3.5 does not yet support nested paging; it
will be present in an upcoming update. This kind of paging requires really
massive TLBs as the page tables of each guest OS are cached in the TLB. But
even with shadow paging, having big TLBs should help when you have a lot of VMs
running.</span></p>
<p><span>We still
have to do quite a bit of benchmarking, but it is clear that the TLB
architecture of Barcelona
deserves some positive light too. It will be very interesting to see what kind
of TLB architecture Nehalem will have, as Nehalem will be the first to support
Intel&#8217;s Extended Page Tables (EPT, Intel&#8217;s version of Nested Pages).</span></p>
<p><span>It is
interesting to note that Nehalem has a NEW second level 512 entry TLB&#8230;</span></p>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Mon, 17 Mar 2008 11:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>10 Gbit Ethernet, the super I/O pipe for virtual servers? (VMworld 2008)</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=403</link>
			<description><![CDATA[ <div><span lang="EN-US">10 Gbit Ethernet just got even more attractive. You might remember from <a href="http://it.anandtech.com/IT/showdoc.aspx?i=3147">our last storage article that we have high hopes for iSCSI as a high performance</a> but very cost effective shared storage solution for SMEs. Our hopes were based on getting 10 Gbit (10GBase-T) on UTP Cat 6 (or even Cat 5e), but unfortunately the only switch </span><span lang="EN-US">that I could find (thanks Ren&#233;e!) </span><span lang="EN-US">that supports 10 Gbit this way was the <a href="http://www.smc.com/index.cfm?event=viewProduct&amp;localeCode=EN_USA&amp;cid=8&amp;scid=107&amp;pid=1646">SMC TigerSwitch 10G</a>. At $1000 per port, it is not really a budget friendly offering.</span></div>
<div><br />
</div>
<div>
<div>Still, 10 Gbit Ethernet is an incredibly interesting solution for a&nbsp; virtualized server or an iSCSI storage array that is serving data to a lot of (virtualized or not) servers. <br />
</div>
<div>&nbsp;</div>
<div><img height="362" alt="" src="http://images.anandtech.com/iblog/vmworld/Neterion_X3100_Series_Adapter.jpg" width="550" />&nbsp;</div>
<div>&nbsp;</div>
<div>So maybe it is best to give optical cabling another look. Some of the 10 Gbit Ethernet NICs are getting quite cheap these days, but <span lang="EN-US">an enthusiastic Ravi Chalaka, Vice President of Neterion, told us that it might be wise to invest a bit more in NICs with IOV (I/O virtualization) support. According to Neterion, the newest Neterion X3100 Series is the first adapter to support the new industry standard <strong>SR-IOV 1.0</strong> (Single-Root I/O Virtualization). SR-IOV is a PCI-SIG workgroup extension to PCIe. One of the features of such a NIC is that it has <strong>multiple channels</strong> that can accept multiple requests from virtualized servers, which significantly reduces the latency and overhead of multiple servers sharing the same network I/O. Even more important is that the Neterion X3100 is natively supported in VMware ESX 3.5.<br />
</span></div>
<div><br />
</div>
<div><img height="421" alt="" src="http://images.anandtech.com/iblog/vmworld/iov.png" width="447" />&nbsp;</div>
<div>&nbsp;</div>
<div>We will test the Neterion X3100 in the coming months. It seems like a very promising product, as Neterion claims the following advantages over a comparable 1 Gbit solution:</div>
<div>&nbsp;</div>
<div>
<ul>
    <li>7 times more bandwidth</li>
    <li>50% less latency</li>
    <li>40% less TCP overhead</li>
</ul>
</div>
<div>So while many of us are probably quite pleased with the bandwidth of 2 Gbit (2x 1 Gbit MPIO), the 50% lower latency in particular sounds great for iSCSI. Fibre Channel, which is moving towards 8 Gbit, might just have lost another advantage...<br />
</div>
<div>&nbsp;</div>
<div>&nbsp;</div>
<div>&nbsp;</div>
<div>&nbsp;</div>
</div>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Sat, 01 Mar 2008 00:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>VMWorld 2008 live: in depth Disk I/O monitoring </title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=394</link>
			<description><![CDATA[ <div>In an absolutely brilliant and (of course!) purely technical session called "Disk Workload Characterization on ESX", Richard McDougall explained how you can solve disk I/O problems by capturing in-depth monitoring data and analyzing it.</div>
<div>&nbsp;</div>
<div><img alt="" src="http://images.anandtech.com/iblog/vmworld/VM_diskstats.jpg" height="350" width="550" /><br />
&nbsp;</div>
<div>&nbsp;</div>
<div>Simply type the command below in the ESX Console (COS):<strong><br />
</strong>
<div><strong>/usr/lib/vmware/bin/vscsiStats -h </strong></div>
<div>&nbsp;</div>
<div>and you can get a wealth of disk I/O stats. The vscsiStats tool seems to use a lightweight probe that is part of the virtualized HBA drivers (BusLogic or LSI Logic). This of course implies that you are measuring at the level of one VM: you cannot get a complete overview of the usage of your storage arrays this way. Its purpose is to help you understand what the application running in a particular VM is doing. It also works if you use an iSCSI (or FC) initiator at the ESX hypervisor level, as all block commands still pass through the virtual HBA. If you use an initiator inside the guest OS, however, you won't get any stats. <br />
<div>&nbsp;</div>
</div>
<div>The following data can be gathered:&nbsp;</div>
<div>
<ul>
    <li>I/O block size</li>
    <li>Latency</li>
    <li>Seek distance/spatial locality</li>
    <li>Outstanding I/Os (queues)</li>
    <li>I/O interarrival time</li>
</ul>
<div>&nbsp;&nbsp;</div>
<div>In other words, everything you need to know to get a very good idea of your application's disk access profile. <br />
</div>
You can get histograms of latency, I/O block sizes and so on. Richard assured us that, unless you perform a full trace, this is a very lightweight process that can be run on production systems with very little performance loss.<br />
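<div>To make the idea of a latency histogram concrete, here is a minimal Python sketch of how per-I/O latencies can be sorted into buckets. It is not the tool's implementation: the bucket limits, the function name <strong>latency_histogram</strong> and the sample values are all assumptions made up for illustration.</div>

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical bucket upper bounds in microseconds (not the tool's actual limits).
BUCKET_LIMITS_US = [100, 500, 1000, 5000, 15000, 30000, 100000]


def latency_histogram(latencies_us):
    """Count samples per bucket; a sample lands in the first bucket whose limit it does not exceed."""
    counts = Counter()
    for value in latencies_us:
        index = bisect_left(BUCKET_LIMITS_US, value)
        if index == len(BUCKET_LIMITS_US):
            # Slower than the largest limit: goes into the overflow bucket.
            label = "over " + str(BUCKET_LIMITS_US[-1])
        else:
            label = "up to " + str(BUCKET_LIMITS_US[index])
        counts[label] += 1
    return counts


samples = [80, 450, 450, 1200, 40000, 250000]  # made-up latencies in microseconds
histogram = latency_histogram(samples)
for label, count in histogram.items():
    print(label, count)
```

<div>A long tail in the upper buckets is usually more interesting than the average: it shows how often the application actually has to wait for the array.</div>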
<div><br />
</div>
<div>Knowing, for example, which block size your application uses most frequently can help a lot, as the block size of the guest should align with the block size of the backend (your storage array). <br />
</div>
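<div>Why alignment matters can be shown with a little arithmetic. The Python sketch below counts how many backend blocks a single guest I/O touches. The 64 KB backend block size and the function names are assumptions for illustration only; the 63-sector starting offset is the classic legacy partition layout, used here as an example of a misaligned guest.</div>

```python
def is_aligned(guest_offset_bytes, backend_block_bytes):
    """True when an I/O starts exactly on a backend block boundary."""
    return guest_offset_bytes % backend_block_bytes == 0


def backend_blocks_touched(guest_offset_bytes, io_size_bytes, backend_block_bytes):
    """Number of backend blocks that a single guest I/O spans."""
    first = guest_offset_bytes // backend_block_bytes
    last = (guest_offset_bytes + io_size_bytes - 1) // backend_block_bytes
    return last - first + 1


KIB = 1024
BACKEND_BLOCK = 64 * KIB  # assumed backend block size, illustration only

# A 64 KB guest I/O that starts on a boundary touches one backend block.
aligned = backend_blocks_touched(64 * KIB, 64 * KIB, BACKEND_BLOCK)

# The same I/O starting at sector 63 (63 * 512 bytes) straddles two blocks.
misaligned = backend_blocks_touched(63 * 512, 64 * KIB, BACKEND_BLOCK)

print(aligned, misaligned)
```

<div>In this sketch the misaligned I/O touches twice as many backend blocks as the aligned one for the same amount of data.</div>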
<div>&nbsp;</div>
<div>Another tip I picked up: transactions (commits) on databases consist of update writes to the logs (latency!), while most databases then perform async writes to the actual database. So you need to place your logs on a low-latency array, while the disk array where your database data resides just needs to have enough bandwidth and write cache memory.<br />
</div>
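<div>A rough back-of-the-envelope calculation shows why log latency matters so much. The Python sketch below assumes a deliberately simplified model in which every commit waits for exactly one log write and commits are issued one after another (no group commit, no parallel log streams). The function name and the latency figures are made up for illustration.</div>

```python
def max_serial_commits_per_second(log_write_latency_ms):
    """Upper bound on commits per second when each commit waits for one log write."""
    return 1000.0 / log_write_latency_ms


# Made-up latencies: a log volume backed by write cache vs. a busy disk array.
fast_log = max_serial_commits_per_second(0.5)
slow_log = max_serial_commits_per_second(8.0)

print(fast_log, slow_log)
```

<div>Under these assumptions the commit rate is capped by log latency alone, no matter how much bandwidth the array holding the data files has.</div>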
<br />
</div>
</div>
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Wed, 27 Feb 2008 14:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>VMWorld 2008 live: VMWare vService demonstrated</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=392</link>
			<description><![CDATA[ <div>Dr. Mendel Rosenblum, chief scientist and co-founder of VMWare, demonstrated the newest form of virtualization: vService. The idea is to package several virtual machines that together offer a certain service, for example a CRM application. This kind of application consists of a DB server, a webserver, the actual business logic server and a VPN access application. Packaging all this together in one file is a lot more convenient than installing several OSes and then installing every component separately on top of them. <br />
</div>
<div>&nbsp;</div>
<div><img alt="" src="http://images.anandtech.com/iblog/vmworld/vservice.jpg" height="427" width="550" />&nbsp;</div>
<div><br />
</div>
<div>It basically works as follows: you download the complete vService file. This file is based on the OVF (Open Virtual Machine Format) format and is thus virtualized. Next you start up an installation script which allows you to customize the complete multi-tier application. <br />
</div>
<div>&nbsp;</div>
<div><img alt="" src="http://images.anandtech.com/iblog/vmworld/Vser_demo.jpg" height="392" width="550" />&nbsp;</div>
<div>
<div><br />
</div>
<div>The vService looks like one virtual machine in VirtualCenter, but it is in fact several virtual machines packaged together. After a few minutes of configuration, the complete CRM application was ready to go: </div>
<div><br />
</div>
<div><img alt="" src="http://images.anandtech.com/iblog/vmworld/vservice_3.jpg" height="414" width="550" />&nbsp;</div>
<br />
In the same keynote, Rosenblum announced "continuous availability" (record and replay), which should improve on the current HA, which has been rather flaky in our experience, and a "site recovery manager" for those looking for disaster recovery across several datacenters. <br />
<br />
</div>
<div><br />
</div>
<div>&nbsp;</div>
<br />
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Wed, 27 Feb 2008 00:00:00 EDT</pubDate>
			
		</item>
		
		<item>
			<title>A new home for the professionals: the Anandtech IT portal</title>
			<link>http://it.anandtech.com/weblog/default.aspx?bid=387</link>
			<description><![CDATA[ <div align="center">
</div>
<div>Welcome to the new IT portal of Anandtech.com! A higher frequency of in-depth IT articles, a thriving IT community that gives us excellent feedback, and more exposure for our IT content are our goals for the coming months.<br />
</div>
&nbsp;
<div>As our colleagues at Anandtech.com are producing a flood of new articles on CE, gaming gear and mobile computing, the IT articles were simply drowning. And that is unfortunate, because there are so few places that give the IT professional hard benchmark data and in-depth articles. Most IT-related sites produce small news items. And the lion's share of those news items consists of the opinion of marketing manager X - basically a hidden advertisement - or CIO Y who boasts how he is driven by "business excellence". Boring!<br />
</div>
<div>&nbsp;</div>
<div>More than ever we'll bring the <span id="st" name="st" class="st">Anandtech</span> way of investigating new technology to the <span id="st" name="st" class="st">IT</span> world: let benchmarking, critical analysis and first-hand experience do the talking. <br />
</div>
<div><strong>&nbsp;</strong></div>
<div><strong>But we need your help!</strong> Analyzing and benchmarking IT-related technology is incredibly complex and time consuming. Our current investigation into virtualization has made that even clearer.</div>
<div>&nbsp;</div>
<div> With your feedback we can make our articles a lot better and smarter. The goal is a constant feedback loop: we&nbsp; launch a new article, you give us your feedback in the forums, we react to the best comments with a blog update and so on. Yes, the main purpose of putting the blog section on top is that we'll use the blogs to quickly react to good feedback.<br />
</div>
<div>&nbsp;</div>
<div>So register in our <a href="http://it.anandtech.com/forums/">IT forum</a> and let us make this work!</div>
<div>
<div align="center"><br />
</div>
<br />
</div>
<div>&nbsp;</div>
<div><br />
</div>
<br />
 ]]></description>
			<author>anand@anandtech.com (Anand Lal Shimpi)</author>
			<category></category>
			<pubDate>Wed, 20 Feb 2008 00:00:00 EDT</pubDate>
			
		</item>
		
	</channel>
</rss>

