When AMD set about bringing their CPU/GPU fusion initiative to life with the Heterogeneous System Architecture (HSA) earlier this decade, one point AMD made early-on was that while they strongly believed in the need for heterogeneous computing and the performance gains it would offer, they would also be pragmatic. When you’re attempting to significantly alter the computing landscape and create architectures that can accommodate massively parallel workloads just as well as they can traditional serial workloads, it means you have to address not only the hardware end of the equation but the software end as well. In other words it means you need an ecosystem, and you cannot build an ecosystem of one.

To that end, in 2012 AMD founded the Heterogeneous System Architecture Foundation to serve as a neutral consortium to control the development of the HSA standard. Joined by co-founders ARM, Imagination, MediaTek, and others, the HSA Foundation would create the ecosystem necessary for HSA to succeed by creating a common standard for heterogeneous computing that multiple hardware vendors would implement. Software developers in turn could then write software targeting the HSA standard, allowing this software to work in a full heterogeneous manner across a wide range of hardware.

Meanwhile the fact that HSA was originally developed by AMD before being handed over to the HSA Foundation meant that while the major founders of the Foundation were all equals at the table, the timeline meant that the initial versions of HSA were to be spearheaded by AMD. AMD’s already-in-development hardware would be the first HSA-capable hardware to be released, and given the long lead times required for processor development, the other partners would be releasing their HSA designs later on, needing a full development cycle to integrate the technology. To that end the first pre-1.0 HSA hardware was 2014’s Kaveri APU, and the first 1.0 HSA hardware is AMD’s 2015 Carrizo APU.

Jumping to the present, a bit over three years since the HSA Foundation was formed, the Foundation is now preparing for the next generation of HSA products. To that end, it is presenting an update on the state of HSA implementations at this year’s Linley Processor Conference. With the release of Carrizo putting HSA 1.0 into motion, ARM, Imagination, and MediaTek are now discussing their own HSA hardware release plans in greater detail. This is both to demonstrate their continued support for the standard and to offer some further detail on how the HSA ecosystem will work in the future, with multiple vendors selling products and potentially IP from multiple vendors all in the same SoC.

ARM for their part is not yet announcing any specific IP or dates, but they are reiterating that they have HSA-capable IP in development. For ARM this means the company needs to develop not only processor designs (CPU and GPU) that are compliant with HSA, but also a suitable interconnect that can comply with HSA’s memory sharing and coherency requirements, which in many ways are the core strength of the standard.

Meanwhile it’s interesting to note that although ARM isn’t at full HSA compliance quite yet, there’s a definite element of doing what they can within the capabilities of current hardware. Current ARM-based SoCs frequently already implement limited forms of heterogeneous computing, particularly when tapping the GPU for image processing (e.g. cameras). This is something ARM and other HSA members see as being an important stepping stone towards getting to full HSA compliance.

Meanwhile SoC designer MediaTek is similarly reiterating the presence of HSA-enabled SoCs in their roadmap. As one of ARM’s biggest customers, the plans of MediaTek and ARM go hand-in-hand – ARM supplies the IP, which MediaTek will then assemble into an SoC design – so at this stage MediaTek’s HSA plans essentially follow ARM’s HSA-compliant IP release schedule. In the meantime, like ARM, MediaTek is talking about how current SoCs and their more limited heterogeneous capabilities are an important step on the way to full HSA.

The final HSA member taking part in the Foundation’s presentation is Imagination. Like ARM, Imagination is a pure IP provider, and along with AMD’s x86 and ARM’s ARM, they provide the third CPU architecture supporting HSA, MIPS. Imagination has HSA-capable designs in development both for their PowerVR GPUs and their I-class & P-class CPUs, further stating that “all” future designs for both lineups will support HSA. Imagination’s SoC interconnect/fabric is also being similarly updated to support the necessary memory coherency.

Meanwhile Imagination is also the only other participant putting an ETA on their HSA-capable products. For Imagination it will be a gradual rollout as older products reach the end of their lifetimes and are updated, but the company expects to be releasing their first HSA-capable IP designs in 2016. With roughly a year required to go from IP to hardware, this means we would start seeing the first HSA-capable Imagination-powered hardware in 2017, assuming designs for the CPU, GPU, and fabric are all released in 2016.

Imagination’s position as an IP provider better known for their GPUs than CPUs also brings up an interesting compatibility point that the Foundation is addressing, which is how HSA will work when multiple vendors are providing IP for use in the same SoC. While a top-to-bottom ARM or Imagination solution is relatively straightforward, things get far more interesting mixing and matching, such as an ARM Cortex CPU with an Imagination PowerVR GPU, something that already occurs today with SoCs from multiple vendors. In short the HSA standard does account for this, and as long as all of the necessary components of an SoC are HSA-compliant, then it will be possible to put together an HSA-capable SoC mixing IP from multiple vendors.

Ultimately it is the Foundation’s goal to not just see HSA prosper, but to see it deployed from the bottom to the top, mobile devices right on up to supercomputers. With that said, the bulk of the HSA founders are mobile firms, and it’s entirely likely that outside of AMD’s APUs we’re going to see the mobile market take off first. Though with mobile platforms being the ultimate power-constrained platform – you don’t just need to be efficient to use power wisely, but you have a finite battery capacity to begin with – mobile may very well be the market that stands to gain the most from HSA in the first place.

20 Comments

  • SleepyFE - Wednesday, October 7, 2015 - link

    If I'm not mistaken Zen will be a CPU without the GPU. Back to basics I guess. You will have to wait for the next generation to see the Zen core used in an APU.
  • Gigaplex - Wednesday, October 7, 2015 - link

    C++ AMP from Microsoft with help from AMD came out quite some time ago in preparation for HSA. Hopefully they invest in it more, with equivalent functionality coming to other languages.
  • name99 - Tuesday, October 6, 2015 - link

    "although ARM isn’t at full HSA compliance"

    What exactly does this mean? The MediaTek slide suggests that today ARM CPUs+GPUs do not share a single coherent address space. I was under the impression that, since forever,
    - SoCs provide a single address space for CPU and GPU
    - that address space is coherent (at least on the CPU side) in the sense that APIs like Metal don't require any sort of "flush the data structures the CPU has created" type calls.

    So what's missing?
    Is it coherence in the other direction (ie GPU can perform a computation, and CPU can trivially read it without the GPU having to perform an explicit flush)?
    And/or is it the ability to interrupt the GPU (thus allowing for time-sharing, and for GPU computation to take arbitrarily long, rather than having to be broken up into chunks of certain maximum duration)?
  • SleepyFE - Tuesday, October 6, 2015 - link

    In SoCs, as with integrated GPUs, you assign a part of RAM to the GPU. From then on it belongs to the GPU and acts like the RAM on a dGPU. The CPU can't play with it and any data needed by it has to be copied into the CPU RAM. HSA aims to eliminate that, but aside from AMD no one did it yet. As the article says, they need the right interconnect for that.
  • Ryan Smith - Tuesday, October 6, 2015 - link

    To add to that, you need the following.

    Shared virtual memory
    Cache coherency
    The ability for a processor to schedule work on another processor
    A software layer to compile HSAIL down to your native instruction set
  • CiccioB - Wednesday, October 7, 2015 - link

    Given that on SoCs the CPU and GPU share the same memory controller (same bus, same memory pool), all the limitations seem to lie in the MMU and OS rather than requiring added HW resources like new buses.
    Cache coherency is needed only for L1 cache on CPU and GPU, but again that can be solved through the use of the same MC.
    Or am I missing something?
  • name99 - Tuesday, October 6, 2015 - link

    I assumed other ARM SoCs were like Apple's SoCs. Apparently not.
    As far as I can tell, Apple SoCs ARE like I described. To quote the Metal documents:
    "Resources allocated with the shared storage mode are stored in memory that is accessible to both the CPU and the GPU. "
    (The shared storage mode is default for iOS.)

    Again, as far as I can tell, the reason Metal took an additional year to move to OSX is precisely the split memory model for dGPUs on OSX, and the changes that were necessary to the Metal driver (along with API additions like describing memory as shared [old style Metal, default for iOS] vs managed vs private [alternative memory models, both for OSX, private can be used for some purposes on iOS, but is not necessary there]).

    The primary point, as far as I and other knowledgeable observers can tell, of the large (but high latency) L3 on the Apple SoCs is not so much to serve as a further-out, larger cache, but to serve as the coherence point between the CPU and the GPU.

    So, to reiterate, and to build on what Ryan said below. As far as I can tell Apple HAVE
    - shared VM
    - coherency (at least one way, CPU to GPU; possibly not the other GPU to CPU)
    - The extent to which the GPUs are controllable by the OS as traditional virtualizable processors (the point I covered by talking about interrupts) I remain unclear on. I don't remember ever seeing anything about limits to how long a Metal computation kernel can run for, which suggests (but does not prove) that the OS can interrupt a kernel.

    - HSAIL compliance is by far the least interesting aspect of this. What matters is the concepts of the technology, not a particular implementation. It's like being interested in 64-bit computing taking off, rather than insisting that the only 64-bit ISA that matters is x64.
  • extide - Wednesday, October 7, 2015 - link

    It is truly exciting to see that other companies are totally on-board with this. I mean reading this article was like reading an example of an engineer's dream solution: one single standard, multiple companies on-board, and the fact that you can even mix and match IP from different vendors and still have it be HSA compliant is just icing on the cake! Hats off to AMD for doing this the right way.

    On another thought, any word from Intel on this? I don't see their logo in the chart above, but they would seem like a likely candidate. I mean they do build CPUs/GPUs, and from what I understand they do actually support SOME HSA features, but they haven't really made a big deal about them publicly.
  • Ktracho - Friday, October 9, 2015 - link

    Also, how is HSA different (other than being supported by multiple companies) from what is already available with CUDA 7.5 (e.g., Intel CPU + NVIDIA Maxwell GPU)?
