What's new in AMD Accelerated Parallel Processing SDK 3.0.130.135

Oct 31, 2015
  • AMD APP SDK 3.0 supports OpenCL 2.0, the latest compute API standard from Khronos, with samples highlighting its new features and benefits. The SDK also includes samples for accelerated libraries such as the open-source C++ template library Bolt and the OpenCL-accelerated OpenCV (Open Source Computer Vision) library. This release supports the Catalyst Omega 15.7 driver.

New in AMD Accelerated Parallel Processing SDK 2.8 (Jul 9, 2013)

  • Key features supported in SDK 2.8 and the Catalyst 12.8 drivers include:
  • Support for Direct3D 11 sharing.
  • Performance improvement in the OpenCL GPU run-time.
  • Performance improvement in the OpenCL CPU run-time.
  • Support added for 64-bit atomics.
  • Asynchronous dispatch support added for Radeon HD 7000-series devices.
  • Support for LLVM 3.0.
  • Support added for bus-addressable memory.
  • 2D max image size increased to 16k*16k.
  • UVD performance improvements.
  • Samples are now included for Bolt, Aparapi, and C++ AMP.
  • Bolt data-parallel primitives C++ template library (preview release for OpenCL).
  • Preview includes the following templates:
  • Scan
  • Sort
  • Reduce
  • Transform
  • Transform reduce
  • Counting
  • The CodeXL beta integrated hybrid computing (HC) tool suite is recommended for use with AMD APP SDK and is available as a separate download. Features include:
  • Unified profiling and debugging on AMD CPUs/APUs/GPUs.
  • New user interface with unified look-and-feel, available as a stand-alone Windows or Linux application, and as an extension for Microsoft Visual Studio 2010 on Windows.
  • GPU Debugger:
  • GPU debugging includes OpenCL kernel and API calls.
  • Single-step debugging and breakpoints on a single GPU (running display).
  • Trace API call history (combined C/C++ and OpenCL).
  • Multi-watch view of memory, variable values/types.
  • CPU Profiler:
  • Profiling suite to identify, investigate, and tune application performance on AMD CPUs.
  • Precise application hot-spot analysis with time-based, event-based, and instruction-based sampling.
  • Call graph profiling with a butterfly view of function calls.
  • OpenCL source profiling for CPU.
  • GPU Profiler:
  • OpenCL and DirectCompute profiler for AMD Radeon GPUs.
  • API trace, kernel occupancy, and hot spots analysis.
  • Comprehensive visualization: timeline and summary views of host, kernel, and data transfers.
  • Supports GPU performance counters.
  • Static Analyzer:
  • Static analysis of OpenCL or IL kernels for AMD Radeon GPUs, without having to run the application on actual hardware.
  • Accurate performance estimates for OpenCL kernels.
  • View statistics and disassembly of the generated hardware kernel.
  • APP Math Libraries (APPML) 1.8 new features include:
  • FFT:
  • Real-to-complex.
  • Public interfaces for transpose function.
  • BLAS:
  • Completed support for all BLAS Level 2 and Level 3 functions by adding:
  • GBMV
  • C/Z-HBMV
  • S/D-SBMV
  • TBMV
  • TBSV
  • Modified error return codes
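To illustrate what the Bolt preview templates listed above compute (Scan, Sort, Reduce, Transform, Transform reduce, Counting), here is a minimal sketch that uses the C++ standard library algorithms with broadly matching semantics rather than Bolt itself. The names PrimitiveResults and run_primitives are hypothetical and exist only for this example; Bolt's own headers, namespaces, and OpenCL device setup are not shown, and the specific operations chosen (sum, negate, sum of squares) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical container for the result of each primitive (not a Bolt type).
struct PrimitiveResults {
    std::vector<int> scanned;      // Scan: inclusive prefix sums
    std::vector<int> sorted;       // Sort: ascending order
    int reduced;                   // Reduce: sum of all elements
    std::vector<int> transformed;  // Transform: each element negated
    int transform_reduced;         // Transform reduce: sum of squares
    long counted;                  // Counting: occurrences of `target`
};

// Hypothetical helper: runs the standard-library counterpart of each
// primitive once over the same input. This is CPU-only std:: code, not Bolt.
PrimitiveResults run_primitives(const std::vector<int>& input, int target) {
    PrimitiveResults r;

    // Scan: element i of the output is the sum of input[0..i].
    r.scanned.resize(input.size());
    std::partial_sum(input.begin(), input.end(), r.scanned.begin());
    assert(r.scanned.size() == input.size());

    // Sort: copy the input, then order the copy ascending.
    r.sorted = input;
    std::sort(r.sorted.begin(), r.sorted.end());

    // Reduce: fold all elements into one value with +, starting from 0.
    r.reduced = std::accumulate(input.begin(), input.end(), 0);

    // Transform: apply a unary function (negation) to every element.
    r.transformed.resize(input.size());
    std::transform(input.begin(), input.end(), r.transformed.begin(),
                   std::negate<int>());

    // Transform reduce: multiply each element by itself, then sum the products.
    r.transform_reduced =
        std::inner_product(input.begin(), input.end(), input.begin(), 0);

    // Counting: number of elements equal to `target`.
    r.counted = static_cast<long>(
        std::count(input.begin(), input.end(), target));

    return r;
}
```

Calling run_primitives on a small std::vector<int> and inspecting each field shows the shape of the result every primitive produces: Scan, Sort, and Transform return a sequence of the same length as the input, while Reduce, Transform reduce, and Counting collapse the input to a single value.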

New in AMD Accelerated Parallel Processing SDK 2.3 (Jan 27, 2011)

  • Increased size of staging buffers.
  • Enhanced Binary Image Format (BIF).
  • Support for UVD video hardware component through OpenCL (Windows 7).
  • Support for AMD E-Series and C-Series platforms (AMD Fusion APUs).
  • Support for Northern Islands family of devices.
  • Support for AMD Radeon HD 6310 and AMD Radeon HD 6250 devices.
  • Support for OpenCL math libraries: FFT and BLAS-3, available for download at AMD Accelerated Parallel Processing Math Libraries.
  • Preview feature: An optimization pragma for unrolling loops.
  • Preview feature: Image support on x86 CPUs. This allows the image formats described in the Khronos OpenCL specification to be used on the x86 CPU device. To enable it, set the CPU_IMAGE_SUPPORT environment variable for your application.
  • Optimized Histogram sample.
  • Added makefiles and updated the sample code to compile with the Intel compiler, MinGW + GCC, and MinGWx64 + GCC.
  • Various OpenCL compiler and runtime fixes and enhancements.
  • Various sample fixes and enhancements.
  • Improved OpenCL runtime performance:
  • Improved kernel launch times.
  • Improved PCIe transfer times.
  • Enabled DRMDMA for the ATI Radeon 5000 Series and AMD Radeon 6800 GPUs listed under Supported Devices.
  • AMD APP Profiler 2.1 enhancements:
  • Timeline visualization and API Trace.
  • Support for 64-bit profiling.
  • Support for Linux (command line version).
  • Support for profiling DirectCompute applications.
  • AMD APP KernelAnalyzer 1.7 enhancements:
  • Support for Northern Islands devices.
  • Support for Catalyst 10.9 to 10.12.
  • New samples:
  • MonteCarloAsianDP
  • DeviceFission
  • FluidSimulation2D
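The CPU image preview feature listed above is controlled by the CPU_IMAGE_SUPPORT environment variable. These notes name the variable but not the value it expects, so the following is only a sketch: the value "1" is an assumption, as is the idea that the variable needs to be set before the application makes its first OpenCL call. The helper names enable_cpu_image_support and cpu_image_support_value are hypothetical and not part of the SDK.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdlib.h>  // setenv (POSIX) or _putenv_s (Windows)
#include <string>

// Variable name taken from the release notes.
const char* const kCpuImageVar = "CPU_IMAGE_SUPPORT";

// Hypothetical helper: sets CPU_IMAGE_SUPPORT for the current process.
// ASSUMPTION: "1" is an acceptable value; the notes do not specify one.
// Returns true if the environment was updated.
bool enable_cpu_image_support(const std::string& value = "1") {
    assert(!value.empty());
#ifdef _WIN32
    return _putenv_s(kCpuImageVar, value.c_str()) == 0;
#else
    return setenv(kCpuImageVar, value.c_str(), /*overwrite=*/1) == 0;
#endif
}

// Hypothetical helper: returns the current value of CPU_IMAGE_SUPPORT,
// or an empty string if the variable is not set.
std::string cpu_image_support_value() {
    const char* v = std::getenv(kCpuImageVar);
    return v ? std::string(v) : std::string();
}
```

An application would presumably call enable_cpu_image_support() at the very start of main(), before any OpenCL platform or device query. Setting the variable in the shell that launches the application is an equivalent approach that needs no code change.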

New in AMD Accelerated Parallel Processing SDK 2.2 (Aug 12, 2010)

  • Support for OpenCL 1.1 specification:
  • Please see the OpenCL 1.1 specification for more information about this feature.
  • Support for Ubuntu 10.04 and Red Hat Enterprise Linux 5.5.
  • Support for x86 CPUs with SSE2.x or later (adds to existing support for x86 CPUs with SSE3.x or later).
  • Support for Microsoft Visual Studio 2010 Professional Edition and Minimalist GNU for Windows (MinGW) [GCC 4.4].
  • Support for GNU Compiler Collection (GCC) 4.1 or later on Linux systems (adds to existing support for GCC 4.3 or later).
  • Support for single-channel OpenCL image format.
  • Support for OpenCL / DirectX 10 interoperability.
  • Please see this Khronos OpenCL extension registry entry for more information about this extension.
  • Support for additional double-precision floating point routines in OpenCL C kernels.
  • Please see chapter A, section 9 of the ATI Stream SDK OpenCL Programming Guide for more information about the additional routines added in this release.
  • Support for generating and loading binary OpenCL kernels.
  • Please see this knowledge base article for more information about this feature.
  • Support for native OpenCL kernels.
  • Preview Feature: Support for accessing additional physical memory on the GPU from OpenCL applications.
  • Please see this knowledge base article for more information about this feature.
  • Preview Feature: Support for printf() in OpenCL C kernels.
  • Please see chapter A, section 8 of the ATI Stream SDK OpenCL Programming Guide for more information about this extension.
  • Extension: Support for additional event states when registering event callbacks in OpenCL 1.1.
  • Please see chapter A, section 8 of the ATI Stream SDK OpenCL Programming Guide for more information about this extension.
  • Additional OpenCL samples:
  • ConstantBandwidth (under cl/MicroBenchmarks)
  • GlobalMemoryBandwidth (under cl/MicroBenchmarks)
  • ImageBandwidth (under cl/MicroBenchmarks)
  • LDSBandwidth (under cl/MicroBenchmarks)
  • MemoryOptimizations
  • PCIeBandwidth (under cl/MicroBenchmarks)
  • SimpleDX10
  • SimpleMultiDevice
  • Package Update: ATI Stream Profiler 1.4.
  • Various OpenCL compiler and runtime fixes and enhancements (see developer release notes for more details).
  • Expanded OpenCL performance optimization guidelines in the ATI Stream SDK OpenCL Programming Guide, including:
  • Global memory optimizations
  • LDS optimizations
  • Register and LDS impact on number of active wavefronts
  • Load-balancing across multiple OpenCL devices
  • Instruction bandwidths
  • Key cache sizes and bandwidths for "Evergreen" GPUs