ispc Changelog

New in ispc 1.23.0 (Feb 16, 2024)

  • Language changes:
  • Improved const variables initialization:
  • Variables with const qualifiers can be initialized using the values of previously initialized const variables, including arithmetic expressions over them.
  • Enum values can be used as constants.
  • The result of the selection operator can now be used as an lvalue.
  • Compiler switches behavior:
  • --dump-file=<dir> now forces the whole IR module to be dumped after each pass.
  • ISPC Runtime improvements:
  • Added the ISPCRT_GPU_DRIVER environment variable, which selects a specific GPU driver. If more than one supported GPU is present in the system, they may be managed by several GPU drivers; this variable lets the user choose among them.
  • Infrastructure/build changes:
  • Removed the build dependency on llvm-dis.
  • Locked the time zone to UTC to fix build reproducibility.
  • Bug fixes:
  • Fixed ABI compatibility of bool types returned to C/C++ code.
  • Fixed build error when bison emulates POSIX Yacc.
  • Fixed target definition for neon-i16x8, sse2-i32x8 and ps5.
  • Fixed ICE when generating unwind info for aarch64 code on Windows.
  • Recommended versions of Runtime Dependencies when targeting GPU:
  • Linux:
  • Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/23.48.27912.11
  • Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.15.1
  • Threading Building Blocks (TBB)
  • Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html
  • Windows:
  • Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.5194_101.5252
  • https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
  • Level Zero Loader
  • https://github.com/oneapi-src/level-zero/releases/tag/v1.15.1
  • OpenCL™ Offline Compiler (OCLOC)
  • https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
  • (this is needed for AoT compilation on Windows only)
  • Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core processor graphics
  • Components revisions used in GPU-enabled build:
  • KhronosGroup/SPIRV-LLVM-Translator@d1c69c33
  • intel/vc-intrinsics@b16218b8
  • oneapi-src/level-zero@ea5be99 (v1.15.1)
  • llvm/llvm-project@7cbf1a25 (llvmorg-16.0.6) + patches from llvm_patches folder

New in ispc 1.22.0 (Nov 18, 2023)

  • ISPC distribution changes:
  • ISPC binaries were compiled with LTO by the Clang/LLVM toolchain on all supported platforms and architectures using superbuild. ISPC binaries became a few percent faster on average.
  • Examples were excluded from ISPC archives. They are placed alongside as separate archives ispc-examples-v1.22.0.zip and ispc-examples-v1.22.0.tar.gz.
  • Language changes:
  • Added support for template operators.
  • Revised the usage of function specifiers with templates. For more details, please refer to the Function Templates section of the documentation.
  • Infrastructure changes:
  • Release built with LTO (except aarch64 Linux).
  • Added support for building ISPC with LLVM 17, although GPU support wasn't tested.
  • New compiler switches:
  • The --dwarf-version switch now accepts DWARF version 5.
  • The --dwarf-version switch forces DWARF format debug info generation on Windows. This allows debugging ISPC code linked with MinGW-generated code (#2129).
  • Bug fixes:
  • Fixed performance regression on GPU caused by missing memory effects for genx intrinsics declarations.
  • Fixed performance regression caused by change in the loop unswitch LLVM pass.
  • Fixed C compatibility of ISPC generated headers (#2650, #2652).
  • Added unwind table to ISPC generated functions for Windows targets. This fixes issues with incorrect backtraces during debugging and profiling (#2345, #1318).
  • Fixed emitted code for negation of short float vectors (#2628).
  • Fixed several issues that were related to the usage of bool in different cases (#2272, #2333, #2367, #2689).

New in ispc 1.22.0 Pre-release (Nov 17, 2023)

  • ISPC distribution changes:
  • ISPC binaries were compiled with LTO by the Clang/LLVM toolchain on all supported platforms and architectures using superbuild. ISPC binaries became a few percent faster on average.
  • Examples were excluded from ISPC archives. They are placed alongside as separate archives ispc-examples-v1.22.0.zip and ispc-examples-v1.22.0.tar.gz.
  • Language changes:
  • Added support for template operators.
  • Revised the usage of function specifiers with templates. For more details, please refer to the Function Templates section of the documentation.
  • Infrastructure changes:
  • Release built with LTO (except aarch64 Linux).
  • Added support for building ISPC with LLVM 17, although GPU support wasn't tested.
  • New compiler switches:
  • The --dwarf-version switch now accepts DWARF version 5.
  • The --dwarf-version switch forces DWARF format debug info generation on Windows. This allows debugging ISPC code linked with MinGW-generated code (#2129).
  • Bug fixes:
  • Fixed performance regression on GPU caused by missing memory effects for genx intrinsics declarations.
  • Fixed performance regression caused by change in the loop unswitch LLVM pass.
  • Fixed C compatibility of ISPC generated headers (#2650, #2652).
  • Added unwind table to ISPC generated functions for Windows targets. This fixes issues with incorrect backtraces during debugging and profiling (#2345, #1318).
  • Fixed emitted code for negation of short float vectors (#2628).
  • Fixed several issues that were related to the usage of bool in different cases (#2272, #2333, #2367, #2689).

New in ispc 1.22.0 Pre-release (Nov 15, 2023)

  • ISPC distribution changes:
  • ISPC binaries were compiled with LTO by the Clang/LLVM toolchain on all supported platforms and architectures using superbuild. ISPC binaries became a few percent faster on average.
  • Examples were excluded from ISPC archives. They are placed alongside as separate archives ispc-examples-v1.22.0.zip and ispc-examples-v1.22.0.tar.gz.
  • Language changes:
  • Added support for template operators.
  • Revised the usage of function specifiers with templates. For more details, please refer to the Function Templates section of the documentation.
  • Infrastructure changes:
  • Release built with LTO (except aarch64 Linux).
  • Added support for building ISPC with LLVM 17.
  • New compiler switches:
  • The --dwarf-version switch now accepts DWARF version 5.
  • The --dwarf-version switch forces DWARF format debug info generation on Windows. This allows debugging ISPC code linked with MinGW-generated code.
  • Bug fixes:
  • Fixed performance regression caused by missing memory effects for genx intrinsics declarations.
  • Fixed performance regression caused by change in the loop unswitch LLVM pass.
  • Fixed C compatibility of ISPC generated headers.
  • Added unwind table to ISPC generated functions for Windows targets. This fixes issues with incorrect backtraces during debugging and profiling.
  • Fixed emitted code for negation of short float vectors.
  • Fixed several issues that were related to the usage of bool in different cases.

New in ispc 1.21.1 (Oct 10, 2023)

  • A minor ISPC update with interop-related fixes for ISPCRT needed for the oneAPI Render Kit release.
  • This update contains only Linux oneAPI x86, macOS universal, and Windows release binaries. Use v1.21.0 binaries for other platforms.

New in ispc 1.21.0 (Aug 19, 2023)

  • Language changes:
  • Added support for function template specializations with explicit template arguments.
  • // Primary template
  • template <typename T, typename C> noinline int goo(T argGooOne, C argGooTwo);
  • // Specialization with explicit template arguments
  • template <> noinline int goo<int, float>(int argGooOne, float argGooTwo);
  • // Not supported yet: specialization with implicit template arguments (requires template arguments type deduction)
  • template <> noinline int goo(int argGooOne, float argGooTwo);
  • Modified behavior for signed integer overflow.
  • ispc now treats signed integer overflow as undefined behavior, as C and C++ do. This change may cause compatibility issues. You can control this behavior with the --[no-]wrap-signed-int compiler switch. The previous default behavior (before version 1.21.0) can be restored with --wrap-signed-int, which maintains defined wraparound behavior for signed integers, though it may limit some compiler optimizations.
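
To make the distinction concrete, here is a minimal sketch in plain C (not ISPC), chosen because the entry above compares the new behavior to C and C++. It shows two patterns that stay well defined without relying on signed wraparound: a check that detects overflow before it happens, and a wrapping add routed through unsigned arithmetic. The function names are hypothetical. The final conversion back to int32_t is implementation-defined before C23, and the sketch assumes a mainstream two's-complement compiler where it wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a + b would overflow int32_t. No overflowing
 * operation is ever evaluated, so the check itself is well defined. */
bool add_overflows_i32(int32_t a, int32_t b) {
    if (b > 0) {
        return a > INT32_MAX - b;
    }
    if (b < 0) {
        return a < INT32_MIN - b;
    }
    return false;
}

/* Wraparound addition. Unsigned arithmetic is defined to wrap modulo
 * 2^32; the cast back to int32_t is assumed to wrap as well
 * (implementation-defined before C23). */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

Code that relied on wraparound before 1.21.0 can either be restructured along these lines or compiled with --wrap-signed-int.
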
  • New hardware support:
  • Added support of Intel Meteor Lake Xe-LPG graphics:
  • added two new ISPC targets: xelpg-x16 and xelpg-x8
  • added two new device names: mtl-m and mtl-p
  • Infrastructure changes:
  • ISPC now uses LLVM's new pass manager. The optimization pipeline was modified by introducing an early LoopFullUnrollPass, which matches ISPC unrolled loops with manually unrolled loops in many cases.
  • Introduced ISPC superbuild, which facilitates building ISPC with Xe dependencies (LLVM, L0, vc-intrinsics, SPIRV-Translator). It can generate an archive with dependencies or consume a pre-built archive to build ISPC only. It also enables generating LTO or LTO+PGO enabled builds of LLVM and ISPC.
  • Added support for building ISPC with LLVM 16.
  • New compiler switches:
  • --mcmodel switch, which accepts small and large values. The definition is similar to gcc/clang. The large model enables programs larger than 2 GB.
  • --opt=disable-gathers and --opt=disable-scatters options, which disable generation of gather and scatter instructions on platforms that support them (for performance experiments).
  • --[no-]wrap-signed-int switches, which preserve (or, with no-, do not preserve) wraparound behavior on signed integer overflow.
  • ISPC Runtime improvements:
  • Added ispcrtSetTaskingCallbacks to the ISPCRT API, allowing the override of default implementations of ISPCLaunch, ISPCAlloc, and ISPCSync.
  • Removed compile-time Level Zero dependency from ISPCRT, no longer necessary after the ISPCRT split into CPU and GPU parts.
  • Recommended versions of Runtime Dependencies when targeting GPU:
  • Linux:
  • Intel(R) Graphics Compute Runtime
  • https://github.com/intel/compute-runtime/releases/tag/23.22.26516.18
  • Level Zero Loader
  • https://github.com/oneapi-src/level-zero/releases/tag/v1.13.5
  • Threading Building Blocks (TBB)
  • Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html
  • Windows:
  • Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.4644
  • https://www.intel.com/content/www/us/en/download/726609/intel-arc-iris-xe-graphics-whql-windows.html
  • Level Zero Loader
  • https://github.com/oneapi-src/level-zero/releases/tag/v1.13.5
  • OpenCL™ Offline Compiler (OCLOC)
  • https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
  • (this is needed for AoT compilation on Windows only)
  • Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core processor graphics
  • Components revisions used in GPU-enabled build:
  • KhronosGroup/SPIRV-LLVM-Translator@e82ecc2
  • intel/vc-intrinsics@910db48
  • oneapi-src/level-zero@e1f09b4 (v1.13.5)
  • llvm/llvm-project@8dfdcc7 (llvmorg-15.0.7) + patches from llvm_patches folder

New in ispc 1.20.0 (May 7, 2023)

  • ISPC release with compile time improvements, enhancements in the ISPC Runtime, and a number of code generation fixes. The release is based on patched LLVM 15.0.7.
  • ISPC distribution changes:
  • ISPC binaries got approximately 1/3 smaller and a few percent faster. The macOS distribution now includes x86_64, arm64, and Universal Binaries. On Linux, a snap package with the latest ISPC is available.

New in ispc 1.19.0 (Feb 28, 2023)

  • ISPC release with long-awaited function templates technical preview; new hardware support for 4th generation Intel® Xeon® Scalable (codename Sapphire Rapids) CPUs, Intel® Data Center GPU Max (codename Ponte Vecchio), and updated support for Intel® Arc™ GPUs; improved performance and compile time; an enhanced ISPC Runtime; a bunch of stability fixes and more. The release is based on patched LLVM 14.0.6.
  • Language changes:
  • Function templates support was introduced in ISPC and it's currently in technical preview, meaning that current language definition might change in future versions. For more details please refer to Function Templates section of documentation.
  • ISPC has got several other language changes needed for ISPC/SYCL interoperability (an experimental feature):
  • Support of __regcall attribute.
  • A new language construct invoke_sycl which is used to call SYCL function from ISPC. The function must be declared on ISPC side with extern "SYCL" __regcall qualifiers.
  • Support of extern "C" functions definitions.
  • New hardware support:
  • Targets for 4th generation Intel® Xeon® Scalable (codename Sapphire Rapids) CPUs were introduced: avx512spr-x4, avx512spr-x8, avx512spr-x16, avx512spr-x32, avx512spr-x64. The key difference with other AVX512 targets is native support for FP16.
  • New xehpc-x16/xehpc-x32 targets were added for Intel® Data Center GPU Max (codename Ponte Vecchio). A new pvc device name was introduced.
  • New device names acm-g10, acm-g11, and acm-g12 were added for Intel® Arc™ Graphics. The dg2 device name has been removed.
  • Support for Aarch64 targets was enabled on Windows.
  • ISPC Runtime:
  • A chunking allocator was introduced that can be enabled with ISPCRT_MEM_POOL (see the documentation for details).
  • An API was added to link input modules through ispcrtStaticLinkModules (using linking on vISA level under the hood) and ispcrtDynamicLinkModules (using binary linking under the hood).
  • Support for creating multiple devices within a single context was added, and an API was added to get a function pointer from a module. It's also possible to construct ISPC RT objects from native handlers now.
  • ISPC RT verbose mode was added that can be enabled through ISPCRT_VERBOSE.
  • Performance:
  • There's a significant performance boost on Xe targets caused by updates in the ISPC optimization pipeline and the usage of the new spill-cost IGC finalizer function, which dramatically reduces spill size.
  • Utilities:
  • ISPC link mode has been introduced, which links several LLVM bitcode or SPIR-V files and outputs the result as LLVM bitcode or SPIR-V. For example:
  • ispc link test_a.bc test_b.bc --emit-spirv -o test.spv
  • CMake utilities were improved, and support was added for building an ISPC GPU target from multiple ISPC files, linking them with ispc --link. An application's ISPC CMakeLists would look like this:
  • add_ispc_library(my_ispc_lib filea.ispc fileb.ispc)
  • ispc_target_include_directories(my_ispc_lib <some directory path>)
  • ispc_target_compile_definitions(my_ispc_lib -DMY_DEFINE=1)
  • add_ispc_library(my_ispc_kernel filec.ispc)
  • ispc_target_link_libraries(my_ispc_kernel my_ispc_lib)

New in ispc 1.18.0 (May 6, 2022)

  • An ISPC release with a bunch of stability and performance fixes, improvements for ISPC Runtime, and complete stdlib support for `float16` type. This release is based on patched LLVM 13.0.1.
  • The `-E` switch was introduced to run the preprocessor only. An old bug that prevented the compiler from crashing on a preprocessor error was fixed, and the compiler now properly crashes. As some users found the old behavior convenient in some cases, the `--ignore-preprocessor-errors` switch was introduced to maintain it.
  • Target naming was changed for targets with native masking support to drop the "base type" from the naming scheme; the old names are still accepted for compatibility. This affects the AVX512 targets, whose new names are `avx512skx-x4`, `avx512skx-x8`, `avx512skx-x16`, `avx512skx-x32`, `avx512skx-x64`, and `avx512knl-x16`.
  • For debugging, and for those who are interested in understanding compiler internals, the `--ast-dump` switch was introduced. The produced dump of the AST (Abstract Syntax Tree) is intentionally made to look like the clang AST dump for convenience.
  • Standard library gained full support for `float16` type. Note that it is fully supported only on the targets with native hardware support. On the other targets emulation is still not guaranteed but may work in some cases.
  • Performance on Xe targets was significantly improved in this release due to optimizations in ISPC and Vector Backend.
  • Among other fixes, it is worth mentioning the following:
  • Fixed a bug #1308 affecting multi-target compilation
  • A bunch of fixes to make it easier to build ISPC on FreeBSD, even though FreeBSD is not officially supported
  • Improvements for the ISPC Runtime in this release:
  • Flexible task system selection during build
  • Support of ISPCRT build separate from ISPC
  • Support of ISPCRT build for CPU only
  • Version check in CMake
  • New API to get the type of allocated memory (`ispcrtGetMemoryViewAllocType` and `ispcrtGetMemoryAllocType`)
  • New API for memory copy on device (`ispcrtCopyMemoryView`)
  • Support of device-only memory without corresponding application memory.

New in ispc 1.17.0 (Jan 15, 2022)

  • Improvements for CPU targets:
  • Performance improvements for `double` type on AVX512 targets - better use of gather/scatter instructions, 2-5x improvements for `rsqrt()` and `rcp()` standard library functions.
  • New `avx512skx-i32x4` target.
  • `aos_to_soa` and `soa_to_aos` performance improvements for `-x8` and `-x16` targets on CPU.
  • `--math-lib=svml` mode was fixed and extended - it requires Intel® C++ Compiler (`icc` or `icx`) to link the binary.
  • `zen1`, `zen2`, and `zen3` CPU definitions were added.
  • Added experimental support for PS5 platform.
  • ISPC language got experimental support for IEEE 754 half-precision data type - `float16`. Not all library functions are supported yet with this type. The key focus in this release was on hardware natively supporting this type.
  • This update includes breaking changes in compiler switches for Xe targets:
  • Graphics targets `genx-x8` and `genx-x16` were renamed to `gen9-x8` and `gen9-x16`.
  • Compiler architectures for graphics target were renamed from `genx32` and `genx64` to `xe32` and `xe64`.
  • Xe targets were renamed from uppercase to lowercase (so instead of SKL/TGLLP it is now skl/tgllp).
  • A new `--device` switch (which is an alias for the existing `--cpu` switch) was introduced. Now the recommended way to specify the required platform for CPU and GPU is: `--device=<platform>`
  • This release also changes the definition of `export` and `task` functions on GPU. A GPU kernel is now an ISPC `task` function only; `export` functions can no longer be invoked from the host (i.e. called from ISPC Runtime/L0 Runtime). These functions are ready to be linked with and called from other GPU modules. Currently, ISPC experimentally supports such interoperability with the Explicit SIMD SYCL* Extension (ESIMD).
  • New Xe targets were added:
  • `xelp-x8` and `xelp-x16`. XeLP refers to XeLP generation of hardware (TigerLake chips and alike).
  • `xehpg-x8` and `xehpg-x16`. XeHPG is the architecture name for the forthcoming Intel® Arc™ GPUs codename Alchemist.
  • GPU part has a bunch of stability, performance, and usability improvements including but not limited to `alloca()` with constant parameter support, `assume()` support, improved performance for double math functions and integer division.
  • `ISPC Runtime` performance was improved several times by fixing the setting of local group size for kernels, using events as a synchronization mechanism, and utilizing HW compute and copy engines. There is also a new structure `ISPCRTModuleOptions` to pass additional options to VC backend if needed. Currently, `ISPCRTModuleOptions` allows setting of stack size for VC backend which is used to compile SPIR-V.

New in ispc 1.16.1 (Jul 16, 2021)

  • A minor ISPC update, which has a bug fix for [issue #2111](https://github.com/ispc/ispc/issues/2111) and is based on a patched version of LLVM 12.0.1.
  • The bug fix affects x86 targets only and shows up as incorrect code generation for the sequence of `shuffle()` and `reduce_add()` stdlib functions.
  • If you are building `ispc` from the sources, note that the fix is implemented as a patch for the LLVM backend, and LLVM must be built with this patch applied for the fix to take effect. A stock build of LLVM 12.0.1 will not contain this bug fix.

New in ispc 1.16.0 (Jun 12, 2021)

  • An ISPC release with language extensions for performance fine tuning, cpu definitions for AlderLake and SapphireRapids targets, support for macOS ARM targets, and massive update of Intel GPUs support. Windows and Linux binaries in this release support both CPU and GPU targets, while macOS binary supports only CPU. This release is based on patched LLVM 12.0.0.
  • The language changes include the following:
  • The ability to directly call LLVM intrinsics from ISPC source. This should be handy for performance fine tuning and reaching the hardware instructions not yet covered by the standard library. Note that it is an experimental feature and is enabled only with --enable-llvm-intrinsics switch. Please refer to LLVM Intrinsic Functions section of the user manual for more details.
  • assume() optimization hint, which can be used for communicating assumptions to the optimizer. Unlike assert() calls, it does not lead to a runtime check. This is intended for optimizations like removing null pointer checks, removing loop remainders, communicating alignment information to the optimizer, etc. Please refer to the Compiler Optimization Hints section of the user manual for more details.
  • Support for stack memory allocations through alloca() calls.
  • trunc() standard library functions.
  • Changes for CPU targets:
  • CPU definitions for AlderLake and SapphireRapids were added: alderlake and sapphirerapids respectively.
  • CPU definitions for Apple ARM chips were added: apple-a7, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14.
  • Support for macOS ARM targets was added.
  • Using GPU-enabled binaries you can build ISPC programs and run them on Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake) and Gen12 graphics (TigerLake mobile CPU) using --target options (genx-x8 and genx-x16) and --cpu option for specifying particular platform (e.g. --cpu=TGLLP).
  • The main GPU feature of the current release is Windows support. There are also a bunch of stability and performance improvements. Here are some of them:
  • ISPC Runtime got support for unified shared memory and multi-GPU. Also, there is a new TaskQueue::submit() method, which starts execution without waiting for completion.
  • Thread private memory was mapped to SVM in VC backend. It greatly improves stability of the current release. It may affect performance on Gen9 graphics but we do not expect any significant changes on Gen12.
  • L0 binary generation was reworked through libocloc. Supported on Linux only.

New in ispc 1.15.0 (Mar 10, 2021)

  • An ISPC release with several improvements for CPU and Beta support of Intel graphics hardware architectures. The binaries in this release include CPU versions for Windows, Linux, and macOS, and a GPU-enabled Linux binary, which supports both CPU and GPU. CPU binaries are based on patched LLVM 11.0.0, GPU binary is based on patched LLVM 10.0.1.
  • CPU changes include:
  • New loop unroll pragmas: the #pragma unroll and #pragma nounroll directives provide loop unrolling optimization hints to the compiler. A pragma may be placed immediately before a loop statement. Currently, this functionality is limited to uniform for and do-while loops.
  • A more efficient implementation of the packed_[load|store]_active() stdlib functions (up to 2.5x faster), which now supports 64-bit types.
  • New CPUs: icelake-server, tigerlake, alderlake, sapphirerapids.
  • Several stability fixes related to SOA types, bool varying type initialization, broken alignment information, and type scoping.
  • Compile time improvements.
  • ISPC support was added to CMake 3.19 so now you can use the standard CMake approach to find ISPC on the system and use it in your build.
  • https://cmake.org/cmake/help/latest/release/3.19.html#languages
  • Using GPU-enabled Linux binary you can build ISPC programs and run them on Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake) and Gen12 graphics (TigerLake mobile CPU) using --target options (genx-x8 and genx-x16) and --cpu option for specifying particular platform (e.g. --cpu=TGLLP).
  • Stability and performance were significantly improved in this release. Here is the list of new features:
  • Initial support for ahead-of-time compilation to the oneAPI Level Zero binary format using the --emit-zebin switch. You can use this binary from ISPC Runtime by setting the ISPCRT_USE_ZEBIN environment variable to 1. Please note that the SPIR-V format is still the recommended and default way.
  • Initial function pointers implementation.
  • Global atomics support.
  • Double math functions support.
  • Memory functions support.
  • Reworked masking approach. We disabled the genx hardware mask by default and use a software mask instead.
  • Improved address spaces differentiation.
  • Initial debug support.
  • TGLLP (TigerLake mobile CPU) support (--cpu=TGLLP).

New in ispc 1.3.0 (Oct 15, 2012)

  • New targets:
  • This release provides "beta" support for compiling to Intel Xeon Phi (the "Many Integrated Core" architecture). See http://ispc.github.com/ispc.html#compiling-for-the-intel-xeon-phi-architecture for more details on this support.
  • This release also has an "avx1.1" target, which provides support for the new instructions in the Intel Ivy Bridge microarchitecture.
  • New language features:
  • The foreach_active statement allows iteration over the active program instances in a gang. (See http://ispc.github.com/ispc.html#iteration-over-active-program-instances-foreach-active)
  • foreach_unique allows iterating over subsets of program instances in a gang that share the same value of a variable. (See http://ispc.github.com/ispc.html#iteration-over-unique-elements-foreach-unique)
  • An "unmasked" function qualifier and statement in the language allow re-activating execution of all program instances in a gang. (See http://ispc.github.com/ispc.html#re-establishing-the-execution-mask)
  • Standard library updates:
  • The seed_rng() function has been modified to take a "varying" seed value when a varying RNGState is being initialized.
  • An isnan() function has been added, to check for floating-point "not a number" values.
  • The float_to_srgb8() routine does high performance conversion of floating-point color values to SRGB8 format.
  • Other changes:
  • A number of bugfixes have been made for compiler crashes with malformed programs.
  • Floating-point comparisons are now "unordered", so that any comparison where one of the operands is a "not a number" value returns false. (This matches standard IEEE floating-point behavior.)
  • The code generated for 'break' statements in "varying" loops has been improved for some common cases.
  • Compile time and compiler memory use have both been improved, particularly for large input programs.
  • A number of bugs have been fixed in the debugging information generated by the compiler when the "-g" command-line flag is used.
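
The floating-point comparison rule listed under "Other changes" above can be illustrated with a minimal sketch in plain C (not ISPC), whose relational and equality operators follow the same IEEE behavior for "not a number" operands. The helper names are hypothetical and exist only for this illustration.

```c
#include <math.h>    /* NAN macro, for callers that need a NaN value */
#include <stdbool.h>

/* Relational comparison: false whenever either operand is NaN. */
bool less_than(float a, float b) {
    return a < b;
}

/* Equality comparison: false whenever either operand is NaN,
 * even when both operands are the same NaN. */
bool equal_to(float a, float b) {
    return a == b;
}

/* A value is NaN exactly when it fails to compare equal to itself. */
bool is_nan_value(float x) {
    return !(x == x);
}
```

A practical consequence is that a test such as `x < limit` silently rejects NaN inputs, so code that must handle them separately should check for NaN explicitly, for instance with the isnan() function added in this release.
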

New in ispc 1.2.2 (Oct 15, 2012)

  • This release includes a number of small additions to functionality and a number of bugfixes. New functionality includes:
  • It's now possible to forward declare structures as in C/C++: "struct Foo;". After such a declaration, structs with pointers to "Foo" and functions that take pointers or references to Foo structs can be declared without the entire definition of Foo being available.
  • New built-in types size_t, ptrdiff_t, and [u]intptr_t are now available, corresponding to the equivalent types in C.
  • The standard library now provides atomic_swap*() and atomic_compare_exchange*() functions for void * types.
  • The C++ backend has seen a number of improvements to the quality and readability of generated code.
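
The forward-declaration item at the top of this list mirrors C, so a minimal sketch in plain C (not ISPC) shows the pattern. The `Node` struct and the `foo_value` function are hypothetical names used only for illustration; only `struct Foo` comes from the entry above.

```c
#include <stddef.h>

struct Foo; /* forward declaration: the definition is not visible yet */

/* A struct holding a pointer to Foo and a function taking a pointer to
 * Foo can both be declared while Foo is still an incomplete type. */
struct Node {
    struct Foo *payload;
    struct Node *next;
};

int foo_value(const struct Foo *foo);

/* The full definition can appear later, for example in another header. */
struct Foo {
    int value;
};

int foo_value(const struct Foo *foo) {
    return foo->value;
}
```
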
  • A number of bugs have been fixed in this release as well. The most significant are:
  • Fixed a bug where nested loops could cause a compiler crash in some circumstances (issues #240 and #229)
  • Gathers could access invalid memory (and cause the program to crash) in some circumstances (#235)
  • References to temporary values are now handled properly when passed to a function that takes a reference typed parameter.
  • A case where incorrect code could be generated for compile-time-constant initializers has been fixed (#234).

New in ispc 1.2.1 (Oct 15, 2012)

  • This release contains only minor new functionality and is mostly for many small bugfixes and improvements to error handling and error reporting.
  • The new functionality that is present is:
  • Significantly more efficient versions of the float / half conversion routines are now available in the standard library, thanks to Fabian Giesen.
  • The last member of a struct can now be a zero-length array; this allows the trick of dynamically allocating enough storage for the struct and some number of array elements at the end of it.
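
The trailing-array trick mentioned in the last item can be sketched in plain C (not ISPC). This is an illustration under assumptions: standard C99 spells the idiom as a flexible array member (`values[]`) rather than a zero-length array, and the `FloatBuffer` type and `float_buffer_create` function are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header followed by a variable number of floats in one allocation. */
typedef struct {
    size_t count;
    float values[]; /* C99 flexible array member; contributes no size */
} FloatBuffer;

/* Allocates the struct plus storage for `count` trailing elements.
 * Returns NULL on allocation failure; the caller releases it with free(). */
FloatBuffer *float_buffer_create(size_t count) {
    FloatBuffer *buf = malloc(sizeof(FloatBuffer) + count * sizeof(float));
    if (buf == NULL) {
        return NULL;
    }
    buf->count = count;
    for (size_t i = 0; i < count; ++i) {
        buf->values[i] = 0.0f;
    }
    return buf;
}
```

Keeping the header and the elements in a single allocation avoids a second pointer indirection and a second allocation per object.
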
  • Significant bugs fixed include:
  • When a target ISA isn't specified, use the host system's capabilities to choose a target for which it will be able to run the generated code.
  • Don't allocate storage for global variables that are declared "extern".
  • Allow NULL as a default argument value in a function declaration.
  • Fix bugs where taking the address of a function wouldn't work as expected.
  • When there are overloaded variants of a function that take both reference and const reference parameters, give the non-const reference preference when matching values of that underlying type.
  • An error is issued when a varying lvalue is assigned to a reference type (rather than crashing).
  • Permit conversions from array types to void *, not just the pointer type of the underlying array element.
  • Still evaluate expressions that are cast to (void).
  • The documentation has also been improved, with FAQs added to clarify some aspects of the ispc pointer model.

New in ispc 1.2.0 (Oct 15, 2012)

  • This is a major new release of ispc, with a number of significant improvements to functionality, performance, and compiler robustness. It does, however, include three small changes to language syntax and semantics that may require changes to existing programs:
  • Syntax for the "launch" keyword has been cleaned up; it's now no longer necessary to bracket the launched function call with angle brackets. (In other words, now use "launch foo();", rather than "launch < foo() >;".)
  • When using pointers, the pointed-to data type is now "uniform" by default. Use the varying keyword to specify varying pointed-to types when needed. (i.e. "float *ptr" is a varying pointer to uniform float data, whereas previously it was a varying pointer to varying float values.)
  • Use "varying float *" to specify a varying pointer to varying float data, and so forth.
  • The details of "uniform" and "varying" and how they interact with struct types have been cleaned up. Now, when a struct type is declared, if the struct elements don't have explicit "uniform" or "varying" qualifiers, they are said to have "unbound" variability. When a struct type is instantiated, any unbound variability elements inherit the variability of the parent struct type. See http://ispc.github.com/ispc.html#struct-types for more details.
  • ispc has a new language feature that makes it much easier to use the efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout of data. A new "soa" qualifier can be applied to structure types to specify an n-wide SoA version of the corresponding type. Array indexing and pointer operations with arrays of SoA types automatically handle the two-stage indexing calculation to access the data. See http://ispc.github.com/ispc.html#structure-of-array-types for more details.
  • For more efficient access of data that is still in "array of structures" (AoS) format, ispc has a new "memory coalescing" optimization that automatically detects series of strided loads and/or gathers that can be transformed into a more efficient set of vector loads and shuffles. A diagnostic is emitted when this optimization is successfully applied.
  • Smaller changes in this release:
  • The standard library now provides memcpy(), memmove() and memset() functions, as well as single-precision asin() and acos() functions.
  • -I can now be specified on the command line to give a search path for include files.
  • A number of improvements have been made to error reporting from the parser, and a number of cases where malformed programs could cause the compiler to crash have been fixed.
  • A number of small improvements to the quality and performance of generated code have been made, including finding more cases where 32-bit addressing calculations can be safely done on 64-bit systems and generating better code for initializer expressions.
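
The "two-stage indexing calculation" mentioned above for SoA types can be pictured with a small hand-written sketch in plain C. This is only an analogy for what the compiler does automatically, not ispc code; the names PointSoA4, SOA_WIDTH and point_x are hypothetical, and a width of 4 is assumed purely for illustration.

```c
#include <stddef.h>

/* Hypothetical 4-wide SoA block: four points stored with all x values
   together, then all y values, then all z values. */
#define SOA_WIDTH 4

typedef struct {
    float x[SOA_WIDTH];
    float y[SOA_WIDTH];
    float z[SOA_WIDTH];
} PointSoA4;

/* Two-stage lookup for logical element i: first select the block,
   then select the slot inside that block. */
float *point_x(PointSoA4 *blocks, size_t i)
{
    size_t block = i / SOA_WIDTH;  /* stage 1: which SoA block */
    size_t lane  = i % SOA_WIDTH;  /* stage 2: which slot within the block */
    return &blocks[block].x[lane];
}
```

With this layout, logical element 5 lives in block 1, slot 1. In ispc the same arithmetic is generated by the compiler when an array of an SoA type is indexed, so user code keeps ordinary array syntax.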

New in ispc 1.1.4 (Oct 15, 2012)

  • There are two major bugfixes for Windows in this release. First, a number of failures in AVX code generation on Windows have been fixed; AVX on Windows now has no known issues. Second, a longstanding bug in parsing 64-bit integer constants on Windows has been fixed.
  • This release features a new experimental scalar target, contributed by Gabe Weisz. This target ("--target=generic-1") compiles gangs of single program instances (i.e. programCount == 1); it can be useful for debugging ispc programs.
  • The compiler now supports dynamic memory allocation in ispc programs (with "new" and "delete" operators based on C++). See http://ispc.github.com/ispc.html#dynamic-memory-allocation in the documentation for more information.
  • ispc now performs "short circuit" evaluation of the || and && logical operators and the ? : selection operator. (This represents the correction of a major incompatibility with C.) Code like "(index < arraySize && array[index] == 1)" thus now executes as in C, where "array[index]" won't be evaluated unless "index" is less than "arraySize".
  • The standard library now provides "local" atomic operations, which are atomic across the gang of program instances (but not across other gangs or other hardware threads). See the updated documentation on atomics for more information: http://ispc.github.com/ispc.html#atomic-operations-and-memory-fences.
  • The standard library now offers a clock() function, which returns a uniform int64 value that counts processor cycles; it can be used for fine-resolution timing measurements.
  • Finally (of limited interest now): ispc now supports the forthcoming AVX2 instruction set, due with Haswell-generation CPUs. All tests and examples compile and execute correctly with AVX2. (Thanks specifically to Craig Topper and Nadav Rotem for work on AVX2 support in LLVM, which made this possible.)
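
Since the short-circuit change above is described as making ispc match C, the C behavior itself can serve as a reference point. The sketch below is plain C, not ispc; the function name element_is_one is hypothetical, while the condition is the one quoted in the release note.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when index is in range and the element equals 1.
   Because && short-circuits, array[index] is only read after the
   bounds check has passed, so an out-of-range index is never
   dereferenced. */
bool element_is_one(const int *array, size_t arraySize, size_t index)
{
    return index < arraySize && array[index] == 1;
}
```

Calling this with an index past the end of the array simply returns false. Without short-circuit evaluation both operands would be evaluated, and the same call would read outside the array.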

New in ispc 1.1.3 (Oct 15, 2012)

  • With this release, the language now supports "switch" statements, with the same semantics and syntax as in C.
  • This release includes fixes for two important performance related issues: the quality of code generated for "foreach" statements has been substantially improved (https://github.com/ispc/ispc/issues/151), and a performance regression with code for "gathers" that was introduced in v1.1.2 has been fixed in this release.
  • A number of other small bugs were fixed in this release as well, including one where invalid memory would sometimes be incorrectly accessed (https://github.com/ispc/ispc/issues/160).
  • Thanks to Jean-Luc Duprat for a number of patches that improve support for building on various platforms, and to Pierre-Antoine Lacaze for patches so that ispc builds under MinGW.

New in ispc 1.1.2 (Oct 15, 2012)

  • The major new feature in this release is support for "generic" C++ vectorized output; in other words, ispc can emit C++ code that corresponds to the vectorized computation that the ispc program represents. See the examples/intrinsics directory in the ispc distribution for two example implementations of the set of functions that must be provided to map the vector calls generated by ispc to target specific functions.
  • ispc now has partial support for 'goto' statements; specifically, goto is allowed if any enclosing control flow statements (if/for/while/do) have 'uniform' test expressions, but not if they have 'varying' tests.
  • A number of improvements have been made to the code generated for gathers and scatters--one of them (better matching x86's "free" scale by 2/4/8 for addressing calculations) improved the performance of the noise example by 14%.
  • Many small bugs have been fixed in this release as well, including issue numbers 138, 129, 135, 127, 149, and 142.

New in ispc 1.1.1 (Oct 15, 2012)

  • This release doesn't include any significant new functionality, but does include some small improvements in generated code and a number of bug fixes.
  • The one user-visible language change is that integer constants may be specified with 'u' and 'l' suffixes, like in C. For example, "1024llu" defines the constant with unsigned 64-bit type.
  • More informative and useful error messages are printed when function overload resolution fails.
  • Masking is avoided in additional cases when the mask can be statically determined to be all on.
  • A number of small bugs have been fixed:
  • Under some circumstances, incorrect masks were used when assigning a value to a reference and when doing gathers/scatters.
  • Incorrect code could be generated in some cases when some instances returned part way through a function but others continued executing.
  • Type checking wasn't being performed for calls through function pointers; now an error is issued if the arguments don't match up, etc.
  • Incorrect code was being generated for gather/scatter to structs that had elements with varying short-vector types.
  • Type checking wasn't being performed for "foreach" statements; this led to problems like function overload resolution not being performed if an overloaded function call was used to determine the iteration range.
  • A number of symbols would be multiply-defined when compiling to multiple targets and using the sse2-x2 target as one of them (issue #131).
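
The 'u' and 'l' integer suffixes added in 1.1.1 are described above as working like their C counterparts, so a plain C sketch may help show why they matter. This is C rather than ispc, and the function name bit40 is hypothetical.

```c
#include <stdint.h>

/* An unsuffixed 1 has type int in C, and shifting it left by 40 bits
   would overflow. The llu suffix makes the constant unsigned long long
   (at least 64 bits wide), so the shift is well defined. */
uint64_t bit40(void)
{
    return 1llu << 40;
}
```

The same reasoning applies to the "1024llu" constant from the release note: the suffix fixes the type of the literal itself, rather than relying on a later conversion.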