What's new in Intel Math Kernel Library 11.2

Aug 29, 2014
  • Intel MKL now provides optimizations for all Intel® Atom™ processors that support Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) and Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) instruction sets
  • Introduced support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set with limited optimizations in BLAS, DFT and VML
  • Introduced Verbose support for BLAS and LAPACK domains, which enables users to capture the input parameters to Intel MKL function calls
  • Introduced support for Intel® MPI Library 5.0
  • Introduced the Intel Math Kernel Library Cookbook (http://software.intel.com/en-us/mkl_cookbook), a new document that describes how to use Intel MKL routines to solve certain complex problems
  • Introduced a compilation feature, enabled by defining MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ, that improves ?GEMM performance on small matrices for all processors (see the Intel® Math Kernel Library User's Guide for more details)
  • Added the ability to link a Single Dynamic Library (mkl_rt) on Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
  • Added a customizable error handler. See the description of mkl_set_exit_handler() in the Intel Math Kernel Library Reference Manual for further details
  • Extended the Intel® Xeon Phi™ coprocessor Automatic Offload feature with a resource sharing mechanism. See the descriptions of the mkl_mic_set_resource_limit() function and the MKL_MIC_RESOURCE_LIMIT environment variable in the Intel Math Kernel Library Reference Manual for further details
  • Parallel Direct Sparse Solver for Clusters:
  • Introduced Parallel Direct Sparse Solver for Clusters, a distributed memory version of Intel MKL PARDISO direct sparse solver
  • Improved performance of the matrix gather step for distributed matrices
  • Enabled reuse of reordering information on multiple factorization steps
  • Added the distributed CSR format and support for distributed matrices, right-hand sides (RHS), and solutions
  • Added support for solving systems with multiple right-hand sides
  • Added cluster support of factorization and solving steps
  • Added support for pure MPI mode and support for single OpenMP thread in hybrid configurations
  • BLAS:
  • Improved threaded performance of ?GEMM for all 64-bit architectures supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2)
  • Optimized ?GEMM, ?TRSM, DTRMM for the Intel AVX-512 instruction set
  • Improved performance of ?GEMM for outer product [large m, large n, small k] and tall skinny matrices [large m, medium n, small k] on Intel MIC Architecture
  • Improved performance of ?TRSM and ?SYMM in Automatic Offload mode on Intel MIC Architecture
  • Improved performance of Level 3 BLAS functions for 64-bit processors supporting Intel AVX2
  • Improved ?GEMM performance on small matrices for all processors when MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ is defined during compilation (see the Intel® Math Kernel Library User’s Guide for more details)
  • Improved performance of DGER and DGEMM for the beta=1, k=1 case for 64-bit processors supporting Intel SSE4.2, Intel® Advanced Vector Extensions (Intel® AVX), and Intel AVX2 instruction sets
  • Optimized (D/Z)AXPY for the Intel AVX-512 instruction set
  • Optimized ?COPY for the Intel AVX2 and Intel AVX-512 instruction sets
  • Optimized DGEMV for Intel AVX-512 instruction set
  • Improved performance of SSYR2K for 64-bit processors supporting Intel AVX and Intel AVX2
  • Improved threaded performance of ?AXPBY for all Intel processors
  • Improved DTRMM performance for the side=R, uplo={U,L}, transa=N, diag={N,U} cases for Intel AVX-512
  • LINPACK:
  • Improved performance of matrix generation in the heterogeneous Intel® Optimized MP LINPACK Benchmark for Clusters
  • Intel MIC Architecture offload option of the Intel Optimized MP LINPACK Benchmark for Clusters package now supports Intel AVX2 hosts
  • Improved performance of the Intel Optimized MP LINPACK for Clusters package for 64-bit processors supporting Intel AVX2
  • LAPACK:
  • Improved performance of ?(SY/HE)RDB
  • Improved performance of ?(SY/HE)(EV/EVD) when eigenvectors are needed
  • Improved performance of ?(SY/HE)(EV/EVR/EVD) when eigenvectors are not needed
  • Improved performance of ?GELQF, ?GELS and ?GELSS for the underdetermined case (M less than N)
  • Improved performance of ?GEHRD, ?GEEV and ?GEES
  • Improved performance of NaN checkers in LAPACKE interfaces
  • Improved performance of ?GELSX, ?GGSVP
  • Improved performance of ?GETRF
  • Improved performance of (S/D)GE(SVD/SDD) when M>=N and singular vectors are not needed
  • Improved performance of ?POTRF UPLO=U in Automatic Offload mode on Intel MIC Architecture
  • Added Automatic Offload for ?SYRDB on Intel MIC Architecture, which speeds up ?SY(EV/EVD/EVR) when eigenvectors are not needed
  • PBLAS and ScaLAPACK:
  • Enabled Automatic Offload in P?GEMM routines for large distribution blocking factors
  • Sparse BLAS:
  • Optimized SpMV kernels for Intel AVX-512 instruction set
  • Added release example for diagonal format use in Sparse BLAS
  • Improved Sparse BLAS level 2 and 3 performance for systems supporting Intel SSE4.2, Intel AVX and Intel AVX2 instruction sets
  • Intel MKL PARDISO:
  • Added the ability to store the Intel MKL PARDISO handle to disk for future use at any solver stage
  • Added pivot control support for unsymmetric matrices and out-of-core mode
  • Added diagonal extraction support for unsymmetric matrices and out-of-core mode
  • Added an example demonstrating the use of Intel MKL PARDISO as an iterative solver for non-linear systems
  • Added the capability to free the memory taken by the original matrix after the factorization stage if iterative refinement is disabled
  • Improved the memory estimate of the out-of-core (OOC) portion size for the reordering algorithm, which improves factorization and solve performance in OOC mode
  • Improved message output from Intel MKL PARDISO
  • Added support for zero pivots during factorization for structurally symmetric cases
  • Poisson library:
  • Added an example demonstrating the use of the Intel MKL Poisson library as a preconditioner for linear system solves
  • Extended Eigensolver:
  • Improved message output
  • Improved examples
  • Added input and output iparm parameters in predefined interfaces for solving sparse problems
  • FFT:
  • Optimized FFTs for the Intel AVX-512 instruction set
  • Improved performance for non-power-of-2 length on Intel® MIC Architecture
  • VML: Added the v[d|s]Frac function, which computes the fractional part of each vector element
  • VSL RNG:
  • Added support for ntrial=0 in the Binomial Random Number Generator
  • Improved performance of MRG32K3A and MT2203 BRNGs on Intel MIC Architecture
  • Improved performance of MT2203 BRNG on CPUs supporting Intel AVX and Intel AVX2 instruction sets
  • VSL Summary Statistics:
  • Added support for group/pooled mean estimates (VSL_SS_GROUP_MEAN/VSL_SS_POOLED_MEAN)
  • Data Fitting: Fixed incorrect behavior of the natural cubic spline construction function when the number of breakpoints is 2 or 3
  • Introduced an Intel MKL mode that ignores all settings specified by Intel MKL environment variables
  • To enable this mode, call the mkl_set_env_mode() routine, which directs Intel MKL to ignore all of its environment variables, such as MKL_NUM_THREADS, MKL_DYNAMIC, and MKL_MIC_ENABLE; set the needed parameters instead through Intel MKL service routines such as mkl_set_num_threads() and mkl_mic_enable()

New in Intel Math Kernel Library 9.1.022 (Jun 25, 2007)

  • Optimizations for the new Quad-Core Intel Xeon processor 5300 series
  • Improvements in Version 9.1 are listed below
  • 64-Bit for Mac OS: 32- and 64-bit binaries are now available for Mac OS*
  • Universal binaries are also available
  • 64-Bit Integer (ILP64): A 64-bit integer (ILP64) interface for the library is now provided through addition of new library files in the main product package
  • An ILP64 version of the PARDISO direct sparse solver is also now available
  • LAPACK 3.1 Support: Intel MKL is compliant with the new LAPACK 3.1 specification
  • Sparse BLAS Threading Support: The following sparse BLAS triangular solvers were threaded with OpenMP in the 9.0 release:
  • mkl_dcsrmm - Level 3 triangular solver for the compressed sparse row format
  • mkl_dcscmm - Level 3 triangular solver for the compressed sparse column format
  • mkl_dcoomm - Level 3 triangular solver for the coordinate format
  • New Iterative Solver: New Conjugate Gradient solver with Multiple Right-Hand Sides (MRHS)
  • New ILU(0) accelerator/preconditioner for the RCI FGMRES iterative solver
  • The FGMRES iterative solver was added in the Intel MKL 9.0 release
  • New Optimization Solvers: New solvers for nonlinear least square problems with and without boundary constraints
  • VML Functions and Threading Support: New nearest integer functions: Trunc, Ceil, Floor, Round, NearbyInt, Rint
  • All VML functions are now threaded (with OpenMP)
  • Partial Differential Equations: Added new fast Helmholtz and Poisson solvers for spherical coordinates to our existing solvers for Cartesian coordinates