Intel Math Kernel Library Changelog
What's new in Intel Math Kernel Library 11.2
Aug 29, 2014
- Intel MKL now provides optimizations for all Intel® Atom™ processors that support Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) and Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) instruction sets
- Introduced support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set with limited optimizations in BLAS, DFT and VML
- Introduced Verbose support for BLAS and LAPACK domains, which enables users to capture the input parameters to Intel MKL function calls
- Introduced support for Intel® MPI Library 5.0
- Introduced the Intel Math Kernel Library Cookbook (http://software.intel.com/en-us/mkl_cookbook), a new document that describes how to use Intel MKL routines to solve certain complex problems
- Introduced the MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ compilation feature that provides ?GEMM small matrix performance improvements for all processors (see the Intel® Math Kernel Library User's Guide for more details)
- Added the ability to link a Single Dynamic Library (mkl_rt) on Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
- Added a customizable error handler. See the Intel Math Kernel Library Reference Manual description of mkl_set_exit_handler() for further details
- Extended the Intel® Xeon Phi™ coprocessor Automatic Offload feature with a resource sharing mechanism. See the Intel Math Kernel Library Reference Manual for the description of the mkl_mic_set_resource_limit() function and the MKL_MIC_RESOURCE_LIMIT environment variable for further details
- Parallel Direct Sparse Solver for Clusters:
  - Introduced Parallel Direct Sparse Solver for Clusters, a distributed memory version of the Intel MKL PARDISO direct sparse solver
  - Improved performance of the matrix gather step for distributed matrices
  - Enabled reuse of reordering information across multiple factorization steps
  - Added the distributed CSR format and support for distributed matrices, RHS, and distributed solutions
  - Added support for solving systems with multiple right-hand sides
  - Added cluster support for the factorization and solving steps
  - Added support for pure MPI mode and for a single OpenMP thread in hybrid configurations
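The distributed CSR input mentioned above can be pictured as each process owning a contiguous block of rows. The Python sketch below is illustrative only: the helper names (dense_to_csr, split_rows, local_spmv), zero-based indexing, and the even row split are assumptions, not the Parallel Direct Sparse Solver for Clusters API. It keeps global column indices and rebases each block's row pointer to start at 0.

```python
def dense_to_csr(a):
    """Convert a dense row-major matrix (list of lists) to zero-based CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def split_rows(csr, n_parts):
    """Give each (hypothetical) process a contiguous block of rows.

    Column indices stay global; each block's row pointer is rebased to start at 0.
    """
    values, col_idx, row_ptr = csr
    n_rows = len(row_ptr) - 1
    parts = []
    for p in range(n_parts):
        first = p * n_rows // n_parts
        stop = (p + 1) * n_rows // n_parts          # exclusive end row
        lo, hi = row_ptr[first], row_ptr[stop]
        parts.append({
            "first_row": first,
            "values": values[lo:hi],
            "col_idx": col_idx[lo:hi],
            "row_ptr": [r - lo for r in row_ptr[first:stop + 1]],
        })
    return parts


def local_spmv(part, x):
    """Multiply one process's row block by the full vector x (returns its slice of y)."""
    vals, cols, ptr = part["values"], part["col_idx"], part["row_ptr"]
    return [sum(vals[k] * x[cols[k]] for k in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]
```

Concatenating the local_spmv results of all blocks in row order reproduces the full product A*x.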
- BLAS:
  - Improved threaded performance of ?GEMM for all 64-bit architectures supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2)
  - Optimized ?GEMM, ?TRSM, and DTRMM for the Intel AVX-512 instruction set
  - Improved performance of ?GEMM for outer product [large m, large n, small k] and tall skinny matrices [large m, medium n, small k] on Intel MIC Architecture
  - Improved performance of ?TRSM and ?SYMM in Automatic Offload mode on Intel MIC Architecture
  - Improved performance of Level 3 BLAS functions for 64-bit processors supporting Intel AVX2
  - Improved ?GEMM performance on small matrices for all processors when MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ is defined during compilation (see the Intel® Math Kernel Library User's Guide for more details)
  - Improved performance of DGER and DGEMM for the beta=1, k=1 case for 64-bit processors supporting the Intel SSE4.2, Intel® Advanced Vector Extensions (Intel® AVX), and Intel AVX2 instruction sets
  - Optimized (D/Z)AXPY for the Intel AVX-512 instruction set
  - Optimized ?COPY for the Intel AVX2 and Intel AVX-512 instruction sets
  - Optimized DGEMV for the Intel AVX-512 instruction set
  - Improved performance of SSYR2K for 64-bit processors supporting Intel AVX and Intel AVX2
  - Improved threaded performance of ?AXPBY for all Intel processors
  - Improved DTRMM performance for the side=R, uplo={U,L}, transa=N, diag={N,U} cases for Intel AVX-512
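Several BLAS items above refer to the standard operations ?GEMM (C := alpha*A*B + beta*C, shown here without the transpose options) and ?GER (A := alpha*x*y' + A). The plain-Python reference sketch below, with hypothetical function names, shows why the beta=1, k=1 case of DGEMM is the same rank-1 update that DGER performs; it illustrates the math only, not how Intel MKL implements either routine.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C for row-major lists (A: m x k, B: k x n, C: m x n)."""
    m, k, n = len(a), len(b), len(c[0])
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]


def ger(alpha, x, y, a):
    """Return A + alpha * x * y^T, the rank-1 update computed by ?GER."""
    return [[a[i][j] + alpha * x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]


# With k = 1, A is a single column x and B a single row y^T, so with beta = 1
# the GEMM update alpha*x*y^T + C is exactly the GER update of C.
x, y = [1.0, 2.0], [3.0, 4.0, 5.0]
c = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(gemm(2.0, [[xi] for xi in x], [y], 1.0, c) == ger(2.0, x, y, c))  # True
```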
- LINPACK:
  - Improved performance of matrix generation in the heterogeneous Intel® Optimized MP LINPACK Benchmark for Clusters
  - The Intel MIC Architecture offload option of the Intel Optimized MP LINPACK Benchmark for Clusters package now supports Intel AVX2 hosts
  - Improved performance of the Intel Optimized MP LINPACK for Clusters package for 64-bit processors supporting Intel AVX2
- LAPACK:
  - Improved performance of ?(SY/HE)RDB
  - Improved performance of ?(SY/HE)(EV/EVD) when eigenvectors are needed
  - Improved performance of ?(SY/HE)(EV/EVR/EVD) when eigenvectors are not needed
  - Improved performance of ?GELQF, ?GELS, and ?GELSS for the underdetermined case (M less than N)
  - Improved performance of ?GEHRD, ?GEEV, and ?GEES
  - Improved performance of NaN checkers in LAPACKE interfaces
  - Improved performance of ?GELSX and ?GGSVP
  - Improved performance of ?GETRF
  - Improved performance of (S/D)GE(SVD/SDD) when M>=N and singular vectors are not needed
  - Improved performance of ?POTRF with UPLO=U in Automatic Offload mode on Intel MIC Architecture
  - Added Automatic Offload for ?SYRDB on Intel MIC Architecture, which speeds up ?SY(EV/EVD/EVR) when eigenvectors are not needed
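For the underdetermined case (M less than N) and A of full row rank, ?GELS returns the minimum-norm solution of A*x = b using an LQ factorization computed by ?GELQF, as we read the LAPACK documentation. A short derivation for real A (use conjugate transposes for complex data):

```latex
\begin{aligned}
A &= L\,Q, && L \in \mathbb{R}^{M \times M}\ \text{lower triangular},\quad Q \in \mathbb{R}^{M \times N},\quad Q\,Q^{T} = I_M,\\
A\,A^{T} &= L\,Q\,Q^{T} L^{T} = L\,L^{T},\\
x_{\min} &= A^{T}\left(A\,A^{T}\right)^{-1} b = Q^{T} L^{T}\left(L\,L^{T}\right)^{-1} b = Q^{T} L^{-1} b .
\end{aligned}
```

Only a triangular solve with L and a multiplication by Q^T are needed once the factorization is available.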
- PBLAS and ScaLAPACK:
  - Enabled Automatic Offload in P?GEMM routines for large distribution blocking factors
- Sparse BLAS:
  - Optimized SpMV kernels for the Intel AVX-512 instruction set
  - Added a release example for diagonal format use in Sparse BLAS
  - Improved Sparse BLAS Level 2 and 3 performance for systems supporting the Intel SSE4.2, Intel AVX, and Intel AVX2 instruction sets
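The diagonal storage format used by the new Sparse BLAS example keeps each stored diagonal of A in one column of a dense array, together with its distance from the main diagonal. The sketch below assumes a square matrix, zero-based indexing, and the layout val[i][d] = A[i][i + idiag[d]], with out-of-range slots treated as padding; the function name is hypothetical, and the Reference Manual should be consulted for the exact conventions Intel MKL uses (for example, one-based indexing in the Fortran interface and the leading dimension of val).

```python
def dia_spmv(val, idiag, x):
    """Return y = A*x for a square A stored by diagonals.

    val[i][d] holds A[i][i + idiag[d]]; slots whose column falls outside
    the matrix are padding and are skipped.
    """
    n = len(x)
    y = [0.0] * n
    for d, dist in enumerate(idiag):
        for i in range(n):
            j = i + dist
            if 0 <= j < n:
                y[i] += val[i][d] * x[j]
    return y


# Tridiagonal example: diagonals at distances -1, 0, +1 (padding entries set to 0.0)
val = [[0.0, 2.0, -1.0],
       [-1.0, 2.0, -1.0],
       [-1.0, 2.0, 0.0]]
print(dia_spmv(val, [-1, 0, 1], [1.0, 2.0, 3.0]))  # [0.0, 0.0, 4.0]
```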
- Intel MKL PARDISO:
  - Added the ability to store the Intel MKL PARDISO handle on disk for future use at any solver stage
  - Added pivot control support for unsymmetric matrices and out-of-core mode
  - Added diagonal extraction support for unsymmetric matrices and out-of-core mode
  - Added an example demonstrating use of Intel MKL PARDISO as an iterative solver for non-linear systems
  - Added the capability to free memory taken by the original matrix after the factorization stage if iterative refinement is disabled
  - Improved memory estimation of the out-of-core (OOC) portion size for the reordering algorithm, leading to improved factorization and solve performance in OOC mode
  - Improved message output from Intel MKL PARDISO
  - Added support for zero pivots during factorization for structurally symmetric cases
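The option to free the original matrix only when iterative refinement is disabled follows from how refinement works: each refinement step computes the residual b - A*x, which needs A itself and not just its factors. The toy sketch below shows this with a dense LU factorization without pivoting standing in for PARDISO's sparse factorization; all function names are hypothetical and the code is not the PARDISO interface.

```python
def lu_factor(a):
    """Doolittle LU factorization without pivoting (toy: assumes nonzero pivots).

    Returns L (unit lower, stored below the diagonal) and U packed into one matrix.
    """
    n = len(a)
    lu = [list(row) for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, b):
    """Solve L*U*x = b by forward and back substitution using only the factors."""
    n = len(lu)
    y = list(b)
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def solve_with_refinement(a, b, steps=2):
    """Factor once, solve, then apply a few iterative refinement steps."""
    lu = lu_factor(a)
    x = lu_solve(lu, b)
    for _ in range(steps):
        # The residual r = b - A*x needs the ORIGINAL matrix a, not just its factors;
        # with steps=0, a is no longer needed once lu_factor() has returned.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        dx = lu_solve(lu, r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x
```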
- Poisson library:
  - Added an example demonstrating use of the Intel MKL Poisson library as a preconditioner for linear system solves
- Extended Eigensolver:
  - Improved message output
  - Improved examples
  - Added input and output iparm parameters in predefined interfaces for solving sparse problems
- FFT:
  - Optimized FFTs for the Intel AVX-512 instruction set
  - Improved performance for non-power-of-2 length on Intel® MIC Architecture
- VML: Added v[d|s]Frac function computing fractional part for each vector element
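As we understand the VML documentation, v?Frac returns the signed fractional part of each element, so the result keeps the sign of the input. A one-line Python reference for that assumed definition, x - trunc(x), applied element-wise (the function name is hypothetical, and special values such as infinities are not handled here):

```python
import math


def frac(values):
    """Signed fractional part of each element, assuming Frac(x) = x - trunc(x)."""
    return [x - math.trunc(x) for x in values]


print(frac([2.75, -2.75, 5.0]))  # [0.75, -0.75, 0.0]
```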
- VSL RNG:
  - Added support for ntrial=0 in the Binomial Random Number Generator
  - Improved performance of MRG32K3A and MT2203 BRNGs on Intel MIC Architecture
  - Improved performance of the MT2203 BRNG on CPUs supporting the Intel AVX and Intel AVX2 instruction sets
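MRG32K3A is L'Ecuyer's combined multiple recursive generator. The Python sketch below follows the published recurrence and constants and relies on Python's arbitrary-precision integers to avoid overflow; the seeding interface and the (0, 1) output scaling follow L'Ecuyer's reference formulation and may differ from how Intel MKL initializes and scales its VSL_BRNG_MRG32K3A streams. The function name is hypothetical.

```python
M1 = 4294967087          # 2**32 - 209
M2 = 4294944443          # 2**32 - 22853
NORM = 1.0 / (M1 + 1)


def mrg32k3a(seed1, seed2):
    """Yield uniform numbers in (0, 1) from the MRG32k3a recurrence.

    seed1 and seed2 are three-element states for the two component generators
    (entries below M1 and M2 respectively, neither state all zeros), oldest first.
    """
    s1, s2 = list(seed1), list(seed2)      # [x(n-3), x(n-2), x(n-1)]
    while True:
        # Component 1: x1(n) = (1403580*x1(n-2) - 810728*x1(n-3)) mod M1
        p1 = (1403580 * s1[1] - 810728 * s1[0]) % M1
        s1 = [s1[1], s1[2], p1]
        # Component 2: x2(n) = (527612*x2(n-1) - 1370589*x2(n-3)) mod M2
        p2 = (527612 * s2[2] - 1370589 * s2[0]) % M2
        s2 = [s2[1], s2[2], p2]
        # Combine the components; map z == 0 to M1 so the output never hits 0
        z = (p1 - p2) % M1
        yield (z if z > 0 else M1) * NORM


gen = mrg32k3a([12345] * 3, [12345] * 3)
print([next(gen) for _ in range(3)])
```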
- VSL Summary Statistics:
  - Added support for group/pooled mean estimates (VSL_SS_GROUP_MEAN/VSL_SS_POOLED_MEAN)
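A group mean is the average of the observations in one group; a pooled mean combines the groups by weighting each group mean by its number of observations, which equals the mean of all observations. The sketch below assumes that definition and a one-dimensional data set with integer group labels; it is not the VSL Summary Statistics task interface, and the function name is hypothetical.

```python
def group_and_pooled_means(observations, groups):
    """Return ({group: mean}, pooled mean) for 1-D observations labelled by group."""
    sums, counts = {}, {}
    for x, g in zip(observations, groups):
        sums[g] = sums.get(g, 0.0) + x
        counts[g] = counts.get(g, 0) + 1
    group_means = {g: sums[g] / counts[g] for g in sums}
    total = sum(counts.values())
    # Pooled mean: group means weighted by group sizes
    pooled = sum(counts[g] * group_means[g] for g in group_means) / total
    return group_means, pooled


means, pooled = group_and_pooled_means([1.0, 2.0, 3.0, 4.0, 10.0], [0, 0, 1, 1, 1])
print(means, pooled)
```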
- Data Fitting: Fixed incorrect behavior of the natural cubic spline construction function when the number of breakpoints is 2 or 3
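With 2 or 3 breakpoints the natural cubic spline is still well defined: with 2 there are no interior second derivatives to solve for and the spline is the straight line through the points, and with 3 there is exactly one. The Python sketch below (hypothetical names; the standard second-derivative formulation with zero end curvature, and the tridiagonal system solved by the Thomas algorithm) handles those edge cases explicitly. It is not the Intel MKL Data Fitting API.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through (xs[i], ys[i]).

    xs must be strictly increasing with at least 2 points. The end second
    derivatives are zero (natural spline); interior ones solve a tridiagonal system.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n                      # second derivatives at the breakpoints
    k = n - 2                          # interior unknowns: 0 when n == 2, 1 when n == 3
    if k > 0:
        diag = [2.0 * (h[i] + h[i + 1]) for i in range(k)]
        rhs = [6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
               for i in range(k)]
        # Thomas algorithm; row i couples m[i+1] to m[i] (coefficient h[i]) and m[i+2] (h[i+1])
        for i in range(1, k):
            w = h[i] / diag[i - 1]
            diag[i] -= w * h[i]
            rhs[i] -= w * rhs[i - 1]
        m[k] = rhs[k - 1] / diag[k - 1]
        for i in range(k - 2, -1, -1):
            m[i + 1] = (rhs[i] - h[i + 1] * m[i + 2]) / diag[i]

    def spline(x):
        # Interval containing x, clamped so points outside [xs[0], xs[-1]] extrapolate
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        a, b, hi = xs[i + 1] - x, x - xs[i], h[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * a
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * b)

    return spline
```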
- Introduced an Intel MKL mode that ignores all settings specified by Intel MKL environment variables
  - Users can enable this mode by calling the mkl_set_env_mode() routine, which directs Intel MKL to ignore all Intel MKL-specific environment variables, such as MKL_NUM_THREADS, MKL_DYNAMIC, and MKL_MIC_ENABLE; users can instead set the needed parameters through Intel MKL service routines such as mkl_set_num_threads() and mkl_mic_enable()
New in Intel Math Kernel Library 9.1.022 (Jun 25, 2007)
- Optimizations for the new Quad-Core Intel Xeon processor 5300 series
- Improvements in Version 9.1 are listed below.
- 64-Bit for Mac OS: 32- and 64-bit binaries are now available for Mac OS*
  - Universal binaries are also available
- 64-Bit Integer (ILP64): A 64-bit integer (ILP64) interface for the library is now provided through addition of new library files in the main product package
  - The ILP64 version of the PARDISO direct sparse solver is now also available
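The point of the ILP64 interface is that array sizes and indices can exceed the 32-bit integer range: under ILP64 the Intel MKL integer type (MKL_INT) is 64 bits wide, while the LP64 interface uses 32-bit integers. The Python snippet below uses ctypes only to imitate C integer widths and shows that the element count of a 50,000 x 50,000 matrix already overflows a 32-bit signed integer; the helper name is hypothetical.

```python
import ctypes


def as_int32(value):
    """Store value in a C 32-bit signed int and read it back (ctypes does no overflow checking)."""
    return ctypes.c_int32(value).value


n = 50_000
elements = n * n                          # number of entries in an n x n matrix
print(elements > 2**31 - 1)               # True: too large for a 32-bit (LP64) index
print(as_int32(elements))                 # -1794967296: the count wraps around
print(ctypes.c_int64(elements).value)     # 2500000000: fits in a 64-bit (ILP64) integer
```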
- LAPACK 3.1 Support: Intel MKL is compliant with the new LAPACK 3.1 specification.
- Sparse BLAS Threading Support: The following sparse BLAS triangular solvers were threaded with OpenMP in the 9.0 release:
  - mkl_dcsrmm - Level 3 triangular solver for the compressed sparse row format
  - mkl_dcscmm - Level 3 triangular solver for the compressed sparse column format
  - mkl_dcoomm - Level 3 triangular solver for the coordinate format
- New Iterative Solver: New Conjugate Gradient solver with Multiple Right-Hand Sides (MRHS)
  - New ILU(0) accelerator/preconditioner for the RCI FGMRES iterative solver
  - FGMRES iterative solver added in the Intel MKL 9.0 release
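Conjugate Gradient solves A*x = b for symmetric positive definite A. In the sketch below, multiple right-hand sides are handled by simply running one independent CG iteration per right-hand side; this is a simplification, since the Intel MKL MRHS solver is driven through the RCI (reverse communication) interface and, as we understand it, advances all right-hand sides together. Dense lists and all function names are assumptions made for brevity.

```python
def matvec(a, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a, b, tol=1e-12, max_iter=100):
    """Solve A*x = b for symmetric positive definite A with plain CG, starting at x = 0."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A*x for x = 0
    p = list(r)                 # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = matvec(a, p)
        alpha = rr / dot(p, ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]  # A-conjugate next direction
        rr = rr_new
    return x


def cg_multiple_rhs(a, rhs_list):
    """Naive MRHS: one independent CG solve per right-hand side."""
    return [conjugate_gradient(a, b) for b in rhs_list]
```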
- New Optimization Solvers: New solvers for nonlinear least squares problems with and without boundary constraints
- VML Functions and Threading Support: New nearest integer functions: Trunc, Ceil, Floor, Round, NearbyInt, Rint
  - All VML functions are now threaded (with OpenMP)
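The six new nearest-integer functions differ only in how they treat fractional parts and halfway cases. The scalar Python reference below assumes they follow the C99 math.h semantics of trunc, floor, ceil, round, rint, and nearbyint in the default round-to-nearest-even mode; VML applies the chosen operation element-wise to a vector. The function name is hypothetical, and signed zero (for example, trunc(-0.3) is -0.0 in C) is not preserved here.

```python
import math


def vml_style_rounding(x):
    """Reference values of the six nearest-integer operations for one element."""
    t = float(math.trunc(x))
    # C round(): halfway cases go away from zero (x - t is exact for doubles)
    round_half_away = t + math.copysign(1.0, x) if abs(x - t) >= 0.5 else t
    return {
        "Trunc": t,
        "Floor": float(math.floor(x)),
        "Ceil": float(math.ceil(x)),
        "Round": round_half_away,
        "Rint": float(round(x)),       # Python round(): halfway cases to even
        "NearbyInt": float(round(x)),  # same value as Rint; in C they differ only in the inexact flag
    }


for value in (2.5, -2.5, 3.5):
    print(value, vml_style_rounding(value))
```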
- Partial Differential Equations: Added new fast Helmholtz and Poisson solvers for spherical coordinates to our existing solvers for Cartesian coordinates.