ViennaCL Changelog

New in ViennaCL 1.6.2 Beta (Dec 12, 2014)

  • The focus of this bugfix release is to further improve the portability of the library and to deliver some performance improvements:
  • sliced_ell_matrix: Better default values for NVIDIA GPUs, resulting in higher performance.
  • pipelined CG/BiCGStab/GMRES: Improved parameters and one kernel for NVIDIA GPUs.
  • pipelined CG/BiCGStab/GMRES: Improved function overloads so that a user call does not accidentally use the non-pipelined implementation.
  • compressed_matrix: Fixed runtime error when switching the memory location at runtime.
  • compressed_compressed_matrix: Fixed a wrong buffer size in clear().
  • hyb_matrix: Using more portable and faster default settings with OpenCL.
  • coordinate_matrix: Fixed incorrect CUDA and OpenCL kernels for the extraction of diagonals and row-norms.
  • CUDA with Visual Studio: Fixed compilation errors and warnings (thanks to Andreas Rost for the hint).
  • direct solve benchmark: Removed unnecessary Boost.uBLAS dependency.
  • OpenMP: Fixed unspecified behavior for private and shared variables as well as reductions if the type is deduced from a template parameter (thanks to GitHub user aokomoriuta for precious input).
  • OpenMP: Ensured compatibility with version 2.0 (no unsigned integers as loop variables).
  • AMG: Fixed incorrect detection of coarse level operator dimensions (thanks again to Andreas Rost).

New in ViennaCL 1.6.1 Beta (Nov 21, 2014)

  • compressed_matrix: Implemented fast CSR-adaptive kernel as presented by Greathouse and Daga from AMD at the Supercomputing Conference 2014, leading to substantial performance improvements on average.
  • matrix: Improved performance of tridiagonal solves for transposed system matrix or right hand side.
  • OpenCL: Added missing kernel default settings for accelerators such as Xeon Phi
  • GMRES: Improved the numerical robustness of the pipelined implementation
  • SPAI: Fixed an incorrect buffer size
  • SVD: Fixed a bug which caused incorrect results.
  • matrix: Fixed an incorrect matrix transposition for large matrices (thanks to Dominic Meiser for the Travis CI integration which revealed this flaw).
  • compressed_matrix: Fixed an invalid memory access for triangular solves for some sparsity patterns.
  • Self-assignments: Corner cases such as A = prod(A, A); are now handled correctly.
  • Documentation: Various improvements.
  • Visual Studio: Fixed spurious performance warnings when using certain sparse matrix types.
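The self-assignment fix concerns aliasing: when the result of a product is also one of its operands, writing results into it before the product is complete corrupts entries that later dot products still need. The following standalone C++ sketch is not ViennaCL code; the helper name square_in_place and the flat row-major Matrix alias are hypothetical. It illustrates one common remedy, computing into a temporary and swapping it in afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Computes A = A * A. Writing directly into A would overwrite entries that
// later dot products still read, so the product is formed in a temporary.
void square_in_place(Matrix& A, std::size_t n) {
  assert(A.size() == n * n);
  Matrix tmp(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k) {
      const double a_ik = A[i * n + k];
      for (std::size_t j = 0; j < n; ++j)
        tmp[i * n + j] += a_ik * A[k * n + j];
    }
  A.swap(tmp);  // publish the result only after all reads of A are finished
}
```

For A = [[1, 2], [3, 4]], the call leaves A = [[7, 10], [15, 22]].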

New in ViennaCL 1.6.0 Beta (Nov 12, 2014)

  • Major update of the internal OpenCL kernel generator, which is now used for all BLAS operations with device-specific parameters and greatly improves performance particularly on older AMD GPUs.
  • Iterative solvers: Added pipelined implementations of CG, BiCGStab and GMRES, which are up to three times faster than other GPU-enabled solver libraries
  • sliced_ell_matrix: Added new sparse matrix type implementing the sliced ELLPACK format for all three compute backends as proposed in the paper "A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units" by Kreutzer et al.
  • CUDA: Added option for specifying the CUDA architecture through CMake in the CUDA_ARCH_FLAG variable.
  • Added viennacl/version.hpp for identifying the ViennaCL version (thanks to GitHub user vigsterkr for bringing this up).
  • Added viennacl::min() and viennacl::max() to obtain the minimum and maximum element in a vector (thanks to SourceForge user cpp1 for the suggestion).
  • Matrix-matrix products on CPU: Improved performance by about an order of magnitude.
  • matrix: Added constructor and example for wrapping user-provided CUDA buffers.
  • Eigenvalues: Implementation of bisection algorithm for tridiagonal symmetric matrices. API still experimental.
  • Eigenvalues: Extended current implementation of the QR method to OpenMP and CUDA backends. API remains experimental.
  • Scan: Implemented inclusive and exclusive scans for all three compute backends. API still experimental.
  • Triangular solvers: Improved performance particularly for multiple right hand sides (i.e. BLAS level 3 solves).
  • matrix: Improved performance for matrix transposition
  • FFT: Now also supports OpenMP and CUDA backends
  • Nonnegative matrix factorization: Now also supports OpenMP and CUDA backends
  • Nonnegative matrix factorization: Now using correct contexts and better robustness for initial guesses being all-zero.
  • Documentation: Integrated former LaTeX manual into Doxygen, resulting in an all-HTML documentation. Enables much better cross-referencing.
  • Benchmarks: Merged separate BLAS level 1/2/3 benchmarks into a single benchmark printing performance for all three levels.
  • matrix_range: Fixed incorrect copying of data back to the host
  • hyb_matrix: Fixed a bug in the constructor for the non-square case
  • OpenMP: Added linker flags when building with OpenMP on MinGW
  • OpenCL: Added optional caching of OpenCL kernels at a user-defined location in the file system if the environment variable VIENNACL_CACHE_PATH is defined.
  • CMake: Enclosing variables in double quotes where necessary
  • matrix: Improved reuse of existing memory buffers for better support of user-provided buffers.
  • Sparse matrices: Added .clear() member function for freeing internal memory buffers
  • Better support for passing host scalars of non-matching scalar type for operations on vectors and matrices
  • Vector: Passing an empty vector as the destination of a two-parameter version of viennacl::copy() now resizes the vector automatically for higher convenience.
  • AMG: Fixed invalid decrement of auxiliary iterator
  • Tools: Added sparse matrix generation routine for matrices obtained from finite difference discretizations in 2D.
  • uBLAS: Now including the necessary header when working with compressed_matrix
  • Visual Studio 2012: Fixed compilation problems for some new submodules
  • Iterators: Fixed incorrect index calculations when using iterators on vectors
  • matrix_base: The storage layout of matrices is internally managed as a runtime parameter, which allows for a better internal dispatch in order to use faster compute kernels.
  • Random numbers: Removed experimental random number generation in viennacl/rand/
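The difference between the two scan variants added in 1.6.0 can be illustrated with a sequential sketch in plain C++. This is not the ViennaCL implementation, which runs in parallel on the compute backends, and the function names are hypothetical. An inclusive scan includes element i in the i-th partial sum, while an exclusive scan stops just before it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i].
std::vector<double> inclusive_scan_seq(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    sum += in[i];
    out[i] = sum;
  }
  return out;
}

// Exclusive scan: out[0] = 0 and out[i] = in[0] + ... + in[i-1].
std::vector<double> exclusive_scan_seq(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = sum;
    sum += in[i];
  }
  return out;
}
```

For the input {1, 2, 3, 4}, the inclusive scan yields {1, 3, 6, 10} and the exclusive scan yields {0, 1, 3, 6}. Since C++17, the standard library also offers std::inclusive_scan and std::exclusive_scan in <numeric> for the same purpose on the host.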

New in ViennaCL 1.5.2 Beta (May 16, 2014)

  • Fixed compilation problems on Visual Studio for the operations y += prod(A, x) and y -= prod(A, x) with dense matrix A.
  • Added a better performance profile for NVIDIA Kepler GPUs. For example, this increases the performance of matrix-matrix multiplications to 600 GFLOPs in single precision on a GeForce GTX 680. Thanks to Paul Dufort for bringing this to our attention.
  • Added support for the operation A = trans(B) for matrices A and B to the scheduler.
  • Fixed compilation problems in block-ILU preconditioners when passing block boundaries manually.
  • Ensured compatibility with OpenCL 1.0, which may still be available on older devices.

New in ViennaCL 1.5.1 Beta (Jan 22, 2014)

  • This maintenance release fixes a few nasty bugs:
  • Fixed a memory leak in the OpenCL kernel generator. Thanks to GitHub user dxyzab for spotting this.
  • Added compatibility of the mixed precision CG implementation with older AMD GPUs. Thanks to Andreas Rost for the input.
  • Fixed an error when running the QR factorization for matrices with fewer rows than columns. Thanks to Karol Polko for reporting.
  • Re-added accidentally removed chapters on additional algorithms and structured matrices to the manual. Thanks to Sajjadul Islam for the hint.
  • Fixed buggy OpenCL kernels for matrix additions and subtractions for column-major matrices. Thanks to Tom Nicholson for reporting.
  • Fixed an invalid default kernel parameter set for matrix-matrix multiplications on CPUs when using the OpenCL backend. Thanks again to Tom Nicholson.
  • Corrected a weak check used in two tests. Thanks to Walter Mascarenhas for providing a fix.
  • Fixed a wrong global work size inside the SPAI preconditioner. Thanks to Andreas Rost.

New in ViennaCL 1.5.0 Beta (Jan 22, 2014)

  • This minor release focuses on a more powerful API and on first steps toward making ViennaCL more accessible from languages other than C++.
  • In addition to many internal improvements both in terms of performance and flexibility, the following changes are visible to users:
  • API-change: User-provided OpenCL programs extract their kernels automatically. A call to add_kernel() is now obsolete, hence the function was removed.
  • API-change: The device class has been extended and supports all information defined in the OpenCL 1.1 standard through member functions. The duplicates compute_units() and max_work_group_size() have been removed (thanks to Shantanu Agarwal for the input).
  • API-change: viennacl::copy() from a ViennaCL object to an object of non-ViennaCL type no longer tries to resize the target object. Instead, an assertion is triggered if the sizes do not match, providing consistent behavior across many different types.
  • Data structure change: Vectors and matrices are now padded with zeros by default, resulting in higher performance particularly for matrix operations. This padding needs to be taken into account when using fast_copy(), particularly for matrices.
  • Fixed problems with CUDA and CMake+CUDA on Visual Studio.
  • coordinate_matrix now also behaves correctly for tiny matrix dimensions.
  • CMake 2.6 is now the minimum requirement instead of CMake 2.8.
  • Vectors and matrices can be instantiated with integer template types (long, int, short, char).
  • Added support for element_prod() and element_div() for dense matrices.
  • Added element_pow() for vectors and matrices.
  • Added norm_frobenius() for computing the Frobenius norm of dense matrices.
  • Added unary element-wise operations for vectors and dense matrices: element_sin(), element_sqrt(), etc.
  • Multiple OpenCL contexts can now be used in a multi-threaded setting (one thread per context).
  • Multiple inner products with a common vector can now be computed efficiently via, e.g., inner_prod(x, tie(y, z)).
  • Added support for prod(A, B), where A is a sparse matrix type and B is a dense matrix (thanks to Albert Zaharovits for providing parts of the implementation).
  • Added diag() function for extracting the diagonal of a matrix to a vector, or for generating a square matrix from a vector with the vector elements on a diagonal (similar to MATLAB).
  • Added row() and column() functions for extracting a certain row or column of a matrix to a vector.
  • Sparse matrix-vector products now also work with vector strides and ranges.
  • Added async_copy() for vectors to allow for a better overlap of computation and communication.
  • Added compressed_compressed_matrix type for the efficient representation of CSR matrices with only a few nonzero rows.
  • Added possibility to switch command queues in OpenCL contexts.
  • Improved performance of Block-ILU by removing one spurious conversion step.
  • Improved performance of Cuthill-McKee algorithm by about 40 percent.
  • Improved performance of power iteration by avoiding the creation of temporaries in each step.
  • Removed spurious status messages printed to cout by the Matrix Market reader and the nonnegative matrix factorization.
  • The OpenCL kernel launch logic no longer attempts to re-launch the kernel with smaller work sizes if an error is encountered (thanks to Peter Burka for pointing this out).
  • Reduced overhead for lengthy expressions involving temporaries (at the cost of increased compilation times).
  • vector and matrix are now padded to dimensions that are multiples of 128 by default. This greatly improves GEMM performance for arbitrary sizes.
  • Loop indices for OpenMP parallelization are now all signed, increasing compatibility with older OpenMP implementations (thanks to Mrinal Deo for the hint).
  • Complete rewrite of the generator. Now uses the scheduler for specifying the operation. Includes a full device database for portable high performance of GEMM kernels.
  • Added micro-scheduler for attaching the OpenCL kernel generator to the user API.
  • Certain BLAS functionality in ViennaCL is now also available through a shared library (libviennacl).
  • Removed the external kernel parameter tuning facility, which is to be replaced by an internal device database through the kernel generator.
  • Completely eliminated the OpenCL kernel conversion step in the developer repository and the source-release. One can now use the developer version without the need for a Boost installation.
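The zero padding to multiples of 128 introduced in 1.5.0 changes the size of the underlying buffers, which is why fast_copy() needs care. The sketch below computes a padded dimension and an entry offset under the assumption of row-major storage with padded rows. It is a plain C++ illustration with hypothetical function names, not a description of ViennaCL's exact internal layout.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of `alignment` (128 in the padding scheme
// described above). Returns 0 for n == 0.
std::size_t padded_size(std::size_t n, std::size_t alignment = 128) {
  assert(alignment > 0);
  return ((n + alignment - 1) / alignment) * alignment;
}

// Offset of entry (i, j) in a row-major buffer whose rows are assumed to be
// padded to padded_size(cols) entries. If both dimensions are padded, a raw
// copy of such a buffer holds padded_size(rows) * padded_size(cols) values,
// not rows * cols.
std::size_t padded_row_major_index(std::size_t i, std::size_t j, std::size_t cols,
                                   std::size_t alignment = 128) {
  assert(j < cols);
  return i * padded_size(cols, alignment) + j;
}
```

For example, a matrix with 100 columns would get rows of 128 entries under this assumption, so entry (2, 3) sits at offset 2 * 128 + 3 = 259 rather than 203.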

New in ViennaCL 1.4.2 Beta (Apr 29, 2013)

  • This is a maintenance release, particularly resolving compilation problems with Visual Studio 2012.
  • Largely refactored the internal code base, unifying code for vector, vector_range, and vector_slice.
  • Similar code refactoring was applied to matrix, matrix_range, and matrix_slice.
  • This not only resolves the problems in VS 2012, but also leads to shorter compilation times and a smaller code base.
  • Improved performance of matrix-vector products of compressed_matrix on CPUs using OpenCL.
  • Resolved a bug which shows up if certain rows and columns of a compressed_matrix are empty and the matrix is copied back to host.
  • Fixed a bug and improved performance of GMRES.
  • Added additional Doxygen documentation.

New in ViennaCL 1.4.1 Beta (Feb 22, 2013)

  • This release focuses on improved stability and performance on AMD devices rather than introducing new features:
  • Included fast matrix-matrix multiplication kernel for AMD's Tahiti GPUs if matrix dimensions are a multiple of 128.
  • Our sample HD7970 reaches over 1.3 TFLOPs in single precision and 200 GFLOPs in double precision (counting multiplications and additions as separate operations).
  • All benchmark FLOPs are now using the common convention of counting multiplications and additions separately (ignoring fused multiply-add).
  • Fixed a bug for matrix-matrix multiplication with matrix_slice when slice dimensions are multiples of 64.
  • Improved detection logic for Intel OpenCL SDK.
  • Fixed issues when resizing an empty compressed_matrix.
  • Fixes and improved support for BLAS-1-type operations on dense matrices and vectors.
  • Vector expressions can now be passed directly to inner_prod(), norm_1(), norm_2() and norm_inf().
  • Improved performance when using OpenMP.
  • Better support for Intel Xeon Phi (MIC).
  • Resolved problems when using OpenCL for CPUs if the number of cores is not a power of 2.
  • Fixed a flaw when using AMG in debug mode.
  • Removed accidental external linkage (invalidating header-only model) of SPAI-related functions
  • Fixed issues with copy back to host when OpenCL handles are passed to the constructors of vector, matrix, or compressed_matrix.
  • Added fix for segfaults on program exit when providing custom OpenCL queues.
  • Fixed bug in copy() to hyb_matrix.
  • Added an overload for result_of::alignment for vector_expression.
  • Added SSE-enabled code
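The FLOP counting convention adopted in 1.4.1 can be written down directly: a product of an m x k matrix with a k x n matrix is counted as 2*m*n*k operations, one multiplication and one addition per inner-product term. The following sketch uses hypothetical helper names and is not part of the ViennaCL benchmarks.

```cpp
#include <cassert>
#include <cstddef>

// Operation count for C = A * B with A (m x k) and B (k x n), counting each
// multiplication and each addition separately (the common 2*m*n*k convention;
// a fused multiply-add therefore counts as two operations).
double gemm_flops(std::size_t m, std::size_t n, std::size_t k) {
  return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Converts an operation count and a runtime in seconds to GFLOPs.
double gflops(double flops, double seconds) {
  assert(seconds > 0.0);
  return flops / seconds / 1e9;
}
```

For example, a product of two 4096 x 4096 matrices counts as 2 * 4096^3 = 137,438,953,472 operations, so a hypothetical runtime of 0.1 seconds would correspond to about 1374 GFLOPs.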

New in ViennaCL 1.4.0 Beta (Dec 3, 2012)

  • Added host-based and CUDA-enabled operations on ViennaCL objects. The default is now host-based execution for reasons of compatibility. Enable OpenCL- or CUDA-based execution by defining the preprocessor constant VIENNACL_WITH_OPENCL or VIENNACL_WITH_CUDA, respectively. Note that CUDA-based execution requires the use of nvcc.
  • Added mixed-precision CG solver (OpenCL-based).
  • Greatly improved performance of ILU0 and ILUT preconditioners (up to 10-fold). Also fixed a bug in ILUT.
  • Added initializer types from Boost.uBLAS (unit_vector, zero_vector, scalar_vector, identity_matrix, zero_matrix, scalar_matrix).
  • Added incomplete Cholesky factorization preconditioner.
  • Added element-wise operations for vectors as available in Boost.uBLAS (element_prod, element_div).
  • Added restart-after-N-cycles option to BiCGStab.
  • Added level-scheduling for ILU-preconditioners. Performance strongly depends on matrix pattern.
  • Added least-squares example including a function inplace_qr_apply_trans_Q() to compute the right hand side vector Q^T b without rebuilding Q.
  • Improved performance of LU-factorization of dense matrices.
  • Improved dense matrix-vector multiplication performance
  • Reduced overhead when copying to/from ublas::compressed_matrix.
  • ViennaCL objects (scalar, vector, etc.) can now be used as global variables
  • Refurbished OpenCL vector kernels backend. All operations of the type v1 = a v2 ± b v3 with vectors v1, v2, v3 and scalars a and b, including the variants using += and -= instead of =, are now temporary-free. The same holds for matrices.
  • matrix_range and matrix_slice as well as vector_range and vector_slice can now be used and mixed completely seamlessly with all standard operations except lu_factorize().
  • Fixed a bug when using copy() with iterators on vector proxy objects.
  • The final reduction step in inner_prod() and norms is now computed on the CPU if the result is a CPU scalar.
  • Reduced kernel launch overhead of simple vector kernels by packing multiple kernel arguments together.
  • Updated SVD code and added routines for the computation of symmetric eigenvalues using OpenCL.
  • The constructor of custom_operation now supports multiple arguments, allowing multiple expressions to be packed into the same kernel for improved performance. However, all data structures in the packed operations must have the same size.
  • Further improvements to the OpenCL kernel generator: Added a repeat feature for generating loops inside a kernel, added element-wise products and division, added support for every one-argument OpenCL function.
  • The name of the operation is now a mandatory argument of the constructor of custom_operation
  • Improved performance of the generated matrix-vector product code.
  • Updated interfacing code for the Eigen library, now working with Eigen 3.x.y.
  • Converter in source-release now depends on Boost.filesystem3 instead of Boost.filesystem2, thus requiring Boost 1.44 or above.
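"Temporary-free" evaluation of expressions such as v1 = a*v2 + b*v3 means that the whole right-hand side is computed in a single loop (or kernel) instead of materializing a*v2 and b*v3 as separate vectors. A minimal host-side sketch in C++, with the hypothetical name fused_axpby, is shown below; it is not the ViennaCL kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates v1 = a*v2 + b*v3 in one pass, without allocating temporaries
// for a*v2 or b*v3.
void fused_axpby(std::vector<double>& v1, double a, const std::vector<double>& v2,
                 double b, const std::vector<double>& v3) {
  assert(v2.size() == v3.size());
  v1.resize(v2.size());
  for (std::size_t i = 0; i < v2.size(); ++i)
    v1[i] = a * v2[i] + b * v3[i];  // element i reads only element i of the inputs
}
```

Because each v1[i] depends only on v2[i] and v3[i], the loop also stays correct when v1 and v2 are the same vector.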

New in ViennaCL 1.3.1 Beta (Aug 10, 2012)

  • Fixed a compilation problem with GCC 4.7 caused by the wrong order of function declarations. Also removed unnecessary indirections and unused variables.
  • Improved out-of-source build in the src-version (for packagers).
  • Added a virtual destructor to the runtime_wrapper class in the kernel generator.
  • Extended flexibility of submatrix and subvector proxies (ranges, slices).
  • Block-ILU for compressed_matrix is now applied on the GPU during the solver cycle phase. However, for the moment the implementation file viennacl/linalg/detail/ilu/opencl_block_ilu.hpp needs to be included separately in order to avoid an OpenCL dependency for all ILU implementations.
  • SVD now supports double precision.
  • Slightly adjusted the interface for NMF. The approximation rank is now specified by the supplied matrices W and H.
  • Fixed a problem with matrix-matrix products if the result matrix is not initialized properly (thanks to Laszlo Marak for finding the issue and a fix).
  • The operations C += prod(A, B) and C -= prod(A, B) for matrices A, B, and C no longer introduce temporaries if the three matrices are distinct.

New in ViennaCL 1.3.0 Beta (May 14, 2012)

  • Several new features enter this new minor version release.
  • Some of the experimental features introduced in 1.2.0 keep their experimental state in 1.3.x due to the short time since 1.2.0, with exceptions listed below along with the new features:
  • Full support for ranges and slices for dense matrices and vectors (no longer experimental)
  • QR factorization now possible for arbitrary matrix sizes (no longer experimental)
  • Further improved matrix-matrix multiplication performance for matrix dimensions which are a multiple of 64 (particularly improves performance for NVIDIA GPUs)
  • Added Lanczos and power iteration method for eigenvalue computations of dense and sparse matrices (experimental)
  • Added singular value decomposition in single precision (experimental)
  • Two new ILU-preconditioners added: ILU0 and a block-diagonal ILU preconditioner using either ILUT or ILU0 for each block. Both preconditioners are computed entirely on the CPU.
  • Automated OpenCL kernel generator based on high-level operation specifications added (many thanks to Philippe Tillet who had a lot of fun fun fun working on this)
  • Two new sparse matrix types: ell_matrix for the ELL format and hyb_matrix for a hybrid format.
  • Added possibility to specify the OpenCL platform used by a context
  • Build options for the OpenCL compiler can now be supplied to a context
  • Added nonnegative matrix factorization by Lee and Seung
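The ELL format added in 1.3.0 stores a fixed number of entries per row, padded to the length of the longest row. This gives regular memory access at the cost of extra storage for irregular rows; a hybrid format combines ELL with a second format for the entries that do not fit. The sketch below converts a CSR matrix to a slot-major ELL layout and performs a matrix-vector product. It is plain C++ with hypothetical names (EllMatrix, csr_to_ell, ell_spmv) and does not reflect ViennaCL's ell_matrix internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// ELL storage: every row owns `width` slots (the maximum row length); unused
// slots hold column 0 and value 0. Slots are stored slot-major
// (index slot * rows + row), so consecutive rows touch consecutive memory.
struct EllMatrix {
  std::size_t rows = 0, width = 0;
  std::vector<std::size_t> col;
  std::vector<double> val;
};

// Converts a CSR matrix (row_ptr, col_idx, values) to ELL.
EllMatrix csr_to_ell(const std::vector<std::size_t>& row_ptr,
                     const std::vector<std::size_t>& col_idx,
                     const std::vector<double>& values) {
  assert(!row_ptr.empty());
  EllMatrix ell;
  ell.rows = row_ptr.size() - 1;
  for (std::size_t r = 0; r < ell.rows; ++r)
    ell.width = std::max(ell.width, row_ptr[r + 1] - row_ptr[r]);
  ell.col.assign(ell.rows * ell.width, 0);
  ell.val.assign(ell.rows * ell.width, 0.0);
  for (std::size_t r = 0; r < ell.rows; ++r)
    for (std::size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
      const std::size_t slot = k - row_ptr[r];
      ell.col[slot * ell.rows + r] = col_idx[k];
      ell.val[slot * ell.rows + r] = values[k];
    }
  return ell;
}

// y = A * x; padded slots contribute 0 * x[0], so x must be non-empty.
std::vector<double> ell_spmv(const EllMatrix& A, const std::vector<double>& x) {
  std::vector<double> y(A.rows, 0.0);
  for (std::size_t r = 0; r < A.rows; ++r)
    for (std::size_t s = 0; s < A.width; ++s)
      y[r] += A.val[s * A.rows + r] * x[A.col[s * A.rows + r]];
  return y;
}
```

For the matrix [[1, 0, 2], [0, 3, 0], [4, 5, 6]], the longest row has three entries, so the two shorter rows carry one and two padded slots, respectively.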

New in ViennaCL 1.2.1 Beta (Mar 23, 2012)

  • Many new features from the Google Summer of Code and the IuE Summer of Code enter this release.
  • Due to their complexity, they are for the moment still in experimental state (see the respective chapters for details) and are expected to reach maturity with the 1.3.0 release.
  • Shorter release cycles are planned for the near future.
  • Added a bunch of algebraic multigrid preconditioner variants
  • Added (factored) sparse approximate inverse preconditioner
  • Added fast Fourier transform (FFT) for vector sizes that are a power of two, and a standard Fourier transform for other sizes
  • Additional structured matrix classes for circulant matrices, Hankel matrices, Toeplitz matrices and Vandermonde matrices
  • Added reordering algorithm
  • Refurbished CMake build system
  • Added matrix and vector proxy objects for submatrix and subvector manipulation
  • Added (possibly GPU-assisted) QR factorization
  • By default, a viennacl::ocl::context now consists of only one device. The rationale is to provide better out-of-the-box support for machines with hybrid graphics (two GPUs), where one GPU may not support double precision.
  • Fixed problems with viennacl::compressed_matrix which occurred if the number of rows and columns differed
  • Improved documentation for the case of multiple custom kernels within a program
  • Improved matrix-matrix multiplication kernels (may lead to up to 20 percent performance gains)
  • Fixed problems in GMRES for small matrices (dimensions smaller than the maximum number of Krylov vectors)
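For power-of-two sizes, an FFT can split the input recursively into even- and odd-indexed halves (the radix-2 Cooley-Tukey scheme), while other sizes can fall back to the direct O(n^2) transform. The following C++ sketch shows both using std::complex. It is an illustration with hypothetical function names, not ViennaCL's FFT implementation, and uses the sign convention X_k = sum_j x_j exp(-2*pi*i*j*k/n).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two.
std::vector<cplx> fft_radix2(const std::vector<cplx>& a) {
  const std::size_t n = a.size();
  assert(n > 0 && (n & (n - 1)) == 0);
  if (n == 1) return a;
  std::vector<cplx> even(n / 2), odd(n / 2);
  for (std::size_t i = 0; i < n / 2; ++i) {
    even[i] = a[2 * i];
    odd[i] = a[2 * i + 1];
  }
  const std::vector<cplx> E = fft_radix2(even), O = fft_radix2(odd);
  const double pi = std::acos(-1.0);
  std::vector<cplx> out(n);
  for (std::size_t k = 0; k < n / 2; ++k) {
    const cplx t = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n)) * O[k];
    out[k] = E[k] + t;          // butterfly: upper half
    out[k + n / 2] = E[k] - t;  // butterfly: lower half
  }
  return out;
}

// Direct O(n^2) discrete Fourier transform, valid for any length.
std::vector<cplx> dft(const std::vector<cplx>& a) {
  const std::size_t n = a.size();
  const double pi = std::acos(-1.0);
  std::vector<cplx> out(n);  // zero-initialized
  for (std::size_t k = 0; k < n; ++k)
    for (std::size_t j = 0; j < n; ++j)
      out[k] += a[j] * std::polar(1.0, -2.0 * pi * static_cast<double>(k * j) / static_cast<double>(n));
  return out;
}
```

For the input {1, 2, 3, 4}, both functions return approximately {10, -2+2i, -2, -2-2i}.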

New in ViennaCL 1.1.2 Beta (Sep 15, 2011)

  • Fixed a bug with partial vector copies from CPU to GPU
  • Corrected error estimations in CG and BiCGStab iterative solvers
  • Improved performance of CG and BiCGStab as well as Jacobi and row-scaling preconditioners considerably
  • Corrected linker statements in CMakeLists.txt for MacOS
  • Improved handling of ViennaCL types (direct construction, output streaming of matrix and vector expressions, etc.).
  • Updated old code in the coordinate matrix type and improved performance
  • Using size_t instead of unsigned int for the size type on the host.
  • Updated double precision support detection for AMD hardware.
  • Fixed a name clash in direct_solve.hpp and ilu.hpp
  • Prevented unsupported assignments and copies of sparse matrix types

New in ViennaCL 1.1.1 Beta (Sep 15, 2011)

  • This new revision release has a focus on better interaction with other linear algebra libraries.
  • The few known glitches with version 1.1.0 are now removed.
  • Fixed compilation problems on MacOS X and OpenCL 1.0 header files due to an undefined preprocessor constant
  • Removed the accidental external linkage for three functions
  • New out-of-the-box support for Eigen [3] and MTL 4 [4] libraries. Iterative solvers from ViennaCL can now directly be used with both libraries.
  • Fixed a problem with GMRES when the system matrix is smaller than the maximum Krylov space dimension.
  • Better default parameters for BLAS3 routines lead to higher performance for matrix-matrix products.
  • Added benchmark for dense matrix-matrix products (BLAS3 routines).
  • Added viennacl-info example that displays information about the OpenCL backend used by ViennaCL.
  • Cleaned up CMakeLists.txt in order to selectively enable builds that rely on external libraries.
  • More than one installed OpenCL platform is now allowed

New in ViennaCL 1.1.0 Beta (Sep 15, 2011)

  • The completely rewritten OpenCL back-end allows for multiple contexts, multiple devices and even to wrap existing OpenCL resources into ViennaCL objects. A tutorial demonstrates the new functionality.
  • The tutorials are now named according to their purpose.
  • The dense matrix type now supports both row-major and column-major storage.
  • Dense and sparse matrix types can now be filled using STL-emulated types (std::vector< std::vector<NumericT> > and std::vector< std::map< unsigned int, NumericT> >)
  • BLAS level 3 functionality is now complete. We are very happy with the general out-of-the-box performance of matrix-matrix products, even though it cannot yet beat the extremely tuned implementations tailored to certain matrix sizes on a particular device.
  • An automated performance tuning environment allows the kernel parameters to be optimized for the library user’s machine. The best parameters can be obtained from a tuning run, stored in an XML file, and read at program startup using pugixml.
  • Two new preconditioners are now included: a Jacobi preconditioner and a row-scaling preconditioner. In contrast to ILUT, they are applied directly on the OpenCL device.
  • Clean compilation of all examples under Visual Studio 2005 (we recommend newer compilers though).
  • Error handling is now carried out using C++ exceptions.
  • Matrix Market I/O now uses index base 1 by default
  • Improved performance of norm X kernels.
  • Iterative solver tags now have consistent constructors: First argument is the relative tolerance, second argument is the maximum number of total iterations. Other arguments depend on the respective solver.
  • A few minor improvements here and there
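Support for both storage layouts in 1.1.0 boils down to two different mappings from a matrix entry (i, j) to a position in a flat buffer. The sketch below spells them out for a rows x cols matrix; the function names are hypothetical and the code is plain C++, not ViennaCL's internal indexing.

```cpp
#include <cassert>
#include <cstddef>

// Row-major: the entries of each row are contiguous.
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t rows, std::size_t cols) {
  assert(i < rows && j < cols);
  return i * cols + j;
}

// Column-major: the entries of each column are contiguous.
std::size_t column_major_offset(std::size_t i, std::size_t j, std::size_t rows, std::size_t cols) {
  assert(i < rows && j < cols);
  return j * rows + i;
}
```

In a 2 x 3 matrix, entry (1, 0) sits at offset 3 in row-major storage but at offset 1 in column-major storage.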

New in ViennaCL 1.0.5 Beta (Sep 15, 2011)

  • Added a reader and writer for MatrixMarket files
  • Eliminated a bug that caused the upper triangular direct solver to fail on NVIDIA hardware for large matrices (thanks to Andrew Melfi for finding that)
  • The number of iterations and the final estimated error can now be obtained from iterative solver tags.
  • Improvements are included in the developer converter script (OpenCL kernels to C++ header)
  • Disabled the use of reference counting for OpenCL handles on Mac OS X (caused seg faults on program exit)

New in ViennaCL 1.0.4 Beta (Sep 15, 2011)

  • All tutorials now work out of the box with Visual Studio 2008.
  • Eliminated all ViennaCL related warnings when compiling with Visual Studio 2008.
  • Better (experimental) support for double precision on ATI GPUs, although the norm_1, norm_2, norm_inf and index_norm_inf functions are not available in double precision when using the ATI Stream SDK on GPUs.
  • Fixed a bug in GMRES that caused segmentation faults under Windows.
  • Fixed a bug in const sparse matrix adapter
  • Corrected incorrect return values in the sparse matrix regression test suite

New in ViennaCL 1.0.3 Beta (Sep 15, 2011)

  • Support for multi-core CPUs with ATI Stream SDK
  • inner_prod is now up to a factor of four faster (... ETH, for pointing out the poor performance of the old implementation)
  • Fixed a bug with plane_rotation that caused system freezes with ATI GPUs.
  • Extended the doxygen generated reference documentation

New in ViennaCL 1.0.2 Beta (Sep 15, 2011)

  • A bug-fix release that resolves some problems with the Visual C++ compiler.
  • Fixed some compilation problems under Visual C++ (version 2005 and 2008).
  • All tutorials accidentally relied on ublas. Now tut1 and tut5 can be compiled without ublas
  • Renamed aux/ folder to auxiliary/ (caused some problems on Windows machines)

New in ViennaCL 1.0.1 Beta (Sep 15, 2011)

  • Fixed a bug in lu_substitute for dense matrices
  • Changed iterative solver behavior to stop if a certain relative residual is reached
  • ILU preconditioning is now fully done on the CPU, because this gives the best overall performance
  • All OpenCL handles of ViennaCL types can now be accessed via member function handle()
  • Improved GPU performance of GMRES by about a factor of two.
  • Added generic norm_2 function in header file norm_2.hpp
  • Wrapper for clFlush() and clFinish() added
  • Device information can be queried by device.info()
  • Extended documentation and tutorials
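The relative-residual stopping rule from 1.0.1, together with the solver tag convention from 1.1.0 (relative tolerance first, maximum iteration count second) and the iteration and error diagnostics from 1.0.5, can be illustrated with a small iterative solver. The sketch below uses a plain Jacobi iteration on a dense matrix purely for brevity; Jacobi is not one of the solvers listed here, and SolveResult and jacobi_solve are hypothetical names rather than ViennaCL API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solution plus the diagnostics a solver tag would report.
struct SolveResult {
  std::vector<double> x;
  std::size_t iterations = 0;
  double relative_residual = 0.0;
};

double norm2(const std::vector<double>& v) {
  double s = 0.0;
  for (double e : v) s += e * e;
  return std::sqrt(s);
}

// Jacobi iteration on a dense row-major n x n matrix A, stopping once
// ||b - A x||_2 / ||b||_2 <= tol or after max_iters iterations.
SolveResult jacobi_solve(const std::vector<double>& A, const std::vector<double>& b,
                         double tol, std::size_t max_iters) {
  const std::size_t n = b.size();
  assert(A.size() == n * n);
  SolveResult res;
  res.x.assign(n, 0.0);
  const double b_norm = norm2(b);
  std::vector<double> r(n);
  for (;;) {
    for (std::size_t i = 0; i < n; ++i) {  // residual r = b - A x
      double Ax_i = 0.0;
      for (std::size_t j = 0; j < n; ++j) Ax_i += A[i * n + j] * res.x[j];
      r[i] = b[i] - Ax_i;
    }
    res.relative_residual = (b_norm > 0.0) ? norm2(r) / b_norm : norm2(r);
    if (res.relative_residual <= tol || res.iterations >= max_iters) break;
    // x_i += r_i / A_ii equals the classic update (b_i - sum_{j != i} A_ij x_j) / A_ii.
    for (std::size_t i = 0; i < n; ++i) res.x[i] += r[i] / A[i * n + i];
    ++res.iterations;
  }
  return res;
}
```

With A = [[4, 1], [1, 3]], b = (1, 2), a tolerance of 1e-10 and at most 1000 iterations, the iterates approach the exact solution (1/11, 7/11).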