RapidMiner Studio Changelog

What's new in RapidMiner Studio 10.3.0

Nov 2, 2023
  • Features:
  • Interactive Decision Tree: Added way to switch between alternative splits when more than one split is possible for the selected node. The splits are cycled through in order of significance for the tree based on the selected measure, from most to least significant.
  • Interactive Decision Tree: Added button in the tree UI to show the split report for the selected split.
  • Interactive Decision Tree: Improved minimap by consolidating control buttons and allowing the minimap to be hidden
  • Enhancements:
  • Added header row parameter to the Read Excel and Read CSV operators
  • Improved import wizard behaviour w.r.t. header row and starting row handling
  • Interactive Decision Tree: Find Split will now overwrite an existing split
  • The open process now tracks repository changes to prevent losing or corrupting processes
  • CSV import now supports quoted multiline text via an optional flag
  • CSV import has improved date format input field handling
  • Improved the user error messages for fatal expression exceptions
  • Changed HSQLDB default URL prefix to point to the server (jdbc:hsqldb:hsql://)
  • Auto Model: Removed inactive deployment option in results overview
  • Bugfixes:
  • Fixed an issue that caused process files to become corrupted when using certain emojis in parameters
  • Fixed an issue which could break displaying Chinese or Japanese symbols at certain places in the UI
  • Fixed an issue that could lead to a view other than the selected one being active. This happened when a broken view (corrupted file) was selected. Broken views are now deleted on start-up.
  • Database connections for databases which use nvarchar as string column type can now properly create tables
  • Fixed Ingres JDBC driver class default
  • Fixed an issue that could prevent right click on operators when older extensions were installed.
  • Fixed ANOVA Matrix result cell coloring to be in line with the description (colored cells are below the significance threshold)
  • Removed warning message in log during first startup in relation to missing recent data sets file
  • Auto Model: Fixed issue that could cause Auto Model to not show Prediction details for each individual model
  • Interactive Decision Tree: Fixed issue which could cause the node order to switch when editing splits in the tree
  • Interactive Decision Tree: The Preserve Existing Split option in the auto-grow dialog now correctly keeps existing splits below the selected node untouched
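
The quoted multiline text support listed for CSV import above can be illustrated outside of Studio. The following is a minimal sketch using Python's standard csv module, not RapidMiner's implementation; the sample data and the read_rows helper are made up for illustration. It shows a quoted field that contains a line break being read as a single cell.

```python
import csv
import io

# Hypothetical sample: the second field of the first data row spans two lines.
RAW = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def read_rows(text):
    """Parse CSV text, keeping quoted line breaks inside a single field."""
    # newline="" disables newline translation so the csv module sees the
    # embedded line break exactly as written.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = read_rows(RAW)
for row in rows:
    print(row)
```

Without quoting-aware parsing, the same input would be split into four physical lines instead of three logical rows.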

New in RapidMiner Studio 10.2.0 (Aug 16, 2023)

  • Features:
  • Added user interaction after a project was cleaned up on AI Hub
  • Ignore it and keep project disconnected
  • Overwrite local version by clean check out from AI Hub
  • Archive local changes and then overwrite local version as above
  • Added Delete Amazon S3 Resource operator
  • Enhancements:
  • Migrated Generate ID and Split Data operators to the new Belt data core, future-proofing them and improving their speed.
  • Added new setting in the preferences to control whether RapidMiner Studio should favour speed over memory footprint or vice-versa. It can be changed to reduce memory footprint while trading runtime if memory is critical. The setting can be found under System and is called Memory Management.
  • Further reduced start-up time of RapidMiner Studio:
  • Introduced lazy loading of operators
  • Improved utilization of operator signature cache
  • Introduced shallow plugin initialization
  • Added repository web action to go to deployment endpoints
  • Improved error recovery and error messages for date_parse_str function of the expression parser.
  • Trailing white spaces are no longer treated as errors in the expression parser.
  • Improved opening URL experience on certain Linux distributions which do not support triggering browsing programmatically
  • The Correlation Matrix operator now uses the new and improved subset selector
  • Bugfixes:
  • Fixed broken error messages in the Edit Expressions dialog that operators like Generate Attributes use to display the expression parser
  • Removed deprecated Stream Database operator (deprecated since version 7.5, six years ago)
  • Fixed bug in data splitting code that prevented empty partitions in some cases.
  • Fixed Synchronize Meta Data with Real Data not working even though it was selected. The selection is now remembered after restart.
  • When Synchronize Meta Data with Real Data is activated and the process has been run, Read operators like Read Excel and Read CSV remember the real metadata even if another operator is added to the process.
  • Fixed parameters stay above value and stay below value of Prescriptive Analytics operator
  • Fixed a possible concurrency issue when writing JSON IOObjects in parallel
  • Fixed potential access denied error for Read Azure Data Lake Storage Gen2 operator when reading larger files
  • Development:
  • Added com.rapidminer.repository.recent.RecentDataManager to allow global access to the recently used data sets. It comes with a listener mechanism and currently keeps track of data opened in the Results view, as well as used in the Interactive Decision Tree wizard.
  • Removed deprecated classes and methods pertaining to the old concept of managing Perspectives (including MainFrame#getPerspectives())
  • Added DeveloperTools#shouldDeveloperToolsBeShown() to allow for an easy way to check whether you want to offer developer tools of some capacity when appropriate
  • Fixed bug that caused TableMetaData#columns() to return a meta data sub-table with random column order
  • Fixed bug when registering IOObjects from operator signature
  • Plugins now properly also look up resources like icons from the default com/rapidminer/extension/resources path. The old additional lookup for com/rapidminer/resources is kept for compatibility reasons.
  • Deprecated: SwingTools#addIconStoragePath(String), it never worked

New in RapidMiner Studio 10.1.2 (Mar 23, 2023)

  • Enhancements:
  • Excel .xlsx file import is significantly more robust now. This should mean that almost any Excel file can be read successfully now. This applies to both the GUI import wizard and the Read Excel operator.
  • Added link to onboarding dialog to enable using Altair Units license.
  • Improved error message if infinite values are in the data set when displaying a histogram visualization.
  • Details of error messages during process execution now also show up in the log.
  • Altair Units License:
  • Added default limit of 8 logical cores. If you want to utilize more cores, please increase the number in the License tab of the Settings dialog.
  • Added setting to reserve threads for background execution. These count against the logical core limit.
  • Bugfixes:
  • Fixed multiple issues with Auto Model result saving and re-opening
  • Fixed an issue that prevented disconnecting from projects
  • Fixed EULA in Windows installer being displayed with strange characters.

New in RapidMiner Studio 10.0.0 (Nov 8, 2022)

  • Features:
  • RapidMiner Studio now finally uses Java 11 as opposed to Java 8!
  • AI Hub X now also uses Java 11, and as a consequence, RapidMiner Studio X cannot connect to AI Hub 9 or earlier! Both Studio and AI Hub need to be upgraded to version 10!
  • Windows & OS X users will get the updated Java runtime automatically, but Unix users (or anyone using the platform independent release) need to provide Java 11 manually for running Studio.
  • Some extensions might no longer work with Java 11 and require an update; please check the Marketplace for updates.
  • Visualizations: Added ability to sort results when using aggregations in all charts where it makes sense. Sorting can be ascending/descending either on the aggregated result value, or the aggregation column name.
  • Time Series: Added the Windowing Model as a preprocessing model for the Windowing operator.
  • The model can be used to apply the configured windowing operation on any data set (having the same columns) by using Apply Model operator
  • The model can be grouped together with other models using the Group Model operator.
  • Cloud Connectivity: Added Google Drive operators to read, write, delete and loop files, as well as create folders.
  • Connectivity: Added Snowflake as a first-class citizen for database connections
  • Enhancements:
  • Added preprocessing model to Pivot operator
  • Improved High-DPI scaling on Windows
  • The tooltip for date-time entries in the result view now shows the time-stamp in ISO format (including potential nanoseconds)
  • Copy&pasting data from date-time cells is now consistent with what is displayed in the precise tooltip
  • Added setting to disable repository indexing for searching altogether via the Enable repository search indexing setting. This can be used for very large repositories or ones behind a slow network drive or when a virus scanner is involved
  • Time Series: Added the parameter sort time series to all time series operators where an indices column is mandatory or optional
  • If selected, the input time series is automatically sorted before the time series operation is applied. The output of original ports will also contain the sorted data set.
  • Time Series: Improved UserError for indices attributes which are not sorted or have non-unique values
  • Bugfixes:
  • Fixed a problem where collections with empty sub-collections might not be readable
  • Fixed problems with empty (sub-) collections not being readable
  • Fixed problems with repeatedly extracting collections because of an incorrectly set timestamp
  • Fixed the storage of the LFS & editable flags in the repositories.xml file for projects
  • Fixed issue that could cause an error when a better license was installed automatically
  • Fixed creation of new Google Cloud Services connection after the recent Google OAuth flow changes

New in RapidMiner Studio 9.10.10 (Jul 14, 2022)

  • Enhancements:
  • Improved IOObject creation time by making meta data creation in file system repository and projects rely more on in-memory data
  • Getting remote project updates should no longer fail due to a MERGING_RESOLVED error
  • Added new method to create copy of polynominal mapping
  • Improved performance of Obfuscate and Deobfuscate operators
  • Bugfixes:
  • Fixed an issue that caused an unauthorized rc=401 error when trying to create a new snapshot in a project
  • Fixed an issue that could make storing or retrieving legacy IOObjects in new local repositories and projects extremely slow

New in RapidMiner Studio 9.10.8 (May 3, 2022)

  • Enhancements:
  • Dropbox connections now use TLS1.2 for secure transport
  • Rebranded RapidMiner Studio

New in RapidMiner Studio 9.10.7 (Apr 14, 2022)

  • Bugfixes:
  • Fixed an issue with storing data in context locations for some java versions
  • Fixed problems with De-Obfuscate operator if an attribute name or attribute value contained special characters or whitespaces
  • Fixed possible deadlock at startup of 9.10.6

New in RapidMiner Studio 9.10.6 (Mar 28, 2022)

  • Enhancements:
  • Fixed a memory & file leak when using large numbers of repeated JDBC connections
  • Visualizations: Added options to customize Wordcloud word orientations
  • Visualizations: Added Jamaica to the map collection
  • Updated postgres JDBC driver to version 42.3.2
  • Added skip inaccessible parameter for Loop Files to skip inaccessible files/directories, instead of a silent failure. If unchecked, the operator does not loop at all and will throw a proper error.
  • Stopping Loop Files is now always possible in a timely manner, even if you selected a directory with millions of files.
  • Updated H2 DB library due to security advisory
  • Added new parameter fitting error handling to the ARIMA Trainer operator.
  • In case of a fitting error during training, either a proper error is thrown or a fallback Default Forecast Model is provided.
  • Removed the meta data warning "number of parameters is too large" for the ARIMA Trainer operator.
  • Added new option to Amazon S3 connections that allows for much more flexible authentication schemes, like credential profiles and IAM roles.
  • Bugfixes:
  • Fixed character corruption issue with Read Database and Execute SQL when reading a query via a file from disk on certain operating systems
  • Fixed a memory leak when using database connections
  • Fixed a general file leak when using connections
  • Fixed a problem when creating dynamically suffixed attributes through the AttributeFactory in parallel
  • Fixed side effects for models when executing in parallel
  • Fixed an issue in projects that could sometimes cause Execute Process or Retrieve operators within parallel loops or similar setups to fail with an error message like "Cannot retrieve 'entry', it does not exist"
  • Fixed an issue that could sometimes cause Execute Process operators within parallel loops or similar setups to fail with an error message like "Cannot connect to the RapidMiner AI Hub repository '_LOCAL'" when running on an AI Hub legacy repository
  • Fixed a wrong error, which was thrown during Apply Forecast when a Multiply operator was used on the Holt-Winters model
  • Fixed calculation errors for Holt-Winters models with additive seasonality

New in RapidMiner Studio 9.10.1 (Oct 25, 2021)

  • Enhancements:
  • Improved potential bias detection by producing fewer false positives
  • Added further explanations in the bias warning tooltip to help educate users better about why it occurred - and what can be done to mitigate the problem
  • Replaced DBSCAN operator by new version
  • Deprecated Expectation Maximization Clustering operator
  • Improved/minimized operator instantiation for documentation/search, leading to a reduced startup time
  • Bugfixes:
  • Fixed metadata of Apply Model in rare cases
  • Fixed wrong results after applying the Single Rule Induction model in case of a different ordering of the columns
  • Single Rule Induction model can now be stored in the repository
  • Fixed wrong results after applying the Subgroup Discovery model in case of a different ordering of the columns
  • Fixed table capability store/retrieve in signatures
  • Fixed wrong URL when opening the link in project connections when using AI Hub vault injections
  • Time Series: Fixed a bug in Process Windows which caused an Exception for input data which has long gaps and if the parameter "empty window handling" is set to skip
  • Time Series: Fixed a bug in Holt-Winters when the input data contains a section with 0 as values, or if every n-th value is 0 (with n being the period).
  • A section with 0 as values will be ignored in the smoothing of the seasonal component in Holt-Winters
  • If every n-th value is 0 (with n being the period), a UserError is thrown for the multiplicative seasonality model
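
The "every n-th value is 0" condition from the Holt-Winters fix above can be expressed as a small check. This is a sketch under assumptions, not the operator's actual code: the function name has_zero_season is hypothetical, and the check simply tests whether one seasonal position is 0 in every cycle, which is the situation the changelog says is rejected for the multiplicative seasonality model.

```python
def has_zero_season(values, period):
    """Return True if some seasonal position is 0 in every cycle."""
    if period <= 0:
        raise ValueError("period must be positive")
    for offset in range(period):
        # All values that share the same position within the period.
        season = values[offset::period]
        if season and all(v == 0 for v in season):
            return True
    return False


# Position 0 of each 3-step cycle is 0, so this series would be rejected.
print(has_zero_season([0, 5, 3, 0, 6, 2, 0, 4, 1], 3))
# A single non-zero value at that position is enough to pass the check.
print(has_zero_season([0, 5, 3, 1, 6, 2, 0, 4, 1], 3))
```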

New in RapidMiner Studio 9.10.0 (Aug 12, 2021)

  • Features:
  • Added Function Fitting operator that can optimize parameters in a function of the attributes to fit the label. It can be used to create an optimal function to fit the data points in your data.
  • Bias Awareness: if the use of a specific column is more likely to add unwanted bias to your models, it is highlighted as such. This happens in various places such as in the Statistics view of data, the model simulator, in Turbo Prep, in Auto Model, during model training, in model annotations among others.
  • Enhancements:
  • The De-Normalization operator has a new parameter to also de-normalize predictions.
  • Based on attribute name: prediction(abc) tries to use de-normalization of abc if no explicit de-normalization available
  • The label (or other special attributes) can be included in normalization already in the normalize operator. The changes allow for multiple prediction attributes to be affected
  • Added date format parameter to Write CSV in case format date attributes is selected
  • Improved performance of Append operator
  • Handled yet another case of JDBC drivers ignoring the JDBC standard gracefully (here: Infor Data Lake DatabaseMetaData#getTypeInfo())
  • Introduced operator signatures to improve the startup of Studio
  • Signatures contain meta information that is used in operator registration, global search setup and documentation browser display
  • Signatures are persisted between starts for an improved startup time
  • Signature persistence can be configured or cleared with the setting System -> Local File Cache -> Keep Operator Signatures
  • Time Series: Enabled the usage of constant values for the replace types in the Equalize Numerical Indices and Equalize Time Stamps operators
  • The operators can now be used to fill gaps in non-equal data sets with constant values
  • Time Series: All Time Series operators (except for Multi Horizon Forecast, Multi Horizon Performance) now work with Belt IOTable (as in- and output)
  • Bugfixes:
  • In rare instances, operator parameters did not get saved correctly if a default value was set for it. This e.g. affected date parameters used in extensions.
  • The max and min functions of Generate Attributes now always return a missing value if any of the values is missing.
  • Fixed missing operator help for Azure Blob Storage and Data Lake Storage operators
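
The behaviour described above for the max and min expression functions (a missing input always yields a missing result) can be sketched as follows. This is an illustration only, assuming missing values are represented as NaN; the names strict_max and strict_min are hypothetical and are not part of the RapidMiner expression parser.

```python
import math


def strict_max(*values):
    """Return NaN (missing) if any value is NaN, otherwise the maximum."""
    if any(math.isnan(v) for v in values):
        return math.nan
    return max(values)


def strict_min(*values):
    """Return NaN (missing) if any value is NaN, otherwise the minimum."""
    if any(math.isnan(v) for v in values):
        return math.nan
    return min(values)


print(strict_max(1.0, 3.0, 2.0))
print(strict_max(1.0, math.nan))
```

The explicit check matters because a plain max or min over floats containing NaN gives results that depend on argument order.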

New in RapidMiner Studio 9.9.0 (Mar 24, 2021)

  • New Features:
  • Data is the central piece in any RapidMiner process. The way RapidMiner internally deals with data has fundamentally changed in this release with the new Data Core (codename Belt). Its new columnar table representation provides a quantum leap in processing speed and memory efficiency for RapidMiner processes. Multiple operators already use it internally and it becomes fully available now for extension developers to create fast and efficient operators.
  • Added a Set Positive Value operator for the new Data Core which can make nominal attributes binominal or change the positive value of binominal attributes
  • Enhancements:
  • Replaced the Rename by Example Values operator by a new and improved version
  • Replaced the Rename operator by a new one that can additionally handle a renaming dictionary
  • Replaced the Sort operator by one that can sort by multiple attributes (currently already part of the Operator Toolbox extension)
  • Improved the FP-Growth operator so that it only works with explicitly defined positive values (either via binominal attributes or the positive value parameter) for items in dummy coded columns
  • Improved memory consumption of Cross Validation in certain circumstances
  • The operators Read CSV and Read Excel were improved to use the new data core
  • Pivot now supports Least and Mode aggregations for numerical attributes as well
  • Annotate now adds the annotations to the meta data as well
  • Added warning when trying to run a process on an AI Hub with a lower feature version than the current Studio version
  • Added a reason when displaying incompatible extensions in the dialog after startup to show why an extension failed to load. Details available via tooltip.
  • Upgraded integrated Chromium to version 84
  • Improved some metadata transformation w.r.t. nominal value sets
  • The splashscreen no longer shows duplicate extension icons during startup if more than one copy of an extension is installed
  • Visualizations now also support Least and Mode aggregations for numerical attributes
  • Improved concurrent execution in some corner cases
  • Deprecated the Exchange Roles operator
  • Model viewer for Gradient Boosted Tree models now respects the Number format settings in Studio preferences
  • Auto Model uses new clustering algorithms which no longer require one-hot encoding on the data set and therefore reduce the memory footprint for data sets with nominal columns with many values. As a result, users can no longer specify the minimum number of clusters in the X-Means case (automatic determination of the optimal number of clusters). The minimum is now fixed at 2.
  • Time Series: Added the option to ignore invalid values to the Moving Average Filter operator: Invalid values (missing, positive and negative infinity) are now ignored when calculating the filtered value
  • This also results in valid values at the beginning and end of the filtered time series
  • As the Classic Decomposition and the Function and Seasonal Component Forecast are based on the Moving Average Filter, they now also have the "ignore invalid values" option
  • Bugfixes:
  • Fixed Data Table reading/writing when LFS light checkout is enabled
  • Fixed a problem where an uncaught exception could go through when using date/time attributes with values in the far future/past
  • Fixed an uncaught exception that could happen when the process run via Execute Process failed, the user opened it via the popup and ran it directly after fixing the problem
  • Fixed wrong attribute weights for Random Forest regression
  • Fixed error in Store operator when used after application of k-Means model
  • Fixed issue that Save dialogs did not accept any selection if a wildcard (.*) filter was provided (e.g. for Write Document)
  • Fixed Pivot meta data column names not matching the real data
  • Fixed missing text for the file restoring confirm dialog in projects
  • Fixed an issue that could cause Studio startup to silently fail
  • Fixed a possible error during startup w.r.t port preconditions on some operators
  • Fixed a bug that could cause project creation to not show an error and appear to do nothing
  • Removed check for preprocessing models in model deployments for custom models. This has been causing certain grouped models to fail if they contained models which have technically been not preprocessing models (e.g. PCA).
  • Time Series: Fixed a bug for the Lag operator, which caused original data to be changed at preceding ports as well
  • Time Series: Fixed some small errors in the description of two tutorial processes for Sliding Window Validation
  • Time Series: Fixed an error which occurred in time-based windowing when the end of the last window is equal to the last timestamp in the input data. This affects all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation).
  • Cloud Connectivity: File browser now adds the correct path separator character on Windows, and resolves macros properly for AWS, Azure, and Google Cloud file operators
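
The "ignore invalid values" option of the Moving Average Filter mentioned above can be illustrated with a short sketch. It rests on assumptions that are not taken from the operator: a centered window of odd size, an unweighted mean, and a strict mode that yields a missing value (NaN) for incomplete windows or windows containing invalid values. The function name moving_average is hypothetical.

```python
import math


def moving_average(values, window=3, ignore_invalid=True):
    """Centered, unweighted moving average over a list of floats."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    result = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        if ignore_invalid:
            # Drop NaN and +/- infinity, average whatever is left.
            segment = [v for v in segment if math.isfinite(v)]
        elif hi - lo < window or not all(math.isfinite(v) for v in segment):
            # Strict mode: incomplete or contaminated windows are missing.
            segment = []
        result.append(sum(segment) / len(segment) if segment else math.nan)
    return result


series = [1.0, math.nan, 3.0, math.inf, 5.0]
print(moving_average(series))
print(moving_average(series, ignore_invalid=False))
```

With invalid values ignored, the first and last positions also receive a value, which matches the note above about valid values at the beginning and end of the filtered series.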

New in RapidMiner Studio 9.8.1 (Dec 3, 2020)

  • New Features:
  • Added new operators to delete data from Azure Cloud:
  • Delete Azure Blob Storage Resource
  • Delete Azure Data Lake Storage Resource
  • Delete Azure Data Lake Storage Gen2 Resource
  • Enhancements:
  • All Loop cloud operators (e.g. Loop Amazon S3, Loop Azure Blob Storage, etc) now only download a file when another operator reads its content. The memory footprint may also decrease by 50%, and unnecessary writes to the disk are avoided.
  • Bugfixes:
  • Continue RapidMiner Studio start if proxy discovery fails
  • Added missing Cluster attribute to metadata when applying a KMeans model via Apply Model
  • Fixed a regression in Generalized Linear Model (GLM) model training. It again accepts weighted training data
  • Auto Model Clustering showed incorrect results, ignoring training data normalization and attribute reordering
  • Fixed AbstractMethodError when using very old JDBC drivers (built for Java 6 and earlier) to connect to SQL databases
  • Fixed inconsistent parameter order and two unused parameters being displayed in the parameter panel of Loop Google Storage
  • Fixed result view in open source version
  • Time Series: Fixed spelling errors in help texts
  • Time Series: Fixed missing indices attribute in the meta data of Apply Forecast, if a Function and Seasonal Component Forecast model is used
  • Fixed an issue that could cause connection tests to AI Hubs running behind a federated login via KeyCloak to not properly declare credentials as invalid but instead return a weird error message.

New in RapidMiner Studio 9.8.0 (Oct 14, 2020)

  • New Features:
  • Utilize AI Hub 9.8 support for large files in Projects. Files with more than 10MB and stored ExampleSets are automatically handled to be versioned as expected, but stored more efficiently. This is backed by Git LFS, which means Python or R coders can continue to easily work with these projects as long as they have the Git LFS extension installed.
  • Time Series Windowing Update:
  • Added time based (window parameters are specified in time units) and custom windowing (start and stop values of the windows are provided by an additional example set) for all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation)
  • Added a few more parameters: expert settings (when not selected, a few expert parameters are hidden), windows defined (specifies from which point windows are defined), empty window handling
  • Changed the computation of the final model for the Forecast Validation and Sliding Window Validation operators to compute the model on a final window with the same size as the training windows and which ends at the last example of the input series
  • Time Series: Added new aggregation methods (median, maximum, minimum, standard deviation, variance) to Moving Average Filter
  • Cloud Connectivity
  • Added connectivity to Azure Data Lake Storage Gen2:
  • Read Azure Data Lake Storage Gen2
  • Loop Azure Data Lake Storage Gen2
  • Write Azure Data Lake Storage Gen2
  • Enhancements:
  • H2O:
  • New operator: K-Means (H2O), which implements K-Means clustering using the bundled H2O library. Key features include:
  • Estimate the optimal value of k, when a good initial guess is not available from the user
  • Built-in standardization and nominal encoding
  • Quick and memory efficient execution
  • Note: estimate k strongly favors low k values. Make sure to double-check that the results are in line with expectations.
  • Newly created repositories and projects are now by default stored in the current user's "Documents" folder. The location continues to be customizable on repository / project creation
  • When opening a process or RapidMiner file using "Open with..." RapidMiner Studio, the process will be loaded from the repository registered for the path. Process files that are not stored in a repository will be imported just like the menu item "Import Process" would
  • IOObject collections are now stored in a new, zip-based file format, ending with .collection
  • Incorporated a new library to better make use of system proxy settings if "system" is selected in the preferences, especially w.r.t. Windows and WPAD/PAC files. This will drastically improve the experience in complex corporate network setups
  • HTML5 safe mode is now way more performant
  • Upgraded Chromium binaries to version 79
  • Improved error message for remote repository creation (central AI Hub repository and projects) when the authentication is mismatched (user/password vs SSO)
  • Added Settings option to optimize internal file browser for mapped network drives
  • Time Series: Moved Moving Average Filter into the Transformation operator group and removed the obsolete Filter operator group
  • Time Series: Reordered the output ports of the Multi Label Performance and Multi Horizon Performance operators
  • Bugfixes:
  • Fixed wrong metadata after renaming in the new repositories and then creating a new entry with the previous name
  • Fixed rare issues that could cause problems when trying to view Visualizations on certain machines
  • Fixed Mixed Euclidean Distance for nominal values and Nominal Distance
  • A JNA library on the Windows PATH no longer results in an error
  • Fixed issue that could cause charts in the Deployments view to not show up.
  • Fixed problem that caused the legacy smtp password setting in the Preferences dialog to become broken when the dialog was saved more than once after changing the value. Note that this setting is not recommended anymore, use the new Send Mail connection instead.
  • Fixed a similar problem with the legacy connection UI encrypting passwords and tokens multiple times
  • Auto Model Results calculated on AI Hub can now be opened via Results view after the folder with all results has been moved/copied
  • Upgraded bundled JRE to 8u265
  • Deployments keep working now after the Server repository has been renamed
  • Fixed a problem where unsigned extensions could not make use of the new connection objects inside operators
  • Fixed potential IllegalArgumentException in Google Storage operators when running on Server
  • ExampleSets with huge nominal values can be retrieved again from the repository
  • Time Series: Fixed a bug in Equalize Time Stamps which caused an infinite loop in some cases when the calendar time was set to 'domain' and the input data consists of already partwise equidistant time stamps

New in RapidMiner Studio 9.7.2 (Aug 4, 2020)

  • Enhancements:
  • Send Mail once again allows multiple comma-separated recipients, and uses UTF-8 encoding again as opposed to UTF-16
  • Added a confirm dialog before repository folders get moved to avoid accidents while dragging the mouse
  • Improved loading of meta data in the repository tooltip
  • Bugfixes:
  • The one class LibSVM can now handle labels with more than one value
  • Fixed rare issue where a file browser might trigger a crash on startup
  • Fixed parameters being less tall than they used to be
  • Made submission of Auto Model processes to Server available for older Server versions again (9.3+)
  • Fixed CSV and XML import wizards not releasing files in some cases
  • Metadata of Join now matches the actual result
  • Fixed an issue that could sometimes cause the connection to the AI Hub repository or a project to drop after a while and show the error "authentication cancelled by user" when using Enterprise Login
  • Trying to connect to a project that already exists no longer destroys the connections inside the existing project
  • Fixed rare error when storing collections
  • Fixed meta data loading loop in auto model and model ops
  • Fixed problem that prevented URLs from being opened on Linux
  • The dates generated by Generate Sales Data are now all at 00:00:00.000 GMT

New in RapidMiner Studio 9.7.0 (Jun 3, 2020)

  • New Features:
  • Added versioned projects which are tied to RapidMiner Server. You can have as many versioned projects as you like, no limits! The versioning is backed by Git and can be accessed by any regular Git client. This means sharing between Python/R coders and RapidMiner users has never been easier!
  • Added dialog to select which version of a file to keep in case of a conflict in the versioned projects while getting Snapshots from Server. Versioning happens on a project level. As you can now have as many projects as you like, this is the most sensible behavior because most of the time many entries are interconnected in a project. Thus the entire state is saved and can be later restored, without having to worry about dependency versions.
  • Projects support ALL files you may have on your computer! You can put your .py scripts, your .md files, your .png files, your .pdf files, etc. all into a project. It will be neatly displayed in RapidMiner Studio.
  • Of course, all those files can be versioned together, so RapidMiner users and Python coders can share the same Git repository. The Python coders can even use their native Git client to do so, no magic required. This will make collaboration between RapidMiner users and Python coders easier than ever before!
  • Processes in versioned projects can also be run and scheduled on RapidMiner Server as they can for an existing Server central repository
  • All the files live locally on your computer, but are also shared via Git. This gives you the performance of a local repository when working with it during prototyping, but also allows for easy collaboration with your colleagues.
  • Added new panel "Snapshot History" which allows you to browse the history of your versioned projects, as well as see the changes you've made since the latest snapshot. It can also be used to restore an earlier state of the project, view past versions of individual files, and to restore those past versions.
  • ExampleSets are now written to disk in a new file format: HDF5. This is a well-established format used e.g. by NASA to store large amounts of data. This also means that Python and RapidMiner Studio can exchange data via HDF5 files much more easily and faster than ever before.
  • Local repositories that will be created with RapidMiner Studio 9.7 or later can also take advantage of supporting all files you may have on your computer (.py, .jpeg, .pdf, etc).
  • New operator Target Encoding which can remove nominal attributes with too many values and performs a target encoding (also known as mean encoding) on the remaining attributes
  • Auto Model: some processes (e.g. SVM, FLM, or weight calculations) now use the new Target Encoding instead of one-hot encoding which reduces memory usage and run times
  • Time Series: New operator Integrate to integrate time series with different methods (cumulative sum / left and right Riemann sum / trapezoidal rule)
  • Enhancements:
  • Both local repositories and versioned projects (tied to RM Server) have been completely rebuilt to get rid of many old limitations. Benefits include:
  • Enhanced throughput and performance
  • Better meta data caching
  • Concurrent access support
  • Displaying all files (no matter what they are, e.g. Python scripts, images, ...)
  • Allowing different file types (e.g. data, processes) and folders to share the same name
  • Note: Your existing local repositories have (Legacy) after their name, indicating they still run on the old technology and still have some of the limitations! If you create a new local repository, it will have (Local) after its name and have all the capabilities listed above. You can copy your data over via Studio from the old repository to a new one to migrate.
  • It is now possible to have a folder with the same name as a data entry in the repository (might not work for some old repositories)
  • It is now possible to have a process and a data entry with the same name in the repository (might not work for some old repositories)
  • Replaced Send Mail operator with new version which supports file attachments
  • Improved memory usage for Aggregate and Pivot operators for nominal columns with potentially a lot of unused values
  • Improved dealing with whitespaces in repository entry names
  • Improved cleanup of temp files, to reduce disk space clutter when Studio runs for a long time, i.e. in a Server environment
  • Made log tables in Result View behave more like other results, adding more actions and a shortcut to the context menu
  • Process background images are now using a relative path to the image if possible, instead of an absolute path. This only applies for background images set from now on, it does not work retroactively
  • For binominal attributes the Statistics tab shows the positive and the negative value
  • Renamed RapidMiner Server to RapidMiner AI Hub
  • Opening/Moving the Process panel into the foreground when opening a process while in the Design view to make it more obvious something happened
  • Auto Model: remote executions on Server require the central repository as storage location
  • Turbo Prep: only local file based repositories can now be used as temporary repositories for the handover to Auto Model
  • Model Ops: only local repositories or central Server repositories can be used as storage locations for deployed models (also known as "deployment location")
  • Model Ops: keep unused and ID columns in the results after scoring
  • The operators Explain Predictions and Model Simulator now also support grouped models where arbitrary models have been grouped instead of only preprocessing models
  • The operator Explain Predictions now offers a parameter to limit the number of important features also for the "importances" output
  • Time Series
  • Added options to use padding for Fast Fourier Transformation and calculate the frequency of the amplitude value.
  • Added the option to specify negative lags for the Lag operator
  • Added the option to specify a default lag for a set of attributes (selected by an attribute subset selector) to the Lag operator
  • Unfortunately, due to parameter key incompatibilities, the old version of the Lag operator is deprecated and a new version with the same name, but a different operator key, has been added.
  • H2O
  • Updated H2O library to version 3.30.0.1.
  • Added monotonicity constraints to Gradient Boosted Trees
  • Added weights port to Deep Learning
  • Expanded whitelist of accepted expert parameters, now supports all parameters provided by H2O
  • Deep Learning and Logistic Regression now work with datasets that have nominal columns with only one value
  • Bugfixes:
  • Fixed an issue that could cause Studio startup to never complete
  • Made Studio startup more rigid to quit process instead of silently hanging on the splash screen forever
  • Fixed issue that could cause panels to sometimes not open if they had been closed previously in this session
  • Fixed an issue that caused CTAs to not work when HTML5 safe mode was enabled
  • Fixed an issue with back propagation of changes to performance vectors
  • Fixed a problem for JDBC drivers that do not implement a certain set of functionality by adding a fallback (e.g. SQLite writing)
  • Fixed potential cause for complete UI freeze when interacting with a CTA notification banner
  • Fixed an issue with process navigation and property panel if operator names contain HTML
  • Generate Multi-Label Data now works correctly in non-regression mode
  • Fixed memory leak caused by the Visualizations
  • Fixed rare issue where data sets could not be downsampled automatically if license limit was exceeded
  • Fixed an issue in Automatic Feature Engineering if all input features have been nominal in the feature selection case
  • Fixed "Edit Access Rights" dialog for Server repositories not getting the permissions correctly when using Enterprise SSO
  • Fixed an issue that caused Studio to lag and increase memory consumption when using the right-click "Insert operator" popup menu in the Process panel.
  • Fixed broken replacing (instead it was duplicated) on move of data entries to a different repository
  • Auto Model: remote executions now show new submission screens which only allow the reset of Auto Model to load the results, which avoids problems with multiple remote submissions within the same session
  • Auto Model: reordering the columns in the column selection table no longer leads to graphics problems
  • Time Series: Fixed a bug in Extract Peaks that caused all "_position" features to have an offset of 1 to the Example number
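
The Target Encoding operator listed under New Features above is described as removing nominal attributes with too many values and mean-encoding the rest. As a rough illustration of that idea only (not the operator's actual implementation), the following Python sketch replaces each nominal value with the mean of a numeric target. The function name, the max_values threshold, and the sample columns are all hypothetical.

```python
from collections import defaultdict


def target_encode(rows, nominal_columns, target, max_values=3):
    """Mean-encode nominal columns; drop columns with more than max_values distinct values.

    Illustrative sketch only: names and the threshold are assumptions,
    not parameters of the RapidMiner operator.
    """
    encoded = [dict(row) for row in rows]  # work on copies, keep the input untouched
    for column in nominal_columns:
        distinct = {row[column] for row in rows}
        if len(distinct) > max_values:
            # Too many values: remove the column instead of encoding it.
            for row in encoded:
                del row[column]
            continue
        sums = defaultdict(float)
        counts = defaultdict(int)
        for row in rows:
            sums[row[column]] += row[target]
            counts[row[column]] += 1
        means = {value: sums[value] / counts[value] for value in distinct}
        for row in encoded:
            # Replace the nominal value with the mean target of its group.
            row[column] = means[row[column]]
    return encoded


rows = [
    {"city": "Berlin", "id_code": "a1", "churn": 1},
    {"city": "Berlin", "id_code": "b2", "churn": 0},
    {"city": "Boston", "id_code": "c3", "churn": 1},
    {"city": "Boston", "id_code": "d4", "churn": 1},
]
result = target_encode(rows, ["city", "id_code"], "churn", max_values=3)
print(result)
```

In this sample, id_code has four distinct values and is dropped, while city becomes a single numeric column. One-hot encoding would instead create one column per city, which is where the memory savings mentioned for Auto Model would come from.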

New in RapidMiner Studio 9.6.0 (Feb 26, 2020)

  • New Features:
  • Added buttons for copying/pasting the active process to the process toolbar. To make some room for it, removed the "Fit to size" button from the process toolbar (it is already in right-click menu)
  • Equalize Time Series
  • Added two new operators (Equalize Numerical Indices and Equalize Time Stamps) which provide the functionality to equalize input time series. The output time series will have new equidistant index values. The operators provide different possibilities to configure the number of examples, the start value and the stop value and the step size of the new index values. The corresponding values of the output time series are computed by using a Replace Missing Values (Series) operation.
  • Equalize Numerical Indices: Equalize numerical indices into equidistant numerical indices with a numerical step size.
  • Equalize Time Stamps: Equalize date-time indices into equidistant date-time indices. Either with an exact duration (with millisecond precision) as the step size, or with a period (multiple of days, weeks, months or years) as the step size.
  • Peak Transformations:
  • Added two new operators (Z-Score Peak Transformation and Highest Peak Transformation) which perform a peak detection and transformation on time series. They detect peaks in a time series and add an indicator peak series (with the values -1,0,1 as peak flag values) and a peaked series (original values if a peak was detected, missing for non-peak areas).
  • Z-Score Peak Transformation: performs the peak detection by calculating the local mean and standard deviation and identifies values as peaks when they have a large deviation to this local mean
  • Highest Peak Transformation: performs the peak detection by dividing the time series in different areas and checking if local minima and maxima are valid peaks or only noise effects.
  • Peak Feature Extraction:
  • New operator Extract Peaks which performs a peak detection (by utilizing one of the new Peak Transformation operators) and extracts features describing the peaks
  • Added optional custom endpoint parameter to Amazon S3 connections. This enables you to use an S3 API compatible storage service other than Amazon S3.
  • Deployments / Model Ops:
  • All custom prediction models are now supported in model ops, i.e. models created with the Design view, in addition to Auto Model models
  • Grouped models are now supported as well which allows combinations of preprocessing models with a prediction model
  • Model Simulator in Deployments now uses raw data columns as input and performs data prep on the fly
  • Offer setting if scores should be explained (about 100x faster without), new deployments will have this disabled per default, existing deployments enabled
  • Show if scores should be explained in overview table
  • Model Ops initialization happens in background now – no longer blocking UI start of RM if a remote location is not available (anymore)
  • Some speed improvements for model ops (fewer objects are loaded from repos which makes things a bit faster for remote deployments)
  • Model Simulator operator now also supports grouped models
  • Enhancements:
  • Connections to external data sources like Cassandra or MongoDB are now properly re-used (within reason) and closed when a process is finished. This should lead to fewer connections to an external data source when using loop constructs, as well as properly closed connections after a process is finished.
  • Windows and OS X builds now ship with OpenJDK (version 8u232)
  • Added new timezone parameter to JDBC connections. Note: date handling in databases (and generally) is a tricky subject, and there are quite a few ways to make mistakes while doing so. Some databases/JDBC drivers also don't implement date handling properly. Last but not least, keep in mind that a date_time/date is a fixed point in time, but when it is displayed in a more human readable format than "milliseconds since 01-01-1970 UTC", the display string is converting that instant to your display timezone. So even if for example a date is 13th of Jan in UTC, you may see 14th of Jan when viewing it in Australia, because the display timezone there is ahead of UTC. The actual point in time (milliseconds since 01-01-1970 UTC) however would be identical. See documentation for further information.
  • When parsing a string to time with Nominal to Date, the associated timestamp now represents that time on the 1st of January, 1970 instead of 1st of February 1970
  • Added Default User-Agent setting to Preferences / System
  • Updated MariaDB JDBC driver
  • You can now see which Java version is being used when looking at the "About" dialog
  • Improved meta data warning in case the time series attribute selection of time series operators is empty
  • Added option to autodetect S3 region in Amazon S3 connections
  • Improved Google Cloud Services connection UI
  • File chooser icons on OS X are now also supporting HiDPI
  • When removing a repository, the repository.xml file now gets updated immediately
  • Visualizations: Tick interval input field now allows setting much larger values for datetime axes, as it's using milliseconds as a unit to split the chunks
  • Updated the Step by Step In-Product Tutorial content
  • Added more search tags to various performance and aggregation operators
  • Improved error message when download/deserialization of data from a remote repository occurs
  • Improved error message when SSL certificate was invalid when attempting to connect to a RM Server repository.
  • Improved logging when trying to connect to a RM Server and unusual exceptions occur, e.g. more details about why SSL connection failed, what the network problem is, etc.
  • Bugfixes:
  • Fixed issue that could cause Studio to stop starting and be stuck at the splash screen forever.
  • Fixed an issue where storing datasets in a database using the automatically created primary key was not possible.
  • Declare Missing Value no longer crashes if the expression mode is selected and the expression itself returns a missing value. Instead, it will evaluate to false and thus NOT set a missing value for that row.
  • Fixed models and other IOObjects coming from extensions not being identified correctly in Server repositories.
  • Fixed Auto Model not being able to use results of a Join operator in some cases.
  • Fixed broken properties when storing data tables in rare cases.
  • It is no longer possible to create RapidMiner Server repositories with an invalid name.
  • Filter Examples now correctly resolves all macros in parameters, including in custom filter attribute names.
  • Fixed error that could sometimes prevent result tables from being moved to Auto Model via the button in the Results tab.
  • Fixed an issue that caused Visualizations to not appear on certain Linux systems.
  • Fixed file chooser icons on OS X.
  • Fixed bug for scoring in Deployments: if column types are incompatible, they are actually dropped now (which was documented as such but did not happen)
  • Auto Model will now be restored if the user cancels a deployment by closing the deployment dialog
  • Other:
  • It is no longer possible to create legacy connections and other connections which have been replaced with the new repository connection objects in RapidMiner 9.3. Existing connections can still be edited and used, but this functionality will be removed eventually as well. Make sure to migrate existing legacy connections to repository connection objects! See documentation for reference.
  • Development:
  • Added caching for connections based on ConnectionAdapterHandler to reduce connection count and give possibility to clean connections up after it is no longer needed (e.g. the process is finished).
  • GlobalSearch is no longer available in headless mode (aka command line, job container execution, etc)
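
The note on the new JDBC timezone parameter above points out that a date is a fixed instant and only its display string depends on the timezone. The Python sketch below illustrates this with one instant shown in UTC and in a hypothetical display zone five hours behind UTC. The epoch value and the offset are chosen purely for illustration and are unrelated to any RapidMiner setting.

```python
from datetime import datetime, timedelta, timezone

# One fixed instant, stored as milliseconds since 1970-01-01 UTC
# (2020-01-13 00:30 UTC, picked only for this example).
epoch_millis = 1578875400000

instant = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)

# Hypothetical display timezone: a fixed offset of UTC-5.
display_zone = timezone(timedelta(hours=-5))
local_view = instant.astimezone(display_zone)

shown_utc = instant.strftime("%Y-%m-%d %H:%M")
shown_local = local_view.strftime("%Y-%m-%d %H:%M")

# Converting for display does not change the underlying instant.
same_instant = local_view.timestamp() * 1000 == epoch_millis

print(shown_utc)     # calendar date as seen in UTC
print(shown_local)   # calendar date as seen in the display zone
print(same_instant)
```

The two printed dates fall on different calendar days even though both describe the same millisecond value, which is the situation the timezone parameter is meant to help with.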

New in RapidMiner Studio 9.5.0 (Nov 20, 2019)

  • New Features:
  • Added ability to upgrade RapidMiner Studio independently from Server. You can now connect to and access data and processes on older Server versions (9.0 and above) with any current or future Studio version! Processes and data are stored as-is on Server, which enables effective collaboration with your colleagues. However, you need to be aware that while you are able to store processes with brand-new operators on older Servers, you obviously can only run processes that consist of operators that the old Server knows about.
  • Deployments: Deployments can now be copied from one to another deployment location (for example from a test to a production server)
  • Enhancements:
  • Improved performance of Principal Component Analysis and Weight by PCA operators
  • The Import Data dialog now detects files with non-lowercase file extensions
  • Fixed view order for Deployments view
  • Visualizations: Fixed various issues that could cause Studio startup to fail
  • Visualizations: Fixed various issues that could cause them to not be displayed properly
  • Auto Model: Decision Tree and Random Forest are now using the latest (faster) implementations for regression problems
  • Auto Model: Increased the number of rows for which local explanations are turned on by default
  • Auto Model: Loading results from a folder now adds them to the result list as well
  • Auto Model: Shows the total number of feature sets and generated features on the overview as well if automatic feature engineering has been turned on
  • Auto Model: The performance tab now shows the gain calculations based on the confusion matrix instead of the predicted data set
  • Auto Model: New deploy button in the overview table for each model
  • Auto Model: Clicking a model in the overview table will show the details for the selected model
  • Auto Model: Prevent starting a deployment while another deployment is in progress
  • Turbo Prep: Nominal column handling is now consistent to the default behavior of Auto Model
  • Turbo Prep: Sort first join keys alphabetically instead of by ID-ness
  • Google Storage connection is now replaced by the more general Google Cloud Services connection that can connect to all supported Google services (Google Cloud Storage, Google BigQuery [requires In-Database Processing extension]). Just select the access scopes you want to use
  • Bugfixes:
  • Fixed all predictions being 0 in Transformed Regression when using no transformation and no z scale
  • Fixed the Import Data dialog failing when trying to read an XLSX file which did not have a lowercase file ending
  • Repository location chooser now opens as expected if the process is stored in a read-only repository
  • Visualizations: Exporting as PDF now also works without internet access
  • Visualizations: Fixed broken names in Czech Republic map
  • Time Series: Added error handling if the indices attribute was also selected as a time series attribute, or as the horizon attribute
  • Deployments: Fixed a bug which broke a local installation of Model Ops after the user connected to an existing remote location with email connection
  • Model simulator now works with date / time columns again
  • Development:
  • Improved exception handling when Belt tables cannot be converted to example sets

New in RapidMiner Studio 9.4.1 (Sep 26, 2019)

  • New Automated Model Ops:
  • Follow the fully automated data science path: prepare your data using Turbo Prep, create prediction models via Auto Model and finally put them into production with Model Ops.
  • Deploy the most promising models with one click and score new data via flexible web services or in the UI.
  • Track model performance on an intuitive dashboard and swap easily to the best performing one. Setup an email alert to get notified if a model outperforms the one in production.
  • Evaluate each model with respect to their financial impact instead of pure Data Science metrics.
  • Detect changes in data and their impact on model performance early to address problems.
  • Use our integrated dashboard to keep track of data drift and model performance.
  • New map visualizations:
  • Visualize geospatial data with the new map visualizations. You can choose from multiple map types with many different configuration options, as well as dozens of maps for geographic regions, continents, and countries. Available map types:
  • Choropleth maps: Used to display numeric values associated to regions (e.g. a country or a state) via a color gradient
  • Categorical maps: Used to visualize regions that belong to a number of distinct categories
  • Point maps: These maps offer latitude and longitude support and display a marker for each coordinate on the selected map
  • Improved Auto Model
  • Auto Model features several improvements under the hood as well as a few more visible enhancements:
  • All predictive processes generated by Auto Model are now much cleaner, well-structured, and can be understood way easier.
  • Cost-sensitive learning has been added to show the costs / benefits in the validation result. This allows to solve problems (e.g. fraud detection) that involve highly imbalanced data sets (e.g. credit card transaction data).
  • New data prep and modeling capabilities:
  • Several new operators have been added to ease and enhance data preparation and machine learning:
  • New operators Replace All Missings, Handle Unknown Values, One Hot Encoding and Append (Robust) to easily prepare data for modeling and scoring.
  • New operator Rescale Confidences (Logistic) to rescale confidences even for classification with more than two classes.
  • New operator Cost-Sensitive Scoring: Novel approach for cost-sensitive learning which works for more than two classes.
  • New operators Multi Label Modeling and Multi Label Performance to train and validate a combined model for multiple label columns in a single step.
  • Enhanced time series forecasting:
  • New operators have been added for
  • Forecasting multiple horizons of a time series with any machine learning model (Multi Horizon Forecast)
  • Validating performance of multi horizon forecasts (Multi Horizon Performance)
  • Sliding window validation for time series data science problems
  • Enhanced data source connection framework:
  • All RapidMiner-supported connectivity extensions on the Marketplace now use the new data source connection framework, which includes handling connections to
  • MongoDB
  • Cassandra
  • Splunk
  • Solr
  • Mozenda
  • Enhancements and bug fixes:
  • The following pages describe the enhancements and bug fixes in RapidMiner Studio 9.4.1 releases:

New in RapidMiner Studio 9.3 (Jun 17, 2019)

  • New Features:
  • Completely reworked how connections (JDBC, as well as any other connections like Twitter, Amazon S3, Dropbox, etc.) work:
  • Connections are now self-contained and stored per repository. This means that when you create a connection, everything you need to use it will become part of the connection entry in the repository.
  • We have added great flexibility when it comes to injecting certain settings of a connection on-the-fly by having added so called Sources for values. The settings can be anything from credentials, to URLs (or part of URLs), and other parameters. For starters, only Macro and RM Server Vault are available as Sources, but the list will grow over time as any extension can add their own Sources!
  • Have a central DB connection where each user should use his own credentials? Create the single connection template on RM Server, indicate that the credentials are injected, and then use our new RM Server Vault as a Source where each user can securely store their credentials!
  • You can now easily share a connection with your colleagues via a Server.
  • They will also work on any execution node, without you having to manually add the JDBC driver to all nodes yourself.
  • To sum it all up, connections are now vastly more powerful than before. They are no longer necessarily statically defined, but instead they can be dynamically altered during runtime to grab the latest credentials, tokens, etc. Of course, you can still put everything that is needed into the connection and be done with it.
  • Not all features of the new connections are accessible through a UI. For extremely advanced and powerful features like chaining different value providers for injection (e.g. Server Vault → CyberArk → DB) or using (injectable) placeholders that build up values of other keys, administrators can create the connection manually (it's a ZIP archive, after all). They can create the configuration JSON to suit their needs, and then upload the ZIP to RM Server. This, together with the injection mechanism, makes connection templates a reality, allows admins to manage connections at scale, utilizing commandline tools to build up and distribute them.
  • The entire mechanism for connections and their Sources is highly extensible and new Sources and connection types can easily be added by extensions. We foresee a whole host of new connections and Sources to become available over the next few months.
  • Auto Model can now be executed on RapidMiner Server instead of locally
  • Users can select if the execution should happen locally in RapidMiner Studio or if processes should be pushed to a connected Server instead. The latter allows to close RapidMiner Studio and fetch the results later from the Server instance.
  • Jobs can be added to any queue the user has access to.
  • Results will be stored on the Server and can be loaded back into Auto Model after completion. Loading of partial results is supported as well.
  • If RapidMiner Studio is kept open while the execution happens on the Server, results will be loaded dynamically, and the progress is shown. The execution of all remote processes can also be stopped in this case.
  • Time Series Analysis features:
  • New Default Forecast Model
  • Always predicts the same forecast value for all future values
  • Can be used as a baseline model to compare other forecast models against it
  • New operator Default Forecast:
  • Trains a Default Forecast Model
  • The forecast value can be calculated by last value, mean in window, median in window or mode in window
  • Last value and mode in window can be used to even create a forecast model for nominal time series
  • New Function and Seasonal Forecast Model
  • Predicts future values by evaluating a polynomial function to forecast the trend of a time series
  • Adds or multiplies the values of the seasonal component to the forecasted trend values
  • New operator Function and Seasonal Component Forecast
  • Trains a Function and Seasonal Forecast Model
  • The operator performs a decomposition (Classic or STL Decomposition) to determine trend and seasonal component of the input time series
  • A polynomial function is fitted to the trend component
  • The function and the seasonal component are provided as the Function and Seasonal Forecast Model to the model output port
  • New operator Autocorrelation / Autocovariance
  • Calculates dependency functions (autocorrelation function, autocovariance, partial autocorrelation function) for an input time series
  • Enhancements:
  • Write Excel now supports creating multiple sheets. Sheet names can be specified via the sheet names parameter
  • Write Excel now supports collections of example sets as input
  • Added Close all other results action for Result tabs, found in the right-click popup menu
  • Improved handling of mandatory parameters that were not set
  • Meta data from repository entries loaded by Retrieve operator are annotated with the repository location
  • Added forward macro checkbox to Schedule Process which allows you to forward all current macros from the calling process to the scheduled process
  • Write Database now defaults to a batch size of 100
  • The operators Map, Replace and Rename by Replacing now have a more convenient regex dialog that can store the replacement value as well
  • Added new function under Advanced functions named attribute(Nominal attribute_name) to the expression parser. This function evaluates the input and retrieves the value of the attribute with the name specified by the (resolved) input.
  • Added a new option Insert as attribute for inserting macros in the UI of the expression parser (e.g. for Generate Attributes).
  • Improved meta data for Nominal to Binomial for attributes where the nominal mapping is not clearly defined
  • Explain Predictions now offers the calculation of model-specific global weights based on the level of support and contradiction each attribute value contributes to the local explanations
  • Turbo Prep now uses the new visualizations for its Charts view
  • Auto Model now tracks more run times, including the time needed for scoring 1,000 rows and training the model on 1,000 rows in addition to the total process execution run time. The overview table also shows small badges pointing out the best and fastest models
  • Auto Model now offers to save all results at the end of a local execution. Those results can be loaded instead of re-running the modeling
  • Auto Model now offers a list of recent data sets as well as a list of recent results as part of the first step
  • Auto Model now offers to override the selection of columns for text processing
  • Auto Model now shows the number of created models, the number of evaluated feature sets, and the number of generated features during a run
  • Auto Model now shows the importance of all attributes for each model in addition to the model-independent global weights in the General section of the results
  • Visualizations: Bubble charts (Scatter with a size column) can now display more than 5,000 data points
  • Visualizations: Scatter3D now also supports a numerical color column
  • Visualizations: Scatter Matrix now also supports a numerical or date_time color column
  • Visualizations: Added the highly requested color group option to line/bar/column/area/streamgraph plots. Each distinct value in this column becomes an individual plot element, to allow for easy logical grouping of data without pivoting. The column can be of any type.
  • Visualizations: Aggregation group-by now also supports numerical columns, it will take each distinct number and convert it to a category
  • Visualizations: If the group-by column is numerical or date-time, the groups are now sorted in ascending order
  • Visualizations: X-Axis column and aggregation group-by column are now linked, i.e. changing one also changes the other. This makes switching between aggregation/no aggregation more intuitive and easier to follow
  • Moving Average Filter now offers to specify the left and right side of the simple filter individually instead of being symmetric
  • Improved operator help for Loop Examples
  • Added a positive class parameter to Performance (Binominal Classification) which lets the user manually decide what the positive class is.
  • Visualizations: Heatmaps with aggregation enabled can now also be grouped by two columns at the same time, resulting in a 2D table-like structure with cells for each value combination of the two group-by columns. If you want to plot multiple value columns, you can still group by a single column as before.
  • Copied/pasted operators that have references to other copied operators will now correctly update their parameters.
  • When replacing an operator in place, parameters that are shared between both operators are kept.
  • Repository entry copies are now simply enumerated at the end of their name instead of suddenly starting with "Copy of". This will make finding the copy in large repositories much more straightforward
  • Repository entries can now be directly copied in-place without having to select a target folder first
  • Updated default Oracle jdbc driver class
  • Bugfixes:
  • Fixed a rare bug in Log operator where a process seemingly was not stopping when it was done
  • Fixed a rare bug that could freeze the UI
  • Switching tabs is now only possible with a left-click
  • Fixed schema retrieval in the parameters for some databases (e.g. MySQL)
  • Fixed rare exception in automatic sparsity detection when creating example sets via the new data core
  • Fixed error that could occur when starting Studio in relation to Academy Global Search entries
  • Fixed error message display in expression property dialog for very long errors
  • Fixed Real to Integer when encountering infinity values
  • Fixed a bug in Compare ROC that deleted prediction/confidence columns in the input example set in some cases
  • Fixed handling of non-finite values for integer and real column grouping attributes in Pivot
  • Fixed UI becoming broken when the macro sort order in the Context panel was changed, an empty macro was already in the context, and the user tried adding another macro
  • Fixed a problem that could result in Studio endlessly starting when switching between Win32 and Win64 versions on the same machine
  • Fixed links to educational materials in Auto Model and Turbo Prep
  • Fixed rare bug which could occur for Automatic Feature Engineering if feature generation was enabled with high complexity settings in combination with H2O models
  • Visualizations: OS X 10.11 will now have working HTML5 visualizations again
  • Visualizations: Fixed matrix data (e.g. correlation matrix) visualizations showing the wrong chart type
  • Visualizations: Fixed Scatter3D dots sometimes not being displayed
  • Fixed rare cases in which no correct Exception was thrown in Extract Aggregates, Extract Mode and Extract Coefficients (Polynomial Fit)
  • Fixed expected input for the inner 'model' port of Forecast Validation
  • Fixed run-time problems in Replace Missing Values (Series)
  • Fixed the Retrieve operator to update output meta data after a repository entry was removed or created
  • Remove Unused Values now also sorts mappings that do not have unused values
  • Link button icons no longer look pixelated on macOS
  • Visualizations: Wordcloud now takes actual number of distinct words into account for the limit check, instead of also counting words that do not actually occur
  • Dialog about cancelling Progress threads with dependent tasks is now shown in front of the Progress dialog
  • It can no longer happen that Progress threads are still displayed in the Progress dialog even if they are already done
  • Development:
  • Added SwingTools#setPrompt(String, JTextComponent) method which can be used to set a prompt in a text field (gray help text displayed when the field is empty)
  • Added com.rapidminer.gui.actions.CopyStringToClipboardAction which can be used to copy any dynamically supplied string to the system clipboard
  • Added com.rapidminer.gui.ProgressThread#setDependencyPopups method to prevent popups that ask about aborting Progress threads with dependent tasks
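
The Moving Average Filter entry in this release describes a simple filter whose left and right extents can be set separately. The following is a minimal sketch of that idea in plain Java, not the operator's implementation. The parameter names `left` and `right`, and the choice to truncate the window at the edges of the series, are assumptions made for illustration.

```java
import java.util.Arrays;

final class AsymmetricMovingAverage {

    /**
     * Averages each value with up to `left` predecessors and `right` successors.
     * Near the edges the window is truncated to the values that exist
     * (an assumption; the operator may handle edges differently).
     */
    static double[] filter(double[] series, int left, int right) {
        if (left < 0 || right < 0) {
            throw new IllegalArgumentException("window sizes must be non-negative");
        }
        double[] out = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            int from = Math.max(0, i - left);
            int to = Math.min(series.length - 1, i + right);
            double sum = 0.0;
            for (int j = from; j <= to; j++) {
                sum += series[j];
            }
            out[i] = sum / (to - from + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5};
        // Trailing window: two values to the left, none to the right.
        System.out.println(Arrays.toString(filter(series, 2, 0)));
        // Symmetric window for comparison: one value on each side.
        System.out.println(Arrays.toString(filter(series, 1, 1)));
    }
}
```

Setting `right` to 0 gives a purely trailing average, which avoids using future values when the series is processed in time order.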

New in RapidMiner Studio 9.2.1 (Mar 19, 2019)

  • New Features:
  • Converted old simple charts of the following data types to the new HTML5 visualizations: Weights, Kernel Models, Correlation Matrices, and Rainflow Matrices.
  • Added RapidMiner Academy learning content to the Global Search.
  • Enhancements:
  • Removed unnecessary process validations while editing operator parameters
  • Visualizations: Histograms now support datetime columns
  • Visualizations: Treemap no longer forces a name column (although it obviously makes a lot of sense using one)
  • Visualizations: Boxplot now has a y-axis description if Group by is used
  • Visualizations: Boxplots can now only support 100 value columns at the same time, down from 500 (which was unreadable)
  • Visualizations: Large heatmaps (> 10,000 values) now render considerably faster and are less sluggish. If you plot more than 1 million values on a heatmap, it will however still take a while. Note that this means that large heatmaps can no longer be exported as an SVG (it will be an image instead).
  • Visualizations: Heatmaps can now support 500 value columns at the same time, up from 400.
  • Visualizations: Linear regression lines in Scatter plots now show their linear function in the tooltip
  • Visualizations: Improved some heuristics behind the automated chart selection when opening data for the first time
  • Visualizations: Fixed missing chart update animations for some settings
  • Visualizations: Added reset button to text fields with custom input (e.g. axis min/max values) which resets the field back to its default value
  • Visualizations: Should now work on vanilla Ubuntu 18.04 out of the box, without having to install "libgconf-2-4"
  • Visualizations: Fixed possible freeze of the chart when looking at tooltips of a Vector plot
  • Visualizations: Attribute values can be a bit longer now before they are cut off
  • Bugfixes:
  • Fixed slow popup dialog for selecting attribute subset and user defined attribute ordering
  • Fixed deadlock in statistics calculation in case of 6 or more result sets being opened at the same time
  • Fixed some memory leaks
  • Fixed Apply Feature Set returning wrong meta data
  • Fixed an issue where some User Errors did not appear correctly
  • Fixed an error that sometimes appeared when stopping parallel operators, e.g. Optimize (Grid)
  • Fixed an issue where encrypted settings could not be read
  • Fixed an issue where operator notes were not moved when the operator was pushed out of the way
  • Fixed a crash issue with Print/Export Image function on macOS
  • Fixed a rare bug where CTAs would flicker in a multi monitor setup with Studio showing in a monitor above the primary monitor
  • Fixed meta data for integer or date-time column grouping attributes in Pivot
  • Time Series: Replace Missing Values (Series): Fixed the case where 'replace infinity' is true and 'skip other missing' is false, in which empty strings or infinity values were replaced with missings instead of being kept.
  • Time Series: Extract Coefficients (Polynomial Fit): Fixed missing meta data information of the indices attribute for the fitted output port
  • Time Series: Process Windows: Fixed name of Window ID attribute in meta data
  • Visualizations: Fixed division by zero error in Histogram plots for histograms on columns with just a single value
  • Visualizations: Fixed jitter not doing anything if only a single distinct value existed in the data for numerical x/y columns
  • Visualizations: Fixed rare error when trying to display a linear regression line on weird data sets

New in RapidMiner Studio 9.2.0 (Mar 19, 2019)

  • Bugfixes:
  • Cross Validation now applies Bessel's correction on the performance variance and standard deviation.
  • Connecting operators in an infinite loop no longer freezes RapidMiner Studio.
  • Fixed unhelpful error message: "Error while training the H2O model: {0}"
  • Fixed a rare bug in Log operator where a process seemingly was not stopping when it was done.
  • Fixed cause for sliders sometimes looking a bit broken.
  • Fixed rare bug in feature set navigator in Auto Model which could lead to misaligned plots and tables
  • Fixed rare bug in Automatic Feature Extraction which could lead to a wrong selection of final feature set
  • Fixed bug for data sets from read-only repositories shown in the results view and opened in Auto Model
  • Time Series
  • Fixed calculation of first quartile, median and third quartile in Extract Aggregates
  • Fixed a bug for all attributes selection when a filter type is selected which checks all Examples individually.
  • Fixed a bug in Apply Forecast operator, in case it was executed inside a parallel operator.
  • Fixed a bug for Windowing and Process Windows in case parameters were wrongly configured
  • Fixed Cross Validation returning the test example set with duplicate rows if multiple performance vectors were connected inside the cross validation. This did not affect any performance metrics.
  • Development:
  • Added utility class PersistentContentMapperStore. This class can be used to store arbitrary information in the local user cache. This can be used to store configurations of results for repository objects, or even things identified via a hash. Example usage of this are the HTML5 charts which save their configuration that way.
  • Added utility class ColorChooserUtilities for opening a HSL color chooser
  • Added DistinctColorSlider and LinearGradientColorSlider UI components where users can select and change a list of distinct colors / linear color gradients conveniently
  • Added ExtendedJListTransferHandler class that allows re-ordering inside a JList via drag&drop
  • Added new interface CleanupRequiringComponent which GUI result components can use to indicate they need to clean something up after a result has been closed by the user. It is called whenever a result tab has been closed.
  • Added "BETA" tag support for result visualization cards (the cards on the left shown when viewing results in the Results view). Add a gui.cards.I18N_KEY.beta = true flag to your i18n properties to indicate a result renderer as a Beta version.
  • Packages com.rapidminer.gui.plotter and com.rapidminer.gui.new_plotter were deprecated and will be removed in the future.
  • New Features:
  • Replaced old charts and advanced charts with new, powerful HTML5 visualizations. There are lots of new plot types and capabilities to explore! Main features:
  • New chart types: Step Line, Spline, Area, Step Area, Spline Area, Range (Line, Step, Column, Errorbars), Streamgraph, Bellcurve, Funnel, Pyramid, Heatmap, Treemap, Sankey, Packed Bubble, Vector, Wordcloud
  • Enhanced existing chart types and new ones with features like multi-attribute selection, grouping, stacking options, inversion, and displaying as a radar chart (for select charts)
  • Multiple y axes supported
  • Added plotline support (annotated marker lines on x/y/z axes)
  • Chart configurations are now automatically saved. You configure a chart for your data set, close Studio, come back the next day, and when you look at the data again, the same chart you configured will be there again!
  • Some plots can be combined with other plots. You can add as many of those combinable plots to a single chart as you want!
  • Allows you to quickly select the basic settings to get started, but also to fine-tune even minor chart details
  • Have multiple series in a single chart (e.g. something grouped by labels)? Try hovering over and clicking the legend items to highlight and hide the respective series!
  • Auto Model:
  • Added support for textual data
  • Added feature selection for clustering
  • Added Fast Large Margin and Multiclass Logistic Regression learners
  • Improved feature extraction from dates (calculate all pairwise differences and differences to today)
  • Added predictions vs. label chart for regression
  • Added correlation as performance criterion for regression
  • Explain predictions is now optional in Auto Model and is only automatically activated for smaller data sets
  • Significantly improved runtimes of Auto Model for larger data sets
  • New text analysis operator for feature extraction for text, adding sentiments, and language detection: Text Vectorization
  • New operator for assigning batch numbers to data rows: Generate Batch
  • Introducing the new Create ExampleSet Operator to create example sets from functions, numbers, dates, etc for quick prototyping
  • Cloud Connectivity:
  • Added connectivity to Azure Data Lake Storage (Gen 1):
  • Read Azure Data Lake Storage
  • Loop Azure Data Lake Storage
  • Write Azure Data Lake Storage
  • Time Series:
  • New Operator: Extract Coefficients (Polynomial Fit)
  • It fits a polynomial function to the time series and provides coefficients and (if selected) the discrepancy as features
  • It also provides the fitted function evaluated on the index values of the time series on an additional output port
  • New Operator: Exponential Smoothing
  • It smooths a time series by a factor alpha
  • New Operator: Lag
  • It lags (moves) time series attributes relative to each other
  • Enhancements:
  • Improved CPU utilization of parallel processes (e.g. when using nested Loops).
  • Pre-run check and better error descriptions for Filter Examples wrong and correct predictions
  • Attribute selection dialogs and comboboxes now display the type (numeric, nominal, date_time) of the attribute
  • Attribute selection dialogs now properly sort the available attributes on the left in a human-readable way
  • Improved meta data generation and propagation for several source operators
  • Combobox popups are now as wide as their content needs them to be, regardless of the actual combobox width. This can look a bit funny sometimes, but it's much more useful to be able to actually read the contents than go for nicer looks.
  • Better information in Auto Model for cases and settings where longer runtimes can be expected
  • Dialogs opened by extensions are no longer displaying a warning icon next to them
  • Changed style of tutorials to incorporate RapidMiner Academy
  • Improved default parameters for Gradient Boosted Trees
  • All "Legacy Result Access" operators are now deprecated, existing processes that are still using these operators will continue to work. Please use the operators Store and Retrieve in future processes:
  • Use Retrieve instead of Read Model, Read Clustering, Read Weights, Read Constructions, Read Performance, Read Parameters, Read Threshold and Read.
  • Use Store instead of Write Model, Write Clustering, Write Weights, Write Constructions, Write Performance, Write Parameters, Write Threshold and Write.
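
The Cross Validation fix listed under Bugfixes in this release refers to Bessel's correction: when variance is estimated from a sample of n values, the sum of squared deviations is divided by n - 1 instead of n. A small illustration in plain Java follows. The class name and the sample values are made up, and this is not RapidMiner's code.

```java
final class FoldVariance {

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    private static double sumOfSquaredDeviations(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq;
    }

    /** Uncorrected variance: divides by n. */
    static double populationVariance(double[] values) {
        if (values.length < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return sumOfSquaredDeviations(values) / values.length;
    }

    /** Variance with Bessel's correction: divides by n - 1. */
    static double sampleVariance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        return sumOfSquaredDeviations(values) / (values.length - 1);
    }

    public static void main(String[] args) {
        // Hypothetical per-fold accuracies from a 5-fold validation.
        double[] folds = {0.80, 0.82, 0.78, 0.84, 0.76};
        System.out.println("uncorrected: " + populationVariance(folds));
        System.out.println("corrected:   " + sampleVariance(folds));
        System.out.println("corrected standard deviation: " + Math.sqrt(sampleVariance(folds)));
    }
}
```

With only a handful of folds the difference is noticeable: for n = 5 the corrected variance is 25% larger than the uncorrected one.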

New in RapidMiner Studio 9.1.0 (Dec 14, 2018)

  • New Features:
  • The Aggregate Operator got the percentile function where the percentile can be changed in the aggregation attributes functions list. It is possible to use an integer like 75 or a floating point value like 80.5 here. It is of course also possible to use a macro here.
  • Split the setting to keep operators connected upon disabling or deleting them into these settings:
  • Drop or bridge operator connections upon deletion
  • Drop, bridge or keep connections upon disabling
  • SSL certificates stored in .RapidMiner/cacert are now trusted on startup. See trust-certificates for more information.
  • Added support to open operator tutorial processes directly from the web.
  • Enhancements:
  • The "Import Data" dialog for CSV files will try to guess the best matching date format and preselect date for attributes that contain mostly matching date entries
  • The "Import Data" dialog for Excel files now differentiates between date, time and datetime columns specified in Excel
  • Improved CSV import wizard to use the structure found in the header or starting row
  • Parse Numbers and the Data Import wizards now support exponents in numbers with a leading '+' for positive exponents, e.g. "5.9876E+7"
  • Improved Cross Validation error handling when the Performance port is not connected
  • The XML Panel no longer hides default values
  • Split thread settings in foreground and background threads (for the currently opened process and processes running in the background, respectively)
  • Updated bundled Java for Windows and OS X to version 8u181. This should fix right-click issues on OS X
  • Added support for aggregation functions for Pivot operator and improved performance
  • When moving operators in the Process view, connected operators will be rearranged and moved to the right if necessary
  • Bugfixes:
  • For large ExampleSets with more than ~71.5 million rows, the result table will compress the height of each row a bit to accommodate. Data sets with more than ~86 million rows will only display the first ~86 million rows and show a warning that the rest is cut off.
  • Fixed an issue that could cause Studio to be stuck for up to ~2 minutes on start-up.
  • Fixed very rare process error when working with attribute weights.
  • X-Means item count of cluster model will now show the correct size.
  • Fixed an issue where (temporary) Access files could not be deleted in a RapidMiner process.
  • Development:
  • Added registerLanguage method to the I18N class, which allows adding new languages to the Settings->Preferences->Language selection. The i18n is picked up by providing resource bundles in the usual form, for example GUI_ja.properties and Error_ja.properties. If you want to get a list of not-yet-translated keys, add a file called translation_help.txt in your .RapidMiner folder. After you shut down Studio with your new language selected, it will write all keys for which it did not find a translation to that file. This should help you identify keys that you still need to translate.
  • Added the OperatorPortActionRegistry to add actions to operator ports.
  • Added identifier for last delivering port to the IOObject's userdata via IOObject.getUserData(DeliveringPortManager.LAST_DELIVERING_PORT)
  • Added support for parameter dependencies and hidden state to the settings dialog.
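
For reference, the exponent notation mentioned in the Parse Numbers enhancement of this release denotes the same value with or without the explicit plus sign. The sketch below uses the standard `Double.parseDouble` purely as a stand-in; the changelog does not say which parser the operator or the import wizards use, and the class name is illustrative.

```java
final class ExponentNotation {

    /** Parses a decimal string that may carry an exponent such as "E7" or "E+7". */
    static double parse(String text) {
        return Double.parseDouble(text.trim());
    }

    public static void main(String[] args) {
        double withPlus = parse("5.9876E+7");
        double withoutPlus = parse("5.9876E7");
        // Both spellings describe the same number.
        System.out.println(withPlus == withoutPlus);
    }
}
```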

New in RapidMiner Studio 9.0.3 (Oct 4, 2018)

  • Enhancements:
  • The Windows installer now automatically goes to the Finish page after all files have been copied instead of waiting for the user to click "Next".
  • Bugfixes:
  • Performance (Ranking) and Performance (Costs) no longer report a wrong micro average when used inside a Cross Validation.
  • Stacking now works inside other Ensemble operators.
  • Fixed Outlier Detection in Auto Model.
  • Bugfix for some Time Series operators (notably Process Windows) which did not reset the data from the input port after the operators were executed.
  • Bugfix for Windowing, in case the attribute selection results in no attributes selected.
  • Fixed location of some dialogs in multi-screen setups.

New in RapidMiner Studio 9.0.2 (Sep 6, 2018)

  • Enhancements:
  • Improved user interface responsiveness when running processes
  • Changed progress bar progression for updates to better reflect the actual update process when downloading from the Marketplace
  • Improved default parameters for K-Means, K-Means (fast), K-Means (kernel), X-Means, K-nn, Parallel Decision Tree, ID3, CHAID, Parallel Random Forest, Gradient Boosted Trees, Neural Net and Join to better reflect commonly used values
  • Fixed a problem where entering the wrong credentials to connect to a remote repository could take a long time to ask for new credentials
  • Improved password input dialog to show that the credentials were invalid or something else went wrong
  • Improved configuration of remote repository when editing the repository
  • Viewing Collection results no longer display lots of wasted space on the left side
  • Restore Default View is now only available for the Design and Result view
  • Enabled "Show location of current Process" for Training Resources
  • Bugfixes:
  • Fixed a bug in which the process panel becomes invisible.
  • Fixed a bug where the process panel was displayed only partially
  • Fixed possible crash on startup on Windows
  • Date to Numerical now produces a Real attribute instead of Integer to prevent truncation
  • Fixed behavior of the Unify Item Sets
  • Fixed bug in Join when using date-time attributes as key
  • Fixed bug with K-Means (fast) causing Determine good start values parameter to be ignored
  • Fixed a bug that occurred when a process was opened from a file system path that contains more than 150 characters
  • Fixed an issue that prevented Studio from starting
  • Fixed process background image location when zooming in
  • Fixed potential infinite loop with K-Means and X-Means if Determine good start values was used.
  • Fixed a bug in time series operators regarding parameter misbehavior

New in RapidMiner Studio 9.0.1 (Aug 16, 2018)

  • Enhancements:
  • It is no longer possible to close or hide panels in RapidMiner Studio by accidentally pressing certain obscure key-commands. Panel manipulation can now be solely done via right-click on the panel header. Note that you can still press Ctrl-W to close result tabs.
  • Optimized educational & community repository to remove UI freezes.
  • When an operator has no parameters, that information is now displayed in the Parameters panel instead of just showing a completely empty panel.
  • Handle view switch errors more gracefully.
  • Bugfixes:
  • Added runtime check for Loop operators to require at least one iteration
  • Fixed roles bug in X-Means
  • The Configure RapidMiner Server Repository Check connection settings function no longer gives false-positive results when valid RapidMiner Server credentials exist in the Password Manager.
  • Fixed potential UI freezes when switching views due to breakpoints etc.
  • Fixed sometimes missing notification in the event of upload errors to RM Server
  • Fixed icon on operators depicting hidden notes when zoomed

New in RapidMiner Studio 9.0.0 (Aug 7, 2018)

  • New Features:
  • Added TurboPrep, your interactive data preparation in a data-centric UI
  • Added a new "admin configuration" feature
  • Operator Blacklisting
  • Extension Whitelisting
  • Telemetry
  • Studio Settings
  • Added new Time Series functionality
  • Added support for Google Cloud Storage with Read Google Storage, Write Google Storage, and Loop Google Storage operators. They work similarly to their existing Amazon S3 and Azure Blob Storage counterparts.
  • Added new online repositories which contain up-to-date help content. These contents are used by our online educational materials.
  • Added concatenation function to Generate Aggregation
  • Enhancements:
  • Global Search results can now be navigated by keyboard
  • Operators can now be renamed by double-clicking on their name (indicated by a text cursor)
  • Improved operator renaming visuals when zoomed in/out of the process
  • Process panel in Design view can no longer be closed
  • Updated behavior for Result History panel outside of Result view
  • Uncloseable panels no longer have close buttons
  • Updated import wizards for Read CSV and Read Excel operators to make them consistent with the Add Data repository action
  • Added Remove All Breakpoints entry to Edit menu and right click context menus
  • A warning is shown for correlation matrices that could not be calculated
  • Improved the guessing for type of Quotes during CSV import
  • Improved the guessing on decimal separator in CSV import
  • Twitter operators now correctly warn about the rate limit when it is exceeded instead of throwing a generic error
  • Hyperlinks in process notes are now clickable and open the default browser
  • Repository actions that need write access are now grayed out when a read-only entry is selected
  • Inserting an operator via Global Search will now correctly grant focus to the Process panel, so you can immediately use the keyboard to manipulate the operator
  • Added workaround for a bug in the Amazon Redshift JDBC driver so that it can be used now
  • Saving a process in a read-only repository now offers the SaveAs dialog instead
  • Repository location chooser (for opening and for saving) no longer sometimes appears as a separate instance of RM Studio in the operating system taskbar
  • Bugfixes:
  • Clicking on a selected operator no longer sometimes selects an operator behind it
  • Fixed process panel sometimes being opened in other views
  • Fixed an issue where icons did not show up on Retina displays
  • Updated vulnerable libraries
  • Fixed potential UI freeze during the Import Data process
  • A rare error concerning parallel loops in combination with Generate Attributes was fixed
  • Fixed an issue where RapidMiner Studio always started in fullscreen mode on Mac OS X
  • Fixed results view not showing the latest result as the active tab
  • Development:
  • Added callback hook for DataImportWizardBuilder. The caller can use the callback to determine what should happen after the user has concluded the data import.

New in RapidMiner Studio 8.2.1.0 (Jun 28, 2018)

  • New Features:
  • Added possibility to disconnect from RapidMiner Server repositories
  • Enhancements:
  • Edit Access Rights dialog is now read-only if the user does not have enough permissions to make changes
  • The Generate Weight Stratification now warns about mismatching data
  • Updated tutorial process for Loop Attributes
  • Bugfixes:
  • Fixed broken preview when using the Guess value types or Reload data buttons in the Import Configuration Wizard of the Read Excel and Read CSV operators, after manually changing the attribute selection or an attribute role.
  • Fixed a metadata problem with the Singular Value Decomposition operator showing the wrong type of preprocessing model.
  • Fixed a bug causing Aggregate to concatenate the same value multiple times even though only distinct was set.
  • It is no longer possible to toggle breakpoints if Process panel is not visible.
  • Write CSV is no longer writing Integer values as floating points.
  • Updated mode aggregation function of Aggregate to take missing values into account.
  • Remember can now be used in every iteration of a parallel operator, instead of only the last. No execution order is guaranteed.
  • The New Revision server repository action no longer blocks the UI.
  • Fixed bug preventing SVM Kernel Scatter Plot from displaying certain variables.
  • The macro command line argument -M now works as expected when passed to the rapidminer-batch.bat launcher.
  • Fixed rare bug that could occur when looking at a subprocess of a parallel operator while zoomed out and trying to run the process.
  • Fixed pass through port of the Correlation Matrix operator (returned a subset of the input for some data sets).
  • Fixed missing visual indicator in the top bar for the currently selected view when resizing RM Studio horizontally.
  • Fixed spelling error in Direct Marketing template.
  • Fixed spelling error for mikro/makro.
  • Fixed a problem using undo/redo during a tutorial.
  • Fixed a rare bug that might occur on restoring a process on startup.
  • Fixed uncommon bug where Views will break when switching too fast between them.
  • Fixed bug making Apply Threshold use the wrong mapping.

New in RapidMiner Studio 8.2.0.0 (May 8, 2018)

  • Enhancements:
  • Double-click on an unconnected operator port will connect it to a matching output port of the process.
  • The menu View -> Show Panel is now scrollable.
  • Updated visualization of tutorial's next button to go to next tutorial or back to tutorial overview when reaching the end of a tutorial or a chapter respectively.
  • Removed search button from search bar and changed result dialog to open with one-click logic.
  • Creating a RapidMiner Server repository no longer stores the credentials automatically. However, if desired you can still do so by selecting the "Remember Password" checkbox when creating the repository.
  • Panels now always have proper tooltips.
  • Improved visualization of nested Operators.
  • Added primary parameter mechanic to some Operators; double clicking an Operator now opens the editor of a primary parameter. This also works for operators that have subprocesses. In that case, pressing the Alt-key while double-clicking activates the primary parameter.
  • Quickfixes now can be directly accessed after a process run fails from the error bubble.
  • Improved performance of FP Growth and added support for additional input formats.
  • The status bar (found at the very bottom of RM Studio) now more precisely displays possible actions when editing a process.
  • Pressing the arrow keys in the process panel when no operators are selected will now select the first operator.
  • Bugfixes:
  • Parallel operators now produce identical results when running in parallel and when running sequentially
  • Removed several sources for redundant undo steps
  • Fixed a bug that could lead to incomplete output of Execute Program
  • Fixed and improved on generic process runtime errors
  • Fixed erratic behaviour of EMClusterer
  • Date to Nominal no longer removes the role of the selected attribute
  • Fixed a bug where results from Data to Similarity Data could not be processed further
  • Fixed an issue that could result in the "Drag here" annotation being shown in the process all the time when using the Global Search
  • Fixed a bug that allowed operators to connect to themselves
  • Fixed Web Analytics template

New in RapidMiner Studio 8.1.3.0 (Apr 19, 2018)

  • Enhancements:
  • Added feedback form to final step of each tutorial to help us improve the tutorials.
  • Added feedback form to operator help panel to help us improve the documentation.
  • Scrolling when multiple scrollable areas are nested within each other now works as expected.
  • Creating a RapidMiner Server repository no longer stores the credentials automatically. However, if desired you can still do so by selecting the "Remember Password" checkbox when creating the repository.
  • Bugfixes:
  • Fixed concurrent access of repository entries from RM Server (e.g. inside a parallel Loop operator).
  • The choice to downsample data if the data has more rows than the current license permits now works correctly inside parallel operators.
  • Typing attribute names in a field which has auto-completion no longer causes the field to lose focus mid-typing.
  • Fixed a bug with Aggregate that could crash a process.
  • Fixed a very rare bug that crashed the process when using the old and no longer supported Tree panel.
  • Fixed a problem where one could accidentally create connection loops when dragging an operator.
  • Fixed problem with Mac OSX fullscreen mode.
  • Fixed refresh issue in the Repository Panel.
  • Fixed an issue that could cause endless dialogs asking for login credentials when connecting to RM Server.
  • Development:
  • Fixed a crash that occurred when using NominalToNumericModel programmatically.

New in RapidMiner Studio 8.1.1.0 (Mar 7, 2018)

  • Enhancements:
  • Deleting a RapidMiner Server repository now removes its credentials from the wallet.
  • Whitespaces at the beginning or end of attribute names are now automatically discarded for newly imported data to prevent invalid attribute names.
  • Bugfixes:
  • Fixed bug causing multiple operators to reject attribute names with leading or trailing whitespaces
  • Fixed bug causing failure to load templates

New in RapidMiner Studio 8.1.0.0 (Feb 6, 2018)

  • New Features:
  • Added Auto Model feature, a new working mode for rapid creation, comparison, and exploration of new models. It can be found as a new view at the top.
  • Added a powerful global search functionality which can be found in the top-right corner and activated via Ctrl+F shortcut. You can currently search for operators, repository contents, UI actions, and Marketplace content. See the documentation for more information if you are interested in more complex and powerful search queries (e.g. finding data/models that contain a specific attribute, or were last modified before a certain date, etc).
  • Enhancements:
  • New Process Templates upgraded to use the latest operator versions.
  • Read Excel now allows sheet selection by name.
  • Read CSV, Read XML and Read Excel have a new expert parameter read all values as polynominal, which allows the user to disable type guessing.
  • Hide passwords in the Password Manager dialog and store them with a stronger encryption.
  • Search Twitter and Get Twitter User Statuses added support for 280-character tweets.
  • All Twitter operators moved from numerical to nominal attributes for user and status IDs.
  • Made the Views display at the top more dynamic on resizing to prevent squashed GUI elements for low(er) resolutions and to show more views for high(er) resolutions. To achieve this, both the Undo and Redo buttons for process editing were removed. You can still undo/redo via the top Edit menu, or by pressing Ctrl+Z/Ctrl+Y, or even via the new global search by searching for Undo or Redo.
  • Bugfixes:
  • Secured XML parsing against XXE vulnerability
  • Fixed a rare error when logging inside parallel operators
  • Fixed problem that caused Parse Numbers to fail if input was an empty value
  • Fixed a rare error when running Join, Replace Missing Values, or Add inside a parallel loop
  • Fixed handling of polynominal attributes in Apply Model when applying a Cluster Model
  • Updated Regularized/Linear/Quadratic Discriminant Analysis to avoid uncaught errors and give more information if an error occurs
  • Fixed uncaught Runtime Exception when using Loop Parameters and Optimize Parameters (Grid) with log_all_criteria
  • Fixed issues with duplicated or missing entries, as well as missing groups in the Manage Connections dialog
  • Refreshing folders in a RapidMiner Server repository no longer blocks the entire Studio interface
  • Renaming entries in a RapidMiner Server repository no longer blocks the entire Studio interface
  • Pressing Ctrl-A in an empty process no longer makes the process parameters disappear
  • Hotkeys for view switches now work properly from all views
  • Upgraded MSSQL JDBC driver to version 4.2
  • Upgraded PostgreSQL JDBC driver to version 42.2.1
  • Development:
  • The Global Search feature is highly flexible and open to extensions - look at com.rapidminer.search.GlobalSearchable and com.rapidminer.gui.search.GlobalSearchableGUIProvider to get started!
  • Unsigned 3rd party extensions can now call ParameterService#setParameterValue(String, String) without causing a SecurityException
  • Please note: We have accumulated lots of outdated code over the years. Anything that is annotated with @Deprecated will be removed at some point in the future. Removal will start with RapidMiner Studio 9.0, so please prepare your extensions by not using any deprecated code anymore. JavaDoc will help guide you to replacement classes/interfaces/methods.
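
For extension developers who parse XML themselves, the XXE fix listed under Bugfixes above corresponds to a common hardening pattern in the standard Java XML API (JAXP). The changelog does not describe how Studio secures its parsers, so the following is a general sketch under that assumption; the class SafeXml and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical helper for illustration; not RapidMiner code. */
final class SafeXml {

    private SafeXml() {}

    /** Creates a DOM builder that refuses DOCTYPE declarations and external entities. */
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE declarations outright removes the usual XXE entry point
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    /** Parses the XML string; wraps any parser failure in an unchecked exception. */
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return newHardenedBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("XML could not be parsed safely", e);
        }
    }
}
```

A document without a DOCTYPE parses as usual, while input that declares an external entity is rejected before the entity can be resolved.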

New in RapidMiner Studio 8.0.1.0 (Dec 28, 2017)

  • Bugfixes:
  • "Manage Database Connections" dialog is no longer shy and will appear again if requested.

New in RapidMiner Studio 7.6.1 (Sep 6, 2017)

  • Enhancements:
  • Random Forest results are now reproducible between runs
  • Bugfixes:
  • Fixed "Invalid ice_root" error if the Windows username contains whitespaces when running Logistic Regression, Generalized Linear Model, Gradient Boosted Trees or Deep Learning
  • Advanced properties of database connections now support special characters
  • Fixed an issue when storing in a repository inside parallel loops
  • Fixed Support Vector Machine (LibSVM) crashing when running in one-class mode
  • Advanced settings for Oracle database connections are no longer ignored
  • RapidMiner Studio no longer freezes if a RapidMiner Server database connection has no password
  • Fixed freeze in case the EULA is declined
  • Developers:
  • ParameterTypeEnumeration#getXML properly returns the default value

New in RapidMiner Studio 7.6.0 (Sep 6, 2017)

  • New features:
  • Sending notification emails can now be configured in the preferences to make use of all modern connection security and authentication mechanisms like TLS 1.2 + PFS
  • Enhancements:
  • The sender of notification emails can now be configured in the preferences
  • Licenses are now valid for the full last day until midnight
  • Improved handling of infeasible parameter values for Self-Organizing Map
  • Changed default sampling type parameter for Validation operators to automatic
  • Write Message now has a parameter option to append to existing files instead of overwriting them
  • Logistic Regression and Generalized Linear Model learners now have a threshold output where they deliver a threshold value optimized for maximal F-measure
  • Improved handling of missing and infinite values for Normalize
  • Improved handling of missing or broken compatibility numbers in the process xml
  • Made behavior of add as label parameter consistent for all cluster operators
  • Improved checks for empty example sets in cluster operators
  • Improved shown capabilities for cluster operators and added quick fixes for inconsistent parameter selection
  • Reduced some internal logging by moving it behind the debug flag which can be activated in the preferences
  • Updated Java for Windows and Mac OS X to version 8u141
  • Bugfixes:
  • Fixed reproducibility of results when concurrent operators (e.g. Loops) are involved.
  • Changing the default connection timeout setting in the preferences now takes effect immediately.
  • Sending notification emails now uses the default connection timeout.
  • Fixed metadata of Flatten Clustering.
  • Fixed behavior of Loop Parameter inside parallel loops.
  • Removed unnecessary warning for clustering operators with nominal input data
  • Generate Weights (LPR) and Local Polynomial Regression now provide additional kernel parameters for the numerical measure KernelEuclideanDistance instead of failing
  • Fixed Gradient Boosted Trees renderer, it no longer shows wrong edge labels and incorrect value sets
  • Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators no longer crash the software if certain temporary folder permissions are missing
  • Logistic Regression and Generalized Linear Model learners now use 0.5 as the threshold, like other binominal learners
  • Fixed behavior of Loop Attributes when only one attribute is selected for parallel execution
  • Fixed Average for Performance inputs that contain AUC
  • Fixed side-effects of Apply Threshold in other branches of the process
  • Fixed rare crash in Create Association Rules under certain parameter configurations
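
The threshold output mentioned above for Logistic Regression and Generalized Linear Model delivers a cut-off optimized for maximal F-measure. The changelog does not say how that search is performed, so the sketch below makes two assumptions: the F-measure is F1, and candidates are simply the observed confidences. The class FMeasureThreshold and its methods are hypothetical names.

```java
/** Hypothetical illustration of picking an F1-optimal confidence threshold. */
final class FMeasureThreshold {

    private FMeasureThreshold() {}

    /** F1 when an example is predicted positive whenever its confidence is >= threshold. */
    static double f1(double[] confidences, boolean[] isPositive, double threshold) {
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (int i = 0; i < confidences.length; i++) {
            boolean predictedPositive = confidences[i] >= threshold;
            if (predictedPositive && isPositive[i]) {
                tp++;
            } else if (predictedPositive) {
                fp++;
            } else if (isPositive[i]) {
                fn++;
            }
        }
        int denominator = 2 * tp + fp + fn;
        // F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all
        return denominator == 0 ? 0.0 : (2.0 * tp) / denominator;
    }

    /** Tries every observed confidence as a threshold and returns the one with the highest F1. */
    static double bestThreshold(double[] confidences, boolean[] isPositive) {
        double best = 0.5; // fallback for empty input
        double bestScore = -1.0;
        for (double candidate : confidences) {
            double score = f1(confidences, isPositive, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

The brute-force scan is quadratic in the number of examples, which keeps the sketch short; sorting the confidences once would be the usual optimization.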

New in RapidMiner Studio 7.5.001 (May 10, 2017)

  • Enhancements:
  • Fewer unnecessary copies of example sets while running processes.
  • Added missing source description when opening data from the App Objects panel.
  • Bugfixes:
  • Fixed tracking of example set source for certain ExampleSet collections
  • Fixed closing a tab by clicking 'x'
  • Fixed macro support for process root parameters

New in RapidMiner Studio 7.5.0 (May 10, 2017)

  • New features:
  • The first iteration of the new data core that manages data sets in a much more efficient way has arrived! This results in both better performance and less memory usage for the vast majority of operators.
  • Added support for Microsoft Azure Blob Storage with Read Azure Blob Storage, Write Azure Blob Storage, and Loop Azure Blob Storage operators. They work exactly like their existing Amazon S3 counterparts.
  • Added support for Amazon Key Management Service (AWS KMS) for all Amazon S3 operators. You can now optionally add an encryption key id to your Amazon S3 connection to decrypt/encrypt files when working with Amazon S3.
  • Added a new mechanism to provide help, advice messages, and even important announcements to the user.
  • Enhancements:
  • Completely revised result graph interaction, presentation, and visualization (e.g. decision trees, clusters, etc.).
  • It is now possible to highlight the path to a node of a decision tree in the Results view.
  • Cluster nodes in the Results view are now scaled according to their relative size.
  • Undo and redo functionality is now much more intuitive when working with the process canvas. It will now not only restore the process state, but also restore canvas location, operator selection, and the zoom level.
  • Navigating up and down through subprocesses in the UI is now more user friendly. When entering a subprocess and later going back up, you will see the same part of the process you were looking at before entering the subprocess.
  • Remove Duplicates now features a new output port called duplicates which returns the examples identified as duplicates.
  • Fixed memory leaks for Handle Exception, Select Subprocess, and Branch.
  • Execute Script now caches the parsed scripts for significantly faster execution, especially inside Loop operators or other highly concurrent environments. General performance of script execution has also been improved. Also added operator tags and added a default example script to make usage of the operator easier. Last but not least, error messages now include the causing stacktrace for easier debugging.
  • Improved AutoMLP performance.
  • Loading context data shows progress now.
  • Added new global process macro: %{process_start} which captures the timestamp when a process was started.
  • It is now possible to close result tabs with the same shortcut as in your web browser: ctrl+w (command+w on OS X)
  • Added new tutorials for RapidMiner Server and RapidMiner Radoop.
  • Added some more usable date and datetime format defaults to choose from when importing data.
  • Added folder buildingblocks in the .RapidMiner directory which will also be searched for .buildingblock files on startup.
  • The dialog letting you know about an available RapidMiner Studio update now also displays the version number of the update.
  • Bugfixes:
  • Fixed a bug making all parallel Loop operators incredibly resource hungry when running hundreds of thousands of iterations
  • Error bubbles indicating the source of an error in the process now work correctly in nested loops again
  • Removed empty confidence columns when applying the model from Linear Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Single Rule Induction, Subgroup Discovery
  • Regularized Discriminant Analysis no longer ignores the alpha parameter
  • The median for Aggregate now takes the middle point of both middle values in case of an even number of values
  • Fixed error that made operators which use a connection (e.g. Read Salesforce) unusable after importing a process
  • Fixed layout of marketplace search link in operator panel
  • Fixed broken dialog title for package download error
  • Fixed broken configurable entries due to unnecessary escaping
  • Fixed delay when trying to view decision trees in the Results view
  • Fixed major memory leak for Loop, Loop Values, Loop Attributes, and Loop Files
  • Fixed some operator parameter help tooltips being cut off
  • Fixed behaviour of Fast Large Margin if learned with bias (parameter)
  • Fixed pdf/svg image export of the scatter matrix chart
  • Fixed some spelling errors
  • Fixed Linear Regression calculation in case use bias is not selected
  • Fixed confidences of Ada Boost in border cases
  • Logistic Regression and Generalized Linear Model no longer allow p-value calculation without adding intercept
  • Fixed problem when trying to delete extensions of which more than one version was installed
  • Developers:
  • Concurrency API introduced with 7.4.0 is now available for unsigned extensions
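
The median change for Aggregate listed under Bugfixes above follows the usual definition for an even number of values: the result is the midpoint of the two middle values. A minimal sketch in plain Java, with MedianSketch as a hypothetical class name and no claim about how the operator is implemented:

```java
import java.util.Arrays;

/** Hypothetical illustration of the median definition; not RapidMiner code. */
final class MedianSketch {

    private MedianSketch() {}

    /** Returns the median; for an even count, the midpoint of the two middle values. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("median of an empty array is undefined");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For the values 4, 1, 3, 2 this yields 2.5, whereas an implementation that picks one of the middle values would return 2 or 3.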

New in RapidMiner Studio 7.4.0 (May 10, 2017)

  • New features:
  • Processes can now be executed in the background of Studio while you work on a different process in the user interface. This feature is only available for users with a Large license.
  • New parallelized Loop operator.
  • New parallelized Loop Values operator.
  • New parallelized Loop Attributes operator.
  • New parallelized Loop Files operator.
  • Repository entries can now be sorted by date.
  • Users with Large licenses can now grant additional permissions to unsigned extensions.
  • Enhancements:
  • Added a few new templates which can be used as a starting point when creating a new process.
  • Improved performance of Polynomial Regression.
  • Improved performance of Linear Regression.
  • Improved error message in case a selected input attribute for an operator is of the wrong type.
  • Improved operator progress for Generate Massive Data and several segmentation operators.
  • Improved performance of LibSVM and Fast Large Margin when sparse input data is not in sparse data format.
  • Small performance improvements for several operators that read parameters unnecessarily often.
  • Performance improvement for operators that iterate over all attributes.
  • Optimize by Generation (Evolutionary Aggregation) no longer shows unnecessary popup.
  • Repository entry sorting by name now ignores capitalization.
  • Users with Large licenses can now grant additional permissions to unsigned extensions via a new setting in the Start-up tab in the preferences.
  • The Log table in the results panel now also uses the new UI look and feel.
  • Bugfixes:
  • Fixed useless cipher error when starting Studio for the very first time.
  • Fixed swapped title in models of Linear Discriminant Analysis and Quadratic Discriminant Analysis.
  • Fixed side-effects of application of preprocessing models in other branches of the process.
  • Fixed side-effects of Impute Missing Values in other branches of the process.
  • Fixed wrong behavior when dismissing confirmation dialog asking for interruption of currently running process.
  • Fixed Delete File not being able to handle relative paths.
  • Meta data calculation of Generate Nominal Data can no longer cause freezing.
  • Optimize by Generation (Evolutionary Aggregation) no longer does one iteration too many.
  • Fixed Number of threads setting having no effect for Decision Tree and Random Forest if it was set to 1 and then increased again.
  • Fixed rare error that could occur when displaying a grouped model in the results view.
  • Developers:
  • Added a temporary API for operators which should run in a parallelized fashion. Use the com.rapidminer.studio.concurrency.internal.ConcurrencyExecutionServiceProvider to access it.

New in RapidMiner Studio 7.3.1 (Jan 10, 2017)

  • Enhancements:
  • Improved error messages
  • Improved speed of chart calculation for many nominal attributes
  • Improved performance of operator Remove Duplicates
  • Improved support for Salesforce objects with missing values
  • Bug fixes:
  • Fixed model applying and concurrency issue of new Cross Validation
  • Fixed side-effects of Remap Binominals
  • Fixed display of tutorial process descriptions in the operator help
  • Fixed discretization steps in Decision Tree (Multiway) models

New in RapidMiner Studio 7.3.0 (Jan 10, 2017)

  • Enhancements:
  • New parallel Cross Validation operator replaces X-Validation, Batch X-Validation, and X-Prediction.
  • Operator search now also searches for matching Marketplace extensions
  • Greatly improved Proxy UI and logic
  • Logistic Regression, Generalized Linear Model and Gradient Boosted Trees now return Attribute Weights output as well
  • Added reproducible parameter to Logistic Regression, Generalized Linear Model and Gradient Boosted Trees. If checked, the result is guaranteed to be the same, because the parallelization level is fixed.
  • Improved sorting for repository entries.
  • Performance improvement for Rule Induction and Perceptron operators.
  • Improved high DPI support.
  • Improved operator progress for Apply Model and Logistic Regression (SVM).
  • Improved welcome dialog layout.
  • Bug fixes:
  • Fixed NullPointerException in Logistic Regression and Generalized Linear Model with compute p-values on and solver set to AUTO on an input with large number of nominal values
  • Changed the default of the max_w2 parameter of Deep Learning to 10, as the operator help describes; it also became a non-advanced parameter
  • Fixed some minor tutorial inconsistencies
  • If there is a security error, Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators can recover without Studio / Server restart
  • Input data rebalancing in Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning no longer depends on the number of cores but the number of threads (configurable)
  • Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators are now loaded even if javafx package is missing from the Java Runtime Environment
  • Fixed multiple problems with the GSP operator
  • Operator progress now vanishes if operator is successfully stopped
  • Fixed operator progress animation being stuck sometimes
  • Fixed import excel data UI issues on Mac OS X
  • In-Hadoop scoring of Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning models in RapidMiner Radoop no longer logs a message for each row (leads to significant performance improvement)
  • Development:
  • Added a centralized API for data table creation: From now on a new ExampleSet should be created via an ExampleSetBuilder provided by the ExampleSets class instead of using MemoryExampleTable
  • Tweaked project structure for the open source core. This does not affect the functionality of RapidMiner Studio.

New in RapidMiner Studio 7.2.3 (Jan 10, 2017)

  • Bug fixes:
  • Read XML wizard UI crash fixed
  • Guess Types no longer fails in case an attribute only consists of missing values
  • Fixed display of Visualize Model by SOM operator results
  • Fixed exception handling for unsafe Exceptions which might be thrown by some operators
  • Fixed some problems with the Password Manager UI

New in RapidMiner Studio 7.2.2 (Jan 10, 2017)

  • Bug fixes:
  • Fixed slowdown of Execute Process operator if it was used in a loop and the process was cached.
  • Fixed permissions (e.g. H2O) for operators running inside a 3rd party subprocess operator.
  • Temp folder permissions are now correctly granted to 3rd party extensions as intended.
  • Fixed some inconsistencies with extension dependencies.
  • Development:
  • Added system property com.rapidminer.security.enforce to force plugin sandboxing even on SNAPSHOT versions. Start Studio with -Dcom.rapidminer.security.enforce=true to force the sandboxing. This is useful to test how your extension will behave under live conditions without having to build and test against a release version of Studio.
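
The flag described above is an ordinary JVM system property, so it is passed with -D on the command line. How Studio evaluates it internally is not covered by the changelog; the sketch below only shows how such a boolean property can be read with standard Java, and the class SandboxFlag is a hypothetical name.

```java
/** Hypothetical illustration of reading a boolean -D flag; not RapidMiner code. */
final class SandboxFlag {

    /** Property name taken from the changelog entry above. */
    static final String ENFORCE_PROPERTY = "com.rapidminer.security.enforce";

    private SandboxFlag() {}

    /** True only if the JVM was started with -Dcom.rapidminer.security.enforce=true. */
    static boolean isEnforced() {
        // Boolean.getBoolean looks up the system property and returns false when it is unset
        return Boolean.getBoolean(ENFORCE_PROPERTY);
    }
}
```

Because the lookup falls back to false, a JVM started without the flag would behave as before.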

New in RapidMiner Studio 7.2.1 (Jan 10, 2017)

  • Enhancements:
  • Improved error handling when marketplace is not reachable
  • Improved error feedback when trying to log into RapidMiner Cloud before having accepted the Cloud ToS via the Cloud Repository
  • Reporting a bug now takes users to the community portal instead of a dead link.
  • Bug fixes:
  • BUGFIX: Copying an operator with subprocesses which themselves contain operators now works again as expected.
  • BUGFIX: Quartile Color and Quartile Color Matrix charts will no longer crash when being used to visualize data with a numeric color attribute.

New in RapidMiner Studio 7.2.0 (Jan 10, 2017)

  • Enhancements:
  • Added search field for preferences dialog to help find specific settings.
  • Massive performance improvements for PostgreSQL connections.
  • Performance improvements for Write Excel operator.
  • Performance improvements for Nominal to Numerical operator.
  • Performance improvements for Set Minus operator.
  • Minor performance improvements for Join operator.
  • Minor performance improvements for FP Growth operator.
  • Errors while running a process now display a more meaningful headline in many cases.
  • Improved display of real numbers very close to integers. Increasing the fraction digits setting to 16 and higher will now lower eagerness to round extremely small numbers.
  • Deletion of repository items no longer blocks the user interface.
  • Improved Data Editor handling for date cells.
  • Improved Data Editor appearance to match the Results view data display.
  • Data Editor dialogues will now be displayed relative to the application screen.
  • Improved progress display for several operators.
  • Improved loading and display of operator help.
  • Extensions will be sorted by name in the About Installed Extension menu and Manage Extension dialog.
  • Improved feedback in case we cannot open a URL in your browser.
  • Improved progress feedback for one-click extension installation via Marketplace web site.
  • Uninstaller.exe is now signed to prevent false-positives from some virus scanners.
  • Bugfixes:
  • Fixed some problems during CSV import if the file contents were not well-formed
  • Logistic loss of Performance (Classification) now returns the correct value (and not infinite)
  • Loop operator no longer ignores the timeout parameter
  • Loop Until operator now only shows timeout parameter when limit time parameter is checked
  • Removed weighted examples capability of k-NN operator info
  • Instead of failing, the Write Database operator now throws a UserError if the received example set contains no attributes
  • Write Excel will now cut off too long nominal values (limit is 32,767 characters) instead of producing broken Excel files
  • Generate Data now makes clear which functions support bounds and provides other parameters for those that do not support them
  • Changed the order of the parameters of Reorder Attributes to better reflect the dependencies between them
  • Fast Large Margin now throws meaningful error message when there is only one class in the training example set
  • Date to Nominal is no longer applied incorrectly to attributes with non-date types, but throws an error instead
  • The query builder UI of Read Database now quotes table and attribute names correctly. This fixes e.g. issues with running SQL queries on HSQLDB
  • Apply Model no longer accepts application parameters unsupported by the given model
  • Explicit design-time and execution-time error is shown when date format is invalid in Date to Nominal or Nominal to Date
  • When parsing numbers, lowercase "e" is now also accepted as exponent separator in scientific notation
  • Renaming or moving the currently open process in the Repository panel now updates the current path, so saving does not lead to process duplication
  • Fixed performance problem that could occur after inspecting graphs or certain charts
  • Pie, Pie 3D and Ring charts are now always circular and not stretched into an elliptical shape
  • Fixed possible GUI corruption in case a warning bubble appeared when a process was started from the Results view
  • Inspecting Association Rules in the Results view no longer causes random errors sometimes
  • The result table no longer cuts off the last character if the nominal value ends with a newline
  • Removed obsolete setting to specify whether extensions should be installed locally or globally
  • Shrink process now works when zoomed out
  • Stopping a process during the last operator no longer still produces results sometimes
  • Fixed rare StackOverflowError during advanced chart usage
  • Fixed misleading error when trying to connect to a very old RapidMiner Server
  • Fixed duplication of problems in the Problems panel when copying & pasting an operator
  • Fixed error when trying to paste operators into a process by using the global Edit menu
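
The Write Excel fix listed above cuts nominal values at the Excel cell limit of 32,767 characters. A minimal sketch of that kind of truncation in plain Java follows; ExcelCellText and truncate are hypothetical names, and the sketch counts Java chars (UTF-16 code units), which may differ from how the operator counts characters.

```java
/** Hypothetical illustration of truncating text to the Excel cell limit; not RapidMiner code. */
final class ExcelCellText {

    /** Maximum cell text length quoted in the changelog entry above. */
    static final int MAX_CELL_LENGTH = 32_767;

    private ExcelCellText() {}

    /** Returns the value unchanged if it fits, otherwise its first MAX_CELL_LENGTH chars. */
    static String truncate(String value) {
        if (value == null || value.length() <= MAX_CELL_LENGTH) {
            return value;
        }
        return value.substring(0, MAX_CELL_LENGTH);
    }
}
```

Values at or below the limit pass through untouched, so only oversized nominal values lose their tail.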

New in RapidMiner Studio 7.1.1 (Jan 10, 2017)

  • Enhancements:
  • The import wizard suggests the name of the imported file as repository entry name
  • When clicking on the warning icon of an operator, a bubble with the detected problem is shown
  • The database login timeout is now a Preference setting and its default value is increased from 30s to 60s
  • Improved progress display for numerous operators
  • Bug fixes:
  • BUGFIX: Installing an extension from the Marketplace no longer modifies the installation directory
  • BUGFIX: Fixed bug in .bat start script that aborted the program start if JAVA_HOME contained quotation marks
  • BUGFIX: Fixed position of connection multiplication button when zoomed

New in RapidMiner Studio 7.1.0 (Jan 10, 2017)

  • Enhancements:
  • Added zoom functionality to process editor
  • Process status display
  • Replaced MySQL Connector/J with MariaDB Connector/J
  • Removed Ingres Connector
  • Improved memory management for large processes
  • Optimized garbage collector settings and heap size calculation
  • Added icon to delete hovered or selected connections
  • Bug fixes:
  • BUGFIX: Optimize Parameters (Evolutionary) now also delivers the best result set
  • BUGFIX: Fixed unnecessary memory consumption by previous results
  • BUGFIX: Fixed the resolution of predefined macros in expressions
  • BUGFIX: Read Access creates a temporary file instead of loading the Access database into memory
  • BUGFIX: Read Access and Read Database only fetch meta data when the settings parameter “Evaluate SQL meta data” is checked
  • BUGFIX: Fixed error in Filter Examples when the condition expression evaluates to missing value
  • BUGFIX: The “cut” function of the expression parser no longer throws an exception when the last character is included

New in RapidMiner Studio 7.0.1 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: Fixed Store and Retrieve operators for SVM and regression models

New in RapidMiner Studio 7.0.0 (Jan 10, 2017)

  • Enhancements:
  • Overhauled the entire user interface
  • Added new unified data import mechanism. So far supported: Excel, CSV, databases, and binary files.
  • New tutorials mechanism
  • New process template mechanism
  • Re-arranged operators in Operators panel to better fit analytic workflows
  • Renamed 'View' to 'Panel' and 'Perspective' to 'View'
  • New start-up dialog which replaces the old Welcome perspective
  • Moved "Synchronize Meta Data" toggle button from process panel to 'Process' menu in the top menu bar
  • Drastically improved performance when browsing data tables in the result view
  • Added support for high-dpi icons on OS X Retina Displays
  • Improved Operator documentation layout to increase readability
  • Improved Random Forest classification performance by adding a new voting mechanism
  • The "no result port connected" warning can now be turned off
  • Added option to specify a background image for a process when right-clicking on the process canvas
  • Changed keyboard layout: escape navigates to the parent process (if any), backspace deletes selected operators and annotations on OS X.
  • Added a pop-up menu item to the operator execution order mode as an alternative way to leave it
  • Dummy operators (e.g. from missing extensions) are now colored in red
  • Added a parameter for quickly downloading missing extensions for Dummy operators
  • The 'Generate Aggregation' operator now shows an error if the attribute filter does not return any attributes by default (can be deactivated)
  • Bug fixes:
  • Passwords for proxy server are stored encrypted now
  • Linear Regression now shows correct errors for coefficient estimation
  • Fixed an error in the ANOVA calculation that returned wrong values if the number of groups was equal to the number of examples
  • Association rules in the Results view are now displayed correctly filtered without user input
  • Selecting operators in the process panel no longer causes UI freezes
  • Rearrange operators will create only one undo step now
  • Fixed misbehavior of the undo action
  • Fixed a rare issue which prevented opening the 'Manage Database Connections' dialog
  • Process 'Tree' panel selection will once again navigate the main 'Process' panel
  • Fixed crash while starting Studio and immediately locking the computer on Windows
  • The view files in the .RapidMiner folder will no longer become huge due to duplicate information
  • Fixed re-attaching floating panels via pop-up menu
  • Fixed restoring hidden/detached panels sometimes vanishing
  • Fixed settings dialog layout when opened multiple times
  • Notification pop-ups (e.g. when adding unsupported operators) are now always in the correct location
  • Fixed parameter warning/error hint duplication when resizing Studio while the warning is being displayed
  • Fixed dialog size for long error messages
  • Fixed display of some characters on Windows 8/10 and Mac

New in RapidMiner Studio 6.5.2 (Jan 10, 2017)

  • Enhancements:
  • Re-launch of RapidMiner Cloud adding a free community offering
  • Bug fixes:
  • BUGFIX: Fixed a rare issue that prevented opening the Manage Database Connections dialog

New in RapidMiner Studio 6.5.1 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: Fixed an expressions evaluation error that occurred when referencing attributes with an index of 251 or higher.
  • BUGFIX: Fixed automatic license downloading on startup for RapidMiner Studio.
  • BUGFIX: The backspace button now works again to navigate to the parent operator in the process editor.
  • BUGFIX: Average of Performance (Cost) now displays the correct micro performance.
  • BUGFIX: The ID attribute can now be shown in charts.
  • BUGFIX: Unsupported parameters for Optimize Parameters (Evolutionary) can no longer be selected.

New in RapidMiner Studio 6.5.0 (Jan 10, 2017)

  • Enhancements:
  • Completely overhauled problem and error notifications when running processes
  • All Learner Models will show an error rather than log a warning when applied on incompatible data
  • Repositories are now sorted by type and name
  • Improved churn template when using custom data
  • Improved performance when navigating RapidMiner Server repositories over a slow connection
  • Execute Process nesting depth is now limited to prevent endless loops; the maximum depth can be tweaked in the preferences
  • Added Netezza 7.0 JDBC support
  • Added a new "Move into new Subprocess" action that allows moving a group of selected operators into a Subprocess operator
  • Standard dialogs now support hyperlinks in the description
  • API: ParameterTypeText is now able to handle template text that is shown in the TextPropertyDialog if no text is set
  • API: Removed SassyReader and kdb dependencies, increased SLF4J API dependency to version 1.7.12
  • Bug fixes:
  • BUGFIX: Fixed possible startup problems when the _JAVA_OPTIONS environment variable is set
  • BUGFIX: Fixed rare cases of Studio becoming unresponsive because dialogs opened behind other dialogs
  • BUGFIX: When opening a process from the Server Processes view, confirmation is now required before an unsaved process is discarded
  • BUGFIX: Fixed rare problem when trying to save preferences
  • BUGFIX: Fixed some copy and paste problems of process notes
  • BUGFIX: Fixed Generate Data performance when selecting gaussian mixture clusters as the target function
  • BUGFIX: Fixed several problems when both Process and XML views were open and visible at the same time
  • BUGFIX: "Sample (Bootstrapping)" now duplicates examples when upsampling data
  • BUGFIX: Averaging of Performance Vectors can now handle additional or fewer classes after the first iteration
  • BUGFIX: Aggregate operator now supports non-alphanumerical attribute names for grouping
  • BUGFIX: Execution order is now up-to-date even if process validation has not finished
  • BUGFIX: Fixed computation of binary classification criteria (performance) for remapped binominal labels
  • BUGFIX: Decision Tree and Random Forest can now handle an unbounded number of different label values
  • BUGFIX: 'Principal Components Analysis', 'Generalized Hebbian Algorithm', 'Independent Component Analysis' or 'Principal Component Analysis (Kernel)' in combination with Apply Model no longer modify the original example set
  • BUGFIX: Decision Tree(rule) model edge labels now correctly display dates instead of Unix timestamps in the Results perspective
  • BUGFIX: Read Access and Write Access now work with 64-bit Java and Java 8
  • BUGFIX: Log operator no longer silently fails if duplicate column names have been entered
  • BUGFIX: Fixed rare case where the Chart view in the Results perspective was broken
  • BUGFIX: Fixed rare case where the date format field vanished in data import dialogs
  • BUGFIX: Context data is no longer loaded when the input port is not connected
  • BUGFIX: Generate Attributes no longer forgets roles in metadata if an attribute is overwritten
  • BUGFIX: Read Excel, Read CSV, and Read XML can now be stopped
  • BUGFIX: Metadata of Execute Process operators is no longer calculated if an endless process loop is suspected
  • BUGFIX: Loop Files operator now shows an error message if the directory is invalid or the user has insufficient privileges
  • BUGFIX: Fixed an error that occurred in Write Database with an empty JNDI name
  • BUGFIX: Fixed problems with reconnecting operators after the 'Replace Operator' action
  • BUGFIX: Fixed displayed number of combinations for integer parameters in Optimize Parameters (Grid)
  • BUGFIX: Fixed jumping to correct subprocess when clicking on the cause of a failed process in the error dialog
  • BUGFIX: Generate Attributes can now be stopped
  • BUGFIX: Fixed a bug that occurred when trying to install a non-existent extension via one-click installation
  • BUGFIX: Fixed reading of XLSX files with cells that contain mixed font formats
  • BUGFIX: Regex dialogs now show at most 100 attributes to prevent GUI freezes
  • BUGFIX: Fixed a rare bug that occurred while refreshing a remote repository with a remote database
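
The "Sample (Bootstrapping)" fix above concerns sampling with replacement. A minimal sketch of that idea, not Studio's implementation: the helper name `bootstrap_sample` and its parameters are hypothetical. It shows why an upsampled result necessarily contains duplicated examples.

```python
import random

def bootstrap_sample(examples, sample_size, seed=None):
    """Draw sample_size examples with replacement (hypothetical helper)."""
    rng = random.Random(seed)  # a local generator, similar in spirit to a local random seed
    return [rng.choice(examples) for _ in range(sample_size)]

data = ["a", "b", "c"]
# Upsampling: six draws from three distinct examples must repeat some of them.
upsampled = bootstrap_sample(data, 6, seed=42)
print(upsampled)
```

Because each draw is independent, a sample larger than the input can only be filled by repeating examples.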

New in RapidMiner Studio 6.4 (Jan 10, 2017)

  • Enhancements:
  • Improved Process history view
  • Connections to RapidMiner Server no longer require equal license editions for Studio and Server. For example, professional-level RapidMiner Studio can now connect to Enterprise-level RapidMiner Server.
  • Improved visual feedback for port and connection interactions in the Process view
  • Drastically improved Process view performance
  • Cleaned up right-click context menu in the Process view
  • RapidMiner Server connections are now editable in RapidMiner Studio
  • Breakpoints in subprocesses are now indicated in the top right corner of the Process view
  • Dragging multiple repository entries into a process is now possible
  • Updated keyboard shortcuts and mouse handling improve the Mac user experience
  • Ctrl + Backspace is now available for text inputs and deletes an entire word instead of a single character
  • On opening, problem display only occurs if a critical problem was detected
  • In Select Attribute operators, numeric conditions now ignore blank spaces
  • Improved error message shown when class weights are specified for classes that do not exist
  • Added display of release platform to the About screen
  • Unmanaged extensions are now also loaded from ~/.RapidMiner/extensions if not specified otherwise in Preferences
  • All sample processes have been updated and improved to be compatible with the current version
  • Added new sampling type 'automatic' to the X-Validation operator
  • Operator search only expands groups with hits inside
  • Operator search is case sensitive when search term starts with an upper case letter
  • API: Added draw decorator and event hooks for the Process view. See ProcessRendererView#addDrawDecorator() and ProcessRendererView#addEventDecorator().
  • Bug fixes:
  • BUGFIX: Safemode dialog on startup is no longer sometimes hidden behind other windows
  • BUGFIX: Update Database now closes database connections after finishing
  • BUGFIX: Restarting after activating a license with more memory now correctly increases available memory on Windows
  • BUGFIX: A more meaningful error message is displayed when an invalid numeric condition is entered as a parameter
  • BUGFIX: Adding new database drivers via the Manage Database Drivers dialog no longer requires a restart
  • BUGFIX: Fixed rare error that could prevent the Manage Database Connections dialog from opening
  • BUGFIX: Fixed broken parameter help content for some operator parameters
  • BUGFIX: Calculation of a SOM-plot can now be cancelled
  • BUGFIX: It is no longer possible to drag operators out of the Process view
  • BUGFIX: Fixed rare error that could occur during automatic operator port connection
  • BUGFIX: Scrolling speed in the Process view is increased
  • BUGFIX: Fixed duplicate entry error in the History view
  • BUGFIX: Fixed Guess Types operator which occasionally took only the last numerical value into account
  • BUGFIX: A more meaningful error message is displayed when using Add generated primary keys for writing to MSSQL databases
  • BUGFIX: Fixed broken Execute Process operator help
  • BUGFIX: Disabled zoom functionality in Histogram Charts
  • BUGFIX: A more meaningful error message is displayed when using the Hyper Hyper operator with invalid input
  • BUGFIX: Principal Component Analysis operator works when applied on special attributes with missing values
  • BUGFIX: Fixed Read Excel operator encoding errors on Windows 8.1
  • BUGFIX: In Excel import wizard, wrong-typed values are parsed as missing instead of causing an error
  • BUGFIX: Removed unused parameter attribute type from Discretize by User Specification operator
  • BUGFIX: Fixed some broken templates and sample processes
  • BUGFIX: Clustering models now work with special attributes that contain missing values
  • BUGFIX: K-Medoids operator now always uses the selected measure type
  • BUGFIX: Fixed rare cases of broken standard coefficients for Linear Regression operator
  • BUGFIX: Right-clicking an operator now selects it before opening the popup menu (Linux/Mac)
  • BUGFIX: When installing extensions from Marketplace, dependencies are only added if not yet installed
  • BUGFIX: Marketplace dialogs now always open in the correct order
  • BUGFIX: The date functions of Generate Attributes operator now add correct metadata for new attributes
  • BUGFIX: Operator text parameter dialogs (e.g., the SQL query dialog) can now be closed by pressing Ctrl + Enter
  • BUGFIX: The log level of the Log view is now correctly restored on each start
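
The operator search rule listed under Enhancements (case sensitive only when the search term starts with an upper case letter) can be sketched as follows. This is an illustration under stated assumptions: the function name `matches` is hypothetical, and substring matching is assumed because the changelog does not say how Studio compares the term against operator names.

```python
def matches(search_term, operator_name):
    """Return True if operator_name matches search_term (hypothetical helper)."""
    if search_term[:1].isupper():
        # Term starts with an upper case letter: compare case-sensitively.
        return search_term in operator_name
    # Otherwise ignore case on both sides.
    return search_term.lower() in operator_name.lower()

print(matches("read", "Read CSV"))
print(matches("Read", "Spread Data"))
```

With this rule, a lower case term such as "read" still finds "Spread Data", while "Read" does not.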

New in RapidMiner Studio 6.3 (Jan 10, 2017)

  • Enhancements:
  • Progress dialog no longer opens when saving the process to a remote location
  • The file chooser dialog for 'Read Excel' now defaults to .xlsx and .xls files
  • 'Write Excel' format is now XLSX instead of XLS
  • The operator 'Execute Process' now shows a button to open the selected process in the parameter view
  • Parameter help is shown in a tool tip window when hovering over the information symbol
  • Histogram Charts now use date instead of numerical axis in case more than one date attribute is selected
  • Added Netezza JDBC support
  • The Application Wizard is now called Accelerator
  • Bug fixes:
  • BUGFIX: The 'Read Salesforce' operator can now handle relationship queries
  • BUGFIX: Operator recommendations now always appear when creating a new process or switching to the Design perspective
  • BUGFIX: Fixed process recovery encoding problem on Windows which could break umlauts and other symbols
  • BUGFIX: Fixed row deletion error in 'Edit Parameter List' dialog
  • BUGFIX: The recent analysis list in Home perspective no longer extends below the visible area of the monitor
  • BUGFIX: Naive Bayes now handles dates correctly
  • BUGFIX: SVM models can now only be applied on ExampleSets with the same attributes
  • BUGFIX: Stratified sampling with a defined local random seed now produces the same output on every system
  • BUGFIX: The Surface 3D chart now limits the number of data points (to ensure good performance)
  • BUGFIX: The chart for distribution model attributes limits the number of nominal values (to ensure good performance)
  • BUGFIX: Fixed a validation error that occurred when choosing an inverted set of attributes
  • BUGFIX: Fixed a validation error that occurred when 'Execute Process' referenced the operator's process
  • BUGFIX: Fixed local repository not being created in some special cases when starting for the first time
  • BUGFIX: Tooltips now work with modal dialogs after being focused via F3

New in RapidMiner Studio 6.2 (Jan 10, 2017)

  • Enhancements:
  • Added operators 'Publish to App' and 'Recall from App' and a new view 'App Objects' for RapidMiner Server App manipulations
  • Resizing the attribute name column in the Statistics view of process results is now possible
  • New processes can now be saved via save button or ctrl+s
  • Improved error messages for broken custom filters in the 'Filter Examples' operator
  • Improved error message when selecting special attributes in an operator despite special attributes not being included
  • Show Git revision of RapidMiner Studio release in About window
  • Improved speed and behavior of 'Decision Tree' and 'Random Forest' operators
  • API: Introduced AbstractConfigurator which deprecates the Configurator class. The AbstractConfigurator improves parameter dependency handling for Configurables
  • API: removed Encog dependency and all deprecated classes that used Encog
  • API: Added capability to allow parallel processing inside operators
  • Bug fixes:
  • BUGFIX: Fixed problems with single parameter selection for several Java implementations
  • BUGFIX: Fixed opening of stored results via the result history
  • BUGFIX: Operator port tooltips should no longer cover the port
  • BUGFIX: Charts should now display 'Missing' instead of '1.1.1970' for missing values in date attributes
  • BUGFIX: 'Update Database' should throw a more reasonable error message in case the database user lacks permission
  • BUGFIX: 'Neural Net' operator works again when applied on special attributes with missing values
  • BUGFIX: 'Neural Net' can no longer be applied on incompatible data
  • BUGFIX: The expression parser function round() now returns a missing value instead of 0 when applied on a missing value
  • BUGFIX: 'Sample (Bootstrapping)' operator now throws a reasonable error message in case the input example set is empty
  • BUGFIX: Moving colors in the color scheme dialog of Advanced Charts does not save duplicates anymore
  • BUGFIX: Fixed a bug which occurred when an optional password field was left empty
  • BUGFIX: Fixed overwriting an already existing file in Import Binary File Wizard
  • BUGFIX: Fixed a UI problem that occurred when a Collection with empty ExampleSets was displayed
  • BUGFIX: Fixed operator tree display in log view which is shown in case of a process error
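
The round() fix above means that a missing input now yields a missing result instead of 0. A minimal sketch of that behavior, assuming missing values are represented as NaN: the name `round_keep_missing` is hypothetical, and the rounding itself uses Python's built-in `round`, which rounds ties to even and may differ from the expression parser's rounding mode.

```python
import math

def round_keep_missing(value):
    """Round a number, passing missing (NaN) values through unchanged."""
    if math.isnan(value) or math.isinf(value):
        # Missing stays missing; infinities are also returned as-is,
        # because Python's round() raises OverflowError for them.
        return value
    return float(round(value))

print(round_keep_missing(2.6))
print(round_keep_missing(math.nan))
```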

New in RapidMiner Studio 6.1 (Jan 10, 2017)

  • Enhancements:
  • Overhauled Repositories view: Now multiple elements can be selected, copied, moved and deleted at the same time
  • Completely revised preferences dialog to make customization of RapidMiner Studio more accessible
  • Drastically sped up Log view for larger logs
  • Improved startup code to reduce launch problems. Memory settings for Win32 versions are now based on the actual free memory at startup. Also added a property in the 'System' tab of the preferences where the maximum amount of memory for RM Studio can be configured
  • Improved SQL editor dialog responsiveness
  • It is now possible to ignore meta data for the 'Filter Examples' GUI
  • 'Weight by' operators: The default value of the parameter normalize weights is now false
  • API: Added support for parameter dependencies in the Configurable framework (see Configurator#getParameterHandler())
  • API: Added operator parameter type which can display a file chooser for arbitrary remote file systems (see ParameterTypeRemoteFile)
  • API: Added greater control over preferences internationalization and layout (see SettingsDialog)
  • Bug fixes:
  • BUGFIX: Results containing missing values are sorted correctly
  • BUGFIX: Update Database now throws a meaningful error when the input example set contains no attributes
  • BUGFIX: Improved error message when applying a PCA model to incompatible data
  • BUGFIX: More meaningful error message when a mandatory attribute is not selected
  • BUGFIX: Loop/Optimize parameters are no longer dismissed if selection changes
  • BUGFIX: Distribution Models can no longer be applied to subsets of the training set or to sets with the same name but a different type
  • BUGFIX: Log Operator now uses modern UI to show the result
  • BUGFIX: Fixed Linear Regression matrix calculation corner cases which could lead to missing values for standard error, t-stat, and p-value
  • BUGFIX: Fixed an issue that caused Top Down Clustering to fail
  • BUGFIX: Replace (Dictionary) maps each value only once
  • BUGFIX: Fixed an issue that sometimes caused data in the results perspective to be shown with a null source

New in RapidMiner Studio 6.0.8 (Jan 10, 2017)

  • Enhancements:
  • The performance of the Read XML operator has been increased dramatically, especially for large files
  • OK buttons in dialogs can now be pressed via keyboard (ALT+O)
  • Bug fixes:
  • BUGFIX: K-Means no longer erroneously requires a label
  • BUGFIX: Clicking on the cause in a 'Process Failed' dialog now switches to the design perspective in addition to selecting the operator
  • BUGFIX: The Split Operator now displays an Error when an attribute does not exist
  • BUGFIX: Tooltip windows are now displayed on the same screen as RapidMiner Studio
  • BUGFIX: Fixed rare histogram display error in the Result Statistics
  • BUGFIX: Fixed a bug where custom Perspectives were still visible after their removal
  • BUGFIX: Import Wizards don't freeze anymore when certain encodings are being selected
  • BUGFIX: Reorder Attributes no longer shows a warning for undefined "attribute ordering" when it is not needed
  • BUGFIX: Generate Massive Data now returns correct meta data
  • BUGFIX: Fixed error occurring while opening the SQL query editor when no connection is selected
  • BUGFIX: Dates are now correctly displayed on x-axis of Histogram Charts
  • BUGFIX: Entering a license directly after the last one expired now works again

New in RapidMiner Studio 6.0.7 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: Adds missing compatibility level for Aggregate operator

New in RapidMiner Studio 6.0.6 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: Fixed problem with prepared statements in Read Database and Execute SQL operators

New in RapidMiner Studio 6.0.5 (Jan 10, 2017)

  • Enhancements:
  • Improved copy and paste functionality of the process editor
  • Added new logging mechanism which can also be used by extensions to display their own logs in the default log view
  • Added parameter to Parse Numbers operator to show an error message or use missing values if a value can't be parsed
  • On lower screen resolutions smaller plot preview icons will be used
  • Aggregate operator throws an error when the example set does not contain attributes selected by the parameter "group by"
  • Improved the ability to stop the process while executing a Join operator
  • Ports of disabled operators are now highlighted to indicate that interaction is possible
  • Loop/Optimize Parameters GUI now automatically selects newly added parameter
  • Refreshing a repository folder is now possible regardless of whether a folder or a data entry is selected
  • New chart type added: Web
  • Improved tooltip behavior
  • Improved resizing of subprocesses
  • Added parameter to Loop/Optimize Parameters which specifies how errors occurring in the inner process should be handled
  • Switching perspectives now remembers focused tabs and the position of all scroll bars
  • Bug fixes:
  • BUGFIX: Data readers will no longer automatically choose binominal as the value type to avoid import failures
  • BUGFIX: Saving a process can no longer freeze the user interface
  • BUGFIX: Storing/Reading models in XML representation works again when executing the process on RapidMiner Server
  • BUGFIX: Pasting process xml into the process view directly no longer messes up the layout and the connections
  • BUGFIX: Execute Process: Number of ports shown by operator matches ports used by embedded process.
  • BUGFIX: Weighting Operators which require a label attribute now throw an error if no label is present
  • BUGFIX: Superset and Union operators now fail with a better error message if the special attributes do not match
  • BUGFIX: macro() can now be used in the expression condition at Branch
  • BUGFIX: Loop Repository: using the parent folder name as filtered string does not throw an error anymore
  • BUGFIX: The Cumulative Variance plot for the PCA now displays the correct values
  • BUGFIX: Excel operators now show a human-readable error if the wrong sheet is selected
  • BUGFIX: Aggregate now detects DATE_TIME in MetaData
  • BUGFIX: Predefined operator macros are working again
  • BUGFIX: Data import operators of extensions are no longer sometimes displayed as disabled for some licenses
  • BUGFIX: Use correct file filter for Loop Zip-File Entries file chooser
  • BUGFIX: Read and Update Database operators can now be stopped
  • BUGFIX: Generate Macro will no longer add unnecessary zeros to the end of numbers
  • BUGFIX: Reduced logging at Generate Function Set if NaN was generated
  • BUGFIX: Operators which provide a subset selection now show an error if selected attributes are not present
  • BUGFIX: Correct display of operator status when starting a process
  • BUGFIX: Catch errors when trying to parse empty strings to numbers
  • BUGFIX: Remember/Recall operators now use a more sensible default for the io object type
  • BUGFIX: Fixed endless loop in Logistic Regression
  • BUGFIX: Generate Data can now be stopped
  • BUGFIX: Import wizards now ignore the check for duplicate names regarding columns that are disabled
  • BUGFIX: Linear, Quadratic and Regularized Discriminant Analysis can now be stopped
  • BUGFIX: K-Means, Linear Regression and SVM now ignore missing values in special attributes, except for the label
  • BUGFIX: Generate Nominal Data operator can now be stopped
  • BUGFIX: The arrange operators function no longer adds horizontal space between operators unnecessarily
  • BUGFIX: Fixed Filter Examples operator failing on date filters for dates before 1970
  • BUGFIX: The Split operator correctly outputs missing values if the input value was missing
  • BUGFIX: The Replace (Dictionary) operator now displays a meaningful error message if the to or from parameters are left undefined
  • BUGFIX: The displayed error, when using an invalid expression in the Branch operator, now contains a link to the operator
  • BUGFIX: Fixed a rare error while loading extensions on startup
  • BUGFIX: RapidMiner remembers all tabs that are visible and keeps them focused between perspective switches
  • BUGFIX: Tooltips in New Operator Dialog are now correctly formatted
  • BUGFIX: The Loop Repository operator now shows an error when the selected repository location does not exist
  • BUGFIX: Building Block Numerical X-Validation now defaults to shuffled sampling
  • BUGFIX: Improved error handling when pasting an unsupported file into the process editor
  • BUGFIX: More meaningful error message when a wrong attribute is selected in some operators

New in RapidMiner Studio 6.0.3 (Jan 10, 2017)

  • Enhancements:
  • Added new dialog to create and manage various connections
  • Tasks (shown in the lower right corner) should no longer unintentionally block each other
  • Process result display creation should be much faster now
  • Added attribute statistics when hovering over a table header in the example set result view.
  • New order for special attributes in data and meta data result view
  • Execute SQL dialog now has syntax highlight and content assist (ctrl+space)
  • Extension can now declare more than one dependency
  • Added 'unmatched example set' output port to Filter Examples operator which outputs all examples that did not match the specified condition
  • Added parameter to De-Normalize operator to control handling of missing attributes
  • Added parameter to Execute Process which controls whether the process should fail if you define a macro which is not defined in the context of the embedded process
  • Added GUI parameter rapidminer.gui.plotter.default.maximum which defines the maximum size of an example set for which a default plot will be created
  • Bug fixes:
  • BUGFIX: Vote operator should be functional again
  • BUGFIX: Excel 2007 import no longer fails when the sheet contains nominal formula values
  • BUGFIX: Custom filters for the Filter Examples operator should no longer crash when selecting the 'matches' filter on empty input
  • BUGFIX: FindThreshold operator now throws error if the confidence role has the wrong name or does not exist
  • BUGFIX: Fixed bug preventing storage of Lift charts in the repository
  • BUGFIX: Fixed bug in expression parser which did not remove faulty expressions, leading to errors in later runs
  • BUGFIX: Fixed bug that prevented the usage of global process-related macros
  • BUGFIX: Loop Repositories operator can now be stopped
  • BUGFIX: Fixed recent processes being sometimes cut off in the Welcome perspective
  • BUGFIX: Fixed wrong default file extension for directory and file parameters
  • BUGFIX: Fixed rearranging of operators in subprocesses
  • BUGFIX: Fixed bug when creating charts for an empty example set
  • BUGFIX: Optimize Parameters Operator now interrupts with an understandable explanation when no performance values were delivered
  • BUGFIX: Fixed error with password fields when the password is less than 4 characters long
  • BUGFIX: Vector Linear Regression now checks for missing values
  • BUGFIX: Fixed scrolling when moving operators outside of visible area
  • BUGFIX: Support Vector Machine (LibSVM) can now be stopped
  • BUGFIX: Fixed result of Join operator with only missing values in nominal ID attribute
  • BUGFIX: Decision Tree operators no longer fail with a cryptic error message when the label attribute contains missing values
  • BUGFIX: Generate Macro no longer proceeds if an error occurred during macro generation
  • BUGFIX: Using undefined macros as operator parameters now causes an error when executing the process
  • BUGFIX: Applying a k-NN model can now be stopped
  • BUGFIX: Logistic Regression (Evolutionary) can now be stopped
  • BUGFIX: NominalToNumerical can now be stopped
  • BUGFIX: Optimize Parameters (Evolutionary) can now be stopped
  • BUGFIX: Polynomial Regression can now be stopped
  • BUGFIX: Remove Duplicates operator can now be stopped
  • BUGFIX: Self-Organizing Map operator can now be stopped
  • BUGFIX: Support Vector Machine (Evolutionary) can now be stopped
  • BUGFIX: In most cases programs executed with Execute Program operator can now be stopped properly
  • BUGFIX: The chart selection menu in the results perspective should no longer appear in strange locations

New in RapidMiner Studio 6.0.2 (Jan 10, 2017)

  • Enhancements:
  • Improved process view bread crumb layout
  • Improved error for operator 'Extract Macro' in case the selected attribute is missing
  • Bug fixes:
  • BUGFIX: It is no longer possible to accidentally overwrite restored processes.
  • BUGFIX: fixed operator 'Execute Process' for subprocesses with empty output
  • BUGFIX: fixed operator 'Generate ID' to provide correct meta data
  • BUGFIX: fixed operator 'Polynominal by Binominal Classification' to provide correct meta data
  • BUGFIX: removed false warning triggered by operator 'Extract Macro'
  • BUGFIX: fixed 'Distribution' plotter for incomplete polynominal data
  • BUGFIX: fixed operator 'Linear Regression' for mismatching data
  • BUGFIX: removed false license expiration warnings on startup when running RapidMiner Studio and Server on the same machine
  • BUGFIX: fixed meta data checks of operator 'Set Minus'
  • BUGFIX: executing a process on RM Server with ALT-F11 shortcut now submits parameter values before starting the process
  • BUGFIX: fixed function expressions for operator 'Generate Data by User Specification'
  • BUGFIX: fixed configuration options for 'Bubble' plotter
  • API: Added experimental mechanism to tear down and reload extensions at runtime

New in RapidMiner Studio 6.0.1 (Jan 10, 2017)

  • Enhancements:
  • Added Data Editor which can be found under 'View' -> 'Show View' -> 'Data Editor'. With it you can manually create and edit example sets in a way similar to spreadsheet applications. Allows editing of values, attribute name/type/role modification, adding/removing attributes and rows, search&replace, filtering and a value generator for attributes.
  • Show perspective names in perspective buttons
  • Improved Home Screen button rendering on low resolutions
  • Show action shortcuts (if existing) in tooltips
  • Bug fixes:
  • BUGFIX: fixed error on loading certain repository entries
  • BUGFIX: fixed broken GUI when trying to enlarge statistics for attribute with only missing values
  • BUGFIX: selecting a filter in the result data table no longer changes row height
  • BUGFIX: fonts used in charts are now able to display Japanese symbols
  • BUGFIX: fixed operator 'Optimize Parameters (Grid)' failing sometimes
  • BUGFIX: removed broken line breaks in result overview
  • BUGFIX: fixed repository 'Open in file browser' option not appearing in preferences dialog
  • BUGFIX: now correctly ignores annotations if 'first row as names' parameter is used in csv import
  • BUGFIX: 'Branch' operator configuration problems are now reported in a more user friendly way

New in RapidMiner Studio 6.0.0 (Jan 10, 2017)

  • Enhancements:
  • Increased compiler level and source level to Java 7

New in RapidMiner Studio 5.3.14 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: Fixed 'Support Vector Machine' not complaining when input contained missing values

New in RapidMiner Studio 5.3.13 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: Fixed a problem with cursors leakage in Oracle databases

New in RapidMiner Studio 5.3.12 (Jan 10, 2017)

  • Enhancements:
  • All operators that write files to disk will create missing directories
  • Disabling an operator with a sub-process does not disable its children operators anymore
  • Attribute parameters that are marked as mandatory but are not set will now cause an error when executing a process
  • RapidMiner creates a log file which logs exceptions
  • The operator tree will expand again when searching for operators
  • Clustering Algorithms will stop if the process is aborted
  • JDBC drivers updated
  • Macros in the macro view are by default ordered by macro name
  • Adds an API for adding custom functions to the Expression Parser
  • Improved performance of the import wizards
  • Tabs can now be minimized with Alt+Backspace instead of Ctrl+Backspace
  • Removed extensive logging if dockables are missing
  • Neural Net: Improved handling of attribute names
  • k-Means: Improved Metadata handling
  • k-Means: Applying nominal measures to numerical data is not possible anymore
  • Linear Regression: Improved missing values handling
  • Performance (Costs): Metadata checks for missing attributes
  • Map: Reduced the number of warnings shown in the log
  • Rename: Renaming attributes to an already existing attribute name is not possible anymore
  • Aggregate: Fixed error in median function that occurred if ignore_missings was checked
  • Read CSV: Renamed 'escape character for quotes' parameter to 'escape character'
  • GSP: shows correct renderer in results perspective again
  • Loop Parameters: Show correct error if process is run without specifying parameters
  • Optimize Parameters: Improved keyboard handling of parameter dialog
  • Update Database: Fixed bug in case no columns are SET
  • Average: Improved error messages
  • Bug fixes:
  • BUGFIX: Fixed problems with uploading binary files to RapidAnalytics
  • BUGFIX: Fixed error in CSV import wizard
  • BUGFIX: Fixed memory leak in process result perspective
  • BUGFIX: Fixed error in Pareto Plotter
  • BUGFIX: Fixed error when calculating Cluster Density Performance of kMeans
  • BUGFIX: Fixed error in auto-wiring
  • BUGFIX: Fixed compatibility issues after copying and pasting operators
  • BUGFIX: Fixed bugs in the Regexp dialog
  • BUGFIX: Fixed bug in the "cut()" expression
  • BUGFIX: Neural Net: Fixed model which gave different predictions depending on whether the example set had a label or not
  • BUGFIX: Performance (Costs): Fixed error with missing prediction attribute
  • BUGFIX: Read Arff: Fixed handling of missing values in date attributes
  • BUGFIX: Expectation Maximum Clustering: Fixed missing values handling
  • BUGFIX: GSP: Fixed problems with binominal regular input attributes
  • BUGFIX: Generate Attributes: Fixed removal of attributes if overwriting attributes, keep_all parameter removes all attributes
  • BUGFIX: Loop Parameters: Fixed parameter editor dismissing values
  • BUGFIX: Send Mail: Fixed bug which caused an error after the password is encrypted
  • BUGFIX: Numerical to Date: Fixed error in the attribute selector

New in RapidMiner Studio 5.3.10 (Jan 10, 2017)

  • Enhancements:
  • Plot view now shows preview images of the available standard plotters

New in RapidMiner Studio 5.3.9 (Jan 10, 2017)

  • Enhancements:
  • Added rm.log logfile in .RapidMiner5 folder (found in the user home folder) for easier error diagnosis. The log will be overwritten each time RapidMiner is started.
  • Bug fixes:
  • BUGFIX: Fixed some operators only showing the Annotations renderer in the results view, e.g. the Generalized Sequential Patterns operator
  • BUGFIX: Fixed 'Update Database' operator to also work when all attributes are set as ID attributes
  • BUGFIX: Disabling an operator which contains operators in a subprocess no longer disables its inner operators
  • BUGFIX: Pasting process xml into the XML View and clicking "Apply" will now also work when whitespaces have been inserted before the xml
  • BUGFIX: Exporting images/pdfs is more robust now
  • BUGFIX: Fixed 'Send Mail' operator failing with SMTP passwords set
  • BUGFIX: Fixed visual glitch when editing a metadata column header in the Import Wizards and resizing a left-handed column at the same time

New in RapidMiner Studio 5.3.8 (Jan 10, 2017)

  • Enhancements:
  • Plugins can hook into the operator and port context menus to add their own entries
  • Bug fixes:
  • BUGFIX: Fixed memory leak in the "Fill Data Gaps" operator
  • BUGFIX: Fixed error in regular expression dialog
  • BUGFIX: Several fixes and improvements for the data import wizard
  • BUGFIX: Ignoring missings in the count function will actually ignore the missings in the count and will not display missing for the whole group

New in RapidMiner Studio 5.3.7 (Jan 10, 2017)

  • Enhancements:
  • The current process is auto-saved, and the last edited process can now be restored after RapidMiner has terminated abnormally
  • Moved "Close all Results" button in the Results view to the popup menu on result tabs (right-click on a result tab header)
  • It is now possible to store different credentials for multiple RapidAnalytics repositories which all point to the same RapidAnalytics
  • Support for Vertica database (JDBC driver not shipped with RapidMiner)
  • Buttons from the "RapidAnalytics Processes" toolbar are moved to the context menu (Stop process, Open result, etc.)
  • Newly created repositories must have unique names now
  • Bug fixes:
  • BUGFIX: Certain dialogs should no longer be too big when using HD-ready resolution or 1024x768 (e.g. for presentations)
  • BUGFIX: Process should no longer be flagged as changed when entering/leaving a subprocess via undo/redo
  • BUGFIX: It is no longer possible to create multiple local repositories in the same location
  • BUGFIX: Fixed error when trying to import a binary file into a newly created folder
  • BUGFIX: Changes to the user credentials in the "Configure Repository" dialog will now take effect without having to restart RapidMiner

New in RapidMiner Studio 5.3.6 (Jan 10, 2017)

  • Bug fixes:
  • BUGFIX: NPE in CrossDistanceOperator for nominal attributes
  • BUGFIX: In the Rename operator the check for existing attribute names will not check the role name

New in RapidMiner Studio 5.3.5 (Jan 10, 2017)

  • Enhancements:
  • Changed drop target highlighting color from light-blue to light-orange
  • Process panel will not be highlighted anymore when dragging folders from the Repository View
  • Added a GUI preference option for drag target highlighting. Now it is possible to select whether the full target, the target's border or nothing at all should be highlighted.
  • In the Repository View it is no longer possible to drag Repositories.
  • Process size is now automatically adjusted when entering a subprocess or when loading a process.
  • When installing extensions, all licenses can be accepted with one click in a new license dialog.
  • Default connection timeout settings changed
  • Bug fixes:
  • BUGFIX: Recursive creation of parent directories works properly again in all cases
  • BUGFIX: 32-bit Windows version of RapidMiner will now have more than 64 MB memory available when installed on an x64 system
  • BUGFIX: Write Database operator table name parameter allows custom names again
  • BUGFIX: Fixed disabled operator not always being activated when connected
  • BUGFIX: Fixed "Click to branch" popup sometimes appearing at the wrong position
  • BUGFIX: Errors during Repository movements should no longer break RapidMiner in headless mode
  • BUGFIX: Fixed some error messages and dialogs
  • BUGFIX: Process is no longer marked as changed when entering a subprocess
  • BUGFIX: Loading documentation from Wiki never blocks UI

New in RapidMiner Studio 5.3.0 (Jan 10, 2017)

  • Enhancements:
  • Development JDK was switched to Java 7 but code still is compatible with Java 6.
  • Added new and improved extensive documentation (often including tutorial processes) for almost all operators
  • Added improved RapidAnalytics support. New run button: "Run process on RapidAnalytics". Can only be used if the process is stored on a RapidAnalytics repository. Instantly runs the process on the RapidAnalytics server the process is stored on
  • Connecting ports in reverse order is possible now (Input port -> output port)
  • Run on RapidAnalytics-Dialog can choose execution queue
  • New operators: Create Archive File and Add Entry to Archive File allow creating ZIP files
  • New operator: Performance to Data
  • New operator: Throw Exception
  • New file system operators: Copy File, Move File, Delete File, Rename File, Create Directory
  • New operators for handling annotations: Annotate, Annotations to Data, Data to Annotations, Extract Macro from Annotation
  • Aggregate: new aggregation functions: sum (fractional), count (fractional), count (percentage) and string concatenation
  • Execute Program operator has File Object ports for stdin, stdout, stderr
  • Loop Attributes: new output port which collects the data from all iterations
  • Macros can be passed through the command line. Example: 'rapidminer //repository/home/test/process -Mkey1=value "-Mkey2=value with spaces"' will provide two macros named key1 and key2
  • Result Perspective: Added button in top right corner of a result tab to close all open results at once
  • Repositories View: Added button which navigates to the repository location of the currently opened process
  • Repositories View: Added popup menu item to open the selected entry in the OS file browser (only for Local Repositories)
  • Added new view: 'Macros': This view shows macros and their values in real time during process execution
  • More consistent handling of input sinks and output sinks of processes and subprocesses: Sinks can be moved up or down by dragging them with the left mouse key while shift key is pressed. Removed double click on process input sink, all actions on process sinks can now be triggered via a popup menu
  • Added resize button to text parameter editor dialog
  • Preferences menu buttons reworked: 'OK' button saves settings permanently and closes the dialog; 'Apply' button saves settings permanently and does not close the dialog
  • Generate Data: warn user if the selected number of attributes is not supported by the target function
  • The RM window title now shows the complete location of the current process to avoid confusion with multiple processes with the same name
  • Trying to create a new process/open another process/exit RM while a process is running now requires user confirmation
  • Local repositories which are inaccessible for any reason now have an annotation which shows that they are inaccessible
  • 'Remote' repositories are now called 'RapidAnalytics' repositories.
  • Plugin loading: when loading manually installed plugins from the webstart or plugins folder with multiple versions, load the one with the highest version number
  • Added database metadata caching to improve performance. If you need to clear the cache, use the menu item 'Clear Database Metadata Cache' under the 'Tools' menu.
  • Declare Missing Values: an empty string can now be declared as missing, and attributes which are not compatible with the selected mode are ignored.
  • Extract Macro: added optional list parameter to add unlimited macro name/value pairs when 'macro type' is set to 'data_value'.
  • 'Synchronize Meta Data with Real Data' toggle button added in the top right corner of the process view. This will now propagate the real meta data to all reached input ports after process execution. This means that for example the operator after a breakpoint will have its meta data synched with the real data, therefore enabling e.g. attribute selection parameters to show the list of parameters available. Known Issue: Currently this information is lost once the operator updates itself, which happens for example if you deselect and select it again.
  • Loop Until: Added 2 checkboxes in order to choose whether you want a condition check depending on the example set or on the performance. 'condition before' is now deselected by default.
  • Select Parameters Dialog: Parameters of type ParameterTypeString are now treated like numerical parameters (the Grid option is enabled now), in case you want to assign a row of numerical values to a String parameter.
  • Select Parameters Dialog: All acceptable parameter types are now shown, even if the continuous/discrete mode is enabled.
  • Regular Expression Dialog: Dialog has a new tab: Regex Options. In this tab, the user can define options like multiline mode or case-insensitive matching. These options will be added to the pattern through embedded flag expressions.
  • Added new shortcut to toggle the breakpoint before an operator (Shift+F7)
  • Improved many error messages
  • Update Dialog: Revamped update dialog to show various lists of packages (search, most popular, bookmarks, etc. as well as functionality to log in/out)
  • Added a startup check for purchased but not installed extensions and a property to disable the check. Those Extensions can be directly installed from the dialog.
  • RapidMiner enters safe mode (not loading plugins) when startup was interrupted
  • Added new 'Export as PDF' action to the 'Print results or export' dropdown button
  • Deleted tooltips for the operator list of the OptimizeParameters (Grid) operator
  • UpdateDialog: Switched the positions of the install button and the link to the extension homepage
  • UpdateDialog: Checks if the user purchased the extension when returning to the dialog after hitting the "purchase" button
  • Replaced the standard Random function in the ExpressionParser with a custom one in order to involve the random seed/the RandomGenerator for the process
  • Updated JTDS driver to version 1.2.5
  • Loop Collection Operator has new parameters: 'set iteration macro', 'macro name' and 'macro start value'
  • Execute Process Operator: inverted the default values for all boolean parameters
  • Repository names now enforce a blacklist of invalid characters
  • Operators "Execute Process" and "Retrieve" are now named after the files you drop into the Process window
  • Nominal to Numerical: Parameter "default coding" is now set to dummy coding per default
  • Changed the help menu entry "Update RapidMiner" to "RapidMiner Marketplace"
  • Improved Remote Repository authentication
  • Improved data import wizards
  • Added default dialog options when running a process
  • Added attribute selector for the Extract Performance Operator
  • Database access: when several database connections with the same name exist, the one provided by the same server as the process is preferred
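
The Regular Expression Dialog entry above mentions embedded flag expressions. As a minimal sketch of what such flags do in plain java.util.regex (the class and method names below are hypothetical and not part of RapidMiner), `(?i)` switches on case-insensitive matching and `(?m)` switches on multiline mode:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only illustrates standard java.util.regex behaviour.
final class EmbeddedFlagsDemo {

    /** Counts how often the given regular expression is found in the input. */
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Error: disk full\nerror: retrying\nok";

        // No flags: '^' matches only at the start of the input and case matters.
        System.out.println(countMatches("^error", text));

        // (?i) ignores case, (?m) lets '^' match at the start of every line.
        System.out.println(countMatches("(?im)^error", text));
    }
}
```

Because the flags are part of the pattern string itself, a pattern written this way carries its options with it wherever the string is stored or passed on.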
  • Bug fixes:
  • BUGFIX: Recent files are now updated on process save
  • BUGFIX: The condition on performance check at the Loop Until Operator now works properly if the performance decreases
  • BUGFIX: Editing context variables now immediately flags the process as changed
  • BUGFIX: If set to 'ask', the 'close previous results' dialog will no longer appear when resuming from a breakpoint
  • BUGFIX: Breakpoints can no longer be added to the root operator
  • BUGFIX: 'Store process here' popup menu action on another process will now correctly flag the process as saved
  • BUGFIX: Added error message for 'Find Threshold' operator when an invalid class name is entered as parameter
  • BUGFIX: The Pivot operator used on an empty example set no longer creates an example set with one example (filled with missing values)
  • BUGFIX: The De-Pivot operator now has much better error handling when trying to setup the index attribute as an already existing attribute
  • BUGFIX: Read Excel cannot open the Import Configuration Wizard, if the excel_file parameter is not set
  • BUGFIX: Nominal to Binominal can handle border cases with mapping containing less than 2 values
  • BUGFIX: Configure Repository dialog now saves user credentials for remote repositories again
  • BUGFIX: Added error shown in Problems view when entering invalid regular expression for 'replace what' parameter of Rename by Replacing operator
  • BUGFIX: Moving a repository entry to another location which contains an entry with the same name now asks whether to overwrite instead of showing an error
  • BUGFIX: Execute Process operator potential error reporting improved
  • BUGFIX: Creating folders in repository with the same name but different capitalization (e.g. 'test' and 'Test') is now forbidden
  • BUGFIX: filtering numerical and date attributes with Filter Examples is possible again
  • BUGFIX: Wrong parameter format in Clone Parameters causes exception
  • BUGFIX: Fixed GUI problem when trying to schedule a process on RapidAnalytics without existing RapidAnalytics repositories
  • BUGFIX: Fixed possible data loss when trying to store data/processes/etc in the repository using invalid characters for the given filesystem by now showing an error instead of failing silently
  • BUGFIX: Fixed possible data loss when trying to move repository entries into their own folder
  • BUGFIX: fixed an initialization problem in the Cross Distances operator, which caused wrong calculation of distances in some rare cases
  • BUGFIX: Quickfix Dialogs no longer vanish right after showing (RCOMM2012)
  • BUGFIX: RapidMiner no longer blocks for a varying amount of time if a connection to an online server fails
  • BUGFIX: Focus issue with delete action
  • BUGFIX: Import Binary File no longer freezes the GUI
  • BUGFIX: New Plotters showed a "This should not happen" message once in a while and were unusable until restart of RapidMiner (Bug #1274)
  • BUGFIX: Real to Integer operator: don't convert missings to 0, but keep them as missing
  • BUGFIX: Fixed wrong cell selection on rightclick for some tables after reordering columns
  • BUGFIX: Fixed startup failure when trying to start RapidMiner with broken Plugins
  • BUGFIX: Optimized update routine. Instead of failing with a cryptic error if no admin rights are available, RapidMiner now shows a dialog that asks for admin rights
  • BUGFIX: Fixed the "Select for installation" button: Error when content/behaviour changed according to factors like whether the extension was purchased. Now leads to the extension website when purchased but not installed. Now reacts properly to double-clicks in the extension list. The "Install" button turns to a disabled state when no extensions are marked for installation
  • BUGFIX: AccountService is only queried when we are logged in. The login state is saved internally
  • BUGFIX: Process undo steps are now reset when a new process is opened
  • BUGFIX: Using undo in a subprocess will no longer reset the view to the top-level of the process
  • BUGFIX: The welcome perspective now updates the recent files list, so opening a process via it now opens the correct selected process
  • BUGFIX: Drag&Drop from the OS to the RapidMiner Process design canvas now also works for .xlsx files
  • BUGFIX: Fixed several repository problems when trying to overwrite entries with themselves
  • BUGFIX: UpdateDialog: Purchase link now changes to "install" after logging in and the purchase button now redirects to the extension website
  • BUGFIX: Generate Aggregation operator can now handle the case of zero matching attributes
  • BUGFIX: Fixed several key shortcuts that worked in the wrong perspective. For example, it is no longer possible to delete operators while in the result perspective
  • BUGFIX: Drag&Drop of files (e.g. .csv/.xls) creates the corresponding read operator with the now correct filename parameter
  • BUGFIX: Drag&Drop of operators should no longer create operators halfway outside the process canvas
  • BUGFIX: Import wizards will no longer overwrite existing data without asking for permission first
  • BUGFIX: Import wizards will no longer accept wrong file types/invalid filenames in the first step
  • BUGFIX: Read SAS operator no longer causes an internal error when the data file could not be read
  • BUGFIX: "Wiki" links in the documentation tried to open a tutorial process, now open the corresponding wiki page
  • BUGFIX: Macro Editor will now remember entered values without having to press enter
  • BUGFIX: Opening the context menu on result tables will no longer deselect the currently selected cells
  • BUGFIX: Pressing "Delete" in a subprocess with the surrounding operator selected will no longer result in deletion of the whole subprocess
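
The "Real to Integer" fix listed above is easy to reproduce in plain Java. The sketch below assumes that a missing value is represented as Double.NaN and that the conversion truncates toward zero; both are assumptions made for illustration, and the class and method names are hypothetical. In Java, casting NaN to an integer type yields 0, so a conversion that casts blindly turns missing values into zeros:

```java
// Hypothetical sketch, not the operator's actual implementation.
final class RealToIntegerSketch {

    /** Truncates toward zero but keeps NaN (assumed here to mark a missing value). */
    static double toIntegerKeepingMissing(double value) {
        if (Double.isNaN(value)) {
            return Double.NaN; // keep missing as missing
        }
        return (double) (long) value; // drop the fractional part
    }

    public static void main(String[] args) {
        System.out.println((int) Double.NaN);                     // prints 0
        System.out.println(toIntegerKeepingMissing(Double.NaN));  // prints NaN
        System.out.println(toIntegerKeepingMissing(3.7));         // prints 3.0
        System.out.println(toIntegerKeepingMissing(-3.7));        // prints -3.0
    }
}
```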

New in RapidMiner Studio 5.2.8 (Jan 10, 2017)

  • Enhancements:
  • Send Mail operator has a new behavior: Stop process and show error if mail cannot be sent
  • Send Mail operator has a new parameter: 'Ignore errors'
  • Write CSV operator has a new parameter: 'Append to file'
  • Write Excel operator has a new parameter: File Format (xls, xlsx)
  • New Operator: Reorder Attributes
  • Set Macro: now can define empty macros
  • Bug fixes:
  • BUGFIX: Fix a "This should not happen" message in the Advanced Charts.
  • BUGFIX: Keep old settings after updating RapidMiner
  • BUGFIX: 'Cancel' ParameterTypeList Dialog works correctly now
  • BUGFIX: FP-Growth correctly handles parameter 'must contain'
  • BUGFIX: Join Operator displays MetaData correctly

New in RapidMiner Studio 5.2.6 (Jan 10, 2017)

  • Enhancements:
  • Read SAS operator
  • Added timezone parameter to Date to Nominal operator
  • Excel 2007 support
  • Dialog for editing cron expressions
  • Dialog to see preview for regular expressions
  • Update Database operator
  • Easier mechanism for Extensions to create configurable items
  • Japanese and German translation

New in RapidMiner Studio 5.2.2 (Jan 10, 2017)

  • Enhancements:
  • Remove Correlated Attributes uses deterministic random numbers
  • Improved Repository Tree handling (save expansion state on refresh and improved tree selection on entry removal)
  • Improved exporting of Advanced Charts View

New in RapidMiner Studio 5.2.1 (Jan 10, 2017)

  • Enhancements:
  • Added operators to manage repository entries: Copy, Move, Delete, Rename

New in RapidMiner Studio 5.2.0 (Jan 10, 2017)

  • Enhancements:
  • Added "Advanced Charts" view
  • Superset and Union operator can handle special attributes
  • Catch block subprocess for Handle Exception operator
  • Database connections can define driver properties
  • XML import
  • Join operator can operate on multiple columns
  • Easier bug reporting: Direct connection to Bugzilla
  • Aggregation Operator now supports default Aggregation for a set of attributes and is implemented more efficiently
  • The last edited process can now be restored after RapidMiner has terminated abnormally
  • Added "File" objects to pass to reader operators.
  • Added operators to open files and URL connections
  • Added operators to iterate ZIP files
  • Added new Operators:
  • Denormalization Operator
  • Remove Unused Values Operator
  • Loop Repository
  • Open File, Write File, Loop Zip-File Entries
  • Read Excel with Format

New in RapidMiner Studio 5.1 (Jan 10, 2017)

  • ENHANCEMENTS:
  • Added RapidAnalytics connectivity
  • Added new repository type that reflects database connections
  • Added type-specific icons to repository tree
  • Added annotations to IOObjects
  • Import operators and wizards remake
  • Most wanted feature: "Rename" and "Set Role" can handle multiple attributes at a time
  • Versioned operators allow easier updates
  • "Generate Attributes" has new UI and supports more text and date functions
  • Operator documentation uses Wiki (http://rapid-i.com/wiki/).
  • IOObjects can be annotated, e.g. with file source or SQL statement
  • Database operators can prepare statements
  • Revised import wizards
  • Background tasks stoppable
  • Added process profiling and resource consumption annotations
  • Added Support for R Extension
  • Added new boolean GUI property rapidminer.gui.fetch_data_base_table_names which suppresses fetching of database table names in the SQLQueryBuilder
  • More efficient meta data handling for Excel, CSV, and database readers
  • Meta data propagation uses context macros
  • Splash screen shows plugins
  • Aggregate operator can compute product
  • Various smaller fixes
  • Various UI improvements
  • Added new Operators:
  • Print to Console
  • Unset Macro
  • "Auto MLP" and "k-Means (fast)" contributed by DFKI
  • Hierarchical Classification
  • Numerical to Date
  • Delay
  • BUG FIXES:
  • Fixed memory leak causing RapidMiner to run out of memory when processing many large example sets
  • Re-added descriptive error messages

New in RapidMiner Studio 5.0 (Jan 18, 2010)

  • Added an operator for performing a local polynomial regression
  • Added an operator for calculating weights using a local polynomial regression.
  • Added an operator for extracting the cluster centroids or prototypes from a flat cluster model.
  • Added an operator for calculating the cross distances between example sets.

New in RapidMiner Studio 5.0 Beta (Jan 18, 2010)

  • Redesigned graphical user interface comprising a docking framework to freely layout GUI components and save different layouts in multiple perspectives.
  • New visual representation of processes, i.e. a graph-based flow layout which allows to define and observe the actual data flows in processes in a very intuitive way.
  • Added automatic generation, propagation and transformation of meta data. Simulating the actual data handling in RapidMiner at design time allows for much less error-prone process design. Quickfixes provide hints and solutions in case of inaccurate parameter settings, erroneous operator usage, etc.
  • Added repository module allowing to conveniently store, manage and archive processes, data, models and any other arbitrary data object in RM.
  • New process context provides a new way to define the inputs and outputs of processes and allows a better integration and sharing of processes in distributed settings.
  • New result history view provides an overview of recent process results.
  • Consolidated operator names and implemented a reasoned operator naming scheme which provides easier access for beginners as well as experienced RapidMiner users.

New in RapidMiner Studio 4.5 (Jan 18, 2010)

  • New Weka version (as of September 21st, 2009)
  • Implementation Details:
  • New properties for additional ioobjects.xml
  • Bugfixes:
  • Fixed bug for reporting images smaller than 800 x 600
  • Fixed class loader problem occurring when more than one plugin was used
  • Fixed bug for iterative operator chain
  • Fixed XML export bug where XML in parameters was not properly escaped
  • Fixed bug in parallel cross validation

New in RapidMiner Studio 4.4.2 (Jan 18, 2010)

  • New operators:
  • FormulaExtractor
  • Trend
  • LagSeries
  • VectorLinearRegression
  • ExampleSetMinus
  • ExampleSetIntersect
  • Partition
  • Script
  • Drastically reduced access times for attribute retrieval by name
  • Improved the operator Aggregation in terms of speed and memory consumption
  • Improved the operator ExampleSetJoin, correct inner join for example sets with non-equal numbers of ids, added left and right outer joins
  • Latest version of Weka (as of 2009/07/11)
  • Latest version of MySQL JDBC driver (5.1.17)
  • Implementation Details:
  • Updated to new versions of Jung and JFreeChart
  • Bugfixes:
  • Fixed bug in Split operator for ordered splits where shorter sequences did not get filled with ?
  • Fixed bug in FeatureSubsetIteration where not all subsets were used during the iteration

New in RapidMiner Studio 4.4.1 (Jan 18, 2010)

  • New operators:
  • ForwardSelection
  • NeuralNetImproved
  • KernelNaiveBayes
  • ExhaustiveSubgroupDiscovery
  • URLExampleSource
  • NonDominatedSorting
  • Deprecated operators:
  • NeuralNet (use NeuralNetImproved instead)
  • NeuralNetSimple (use NeuralNetImproved instead)
  • Deprecated operators are also shown in context menu with a light gray color now
  • The notification mail at the end of a process can now also be sent by SMTP instead of sendmail
  • Most file based data input operators now provide an option to skip error lines
  • Most file based example source operators (Arff, Excel, DasyLab, Stata, SPSS, XRFF) as well as the IOObjectReader and the new URLExampleSource now accept URLs instead of a filename for the input source location
  • All discretization models now support the definition of the desired number of digits for automatic interval name determination
  • The LiftParetoChart now supports the definition of the number of digits for the confidence intervals
  • Improved time display in status bar
  • Enabling / Disabling operator now works with CTRL-E
  • Fixed several issues in GUI thread handling which might have led to deadlocks and long GUI updates on certain systems
  • Clean-up of nominal value mappings in process log table in case of sorted top-k for reduced memory footprint
  • Implementation Details:
  • DistanceMeasure creation now is based on the operator and gets the input container as well
  • Bugfixes:
  • NeuralNet and NeuralNetSimple did not properly work on regression problems. While NeuralNetSimple could be fixed, a new operator NeuralNetImproved is now provided which should be used instead of NeuralNet and NeuralNetSimple. Since this operator is also faster and more scalable, it should be used instead of both old (and now deprecated) neural net implementations
  • Fixed bug in renaming where decimal point characters got lost
  • Fixed issue in model applying leading to a wrong remapping of the label values afterwards if an independent test set was used. Important: this bug did not deliver wrong predictions but simply changed how the label values were displayed.
  • Fixed several issues in GUI thread handling which might have led to deadlocks and long GUI updates on certain systems
  • Fixed bug in bar chart for numerical group by columns
  • Fixed bug in DasyLab example source which sometimes led to doubled characters at the end of feature names
  • Fixed bug in OperatorSelector for macro usage

New in RapidMiner Studio 4.4 (Jan 18, 2010)

  • New operators:
  • ExampleSetSuperset
  • ExampleSetUnion
  • MacroConstruction
  • CumulateSeries
  • FastLargeMargin
  • Split
  • Construction2Names
  • NeuralNetSimple
  • Parameters will now be adapted according to an operator rename, for example the settings of operators like the ProcessLog or the parameter optimization operators are automatically corrected to the new operator names
  • Graphs like the similarity graph display the strengths of the edges now by their color
  • Added new tree layout algorithm for the decision trees preventing most overlapping, the old tighter version is available as layout type "Tree (Tight)"
  • Decision trees now show the subtree size as tool tip for the inner nodes, the edges are now darker for larger subtrees and brighter for smaller ones
  • Decision trees are learned faster now due to internal optimizations in the split example set handling
  • Tables like the (meta) data view now support a new context menu for common table operations like column sorting or row / column selection
  • The "New Operator" dialog now also supports full text search in the description texts of the operators
  • RapidMiner now stores all parameter values in the process files including the default values which ensures a better compatibility with future versions. The XML tab, however, only shows the values differing from the default
  • Plugins can now define a class com.rapidminer.PluginInit providing a method "initPlugin()" which will be invoked during plugin initialization
  • Univariate and multivariate series windowing operators now also support nominal attributes and even mixed types in cases where the series is represented by the examples (rows) of the data set
  • The range statistics of nominal attributes in the meta data view now show the values with highest and lowest occurrence counts, sort the values according to the counts, and display only an excerpt of the occurring values if large amounts of different values exist
  • List of recent files is now directly saved after opening a new process and not only during shutdown
  • Changes in the process setup are now allowed even during process runtime, e.g. when waiting at a breakpoint
  • NaiveBayes can now handle new nominal values during the model application phase
  • Deprecated operators are now rendered with a gray color in the new operator tab and dialog
  • Updated to the latest version of Weka (as of February 26th, 2009)
  • Updated to the latest version of Joone, optimized some of the neural network default parameters
  • Added some new sample processes to the sample directory as well as to the tutorial
  • ExampleFilter and most important discretization parameters are no longer expert parameters
  • ArffExampleSource now states an error message in cases where attributes contain a space which is not quoted
  • New binominal classification performance measures:
  • positive predictive value
  • negative predictive value
  • psep
  • Implementation details:
  • SplittedExampleSet has been improved leading to faster data access times for operators like cross validation or decision tree learning
  • Plugins can now define a class com.rapidminer.PluginInit providing a method "initPlugin()" which will be invoked during plugin initialization
  • Bugfixes:
  • Fixed bug in accuracy criterion for the revised decision tree learner
  • Fixed bug in parameter list of ValueSubgroupIterator
  • Fixed bug in ExceptionHandling which sometimes led to doubled outputs
  • Fixed bug in ProcessBranch which sometimes led to doubled outputs
  • ViewAttributes did not add min and max statistics so that those statistics were not calculated on data table views
  • Fixed bug in Windows GUI start script (linebreak)
  • Fixed bug for surface 3D plot where x and y were replaced by each other
  • Fixed paths to icons for building blocks
  • Fixed issue with ROC plots in cases where several points with same confidence occurred
  • Fixed potential thread deadlock during the filling of the plotter list
  • Fixed bug for distance weighted vote and k = 1 in NearestNeighbors
  • Fixed a bug in ChiSquaredWeighting for mixed-type data sets where the number of bins was smaller than the maximum number of nominal values
  • The default global random seed in the preferences dialog was not allowed to be set to -1
  • The property keys of the preferences dialog were editable
  • Fixed bug in PolynomialRegression
  • Range normalization now delivers maximum value for constant attributes
  • Weighted precision and recall no longer deliver NaN if a class did not occur
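
The range normalization entry in the bugfix list above concerns the case where an attribute has only one value, so that its minimum equals its maximum and the usual scaling formula would divide by zero. A minimal sketch of that edge case follows; the class, the method and its parameters are hypothetical, and only the rule "constant attributes map to the maximum value" is taken from the entry:

```java
// Hypothetical sketch, not RapidMiner's actual normalization code.
final class RangeNormalizationSketch {

    /** Maps value from [min, max] to [newMin, newMax]; constant attributes map to newMax. */
    static double normalize(double value, double min, double max, double newMin, double newMax) {
        if (max == min) {
            // Constant attribute: (value - min) / (max - min) would be 0 / 0.
            return newMax;
        }
        return (value - min) / (max - min) * (newMax - newMin) + newMin;
    }

    public static void main(String[] args) {
        System.out.println(normalize(5.0, 0.0, 10.0, 0.0, 1.0)); // prints 0.5
        System.out.println(normalize(7.0, 7.0, 7.0, 0.0, 1.0));  // prints 1.0
    }
}
```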

New in RapidMiner Studio 4.3.2 (Jan 18, 2010)

  • New operators:
  • LinearDiscriminantAnalysis
  • QuadraticDiscriminantAnalysis
  • RegularizedDiscriminantAnalysis
  • DasyLabExampleSource
  • FileIterator
  • ExceptionHandling
  • ChangeAttributeNamesReplace
  • ChangeAttributeNames2Generic
  • DateAdjust
  • MinMaxBinDiscretization
  • RainflowMatrix
  • Deprecated operators:
  • DirectoryIterator (use FileIterator instead)
  • Renamed parameters:
  • ExampleSetWriter:
  • quote_whitespace is now named quote_nominal_values
  • ExampleSetMerge can now handle missing values
  • RapidMiner does now better support counts for the in- and output types which should considerably reduce the amount of warnings if operators like IOConsumer, IOMultiplier or ExampleSetMerge (reducing several objects of the same type to one of the same) are used
  • FileIterator replaces DirectoryIterator and adds many new features like recursive iteration, file name based filtering, and a new macro for the parent path
  • Centroid based clusterings now support assigning unseen examples to the nearest cluster on apply time
  • ProcessBranch now supports a branching with respect to the existence of an input object
  • ClearProcessLog now also allows to remove the complete logging table
  • The logging tables of the ProcessLog operator will now not be generated during start up but during the first operator usage (and also during the following if the table was deleted in the meantime, e.g. in a loop)
  • Added support for different time zones, users can now define the preferred time zone in the settings dialog and time conversion operators are now able to respect this setting
  • Date and times are now displayed in the system's local settings
  • New plotter: Block
  • Added support for applying a log scale for the color column for the Scatter plot and the new Block plotter
  • Data tables like those generated by the process log are now de-coupled from the table used for plotting preventing that the rows will be sampled and rows would be removed from the data table
  • A double click on the region between two columns in the table header now automatically resizes the left column to a fitting size (known from Windows programs)
  • A double click on the same region while pressing CTRL will resize all table columns according to the contents
  • GuessValueTypes now only works on regular attributes and provides a parameter for extending it on the special attributes (work_on_special)
  • AttributeFilter now also provides a new parameter
  • work_on_special
  • The operator Replace now also allows empty replace_by
  • values
  • The ExampleSetJoin operator now also works if the
  • id of the first example set is not part of the second
  • Guess value types can now handle missing values
  • CSVExampleSetWriter now supports the parameter quote_nominal
  • All feature selection and weighting operators now also
  • provide the possibility to log the names of the features
  • of the current generation's best individual
  • The Replace operator now supports capturing groups
  • The file based example source operators (ExampleSource, SimpleExampleSource, CSVExampleSource...) now better support quoted strings and also escaped quotes (escaping with ")
  • Implementation details:
  • The method Tools.quotedSplit(...) should now be used
  • instead of a regular split followed by the method
  • Tools.mergeQuotedSplits(...)
  • Bugfixes:
  • fixed bug in DBScan for empty cluster models
  • fixed bug for simple sampling in cases where a local
  • random seed was used
  • fixed bug in process logging to files which prevented
  • the writing of the first logged result
  • fixed bug in PSO optimization for cases where the fitness
  • should be minimized instead of maximized
  • fixed bug in binary performance measure which was not
  • delivering the fitness for specificity, sensitivity,
  • and youden index
  • fixed bug in meta data table viewer in cases where huge
  • numbers of long nominal values existed which caused a
  • crash of the Java Virtual Machine in some cases
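
The quote handling mentioned above for the file based example sources can be illustrated with a small sketch. It is written in Python rather than RapidMiner's Java, the function name quoted_split is hypothetical (it is not the signature of Tools.quotedSplit(...)), and the backslash escape character is an assumption made only for this example.

```python
def quoted_split(line, sep=",", quote='"', escape="\\"):
    """Split a line on `sep`, ignoring separators inside quoted regions.

    Hypothetical helper for illustration only; the escape character
    (backslash) is an assumption, not taken from the changelog.
    """
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Escaped character: take the next character literally.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == quote:
            # Toggle the quoted state; the quote itself is dropped.
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            # Separator outside quotes closes the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields
```

Doing the split in one pass avoids the older two-step pattern of splitting on every separator and then merging the pieces that belonged to one quoted string.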

New in RapidMiner Studio 4.3.1 (Jan 18, 2010)

  • New operators:
  • RemoveDuplicates
  • Cluster2Prediction
  • DirectoryIterator
  • TextObjectWriter
  • TextObjectLoader
  • TextExtractor
  • SingleTextObjectInput
  • TextCleaner
  • TextObject2ExampleSet
  • TextSegmenter
  • AddAttribute
  • SetData
  • EMClustering
  • AttributeWeights2ExampleSet
  • TransitionGraph
  • DatabaseExampleVisualizationOperator
  • Revised decision tree learning which led to drastically reduced runtimes and better tree models in terms of generalization capabilities
  • The bar chart now displays the category as label in the
  • domain axis
  • Removed plotter: Bars 3D
  • The IOObjectReader now allows the definition of the expected
  • output type
  • The LiftParetoChart no longer re-applies the input model if a predicted label already exists
  • Added the ability to "explode" tiles of pie and ring charts
  • Added several new options for the reporting operators of the
  • RapidMiner Enterprise Edition as well as true parameter handling
  • including type checks
  • Updated to latest release of Jung
  • Fixed GUI related memory leaks
  • Implementation details:
  • The class AttributeWeightsCreator was renamed to
  • ExampleSet2AttributeWeights
  • Bugfixes:
  • Fixed a combination of GUI and process thread related
  • memory leaks
  • Fixed bug in Series Multiple Plotter which prevented
  • rescaling
  • Pie and Bar charts used class limit instead of legend
  • limit in order to decide if the legend should be shown
  • special format in ExampleSetWriter ignored quote
  • whitespace setting
  • bug in XVPrediction fixed

New in RapidMiner Studio 4.3 (Jan 18, 2010)

  • New operators:
  • AccessExampleSource
  • Example2AttributePivoting
  • Attribute2ExamplePivoting
  • PolynomialRegression
  • Similarity2ExampleSet
  • ExampleSet2SimilarityExampleSet
  • Nominal2String
  • String2Nominal
  • Date2Numerical
  • Real2Integer
  • Numerical2Real
  • Nominal2Numerical
  • Numerical2Binominal
  • Numerical2Polynominal
  • AbsoluteDiscretization
  • ConditionedFeatureGeneration
  • AttributeAggregation
  • SupportVectorCounter
  • MutualInformationMatrix
  • GaussFeatureConstructionOperator
  • ProductGenerationOperator
  • AbsoluteValues
  • MovingAverage
  • ExponentialSmoothing
  • SeriesMissingValueReplenishment
  • DifferentiateSeries
  • IndexSeries
  • FillDataGaps
  • EnsureMonotonicity
  • WindowExamples2ModelingData
  • WindowExamples2OriginalData
  • ProcessLog2AttributeWeights
  • Mapping
  • Substring
  • Trim
  • Replace
  • AddValue
  • MergeValues
  • AttributeConstruction
  • ValueIterator
  • IOStorer
  • IORetriever
  • SQLExecution
  • ClearProcessLog
  • ProcessLog2ExampleSet
  • Data2Performance
  • Data2Log
  • Macro2Log
  • DataMacroDefinition
  • LiftParetoChart
  • Deprecated Operators:
  • Nominal2Numeric (please use Nominal2Numerical instead)
  • Numeric2Binominal (please use Numerical2Binominal instead)
  • Numeric2Polynominal (please use Numerical2Polynominal instead)
  • LinearCombination (please use AttributeAggregation instead)
  • AttributeValueMapper (please use Mapping instead)
  • AttributeValueSubstring (please use Substring instead)
  • AddNominalValue (please use AddValue instead)
  • MergeNominalValues (please use MergeValues instead)
  • New implementation of clusterings for more efficient computing and memory usage:
  • Reimplemented or adapted operators:
  • AgglomerativeClustering
  • ClusterModel2ExampleSet
  • DBScanClustering
  • ExampleSet2ClusterModel
  • FlattenClusterModel
  • KMeans
  • KMedoids
  • KernelKMeans
  • RandomFlatClustering
  • SupportVectorClustering
  • TopDownClustering
  • ClusterModelWriter
  • ClusterModelReader
  • TransitionMatrix
  • Removed operators:
  • AgglomerativeFlatClustering, use AgglomerativeClustering and FlattenClusterModel instead
  • BregmanHardClustering, use KMeans with BregmanDivergences instead
  • ExampleSet2ClusterConstraintList
  • MPCKMeans
  • TopDownRandomClustering, use TopDownClustering with RandomFlatClustering as inner learner
  • UPGMAClustering, use AgglomerativeClustering with average link instead
  • SimilarityComparator
  • The new AttributeConstruction operator supports infix
  • written formulas, a simple format for constants and
  • new calculation methods
  • Better support for special characters in process XML
  • Macros are now also supported in parameter lists and for
  • numerical parameters
  • Added new overwriting mode to the DatabaseExampleSetWriter
  • named "first overwrite, then append"
  • Replaced "append" parameter in ExampleSetWriter by the
  • new overwriting modes "none", "overwrite", "append",
  • and "first overwrite, then append"
  • ExampleFilter can now use regular expressions for the values
  • of the nominal attribute value filtering
  • New Plotter: Pareto Chart
  • New Plotter: Series Multiple
  • New Plotter: Scatter Multiple
  • The old scatter plotter has been divided into a new Scatter
  • plot and the new Scatter Multiple plot
  • Most plotters now support panning during zooming by
  • pressing the Ctrl Key while dragging the mouse
  • The file chooser in the modern look and feel now always
  • remembers the last directory from which a file was chosen
  • as an additional default bookmark (on the left)
  • Changed the order in which models are added to the grouped model (ModelGrouper), i.e. the last created model will now be added as the last one
  • The wizards of the database reading and writing operators
  • are now initialized with the last settings
  • The feature selection and feature weighting operators are
  • now based on double arrays which should lead to smaller
  • memory footprints
  • Added new performance measures:
  • sensitivity,
  • specificity,
  • Youden index,
  • relative error lenient,
  • relative error strict
  • The CachedDatabaseExampleSource operator now has a more appropriate wizard
  • The plotters now provide consistent colors for classes
  • Improved the names of the features of the (multi-)variate
  • windowing operators
  • Multivariate windowing now also supports a name for the
  • label column in addition to the index
  • Multivariate windowing can now also be applied without the creation of a label and even with horizon 0
  • Improved the graph and plotter panel for long column / item
  • names, long names are now displayed in a short fashion and
  • the full name is shown as tool tip
  • DecisionTree now supports a new parameter min_size_for_split
  • Added new process branch conditions:
  • attribute_available,
  • min_examples,
  • max_examples,
  • min_attributes,
  • max_attributes
  • The viewers for symmetrical matrices like correlations etc. now always show the values of the first column
  • Improved the range names of discretized data
  • Added selection of criterion to AssociationRulesGenerator,
  • also improved the visualization of association rules by
  • adding a selector for the criterion used for the minimum
  • value slider
  • Added new option for Normalization: users can now choose between z-transformation, range-transformation or the new proportional transformation via category selection.
  • LinearRegression is now also applicable on binominal
  • classification tasks
  • Added support for logging only the top-k or bottom-k objects
  • with the ProcessLog operator
  • Improved the parameter optimization / iteration dialog: small numbers are no longer cut off, GUI is more consistent, dialog now uses icons
  • Improved the CachedDatabaseExampleSource operator and
  • database handling: now arbitrary tables are accepted and
  • primary keys (index) and / or mapping tables are
  • automatically handled
  • Integrated the latest version of the JFreeChart library
  • A dialog now informs users if any unknown parameters were part of the process during loading
  • A SimpleVoteModel now supports the output of textual
  • results
  • (Multivariate) Windowing on example based input representations now keeps the input id attribute
  • Added writing of intermediate weights for GeneticAlgorithm
  • (feature selection) and EvolutionaryWeighting (feature
  • weighting), both operators now also support the initialization
  • with attribute weights (e.g. from the last run)
  • Implementation Details:
  • Moved AnovaMatrix(Operator) into the package com.rapidminer.operator.visualization.dependencies
  • Moved all attributes based matrix operators (correlation, covariance etc.) into the new package com.rapidminer.operator.visualization.dependencies
  • Moved aggregation functions into package
  • com.rapidminer.tools.math.function.aggregation
  • Bugfixes:
  • processes now only write the logged information from the
  • run, not the global information for example collected from
  • the GUI. Hence, the logging will also no longer directly
  • overwrite old log files right after loading
  • switch workspace and initial workspace selection now
  • prevent the selection of the RapidMiner main directory
  • and all subdirectories in order to prevent a recursive
  • copy
  • switched weight "direction" for corpus based weighting
  • fixed bug in evolutionary parameter optimization in
  • combination with logging
  • fixed bug in Wizard for ExampleSource preventing the
  • correct guess of value types (were always nominal)
  • fixed error in nominal re-mapping for cases where the
  • nominal values of training and test set did not match
  • fixed jittering bug in Histogram plots causing the bins
  • to drop out of the plotter
  • fixed minor bug in ExampleSetWriter which caused the
  • ExampleSource operator to state a warning
  • fixed bug if special characters were part of the process
  • XML
  • DistributionModel is updatable now
  • AttributeValueSubstring ignores missing values and is
  • able to extract single characters now
  • Fixed a GUI error only occurring in Java 6 Update 10
  • Fixed bug in FeatureSubsetIteration where the specified
  • maximum number of features was not used
  • Fixed bug in PerformanceVector writing from the result
  • dialog (Save button) which led to large data files and
  • long runtimes until the data was actually saved
  • Fixed bug in uninstaller which under certain circumstances
  • also removed non-RapidMiner files in the installation
  • directory
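
The performance measures sensitivity, specificity, and Youden index added in this release follow the standard definitions over a binary confusion matrix. The sketch below is in Python for illustration; the function name and argument names are hypothetical and do not reflect RapidMiner's internal API.

```python
def binary_measures(tp, fp, tn, fn):
    """Standard binary measures from confusion matrix counts.

    tp/fn: positive examples predicted correctly / incorrectly
    tn/fp: negative examples predicted correctly / incorrectly
    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Youden index: sensitivity + specificity - 1
        "youden_index": sensitivity + specificity - 1,
    }
```

For example, binary_measures(tp=40, fp=5, tn=45, fn=10) gives a sensitivity of 0.8 and a specificity of 0.9.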

New in RapidMiner Studio 4.2 (Jan 18, 2010)

  • New operators:
  • Nominal2Date
  • Date2Nominal
  • KernelPCA
  • EqualLabelWeighting
  • StataExampleSource
  • FeatureSubsetIteration
  • RelativeRegression
  • AttributeValueSubstring
  • CachedDatabaseExampleSource
  • NameBasedWeighting
  • BatchProcessing
  • GroupModel
  • UngroupModel
  • Aggregation now supports multiple aggregations (also of
  • different attributes) as well as grouping by values of
  • multiple attributes. Aggregation attributes and functions
  • are now specified by a parameter list.
  • Added support for attributes with value types date, time, and date_time: these can be created from nominal attributes with the operator Nominal2Date for arbitrary date formats
  • Histogram plotters now support jittering and log scales
  • The database wizard is improved and now supports large
  • data sets which caused memory problems in the older
  • versions during table and attribute name retrieval
  • The statistics in the meta data view of data sets are no longer calculated per default for data sets larger than 100000 rows - the calculation is available from the menu in the upper right corner
  • "ExampleSet" was renamed to "Data Table", the rows are
  • still called "Example" and the columns are still called
  • "Attribute"
  • The iteration through partitioned / split data sets is now more efficient (especially for linearly split sets)
  • All plotters can now handle missing values
  • Many plotters now support the plotting of absolute values
  • and / or sorting according to the plotted column
  • Removed time-consuming checks (including a full data
  • scan before plotting)
  • One-Class SVM for LibSVMLearner now properly supported
  • The new operators GroupModel and UngroupModel now replace
  • the automatic building of ContainerModels (merging
  • preprocessing with prediction models) and hence give the
  • user more control over the model building / grouping
  • process
  • AttributeSubsetPreprocessing now supports the inversion
  • of the specified regular expression
  • The operator AttributeSubsetPreprocessing was enhanced
  • so that it can now be applied on subsets defined similarly
  • to the new AttributeFilter operator. Hence, the subset
  • preprocessing can for example only be performed on
  • nominal or numerical attributes
  • The database example set writer now supports new
  • overwriting / appending modes
  • Instead of the "work_on_database" mode of the usual
  • DatabaseExampleSource operator we now recommend the
  • new operator CachedDatabaseExampleSource which will
  • keep the data in the database in a more efficient way.
  • However, please note that writing in such a table is
  • not directly possible and must be performed with a
  • DatabaseExampleSetWriter
  • Implementation Details:
  • optimized KNN for speed issues, gaining boost up to 13x
  • replaced NaiveBayes with highly efficient version
  • (changes: distribution plots now show conditional
  • probabilities without consideration of a priori probabilities,
  • heuristic use of kernels has been removed)
  • integration of RapidMiner is now easier since the location
  • of plugins and Weka can be properly defined with settings
  • and the definition of "rapidminer.home" is no longer
  • necessary
  • clean-up for value types (Ontology)
  • The ValueInterface now delivers Object instead of double,
  • i.e. the logging of nominal values is now also supported
  • New renderer service for providing the visualizations
  • of the results. This will replace the method
  • getVisualizationComponent() in the long run
  • added latest version of the chart library (as of July
  • 13th 2008)
  • added latest version of Weka (as of July 13th 2008)
  • Bugfixes:
  • Fixed two bugs in new parameter wizard gui for string
  • and integer parameters
  • CSV- and SimpleExampleSource now accept lines with correctly divided empty strings (i.e. missing values) at the end of the lines
  • Fixed wrong number of bins for the square root number
  • of bins in the frequency discretization operator
  • Fixed closing behaviour of the switch workspace dialog
  • Changes in XML tab were not used if the tab was left
  • in other ways than by changing the tab to another one
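
The extended Aggregation behaviour described above (several aggregation functions, possibly on different attributes, grouped by the values of multiple attributes) can be sketched as follows. This is a Python illustration only; the function names, the output column naming, and the set of aggregation functions are assumptions and not RapidMiner's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical function names for this sketch.
FUNCTIONS = {"sum": sum, "average": mean, "minimum": min,
             "maximum": max, "count": len}

def aggregate(rows, group_by, aggregations):
    """Group rows (dicts) by several attributes and aggregate others.

    aggregations is a list of (attribute, function_name) pairs,
    mirroring the idea of a parameter list.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[name] for name in group_by)].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for attribute, function in aggregations:
            values = [m[attribute] for m in members]
            out[f"{function}({attribute})"] = FUNCTIONS[function](values)
        result.append(out)
    return result
```

Each output row carries the grouping values plus one column per (attribute, function) pair.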

New in RapidMiner Studio 4.1 (Jan 18, 2010)

  • New operators:
  • StratifiedSampling
  • AbsoluteStratifiedSampling
  • GuessValueTypes
  • UseRowAsAttributeNames
  • MemoryCleanUp
  • MaterializeDataInMemory
  • UncertainPredictionsTransformation
  • CovarianceMatrix
  • AttributeFilter
  • RandomSelection
  • FrequentItemSetUnificator
  • FrequentItemSetAttributeCreator
  • OperatorSelector
  • CostEvaluator
  • AttributeMerge
  • KennardStoneSampling
  • New 64 bit version for Windows x64 OS now provided;
  • other 64 bit systems are supported by using a
  • 64 bit Java version
  • Parameter optimization operators now provide a nicer
  • wizard dialog for setting the parameters
  • All GUI elements now provide longer descriptions for operators
  • SplitChain and AbsoluteSplitChain were moved from the
  • postprocessing into the meta group
  • Meta group was restructured and two subgroups (control and
  • other) were added
  • Fixed a memory leak in the result history which was affecting
  • the GUI for multiple processes if they were performed in a
  • single sequence
  • SOMDimensionalityReduction and SVDReduction are now able to create
  • a preprocessing model
  • BruteForce and GeneticAlgorithm feature selection now support a minimum and maximum number of features and also the selection of an exact number of features
  • RapidMiner now offers two different look and feels: modern
  • (recommended) and classic
  • Improved comment tab so that it already registers and saves
  • new text directly after it was typed (instead of changing
  • the tab)
  • DataStatistics (IOObject) now shows the standard deviation
  • like in the GUI instead of the variance
  • Robustified ExampleSource wizard: the same output files
  • as the input file are no longer allowed
  • Series Plotter no longer scales the axis ranges in a way that zero must be contained
  • All SVM and other hyperplane models now support the visualization of a sortable data table for the coefficients (weights)
  • An error message now indicates if XML entities are used for
  • operator names which is not allowed
  • Anova calculator now allows value editing in table and the
  • specification of the significance level
  • Meta data views can now be correctly sorted according to sum
  • or unknown value columns
  • MissingValueImputation: added warnings in the case that not all
  • values could be imputed, improved attribute ordering (ascending
  • and descending sorting, sort by number of missing values), added
  • log messages
  • Naive Bayes distribution model now uses the same class coloring
  • for both numerical and nominal distributions
  • Latest available Weka version integrated (as of 2008/05/09)
  • Implementation Details:
  • The AttributeParser no longer supports batch generations
  • The ClusterModel reader is now able to read both compressed
  • and uncompressed files
  • PCA and GHA now use global covariance matrix calculation
  • Bugfixes:
  • LibSVMLearner now provides the correct range for the nu
  • parameter
  • Fixed bug in AttributeParser which prevents the correct
  • calculation for nested generations or cases where
  • the generation is divided into several operators
  • Fixed bug in value type guessing for numerical columns
  • with missing values
  • Fixed bug in ExampleSetTranspose for missing values
  • in nominal attributes
  • Fixed bug in DatabaseExampleSource Wizards for user
  • defined URLs
  • Parameter lists are now cloned correctly
  • Fixed bug for quoted input files occurring in some cases where the quoted string was part of the line before
  • Fixed a bug for learning with example weights with the
  • JMySVM learner
  • Fixed a NPE if empty example sets were used as input
  • for feature selection operators
  • Fixed wrong normalization for confidences predicted by
  • distribution models (e.g. NaiveBayes)
  • AttributeEditor and ExampleSource wizard did not regard
  • the decimal point character (and quotes)
  • The value type guessing operators did not take a
  • possible decimal point character different from
  • '.' into account
  • Fixed tool tip for z-transform in Normalization operator:
  • changed "variance" to "standard deviation"
  • Fixed locale for Ok - Cancel dialogs to US locale like
  • the rest of RapidMiner
  • Fixed bug in operator tree which caused the reconstruction
  • of the expansion state to be faulty in some cases
  • Fixed statistics copy bug introduced in 4.1beta2 for
  • predicted label statistics
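
The tool tip fix above concerns the z-transformation, which divides the centered values by the standard deviation, not the variance. A minimal Python sketch follows; the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the changelog does not say which estimator the Normalization operator uses.

```python
from statistics import mean, stdev

def z_transform(values):
    """Center values on their mean and scale by the standard deviation.

    Uses the sample standard deviation (an assumption for this sketch).
    Requires at least two values that are not all identical.
    """
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]
```

After the transformation the values have mean 0 and standard deviation 1, which would not hold if the variance were used as the divisor.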

New in RapidMiner Studio 4.1 Beta 2 (Jan 18, 2010)

  • New operators:
  • ProcessBranch
  • FileEcho
  • ExchangeAttributeRoles
  • ChangeAttributeRole
  • SeriesPrediction
  • Deprecated operators:
  • ChangeAttributeType (use ChangeAttributeRole instead)
  • New version of chart plotting library
  • New plotter: Series
  • Removed the numerical sample sizes for the tree and rule learners
  • Introduced different shapes for plotter points
  • Use bigger strokes for plotter lines
  • Added max_items parameter for FPGrowth
  • Changed default mode for view creation of preprocessing models
  • Added signum generator for manual feature generation and for
  • generation with YAGGA2
  • Relief can now handle missing values
  • Changed default data representation back to double because the number of rounding errors was otherwise too high for larger data ranges
  • Implementation Details:
  • Introduced AttributeDescriptions and AttributeTransformations
  • in order to lower large memory consumptions due to clones
  • and to avoid re-wrappings for new views on the example
  • set view stack
  • removed clone of mappings for clones of nominal attributes
  • Changed DataRow methods from package private to protected
  • ConditionedExampleSets no longer support dynamical
  • conditions
  • Changed default data representation back to "double"
  • The visualization of integers and the nominal statistics
  • calculation are now based on longs instead of integers
  • Bugfixes:
  • Fixed MAJOR bug introduced in 4.1beta in example sets / views which occurred after a new view was created on top of a split example set (e.g. in a cross validation) and then hid the partition
  • Fixed some problems (due to too many cloned objects, see above) which caused much more memory usage in 4.1beta
  • Fixed bug in PredictionTrendAccuracy calculation
  • Fixed wrong linefeeds in unix start scripts
  • Fixed bug in aggregation function selection of the chart
  • plotters
  • Fixed ID handling bug for example sets (views) which
  • prevented the correct application of Id-based operators
  • like the ExampleSetJoin operator
  • Fixed bug in table index assignment of view attributes
  • Fixed bug in SortedExampleSet
  • Fixed bug in some plotters based on JMathplot
  • Removed remapId() call in IdUtils which increased the
  • runtime of some clustering schemes (especially DBScan
  • and SupportVectorClustering)
  • Fixed bug in RuleLearner for nominal attributes
  • Fixed bug for (operator / parameter) pair parameter
  • values for the parameter iteration and optimization
  • operators
  • Fixed wrong name for continuous attributes in C45 loader
  • ConditionedExampleSet caused some problems if the base
  • attributes for conditions were removed after the
  • filtering
  • Fixed a bug in getNominalValue(Attribute) of Example
  • which delivered the first nominal value instead of
  • missing values
  • File filters now accept lower and upper case extensions
  • Fixed wrong colors after sorting a column of the
  • ANOVA matrix
  • Removed unnecessary statistics registration in nominal
  • attributes consuming unused memory and runtime
  • Fixed rounding error in the stepwise parameter
  • operators
  • Removed data representation type query during first
  • startup since rounding errors are often too high
  • AbsoluteSampling produced sample with duplicates
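
The rounding errors that motivated the return to the "double" data representation can be illustrated with a short Python sketch. It assumes that the alternative was a single-precision (32-bit) representation, which the changelog does not state explicitly; the helper name is hypothetical.

```python
import struct

def to_single_precision(value):
    """Round-trip a double-precision value through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

# A 32-bit float has a 24-bit significand, so integers above 2**24
# can no longer all be represented exactly.
large = 16777217.0                      # 2**24 + 1
stored = to_single_precision(large)     # neighbouring value is stored instead
error = large - stored                  # non-zero rounding error
```

Values with larger magnitudes lose more absolute precision, which matches the note that the problem showed up for larger data ranges.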

New in RapidMiner Studio 4.1 Beta (Jan 18, 2010)

  • RapidMiner GPL is renamed to RapidMiner Free and is licensed
  • under the General Public License version 3 (GPLv3) now
  • New operators:
  • SingleMacroDefinition
  • MissingValueReplenishmentView
  • Perceptron
  • SubgroupDiscovery
  • ExcelExampleSetWriter
  • CSVExampleSetWriter
  • several new data generators
  • New preprocessing models for discretization and nominal to
  • binominal filter, these operators now create only a new view
  • on the data as default instead of actually changing the data
  • ArffExampleSource and XrffExampleSource now support sampling
  • Improved Windows installation
  • New icons and look and feel for GPL version
  • Added graph visualization for association rules
  • Added new filter modes for association rule visualizations
  • The non-GPL version now natively supports Oracle, IBM DB2, and
  • Microsoft SQL Server without the need of an additional driver
  • installation
  • The availability check for JDBC database drivers was improved,
  • the same applies for the corresponding dialogs
  • The database operators and wizards can now work with table
  • and column identifiers containing spaces and other special
  • characters
  • Improved performance of DecisionTree and RuleLearner for
  • data sets containing numerical values
  • Improved encoding handling for input operators, configuration
  • wizards, and attribute editor
  • New default encoding: 'SYSTEM' which uses the standard encoding
  • of the underlying operating system
  • All performance criteria now support example weights for
  • calculations if possible (and available)
  • New rule evaluation methods available for
  • AssociationRuleGenerator
  • Diagonal of confusion matrix is now marked by a different color
  • All clustering schemes now use MixedEuclideanDistance as default
  • The chart plotters (pie, bars) are now more robust for larger
  • data sets
  • The chart plotters (pie, bars) now provide the possibility for
  • the selection of an aggregation function type and use distinct
  • values only
  • KMeans now provides a warning for data sets containing
  • missing values
  • The sometimes slightly annoying dialog asking for saving the
  • process can now be deactivated
  • Passwords are now encrypted in XML (also in files) ensuring
  • that passwords cannot be read from process files
  • New Plotter: Distribution
  • Changed operator numbering in operator info dialog for
  • inner operator conditions
  • The error messages and the error stack trace in the details frame
  • can now be copied via Ctrl-C
  • Data files written by the ExampleSource configuration wizards
  • are now compatible to the standard parameters
  • ExampleSource now uses quoted nominal values as default
  • New visualization for NaiveBayes models
  • Operator trees no longer change their expansion status after saving them or after process stops
  • Multiple paste operations are now possible after copy
  • Decision trees now show the size of leaf nodes through the height of the frequency bar
  • More evaluation measures added for association rules
  • ExampleSources now also allow the usage of no comment characters
  • Increased the default size for the file chooser and the text
  • dialogs
  • Text dialogs like the SQL editor now keep linefeeds and tab information
  • Changed the default minimum support of FPGrowth to 0.95 and added
  • an option to decrease the support until a minimum number of
  • frequent item sets was found. The latter working mode is the
  • default now.
  • KMeans cluster models now provide a parallel plotter
  • visualization of the cluster centroids
  • New macro: %{v[OpName.ValueName]} which will be replaced
  • by the current value of the specified value of the operator
  • Added cross-entropy as a new classification criterion
  • Ranges of discretized attributes now contain information about
  • the numerical thresholds
  • Changed default criterion for RuleLearner from accuracy to
  • information gain
  • Added default data management type to the initialization
  • screen
  • Icons for all tabs (non-GPL version)
  • Latest Weka version (as of 30/11/2007)
  • Implementation Details:
  • New init method also allowing the easy definition of
  • additional operators
  • ParameterSet now provides access to parameter values
  • New views (example sets) in order to improve the integration
  • into other products
  • Changed signature of startCounting(ExampleSet) to
  • startCounting(ExampleSet, boolean) in MeasuredPerformance
  • (see above)
  • All Models now have to return the transformed example set
  • instead of changing the values by side-effect. This was
  • necessary to allow the usage of views and view models
  • Bugfixes:
  • Fixed wrong license texts
  • Removed the file weka.jar from the free version which
  • was accidentally included in the last release. Weka is
  • of course still part of the GPL version of RapidMiner
  • Fixed templates (SimplePerformance was renamed to
  • Performance)
  • Fixed example visualization in cluster models (wrong
  • examples were shown in some cases)
  • Wizard from Welcome Screen did not change into edit mode
  • Faulty wizard files were fixed
  • Faulty building block files were fixed
  • Fixed bug in the calculation of the confidence of
  • association rules
  • Fixed bug if several manual feature construction were
  • applied in a row (overwriting old generated columns)
  • Unknown values of nominal attributes were not correctly
  • encoded in Arff files
  • Problem with example encoded multivariate series in the
  • MultivariateSeries2WindowExamples operator
  • Fixed bug for ranking in TransformedRegression
  • Fixed bug in RapidMiner initialization for user defined
  • operators.xml streams
  • Configuration Wizard of ExampleSource did not use correct
  • encoding from process root operator
  • After deleting the contents of a password field it was
  • still part of the process setup (empty string in XML)
  • Changed result set scrolling type to "sensitive" which
  • is necessary for the Microsoft SQL Server 2005
  • Fixed bug in XrffExampleSetWriter which did not properly
  • escape XML characters
  • Using Save for a ParameterSet result did not work
  • Fixed Weka related bugs in the online tutorial
  • Fixed a possible stack overflow error in the
  • RepeatUntil meta chain
  • Fixed problem with Microsoft SQL Server 2005 with
  • respect to the scrolling / updating behavior
  • ProcessLog got an error if the value "best_length" of a
  • feature operator should be logged
  • Fixed error in the k-distance plot which calculated a wrong
  • x-axis offset for certain settings
  • SparseFormatExampleSource did not trim the sparse array which
  • caused higher memory usages
  • Removed data view icon for some of the plotters since an
  • error in a third-party library caused problems after
  • activation
  • RuleLearner did not use numerical attributes twice
  • Fixed error in attribute editor which added empty data lines after re-opening the edit dialog
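
The new macro %{v[OpName.ValueName]} listed above is replaced by the current value of the named operator value. The following Python sketch shows one way such a substitution could work; the function name, the lookup table, and the choice to leave unknown macros untouched are all assumptions for illustration and do not describe RapidMiner's actual macro handling.

```python
import re

# Matches %{v[OpName.ValueName]} and captures both names.
MACRO_PATTERN = re.compile(r"%\{v\[([^.\]]+)\.([^\]]+)\]\}")

def expand_value_macros(text, operator_values):
    """Replace value macros using a {(operator, value_name): value} mapping.

    Hypothetical helper; macros without a known value are left unchanged.
    """
    def substitute(match):
        key = (match.group(1), match.group(2))
        if key not in operator_values:
            return match.group(0)
        return str(operator_values[key])

    return MACRO_PATTERN.sub(substitute, text)
```

With a mapping such as {("Validation", "performance"): 0.93}, the parameter string "result_%{v[Validation.performance]}.txt" would expand to "result_0.93.txt".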

New in RapidMiner Studio 4.0 (Jan 18, 2010)

  • New operators:
  • Performance (could be used in most cases instead of the
  • now deprecated PerformanceEvaluator)
  • ClassificationPerformance
  • BinominalClassificationPerformance
  • RegressionPerformance
  • UserBasedPerformance
  • SingleRuleWeighting
  • MPCKMeans
  • Almost all process setups will now also correctly work if the
  • nominal values of training and test data are not defined or
  • are not defined in the same order
  • The somewhat big operator "PerformanceEvaluator" is now
  • deprecated and was divided into several smaller operators
  • which now fit the different learning task types.
  • Added compatibility checks for the example sets for
  • prediction models between training and application data
  • Added a filter for the New Operator tab
  • Added learning for numerical attributes for rule learners
  • Renamed lowest verbosity level to "all"
  • Improved visualization of performance criteria
  • Added automatic ROC curve visualization for AUC criterion
  • Added averaged ROC curves
  • Added deviation plotter
  • Improved ExampleSetMerge
  • Improved rule learning on numerical data sets
  • Improved tree learning on numerical data sets
  • Added k-distance plot for similarity measures in the
  • similarity visualizations
  • Changed AUC calculation to a more pessimistic calculation
  • which better fits the ROC plots
  • Operator info is now available in context menu of
  • operator list in new operator tab
  • Added example visualizations after clicking a node
  • in the graph view of similarity visualizations
  • Improved the speed at which confidences are set for LibSVM models
  • Latest Weka version included (as of 30/07/07)
  • Implementation Details:
  • Revised clustering operators and introduced improved
  • abstract clustering
  • The global logging can now be specified either by
  • general properties or via the method
  • LogService.initGlobalLogging(...)
  • Attribute.getStatistics(...) is now deprecated, please
  • use ExampleSet.getStatistics(Attribute, ...) instead
  • Changed the log verbosity of the process information at the beginning and the end of process executions
  • Plugins can now define own building blocks in their
  • resources directory (each bb file is described by a
  • line in the file "buildingblocks.txt")
  • Improved closing of streams in error cases
  • Bugfixes:
  • Removed unnecessary parameters from RandomForest
  • Fixed attribute name bug (not case sensitive) causing errors in some preprocessing operators if features with the same name but different cases exist
  • Fixed bug in Anova and T-Test calculation (wrong degrees of freedom)
  • Fixed bug during weight normalization which led in many cases to a concurrent modification exception which was covered by a process change message
  • Removed possible bug in UPGMA-Clustering
  • Graph View of Cluster Models did not work
  • Added missing clone in discretization operator which
  • might have caused problems in cases where the discretization
  • was added into an iterating chain (like validation chains)
  • Streams are no longer automatically closed during XML (de-)serialization
  • Rule learners did not produce greater equal conditions
  • Plotters now can handle missing values for plot columns
  • Micro-averages of attribute weights were not correctly calculated
  • Fixed bug if a data set (.aml) is re-loaded containing
  • confidence attributes
  • Added missing option for k in the k-distance plots
  • (similarity visualizations)
  • Fixed notification error (double beeps) after a process
  • was stopped in a breakpoint
  • Fixed a bug which made it impossible to save neural net
  • models

New in RapidMiner Studio 4.0 Beta 2 (Jan 18, 2010)

  • New operators:
  • BatchXValidation
  • BatchSlidingWindowValidation
  • AttributeCopy
  • ExampleSetTranspose
  • AssociationRuleGenerator
  • RelevanceTree
  • CHAID
  • Tree2RuleConverter
  • Removed operators:
  • RegressionTree (may be re-added in later releases)
  • Ripper (replaced by RuleLearner)
  • Renamed operators (old operator names are deprecated now):
  • ExperimentEmbedder operator was renamed
  • to ProcessEmbedder (see below)
  • ExperimentLog operator was renamed to
  • ProcessLog operator (see below)
  • API change: Renamed Experiment to Process
  • (the old class Experiment is still available for
  • compatibility reasons but deprecated)
  • API change: OperatorService.createOperator(Class)
  • is now the preferred way for operator creation
  • and does no longer need a cast (generics)
  • Added correct file encodings to all IO operators
  • Renamed log verbosity "minimum" to "all" and log
  • verbosity "maximum" to "off"
  • Added meaningful default and range values for the
  • parameters of the ParameterOptimization operators
  • Replaced Tip of the Day dialog by the tip in the
  • Welcome screen
  • Changed all Weka parameters to non-expert parameters
  • (available in beginners mode)
  • SVMWeighting now supports more than 2 classes
  • All weighting schemes now return normalized results
  • Completely revised tree and rule learners
  • Completely revised tree, cluster model, and similarity
  • visualization
  • Latest release of LibSVM integrated (2.84)
  • Latest release of xstream integrated (1.2.2)
  • Latest release of Jung integrated (2.0alpha2)
  • Added table view for experiment log results
  • Added text views for learned tree models
  • Added text views for learner kernel models
  • Added text view for logistic regression model
  • Added Anova kernels for JMySVM and EvoSVM
  • Removed obsolete temp file service
  • CommandLineOperator now uses a higher log
  • verbosity for the output of the command
  • Improved output of Naive Bayes models
  • Improved context menu for attribute editor
  • Example visualization now automatically added after
  • IdTagging
  • Improved standard example visualization
  • Bugfixes:
  • Added missing dialog if more than one special attribute
  • with the same name was defined with the ExampleSource
  • configuration wizard
  • Log view panel was not resizable
  • Special attributes were no longer special after
  • AttributeSubsetPreprocessing on special attributes
  • Fixed LibSVM multi-class issues (no confidences)
  • Bugfix in the fast example set to sparse transformation
  • causing problems in Weka learners (and maybe the LibSVM)
  • Dichotomization did not properly work
  • Parallel plotter did not properly work for special
  • attributes
  • Fixed missing Id problem for top down clusterers
  • Fixed wrong nominal value writing for attribute editor
  • Column colors were not transferred if columns were
  • moved in data views
  • The AttributeConstructionLoader did not properly create attributes for the identical function (no construction at all)
  • Normalization did not work properly on nominal attributes
  • AttributeSubsetPreprocessing did not properly keep
  • the old attributes
  • Replace operator (context menu) of operator chains
  • added (2) to the inner operators even if the names
  • were not used in the process setup
  • Spearman's Rho and Kendall's Tau now deliver 0 if not
  • defined (e.g. for default model) instead of NaN
  • Fixed problem with delegate attribute unwrapping in
  • some feature selection cases in combination with
  • cross validation operators

New in RapidMiner Studio 4.0 Beta (Jan 18, 2010)

  • "YALE" was renamed to "RapidMiner"
  • New operators:
  • DensityBasedOutlierDetection
  • LOFOutlierDetection
  • DistanceBasedOutlierDetection
  • PCAWeighting
  • SVMWeighting
  • Relief
  • InfoGainWeighting
  • InfoGainRatioWeighting
  • ChiSquaredWeighting
  • SymmetricalUncertaintyWeighting
  • PSOWeighting
  • FPGrowth
  • LinearRegression
  • NaiveBayes
  • NeuralNetLearner
  • LogisticRegression
  • DecisionStump
  • DecisionTree
  • ID3
  • ID3Numerical
  • RegressionTree
  • RandomTree
  • RandomForest
  • Prism
  • Ripper
  • OneR
  • NearestNeighbors
  • AdditiveRegression
  • Stacking
  • Vote
  • MetaCost
  • CostBasedThresholdLearner
  • Binary2MultiClassLearner
  • SVDReduction (from clustering plugin)
  • KMedoids (from clustering plugin)
  • KMeans (from clustering plugin)
  • KernelKMeans (from clustering plugin)
  • SupportVectorClustering (from clustering plugin)
  • AgglomerativeClustering (from clustering plugin)
  • AgglomerativeFlatClustering (from clustering plugin)
  • UPGMAClustering (from clustering plugin)
  • TopDownRandomClustering (from clustering plugin)
  • TopDownClustering (from clustering plugin)
  • DBScanClustering (from clustering plugin)
  • RandomFlatClustering (from clustering plugin)
  • ExampleSet2ClusterModel (from clustering plugin)
  • FlattenClusterModel (from clustering plugin)
  • ClusterModel2ExampleSet (from clustering plugin)
  • ExampleSet2Similarity (from clustering plugin)
  • ClusterModel2Similarity (from clustering plugin)
  • SimilarityComparator (from clustering plugin)
  • Bootstrapping
  • WeightedBootstrapping
  • BootstrappingValidation
  • WeightedBootstrappingValidation
  • MissingValueImputation
  • ExampleSetMerge
  • ExampleSetCartesian
  • XrffExampleSource
  • XrffExampleSetWriter
  • DatabaseExampleSetWriter
  • IOSelector
  • LinearCombination
  • AttributeSubsetPreprocessing
  • ModelVisualizer
  • ModelUpdater
  • LabelTrend2Classification
  • Sorting
  • AddNominalValue
  • ExampleRangeFilter
  • Numeric2Polynominal
  • PartialExampleSetLearner
  • SlidingWindowValidation
  • GroupBy
  • GroupedANOVA
  • ANOVAMatrix
  • Aggregation
  • Renamed operators:
  • AttributeSetWriter /-Loader into
  • AttributeConstructionsWriter / -Loader
  • Renamed all operators starting with Y- into
  • the names without this prefix
  • Added W- to all Weka operators, old experiments
  • can be loaded though
  • AverageLearner (was deprecated) now revised and
  • renamed into AttributeBasedVote
  • Deprecated operators:
  • Numeric2Binary (use Numeric2Binominal instead)
  • API CHANGES: please refer to
  • http://sourceforge.net/forum/forum.php?thread_id=1698583&forum_id=390413 and
  • http://sourceforge.net/forum/forum.php?thread_id=1730986&forum_id=390413
  • for details
  • The clustering plugin is now part of the YALE core
  • Drag'n'Drop for operator trees
  • New Icons (please refer to the license files for information about the icons)
  • New Look and Feel (please refer to the license files for information about the look and feel)
  • Improved general speed, most YALE runs now use less than 60% of the runtime needed before
  • Added page setup and print preview dialogs
  • Improved printing
  • New file chooser and added favorites to it (in the
  • left part of the dialog)
  • Tool tips can now be painted over multiple lines, allowing more information about the operators and parameters
  • New view menu
  • Result History viewer showing textual descriptions of all experiment results in the session so far; also allows the calculation of Anova for different results
  • Parameter values are now always saved at focus losses
  • or during resizing operations
  • All tables (viewers) can be sorted by clicking on the
  • table headers (at least all tables where this makes sense)
  • Speed up of plotter initialization which was the reason
  • for the long times needed for displaying data sets
  • GUI is now able to immediately stop a running experiment
  • Improved capability to use YALE as a library; this requires the Ant target "copy-resources" to be performed before starts (see implementation details below)
  • All file formats were changed (sorry!) and are now
  • based on XML
  • Grid based parameter optimization / iteration operators
  • now support another format for parameter definition:
  • [start;end;step]
  • XVPrediction can now also handle confidences for
  • problems with more than two classes
  • Improved automatic closing of files and temp file
  • deletion after major experiment changes
  • Added graph view for BayesianNet models
  • Added textual and graphical view modes for models which
  • are capable of both, e.g. decision trees and Bayesian Nets
  • Added possibility to invert the result of an ExampleFilter
  • Added possibility to connect several attribute value
  • conditions for an ExampleFilter
  • Added new performance criteria: Spearman's rho and
  • Kendall's tau
  • Added option for AttributeWeightsApplier allowing for
  • changing just the data view instead of the actual data
  • table
  • The data representation type "sparse_array" was renamed
  • to "double_sparse_array"
  • Added new data representation types "short_array",
  • "short_sparse_array", and "boolean_sparse_array" allowing
  • for more efficient data handling
  • The univariate Series2WindowExamples operator now again
  • supports sets of examples if the time series is encoded
  • as attributes
  • The (meta) data tables now support text selection allowing
  • for copy and paste into other applications.
  • Performance Vector results can now be selected and copied
  • Example Set views can now be selected and copied
  • All displayed results now provide a "Save..." button
  • Use JTable for confusion matrices
  • Use JTable for correlation matrix (DataTable)
  • Added HSQLDB JDBC driver
  • Full platform compatible line feeds
  • ResultWriter can now also write results into single files
  • instead of the global result file defined in experiment
  • Improved LearningCurveOperator now using better dynamically
  • growing training sets and a fixed test set
  • Allow the definition of number of digits for the
  • ExampleSetWriter format
  • Added log scale to usual scatter plotter
  • Added several chart plots (new bars 2D and 3D, pie charts
  • 2D and 3D, bubble plotter)
  • ExampleSetWriter now supports zipped data files
  • Added initial support for updatable models, currently only the updatable models from Weka are supported, others will follow
  • Added another replenishment type 'zero' for the
  • MissingValueReplenishment operator
  • Added source definition for all IO objects, i.e. the
  • results do now show which operator was the creator
  • (only shown in result view if more than one result of
  • the same type was created)
  • Allow complete data scan for value type guessing now
  • in ExampleSource configuration wizard
  • Added weighted performance measures for weighted means
  • of the per-class recalls and precisions
  • Model writing and loading works for zipped files (gz)
  • Changed attribute statistics handling and displaying
  • Implementation Details:
  • The Ant target "copy-resources" must be performed
  • before starts are possible
  • new initialization methods available Yale.init(...)
  • allowing the specification which parts of YALE should
  • be initialized
  • Revised database access handling. Statements are now
  • always closed
  • changed name of method getVisualisationComponent into
  • getVisualizationComponent
  • no longer necessary to register operators in an
  • experiment (done automatically during adding)
  • no longer necessary to implement the abstract
  • OperatorChain method getNumberOfSteps()
  • Completely revised the example set / attribute /
  • example table data core of YALE which leads to much
  • better implementations of the core classes and more
  • possibilities for extensions. Please refer to the
  • YALE forum for an in-depth description of the
  • changes
  • attribute statistics are now handled in a different way, all statistics are now queried with a statistics name string
  • most actions are now part of own packages
  • replaced shuffled partition building by a version
  • reflecting the way Java shuffles collections
  • improved efficiency of WekaInstancesAdaptor by finding YALE weight attribute only once instead of anew for each example
  • removed static field in class Yale for the current
  • experiment
  • The class Main was renamed into YaleCommandLine
  • Added possibility to define default values for
  • attributes
  • BinaryAttribute was renamed to BinominalAttribute
  • Newest versions of all libraries
  • PropertyValueCellEditor can now be registered in
  • PropertyTable allowing plugins to provide new
  • editors for new parameter types
  • The same applies for PropertyKeyCellEditor
  • Averagable: compareTo now implemented in subclasses
  • Averagable: cloneAveragable(Averagable) is now
  • deprecated, please use copy constructors
  • Added ParameterTypeText for longer text inputs
  • XML serialization now uses object streams
  • Bugfixes:
  • IOObjectWriter / -Reader did not work for Windows executable due to library typo
  • LibSVM regression models could not be saved
  • Bugfix in PermutationOperator which used all attributes of the ExampleTable instead of only using those currently selected in the ExampleSet
  • Exception in list property editors after one row was
  • deleted
  • Use default GUI properties in cases where loading of
  • properties did not work
  • Colons in attribute names were not supported by the
  • AttributeWeightsLoader / -Writer. Replacement by XML
  • format fixes this problem
  • Percent (%) in parameter strings were replaced by
  • the method expandString(String) which was not desired
  • The new format for short commands is %{a} now
  • new CSV operator which better supports quoting and
  • column separators
  • fixed problem for category parameters if the check
  • value was a string of the index number
  • fixed bug for number of components = -1 in GHA models
  • fixed error for regular attributes with special names
  • when written into sparse format
  • Fixed bug for RVM model writing
  • Fixed bug for data transformation into the
  • association rule learning format of Weka
  • Removed error if a parameter for a non-existing
  • special attribute was in the special format of the
  • ExampleSetWriter
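
The [start;end;step] format listed above for the grid based parameter optimization / iteration operators can be illustrated with a small sketch. This is not the YALE implementation: the function name expand_grid is hypothetical, and it assumes numeric values, a positive step, and an inclusive end value, none of which the changelog states.

```python
import math


def expand_grid(spec):
    """Expand a "[start;end;step]" string into the list of values it denotes.

    Assumptions (not confirmed by the changelog): values are numeric,
    the step is positive, and the end value is included when reachable.
    """
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a definition like [start;end;step]")
    parts = body[1:-1].split(";")
    if len(parts) != 3:
        raise ValueError("expected exactly three fields: start, end, step")
    start, end, step = (float(p) for p in parts)
    if step <= 0:
        raise ValueError("step must be positive")
    # Derive the number of grid points once instead of repeatedly adding
    # the step, so rounding errors do not accumulate.
    count = int(math.floor((end - start) / step + 1e-9)) + 1
    return [start + i * step for i in range(max(count, 0))]


if __name__ == "__main__":
    print(expand_grid("[1;5;2]"))
```

Under these assumptions a definition such as [0;1;0.25] yields five grid points, and a definition whose end lies below its start yields none.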

New in RapidMiner Studio 3.4 (Jan 18, 2010)

  • New operators:
  • MultivariateSeries2WindowExamples
  • EvolutionaryParameterOptimization
  • IOObjectReader
  • IOObjectWriter
  • AGA
  • YAGGA2
  • SPSSExampleSource
  • ExcelExampleSource
  • LiftChart
  • ROCChart
  • MacroDefinition (see below)
  • Removed operators:
  • NelderMeadParameterOptimization
  • PatternSearchParameterOptimization
  • Deprecated operators:
  • NaiveBayes, SimpleNaiveBayes, and NaiveBayesUpdateable
  • (replaced by Y-NaiveBayes)
  • LibSVM (use LibSVMLearner instead)
  • Changed parameters:
  • DatabaseExampleSource: replaced "driver", "urlprefix",
  • and "databasename" by "database_url" (can be easily
  • defined with help of the new configuration wizard, see
  • below)
  • ExampleSource now supports zipped data files
  • Added new data representations backed by non-double arrays which need less memory in cases where no double precision is needed
  • All IO objects also providing a loading operator can now be saved directly from the result tab
  • SimpleExampleSource is now able to automatically guess the
  • value types
  • The Attribute Editor has now some additional features:
  • Context menu on row: "Use row as attribute names" which
  • is nice for example for CSV files
  • Table Menu: "Guess all value types" which re-guesses
  • all value types which might be practical after declaring
  • one of the rows as names
  • Reminder during closing if the data file and attribute
  • description file were not saved before
  • New configuration wizards for more sophisticated input
  • operators like ExampleSource or DatabaseExampleSource
  • (available via the "Start configuration wizard..."
  • button of these operators)
  • New item in Tools menu "Show database drivers" which lists all
  • available JDBC drivers
  • JDBC drivers can now be defined via adding them to the
  • CLASSPATH or by copying them into lib/jdbc
  • Free JDBC drivers for MySQL, PostgreSQL, Microsoft SQL Server,
  • and Sybase included
  • The file resources/jdbc_properties.xml can be used to define
  • driver dependent settings like URL prefixes etc.
  • Improved the mode for working directly on the database (DatabaseES)
  • Improved data saving for ExampleSets
  • Added macro definitions. Macros can be defined with the
  • operator MacroDefinition and used with %{my_macro}. Several
  • predefined macros exist like %{experiment_name},
  • %{experiment_file}, and %{experiment_path}
  • The minimum and maximum colors for plotters can now be
  • specified in the properties dialog
  • Improved error messages for Weka learners and attribute
  • evaluators
  • Density and SOM plotters now support example visualization
  • Density and SOM plotters now use buffered images (more
  • efficient)
  • Allowing both attribute and example representations for
  • Series 2 Window Examples operators
  • Improved logging for both the message viewer and into files
  • Improved EvoSVM
  • Added several non-psd kernels for JMySVM and LibSVM as well
  • as support for returning the original optimization fitness
  • New operator dialog now shows deprecation information
  • Generating feature operators now provide a parameter for the total maximal number of attributes
  • PerformanceEvaluator: improved handling of input
  • performances
  • Robustified plotters in cases where the given data contain
  • missing values
  • An environment variable YALE_OPERATORS_ADDITIONAL will now
  • be regarded and set by the start scripts (for user written
  • operators)
  • IOConsume operator now allows deletion type "delete_all_but"
  • Implementation Details:
  • the method getInput(Class) of Operator / IOContainer now delivers the correctly cast instance (no casts necessary any longer)
  • checkIO() of Experiment is now also able to check for
  • given input objects
  • Removed parameter number editors based on JSpinner
  • because of rounding and transformation problems (see
  • below)
  • Installer now uninstalls old versions
  • Windows launcher now allows external classpath settings
  • ExampleSet.getSize() is deprecated now, use size()
  • instead
  • ExampleSet.getExampleReader() is deprecated now, use
  • iterator() instead
  • Deprecation infos are now defined in the operator description files
  • Bugfixes:
  • Fixed bug in Windows start scripts which did not allow for spaces in filenames and paths
  • Attribute weighting schemes do now provide correct
  • error messages for missing label
  • IOContainer reading and writing did not work
  • Description of the column separators did not match
  • the actual implementation of ExampleSource and
  • SimpleExampleSource
  • Export did not work for unnamed experiments
  • Numerical parameter fields rounded to zero for small
  • values (only in YALE 3.3)
  • Better error message in case of non-decomposable data
  • sets in RVMLearners
  • SOM is no longer applicable to data sets containing missing values
  • Version 3.3 introduced a problem where starting YALE via "java -jar yale.jar" no longer worked without defining the property yale.home. Should be fixed now
  • Additional performance criteria were not stored in XML
  • Added missing close statements for database handling,
  • prevent errors if already closed
  • Fixed bug during statistics calculation if a column
  • only contains missing values
  • Exception was thrown by feature binary generators if
  • the generated value was NaN or infinite
  • Fixed LibSVM model application bug for high class
  • skews
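
The macro syntax introduced in this release (%{my_macro}, with predefined names such as %{experiment_name}) amounts to a simple placeholder substitution. The following is a minimal sketch of that idea only, not YALE's own code: expand_macros is a hypothetical helper, and leaving unknown macros untouched is an assumption made here.

```python
import re

# Matches %{name}, where name contains no braces.
_MACRO_PATTERN = re.compile(r"%\{([^{}]+)\}")


def expand_macros(text, macros):
    """Replace every %{name} in text with macros[name].

    Assumption for this sketch: macros that are not defined are left
    in place unchanged rather than raising an error.
    """
    def substitute(match):
        name = match.group(1)
        return macros.get(name, match.group(0))

    return _MACRO_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    defined = {"experiment_name": "demo", "my_macro": "42"}
    print(expand_macros("results/%{experiment_name}_%{my_macro}.res", defined))
```

A parameter value like results/%{experiment_name}.res would then resolve per experiment without editing the parameter itself.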

New in RapidMiner Studio 3.3 (Jan 18, 2010)

  • New operators:
  • Y-AdaBoost
  • Y-Bagging
  • MultiCriterionDecisionStumps
  • RVMLearner
  • Gaussian Process Learner
  • ExperimentEmbedder
  • OperatorEnabler
  • ExampleSetJoin
  • Numeric2Binary
  • Permutation
  • Removed operators:
  • JViToPlotter (added most important functionality
  • directly in YALE, other will follow)
  • Deprecated operators:
  • RenameAttribute (replaced by ChangeAttributeType
  • and ChangeAttributeName)
  • YALE is now available as exe-file for Windows systems
  • YALE now provides a Windows installer
  • Newest Weka version (CVS from 2006/08/04)
  • YALE now provides actual ensemble learners for more than
  • one inner learner
  • Search and Replace for XML tab
  • Save as "building block" in order to ease future experiment
  • setup
  • All validation operators are now able to optionally produce
  • the model of the complete data set
  • Changed log verbosity of command line operator from MINIMUM to MAXIMUM
  • Reworked all parameter optimization operators
  • Double click on operator in tree view now toggles breakpoint status
  • Users can specify a search string and capabilities in the
  • new operator dialog now
  • New operator dialog is no longer modal and provides an "add" button. This allows for multiple operator insertions without recreating the dialog (and its settings)
  • New operator tree properties allowing to filter disabled
  • operators or expansion of the complete tree
  • Debug mode which adds a breakpoint after each operator
  • Disabled operators are now more clearly marked
  • Default file extension for all IO files now
  • String property values will no longer be deleted when
  • editing is started, the value will be used after losing
  • the focus
  • Added support for automatic parameter optimization of
  • nominal parameters
  • Exceptions for feature filter (skip all ... but not ...)
  • (Meta) data views are now backed up by tables which are
  • much faster than old HTML views
  • Added new (high-dimensional) plotters and jitter function for plotting, reworked old ones
  • More intelligent availability checks for plotters and
  • automatic downsampling if number of data points is too
  • high
  • Added support for plotting and logging nominal values
  • and parameters
  • Data set plotters can now also consider feature weights
  • Range of integer parameters now use infinity symbol
  • Total number of parameter combinations is now logged
  • (parameter optimization operators)
  • (Almost) all randomized operators can now use own local
  • random seeds
  • The current memory usage can now be logged as a value of
  • the experiment operator (root)
  • All internal kernel based methods now provide the same
  • data and plot view component
  • Faster conversion to Weka instances for sparse examples
  • Improved parameter guessing for Weka operators
  • Improved tutorial and added section about data creation
  • from Java applications
  • Implementation Details:
  • new package structure for feature operators
  • new package structure for operators
  • new package structure for GUI
  • new package structure for preprocessing
  • now using JUnit 4.1 for testing
  • code clean-up (no Eclipse-warnings)
  • ExampleTable is an interface now
  • copy-resources is no longer necessary, plugins have to place their resources in edu/udo/cs/yale/resources
  • Statistics now renamed in DataTable (in new package
  • called datatable)
  • createName(...) of AttributeFactory now handles own
  • counters for each name
  • prepareRun() is now automatically invoked and no longer needs to be invoked before the run of an experiment
  • Bugfixes:
  • Validation check did not work in all cases
  • escaped XML characters for attribute description file
  • writing
  • JMySVM, EvoSVM, MyKLR, and MultiModel cannot be read
  • from files (fixed)
  • Result file was not resolved against experiment
  • location
  • Tooltips for string parameters were not always shown
  • Streams for result output were not closed
  • Temporary directories are now deleted at the end of
  • experiment if delete_temp_files is set to directly
  • (default)
  • Resize bug after changing the name of an operator
  • in tree view
  • Fixed problems if two ExampleSetGenerators with the
  • same target function were used in the same experiment
  • Removed unnecessary check during loading of sparse
  • examples
  • not all operators with inner loops did invoke
  • inApplyLoop()
  • Bugfix for IteratingOperatorChain if timeout was -1
  • Dynamic parameter %t did not work for filenames under
  • Windows
  • At the end of a Pattern param opt run the result was not properly created
  • Windows start scripts did not work if spaces were
  • part of the paths

New in RapidMiner Studio 3.2 (Jan 18, 2010)

  • YALE now requires Java 1.5 or higher
  • New operators:
  • ThresholdCreator
  • AttributeWeightsCreator
  • WeightGuidedFeatureSelection
  • CFSFeatureSetEvaluator
  • ConsistencyFeatureSetEvaluator
  • AttributeCounter
  • WeightedPerformanceCreator
  • CompleteFeatureGeneration
  • Series2WindowExamples
  • TransformedRegression
  • SimpleExampleSource
  • PCA (new version)
  • FastICA
  • GHA
  • ComponentWeights
  • HyperplaneProjection
  • SplitSVMModel
  • RemoveCorrelatedFeatures
  • WeightOptimization
  • TFIDFFilter
  • MinimalEntropyPartitioning
  • EvoSVM
  • PsoSVM
  • EvolutionaryFeatureAggregation
  • PlattScaling
  • SplitChain
  • All operator chains now define conditions which must be
  • fulfilled by inner operators.
  • New model concept: models which are used for prediction
  • purposes (prediction models) can now be combined with
  • models for preprocessing, e.g. a z-transformation model.
  • This allows for fairer evaluations without using information
  • about the training data which might have been collected
  • during preprocessing.
  • Preprocessing models, e.g. a normalization model can be applied
  • with the same parameters on the test set
  • Improved Operator Info Screen (F1) which now also shows
  • conditions for inner operators. This eases experiment design
  • for new users
  • PerformanceEvaluator adds new criteria to input
  • performance vectors now
  • Evolutionary feature operators now support multiobjective optimization
  • Feature operators now allow an arbitrary number of
  • inner operators
  • Added new VectorGraphics package (freehep) version 1.2.2
  • New Weka version 3.5.2 (current CVS version of Weka)
  • The attribute type "string" of Weka is now also supported
  • Renamed two parameters of SparseFormatExampleSource:
  • "attributes" is now called "attribute_description_file",
  • "attribute_file" is now called "data_file"
  • AUC as a parameter of PerformanceEvaluator instead of
  • ThresholdFinder
  • ExampleSetWriter now resolves the relative path of the
  • data file
  • Tutorial now reflects the development since Yale 3.0
  • More example filter types for ExampleFilter operator
  • Added filters for Data View
  • Added parameter sample_ratio to example source operators
  • Speed up of experiments by preventing IO logging if
  • not necessary
  • GUI does not hang any longer after stopping an
  • experiment and a message is shown that the current
  • operator will be finalized
  • all regression performance criteria can now handle nominal
  • labels regarding the confidence for the desired true class
  • relative_absolute_error now renamed to
  • normalized_absolute_error
  • Implementation Details:
  • YALE is now completely type safe, i.e. no warnings
  • occur by compiling with Xlint:unchecked
  • Population operators now work on objects of class
  • Individual instead of directly working on
  • AttributeWeightedExampleSets
  • Added method getSpecialAttribute(String) to
  • ExampleSet interface. This allows a faster retrieval
  • of special attributes
  • UndefinedParameterError will be thrown if an
  • operator asks for the value of a non-optional
  • parameter with no default value and no user
  • defined value
  • The abstract method checkIO of OperatorChain was
  • replaced by getInnerOperatorCondition()
  • Removed deprecated method initApply()
  • added new check method performAdditionalChecks()
  • reworked package structure for feature operators
  • improved memory management for BayesianBoosting
  • Replaced method getValue() of averagables (like
  • performance criteria) by getMikroAverage().
  • Operators should use getAverage() which returns
  • the makro average if possible and the mikro
  • average otherwise
  • Bugfixes:
  • Update of Data View did not properly work
  • ThresholdApplier did not properly overwrite the
  • crisp predictions
  • error in root mean squared error calculation for
  • data sets with different sizes
  • wrong plotting of threshold values in ROC curves
  • new operator was not properly selected after replacing
  • an operator via the context menu. Therefore the old
  • parameters were not removed in the GUI
  • LibSVM used Math.random() and was therefore not
  • deterministic
  • Replace " by &quot; in XML parameter descriptions
  • In some cases the variance of a performance
  • criterion became negative. Fixed now.
  • Bug in RemoveUselessAttributes since attribute stats
  • were no longer calculated
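
The two kinds of average named in the implementation details of this release can be illustrated with a short sketch. It assumes the common definitions: the macro average is the mean of the per-run means, and the micro average is the mean over all pooled values. The function names and the sample data are hypothetical and are not part of the YALE API.

```python
def macro_average(runs):
    """Mean of the per-run means (each run counts equally)."""
    run_means = [sum(run) / len(run) for run in runs]
    return sum(run_means) / len(run_means)


def micro_average(runs):
    """Mean over all pooled values (each value counts equally)."""
    total = sum(sum(run) for run in runs)
    count = sum(len(run) for run in runs)
    return total / count


# Two validation runs of different size (hypothetical per-example scores).
runs = [[1.0, 1.0, 1.0, 1.0], [0.0]]
print(macro_average(runs))  # 0.5
print(micro_average(runs))  # 0.8
```

The two values differ whenever the runs have different sizes, which may be why both are exposed.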

New in RapidMiner Studio 3.1 (Jan 18, 2010)

  • New operators:
  • IOMultiplier
  • PerformanceLoader
  • T-Test
  • Anova
  • DataStatistics (useful only for command line, see
  • implementation details)
  • Removed operators:
  • old parameter based Weka operators (were deprecated)
  • MultipleLabelLearner and
  • MultipleLabelPerformanceEvaluator (please use
  • MultipleLabelIterator instead)
  • Drastically reduced runtime (see implementation details)
  • Improved attribute editor (added views on data,
  • load series data, icons, nicer error messages)
  • Binary classification performance criteria mark the
  • positive class
  • Predict confidences for both binominal and polynominal
  • classification tasks
  • Confidences are now automatically set after applying
  • a classification model for all learners, the parameter
  • use_distribution is therefore no longer supported
  • ExampleSetWriter can also write prediction confidences
  • now. The dense data format and the special format were
  • slightly adapted to reflect this change
  • Attribute ranges can also be specified in meta data view
  • Split default noise of NoiseOperator into label_noise
  • and default_attribute_noise
  • New Weka version 3.4.6 integrated
  • Nicer error messages for many data reading problems
  • IteratingPerformanceAverage can now handle all types of
  • averagable vectors and also more than one inner
  • performance vectors
  • The Yale color plotter now shows a legend with a mapping
  • of the colors to the values for these colors. This also
  • applies to the scatter plot based on the color plotter
  • Sanity checks before learning if the used learner can
  • learn from the given data set (using the predefined
  • learner capabilities)
  • Uses (p) for initialization probability of feature
  • selection algorithms instead of (1-p)
  • The counter for the automatic creation of attribute names
  • is reset before an experiment is started
  • A new breakpoint type for breakpoints in operator apply
  • loops
  • CSVExampleSource now uses the first line for attribute
  • names
  • Implementation Details:
  • The position of the Weka Jar file can now be defined
  • via an environment variable WEKA_JAR
  • Removed the construction of attribute weights from
  • example if this is not necessary (this drastically
  • decreases the time required for example construction)
  • Improved the calculation of example set statistics
  • Removed the recalculation of attribute statistics
  • after data changes. Statistics are now only calculated
  • if they are needed (including display purposes in the
  • graphical user interface)
  • Attribute is an interface now, different classes of
  • attributes introduced. As a consequence, attributes
  • can only be constructed with the help of the
  • AttributeFactory class
  • Added a FastExample2SparseTransform class which
  • provides methods for fast sparse representation
  • creation, especially for SparseArrayDataRows
  • Removed check if an attribute is already part of an
  • example set before it is added. This also improves
  • runtime
  • FilteredExampleSet is now called ConditionedExampleSet
  • Failing during operator initialization (during start up)
  • does not prevent loading the following operators any
  • longer
  • Bugfixes:
  • Bugs in SparseArrayDataRow
  • Copy of IOContainer was shallow. This bug might have led
  • to a wrong parameter optimization behavior for complex
  • feature selection experiments
  • Implemented missing method in ConditionedExampleSet
  • Fixed size bug in ConditionedExampleSet
  • Key strokes for cut, copy, and paste did not work
  • Syntax highlighting for description tag did not work
  • Opening a new experiment kills experiment thread now
  • Saving of settings did not always work
  • Changing from XML view to other views caused empty
  • status bar
  • Error in change detection after modifying the
  • experiment in XML view
  • Range update in data view did not work for two changes
  • at the same time

New in RapidMiner Studio 3.0 (Jan 18, 2010)

  • New operators:
  • FeatureNameFilter (using regular expressions)
  • FeatureValueTypeFilter (replaces FeatureTypeFilter)
  • FeatureBlockTypeFilter
  • operators for all Weka tasks instead of specifying
  • the Weka operator with a parameter (see below)
  • MultipleLabelLearning
  • MultipleLabelPerformanceEvaluator
  • MultipleLabelIterator
  • AverageBuilder
  • RenameAttribute (renaming and type changing)
  • Data generators for testing purposes
  • MinMaxWrapper for linear combinations of average and
  • minimum values (which might lead to more stable
  • optimizations)
  • CorrelationMatrix (which can also produce feature
  • weights)
  • SimpleBinDiscretization
  • SimpleFrequencyDiscretization
  • Single2Series
  • PerformanceWriter (in addition to the ResultWriter)
  • ParameterCloner
  • ParameterSetWriter
  • GridParameterOptimization (replaces old ParameterOpt.)
  • NelderMeadParameterOptimization
  • PatternParameterOptimization
  • ParameterIteration (which simply iterates through
  • given parameter combinations instead of optimizing them)
  • IOConsumer (consumes unused outputs)
  • ARFFWriter
  • WrapperXValidation (replaces old MethodXValidation)
  • SimpleWrapperValidation (replaces old
  • SimpleMethodValidation)
  • NominalExampleSetGenerator
  • JViToPlotter (in addition to built-in plotters)
  • Removed operators:
  • The external operators for the C versions of MySVM,
  • SVMLight, and C45 are no longer part of the Yale
  • core. Please use the Java implementations JMySVM,
  • LibSVM, and J48
  • LegalNumberExampleFilter was replaced by the
  • operator ExampleFilter. This operator can handle both
  • missing values and user defined value conditions
  • MethodXValidation was replaced by WrapperXValidation.
  • The old operator was not able to handle mere feature
  • weighting methods in addition to selection
  • ParameterOptimization (see above). In addition, the
  • parameter parameter_file was removed from all
  • parameter optimization operators
  • SimpleMethodValidation (see above)
  • FeatureTypeFilter was replaced by an improved
  • FeatureValueTypeFilter
  • BatchedValidationChain
  • Improved data management and statistics. Yale can handle
  • larger data sets now
  • Undo and Redo function
  • Several new performance criteria including MinMaxCriterion
  • for weighted linear combinations of the minimum and
  • the average of arbitrary criteria
  • Some operators are deprecated now. Deprecated operators
  • provide messages during application and validation and
  • should no longer be used
  • New plotter concept, introducing Yale color plotter,
  • GnuPlotPlotter for 3D plots, scatter plots, and
  • distribution plotter (histograms). Plots are only
  • automatically created for smaller data sets (settings)
  • In addition to the new plotter concept the operator
  • JViToPlotter can be used to plot some of the IOObjects
  • of Yale. The current version at least supports ExampleSet
  • and some numerical models
  • Syntax highlighting in message viewer and XML editor,
  • colors can be specified in the preferences dialog
  • New Weka version 3.4.5 integrated
  • New LibSVM version 2.8 integrated
  • Generic operator classes and operator sub types. This
  • allows the building of generic operators with one class
  • for several operators. This feature is used for the new
  • Weka operator style where each learning scheme matches
  • one Yale operator (and not a parameter of an operator)
  • Added Learner Capabilities. Each learning scheme can now
  • define which type of data set is supported by the learner
  • Added stratified sampling for cross validation on data
  • with a categorical label. This ensures that the subsets
  • provide the same class distribution as the whole data
  • set
  • Added several additional selection and crossover schemes
  • for evolutionary feature operators.
  • Learners and performance evaluators can now deliver
  • the input example set as output if this is desired.
  • This also applies for models and ModelApplier.
  • New structure of settings dialog
  • (Optional) Tip of the Day at startup
  • Automatic update check during start-up (once a month,
  • no personal data is transmitted or collected).
  • Command line version waits at breakpoints and can be
  • resumed by pressing enter
  • Only a user defined amount of lines will be logged,
  • the default is 1000. This value can be changed in the
  • settings dialog
  • Since massive logging may slow down experiments the
  • default log verbosity for new experiments is "init"
  • Removed some verbosity levels which were not frequently
  • used
  • Plugins can also provide a GenericOperatorFactory in their
  • operator description file which can be used to register
  • additional generic operators
  • Improved operator group structure in GUI and package
  • structure
  • Improved Javadoc documentation, at least all classes
  • should have a class comment
  • Learners cannot write the model directly into a file any
  • longer. Please use the operator ModelWriter for this
  • purpose.
  • Implementation details:
  • ATTENTION: Since operators should know their own
  • operator description the usage of the empty operator
  • constructor is no longer allowed. Operators must be
  • created with
  • OperatorService.createOperator(String name)
  • The usage of empty operator constructors is no longer
  • allowed for operator creation!
  • Using Arff loader from Weka instead of KDB package
  • Changed the method name getIdAttribute() to getId()
  • in ExampleSet, some methods from Example were removed
  • Added a copy method to Parameters
  • It is now possible to query examples by their id
  • It is also possible to query examples by their index.
  • This is only recommended for memory based example
  • tables and should not be used for iteration purposes.
  • Each operator which must iterate through complete
  • example sets should use ExampleReaders. However,
  • this change allows Yale to construct Weka instances
  • on the fly which drastically decreases memory usage
  • Operators can now define the default behavior for
  • input consumption and a parameter will be
  • automatically defined and queried. This allows
  • some operators (like validation chains or performance
  • evaluators) to pass their input (the example set
  • for example) to the following operators
  • Added two helper methods getDeliveredOutputClasses()
  • and getAllOutputClasses(Class[] innerOutput). One
  • of these methods should be used to return the
  • delivered output of an operator chain at the end of
  • checkIO(). These methods reflect the consumption
  • behaviour changes. Please refer to the Yale tutorial
  • for further information.
  • The implementation of the simple feature selection
  • operators was improved. The memory usage is reduced
  • especially in case of forward selection
  • SparseArrayDataRows need less memory than
  • SparseMapDataRows with the same runtime. This
  • data management type should be used if data is
  • sparser than 50%
  • Using sparse array data rows after Nominal2Binary
  • filtering
  • Bugfixes:
  • bug in unix start scripts (plugins were not properly
  • loaded)
  • variance adaption in feature weighting
  • wrong conversion from Weka instances to Yale example
  • sets for data sets with more attributes than examples
  • Bug in average handling of validation operators mixed
  • up weights and performance values for some feature
  • operation experiments
  • strange plotting of some example sets
  • validation of experiments containing disabled
  • operators
  • fixed bug in database handling which prevented feature
  • selection from working correctly on example sets based on
  • databases (csv and dBase too)
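
The stratified sampling for cross validation introduced in this release can be sketched as follows. This is a minimal illustration of the idea (each fold receives roughly the same share of every class), not the implementation used by Yale. The function name `stratified_folds` is hypothetical, and the sketch omits the shuffling a real implementation would normally apply.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds):
    """Assign example indices to folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the examples of one class to the folds in round-robin order.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


labels = ["yes"] * 6 + ["no"] * 3
for fold in stratified_folds(labels, 3):
    print(Counter(labels[i] for i in fold))
```

With six "yes" and three "no" examples and three folds, every fold contains two "yes" and one "no" example, matching the 2:1 class ratio of the whole data set.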

New in RapidMiner Studio 2.4.1 (Jan 18, 2010)

  • New operators:
  • RandomOptimization
  • New Weka version 3.4.3 integrated
  • The Unix start scripts guess the location for YALE_HOME
  • depending on the location of the script
  • Bugfix: The performance which was delivered by validation
  • chains (only for plotting purposes) was not the average
  • but the last performance. This error was the reason for
  • a wrong plot in the ParameterOptimization sample
  • experiment

New in RapidMiner Studio 2.4 (Jan 18, 2010)

  • New operators:
  • LearningCurveOperator
  • StandardDeviationWeighting
  • PrincipalComponents
  • WekaAttributeWeighting
  • C45ExampleSource
  • Obfuscator
  • Deobfuscator
  • CorpusBasedWeighting
  • Removed operators: UPGMAClusterer and WekaClusterer are
  • now part of the Clustering plugin
  • Changed operators: the former implementation of
  • DecisionTreeLearner was removed since it was not able to
  • produce pruned decision trees. The internal representation
  • of Weka's J48 learner which was formerly known as Y45Learner
  • is now named DecisionTreeLearner.
  • Splitting of KDBExampleSource operator in four operators
  • which individually load ARFF, csv, bibtex, and dBase
  • files.
  • The parameter "mean_variance_scaling" of the
  • normalization operator is no longer of type category but
  • of type boolean.
  • The parameter $v[name] of the special format of
  • ExampleSetWriter can now be used for both regular and
  • special attributes
  • accuracy and classification error are now calculated for
  • both binary and multiclass problems. Additionally the
  • confusion matrix is displayed.
  • ThresholdFinder can deliver AUC (area under the ROC curve).
  • The maximum number of ROC-points which are plotted is
  • limited to 200.
  • All results are presented with the same number of digits
  • Forward selection (FeatureSelectionOperator) initially
  • checks if the used attribute is useless, i.e. all values
  • are equal, before it creates a new example set based on
  • this attribute.
  • Validation chains which split example sets recalculate
  • the attribute statistics. Therefore for each iteration
  • one data scan is performed. These additional costs are
  • paid to clarify the values and ease the usage of inner
  • operators which make use of the statistics
  • Implementation details:
  • recalculation of attribute statistics can be done
  • directly with a method from example set now instead
  • of the example table
  • The method initApply() of operator is now deprecated
  • The method getSpecialAttribute(String) of ExampleSet
  • was removed. Use getAttribute(String) for both regular
  • and special attributes.
  • Bugfix:
  • data writing of the experiment log operator at the end
  • of the experiment
  • statistics plot is removed at the beginning of a new
  • experiment
  • Y45Learner (now named DecisionTreeLearner) did not
  • allow to create unpruned trees
  • newline at the end of data files can now be omitted

New in RapidMiner Studio 2.3.3 (Jan 18, 2010)

  • New operators:
  • IteratingOperatorChain
  • Some new target functions for the ExampleSetGenerator
  • With %b the apply count value plus 1 can be asked in a
  • parameter string (%b will be resolved to %a + 1)
  • Bayesian Boosting now supports internal bootstrapping and
  • provides the performance values to plot them with the
  • experiment log operator
  • Weka models are now displayed in the message viewer and
  • log files
  • Allow environment variable definition of the maximal used
  • memory in Windows start scripts (like the unix scripts)
  • Some tutorial additions
  • Bugfix:
  • Exception in toString() of tree
  • wrong command line construction for C45Learner
  • bug in status bar which increases CPU usage (low
  • priority) and does not show the correct operator
  • removal of nominal attributes in feature selection
  • experiments

New in RapidMiner Studio 2.3.2 (Jan 18, 2010)

  • New operators:
  • ExampleSetGenerator
  • Weighted mutation can be bounded between 0 and 1.
  • Scaling of the ROC curve plotted by the threshold
  • finding operator.
  • Bugfix:
  • Internal change of representation for nominal
  • attribute values. This guarantees the same order for
  • nominal values when writing an attribute description
  • file and reloading it.

New in RapidMiner Studio 2.3.1 (Jan 18, 2010)

  • New operators:
  • Sampling
  • BayesianBoosting
  • ThresholdFinder
  • ThresholdApplier
  • ExampleVisualizer
  • Examples with Id can now be displayed from the plotter
  • by double clicking the example. For this, an
  • ExampleVisualization operator must have been added.
  • Pressing the delete key removes the selected operator
  • from the operator tree
  • Bugfixes:
  • Settings dialog displays correct default values at
  • startup.

New in RapidMiner Studio 2.3 (Jan 18, 2010)

  • New operators:
  • AttributeValueMapping
  • AverageLearner
  • LearnerFeatureGeneration (to create attributes from the
  • predictions of different learning schemes)
  • RemoveUselessAttributes
  • Removed operators:
  • ExampleSetInfo (use RemoveUselessAttributes instead)
  • ModelContainerLearner (use LearnerFeatureGeneration)
  • Concept Drift operators (in plugin now)
  • New online Tutorial available in help menu
  • Zooming functionality for all 2D plotters. Simply drag
  • a rectangle to zoom into the selected region. Right
  • clicking sets the range to maximum size.
  • Added user descriptions (comments) which can be edited
  • in the operator info screen (F1). The description of the
  • root operator is shown after loading an experiment. This
  • can be disabled in the settings dialog.
  • Expert and Beginner modes added. In expert mode all
  • parameters are shown. In beginner mode, only important
  • parameters are shown.
  • Save as Template added. Experiments which were saved as
  • template can be used by the wizard.
  • Using the LearnerFeatureGeneration produces a new example
  • set containing the model predictions as attributes.
  • Another learning scheme can be used to learn a meta
  • model from these values. Alternatively the new AverageLearner
  • can simply calculate the average of the predictions which
  • is especially useful in a selection or weighting wrapper.
  • Implementation details:
  • new packages in operator and learner packages
  • Bugfixes:
  • FixedSplitValidationChain did not deliver the defined
  • absolute number of examples
  • Problems with classification tasks and Weka learners
  • due to Weka's new internal representation.

New in RapidMiner Studio 2.2 (Jan 18, 2010)

  • New operators:
  • AttributeSetLoader (instead of FeatureGenerationOperator)
  • ModelWriter
  • Y45
  • WekaMetaLearner
  • ModelContainerLearner
  • NoiseOperator
  • FourierTransform
  • Java versions of MySVM and MyKLR
  • AttributeSetWriter uses new format:
  • name::construction_description
  • This change was necessary to allow the weighting of
  • attributes after loading and constructing them
  • Operators can now be disabled
  • Weighting vector of JMySVM now delivered for linear
  • kernels. JMySVM can also deliver xi alpha estimation
  • of performance now
  • Interface Learner added, former learner super class is
  • now abstract
  • Weka independent internal implementation of J48 added
  • for custom adaptations
  • Weka meta learning schemes can be designed in two ways:
  • With the known WekaLearner operator and by specifying
  • parameters and, now, by specifying them as operator
  • chains. This allows the same representation for all
  • Yale meta learning schemes with internal learning
  • operators as children
  • Meta learner schemes can be used to create new attributes
  • from the predictions of learned models
  • new system for objects which can be averaged like
  • performance criteria or weights
  • example sets are displayed in table view or plot view, the
  • same applies for models of JMySVM learners
  • Bugfixes:
  • Exception in unbalanced crossover
  • unused IO objects will no longer be duplicated by the
  • usage of simple operator chains

New in RapidMiner Studio 2.1.1 (Jan 18, 2010)

  • Weka version 3.4.1 included. Since many of the learners
  • are now part of a new package, it might be necessary to
  • adapt the Weka class names in your experiment files.
  • YaleIdMapping included. May be part of example sets in
  • future releases.
  • Breakpoints are now saved.

New in RapidMiner Studio 2.1 (Jan 18, 2010)

  • New operators:
  • IdTagging
  • InfiniteValueReplenishment
  • InteractiveFeatureWeighting
  • AttributeWeightsWriter
  • AttributeWeightsLoader
  • AttributeWeightsApplier for different weight functions
  • Attribute2RealValues
  • AttributeWeightSelection
  • Support for Word Vector Tool, Value Series Preprocessing,
  • and Clustering plugin
  • The definition of a label attribute in cases where the
  • data contains no label but should get a predicted one is
  • no longer necessary
  • attribute selection is now seen as attribute weighting
  • which allows more flexible operators. Feature operators
  • like forward selection, genetic algorithms and the
  • weighting operators can now deliver an example set with
  • the selection / weighting already applied or the original
  • example set (optional). Therefore all feature operators
  • deliver the new IO object "AttributeWeights", not only
  • the weighting ones. A weight of 0 means that the attribute
  • should be deselected
  • more than one additional operator description file can be
  • specified with the -Dyale.operators.additional option by
  • using the system-dependent path separator (e.g. ":" on
  • Unix systems)
  • Settings dialog
  • Bugfixes:
  • cut and paste bug fixed
  • sometimes data columns and headers in attribute editor did
  • not match. Fixed.
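
The view of attribute selection as a special case of attribute weighting, introduced in this release, can be sketched in a few lines. The function name `select_attributes` and the plain dictionary of weights are assumptions made for illustration; they do not reflect the actual AttributeWeights class.

```python
def select_attributes(weights):
    """Keep every attribute whose weight is not 0 (0 means deselected)."""
    return [name for name, weight in weights.items() if weight != 0]


# Hypothetical weights as a feature operator might deliver them.
weights = {"a1": 1.0, "a2": 0.0, "a3": 0.25}
print(select_attributes(weights))  # ['a1', 'a3']
```

A pure selection result is then just a weight vector containing only the values 0 and 1.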

New in RapidMiner Studio 2.0.3 (Jan 18, 2010)

  • Added parameter additional_performance_criteria to
  • PerformanceEvaluator for specifying user-defined performance criteria

New in RapidMiner Studio 2.0.2 (Jan 18, 2010)

  • Error estimation for MultiClassLearnerByRegression
  • Bugfixes:
  • Attribute editor combo box did not respond to attribute
  • type changes

New in RapidMiner Studio 2.0.1 (Jan 18, 2010)

  • New operators:
  • MultiClassLearnerByRegression
  • %-expansion in parameter values:
  • %a replaced by the number of times the operator was applied
  • %t replaced by current system time
  • %n replaced by name of operator
  • %c replaced by class of operator
  • "Replace operator" context menu added
  • Replaced kxml by Java XML parsers
  • Removed DirectedGeneratingGeneticAlgorithm (DGGA)
  • Bugfixes:
  • GUI used to hang when stopping experiment at breakpoint

New in RapidMiner Studio 2.0 (Jan 18, 2010)

  • New operators:
  • DefaultLearner
  • WekaAssociationLearner
  • QuadraticParameterOptimization
  • GNUPlotOperator
  • ConceptDriftAdaptor
  • ForwardWeighting
  • EvolutionaryWeighting
  • UPGMA (tree clusterer)
  • BatchedValidationChain
  • ExampleSetInformation
  • Added attribute weighting
  • Added plugin support.
  • SVMLearner does not automatically remove NaN examples. (This
  • feature was actually never documented). Use ExampleFilter to
  • remove NaNs instead.
  • Added gnuplot support for GUI; added GNUPlotOperator
  • Operator 'ConceptDriftAdaptor' added for experiments where the data
  • used for a classification task has a concept drift in the concept
  • to be learned. While the concept drifts in experiments performed
  • with the 'ConceptDriftSimulator' are artificially simulated, the
  • 'ConceptDriftAdaptor' handles data with real concept drift (and
  • does not generate any additional artificial drift).
  • Allowed arbitrary names for special attributes.

New in RapidMiner Studio 2.0 Beta 2 (Jan 18, 2010)

  • Operators for concept drift simulation experiments and
  • several time window management and example weighting
  • approaches were added (see operators "ConceptDriftSimulator",
  • "BatchWindowLearner", "BatchWeightLearner").
  • Renamed global experiment parameter 'keep_temp_files' to
  • 'delete_temp_files'.
  • LegalNumberExampleFilter replaced by more general operator
  • ExampleFilter. By implementing ConditionExampleReader.Condition
  • users can specify arbitrary conditions.
  • SparseFormatExampleSource: New parameter "attributes" allows
  • for an attribute description file similar to the ExampleSource.
  • If the old behaviour is desired, the parameter "format" must be
  • set to "separate_file".
  • DatabaseExampleSource: Separate query file replaced by new
  • parameters ("username", "databasename", ...). In case of long
  • queries the query (and only the query) can still be read from a
  • separate file (still specified by "query_file"). If the password
  • should not be written to the config file, it is queried when
  • needed.
  • Yale can now directly work on databases without copying the data
  • to memory first (alpha version!!!). If this behavior is desired,
  • the parameter "work_on_databases" must be set to true and the
  • parameter "table_name" must be the name of an existing table. Be
  • careful with this option since it will change the database.
  • FeatureGeneration: New parameter list "functions" allows
  • specification of attribute generation and selection in config
  • file. Was formerly specified in separate file (still working).
  • ExampleSetWriter: Output in sparse format and arbitrary user
  • defined format now possible.
  • PerformanceEvaluator: "comparator_class" allows for user defined
  • comparators of performance measures
  • Performance criteria measure micro and macro average and variance.
  • Special "id" attribute now supported ( tag in attribute
  • description files).
  • Memory of unused attributes (e.g. intermediately generated
  • attributes, predicted labels in cross validations) freed.
  • Weka models can now be displayed graphically.
  • Implementation details:
  • JUnit tests added
  • UserError introduced and exception handling improved.
  • Refactoring eases extensibility for user defined
  • custom operators.
  • Tutorial operator description automatically generated
  • from the JavaDoc comments in the operator source code
  • and the operator self description.

New in RapidMiner Studio 2.0 Beta (Jan 18, 2010)

  • Graphical User Interface (GUI) added.
  • Configuration file '~/.yalerc' moved to '~/.yale/yalerc',
  • together with some other configuration files.
  • Root operators (= outermost operators) in experiments
  • must now be of class "Experiment".
  • The "group" attribute of the tags was replaced by the
  • tag, e.g.:
  • ...
  • All model applier operators were replaced by a single
  • "ModelApplier" operator for all models.
  • The "parentlookup" attribute of tags is obsolete.
  • The operator "SVMLearner" was renamed to "MySVMLearner"
  • (because Yale supports not only mySVM by Stefan Rueping
  • but also the SVM implementations SVM^light by Thorsten
  • Joachims and LibSVM by Chih-Chung Chang and
  • Chih-Jen Lin).
  • Some parameters were renamed (which can be easily checked in the GUI).
  • PerformanceEvaluator: 'criteria_list' replaced by boolean parameters
  • Experiment: 'tmp_dir' renamed to 'temp_dir'

New in RapidMiner Studio 1.0 (Jan 18, 2010)

  • Initial public release of the machine learning environment Yale
  • (Yet Another Learning Environment).