RapidMiner Studio Changelog

New in RapidMiner Studio 10.3.0 (Nov 2, 2023)

Features:
Interactive Decision Tree: Added a way to switch between alternative splits when more than one split is possible for the selected node. The splits are cycled through in order of significance for the tree based on the selected measure, from most to least significant.
Interactive Decision Tree: Added a button in the tree UI to show the split report for the selected split.
Interactive Decision Tree: Improved the minimap by consolidating control buttons and allowing the minimap to be hidden
Enhancements:
Added a header row parameter to the Read Excel and Read CSV operators
Improved import wizard behaviour w.r.t. header row and starting row handling
Interactive Decision Tree: Find Split will now overwrite an existing split
The open process now tracks repository changes to prevent losing or corrupting processes
CSV import now supports quoted multiline text via an optional flag
CSV import has improved date format input field handling
Improved the user error messages for fatal expression exceptions
Changed HSQLDB default URL prefix to point to the server (jdbc:hsqldb:hsql://)
Auto Model: Removed inactive deployment option in results overview
Bugfixes:
Fixed an issue that caused process files to become corrupted when using certain emojis in parameters
Fixed an issue that could break the display of Chinese or Japanese characters in certain places in the UI
Fixed an issue that could lead to a view other than the selected one being active. This happened when a broken view (corrupted file) was selected. Broken views are now deleted on start-up.
Database connections for databases which use nvarchar as string column type can now properly create tables
Fixed the default Ingres JDBC driver class
Fixed an issue that could prevent right-clicking on operators when older extensions were installed.
Fixed ANOVA Matrix result cell coloring to be in line with the description (colored cells are below the significance threshold)
Removed the warning message logged during first startup about the missing recent data sets file
Auto Model: Fixed issue that could cause Auto Model to not show Prediction details for each individual model
Interactive Decision Tree: Fixed issue which could cause the node order to switch when editing splits in the tree
Interactive Decision Tree: The Preserve Existing Split option in the auto-grow dialog now correctly keeps existing splits below the selected node untouched
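
For context on the quoted multiline CSV support listed above: in RFC 4180-style CSV, a double-quoted field may contain line breaks, so one record can span several physical lines. The sketch below uses Python's standard csv module purely as an illustration of the expected parsing behaviour; it is not RapidMiner's implementation, and the sample data is made up.

```python
import csv
import io

# Hypothetical sample: the second record's comment field spans two lines.
RAW = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def read_rows(text: str) -> list[list[str]]:
    """Parse CSV text, keeping line breaks that occur inside quoted fields."""
    # newline="" disables newline translation, so the csv reader sees the raw
    # line breaks and can join the physical lines of a quoted field back together.
    return list(csv.reader(io.StringIO(text, newline="")))


for row in read_rows(RAW):
    print(row)
```

A naive line-by-line split sees four lines in this sample, whereas the quote-aware parser returns three records.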

New in RapidMiner Studio 10.2.0 (Aug 16, 2023)

Features:
Added a prompt after a project has been cleaned up on AI Hub, offering three options:
Ignore it and keep the project disconnected
Overwrite the local version with a clean checkout from AI Hub
Archive local changes and then overwrite the local version as above
Added Delete Amazon S3 Resource operator
Enhancements:
Migrated Generate ID and Split Data operators to the new Belt data core, future-proofing them and improving their speed.
Added a new setting in the preferences to control whether RapidMiner Studio should favour speed over memory footprint or vice versa. If memory is critical, it can be changed to reduce the memory footprint at the cost of longer runtimes. The setting can be found under System and is called Memory Management.
Further reduced start-up time of RapidMiner Studio:
Introduced lazy loading of operators
Improved utilization of operator signature cache
Introduced shallow plugin initialization
Added repository web action to go to deployment endpoints
Improved error recovery and error messages for date_parse_str function of the expression parser.
Trailing white spaces are no longer treated as errors in the expression parser.
Improved the experience of opening URLs on certain Linux distributions that do not support opening a browser programmatically
The Correlation Matrix operator now uses the new and improved subset selector
Bugfixes:
Fixed broken error messages in the Edit Expressions dialog that operators like Generate Attributes use to display the expression parser
Removed deprecated Stream Database operator (deprecated since version 7.5, six years ago)
Fixed a bug in the data splitting code that prevented empty partitions in some cases.
Fixed the Synchronize Meta Data with Real Data setting not taking effect even though it was selected. The selection is now remembered after restart.
When Synchronize Meta Data with Real Data is activated and the process has been run, Read operators like Read Excel and Read CSV remember the real metadata even if another operator is added to the process.
Fixed the stay above value and stay below value parameters of the Prescriptive Analytics operator
Fixed a possible concurrency issue when writing JSON IOObjects in parallel
Fixed potential access denied error for Read Azure Data Lake Storage Gen2 operator when reading larger files
Development:
Added com.rapidminer.repository.recent.RecentDataManager to allow global access to the recently used data sets. It comes with a listener mechanism and currently keeps track of data opened in the Results view, as well as data used in the Interactive Decision Tree wizard.
Removed deprecated classes and methods pertaining to the old concept of managing Perspectives (including MainFrame#getPerspectives())
Added DeveloperTools#shouldDeveloperToolsBeShown() as an easy way to check whether developer tools of some kind should be offered when appropriate
Fixed bug that caused TableMetaData#columns() to return a meta data sub-table with random column order
Fixed bug when registering IOObjects from operator signature
Plugins now also properly look up resources like icons from the default com/rapidminer/extension/resources path. The old additional lookup in com/rapidminer/resources is kept for compatibility reasons.
Deprecated SwingTools#addIconStoragePath(String), as it never worked

New in RapidMiner Studio 10.1.2 (Mar 23, 2023)

New in RapidMiner Studio 10.0.0 (Nov 8, 2022)

Features:
RapidMiner Studio now finally uses Java 11 as opposed to Java 8!
AI Hub X now also uses Java 11, and as a consequence, RapidMiner Studio X cannot connect to AI Hub 9 or earlier! Both Studio and AI Hub need to be upgraded to version 10!
Windows & OS X users will get the updated Java runtime automatically, but Unix users (or anyone using the platform-independent release) need to provide Java 11 manually for running Studio.
Some extensions might no longer work with Java 11 and require an update; please check the Marketplace for updates.
Visualizations: Added ability to sort results when using aggregations in all charts where it makes sense. Sorting can be ascending/descending either on the aggregated result value, or the aggregation column name.
Time Series: Added the Windowing Model as a preprocessing model for the Windowing operator.
The model can be used to apply the configured windowing operation to any data set (having the same columns) by using the Apply Model operator
The model can be grouped together with other models using the Group Model operator.
Cloud Connectivity: Added Google Drive operators to read, write, delete and loop files, as well as create folders.
Connectivity: Added Snowflake as a first-class citizen for database connections
Enhancements:
Added preprocessing model to Pivot operator
Improved High-DPI scaling on Windows
The tooltip for date-time entries in the result view now shows the time-stamp in ISO format (including potential nanoseconds)
Copying and pasting data from date-time cells is now consistent with the precise value displayed in the tooltip
Added the Enable repository search indexing setting, which can disable repository indexing for search altogether. This is useful for very large repositories, repositories behind a slow network drive, or when a virus scanner is involved
Time Series: Added the parameter sort time series to all time series operators where an indices column is mandatory or optional
If selected, the input time series is automatically sorted before the time series operation is applied. The output of the original ports will also contain the sorted data set.
Time Series: Improved the UserError for indices attributes that are not sorted or have non-unique values
Bugfixes:
Fixed a problem where collections with empty sub-collections might not be readable
Fixed problems with empty (sub-) collections not being readable
Fixed problems with repeatedly extracting collections because of an incorrectly set timestamp
Fixed the storage of the LFS & editable flags in the repositories.xml file for projects
Fixed issue that could cause an error when a better license was installed automatically
Fixed creation of new Google Cloud Services connection after the recent Google OAuth flow changes
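
The Windowing Model and the sort time series option described for 10.0.0 can be pictured with the following sketch. It assumes simple count-based windows (a window size and a step, both counted in examples) and a series given as parallel index/value lists; `sort_series` and `WindowingModel` are hypothetical names, not RapidMiner APIs. Keeping the configuration in a model object is what allows the same windowing to be re-applied to another data set, which is the role Apply Model plays for the real Windowing Model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def sort_series(indices: Sequence[float], values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Sort a series by its index values, rejecting duplicate indices."""
    if len(set(indices)) != len(indices):
        raise ValueError("index values must be unique")
    pairs = sorted(zip(indices, values))
    return [i for i, _ in pairs], [v for _, v in pairs]


@dataclass(frozen=True)
class WindowingModel:
    """Hypothetical stand-in for a stored, re-applicable windowing configuration."""
    size: int
    step: int = 1

    def apply(self, values: Sequence[float]) -> List[List[float]]:
        """Cut `values` into windows of `size` points, advancing `step` points each time."""
        if self.size < 1 or self.step < 1:
            raise ValueError("size and step must be positive")
        last_start = len(values) - self.size  # negative if the series is too short
        return [list(values[s:s + self.size]) for s in range(0, last_start + 1, self.step)]


idx, vals = sort_series([3, 1, 2, 4], [30.0, 10.0, 20.0, 40.0])
model = WindowingModel(size=2)
print(model.apply(vals))
```

A series shorter than the window size yields no windows rather than a partial one; the real operator may handle that edge case differently.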

New in RapidMiner Studio 9.10.10 (Jul 14, 2022)

New in RapidMiner Studio 9.10.8 (May 3, 2022)

New in RapidMiner Studio 9.10.7 (Apr 14, 2022)

New in RapidMiner Studio 9.10.6 (Mar 28, 2022)

Enhancements:
Fixed a memory & file leak when using large numbers of repeated JDBC connections
Visualizations: Added options to customize Wordcloud word orientations
Visualizations: Added Jamaica to the map collection
Updated PostgreSQL JDBC driver to version 42.3.2
Added the skip inaccessible parameter to Loop Files to skip inaccessible files/directories instead of failing silently. If unchecked, the operator does not loop at all and throws a proper error.
Stopping Loop Files is now always possible in a timely manner, even if you selected a directory with millions of files.
Updated H2 DB library due to security advisory
Added a new parameter, fitting error handling, to the ARIMA Trainer operator.
In case of a fitting error during training, either a proper error is thrown or a fallback Default Forecast Model is provided.
Removed the meta data warning that the number of parameters is too large for the ARIMA Trainer operator.
Added new option to Amazon S3 connections that allows for much more flexible authentication schemes, like credential profiles and IAM roles.
Bugfixes:
Fixed character corruption issue with Read Database and Execute SQL when reading a query via a file from disk on certain operating systems
Fixed a memory leak when using database connections
Fixed a general file leak when using connections
Fixed a problem when creating dynamically suffixed attributes through the AttributeFactory in parallel
Fixed side effects for models when executing in parallel
Fixed an issue in projects that could sometimes cause Execute Process or Retrieve operators within parallel loops or similar setups to fail with an error message like "Cannot retrieve 'entry', it does not exist"
Fixed an issue that could sometimes cause Execute Process operators within parallel loops or similar setups to fail with an error message like "Cannot connect to the RapidMiner AI Hub repository '_LOCAL'" when running on an AI Hub legacy repository
Fixed an incorrect error that was thrown during Apply Forecast when a Multiply operator was used on the Holt-Winters model
Fixed calculation errors for Holt-Winters models with additive seasonality
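
The Holt-Winters fix above concerns additive seasonality. For reference, the textbook additive recurrences update a level, a trend and one seasonal term per phase of a season of length m. The sketch below implements them with a deliberately simple initialisation (level = mean of the first season, trend = 0, seasonal terms = first-season deviations from that mean); that initialisation and the function name are assumptions, not RapidMiner's implementation.

```python
from typing import List, Sequence


def holt_winters_additive(y: Sequence[float], m: int, alpha: float, beta: float,
                          gamma: float, horizon: int) -> List[float]:
    """Return `horizon` additive Holt-Winters forecasts following the last point of y."""
    if m < 1 or len(y) < m:
        raise ValueError("need a positive season length and at least one full season")
    level = sum(y[:m]) / m
    trend = 0.0
    season = [y[i] - level for i in range(m)]  # season[p]: latest estimate for phase p
    for t in range(m, len(y)):
        s_prev = season[t % m]  # s_{t-m}
        level_prev = level
        # l_t = alpha * (y_t - s_{t-m}) + (1 - alpha) * (l_{t-1} + b_{t-1})
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level_prev + trend)
        # b_t = beta * (l_t - l_{t-1}) + (1 - beta) * b_{t-1}
        trend = beta * (level - level_prev) + (1 - beta) * trend
        # s_t = gamma * (y_t - l_t) + (1 - gamma) * s_{t-m}
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev
    n = len(y)
    # The forecast for time n-1+h adds the seasonal term of the matching phase.
    return [level + h * trend + season[(n - 1 + h) % m] for h in range(1, horizon + 1)]


print(holt_winters_additive([1.0, 2.0, 3.0] * 4, m=3, alpha=0.5, beta=0.5, gamma=0.5, horizon=3))
```

On a purely periodic input with no trend, the forecasts simply repeat the seasonal pattern, which makes a convenient sanity check.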

New in RapidMiner Studio 9.10.1 (Oct 25, 2021)

New in RapidMiner Studio 9.10.0 (Aug 12, 2021)

Features:
Added Function Fitting operator that can optimize parameters in a function of the attributes to fit the label. It can be used to create an optimal function to fit the data points in your data.
Bias Awareness: if the use of a specific column is more likely to add unwanted bias to your models, it is highlighted as such. This happens in various places such as in the Statistics view of data, the model simulator, in Turbo Prep, in Auto Model, during model training, in model annotations among others.
Enhancements:
The De-Normalization operator has a new parameter to also de-normalize predictions.
Based on the attribute name: prediction(abc) tries to use the de-normalization of abc if no explicit de-normalization is available
The label (or other special attributes) can already be included in normalization in the Normalize operator. The changes allow multiple prediction attributes to be affected
Added date format parameter to Write CSV in case format date attributes is selected
Improved performance of Append operator
Handled yet another case of JDBC drivers ignoring the JDBC standard gracefully (here: Infor Data Lake DatabaseMetaData#getTypeInfo())
Introduced operator signatures to improve the startup of Studio
Signatures contain meta information that is used in operator registration, global search setup and documentation browser display
Signatures are persisted between starts for an improved startup time
Signature persistence can be configured or cleared with the setting System -> Local File Cache -> Keep Operator Signatures
Time Series: Enabled the usage of constant values for the replace types in the Equalize Numerical Indices and Equalize Time Stamps operators
The operators can now be used to fill gaps in non-equal data sets with constant values
Time Series: All Time Series operators (except for Multi Horizon Forecast and Multi Horizon Performance) now work with Belt IOTable (as input and output)
Bugfixes:
In rare instances, operator parameters did not get saved correctly if a default value was set for them. This affected, for example, date parameters used in extensions.
The Generate Attributes max and min functions now always return a missing value if any of the values is missing.
Fixed missing operator help for Azure Blob Storage and Data Lake Storage operators
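
The prediction de-normalization described for 9.10.0 can be sketched for a z-transformation. The sketch assumes the normalization statistics (mean and population standard deviation) are stored per attribute, and that a prediction column named `prediction(abc)` falls back to the statistics of `abc`. All function names here are hypothetical, and the Normalize operator offers other methods besides the z-transformation.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Params = Tuple[float, float]  # (mean, standard deviation)


def fit_z_transform(values: Sequence[float]) -> Params:
    """Compute the statistics used for a z-transformation."""
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-normalize a constant column")
    return mean(values), sigma


def normalize(values: Sequence[float], params: Params) -> List[float]:
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def denormalize(values: Sequence[float], params: Params) -> List[float]:
    mu, sigma = params
    return [v * sigma + mu for v in values]


def base_attribute(column: str) -> str:
    """Map 'prediction(abc)' to 'abc'; other names are returned unchanged."""
    if column.startswith("prediction(") and column.endswith(")"):
        return column[len("prediction("):-1]
    return column


# Statistics stored when the label "abc" was normalized.
stats: Dict[str, Params] = {"abc": fit_z_transform([10.0, 20.0, 30.0])}
# Predictions come out in normalized units (here simply the normalized label itself).
predicted = normalize([10.0, 20.0, 30.0], stats["abc"])
restored = denormalize(predicted, stats[base_attribute("prediction(abc)")])
print(restored)
```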

New in RapidMiner Studio 9.9.0 (Mar 24, 2021)

New Features:
Data is the central piece in any RapidMiner process. The way RapidMiner internally deals with data has fundamentally changed in this release with the new Data Core (codename Belt). Its new columnar table representation provides a quantum leap in processing speed and memory efficiency for RapidMiner processes. Multiple operators already use it internally and it becomes fully available now for extension developers to create fast and efficient operators.
Added a Set Positive Value operator for the new Data Core which can make nominal attributes binominal or change the positive value of binominal attributes
Enhancements:
Replaced the Rename by Example Values operator by a new and improved version
Replaced the Rename operator by a new one that can additionally handle a renaming dictionary
Replaced the Sort operator by one that can sort by multiple attributes (currently already part of the Operator Toolbox extension)
Improved the FP-Growth operator so that it only works with explicitly defined positive values (either via binominal attributes or the positive value parameter) for items in dummy coded columns
Improved memory consumption of Cross Validation in certain circumstances
The operators Read CSV and Read Excel were improved to use the new data core
Pivot now supports Least and Mode aggregations for numerical attributes as well
Annotate now adds the annotations to the meta data as well
Added warning when trying to run a process on an AI Hub with a lower feature version than the current Studio version
Added a reason when displaying incompatible extensions in the dialog after startup to show why an extension failed to load. Details available via tooltip.
Upgraded integrated Chromium to version 84
Improved some metadata transformation w.r.t. nominal value sets
The splashscreen no longer shows duplicate extension icons during startup if more than one copy of an extension is installed
Visualizations now also support Least and Mode aggregations for numerical attributes
Improved concurrent execution in some corner cases
Deprecated the Exchange Roles operator
Model viewer for Gradient Boosted Tree models now respects the Number format settings in Studio preferences
Auto Model uses new clustering algorithms which no longer require one-hot encoding on the data set and therefore reduce the memory footprint for data sets with nominal columns with many values. As a result, users can no longer specify the minimum number of clusters in the X-Means case (automatic determination of the optimal number of clusters). The minimum is now fixed at 2.
Time Series: Added the option to ignore invalid values to the Moving Average Filter operator: invalid values (missing values as well as positive and negative infinity) are now ignored when calculating the filtered value
This also results in valid values at the beginning and end of the filtered time series
As the Classic Decomposition and the Function and Seasonal Component Forecast operators are based on the Moving Average Filter, they now also have the "ignore invalid values" option
Bugfixes:
Fixed Data Table reading/writing when LFS light checkout is enabled
Fixed a problem where an uncaught exception could go through when using date/time attributes with values in the far future/past
Fixed an uncaught exception that could happen when the process run via Execute Process failed, the user opened it via the popup and ran it directly after fixing the problem
Fixed wrong attribute weights for Random Forest regression
Fixed error in Store operator when used after application of k-Means model
Fixed an issue where Save dialogs did not accept any selection if a wildcard (.*) filter was provided (e.g. for Write Document)
Fixed Pivot meta data column names not matching the real data
Fixed missing text for the file restoring confirm dialog in projects
Fixed an issue that could cause Studio startup to silently fail
Fixed a possible error during startup w.r.t. port preconditions on some operators
Fixed a bug that could cause project creation to not show an error and appear to do nothing
Removed the check for preprocessing models in model deployments for custom models. This check caused certain grouped models to fail if they contained models that are technically not preprocessing models (e.g. PCA).
Time Series: Fixed a bug for the Lag operator, which caused original data to be changed at preceding ports as well
Time Series: Fixed some small errors in the description of two tutorial processes for Sliding Window Validation
Time Series: Fixed an error that occurred in time-based windowing when the end of the last window was equal to the last timestamp in the input data. This affected all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation).
Cloud Connectivity: File browser now adds the correct path separator character on Windows, and resolves macros properly for AWS, Azure, and Google Cloud file operators
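
The "ignore invalid values" option of the Moving Average Filter described for 9.9.0 can be illustrated with a simple, unweighted, centered moving average. This is a sketch under stated assumptions: it uses uniform weights (the operator also offers other weighting schemes), and it assumes that positions outside the series are treated like missing values and skipped, which is one way to obtain the valid values at the beginning and end mentioned above. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def moving_average(values: Sequence[float], width: int, ignore_invalid: bool = True) -> List[float]:
    """Simple (unweighted) centered moving average over an odd window `width`."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    result = []
    for i in range(len(values)):
        lo, hi = i - half, i + half + 1
        if ignore_invalid:
            # Skip NaN/inf and positions outside the series; average what is left.
            window = [v for v in values[max(lo, 0):hi] if math.isfinite(v)]
            result.append(sum(window) / len(window) if window else math.nan)
        elif lo < 0 or hi > len(values):
            # Strict mode: incomplete windows at the edges produce a missing value.
            result.append(math.nan)
        else:
            window = values[lo:hi]
            # Strict mode: any invalid value in the window makes the result missing.
            valid = all(math.isfinite(v) for v in window)
            result.append(sum(window) / width if valid else math.nan)
    return result


print(moving_average([1.0, math.nan, 3.0, math.inf, 5.0], width=3))
```

With the option disabled, every window in this example touches either an edge or an invalid value, so the whole filtered series would be missing.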

New in RapidMiner Studio 9.8.1 (Dec 3, 2020)

New in RapidMiner Studio 9.8.0 (Oct 14, 2020)

New Features:
Projects now use the large file support of AI Hub 9.8. Files larger than 10 MB and stored ExampleSets are automatically versioned as expected, but stored more efficiently. This is backed by Git LFS, which means Python or R coders can continue to work with these projects easily as long as they have the Git LFS extension installed.
Time Series Windowing Update:
Added time based (window parameters are specified in time units) and custom windowing (start and stop values of the windows are provided by an additional example set) for all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation)
Added a few more parameters: expert settings (hides a few expert parameters if it is not selected), windows defined (specifies from which point windows are defined), empty window handling
Changed the computation of the final model for the Forecast Validation and Sliding Window Validation operators to compute the model on a final window with the same size as the training windows and which ends at the last example of the input series
Time Series: Added new aggregation methods (median, maximum, minimum, standard deviation, variance) to Moving Average Filter
Cloud Connectivity:
Added connectivity to Azure Data Lake Storage Gen2:
Read Azure Data Lake Storage Gen2
Loop Azure Data Lake Storage Gen2
Write Azure Data Lake Storage Gen2
Enhancements:
H2O:
New operator: K-Means (H2O), which implements K-Means clustering using the bundled H2O library. Key features include:
Estimates the optimal value of k when the user cannot provide a good initial guess
Built-in standardization and nominal encoding
Quick and memory efficient execution
Note: estimate k strongly favours low values of k. Make sure to double-check the results and whether they are in line with expectations.
Newly created repositories and projects are now stored in the current user's "Documents" folder by default. The location can still be customized when creating a repository or project
When opening a process or RapidMiner file using "Open with..." RapidMiner Studio, the process will be loaded from the repository registered for the path. Process files that are not stored in a repository will be imported just like the menu item "Import Process" would
IOObject collections are now stored in a new, zip-based file format, ending with .collection
Incorporated a new library to better make use of system proxy settings if "system" is selected in the preferences, especially w.r.t. Windows and WPAD/PAC files. This will drastically improve the experience in complex corporate network setups
HTML5 safe mode is now much faster
Upgraded Chromium binaries to version 79
Improved error message for remote repository creation (central AI Hub repository and projects) when the authentication is mismatched (user/password vs SSO)
Added Settings option to optimize internal file browser for mapped network drives
Time Series: Moved Moving Average Filter into the Transformation operator group and removed the obsolete Filter operator group
Time Series: Reordered the output ports of the Multi Label Performance and Multi Horizon Performance operators
Bugfixes:
Fixed wrong metadata after renaming in the new repositories and then creating a new entry with the previous name
Fixed rare issues that could cause problems when trying to view Visualizations on certain machines
Fixed Mixed Euclidean Distance for nominal values and Nominal Distance
A JNA library on the Windows PATH no longer results in an error
Fixed issue that could cause charts in the Deployments view to not show up.
Fixed problem that caused the legacy smtp password setting in the Preferences dialog to become broken when the dialog was saved more than once after changing the value. Note that this setting is not recommended anymore, use the new Send Mail connection instead.
Fixed a similar problem with the legacy connection UI encrypting passwords and tokens multiple times
Auto Model Results calculated on AI Hub can now be opened via Results view after the folder with all results has been moved/copied
Upgraded bundled JRE to 8u265
Deployments keep working now after the Server repository has been renamed
Fixed a problem where unsigned extensions could not make use of the new connection objects inside operators
Fixed potential IllegalArgumentException in Google Storage operators when running on Server
ExampleSets with huge nominal values can be retrieved again from the repository
Time Series: Fixed a bug in Equalize Time Stamps that caused an infinite loop in some cases when the calendar time was set to 'domain' and the input data consisted of already partially equidistant time stamps
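
As a reference point for the Mixed Euclidean Distance fix in 9.8.0, a common definition treats numerical attributes as in the ordinary Euclidean distance and lets each nominal attribute contribute 0 when the values are equal and 1 otherwise, before taking the square root. The sketch follows that definition; it is an assumption about the measure rather than a copy of RapidMiner's code, it represents nominal values as Python strings, and it ignores missing values and attribute weights.

```python
import math
from typing import Sequence, Union

Value = Union[float, str]  # str marks a nominal value in this sketch


def mixed_euclidean_distance(a: Sequence[Value], b: Sequence[Value]) -> float:
    """Euclidean distance where each nominal position adds 0 if equal, 1 otherwise."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0  # nominal mismatch counts as distance 1
        else:
            total += (x - y) ** 2  # squared numerical difference
    return math.sqrt(total)


print(mixed_euclidean_distance([0.0, 0.0, "x"], [3.0, 4.0, "x"]))
```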

New in RapidMiner Studio 9.7.2 (Aug 4, 2020)

New in RapidMiner Studio 9.7.0 (Jun 3, 2020)

New Features:
Added versioned projects which are tied to RapidMiner Server. You can have as many versioned projects as you like, no limits! The versioning is backed by Git and can be accessed by any regular Git client. This means sharing between Python/R coders and RapidMiner users has never been easier!
Added a dialog to select which version of a file to keep in case of a conflict in versioned projects while getting Snapshots from Server. Versioning happens on a project level. As you can now have as many projects as you like, this is the most sensible behavior, because most of the time many entries in a project are interconnected. Thus the entire state is saved and can later be restored without having to worry about dependency versions.
Projects support ALL files you may have on your computer! You can put your .py scripts, your .md files, your .png files, your .pdf files, etc. all into a project. They will be neatly displayed in RapidMiner Studio.
Of course, all those files can be versioned together, so RapidMiner users and Python coders can share the same Git repository. The Python coders can even use their native Git client to do so, no magic required. This will make collaboration between RapidMiner users and Python coders easier than ever before!
Processes in versioned projects can also be run and scheduled on RapidMiner Server, just as they can for an existing Server central repository
All the files live locally on your computer, but are also shared via Git. This gives you the performance of a local repository when working with it during prototyping, but also allows for easy collaboration with your colleagues.
Added a new panel "Snapshot History" which allows you to browse the history of your versioned projects, as well as see the changes you've made since the latest snapshot. It can also be used to restore an earlier state of the project, view past versions of individual files, and restore those past versions.
ExampleSets are now written to disk in a new file format: HDF5. This is a well-established format used, e.g., by NASA to store large amounts of data. This also means that Python and RapidMiner Studio can exchange data via HDF5 files much more easily and quickly than ever before.
Local repositories that will be created with RapidMiner Studio 9.7 or later can also take advantage of supporting all files you may have on your computer (.py, .jpeg, .pdf, etc).
New operator Target Encoding which can remove nominal attributes with too many values and performs a target encoding (also known as mean encoding) on the remaining attributes
Auto Model: some processes (e.g. SVM, FLM, or weight calculations) now use the new Target Encoding instead of one-hot encoding, which reduces memory usage and run times
Time Series: New operator Integrate to integrate time series with different methods (cumulative sum / left and right riemann sum / trapezoidal rule)
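
The four integration methods named for the new Integrate operator can be sketched on samples y(x) as follows. The method keys and the function name are hypothetical, not the operator's parameter values, and the sketch returns a single total for the whole series; whether the operator outputs a total or a running integral is not stated here.

```python
from typing import Sequence

METHODS = {"cumulative_sum", "left_riemann", "right_riemann", "trapezoidal"}


def integrate(x: Sequence[float], y: Sequence[float], method: str = "trapezoidal") -> float:
    """Integrate samples y(x) over the whole series with the chosen method."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if method == "cumulative_sum":
        # Plain total of the values, ignoring the spacing of x.
        return float(sum(y))
    total = 0.0
    for i in range(len(x) - 1):
        dx = x[i + 1] - x[i]
        if method == "left_riemann":
            total += y[i] * dx  # height taken at the left end of each step
        elif method == "right_riemann":
            total += y[i + 1] * dx  # height taken at the right end of each step
        else:
            total += (y[i] + y[i + 1]) / 2 * dx  # trapezoid between the two samples
    return total


xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]  # samples of y = x^2
for m in sorted(METHODS):
    print(m, integrate(xs, ys, m))
```

For y = x^2 on [0, 2] the exact integral is 8/3; the left and right Riemann sums bracket it, and the trapezoidal rule lands closer than either on this coarse grid.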
Enhancements:
Both local repositories and versioned projects (tied to RM Server) have been completely rebuilt to get rid of many old limitations. Benefits include:
Enhanced throughput and performance
Better meta data caching
Concurrent access support
Displaying all files (no matter what they are, e.g. Python scripts, images, ...)
Allowing different file types (e.g. data, processes) and folders to share the same name
Note: Your existing local repositories have (Legacy) after their name, indicating they still run on the old technology and still have some of the limitations! If you create a new local repository, it will have (Local) after its name and have all the capabilities listed above. You can copy your data over via Studio from the old repository to a new one to migrate.
It is now possible to have a folder with the same name as a data entry in the repository (might not work for some old repositories)
It is now possible to have a process and a data entry with the same name in the repository (might not work for some old repositories)
Replaced Send Mail operator with new version which supports file attachments
Improved memory usage for Aggregate and Pivot operators for nominal columns with potentially a lot of unused values
Improved dealing with whitespaces in repository entry names
Improved cleanup of temp files, to reduce disk space clutter when Studio runs for a long time, i.e. in a Server environment
Made log tables in Result View behave more like other results, adding more actions and a shortcut to the context menu
Process background images now use a relative path to the image if possible, instead of an absolute path. This only applies to background images set from now on; it does not work retroactively
For binominal attributes the Statistics tab shows the positive and the negative value
Renamed RapidMiner Server to RapidMiner AI Hub
Opening/Moving the Process panel into the foreground when opening a process while in the Design view to make it more obvious something happened
Auto Model: remote executions on Server require the central repository as storage location
Turbo Prep: only local file based repositories can now be used as temporary repositories for the handover to Auto Model
Model Ops: only local repositories or central Server repositories can be used as storage locations for deployed models (also known as "deployment location")
Model Ops: keep unused and ID columns in the results after scoring
The operators Explain Predictions and Model Simulator now also support grouped models where arbitrary models have been grouped instead of only preprocessing models
The operator Explain Predictions now offers a parameter to limit the number of important features also for the "importances" output
Time Series:
Added options to use padding for Fast Fourier Transformation and calculate the frequency of the amplitude value.
Added the option to specify negative lags for the Lag operator
Added the option to specify a default lag for a set of attributes (selected by an attribute subset selector) to the Lag operator
Unfortunately, due to parameter key incompatibilities, the old version of the Lag operator is deprecated and a new version with the same name but a different operator key has been added.
H2O:
Updated H2O library to version 3.30.0.1.
Added monotonicity constraints to Gradient Boosted Trees
Added weights port to Deep Learning
Expanded whitelist of accepted expert parameters, now supports all parameters provided by H2O
Deep Learning and Logistic Regression now work with datasets that have nominal columns with only one value
Bugfixes:
Fixed an issue that could cause Studio startup to never complete
Made Studio startup more robust so that it quits the process instead of silently hanging on the splash screen forever
Fixed an issue that could cause panels to sometimes not open if they had been closed previously in this session
Fixed an issue that caused CTAs to not work when HTML5 safe mode was enabled
Fixed an issue with back propagation of changes to performance vectors
Fixed a problem for JDBC drivers that do not implement a certain set of functionality by adding a fallback (e.g. SQLite writing)
Fixed potential cause for complete UI freeze when interacting with a CTA notification banner
Fixed an issue with process navigation and property panel if operator names contain HTML
Generate Multi-Label Data now works correctly in non-regression mode
Fixed memory leak caused by the Visualizations
Fixed rare issue where data sets could not be downsampled automatically if license limit was exceeded
Fixed an issue in Automatic Feature Engineering if all input features have been nominal in the feature selection case
Fixed "Edit Access Rights" dialog for Server repositories not getting the permissions correctly when using Enterprise SSO
Fixed an issue that caused Studio to lag and increase memory consumption when using the right-click "Insert operator" popup menu in the Process panel.
Fixed data entries being duplicated instead of replaced when moved to a different repository
Auto Model: remote executions now show a new submission screen that only allows resetting Auto Model to load the results, which avoids problems with multiple remote submissions within the same session
Auto Model: reordering the columns in the column selection table no longer leads to graphics problems
Time Series: Fixed a bug in Extract Peaks that caused all "_position" features to have an offset of 1 to the Example number
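
The Target Encoding operator added in 9.7.0 replaces each nominal value by the mean of the target over the examples with that value, so one numerical column takes the place of one column per value in one-hot encoding, which is where the memory savings mentioned for Auto Model come from. The sketch below shows plain mean encoding plus the "too many values" cut-off; the function name and the `max_values` parameter are hypothetical, and it omits smoothing and out-of-fold estimation, which practical implementations often add to limit target leakage.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def target_encode(categories: Sequence[str], target: Sequence[float],
                  max_values: Optional[int] = None) -> Optional[List[float]]:
    """Replace each category by the mean target value observed for it.

    Returns None when the attribute has more than `max_values` distinct values,
    mirroring the idea of dropping attributes with too many values.
    """
    if len(categories) != len(target):
        raise ValueError("categories and target must have the same length")
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for c, t in zip(categories, target):
        sums[c] += t
        counts[c] += 1
    if max_values is not None and len(counts) > max_values:
        return None
    means = {c: sums[c] / counts[c] for c in counts}
    return [means[c] for c in categories]


print(target_encode(["a", "b", "a", "c"], [1.0, 0.0, 0.0, 1.0]))
```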

New in RapidMiner Studio 9.6.0 (Feb 26, 2020)

New Features:
Added buttons for copying/pasting the active process to the process toolbar. To make some room for it, removed the "Fit to size" button from the process toolbar (it is already in right-click menu)
Equalize Time Series:
Added two new operators (Equalize Numerical Indices and Equalize Time Stamps) which provide the functionality to equalize input time series. The output time series will have new equidistant index values. The operators provide different possibilities to configure the number of examples, the start value, the stop value, and the step size of the new index values. The corresponding values of the output time series are computed by using a Replace Missing Values (Series) operation.
Equalize Numerical Indices: Equalize numerical indices into equidistant numerical indices with a numerical step size.
Equalize Time Stamps: Equalize date-time indices into equidistant date-time indices. The step size is either an exact duration (with millisecond precision) or a period (a multiple of days, weeks, months or years).
Peak Transformations:
Added two new operators (Z-Score Peak Transformation and Highest Peak Transformation) which perform a peak detection and transformation on time series. They detect peaks in a time series and add an indicator peak series (with the values -1,0,1 as peak flag values) and a peaked series (original values if a peak was detected, missing for non-peak areas).
Z-Score Peak Transformation: performs the peak detection by calculating the local mean and standard deviation and identifies values as peaks when they deviate strongly from this local mean
Highest Peak Transformation: performs the peak detection by dividing the time series into different areas and checking whether local minima and maxima are valid peaks or only noise effects.
Peak Feature Extraction:
New operator Extract Peaks, which performs a peak detection (using one of the new Peak Transformation operators) and extracts features describing the peaks
Added optional custom endpoint parameter to Amazon S3 connections. This enables you to use an S3 API compatible storage service other than Amazon S3.
Deployments / Model Ops:
Model Ops now supports all custom prediction models, i.e. models created with the Design view, in addition to Auto Model models
Grouped models are now supported as well, which allows combining preprocessing models with a prediction model
Model Simulator in Deployments now uses raw data columns as input and performs data prep on the fly
Added a setting that controls whether scores are explained (scoring is about 100x faster without explanations). New deployments have it disabled by default, existing deployments have it enabled
The overview table now shows whether scores are explained
Model Ops initialization now happens in the background and no longer blocks the RapidMiner UI from starting if a remote location is not available (anymore)
Some speed improvements for Model Ops (fewer objects are loaded from repositories, which makes remote deployments a bit faster)
Model Simulator operator now also supports grouped models
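The Z-Score Peak Transformation entry above describes flagging values that deviate strongly from a local mean and standard deviation, producing an indicator series (-1, 0, 1) and a peaked series (original value at peaks, missing elsewhere). The following is a minimal sketch of that idea, assuming a trailing window of the preceding `lag` values and a fixed `threshold` in standard deviations; the function and parameter names are hypothetical and this is not RapidMiner's actual implementation.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def z_score_peaks(
    values: Sequence[float], lag: int = 5, threshold: float = 3.0
) -> Tuple[List[int], List[Optional[float]]]:
    """Return (indicator, peaked) series for a simple trailing-window z-score detector.

    indicator: +1 for a high peak, -1 for a low peak, 0 otherwise.
    peaked:    the original value where a peak was flagged, None (missing) elsewhere.
    """
    indicator: List[int] = []
    peaked: List[Optional[float]] = []
    for i, x in enumerate(values):
        # Local statistics come from the `lag` values preceding position i.
        window = values[max(0, i - lag):i]
        flag = 0
        if len(window) >= 2:
            mean = statistics.fmean(window)
            std = statistics.stdev(window)
            # A flat window (std == 0) never flags anything in this sketch.
            if std > 0 and x - mean > threshold * std:
                flag = 1
            elif std > 0 and mean - x > threshold * std:
                flag = -1
        indicator.append(flag)
        peaked.append(x if flag != 0 else None)
    return indicator, peaked


series = [1, 2, 1, 2, 1, 2, 10, 1, 2, 1, 2, 1, 2, -10, 1]
flags, peaks = z_score_peaks(series, lag=5, threshold=3.0)
print(flags)
print(peaks)
```

Here the spike to 10 is flagged as +1 and the drop to -10 as -1. Note that a large peak inflates the local standard deviation for the next `lag` positions, which is one reason practical detectors often damp the influence of already-flagged values.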
Enhancements:
Connections to external data sources like Cassandra or MongoDB are now properly re-used (within reason) and closed when a process is finished. This should lead to fewer connections to an external data source when using loop constructs, as well as properly closed connections after a process is finished.
Windows and OS X builds now ship with OpenJDK (version 8u232)
Added new timezone parameter to JDBC connections. Note: date handling in databases (and in general) is a tricky subject, and there are quite a few ways to make mistakes. Some databases/JDBC drivers also don't implement date handling properly. Last but not least, keep in mind that a date_time/date is a fixed point in time, but when it is displayed in a more human-readable format than "milliseconds since 01-01-1970 UTC", the display string converts that instant to your display timezone. So even if, for example, a date is the 13th of Jan in UTC, you may see the 14th of Jan when viewing it in Australia, due to the display timezone offset. The actual point in time (milliseconds since 01-01-1970 UTC), however, is identical. See documentation for further information.
When parsing a string to time with Nominal to Date, the associated timestamp now represents that time on the 1st of January, 1970 instead of 1st of February 1970
Added Default User-Agent setting to Preferences / System
Updated MariaDB JDBC driver
You can now see which Java version is being used when looking at the "About" dialog
Improved meta data warning in case the time series attribute selection of time series operators is empty
Added option to autodetect S3 region in Amazon S3 connections
Improved Google Cloud Services connection UI
File chooser icons on OS X are now also supporting HiDPI
When removing a repository, the repository.xml file now gets updated immediately
Visualizations: Tick interval input field now allows much larger values for datetime axes, as it uses milliseconds as the unit for splitting the chunks
Updated the Step by Step In-Product Tutorial content
Added more search tags to various performance and aggregation operators
Improved error message when downloading/deserializing data from a remote repository fails
Improved error message when the SSL certificate was invalid while attempting to connect to a RM Server repository.
Improved logging when trying to connect to a RM Server and unusual exceptions occur, e.g. more details about why SSL connection failed, what the network problem is, etc.
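The timezone note above distinguishes the stored instant (milliseconds since 01-01-1970 UTC) from how it is displayed. The following minimal sketch renders one stored instant in three display offsets; it uses fixed UTC offsets as an assumption to avoid depending on a timezone database, so it ignores daylight-saving rules, and the helper name `display` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# One instant, stored as milliseconds since 1970-01-01 00:00 UTC.
instant = datetime(2020, 1, 13, 20, 0, tzinfo=timezone.utc)
epoch_millis = int(instant.timestamp() * 1000)


def display(millis: int, offset_hours: int) -> str:
    """Render an epoch-millisecond instant in a fixed-offset display timezone."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(millis / 1000, tz=tz).strftime("%Y-%m-%d %H:%M %z")


print(display(epoch_millis, 0))   # 2020-01-13 20:00 +0000 (UTC)
print(display(epoch_millis, 11))  # 2020-01-14 07:00 +1100 (e.g. Sydney daylight time)
print(display(epoch_millis, -5))  # 2020-01-13 15:00 -0500 (e.g. New York in January)
```

The stored value `epoch_millis` never changes; only the calendar date and clock time shown depend on the display offset.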
Bugfixes:
Fixed an issue that could cause Studio to hang at the splash screen forever during startup.
Fixed an issue where storing datasets in a database using the automatically created primary key was not possible.
Declare Missing Value no longer crashes if the expression mode is selected and the expression itself returns a missing value. Instead, it will evaluate to false and thus NOT set a missing value for that row.
Fixed models and other IOObjects coming from extensions not being identified correctly in Server repositories.
Fixed Auto Model not being able to use results of a Join operator in some cases.
Fixed broken properties when storing data tables in rare cases.
It is no longer possible to create RapidMiner Server repositories with an invalid name.
Filter Examples now correctly resolves all macros in parameters, including in custom filter attribute names.
Fixed an error that could sometimes prevent result tables from being moved to Auto Model via the button in the Results tab.
Fixed an issue that caused Visualizations to not appear on certain Linux systems.
Fixed file chooser icons on OS X.
Fixed bug for scoring in Deployments: if column types are incompatible, they are actually dropped now (which was documented as such but did not happen)
Auto Model will now be restored if the user cancels a deployment by closing the deployment dialog
Other:
It is no longer possible to create legacy connections and other connections which have been replaced with the new repository connection objects in RapidMiner 9.3. Existing connections can still be edited and used, but this functionality will be removed eventually as well. Make sure to migrate existing legacy connections to repository connection objects! See documentation for reference.
Development:
Added caching for connections based on ConnectionAdapterHandler to reduce the connection count and make it possible to clean up connections once they are no longer needed (e.g. when the process is finished).
GlobalSearch is no longer available in headless mode (aka command line, job container execution, etc)

New in RapidMiner Studio 9.5.0 (Nov 20, 2019)

New Features:
Added ability to upgrade RapidMiner Studio independently from Server. You can now connect to and access data and processes on older Server versions (9.0 and above) with any current or future Studio version! Processes and data are stored as-is on Server, which enables effective collaboration with your colleagues. However, you need to be aware that while you are able to store processes with brand-new operators on older Servers, you obviously can only run processes that consist of operators that the old Server knows about.
Deployments: Deployments can now be copied from one deployment location to another (for example from a test to a production server)
Enhancements:
Improved performance of Principal Component Analysis and Weight by PCA operators
The Import Data dialog now detects files with non-lowercase file extensions
Fixed view order for Deployments view
Visualizations: Fixed various issues that could cause Studio startup to fail
Visualizations: Fixed various issues that could cause them to not be displayed properly
Auto Model: Decision Tree and Random Forest are now using the latest (faster) implementations for regression problems
Auto Model: Increased the number of rows for which local explanations are turned on by default
Auto Model: Loading results from a folder now also adds them to the result list
Auto Model: Shows the total number of feature sets and generated features on the overview as well if automatic feature engineering has been turned on
Auto Model: The performance tab now shows the gain calculations based on the confusion matrix instead of the predicted data set
Auto Model: New deploy button in the overview table for each model
Auto Model: Clicking a model in the overview table will show the details for the selected model
Auto Model: Prevents starting a deployment while another deployment is in progress
Turbo Prep: Nominal column handling is now consistent with the default behavior of Auto Model
Turbo Prep: Sort first join keys alphabetically instead of by ID-ness
Google Storage connection is now replaced by the more general Google Cloud Services connection that can connect to all supported Google services (Google Cloud Storage, Google BigQuery [requires In-Database Processing extension]). Just select the access scopes you want to use
Bugfixes:
Fixed all predictions being 0 in Transformed Regression when using no transformation and no z scale
Fixed the Import Data dialog failing when trying to read an XLSX file which did not have a lowercase file ending
Repository location chooser now opens as expected if the process is stored in a read-only repository
Visualizations: Exporting as PDF now also works without internet access
Visualizations: Fixed broken names in Czech Republic map
Time Series: Added error handling if the indices attribute was also selected as a time series attribute, or as the horizon attribute
Deployments: Fixed a bug which broke a local installation of Model Ops after the user connected to an existing remote location with email connection
Model simulator now works with date / time columns again
Development:
Improved exception handling when Belt tables cannot be converted to example sets

New in RapidMiner Studio 9.4.1 (Sep 26, 2019)

New Automated Model Ops:
Follow the fully automated data science path: prepare your data using Turbo Prep, create prediction models via Auto Model and finally put them into production with Model Ops.
Deploy the most promising models with one click and score new data via flexible web services or in the UI.
Track model performance on an intuitive dashboard and swap easily to the best performing one. Set up an email alert to get notified if a model outperforms the one in production.
Evaluate each model with respect to its financial impact instead of pure Data Science metrics.
Detect changes in data and their impact on model performance early to address problems.
Use our integrated dashboard to keep track of data drift and model performance.
New map visualizations:
Visualize geospatial data with the new map visualizations. You can choose from multiple map types with many different configuration options, as well as dozens of maps for geographic regions, continents, and countries. Available map types:
Choropleth maps: Used to display numeric values associated with regions (e.g. a country or a state) via a color gradient
Categorical maps: Used to visualize regions that belong to a number of distinct categories
Point maps: These maps offer latitude and longitude support and display a marker for each coordinate on the selected map
Improved Auto Model:
Auto Model features several improvements under the hood as well as a few more visible enhancements:
All predictive processes generated by Auto Model are now much cleaner, better structured, and much easier to understand.
Cost-sensitive learning has been added to show the costs / benefits in the validation result. This makes it possible to solve problems (e.g. fraud detection) that involve highly imbalanced data sets (e.g. credit card transaction data).
New data prep and modeling capabilities:
Several new operators have been added to ease and enhance data preparation and machine learning:
New operators Replace All Missings, Handle Unknown Values, One Hot Encoding and Append (Robust) to easily prepare data for modeling and scoring.
New operator Rescale Confidences (Logistic) to rescale confidences even for classification with more than two classes.
New operator Cost-Sensitive Scoring: Novel approach for cost-sensitive learning which works for more than two classes.
New operators Multi Label Modeling and Multi Label Performance to train and validate a combined model for multiple label columns in a single step.
Enhanced time series forecasting:
New operators have been added for
Forecasting multiple horizons of a time series with any machine learning model (Multi Horizon Forecast)
Validating performance of multi horizon forecasts (Multi Horizon Performance)
Sliding window validation for time series data science problems
Enhanced data source connection framework:
All RapidMiner-supported connectivity extensions on the Marketplace now use the new data source connection framework, which includes handling connections to
MongoDB
Cassandra
Splunk
Solr
Mozenda
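The 9.4.1 time series additions above include sliding window validation. As a rough illustration of the general scheme (not the operator's actual parameters or behavior), the following sketch generates train/test index pairs in which each training window is immediately followed by its test window and the pair then slides forward by a fixed step. The names `train_size`, `test_size` and `step` are hypothetical.

```python
from typing import Iterator, List, Tuple


def sliding_window_splits(
    n: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs over n time-ordered examples.

    Each training window is immediately followed by its test window, so the
    model is always evaluated on data that lies after the data it was trained on.
    """
    if min(train_size, test_size, step) < 1:
        raise ValueError("train_size, test_size and step must be positive")
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step


for train, test in sliding_window_splits(n=10, train_size=4, test_size=2, step=2):
    print(train, test)
```

Unlike shuffled cross validation, no test example ever precedes a training example in time, which avoids leaking future information into the model.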

New in RapidMiner Studio 9.3 (Jun 17, 2019)

New Features:
Completely reworked how connections (JDBC, as well as any other connections like Twitter, Amazon S3, Dropbox, etc.) work:
Connections are now self-contained and stored per repository. This means that when you create a connection, everything you need to use it will become part of the connection entry in the repository.
We have added great flexibility when it comes to injecting certain settings of a connection on-the-fly by adding so-called Sources for values. The settings can be anything from credentials, to URLs (or parts of URLs), and other parameters. For starters, only Macro and RM Server Vault are available as Sources, but the list will grow over time as any extension can add their own Sources!
Have a central DB connection where each user should use their own credentials? Create the single connection template on RM Server, indicate that the credentials are injected, and then use our new RM Server Vault as a Source where each user can securely store their credentials!
You can now easily share a connection with your colleagues via a Server.
They will also work on any execution node, without you having to manually add the JDBC driver to all nodes yourself.
To sum it all up, connections are now vastly more powerful than before. They are no longer necessarily statically defined, but instead they can be dynamically altered during runtime to grab the latest credentials, tokens, etc. Of course, you can still put everything that is needed into the connection and be done with it.
Not all features of the new connections are accessible through a UI. For extremely advanced and powerful features like chaining different value providers for injection (e.g. Server Vault → CyberArk → DB) or using (injectable) placeholders that build up values of other keys, administrators can create the connection manually (it's a ZIP archive, after all). They can create the configuration JSON to suit their needs, and then upload the ZIP to RM Server. This, together with the injection mechanism, makes connection templates a reality, allows admins to manage connections at scale, utilizing commandline tools to build up and distribute them.
The entire mechanism for connections and their Sources is highly extensible and new Sources and connection types can easily be added by extensions. We foresee a whole host of new connections and Sources to become available over the next few months.
Auto Model can now be executed on RapidMiner Server instead of locally
Users can select whether the execution should happen locally in RapidMiner Studio or whether processes should be pushed to a connected Server instead. The latter makes it possible to close RapidMiner Studio and fetch the results later from the Server instance.
Jobs can be added to any queue the user has access to.
Results will be stored on the Server and can be loaded back into Auto Model after completion. Loading of partial results is supported as well.
If RapidMiner Studio is kept open while the execution happens on the Server, results will be loaded dynamically, and the progress is shown. The execution of all remote processes can also be stopped in this case.
Time Series Analysis features:
New Default Forecast Model
Always predicts the same forecast value for all future values
Can be used as a baseline model to compare other forecast models against
New operator Default Forecast:
Trains a Default Forecast Model
The forecast value can be calculated by last value, mean in window, median in window or mode in window
Last value and mode in window can even be used to create a forecast model for nominal time series
New Function and Seasonal Forecast Model
Predicts future values by evaluating a polynomial function to forecast the trend of a time series
Adds the values of the seasonal component to, or multiplies them with, the forecasted trend values
New operator Function and Seasonal Component Forecast
Trains a Function and Seasonal Forecast Model
The operator performs a decomposition (Classic or STL Decomposition) to determine the trend and seasonal component of the input time series
A polynomial function is fitted to the trend component
The function and the seasonal component are provided as the Function and Seasonal Forecast Model at the model output port
New operator Autocorrelation / Autocovariance
Calculates dependency functions (autocorrelation function, autocovariance, partial autocorrelation function) for an input time series
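The Default Forecast entries above describe a baseline model that repeats one value for every future step, computed as the last value or as the mean, median or mode of a window at the end of the series. The following minimal sketch shows that computation; the function name, the `window` default of 3 and the method strings are assumptions for illustration, not the operator's parameters.

```python
import statistics
from typing import List, Sequence


def default_forecast(series: Sequence, horizon: int, method: str = "last", window: int = 3) -> List:
    """Return `horizon` copies of a single value derived from the end of `series`."""
    if not series:
        raise ValueError("series must not be empty")
    recent = list(series[-window:])  # the last `window` observations
    if method == "last":
        value = series[-1]
    elif method == "mean":
        value = statistics.fmean(recent)
    elif method == "median":
        value = statistics.median(recent)
    elif method == "mode":
        value = statistics.mode(recent)  # also works for nominal (string) values
    else:
        raise ValueError(f"unknown method: {method}")
    return [value] * horizon


print(default_forecast([1, 2, 3, 10], horizon=2, method="mean"))
print(default_forecast(["low", "high", "high"], horizon=3, method="mode"))
```

As the changelog notes, only "last" and "mode" make sense for nominal series, since mean and median require numeric (or at least ordered) values.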
Enhancements:
Write Excel now supports creating multiple sheets. Sheet names can be specified via the sheet names parameter
Write Excel now supports collections of example sets as input
Added Close all other results action for Result tabs, found in the right-click popup menu
Improved handling of mandatory parameters that were not set
Meta data from repository entries loaded by Retrieve operator are annotated with the repository location
Added forward macro checkbox to Schedule Process which allows you to forward all current macros from the calling process to the scheduled process
Write Database now defaults to a batch size of 100
The operators Map, Replace and Rename by Replacing now have a more convenient regex dialog that can store the replacement value as well
Added new function under Advanced functions named attribute(Nominal attribute_name) to the expression parser. This function evaluates the input and retrieves the value of the attribute with the name specified by the (resolved) input.
Added a new option Insert as attribute for inserting macros in the UI of the expression parser (e.g. for Generate Attributes).
Improved meta data for Nominal to Binomial for attributes where the nominal mapping is not clearly defined
Explain Predictions now offers the calculation of model-specific global weights based on the level of support and contradiction each attribute value contributes to the local explanations
Turbo Prep now uses the new visualizations for its Charts view
Auto Model now tracks more run times, including the time needed for scoring 1,000 rows and training the model on 1,000 rows in addition to the total process execution run time. The overview table also shows small badges pointing out the best and fastest models
Auto Model now offers to save all results at the end of a local execution. Those results can be loaded instead of re-running the modeling
Auto Model now offers a list of recent data sets as well as a list of recent results as part of the first step
Auto Model now offers to override the selection of columns for text processing
Auto Model now shows the number of created models, the number of evaluated feature sets, and the number of generated features during a run
Auto Model now shows the importance of all attributes for each model in addition to the model-independent global weights in the General section of the results
Visualizations: Bubble charts (Scatter with a size column) can now display more than 5,000 data points
Visualizations: Scatter3D now also supports a numerical color column
Visualizations: Scatter Matrix now also supports a numerical or date_time color column
Visualizations: Added the highly requested color group option to line/bar/column/area/streamgraph plots. Each distinct value in this column becomes an individual plot element, to allow for easy logical grouping of data without pivoting. The column can be of any type.
Visualizations: Aggregation group-by now also supports numerical columns; each distinct number is converted to a category
Visualizations: If the group-by column is numerical or date-time, the groups are now sorted in ascending order
Visualizations: X-Axis column and aggregation group-by column are now linked, i.e. changing one also changes the other. This makes switching between aggregation/no aggregation more intuitive and easier to follow
Moving Average Filter now lets you specify the left and right side of the simple filter individually instead of being symmetric
Improved operator help for Loop Examples
Added a positive class parameter to Performance (Binominal Classification) which lets the user manually decide what the positive class is.
Visualizations: Heatmaps with aggregation enabled can now also be grouped by two columns at the same time, resulting in a 2D table-like structure with cells for each value combination of the two group-by columns. If you want to plot multiple value columns, you can still group by a single column as before.
Copied/pasted operators that have references to other copied operators will now correctly update their parameters.
When replacing an operator in place, parameters that are shared between both operators are kept.
Repository entry copies are now simply enumerated at the end of their name instead of suddenly starting with "Copy of". This will make finding the copy in large repositories much more straightforward
Repository entries can now be directly copied in-place without having to select a target folder first
Updated default Oracle jdbc driver class
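The Moving Average Filter enhancement above allows the left and right sides of the simple filter to differ. The following minimal sketch shows an unweighted moving average with independent `left` and `right` widths; how the operator treats the series boundaries is not stated here, so this sketch simply averages over the truncated window there (an assumption), and the names are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], left: int, right: int) -> List[float]:
    """Average each value with up to `left` preceding and `right` following values."""
    n = len(values)
    result: List[float] = []
    for i in range(n):
        lo = max(0, i - left)        # window start, clipped at the series start
        hi = min(n, i + right + 1)   # window end (exclusive), clipped at the series end
        window = values[lo:hi]
        result.append(sum(window) / len(window))
    return result


print(moving_average([1, 2, 3, 4, 5], left=2, right=0))  # trailing (causal) filter
print(moving_average([1, 2, 3, 4, 5], left=0, right=1))  # leading filter
```

Setting `right=0` gives a causal filter that only uses past values, which is the usual choice when the smoothed series feeds a forecast.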
Bugfixes:
Fixed a rare bug in Log operator where a process seemingly was not stopping when it was done
Fixed a rare bug that could freeze the UI
Switching tabs is now only possible with a left-click
Fixed schema retrieval in the parameters for some databases (e.g. MySQL)
Fixed rare exception in automatic sparsity detection when creating example sets via the new data core
Fixed error that could occur when starting Studio in relation to Academy Global Search entries
Fixed error message display in expression property dialog for very long errors
Fixed Real to Integer when encountering infinity values
Fixed a bug in Compare ROC that deleted prediction/confidence columns in the input example set in some cases
Fixed handling of non-finite values for integer and real column grouping attributes in Pivot
Fixed UI becoming broken when the macro sort order in the Context panel was changed, an empty macro was already in the context, and the user tried adding another macro
Fixed a problem that could result in Studio endlessly starting when switching between Win32 and Win64 versions on the same machine
Fixed links to educational materials in Auto Model and Turbo Prep
Fixed rare bug which could occur for Automatic Feature Engineering if feature generation was enabled with high complexity settings in combination with H2O models
Visualizations: OS X 10.11 will now have working HTML5 visualizations again
Visualizations: Fixed matrix data (e.g. correlation matrix) visualizations showing the wrong chart type
Visualizations: Fixed Scatter3D dots sometimes not being displayed
Fixed rare cases in which no proper exception was thrown in Extract Aggregates, Extract Mode and Extract Coefficients (Polynomial Fit)
Fixed expected input for the inner 'model' port of Forecast Validation
Fixed run-time problems in Replace Missing Values (Series)
Fixed the Retrieve operator to update output meta data after a repository entry was removed or created
Remove Unused Values now also sorts mappings that do not have unused values
Link button icons no longer look pixelated on macOS
Visualizations: Wordcloud now takes actual number of distinct words into account for the limit check, instead of also counting words that do not actually occur
Dialog about cancelling Progress threads with dependent tasks is now shown in front of the Progress dialog
It can no longer happen that Progress threads are still displayed in the Progress dialog even if they are already done
Development:
Added SwingTools#setPrompt(String, JTextComponent) method which can be used to set a prompt in a text field (gray help text displayed when the field is empty)
Added com.rapidminer.gui.actions.CopyStringToClipboardAction which can be used to copy any dynamically supplied string to the system clipboard
Added com.rapidminer.gui.ProgressThread#setDependencyPopups method to prevent popups that ask about aborting Progress threads with dependent tasks

New in RapidMiner Studio 9.2.1 (Mar 19, 2019)

New Features:
Converted old simple charts of the following data types to the new HTML5 visualizations: Weights, Kernel Models, Correlation Matrices, and Rainflow Matrices.
Added RapidMiner Academy learning content to the Global Search.
Enhancements:
Removed unnecessary process validations while editing operator parameters
Visualizations: Histograms now support datetime columns
Visualizations: Treemap no longer forces a name column (although it obviously makes a lot of sense using one)
Visualizations: Boxplot now has a y-axis description if Group by is used
Visualizations: Boxplots can now only support 100 value columns at the same time, down from 500 (which was unreadable)
Visualizations: Large heatmaps (> 10,000 values) now render considerably faster and are less sluggish. If you plot more than 1 million values on a heatmap, it will however still take a while. Note that this means that large heatmaps can no longer be exported as an SVG (it will be an image instead).
Visualizations: Heatmaps can now support 500 value columns at the same time, up from 400.
Visualizations: Linear regression lines in Scatter plots now show their linear function in the tooltip
Visualizations: Improved some heuristics behind the automated chart selection when opening data for the first time
Visualizations: Fixed missing chart update animations for some settings
Visualizations: Added reset button to text fields with custom input (e.g. axis min/max values) which resets the field back to its default value
Visualizations: Should now work on vanilla Ubuntu 18.04 out of the box, without having to install "libgconf-2-4"
Visualizations: Fixed possible freeze of the chart when looking at tooltips of a Vector plot
Visualizations: Attribute values can be a bit longer now before they are cut off
Bugfixes:
Fixed slow popup dialog for selecting attribute subset and user defined attribute ordering
Fixed deadlock in statistics calculation in case of 6 or more result sets being opened at the same time
Fixed some memory leaks
Fixed Apply Feature Set returning wrong meta data
Fixed an issue where some User Errors did not appear correctly
Fixed an error that sometimes appeared when stopping parallel operators, e.g. Optimize (Grid)
Fixed an issue where encrypted settings could not be read
Fixed an issue where operator notes were not moved when the operator was pushed out of the way
Fixed a crash issue with Print/Export Image function on macOS
Fixed a rare bug where CTAs would flicker in a multi monitor setup with Studio showing in a monitor above the primary monitor
Fixed meta data for integer or date-time column grouping attributes in Pivot
Time Series: Replace Missing Values (Series): fixed the case where 'replace infinity' is true and 'skip other missing' is false, in which empty strings or infinity values were replaced with missing values instead of being kept.
Time Series: Extract Coefficients (Polynomial Fit): Fixed missing meta data information of the indices attribute for the fitted output port
Time Series: Process Windows: Fixed name of Window ID attribute in meta data
Visualizations: Fixed division by zero error in Histogram plots for histograms on columns with just a single value
Visualizations: Fixed jitter not doing anything if only a single distinct value existed in the data for numerical x/y columns
Visualizations: Fixed rare error when trying to display a linear regression line on weird data sets

New in RapidMiner Studio 9.2.0 (Mar 19, 2019)

Bugfixes:
Cross Validation now applies Bessel's correction on the performance variance and standard deviation.
Connecting operators in an infinite loop no longer freezes RapidMiner Studio.
Fixed unhelpful error message: "Error while training the H2O model: {0}"
Fixed a rare bug in Log operator where a process seemingly was not stopping when it was done.
Fixed cause for sliders sometimes looking a bit broken.
Fixed rare bug in feature set navigator in Auto Model which could lead to misaligned plots and tables
Fixed rare bug in Automatic Feature Extraction which could lead to a wrong selection of final feature set
Fixed bug for data sets from read-only repositories shown in the results view and opened in Auto Model
Time Series:
Fixed calculation of first quartile, median and third quartile in Extract Aggregates
Fixed a bug for all attributes selection when a filter type is selected which checks all Examples individually.
Fixed a bug in Apply Forecast operator, in case it was executed inside a parallel operator.
Fixed a bug for Windowing and Process Windows in case parameters were wrongly configured
Fixed Cross Validation returning the test example set with duplicate rows if multiple performance vectors were connected inside the cross validation. This did not affect any performance metrics.
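The first bugfix above says Cross Validation now applies Bessel's correction to the performance variance and standard deviation. The following minimal sketch shows what that correction means for k per-fold scores: divide the summed squared deviations by k - 1 rather than k. The fold scores are made-up values for illustration, and `fold_statistics` is a hypothetical helper, not RapidMiner code.

```python
import math
import statistics
from typing import Sequence, Tuple


def fold_statistics(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, sample variance and sample standard deviation of per-fold scores.

    Dividing by (k - 1) instead of k is Bessel's correction: it removes the
    downward bias of the variance estimate when the mean is itself estimated
    from the same k values.
    """
    k = len(scores)
    if k < 2:
        raise ValueError("need at least two folds")
    mean = sum(scores) / k
    variance = sum((s - mean) ** 2 for s in scores) / (k - 1)
    return mean, variance, math.sqrt(variance)


accuracies = [0.8, 0.9, 0.7, 0.8]  # hypothetical per-fold accuracies
mean, var, std = fold_statistics(accuracies)
print(mean, var, std)
# Cross-check against the standard library's sample variance (also k - 1).
print(math.isclose(var, statistics.variance(accuracies)))
```

With only a handful of folds the difference is noticeable: for four folds the corrected variance is 4/3 of the uncorrected one.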
Development:
Added utility class PersistentContentMapperStore. This class can be used to store arbitrary information in the local user cache, such as configurations of results for repository objects, or even things identified via a hash. For example, the HTML5 charts save their configuration this way.
Added utility class ColorChooserUtilities for opening a HSL color chooser
Added DistinctColorSlider and LinearGradientColorSlider UI components where users can select and change a list of distinct colors / linear color gradients conveniently
Added ExtendedJListTransferHandler class that allows re-ordering inside a JList via drag&drop
Added new interface CleanupRequiringComponent which GUI result components can use to indicate they need to clean something up after a result has been closed by the user. It is called whenever a result tab has been closed.
Added "BETA" tag support for result visualization cards (the cards on the left shown when viewing results in the Results view). Add a gui.cards.I18N_KEY.beta = true flag to your i18n properties to indicate a result renderer as a Beta version.
Packages com.rapidminer.gui.plotter and com.rapidminer.gui.new_plotter were deprecated and will be removed in the future.
New Features:
Replaced old charts and advanced charts with new, powerful HTML5 visualizations. There are lots of new plot types and capabilities to explore! Main features:
New chart types: Step Line, Spline, Area, Step Area, Spline Area, Range (Line, Step, Column, Error Bars), Streamgraph, Bellcurve, Funnel, Pyramid, Heatmap, Treemap, Sankey, Packed Bubble, Vector, Wordcloud
Enhanced existing and new chart types with features like multi-attribute selection, grouping, stacking options, inversion, and display as a radar chart (for select charts)
Multiple y axes supported
Added plotline support (annotated marker lines on x/y/z axes)
Chart configurations are now automatically saved. You configure a chart for your data set, close Studio, come back the next day, and when you look at the data again, the same chart you configured will be there again!
Some plots can be combined with other plots. You can add as many of those combinable plots to a single chart as you want!
Allows you to quickly select the basic settings to get started, but also to fine-tune even minor chart details
Have multiple series in a single chart (e.g. something grouped by labels)? Try hovering over and clicking the legend items to highlight and hide the respective series!
Auto Model:
Added support for textual data
Added feature selection for clustering
Added Fast Large Margin and Multiclass Logistic Regression learners
Improved feature extraction from dates (calculate all pairwise differences and differences to today)
Added predictions vs. label chart for regression
Added correlation as performance criterion for regression
Explain predictions is now optional in Auto Model and is only automatically activated for smaller data sets
Significantly improved runtimes of Auto Model for larger data sets
New text analysis operator for text feature extraction, sentiment analysis, and language detection: Text Vectorization
New operator for assigning batch numbers to data rows: Generate Batch
Introducing the new Create ExampleSet operator to create example sets from functions, numbers, dates, etc. for quick prototyping
Cloud Connectivity:
Added connectivity to Azure Data Lake Storage (Gen 1):
Read Azure Data Lake Storage
Loop Azure Data Lake Storage
Write Azure Data Lake Storage
Time Series:
New Operator: Extract Coefficients (Polynomial Fit)
It fits a polynomial function to the time series and provides coefficients and (if selected) the discrepancy as features
It also provides the fitted function evaluated on the index values of the time series on an additional output port
New Operator: Exponential Smoothing
It smooths a time series using the smoothing factor alpha
New Operator: Lag
It lags (shifts) time series attributes relative to each other
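Both new time series operators correspond to standard transformations. The Python sketch below uses the textbook recurrence for simple exponential smoothing, initialized with the first value, and a lag that shifts values later in time and marks the first positions as missing. The initialization, the missing-value handling, and the function names are assumptions, not the operators' documented behavior:

```python
from typing import List, Optional


def exponential_smoothing(series: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def lag(series: List[float], k: int) -> List[Optional[float]]:
    """Shift a series k steps later in time; the first k positions become missing (None)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return list(series)
    padding: List[Optional[float]] = [None] * min(k, len(series))
    return padding + list(series[:-k])


sales = [10.0, 12.0, 9.0, 14.0]
print(exponential_smoothing(sales, alpha=0.5))
print(lag(sales, 1))
```

An alpha close to 1 follows the raw values closely, while a small alpha produces a smoother, slower-reacting series.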
Enhancements:
Improved CPU utilization of parallel processes (e.g. when using nested Loops).
Pre-run check and better error descriptions for the wrong predictions and correct predictions filters of Filter Examples
Attribute selection dialogs and comboboxes now display the type (numeric, nominal, date_time) of the attribute
Attribute selection dialogs now properly sort the available attributes on the left in a human-readable way
Improved meta data generation and propagation for several source operators
Combobox popups are now as wide as their content needs them to be, regardless of the actual combobox width. This can look a bit funny sometimes, but it's much more useful to be able to actually read the contents than go for nicer looks.
Better information in Auto Model for cases and settings where longer runtimes can be expected
Dialogs opened by extensions no longer display a warning icon next to them
Changed style of tutorials to incorporate RapidMiner Academy
Improved default parameters for Gradient Boosted Trees
All "Legacy Result Access" operators are now deprecated, existing processes that are still using these operators will continue to work. Please use the operators Store and Retrieve in future processes:
Use Retrieve instead of Read Model, Read Clustering, Read Weights, Read Constructions, Read Performance, Read Parameters, Read Threshold and Read.
Use Store instead of Write Model, Write Clustering, Write Weights, Write Constructions, Write Performance, Write Parameters, Write Threshold and Write.

New in RapidMiner Studio 9.1.0 (Dec 14, 2018)

New Features:
The Aggregate operator now offers a percentile function; the percentile can be changed in the aggregation attributes function list. You can use an integer like 75, a floating point value like 80.5, or a macro.
Split the setting to keep operators connected upon disabling or deleting them into these settings:
Drop or bridge operator connections upon deletion
Drop, bridge or keep connections upon disabling
SSL certificates stored in .RapidMiner/cacert are now trusted on startup. See trust-certificates for more information.
Added support to open operator tutorial processes directly from the web.
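A fractional percentile such as 80.5 only makes sense with some interpolation rule. The changelog does not say which rule Aggregate uses, so this Python sketch assumes the common linear interpolation between the two closest ranks; the function name is hypothetical:

```python
import math
from typing import Sequence, Union


def percentile(values: Sequence[float], p: Union[int, float, str]) -> float:
    """Percentile by linear interpolation between the two closest ranks (assumed method)."""
    pct = float(p)  # accepts 75, 80.5, or a macro value that expanded to the string "80.5"
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to aggregate")
    position = pct / 100.0 * (len(ordered) - 1)  # fractional 0-based rank
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


ages = [23, 35, 41, 52, 60]
print(percentile(ages, 75), percentile(ages, 80.5), percentile(ages, "80.5"))
```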
Enhancements:
The "Import Data" dialog for CSV files will try to guess the best matching date format and preselect date for attributes that contain mostly matching date entries
The "Import Data" dialog for Excel files now differentiates between date, time and datetime columns specified in Excel
Improved CSV import wizard to use the structure found in the header or starting row
Parse Numbers and the Data Import wizards now support exponents in numbers with a leading '+' for positive exponents, e.g. "5.9876E+7"
Improved Cross Validation error handling when the Performance port is not connected
The XML Panel no longer hides default values
Split thread settings into foreground and background threads (for the currently opened process and processes running in the background, respectively)
Updated bundled Java for Windows and OS X to version 8u181. This should fix right-click issues on OS X
Added support for aggregation functions for Pivot operator and improved performance
When moving operators in the Process view, connected operators will be rearranged and moved to the right if necessary
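The number syntax that Parse Numbers now accepts can be illustrated with a small Python sketch. The regular expression is a hypothetical grammar for plain decimal numbers with an optional exponent, not the pattern RapidMiner uses, and Python's float() performs the actual conversion:

```python
import re

# Hypothetical grammar: optional sign, digits with an optional decimal point,
# and an optional exponent whose own sign may be '+' or '-'.
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def parse_number(text: str) -> float:
    """Parse a decimal number, accepting exponents such as E+7, E7 and E-7."""
    cleaned = text.strip()
    if not NUMBER.fullmatch(cleaned):
        raise ValueError(f"not a number: {text!r}")
    return float(cleaned)


for raw in ["5.9876E+7", "5.9876E7", "5.9876E-7"]:
    print(raw, "->", parse_number(raw))
```

This sketch only handles '.' as the decimal separator; locale-specific separators would need extra handling.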
Bugfixes:
For large ExampleSets with more than ~71.5 million rows, the result table will compress the height of each row a bit to accommodate. Data sets with more than ~86 million rows will only display the first ~86 million rows and show a warning that the rest is cut off.
Fixed an issue that could cause Studio to be stuck for up to ~2 minutes on start-up.
Fixed very rare process error when working with attribute weights.
X-Means item count of cluster model will now show the correct size.
Fixed an issue where (temporary) Access files could not be deleted in a RapidMiner process.
Development:
Added registerLanguage method to the I18N class, which allows adding new languages to the Settings->Preferences->Language selection. The i18n is picked up by providing resource bundles in the usual form, for example GUI_ja.properties and Error_ja.properties. If you want to get a list of not-yet-translated keys, add a file called translation_help.txt to your .RapidMiner folder. When you shut down Studio with your new language selected, it writes all keys for which no translation was found into that file. This should help you identify keys that you still need to translate.
Added the OperatorPortActionRegistry to add actions to operator ports.
Added identifier for last delivering port to the IOObject's userdata via IOObject.getUserData(DeliveringPortManager.LAST_DELIVERING_PORT)
Added support for parameter dependencies and hidden state to the settings dialog.

New in RapidMiner Studio 9.0.3 (Oct 4, 2018)

New in RapidMiner Studio 9.0.2 (Sep 6, 2018)

New in RapidMiner Studio 9.0.1 (Aug 16, 2018)

New in RapidMiner Studio 9.0.0 (Aug 7, 2018)

New Features:
Added TurboPrep, your interactive data preparation in a data-centric UI
Added a new "admin configuration" feature (see the documentation)
Operator Blacklisting
Extension Whitelisting
Telemetry
Studio Settings
Added new Time Series functionality
Added support for Google Cloud Storage with Read Google Storage, Write Google Storage, and Loop Google Storage operators. They work similarly to their existing Amazon S3 and Azure Blob Storage counterparts.
Added new online repositories which contain up-to-date help content. These contents are used by our online educational materials.
Added concatenation function to Generate Aggregation
Enhancements:
Global Search results can now be navigated by keyboard
Operators can now be renamed by double-clicking on their name (indicated by a text cursor)
Improved operator renaming visuals when zoomed in/out of the process
Process panel in Design view can no longer be closed
Updated behavior for Result History panel outside of Result view
Uncloseable panels no longer have close buttons
Updated import wizards for Read CSV and Read Excel operators to make them consistent with the Add Data repository action
Added Remove All Breakpoints entry to Edit menu and right click context menus
A warning is shown for correlation matrices that could not be calculated
Improved the guessing for type of Quotes during CSV import
Improved the guessing on decimal separator in CSV import
Twitter operators now correctly warn about the rate limit when it is exceeded instead of throwing a generic error
Hyperlinks in process notes are now clickable and open the default browser
Repository actions that need write access are now grayed out when a read-only entry is selected
Inserting an operator via Global Search will now correctly grant focus to the Process panel, so you can immediately use the keyboard to manipulate the operator
Added workaround for a bug in the Amazon Redshift JDBC driver so that it can be used now
Saving a process in a read-only repository now offers the Save As dialog instead
Repository location chooser (for opening and for saving) no longer sometimes appears as a separate instance of RM Studio in the operating system taskbar
Bugfixes:
Clicking on a selected operator no longer sometimes selects an operator behind it
Fixed process panel sometimes being opened in other views
Fixed an issue where icons did not show up on Retina displays
Updated vulnerable libraries
Fixed potential UI freeze during the Import Data process
A rare error concerning parallel loops in combination with Generate Attributes was fixed
Fixed an issue that RapidMiner Studio always started in fullscreen mode on Mac OS X
Fixed results view not showing the latest result as the active tab
Development:
Added callback hook for DataImportWizardBuilder. The caller can use the callback to determine what should happen after the user has concluded the data import.

New in RapidMiner Studio 8.2.1.0 (Jun 28, 2018)

New Features:
Added possibility to disconnect from RapidMiner Server repositories
Enhancements:
Edit Access Rights dialog is now read-only if the user does not have enough permissions to make changes
Generate Weight Stratification now warns about mismatching data
Updated tutorial process for Loop Attributes
Bugfixes:
Fixed broken preview when using the Guess value types or Reload data buttons in the Import Configuration Wizard of the Read Excel and Read CSV operators, after manually changing the attribute selection or an attribute role.
Fixed a metadata problem with the Singular Value Decomposition operator showing the wrong type of preprocessing model.
Fixed a bug causing Aggregate to concatenate the same value multiple times even though only distinct was set.
It is no longer possible to toggle breakpoints if Process panel is not visible.
Write CSV is no longer writing Integer values as floating points.
Updated mode aggregation function of Aggregate to take missing values into account.
Remember can now be used in every iteration of a parallel operator, instead of only the last. No execution order is guaranteed.
The New Revision server repository action no longer blocks the UI.
Fixed bug preventing SVM Kernel Scatter Plot from displaying certain variables.
The macro command line argument -M now works as expected when passed to the rapidminer-batch.bat launcher.
Fixed rare bug that could occur when looking at a subprocess of a parallel operator while zoomed out and trying to run the process.
Fixed pass through port of the Correlation Matrix operator (returned a subset of the input for some data sets).
Fixed missing visual indicator in the top bar for the currently selected view when resizing RM Studio horizontally.
Fixed spelling error in Direct Marketing template.
Fixed spelling error for mikro/makro.
Fixed a problem using undo/redo during a tutorial.
Fixed a rare bug that might occur on restoring a process on startup.
Fixed uncommon bug where Views could break when switching too fast between them.
Fixed bug making Apply Threshold use the wrong mapping.
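The intended behavior of the distinct concatenation fix can be sketched in Python. The "|" separator, skipping missing values, and keeping first-occurrence order are assumptions for illustration, and the function name is hypothetical:

```python
from typing import Iterable, Optional


def concatenate(values: Iterable[Optional[str]], distinct: bool = False, separator: str = "|") -> str:
    """Join the non-missing values of a group; with distinct=True, keep first occurrences only."""
    present = [value for value in values if value is not None]  # skipping missing values is an assumption
    if distinct:
        present = list(dict.fromkeys(present))  # dicts preserve insertion order (Python 3.7+)
    return separator.join(present)


group = ["red", "blue", "red", None, "green", "blue"]
print(concatenate(group))                 # every value, duplicates included
print(concatenate(group, distinct=True))  # each value once
```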

New in RapidMiner Studio 8.2.0.0 (May 8, 2018)

Enhancements:
Double-clicking an unconnected operator port now connects it to a matching output port of the process.
The menu View -> Show Panel is now scrollable.
Updated visualization of tutorial's next button to go to next tutorial or back to tutorial overview when reaching the end of a tutorial or a chapter respectively.
Removed search button from search bar and changed result dialog to open with one-click logic.
Creating a RapidMiner Server repository no longer stores the credentials automatically. However, if desired you can still do so by selecting the "Remember Password" checkbox when creating the repository.
Panels now always have proper tooltips.
Improved visualization of nested Operators.
Added a primary parameter mechanism to some operators; double-clicking an operator now opens the editor of its primary parameter. This also works for operators that have subprocesses; in that case, hold the Alt key while double-clicking to activate the primary parameter.
Quick fixes can now be accessed directly from the error bubble after a process run fails.
Improved performance of FP Growth and added support for additional input formats.
The status bar (found at the very bottom of RM Studio) now more precisely displays possible actions when editing a process.
Pressing the arrow keys in the process panel when no operators are selected will now select the first operator.
Bugfixes:
Parallel operators now produce identical results when running in parallel and when running sequentially
Removed several sources for redundant undo steps
Fixed a bug that could lead to incomplete output of Execute Program
Fixed and improved generic process runtime errors
Fixed erratic behaviour of EMClusterer
Date to Nominal no longer removes the role of the selected attribute
Fixed a bug where results from Data to Similarity Data could not be processed further
Fixed an issue that could result in the "Drag here" annotation being shown in the process all the time when using the Global Search
Fixed a bug that allowed operators to connect to themselves
Fixed Web Analytics template

New in RapidMiner Studio 8.1.3.0 (Apr 19, 2018)

New in RapidMiner Studio 8.1.1.0 (Mar 7, 2018)

New in RapidMiner Studio 8.1.0.0 (Feb 6, 2018)

New Features:
Added Auto Model feature, a new working mode for rapid creation, comparison, and exploration of new models. It can be found as a new view at the top.
Added a powerful global search functionality which can be found in the top-right corner and activated via Ctrl+F shortcut. You can currently search for operators, repository contents, UI actions, and Marketplace content. See the documentation for more information if you are interested in more complex and powerful search queries (e.g. finding data/models that contain a specific attribute, or were last modified before a certain date, etc).
Enhancements:
New Process Templates upgraded to use the latest operator versions.
Read Excel now allows sheet selection by name.
Read CSV, Read XML and Read Excel have a new expert parameter read all values as polynominal, which allows the user to disable type guessing.
Hide passwords in the Password Manager dialog and store them with a stronger encryption.
Search Twitter and Get Twitter User Statuses now support 280-character tweets.
All Twitter operators moved from numerical to nominal attributes for user and status IDs.
Made the Views display at the top more dynamic on resizing to prevent squashed GUI elements for low(er) resolutions and to show more views for high(er) resolutions. To achieve this, both the Undo and Redo buttons for process editing were removed. You can still undo/redo via the top Edit menu, or by pressing Ctrl+Z/Ctrl+Y, or even via the new global search by searching for Undo or Redo.
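Storing Twitter IDs as nominal values avoids a precision problem: the IDs are large 64-bit integers, but a double-precision number only represents every integer exactly up to 2**53. The changelog does not state the motivation for the change, so treat this Python illustration as a plausible explanation; the ID below is made up:

```python
# A hypothetical 19-digit status ID; real IDs are 64-bit integers of this magnitude.
status_id = 1050118621198921729

as_double = float(status_id)  # what a numerical (double) attribute can hold
as_text = str(status_id)      # what a nominal attribute holds

print("exact as double:", int(as_double) == status_id)
print("exact as text:  ", int(as_text) == status_id)

# Doubles have a 53-bit significand, so integers above 2**53 start to collide.
print(float(2**53) == float(2**53 + 1))
```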
Bugfixes:
Secured XML parsing against XXE vulnerability
Fixed a rare error when logging inside parallel operators
Fixed problem that caused Parse Numbers to fail if input was an empty value
Fixed a rare error when running Join, Replace Missing Values, or Add inside a parallel loop
Fixed handling of polynominal attributes in Apply Model when applying a Cluster Model
Updated Regularized/Linear/Quadratic Discriminant Analysis to avoid uncaught errors and give more information if an error occurs
Fixed uncaught Runtime Exception when using Loop Parameters and Optimize Parameters (Grid) with log_all_criteria
Fixed issues with duplicated or missing entries, as well as missing groups in the Manage Connections dialog
Refreshing folders in a RapidMiner Server repository no longer blocks the entire Studio interface
Renaming entries in a RapidMiner Server repository no longer blocks the entire Studio interface
Pressing Ctrl-A in an empty process no longer makes the process parameters disappear
Hotkeys for view switches now work properly from all views
Upgraded MSSQL JDBC driver to version 4.2
Upgraded PostgreSQL JDBC driver to version 42.2.1
Development:
The Global Search feature is highly flexible and open to extensions - look at com.rapidminer.search.GlobalSearchable and com.rapidminer.gui.search.GlobalSearchableGUIProvider to get started!
Unsigned 3rd party extensions can now call ParameterService#setParameterValue(String, String) without causing a SecurityException
Please note: We have accumulated lots of outdated code over the years. Anything that is annotated with @Deprecated will be removed at some point in the future. Removal will start with RapidMiner Studio 9.0, so please prepare your extensions by not using any deprecated code anymore. JavaDoc will help guide you to replacement classes/interfaces/methods.

New in RapidMiner Studio 8.0.1.0 (Dec 28, 2017)

New in RapidMiner Studio 7.6.1 (Sep 6, 2017)

New in RapidMiner Studio 7.6.0 (Sep 6, 2017)

New features:
Sending notification emails can now be configured in the preferences to make use of all modern connection security and authentication mechanisms like TLS 1.2 + PFS
Enhancements:
The sender of notification emails can now be configured in the preferences
Licenses are now valid for the full last day until midnight
Improved handling of infeasible parameter values for Self-Organizing Map
Changed default sampling type parameter for Validation operators to automatic
Write Message now has a parameter option to append to existing files instead of overwriting them
Logistic Regression and Generalized Linear Model learners now have a threshold output where they deliver a threshold value optimized for maximal F-measure
Improved handling of missing and infinite values for Normalize
Improved handling of missing or broken compatibility numbers in the process xml
Made behavior of add as label parameter consistent for all cluster operators
Improved checks for empty example sets in cluster operators
Improved shown capabilities for cluster operators and added quick fixes for inconsistent parameter selection
Reduced some internal logging by moving it behind the debug flag which can be activated in the preferences
Updated Java for Windows and Mac OS X to version 8u141
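The idea behind an F-measure-optimized threshold can be shown with a brute-force search over the observed confidences. This Python sketch is illustrative only; the function name is hypothetical and the learners' actual implementation may differ:

```python
from typing import List, Tuple


def best_f1_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1 for the rule 'predict positive if score >= threshold'.

    Only the distinct observed scores are tried as candidate thresholds.
    labels must be 1 for the positive class and 0 otherwise.
    """
    if not scores:
        raise ValueError("no scores given")
    total_positives = sum(labels)
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = total_positives - tp
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


print(best_f1_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The search is quadratic for clarity; sorting the scores once and sweeping the counts would bring it down to O(n log n).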
Bugfixes:
Fixed reproducibility of results when concurrent operators (e.g. Loops) are involved.
Changing the default connection timeout setting in the preferences now takes effect immediately.
Sending notification emails now uses the default connection timeout.
Fixed metadata of Flatten Clustering.
Fixed behavior of Loop Parameter inside parallel loops.
Removed unnecessary warning for clustering operators with nominal input data
Generate Weights (LPR) and Local Polynomial Regression now provide additional kernel parameters for the numerical measure KernelEuclideanDistance instead of failing
Fixed the Gradient Boosted Trees renderer; it no longer shows wrong edge labels and incorrect value sets
Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators no longer crash the software if certain temporary folder permissions are missing
Logistic Regression and Generalized Linear Model learners now use 0.5 as the threshold, like other binominal learners
Fixed behavior of Loop Attributes when only one attribute is selected for parallel execution
Fixed Average for Performance inputs that contain AUC
Fixed side-effects of Apply Threshold in other branches of the process
Fixed rare crash in Create Association Rules under certain parameter configurations

New in RapidMiner Studio 7.5.001 (May 10, 2017)

New in RapidMiner Studio 7.5.0 (May 10, 2017)

New features:
The first iteration of the new data core, which manages data sets much more efficiently, has arrived! This results in both better performance and less memory usage for the vast majority of operators.
Added support for Microsoft Azure Blob Storage with Read Azure Blob Storage, Write Azure Blob Storage, and Loop Azure Blob Storage operators. They work exactly like their existing Amazon S3 counterparts.
Added support for Amazon Key Management Service (AWS KMS) for all Amazon S3 operators. You can now optionally add an encryption key id to your Amazon S3 connection to decrypt/encrypt files when working with Amazon S3.
Added a new mechanism to provide help, advice messages, and even important announcements to the user.
Enhancements:
Completely revised result graph interaction, presentation, and visualization (e.g. decision trees, clusters, etc.).
It is now possible to highlight the path to a node of a decision tree in the Results view.
Cluster nodes in the Results view are now scaled according to their relative size.
Undo and redo functionality is now much more intuitive when working with the process canvas. It will now not only restore the process state, but also restore canvas location, operator selection, and the zoom level.
Navigating up and down through subprocesses in the UI is now more user friendly. When entering a subprocess and later going back up, you will see the same part of the process you were looking at before entering the subprocess.
Remove Duplicates now features a new output port called duplicates which returns the examples identified as duplicates.
Fixed memory leaks for Handle Exception, Select Subprocess, and Branch.
Execute Script now caches the parsed scripts for significantly faster execution, especially inside Loop operators or other highly concurrent environments. General performance of script execution has also been improved. Also added operator tags and added a default example script to make usage of the operator easier. Last but not least, error messages now include the causing stacktrace for easier debugging.
Improved AutoMLP performance.
Loading context data shows progress now.
Added new global process macro: %{process_start} which captures the timestamp when a process was started.
It is now possible to close result tabs with the same shortcut as in your web browser: ctrl+w (command+w on OS X)
Added new tutorials for RapidMiner Server and RapidMiner Radoop.
Added some more usable date and datetime format defaults to choose from when importing data.
Added folder buildingblocks in the .RapidMiner directory which will also be searched for .buildingblock files on startup.
The dialog letting you know about an available RapidMiner Studio update now also displays the version number of the update.
Bugfixes:
Fixed a bug making all parallel Loop operators incredibly resource hungry when running hundreds of thousands of iterations
Error bubbles indicating the source of an error in the process now work correctly in nested loops again
Removed empty confidence columns when applying the model from Linear Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Single Rule Induction, Subgroup Discovery
Regularized Discriminant Analysis no longer ignores the alpha parameter
The median for Aggregate now takes the middle point of both middle values in case of an even number of values
Fixed error that made operators which use a connection (e.g. Read Salesforce) unusable after importing a process
Fixed layout of marketplace search link in operator panel
Fixed broken dialog title for package download error
Fixed broken configurable entries due to unnecessary escaping
Fixed delay when trying to view decision trees in the Results view
Fixed major memory leak for Loop, Loop Values, Loop Attributes, and Loop Files
Fixed some operator parameter help tooltips being cut off
Fixed behaviour of Fast Large Margin if learned with bias (parameter)
Fixed pdf/svg image export of the scatter matrix chart
Fixed some spelling errors
Fixed Linear Regression calculation in case use bias is not selected
Fixed confidences of Ada Boost in border cases
Logistic Regression and Generalized Linear Model no longer allow p-value calculation without adding intercept
Fixed problem when trying to delete extensions of which more than one version was installed
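The corrected median rule for Aggregate is the usual one: with an even number of values, take the midpoint of the two middle values. A Python sketch with a hypothetical function name (Python's own statistics.median follows the same rule):

```python
from typing import Sequence


def median(values: Sequence[float]) -> float:
    """Median; for an even count, the midpoint of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence")
    middle = n // 2
    if n % 2 == 1:
        return float(ordered[middle])
    return (ordered[middle - 1] + ordered[middle]) / 2.0


print(median([7, 1, 4]))      # odd count: the middle value
print(median([7, 1, 4, 10]))  # even count: midpoint of 4 and 7
```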
Developers:
Concurrency API introduced with 7.4.0 is now available for unsigned extensions

New in RapidMiner Studio 7.4.0 (May 10, 2017)

New features:
Processes can now be executed in the background of Studio while you work on a different process in the user interface. This feature is only available for users with a Large license.
New parallelized Loop operator.
New parallelized Loop Values operator.
New parallelized Loop Attributes operator.
New parallelized Loop Files operator.
Repository entries can now be sorted by date.
Users with Large licenses can now grant additional permissions to unsigned extensions.
Enhancements:
Added a few new templates which can be used as a starting point when creating a new process.
Improved performance of Polynomial Regression.
Improved performance of Linear Regression.
Improved error message in case a selected input attribute for an operator is of the wrong type.
Improved operator progress for Generate Massive Data and several segmentation operators.
Improved performance of LibSVM and Fast Large Margin when sparse input data is not in sparse data format.
Small performance improvements for several operators that read parameters unnecessarily often.
Performance improvement for operators that iterate over all attributes.
Optimize by Generation (Evolutionary Aggregation) no longer shows unnecessary popup.
Repository entry sorting by name now ignores capitalization.
Users with Large licenses can now grant additional permissions to unsigned extensions via a new setting in the Start-up tab in the preferences.
The Log table in the results panel now also uses the new UI look and feel.
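Case-insensitive name sorting can be sketched in Python as follows. The tie-breaker on the original spelling is an assumption that keeps the order deterministic for names differing only in case; the function name is hypothetical, and Studio's actual comparator may differ (for example, it may be locale-aware):

```python
from typing import List


def sort_entries(names: List[str]) -> List[str]:
    """Sort names ignoring capitalization; ties are broken by the original spelling."""
    return sorted(names, key=lambda name: (name.casefold(), name))


entries = ["beta", "Gamma", "alpha"]
print(sorted(entries))        # code-point order: uppercase sorts before lowercase
print(sort_entries(entries))  # capitalization ignored
```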
Bugfixes:
Fixed a spurious cipher error when starting Studio for the very first time.
Fixed swapped title in models of Linear Discriminant Analysis and Quadratic Discriminant Analysis.
Fixed side-effects of application of preprocessing models in other branches of the process.
Fixed side-effects of Impute Missing Values in other branches of the process.
Fixed wrong behavior when dismissing confirmation dialog asking for interruption of currently running process.
Fixed Delete File not being able to handle relative paths.
Meta data calculation of Generate Nominal Data can no longer cause freezing.
Optimize by Generation (Evolutionary Aggregation) no longer does one iteration too many.
Fixed Number of threads setting having no effect for Decision Tree and Random Forest if it was set to 1 and then increased again.
Fixed rare error that could occur when displaying a grouped model in the results view.
Developers:
Added a temporary API for operators which should run in a parallelized fashion. Use the com.rapidminer.studio.concurrency.internal.ConcurrencyExecutionServiceProvider to access it.

New in RapidMiner Studio 7.3.1 (Jan 10, 2017)

New in RapidMiner Studio 7.3.0 (Jan 10, 2017)

Enhancements:
New parallel Cross Validation operator replaces X-Validation, Batch X-Validation, and X-Prediction.
Operator search now also searches for matching Marketplace extensions
Greatly improved Proxy UI and logic
Logistic Regression, Generalized Linear Model and Gradient Boosted Trees now return Attribute Weights output as well
Added reproducible parameter to Logistic Regression, Generalized Linear Model and Gradient Boosted Trees. If checked, the result is guaranteed to be the same, because the parallelization level is fixed.
Improved sorting for repository entries.
Performance improvement for Rule Induction and Perceptron operators.
Improved high DPI support.
Improved operator progress for Apply Model and Logistic Regression (SVM).
Improved welcome dialog layout.
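A fixed parallelization level makes results reproducible because floating point addition is not associative: splitting the same numbers differently across threads can change the last digits of a sum. A minimal Python illustration; the chunking function is hypothetical and not how these learners actually distribute work:

```python
from typing import List


def chunked_sum(values: List[float], chunk_sizes: List[int]) -> float:
    """Sum consecutive chunks separately (as parallel workers might), then add the partial sums."""
    if sum(chunk_sizes) != len(values):
        raise ValueError("chunk sizes must cover all values")
    partial_sums = []
    start = 0
    for size in chunk_sizes:
        partial_sums.append(sum(values[start:start + size]))
        start += size
    return sum(partial_sums)


values = [0.1, 0.2, 0.3]
one_worker = chunked_sum(values, [3])      # computes (0.1 + 0.2) + 0.3
two_workers = chunked_sum(values, [1, 2])  # computes 0.1 + (0.2 + 0.3)
print(one_worker, two_workers, one_worker == two_workers)
```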
Bug fixes:
Fixed NullPointerException in Logistic Regression and Generalized Linear Model with compute p-values on and solver set to AUTO on an input with large number of nominal values
Changed the default of the max_w2 parameter of Deep Learning to 10, as the operator help describes; it also became a non-advanced parameter
Fixed some minor tutorial inconsistencies
If there is a security error, Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators can recover without Studio / Server restart
Input data rebalancing in Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning no longer depends on the number of cores but the number of threads (configurable)
Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators are now loaded even if javafx package is missing from the Java Runtime Environment
Fixed multiple problems with the GSP operator
Operator progress now vanishes if operator is successfully stopped
Fixed operator progress animation being stuck sometimes
Fixed import excel data UI issues on Mac OS X
In-Hadoop scoring of Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning models in RapidMiner Radoop no longer logs a message for each row, which significantly improves performance
Development:
Added a centralized API for data table creation: From now on a new ExampleSet should be created via an ExampleSetBuilder provided by the ExampleSets class instead of using MemoryExampleTable
Tweaked project structure for the open source core. This does not affect the functionality of RapidMiner Studio.

New in RapidMiner Studio 7.2.3 (Jan 10, 2017)

New in RapidMiner Studio 7.2.2 (Jan 10, 2017)

New in RapidMiner Studio 7.2.1 (Jan 10, 2017)

New in RapidMiner Studio 7.2.0 (Jan 10, 2017)

Enhancements:
Added search field for preferences dialog to help find specific settings.
Massive performance improvements for PostgreSQL connections.
Performance improvements for Write Excel operator.
Performance improvements for Nominal to Numerical operator.
Performance improvements for Set Minus operator.
Minor performance improvements for Join operator.
Minor performance improvements for FP Growth operator.
Errors while running a process now display a more meaningful headline in many cases.
Improved display of real numbers very close to integers. Setting the fraction digits to 16 or higher now makes rounding of extremely small numbers less eager.
Deletion of repository items no longer blocks the user interface.
Improved Data Editor handling for date cells.
Improved Data Editor appearance to match the Results view data display.
Data Editor dialogs will now be displayed relative to the application screen.
Improved progress display for several operators.
Improved loading and display of operator help.
Extensions will be sorted by name in the About Installed Extension menu and Manage Extension dialog.
Improved feedback when a URL cannot be opened in the browser.
Improved progress feedback for one-click extension installation via Marketplace web site.
Uninstaller.exe is now signed to prevent false positives from some virus scanners.
Bugfixes:
Fixed some problems during CSV import if the file contents were not well-formed
Logistic loss of Performance (Classification) now returns the correct value (and not infinite)
Loop operator no longer ignores the timeout parameter
Loop Until operator now only shows timeout parameter when limit time parameter is checked
Removed the weighted examples capability from the k-NN operator info
The Write Database operator now throws a UserError instead of failing if the received example set contains no attributes
Write Excel will now cut off nominal values that are too long (limit is 32,767 characters) instead of producing broken Excel files
Generate Data now makes clear which functions support bounds and offers other parameters for those that do not
Changed the order of the parameters of Reorder Attributes to better reflect the dependencies between them
Fast Large Margin now throws a meaningful error message when there is only one class in the training example set
Date to Nominal is no longer applied incorrectly to attributes with non-date types, but throws an error instead
The query builder UI of Read Database now quotes table and attribute names correctly. This fixes e.g. issues with running SQL queries on HSQLDB
Apply Model no longer accepts application parameters unsupported by the given model
An explicit design-time and execution-time error is now shown when the date format in Date to Nominal or Nominal to Date is invalid
When parsing numbers, lowercase "e" is now also accepted as exponent separator in scientific notation
Renaming or moving the currently open process in the Repository panel now updates the current path, so saving does not lead to process duplication
Fixed performance problem that could occur after inspecting graphs or certain charts
Pie, Pie 3D and Ring charts are now always circular and not stretched into an elliptical shape
Fixed possible GUI corruption in case a warning bubble appeared when a process was started from the Results view
Inspecting Association Rules in the Results view no longer causes random errors sometimes
The result table no longer cuts off the last character if the nominal value ends with a newline
Removed obsolete setting to specify whether extensions should be installed locally or globally
Shrink process now works when zoomed out
Stopping a process during the last operator no longer sometimes still produces results
Fixed rare StackOverflowError during advanced chart usage
Fixed misleading error when trying to connect to a very old RapidMiner Server
Fixed duplication of problems in the Problems panel when copying & pasting an operator
Fixed error when trying to paste operators into a process by using the global Edit menu

New in RapidMiner Studio 7.1.1 (Jan 10, 2017)

New in RapidMiner Studio 7.1.0 (Jan 10, 2017)

New in RapidMiner Studio 7.0.1 (Jan 10, 2017)

New in RapidMiner Studio 7.0.0 (Jan 10, 2017)

Enhancements:
Overhauled the entire user interface
Added a new unified data import mechanism, currently supporting Excel, CSV, databases, and binary files.
New tutorials mechanism
New process template mechanism
Re-arranged operators in Operators panel to better fit analytic workflows
Renamed 'View' to 'Panel' and 'Perspective' to 'View'
New start-up dialog which replaces the old Welcome perspective
Moved "Synchronize Meta Data" toggle button from process panel to 'Process' menu in the top menu bar
Drastically improved performance when browsing data tables in the result view
Added support for high-dpi icons on OS X Retina Displays
Improved Operator documentation layout to increase readability
Improved Random Forest classification performance by adding a new voting mechanism
The "no result port connected" warning can now be turned off
Added an option to specify a background image for a process by right-clicking on the process canvas
Changed keyboard shortcuts: escape navigates to the parent process (if any), backspace deletes selected operators and annotations on OS X.
Added a pop-up menu item to the operator execution order mode as an alternative way to leave it
Dummy operators (e.g. from missing extensions) are now colored in red
Added a parameter for quickly downloading missing extensions for Dummy operators
The 'Generate Aggregation' operator now shows an error by default if the attribute filter does not return any attributes (this can be deactivated)
Bug fixes:
Passwords for the proxy server are now stored encrypted
Linear Regression now shows correct errors for coefficient estimation
Fixed an error in the ANOVA calculation that returned wrong values if the number of groups was equal to the number of examples
Association rules in the Results view are now displayed correctly filtered without user input
Selecting operators in the process panel no longer causes UI freezes
Rearranging operators now creates only one undo step
Fixed misbehavior of the undo action
Fixed a rare issue which prevented opening the 'Manage Database Connections' dialog
Process 'Tree' panel selection will once again navigate the main 'Process' panel
Fixed crash while starting Studio and immediately locking the computer on Windows
The view files in the .RapidMiner folder will no longer become huge due to duplicate information
Fixed re-attaching floating panels via pop-up menu
Fixed hidden/detached panels sometimes vanishing when restored
Fixed settings dialog layout when opened multiple times
Notification pop-ups (e.g. when adding unsupported operators) are now always in the correct location
Fixed parameter warning/error hint duplication when resizing Studio while the warning is being displayed
Fixed dialog size for long error messages
Fixed display of some characters on Windows 8/10 and Mac

New in RapidMiner Studio 6.5.2 (Jan 10, 2017)

New in RapidMiner Studio 6.5.1 (Jan 10, 2017)

New in RapidMiner Studio 6.5.0 (Jan 10, 2017)

Enhancements:
Completely overhauled problem and error notifications when running processes
All Learner Models will show an error rather than log a warning when applied on incompatible data
Repositories are now sorted by type and name
Improved churn template when using custom data
Improved performance when navigating RapidMiner Server repositories over a slow connection
Execute Process nesting depth is now limited to prevent endless loops; the maximum depth can be tweaked in the preferences
Added Netezza 7.0 JDBC support
Added a new "Move into new Subprocess" action that allows moving a group of selected operators into a Subprocess operator
Standard dialogs now support hyperlinks in the description
API: ParameterTypeText is now able to handle template text that is shown in the TextPropertyDialog if no text is set
API: Removed SassyReader and kdb dependencies, increased SLF4J API dependency to version 1.7.12
Bug fixes:
BUGFIX: Fixed possible startup problems when the _JAVA_OPTIONS environment variable is set
BUGFIX: Fixed rare cases of Studio becoming unresponsive because dialogs opened behind other dialogs
BUGFIX: When opening a process from the Server Processes view, confirmation is now required before an unsaved process is discarded
BUGFIX: Fixed rare problem when trying to save preferences
BUGFIX: Fixed some copy and paste problems of process notes
BUGFIX: Fixed Generate Data performance when selecting Gaussian mixture clusters as the target function
BUGFIX: Fixed several problems when both Process and XML views were open and visible at the same time
BUGFIX: "Sample (Bootstrapping)" now duplicates examples when upsampling data
BUGFIX: Averaging of Performance Vectors can now handle additional or fewer classes after the first iteration
BUGFIX: Aggregate operator now supports non-alphanumerical attribute names for grouping
BUGFIX: Execution order is now up-to-date even if process validation has not finished
BUGFIX: Fixed computation of binary classification criteria (performance) for remapped binominal labels
BUGFIX: Decision Tree and Random Forest can now handle an unbounded number of different label values
BUGFIX: 'Principal Components Analysis', 'Generalized Hebbian Algorithm', 'Independent Component Analysis' or 'Principal Component Analysis (Kernel)' in combination with Apply Model no longer modify the original example set
BUGFIX: Decision Tree(rule) model edge labels now correctly display dates instead of Unix timestamps in the Results perspective
BUGFIX: Read Access and Write Access now work with 64-bit Java and Java 8
BUGFIX: Log operator no longer silently fails if duplicate column names have been entered
BUGFIX: Fixed rare case where the Chart view in the Results perspective was broken
BUGFIX: Fixed rare case where the date format field vanished in data import dialogs
BUGFIX: Context data is no longer loaded when the input port is not connected
BUGFIX: Generate Attributes no longer forgets roles in metadata if an attribute is overwritten
BUGFIX: Read Excel, Read CSV, and Read XML can now be stopped
BUGFIX: Metadata of Execute Process operators is no longer calculated if an endless process loop is suspected
BUGFIX: Loop Files operator now shows an error message if the directory is invalid or the user has insufficient privileges
BUGFIX: Fixed an error that occurred in Write Database with an empty JNDI name
BUGFIX: Fixed problems with reconnecting operators after the 'Replace Operator' action
BUGFIX: Fixed displayed number of combinations for integer parameters in Optimize Parameters (Grid)
BUGFIX: Fixed jumping to correct subprocess when clicking on the cause of a failed process in the error dialog
BUGFIX: Generate Attributes can now be stopped
BUGFIX: Fixed a bug that occurred when trying to install a non-existent extension via one-click installation
BUGFIX: Fixed reading of XLSX files with cells that contain mixed font formats
BUGFIX: Regex dialogs now show at most 100 attributes to prevent GUI freezes
BUGFIX: Fixed a rare bug that occurred while refreshing a remote repository with a remote database

New in RapidMiner Studio 6.4 (Jan 10, 2017)

Enhancements:
Improved Process history view
Connections to RapidMiner Server no longer require equal license editions for Studio and Server. For example, professional-level RapidMiner Studio can now connect to Enterprise-level RapidMiner Server.
Improved visual feedback for port and connection interactions in the Process view
Drastically improved Process view performance
Cleaned up right-click context menu in the Process view
RapidMiner Server connections are now editable in RapidMiner Studio
Breakpoints in subprocesses are now indicated in the top right corner of the Process view
Dragging multiple repository entries into a process is now possible
Updated keyboard shortcuts and mouse handling improve the Mac user experience
Ctrl + Backspace is now available for text inputs and deletes an entire word instead of a single character
On opening, problems are only displayed if a critical problem was detected
In Select Attribute operators, numeric conditions now ignore blank spaces
Improved error message shown when class weights are specified for classes that do not exist
Added display of release platform to the About screen
Unmanaged extensions are now also loaded from ~/.RapidMiner/extensions if not specified otherwise in Preferences
All sample processes have been updated and improved to be compatible with the current version
Added a new sampling type, automatic, to the X-Validation operator
Operator search only expands groups with hits inside
Operator search is case sensitive when search term starts with an upper case letter
API: Added draw decorator and event hooks for the Process view. See ProcessRendererView#addDrawDecorator() and ProcessRendererView#addEventDecorator().
Bug fixes:
BUGFIX: Safemode dialog on startup is no longer sometimes hidden behind other windows
BUGFIX: Update Database now closes database connections after finishing
BUGFIX: Restarting after activating a license with more memory now correctly increases available memory on Windows
BUGFIX: A more meaningful error message is displayed when an invalid numeric condition is entered as a parameter
BUGFIX: Adding new database drivers via the Manage Database Drivers dialog no longer requires a restart
BUGFIX: Fixed rare error that could prevent the Manage Database Connections dialog from opening
BUGFIX: Fixed broken parameter help content for some operator parameters
BUGFIX: Calculation of a SOM-plot can now be cancelled
BUGFIX: It is no longer possible to drag operators out of the Process view
BUGFIX: Fixed rare error that could occur during automatic operator port connection
BUGFIX: Scrolling speed in the Process view is increased
BUGFIX: Fixed duplicate entry error in the History view
BUGFIX: Fixed Guess Types operator which occasionally took only the last numerical value into account
BUGFIX: A more meaningful error message is displayed when using Add generated primary keys for writing to MSSQL databases
BUGFIX: Fixed broken Execute Process operator help
BUGFIX: Disabled zoom functionality in Histogram Charts
BUGFIX: A more meaningful error message is displayed when using the Hyper Hyper operator with invalid input
BUGFIX: Principal Component Analysis operator now works when applied on special attributes with missing values
BUGFIX: Fixed Read Excel operator encoding errors on Windows 8.1
BUGFIX: In Excel import wizard, wrong-typed values are parsed as missing instead of causing an error
BUGFIX: Removed unused parameter attribute type from Discretize by User Specification operator
BUGFIX: Fixed some broken templates and sample processes
BUGFIX: Clustering models now work with special attributes that contain missing values
BUGFIX: K-Medoids operator now always uses the selected measure type
BUGFIX: Fixed rare cases of broken standard coefficients for Linear Regression operator
BUGFIX: Right-clicking an operator now selects it before opening the popup menu (Linux/Mac)
BUGFIX: When installing extensions from Marketplace, dependencies are only added if not yet installed
BUGFIX: Marketplace dialogs now always open in the correct order
BUGFIX: The date functions of Generate Attributes operator now add correct metadata for new attributes
BUGFIX: Operator text parameter dialogs (e.g., the SQL query dialog) can now be closed by pressing Ctrl + Enter
BUGFIX: The log level of the Log view is now correctly restored on each start

New in RapidMiner Studio 6.3 (Jan 10, 2017)

New in RapidMiner Studio 6.2 (Jan 10, 2017)

Enhancements:
Added operators 'Publish to App' and 'Recall from App' and a new view 'App Objects' for RapidMiner Server App manipulations
Resizing the attribute name column in the Statistics view of process results is now possible
New processes can now be saved via save button or ctrl+s
Improved error messages for broken custom filters in the 'Filter Examples' operator
Improved error message when selecting special attributes in an operator despite special attributes not being included
The Git revision of the RapidMiner Studio release is now shown in the About window
Improved speed and behavior of 'Decision Tree' and 'Random Forest' operators
API: Introduced AbstractConfigurator which deprecates the Configurator class. The AbstractConfigurator improves parameter dependency handling for Configurables
API: removed Encog dependency and all deprecated classes that used Encog
API: Added capability to allow parallel processing inside operators
Bug fixes:
BUGFIX: Fixed problems with single parameter selection for several Java implementations
BUGFIX: Fixed opening of stored results via the result history
BUGFIX: Operator port tooltips should no longer cover the port
BUGFIX: Charts should now display 'Missing' instead of '1.1.1970' for missing values in date attributes
BUGFIX: 'Update Database' should throw a more reasonable error message in case the database user lacks permission
BUGFIX: 'Neural Net' operator works again when applied on special attributes with missing values
BUGFIX: 'Neural Net' can no longer be applied on incompatible data
BUGFIX: The expression parser function round() now returns a missing value instead of 0 when applied on a missing value
BUGFIX: 'Sample (Bootstrapping)' operator now throws a reasonable error message in case the input example set is empty
BUGFIX: Moving colors in the color scheme dialog of Advanced Charts does not save duplicates anymore
BUGFIX: Fixed a bug which occurred when an optional password field was left empty
BUGFIX: Fixed overwriting an already existing file in Import Binary File Wizard
BUGFIX: Fixed a UI problem that occurred when a Collection with empty ExampleSets was displayed
BUGFIX: Fixed operator tree display in log view which is shown in case of a process error

New in RapidMiner Studio 6.1 (Jan 10, 2017)

Enhancements:
Overhauled Repositories view: Now multiple elements can be selected, copied, moved and deleted at the same time
Completely revised preferences dialog to make customization of RapidMiner Studio more accessible
Drastically sped up Log view for larger logs
Improved startup code to reduce launch problems. On Win32 versions, memory settings are now based on the actual free memory at startup. Added a property to the 'System' tab in the preferences where the maximum amount of memory for RapidMiner Studio can be configured
Improved SQL editor dialog responsiveness
It is now possible to ignore meta data for the 'Filter Examples' GUI
'Weight by' operators: The default value of the parameter normalize weights is now false
API: Added support for parameter dependencies in the Configurable framework (see Configurator#getParameterHandler())
API: Added operator parameter type which can display a file chooser for arbitrary remote file systems (see ParameterTypeRemoteFile)
API: Added greater control over preferences internationalization and layout (see SettingsDialog)
Bug fixes:
BUGFIX: Results containing missing values are sorted correctly
BUGFIX: Update Database now throws a meaningful error when the input example set contains no attributes
BUGFIX: Improved error message when applying a PCA model to incompatible data
BUGFIX: More meaningful error message when a mandatory attribute is not selected
BUGFIX: Loop/Optimize parameters are no longer dismissed if selection changes
BUGFIX: Distribution models can no longer be applied on subsets of the training set or on sets with the same name but a different type
BUGFIX: Log Operator now uses modern UI to show the result
BUGFIX: Fixed Linear Regression matrix calculation corner cases which could lead to missing values for standard error, t-stat, and p-value
BUGFIX: Fixed an issue that caused Top Down Clustering to fail
BUGFIX: Replace (Dictionary) maps each value only once
BUGFIX: Fixed an issue that sometimes caused data in the results perspective to be shown with a null source

New in RapidMiner Studio 6.0.8 (Jan 10, 2017)

New in RapidMiner Studio 6.0.7 (Jan 10, 2017)

New in RapidMiner Studio 6.0.6 (Jan 10, 2017)

New in RapidMiner Studio 6.0.5 (Jan 10, 2017)

Enhancements:
Improved copy and paste functionality of the process editor
Added new logging mechanism which can also be used by extensions to display their own logs in the default log view
Added parameter to Parse Numbers operator to show an error message or use missing values if a value can't be parsed
On lower screen resolutions smaller plot preview icons will be used
Aggregate operator throws an error when the example set does not contain attributes selected by the parameter "group by"
Improved the ability to stop the process while executing a Join operator
Ports of disabled operators are now highlighted to indicate that interaction is possible
Loop/Optimize Parameters GUI now automatically selects newly added parameter
Refreshing a repository folder is now possible regardless whether a folder or a data entry is selected
New chart type added: Web
Improved tooltip behavior
Improved resizing of subprocesses
Added parameter to Loop/Optimize Parameters which specifies how errors occurring in the inner process should be handled
Switching perspectives now remembers focused tabs and the position of all scroll bars
Bug fixes:
BUGFIX: Data readers will no longer automatically choose binominal as the value type to avoid import failures
BUGFIX: Saving a process can no longer freeze the user interface
BUGFIX: Storing/Reading models in XML representation works again when executing the process on RapidMiner Server
BUGFIX: Pasting process xml into the process view directly no longer messes up the layout and the connections
BUGFIX: Execute Process: Number of ports shown by operator matches ports used by embedded process.
BUGFIX: Weighting Operators which require a label attribute now throw an error if no label is present
BUGFIX: Superset and Union operators now fail with a better error message if the special attributes do not match
BUGFIX: macro() can now be used in the expression condition at Branch
BUGFIX: Loop Repository: using the parent folder name as filtered string does not throw an error anymore
BUGFIX: The Cumulative Variance plot for the PCA now displays the correct values
BUGFIX: Excel operators now show a human-readable error if the wrong sheet is selected
BUGFIX: Aggregate now detects DATE_TIME in MetaData
BUGFIX: Predefined operator macros are working again
BUGFIX: Data import operators of extensions are no longer sometimes displayed as disabled for some licenses
BUGFIX: The Loop Zip-File Entries file chooser now uses the correct file filter
BUGFIX: Read and Update Database operators can now be stopped
BUGFIX: Generate Macro will no longer add unnecessary zeros to the end of numbers
BUGFIX: Reduced logging at Generate Function Set if NaN was generated
BUGFIX: Operators which provide a subset selection now show an error if selected attributes are not present
BUGFIX: Correct display of operator status when starting a process
BUGFIX: Errors are now caught when trying to parse empty strings to numbers
BUGFIX: Remember/Recall operators now use a more sensible default for the io object type
BUGFIX: Fixed endless loop in Logistic Regression
BUGFIX: Generate Data can now be stopped
BUGFIX: Import wizards now ignore the check for duplicate names regarding columns that are disabled
BUGFIX: Linear, Quadratic and Regularized Discriminant Analysis can now be stopped
BUGFIX: K-Means, Linear Regression and SVM now ignore missing values in special attributes, except for the label
BUGFIX: Generate Nominal Data operator can now be stopped
BUGFIX: The arrange operators function no longer adds horizontal space between operators unnecessarily
BUGFIX: Fixed Filter Examples operator failing on date filters for dates before 1970
BUGFIX: The Split operator correctly outputs missing values if the input value was missing
BUGFIX: The Replace (Dictionary) operator now displays a meaningful error message if the to or from parameters are left undefined
BUGFIX: The displayed error, when using an invalid expression in the Branch operator, now contains a link to the operator
BUGFIX: Fixed a rare error while loading extensions on startup
BUGFIX: RapidMiner remembers all tabs that are visible and keeps them focused between perspective switches
BUGFIX: Tooltips in New Operator Dialog are now correctly formatted
BUGFIX: The Loop Repository operator now shows an error when the selected repository location does not exist
BUGFIX: Building Block Numerical X-Validation now defaults to shuffled sampling
BUGFIX: Improved error handling when pasting an unsupported file into the process editor
BUGFIX: More meaningful error message when a wrong attribute is selected in some operators

New in RapidMiner Studio 6.0.3 (Jan 10, 2017)

Enhancements:
Added new dialog to create and manage various connections
Tasks (shown in the lower right corner) should no longer unintentionally block each other
Process result display creation should be much faster now
Added attribute statistics when hovering over a table header in the example set result view.
New order for special attributes in data and meta data result view
Execute SQL dialog now has syntax highlighting and content assist (ctrl+space)
Extensions can now declare more than one dependency
Added 'unmatched example set' output port to Filter Examples operator which outputs all examples that did not match the specified condition
Added parameter to De-Normalize operator to control handling of missing attributes
Added a parameter to Execute Process that controls whether the process should fail if you define a macro that is not defined in the context of the embedded process
Added GUI parameter rapidminer.gui.plotter.default.maximum which defines the maximum size of an example set for which a default plot will be created
Bug fixes:
BUGFIX: Vote operator should be functional again
BUGFIX: Excel 2007 import no longer fails when the sheet contains nominal formula values
BUGFIX: Custom filters for the Filter Examples operator should no longer crash when selecting the 'matches' filter on empty input
BUGFIX: FindThreshold operator now throws error if the confidence role has the wrong name or does not exist
BUGFIX: Fixed bug preventing storage of Lift charts in the repository
BUGFIX: Fixed bug in expression parser which did not remove faulty expressions, leading to errors in later runs
BUGFIX: Fixed bug that prevented the usage of global process-related macros
BUGFIX: Loop Repositories operator can now be stopped
BUGFIX: Fixed recent processes being sometimes cut off in the Welcome perspective
BUGFIX: Fixed wrong default file extension for directory and file parameters
BUGFIX: Fixed rearranging of operators in subprocesses
BUGFIX: Fixed bug when creating charts for an empty example set
BUGFIX: Optimize Parameters Operator now interrupts with an understandable explanation when no performance values were delivered
BUGFIX: Fixed error with password fields when the password is less than 4 characters long
BUGFIX: Vector Linear Regression now checks for missing values
BUGFIX: Fixed scrolling when moving operators outside of visible area
BUGFIX: Support Vector Machine(LibSVM) can now be stopped
BUGFIX: Fixed result of Join operator with only missing values in ID nominal attribute
BUGFIX: Decision Tree operators no longer fail with a cryptic error message when the label attribute contains missing values
BUGFIX: Generate Macro no longer proceeds if an error occurred during macro generation
BUGFIX: Using undefined macros as operator parameters now causes an error when executing the process
BUGFIX: Applying a k-NN model can now be stopped
BUGFIX: Logistic Regression (Evolutionary) can now be stopped
BUGFIX: NominalToNumerical can now be stopped
BUGFIX: Optimize Parameters (Evolutionary) can now be stopped
BUGFIX: Polynomial Regression can now be stopped
BUGFIX: Remove Duplicates operator can now be stopped
BUGFIX: Self-Organizing Map operator can now be stopped
BUGFIX: Support Vector Machine (Evolutionary) can now be stopped
BUGFIX: In most cases programs executed with Execute Program operator can now be stopped properly
BUGFIX: The chart selection menu in the results perspective should no longer appear in strange locations

New in RapidMiner Studio 6.0.2 (Jan 10, 2017)

New in RapidMiner Studio 6.0.1 (Jan 10, 2017)

New in RapidMiner Studio 6.0.0 (Jan 10, 2017)

New in RapidMiner Studio 5.3.14 (Jan 10, 2017)

New in RapidMiner Studio 5.3.13 (Jan 10, 2017)

New in RapidMiner Studio 5.3.12 (Jan 10, 2017)

Enhancements:
All operators that write files to disk will create missing directories
Disabling an operator with a sub-process does not disable its children operators anymore
Attribute parameters that are marked as mandatory but are not set will now cause an error when executing a process
RapidMiner now creates a log file that records exceptions
The operator tree will expand again when searching for operators
Clustering algorithms now stop if the process is aborted
JDBC drivers updated
Macros in the macro view are by default ordered by macro name
Added an API for adding custom functions to the Expression Parser
Improved performance of the import wizards
Tabs can now be minimized with Alt+Backspace instead of Ctrl+Backspace
Removed excessive logging if dockables are missing
Neural Net: Improved handling of attribute names
k-Means: Improved Metadata handling
k-Means: Applying nominal measures to numerical data is not possible anymore
Linear Regression: Improved missing values handling
Performance (Costs): Metadata checks for missing attributes
Map: Reduced the number of warnings shown in the log
Rename: Renaming attributes to an already existing attribute name is not possible anymore
Aggregate: Fixed error in median function that occurred if ignore_missings was checked
Read CSV: Renamed 'escape character for quotes' parameter to 'escape character'
GSP: shows correct renderer in results perspective again
Loop Parameters: Show correct error if process is run without specifying parameters
Optimize Parameters: Improved keyboard handling of parameter dialog
Update Database: Fixed bug in case no columns are SET
Average: Improved error messages
Bug fixes:
BUGFIX: Fixed problems with uploading binary files to RapidAnalytics
BUGFIX: Fixed error in CSV import wizard
BUGFIX: Fixed memory leak in process result perspective
BUGFIX: Fixed error in Pareto Plotter
BUGFIX: Fixed error when calculating Cluster Density Performance of kMeans
BUGFIX: Fixed error in auto-wiring
BUGFIX: Fixed compatibility issues after copying and pasting operators
BUGFIX: Fixed bugs in the Regexp dialog
BUGFIX: Fixed bug in the "cut()" expression
BUGFIX: Neural Net: Fixed model which gave different predictions depending on whether the example set had a label or not
BUGFIX: Performance (Costs): Fixed error with missing prediction attribute
BUGFIX: Read Arff: Fixed handling of missing values in date attributes
BUGFIX: Expectation Maximum Clustering: Fixed missing values handling
BUGFIX: GSP: Fixed problems with binominal regular input attributes
BUGFIX: Generate Attributes: Fixed removal of attributes when overwriting attributes; the keep_all parameter removed all attributes
BUGFIX: Loop Parameters: Fixed parameter editor dismissing values
BUGFIX: Send Mail: Fixed bug which caused an error after the password is encrypted
BUGFIX: Numerical to Date: Fixed error in the attribute selector

New in RapidMiner Studio 5.3.10 (Jan 10, 2017)

New in RapidMiner Studio 5.3.9 (Jan 10, 2017)

New in RapidMiner Studio 5.3.8 (Jan 10, 2017)

New in RapidMiner Studio 5.3.7 (Jan 10, 2017)

New in RapidMiner Studio 5.3.6 (Jan 10, 2017)

New in RapidMiner Studio 5.3.5 (Jan 10, 2017)

New in RapidMiner Studio 5.3.0 (Jan 10, 2017)

Enhancements:
The development JDK was switched to Java 7, but the code is still compatible with Java 6.
Added new and improved extensive documentation (often including tutorial processes) for almost all operators
Added improved RapidAnalytics support. New run button: "Run process on RapidAnalytics". Can only be used if the process is stored on a RapidAnalytics repository. Instantly runs the process on the RapidAnalytics server the process is stored on
Connecting ports in reverse order is possible now (Input port -> output port)
The 'Run on RapidAnalytics' dialog can now choose the execution queue
New operators: Create Archive File and Add Entry to Archive File allow creating zip files
New operator: Performance to Data
New operator: Throw Exception
New file system operators: Copy File, Move File, Delete File, Rename File, Create Directory
New operators for handling annotations: Annotate, Annotations to Data, Data to Annotations, Extract Macro from Annotation
Aggregate: new aggregation functions: sum (fractional), count (fractional), count (percentage) and string concatenation
Execute Program operator has File Object ports for stdin, stdout, stderr
Loop Attributes: new output port which collects the data from all iterations
Macros can be passed through the command line. Example: 'rapidminer //repository/home/test/process -Mkey1=value "-Mkey2=value with spaces"' will provide two macros named key1 and key2
Result Perspective: Added button in top right corner of a result tab to close all open results at once
Repositories View: Added button which navigates to the repository location of the currently opened process
Repositories View: Added popup menu item to open the selected entry in the OS file browser (only for Local Repositories)
Added new view: 'Macros': This view shows macros and their values in real time during process execution
More consistent handling of input sinks and output sinks of processes and subprocesses: Sinks can be moved up or down by dragging them with the left mouse button while the Shift key is pressed. Removed double click on process input sink; all actions on process sinks can now be triggered via a popup menu
Added resize button to text parameter editor dialog
Preferences menu buttons reworked: 'OK' button saves settings permanently and closes the dialog; 'Apply' button saves settings permanently and does not close the dialog
Generate Data: warn user if the selected number of attributes is not supported by the target function
The RM window title now shows the complete location of the current process to avoid confusion with multiple processes with the same name
Trying to create a new process/open another process/exit RM while a process is running now requires user confirmation
Local repositories which are inaccessible for any reason now have an annotation which shows that they are inaccessible
'Remote' repositories are now called 'RapidAnalytics' repositories.
Plugin loading: when loading manually installed plugins from the webstart or plugins folder with multiple versions, load the one with the highest version number
Added database metadata caching to improve performance. If you need to clear the cache, use the menu item 'Clear Database Metadata Cache' under the 'Tools' menu.
Declare Missing Values: allows declaring an empty string as missing, and ignores attributes which are not compatible with the selected mode.
Extract Macro: added optional list parameter to add unlimited macro name/value pairs when 'macro type' is set to 'data_value'.
'Synchronize Meta Data with Real Data' toggle button added in the top right corner of the process view. After process execution, it propagates the real meta data to all reached input ports. For example, the operator after a breakpoint will have its meta data synced with the real data, enabling e.g. attribute selection parameters to show the list of available parameters. Known Issue: Currently this information is lost once the operator updates itself, which happens, for example, if you deselect and select it again.
Loop Until: Added 2 checkboxes in order to choose whether you want a condition check depending on the example set or on the performance. 'condition before' is now deselected by default.
Select Parameters Dialog: Parameters from type ParameterTypeString are now treated like numerical Parameters (the Grid option is enabled now), in case you want to assign a row of numerical values to a String Parameter.
Select Parameters Dialog: All acceptable parameter types are now shown, even if the continuous/discrete mode is enabled.
Regular Expression Dialog: The dialog has a new tab: Regex Options. In this tab, the user can define options like multiline mode or case-insensitive matching. These options are added to the pattern through embedded flag expressions.
Added new shortcut to toggle the breakpoint before an operator (Shift+F7)
Improved many error messages
Update Dialog: Revamped update dialog to show various lists of packages (search, most popular, bookmarks, etc. as well as functionality to log in/out)
Added a startup check for purchased but not installed extensions and a property to disable the check. Those Extensions can be directly installed from the dialog.
RapidMiner enters safe mode (not loading plugins) when startup was interrupted
Added new 'Export as PDF' action to the 'Print results or export' dropdown button
Deleted tooltips for the operator list of the OptimizeParameters (Grid) operator
UpdateDialog: Switched the positions of the install button and the link to the extension homepage
UpdateDialog: Checks if the user purchased the extension when returning to the dialog after hitting the "purchase" button
Replaced the standard Random function in the ExpressionParser with a custom one in order to involve the random seed/the RandomGenerator for the process
Updated JTDS driver to version 1.2.5
Loop Collection Operator has new parameters: 'set iteration macro', 'macro name' and 'macro start value'
Execute Process Operator: inverted the default values for all boolean parameters
Repository names now enforce a blacklist of invalid characters
Operators "Execute Process" and "Retrieve" are now named after the files you drop into the Process window
Nominal to Numerical: Parameter "default coding" is now set to dummy coding by default
Changed the help menu entry "Update RapidMiner" to "RapidMiner Marketplace"
Improved Remote Repository authentication
Improved data import wizards
Added default dialog options when running a process
Added attribute selector for the Extract Performance Operator
Database access: when several database connections with the same name exist, the one provided by the same server as the process is preferred
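The Regex Options entry above works by prepending Java embedded flag expressions to the pattern. A minimal sketch of that idea, assuming only the two options named in the entry (multiline mode and case-insensitive matching); the class and method names are hypothetical and this is not RapidMiner's actual implementation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not RapidMiner's code): turns dialog-style options
 * into Java embedded flag expressions prepended to the pattern.
 */
final class RegexOptionsSketch {

    /** Prefixes the pattern with (?i) and/or (?m), combined as (?im) when both are set. */
    static String withEmbeddedFlags(String pattern, boolean caseInsensitive, boolean multiline) {
        StringBuilder flags = new StringBuilder();
        if (caseInsensitive) {
            flags.append('i'); // same effect as Pattern.CASE_INSENSITIVE
        }
        if (multiline) {
            flags.append('m'); // same effect as Pattern.MULTILINE: ^ and $ also match at line breaks
        }
        return flags.length() == 0 ? pattern : "(?" + flags + ")" + pattern;
    }

    public static void main(String[] args) {
        String regex = withEmbeddedFlags("^error", true, true);
        System.out.println(regex); // (?im)^error
        // Matches "ERROR" at the start of the second line only because of both flags.
        Matcher m = Pattern.compile(regex).matcher("ok\nERROR: disk full");
        System.out.println(m.find()); // true
    }
}
```

Embedding the flags in the pattern string (rather than passing Pattern.compile flags separately) keeps the options attached to the regular expression itself, so the pattern behaves the same wherever it is stored as a plain parameter value.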
Bug fixes:
BUGFIX: Recent files are now updated on process save
BUGFIX: The condition on performance check at the Loop Until Operator now works properly if the performance decreases
BUGFIX: Editing context variables now immediately flags the process as changed
BUGFIX: If set to 'ask', the 'close previous results' dialog will no longer appear when resuming from a breakpoint
BUGFIX: Breakpoints can no longer be added to the root operator
BUGFIX: 'Store process here' popup menu action on another process will now correctly flag the process as saved
BUGFIX: Added error message for 'Find Threshold' operator when an invalid class name is entered as parameter
BUGFIX: The Pivot operator used on an empty example set no longer creates an example set with one example (filled with missing values)
BUGFIX: The De-Pivot operator now has much better error handling when trying to setup the index attribute as an already existing attribute
BUGFIX: Read Excel cannot open the Import Configuration Wizard, if the excel_file parameter is not set
BUGFIX: Nominal to Binominal can handle border cases with mapping containing less than 2 values
BUGFIX: Configure Repository dialog now saves user credentials for remote repositories again
BUGFIX: Added error shown in Problems view when entering invalid regular expression for 'replace what' parameter of Rename by Replacing operator
BUGFIX: Moving a repository entry to a location which already contains an entry with the same name now asks whether to overwrite instead of showing an error
BUGFIX: Execute Process operator potential error reporting improved
BUGFIX: Creating folders in repository with the same name but different capitalization (e.g. 'test' and 'Test') is now forbidden
BUGFIX: filtering numerical and date attributes with Filter Examples is possible again
BUGFIX: Wrong parameter format in Clone Parameters causes exception
BUGFIX: Fixed GUI problem when trying to schedule a process on RapidAnalytics without existing RapidAnalytics repositories
BUGFIX: Fixed possible data loss when trying to store data/processes/etc in the repository using invalid characters for the given filesystem by now showing an error instead of failing silently
BUGFIX: Fixed possible data loss when trying to move repository entries into their own folder
BUGFIX: fixed an initialization problem in the Cross Distances operator, which caused wrong calculation of distances in some rare cases
BUGFIX: Quickfix Dialogs no longer vanish right after showing (RCOMM2012)
BUGFIX: RapidMiner no longer blocks for a varying amount of time if a connection to an online server fails
BUGFIX: Focus issue with delete action
BUGFIX: Import Binary File no longer freezes the GUI
BUGFIX: New Plotters showed a "This should not happen" message once in a while and were unusable until restart of RapidMiner (Bug #1274)
BUGFIX: Real to Integer operator: missing values are no longer converted to 0 but kept as missing
BUGFIX: Fixed wrong cell selection on rightclick for some tables after reordering columns
BUGFIX: Fixed startup failure when trying to start RapidMiner with broken Plugins
BUGFIX: Optimized update routine. Instead of failing with a cryptic error if no admin rights are available, RapidMiner now shows a dialog that asks for admin rights
BUGFIX: Fixed the "Select for installation" button, which caused an error when its content/behaviour changed depending on factors such as whether the extension was purchased. It now leads to the extension website when the extension is purchased but not installed. The extension list now reacts properly to double-clicks, and the "Install" button is disabled when no extensions are marked for installation
BUGFIX: AccountService is only queried when we are logged in. The login state is saved internally
BUGFIX: Process undo steps are now reset when a new process is opened
BUGFIX: Using undo in a subprocess will no longer reset the view to the top-level of the process
BUGFIX: The welcome perspective now updates the recent files list, so opening a process via it now opens the correct selected process
BUGFIX: Drag&Drop from the OS to the RapidMiner Process design canvas now also works for .xlsx files
BUGFIX: Fixed several repository problems when trying to overwrite entries with themselves
BUGFIX: UpdateDialog: Purchase link now changes to "install" after logging in and the purchase button now redirects to the extension website
BUGFIX: Generate Aggregation operator can now handle the case of zero matching attributes
BUGFIX: Fixed several key shortcuts that worked in the wrong perspective. For example, it is no longer possible to delete operators while in the result perspective
BUGFIX: Drag&Drop of files (e.g. .csv/.xls) creates the corresponding read operator with the now correct filename parameter
BUGFIX: Drag&Drop of operators should no longer create operators halfway outside the process canvas
BUGFIX: Import wizards will no longer overwrite existing data without asking for permission first
BUGFIX: Import wizards will no longer accept wrong file types/invalid filenames in the first step
BUGFIX: Read SAS operator no longer causes an internal error when the data file could not be read
BUGFIX: "Wiki" links in the documentation tried to open a tutorial process, now open the corresponding wiki page
BUGFIX: Macro Editor will now remember entered values without having to press enter
BUGFIX: Opening the context menu on result tables will no longer deselect the currently selected cells
BUGFIX: Pressing "Delete" in a subprocess with the surrounding operator selected will no longer result in deletion of the whole subprocess

New in RapidMiner Studio 5.2.8 (Jan 10, 2017)

New in RapidMiner Studio 5.2.6 (Jan 10, 2017)

New in RapidMiner Studio 5.2.2 (Jan 10, 2017)

New in RapidMiner Studio 5.2.1 (Jan 10, 2017)

New in RapidMiner Studio 5.2.0 (Jan 10, 2017)

New in RapidMiner Studio 5.1 (Jan 10, 2017)

New in RapidMiner Studio 5.0 (Jan 18, 2010)

New in RapidMiner Studio 5.0 Beta (Jan 18, 2010)

New in RapidMiner Studio 4.5 (Jan 18, 2010)

New in RapidMiner Studio 4.4.2 (Jan 18, 2010)

New in RapidMiner Studio 4.4.1 (Jan 18, 2010)

New operators:
ForwardSelection
NeuralNetImproved
KernelNaiveBayes
ExhaustiveSubgroupDiscovery
URLExampleSource
NonDominatedSorting
Deprecated operators:
NeuralNet (use NeuralNetImproved instead)
NeuralNetSimple (use NeuralNetImproved instead)
Deprecated operators are also shown in context menu with
a light gray color now
The notification mail at the end of a process can now
also be sent by SMTP instead of sendmail
Most file based data input operators now provide an option
to skip error lines
Most file based example source operators (Arff, Excel,
DasyLab, Stata, SPSS, XRFF) as well as the IOObjectReader
and the new URLExampleSource now accept URLs instead of
a filename for the input source location
All discretization models now support the definition of
the desired number of digits for automatic interval name
determination
The LiftParetoChart now supports the definition of the
number of digits for the confidence intervals
Improved time display in status bar
Enabling / Disabling operator now works with CTRL-E
Fixed several issues in GUI thread handling which
might have led to deadlocks and long GUI updates
on certain systems
Clean-up of nominal value mappings in process log table
in case of sorted top-k for reduced memory footprint
Implementation Details:
DistanceMeasure creation now is based on the operator
and gets the input container as well
Bugfixes:
NeuralNet and NeuralNetSimple did not properly work
on regression problems. While NeuralNetSimple could
be fixed, a new operator NeuralNetImproved is now
provided which should be used instead of NeuralNet
and NeuralNetSimple. Since this operator is also faster
and more scalable, it should be used instead of both
old (and now deprecated) neural net
implementations
Fixed bug in renaming where decimal point characters
got lost
Fixed issue in model application leading to a wrong
remapping of the label values afterwards if an
independent test set was used. Important: this bug
did not deliver wrong predictions but only affected
how the label values were displayed.
Fixed bug in bar chart for numerical group by columns
Fixed bug in DasyLab example source which sometimes
led to doubled characters at the end of feature names
Fixed bug in OperatorSelector for macro usage

New in RapidMiner Studio 4.4 (Jan 18, 2010)

New operators:
ExampleSetSuperset
ExampleSetUnion
MacroConstruction
CumulateSeries
FastLargeMargin
Split
Construction2Names
NeuralNetSimple
Parameters will now be adapted according to an operator
rename, for example the settings of operators like
the ProcessLog or the parameter optimization operators
are automatically corrected to the new operator names
Graphs like the similarity graph display the strengths
of the edges now by their color
Added new tree layout algorithm for the decision trees
preventing most overlapping, the old tighter version
is available as layout type "Tree (Tight)"
Decision trees now show the subtree size as tool tip
for the inner nodes, the edges are now darker for
larger subtrees and brighter for smaller ones
Decision trees are learned faster now due to internal
optimizations in the splitted example set handling
Tables like the (meta) data view now support a new
context menu for common table operations like column
sorting or row / column selection
The "New Operator" dialog now also supports full text
search in the description texts of the operators
RapidMiner now stores all parameter values in the
process files including the default values which ensures
a better compatibility with future versions. The XML tab,
however, only shows the values differing from the default
Plugins can now define a class com.rapidminer.PluginInit
providing a method "initPlugin()" which will be invoked
during plugin initialization
Univariate and multivariate series windowing operators
now also support nominal attributes and even mixed
types in cases where the series is represented by
the examples (rows) of the data set
The range statistics of nominal attributes in the
meta data view now shows the values with highest and
lowest occurrence counts, sorts the values according
to the counts, and displays only an excerpt of the
occurring values if large amounts of different values
exist
List of recent files is now directly saved after opening
a new process and not only during shutdown
Changes in the process setup are now allowed even during
process runtime, e.g. when waiting at a breakpoint
NaiveBayes can now handle new nominal values during the
model application phase
Deprecated operators are now rendered with a gray color
in the new operator tab and dialog
Updated to the latest version of Weka (as of February 26th,
2009)
Updated to the latest version of Joone, optimized some
of the neural network default parameters
Added some new sample processes to the sample directory
as well as to the tutorial
ExampleFilter and most important discretization parameters
are no longer expert parameters
ArffExampleSource now states an error message in cases
where attribute names contain a space which is not quoted
New binominal classification performance measures:
positive predictive value
negative predictive value
psep
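The new binominal measures can be computed from the counts of a 2x2 confusion matrix. A minimal sketch, assuming the standard definitions PPV = TP / (TP + FP), NPV = TN / (TN + FN) and psep = PPV + NPV - 1; the class and method names are hypothetical and this is not RapidMiner's implementation:

```java
/**
 * Hypothetical sketch (not RapidMiner's implementation) of three binominal
 * classification measures computed from confusion-matrix counts.
 */
final class BinominalMeasuresSketch {

    /** Positive predictive value: TP / (TP + FP). Yields NaN if nothing was predicted positive. */
    static double positivePredictiveValue(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    /** Negative predictive value: TN / (TN + FN). Yields NaN if nothing was predicted negative. */
    static double negativePredictiveValue(long tn, long fn) {
        return (double) tn / (tn + fn);
    }

    /** psep: PPV + NPV - 1, which ranges from -1 to 1. */
    static double psep(long tp, long fp, long tn, long fn) {
        return positivePredictiveValue(tp, fp) + negativePredictiveValue(tn, fn) - 1.0;
    }

    public static void main(String[] args) {
        // Example counts: 40 true positives, 10 false positives, 45 true negatives, 5 false negatives.
        System.out.println(positivePredictiveValue(40, 10)); // 0.8
        System.out.println(negativePredictiveValue(45, 5));  // 0.9
        System.out.println(psep(40, 10, 45, 5));              // approximately 0.7
    }
}
```

Because the counts are cast to double before dividing, an empty denominator yields NaN rather than throwing an ArithmeticException, which makes undefined cases visible to the caller.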
Implementation details:
SplittedExampleSet has been improved leading to
faster data access times for operators like cross
validation or decision tree learning
Bugfixes:
Fixed bug in the accuracy criterion of the revised decision
tree learner
Fixed bug in parameter list of ValueSubgroupIterator
Fixed bug in ExceptionHandling which sometimes led to
doubled outputs
Fixed bug in ProcessBranch which sometimes led to
doubled outputs
ViewAttributes did not add min and max statistics
so that those statistics were not calculated on
data table views
Fixed bug in Windows GUI start script (linebreak)
Fixed bug for surface 3D plot where x and y were
swapped
Fixed paths to icons for building blocks
Fixed issue with ROC plots in cases where several
points with same confidence occurred
Fixed potential thread deadlock during the filling
of the plotter list
Fixed bug for distance weighted vote and k = 1
in NearestNeighbors
Fixed a bug in ChiSquaredWeighting for mixed-type
data sets where the number of bins was smaller than
the maximum number of nominal values
The default global random seed in the preferences
dialog was not allowed to be set to -1
The property keys of the preferences dialog were
editable
Fixed bug in PolynomialRegression
Range normalization now delivers maximum value
for constant attributes
Weighted precision and recall no longer
deliver NaN if a class did not occur

New in RapidMiner Studio 4.3.2 (Jan 18, 2010)

New operators:
LinearDiscriminantAnalysis
QuadraticDiscriminantAnalysis
RegularizedDiscriminantAnalysis
DasyLabExampleSource
FileIterator
ExceptionHandling
ChangeAttributeNamesReplace
ChangeAttributeNames2Generic
DateAdjust
MinMaxBinDiscretization
RainflowMatrix
Deprecated operators:
DirectoryIterator (use FileIterator instead)
Renamed parameters:
ExampleSetWriter:
quote_whitespace is now named quote_nominal_values
ExampleSetMerge can now handle missing values
RapidMiner now better supports counts for the in-
and output types, which should considerably reduce the
amount of warnings if operators like IOConsumer,
IOMultiplier or ExampleSetMerge (reducing several objects
of the same type to one of the same) are used
FileIterator replaces DirectoryIterator and adds many
new features like recursive iteration, file name based
filtering, and a new macro for the parent path
Centroid based clusterings now support assigning unseen
examples to the nearest cluster on apply time
ProcessBranch now supports branching with respect
to the existence of an input object
ClearProcessLog now also allows removing the complete
logging table
The logging tables of the ProcessLog operator will now
not be generated during start up but during the first
operator usage (and also during the following if the
table was deleted in the meantime, e.g. in a loop)
Added support for different time zones: users can now
define the preferred time zone in the settings dialog
and time conversion operators are now able to respect
this setting
Dates and times are now displayed according to the
system's locale settings
New plotter: Block
Added support for applying a log scale for the color
column for the Scatter plot and the new Block plotter
Data tables like those generated by the process log
are now de-coupled from the table used for plotting,
which prevents rows from being sampled and removed
from the data table
A double click on the region between two columns in
the table header now automatically resizes the left
column to a fitting size (known from Windows programs)
A double click on the same region while pressing CTRL
will resize all table columns according to the contents
GuessValueTypes now only works on regular attributes
and provides a parameter for extending it on the special
attributes (work_on_special)
AttributeFilter now also provides a new parameter
work_on_special
The operator Replace now also allows empty replace_by
values
The ExampleSetJoin operator now also works if the
id of the first example set is not part of the second
Guess value types can now handle missing values
CSVExampleSetWriter now supports the parameter quote_nominal
All feature selection and weighting operators now also
provide the possibility to log the names of the features
of the current generation's best individual
The Replace operator now supports capturing groups
The file based example source operators (ExampleSource,
SimpleExampleSource, CSVExampleSource...) now better
support quoted strings and also escaped quotes (escaping
with ")
Implementation details:
The method Tools.quotedSplit(...) should now be used
instead of a regular split followed by the method
Tools.mergeQuotedSplits(...)
Bugfixes:
fixed bug in DBScan for empty cluster models
fixed bug for simple sampling in cases where a local
random seed was used
fixed bug in process logging to files which prevented
the writing of the first logged result
fixed bug in PSO optimization for cases where the fitness
should be minimized instead of maximized
fixed bug in binary performance measure which was not
delivering the fitness for specificity, sensitivity,
and youden index
fixed bug in meta data table viewer in cases where huge
numbers of long nominal values existed which caused a
crash of the Java Virtual Machine in some cases

New in RapidMiner Studio 4.3.1 (Jan 18, 2010)

New in RapidMiner Studio 4.3 (Jan 18, 2010)

New operators:
AccessExampleSource
Example2AttributePivoting
Attribute2ExamplePivoting
PolynomialRegression
Similarity2ExampleSet
ExampleSet2SimilarityExampleSet
Nominal2String
String2Nominal
Date2Numerical
Real2Integer
Numerical2Real
Nominal2Numerical
Numerical2Binominal
Numerical2Polynominal
AbsoluteDiscretization
ConditionedFeatureGeneration
AttributeAggregation
SupportVectorCounter
MutualInformationMatrix
GaussFeatureConstructionOperator
ProductGenerationOperator
AbsoluteValues
MovingAverage
ExponentialSmoothing
SeriesMissingValueReplenishment
DifferentiateSeries
IndexSeries
FillDataGaps
EnsureMonotonicity
WindowExamples2ModelingData
WindowExamples2OriginalData
ProcessLog2AttributeWeights
Mapping
Substring
Trim
Replace
AddValue
MergeValues
AttributeConstruction
ValueIterator
IOStorer
IORetriever
SQLExecution
ClearProcessLog
ProcessLog2ExampleSet
Data2Performance
Data2Log
Macro2Log
DataMacroDefinition
LiftParetoChart
Deprecated Operators:
Nominal2Numeric (please use Nominal2Numerical instead)
Numeric2Binominal (please use Numerical2Binominal instead)
Numeric2Polynominal (please use Numerical2Polynominal instead)
LinearCombination (please use AttributeAggregation instead)
AttributeValueMapper (please use Mapping instead)
AttributeValueSubstring (please use Substring instead)
AddNominalValue (please use AddValue instead)
MergeNominalValues (please use MergeValues instead)
New implementation of clusterings for more efficient computing and memory usage:
Reimplemented or adapted operators:
AgglomerativeClustering
ClusterModel2ExampleSet
DBScanClustering
ExampleSet2ClusterModel
FlattenClusterModel
KMeans
KMedoids
KernelKMeans
RandomFlatClustering
SupportVectorClustering
TopDownClustering
ClusterModelWriter
ClusterModelReader
TransitionMatrix
Removed operators:
AgglomerativeFlatClustering, use AgglomerativeClustering and FlattenClusterModel instead
BregmanHardClustering, use KMeans with BregmanDivergences instead
ExampleSet2ClusterConstraintList
MPCKMeans
TopDownRandomClustering, use TopDownClustering with RandomFlatClustering as inner learner
UPGMAClustering, use AgglomerativeClustering with average link instead
SimilarityComparator
The new AttributeConstruction operator supports infix
written formulas, a simple format for constants and
new calculation methods
Better support for special characters in process XML
Macros are now also supported in parameter lists and for
numerical parameters
Added new overwriting mode to the DatabaseExampleSetWriter
named "first overwrite, then append"
Replaced "append" parameter in ExampleSetWriter by the
new overwriting modes "none", "overwrite", "append",
and "first overwrite, then append"
ExampleFilter can now use regular expressions for the values
of the nominal attribute value filtering
New Plotter: Pareto Chart
New Plotter: Series Multiple
New Plotter: Scatter Multiple
The old scatter plotter has been divided into a new Scatter
plot and the new Scatter Multiple plot
Most plotters now support panning during zooming by
pressing the Ctrl Key while dragging the mouse
The file chooser in the modern look and feel now always
remembers the last directory from which a file was chosen
as an additional default bookmark (on the left)
Changed the order in which models are added to the
grouped model (ModelGrouper), i.e. the last created model
will now be added as the last one
The wizards of the database reading and writing operators
are now initialized with the last settings
The feature selection and feature weighting operators are
now based on double arrays which should lead to smaller
memory footprints
Added new performance measures:
sensitivity,
specificity,
Youden index,
relative error lenient,
relative error strict
The CachedDatabaseExampleSource operator has now a more
appropriate wizard
The plotters now provide consistent colors for classes
Improved the names of the features of the (multi-)variate
windowing operators
Multivariate windowing now also supports a name for the
label column in addition to the index
Multivariate windowing can now also be applied without the
creation of a label and even with horizon 0
Improved the graph and plotter panel for long column / item
names, long names are now displayed in a short fashion and
the full name is shown as tool tip
DecisionTree now supports a new parameter min_size_for_split
Added new process branch conditions:
attribute_available,
min_examples,
max_examples,
min_attributes,
max_attributes
The viewers for symmetrical matrices like correlations etc.
now always show the values of the first column
Improved the range names of discretized data
Added selection of criterion to AssociationRulesGenerator,
also improved the visualization of association rules by
adding a selector for the criterion used for the minimum
value slider
Added new option for Normalization. Users can now choose between z-transformation,
range transformation or the new proportional transformation via category selection.
LinearRegression is now also applicable on binominal
classification tasks
Added support for logging only the top-k or bottom-k objects
with the ProcessLog operator
Improved the parameter optimization / iteration dialog:
small numbers are no longer cut off, the GUI is more consistent,
and the dialog now uses icons
Improved the CachedDatabaseExampleSource operator and
database handling: now arbitrary tables are accepted and
primary keys (index) and / or mapping tables are
automatically handled
Integrated the latest version of the JFreeChart library
A dialog informs the users now if any unknown parameters
were part of the process during loading
A SimpleVoteModel now supports the output of textual
results
(Multivariate) Windowing on example based input representations
now keeps the input id attribute
Added writing of intermediate weights for GeneticAlgorithm
(feature selection) and EvolutionaryWeighting (feature
weighting), both operators now also support the initialization
with attribute weights (e.g. from the last run)
Implementation Details:
Moved AnovaMatrix(Operator) into the package
com.rapidminer.operator.visualization.dependencies
Moved all attribute based matrix operators
(correlation, covariance etc.) into the new package
com.rapidminer.operator.visualization.dependencies
Moved aggregation functions into package
com.rapidminer.tools.math.function.aggregation
Bugfixes:
processes now only write the logged information from the
run, not the global information for example collected from
the GUI. Hence, the logging will also no longer directly
overwrite old log files right after loading
switch workspace and initial workspace selection now
prevent the selection of the RapidMiner main directory
and all subdirectories in order to prevent a recursive
copy
switched weight "direction" for corpus based weighting
fixed bug in evolutionary parameter optimization in
combination with logging
fixed bug in Wizard for ExampleSource preventing the
correct guess of value types (were always nominal)
fixed error in nominal re-mapping for cases where the
nominal values of training and test set did not match
fixed jittering bug in Histogram plots causing the bins
to drop out of the plotter
fixed minor bug in ExampleSetWriter which caused the
ExampleSource operator to state a warning
fixed bug if special characters were part of the process
XML
DistributionModel is updatable now
AttributeValueSubstring ignores missing values and is
able to extract single characters now
Fixed a GUI error only occurring in Java 6 Update 10
Fixed bug in FeatureSubsetIteration where the specified
maximum number of features was not used
Fixed bug in PerformanceVector writing from the result
dialog (Save button) which led to large data files and
long runtimes until the data was actually saved
Fixed bug in uninstaller which under certain circumstances
also removed non-RapidMiner files in the installation
directory

New in RapidMiner Studio 4.2 (Jan 18, 2010)

New operators:
Nominal2Date
Date2Nominal
KernelPCA
EqualLabelWeighting
StataExampleSource
FeatureSubsetIteration
RelativeRegression
AttributeValueSubstring
CachedDatabaseExampleSource
NameBasedWeighting
BatchProcessing
GroupModel
UngroupModel
Aggregation now supports multiple aggregations (also of
different attributes) as well as grouping by values of
multiple attributes. Aggregation attributes and functions
are now specified by a parameter list.
Added support for attributes with value types date, time,
and date_time: these can be created from nominal attributes
with the operator Nominal2Date for arbitrary date formats
Histogram plotters now support jittering and log scales
The database wizard is improved and now supports large
data sets which caused memory problems in the older
versions during table and attribute name retrieval
The statistics in the meta data view of data sets are
no longer calculated by default for data sets larger
than 100000 rows; the calculation is available from
the menu in the upper right corner
"ExampleSet" was renamed to "Data Table", the rows are
still called "Example" and the columns are still called
"Attribute"
The iteration through partitioned / split data sets
is now more efficient (especially for linearly split
sets)
All plotters can now handle missing values
Many plotters now support the plotting of absolute values
and / or sorting according to the plotted column
Removed time-consuming checks (including a full data
scan before plotting)
One-Class SVM for LibSVMLearner now properly supported
The new operators GroupModel and UngroupModel now replace
the automatic building of ContainerModels (merging
preprocessing with prediction models) and hence give the
user more control over the model building / grouping
process
AttributeSubsetPreprocessing now supports the inversion
of the specified regular expression
The operator AttributeSubsetPreprocessing was enhanced
so that it can now be applied to subsets defined similarly
to the new AttributeFilter operator. Hence, the subset
preprocessing can, for example, be restricted to
nominal or numerical attributes
The database example set writer now supports new
overwriting / appending modes
Instead of the "work_on_database" mode of the usual
DatabaseExampleSource operator, we now recommend the
new operator CachedDatabaseExampleSource, which will
keep the data in the database in a more efficient way.
However, please note that writing to such a table is
not directly possible and must be performed with a
DatabaseExampleSetWriter
Implementation Details:
Optimized KNN for speed, gaining a speed-up of up to 13x
Replaced NaiveBayes with a highly efficient version
(changes: distribution plots now show conditional
probabilities without consideration of a priori probabilities;
heuristic use of kernels has been removed)
Integration of RapidMiner is now easier since the location
of plugins and Weka can be properly defined with settings
and the definition of "rapidminer.home" is no longer
necessary
Clean-up of value types (Ontology)
The ValueInterface now delivers Object instead of double,
i.e. the logging of nominal values is now also supported
New renderer service for providing the visualizations
of the results. This will replace the method
getVisualizationComponent() in the long run
Added latest version of the chart library (as of July
13th 2008)
Added latest version of Weka (as of July 13th 2008)
Bugfixes:
Fixed two bugs in new parameter wizard GUI for string
and integer parameters
CSVExampleSource and SimpleExampleSource now accept lines
ending with correctly delimited empty strings (i.e. missing
values)
Fixed wrong number of bins for the square root number
of bins in the frequency discretization operator
Fixed closing behaviour of the switch workspace dialog
Changes in the XML tab were not applied if the tab was left
in any way other than by switching to another tab

New in RapidMiner Studio 4.1 (Jan 18, 2010)

New operators:
StratifiedSampling
AbsoluteStratifiedSampling
GuessValueTypes
UseRowAsAttributeNames
MemoryCleanUp
MaterializeDataInMemory
UncertainPredictionsTransformation
CovarianceMatrix
AttributeFilter
RandomSelection
FrequentItemSetUnificator
FrequentItemSetAttributeCreator
OperatorSelector
CostEvaluator
AttributeMerge
KennardStoneSampling
A new 64-bit version for Windows x64 is now provided;
other 64-bit systems are supported by using a
64-bit Java version
Parameter optimization operators now provide a nicer
wizard dialog for setting the parameters
All GUI elements now provide longer descriptions for
operators
SplitChain and AbsoluteSplitChain were moved from the
postprocessing into the meta group
Meta group was restructured and two subgroups (control and
other) were added
Fixed a memory leak in the result history which was affecting
the GUI for multiple processes if they were performed in a
single sequence
SOMDimensionalityReduction and SVDReduction are now able to create
a preprocessing model
BruteForce and GeneticAlgorithm feature selection now support
a minimum and maximum number of features and also the selection
of an exact number of features
RapidMiner now offers two different look and feels: modern
(recommended) and classic
Improved comment tab so that it registers and saves
new text directly after it is typed (instead of only when
the tab is changed)
DataStatistics (IOObject) now shows the standard deviation
like in the GUI instead of the variance
Made the ExampleSource wizard more robust: using the input
file as an output file is no longer allowed
Series Plotter no longer scales the axis ranges in a way
that forces zero to be contained
All SVM and other hyperplane models now support the visualization
of a sortable data table for the coefficients (weights)
An error message now indicates if XML entities are used in
operator names, which is not allowed
Anova calculator now allows value editing in table and the
specification of the significance level
Meta data views can now be correctly sorted according to sum
or unknown value columns
MissingValueImputation: added warnings in the case that not all
values could be imputed, improved attribute ordering (ascending
and descending sorting, sort by number of missing values), added
log messages
Naive Bayes distribution model now uses the same class coloring
for both numerical and nominal distributions
Latest available Weka version integrated (as of 2008/05/09)
Implementation Details:
The AttributeParser no longer supports batch generations
The ClusterModel reader is now able to read both compressed
and uncompressed files
PCA and GHA now use global covariance matrix calculation
Bugfixes:
LibSVMLearner now provides the correct range for the nu
parameter
Fixed bug in AttributeParser which prevented the correct
calculation for nested generations or cases where
the generation is divided into several operators
Fixed bug in value type guessing for numerical columns
with missing values
Fixed bug in ExampleSetTranspose for missing values
in nominal attributes
Fixed bug in DatabaseExampleSource Wizards for user
defined URLs
Parameter lists are now cloned correctly
Fixed bug for quoted input files occurring in some cases
where the quoted string was part of the line before
Fixed a bug for learning with example weights with the
JMySVM learner
Fixed an NPE if empty example sets were used as input
for feature selection operators
Fixed wrong normalization for confidences predicted by
distribution models (e.g. NaiveBayes)
AttributeEditor and ExampleSource wizard did not respect
the decimal point character (and quotes)
The value type guessing operators did not take a
possible decimal point character different from
'.' into account
Fixed tool tip for z-transform in Normalization operator:
changed "variance" to "standard deviation"
Changed the locale for OK / Cancel dialogs to the US locale,
like the rest of RapidMiner
Fixed bug in operator tree which caused the reconstruction
of the expansion state to be faulty in some cases
Fixed statistics copy bug introduced in 4.1beta2 for
predicted label statistics

New in RapidMiner Studio 4.1 Beta 2 (Jan 18, 2010)

New operators:
ProcessBranch
FileEcho
ExchangeAttributeRoles
ChangeAttributeRole
SeriesPrediction
Deprecated operators:
ChangeAttributeType (use ChangeAttributeRole instead)
New version of chart plotting library
New plotter: Series
Removed the numerical sample sizes for the tree and rule learners
Introduced different shapes for plotter points
Use bigger strokes for plotter lines
Added max_items parameter for FPGrowth
Changed default mode for view creation of preprocessing models
Added signum generator for manual feature generation and for
generation with YAGGA2
Relief can now handle missing values
Changed default data representation back to double because
otherwise too many rounding errors occur for larger data ranges
Implementation Details:
Introduced AttributeDescriptions and AttributeTransformations
in order to lower high memory consumption due to clones
and to avoid re-wrappings for new views on the example
set view stack
Removed cloning of mappings for clones of nominal attributes
Changed DataRow methods from package private to protected
ConditionedExampleSets no longer support dynamic
conditions
Changed default data representation back to "double"
The visualization of integers and the nominal statistics
calculation are now based on longs instead of integers
Bugfixes:
Fixed MAJOR bug introduced in 4.1beta in example sets /
views which occurred after a new view was created on
top of a split example set (e.g. in a cross validation)
and then hid the partition
Fixed some problems (due to too many cloned objects, see
above) which caused much higher memory usage in 4.1beta
Fixed bug in PredictionTrendAccuracy calculation
Fixed wrong linefeeds in unix start scripts
Fixed bug in aggregation function selection of the chart
plotters
Fixed ID handling bug for example sets (views) which
prevented the correct application of Id-based operators
like the ExampleSetJoin operator
Fixed bug in table index assignment of view attributes
Fixed bug in SortedExampleSet
Fixed bug in some plotters based on JMathplot
Removed remapId() call in IdUtils which increased the
runtime of some clustering schemes (especially DBScan
and SupportVectorClustering)
Fixed bug in RuleLearner for nominal attributes
Fixed bug for (operator / parameter) pair parameter
values for the parameter iteration and optimization
operators
Fixed wrong name for continuous attributes in C45 loader
ConditionedExampleSet caused some problems if the base
attributes for conditions were removed after the
filtering
Fixed a bug in getNominalValue(Attribute) of Example
which delivered the first nominal value instead of
missing values
File filters now accept lower and upper case
extensions
Fixed wrong colors after sorting a column of the
ANOVA matrix
Removed unnecessary statistics registration in nominal
attributes consuming unused memory and runtime
Fixed rounding error in the stepwise parameter
operators
Removed data representation type query during first
startup since rounding errors are often too high
AbsoluteSampling produced samples with duplicates

New in RapidMiner Studio 4.1 Beta (Jan 18, 2010)

RapidMiner GPL is renamed to RapidMiner Free and is licensed
under the General Public License version 3 (GPLv3) now
New operators:
SingleMacroDefinition
MissingValueReplenishmentView
Perceptron
SubgroupDiscovery
ExcelExampleSetWriter
CSVExampleSetWriter
several new data generators
New preprocessing models for discretization and the nominal
to binominal filter; these operators now create only a new view
on the data by default instead of actually changing the data
ArffExampleSource and XrffExampleSource now support sampling
Improved Windows installation
New icons and look and feel for GPL version
Added graph visualization for association rules
Added new filter modes for association rule visualizations
The non-GPL version now natively supports Oracle, IBM DB2, and
Microsoft SQL Server without the need of an additional driver
installation
The availability check for JDBC database drivers was improved,
the same applies for the corresponding dialogs
The database operators and wizards can now work with table
and column identifiers containing spaces and other special
characters
Improved performance of DecisionTree and RuleLearner for
data sets containing numerical values
Improved encoding handling for input operators, configuration
wizards, and attribute editor
New default encoding: 'SYSTEM' which uses the standard encoding
of the underlying operating system
All performance criteria now support example weights for
calculations if possible (and available)
New rule evaluation methods available for
AssociationRuleGenerator
Diagonal of confusion matrix is now marked by a different color
All clustering schemes now use MixedEuclideanDistance
as default
The chart plotters (pie, bars) are now more robust for larger
data sets
The chart plotters (pie, bars) now provide the possibility for
the selection of an aggregation function type and use distinct
values only
KMeans now provides a warning for data sets containing
missing values
The sometimes slightly annoying dialog asking for saving the
process can now be deactivated
Passwords are now encrypted in XML (also in files) ensuring
that passwords cannot be read from process files
New Plotter: Distribution
Changed operator numbering in operator info dialog for
inner operator conditions
The error messages and the error stack trace in the details frame
can now be copied via Ctrl-C
Data files written by the ExampleSource configuration wizards
are now compatible with the standard parameters
ExampleSource now uses quoted nominal values as default
New visualization for NaiveBayes models
Operator trees no longer change their expansion state
after saving them or after the process stops
Multiple paste operations are now possible after copy
Decision trees now show the size of leaf nodes through the
height of the frequency bar
More evaluation measures added for association rules
ExampleSources now also allow using no comment characters
Increased the default size for the file chooser and the text
dialogs
Text dialogs like the SQL editor now keep linefeeds and
tab information
Changed the default minimum support of FPGrowth to 0.95 and added
an option to decrease the support until a minimum number of
frequent item sets was found. The latter working mode is the
default now.
KMeans cluster models now provide a parallel plotter
visualization of the cluster centroids
New macro: %{v[OpName.ValueName]} which will be replaced
by the current value of the specified value of the operator
Added cross-entropy as a new classification criterion
Ranges of discretized attributes now contain information about
the numerical thresholds
Changed default criterion for RuleLearner from accuracy to
information gain
Added default data management type to the initialization
screen
Icons for all tabs (non-GPL version)
Latest Weka version (as of 30/11/2007)
Implementation Details:
New init method also allowing the easy definition of
additional operators
ParameterSet now provides access to parameter values
New views (example sets) in order to improve the integration
into other products
Changed signature of startCounting(ExampleSet) to
startCounting(ExampleSet, boolean) in MeasuredPerformance
(see above)
All Models now have to return the transformed example set
instead of changing the values by side-effect. This was
necessary to allow the usage of views and view models
Bugfixes:
Fixed wrong license texts
Removed the file weka.jar from the free version which
was accidentally included in the last release. Weka is
of course still part of the GPL version of RapidMiner
Fixed templates (SimplePerformance was renamed to
Performance)
Fixed example visualization in cluster models (wrong
examples were shown in some cases)
Wizard from Welcome Screen did not change into edit mode
Faulty wizard files were fixed
Faulty building block files were fixed
Fixed bug in the calculation of the confidence of
association rules
Fixed bug if several manual feature constructions were
applied in a row (overwriting old generated columns)
Unknown values of nominal attributes were not correctly
encoded in Arff files
Fixed problem with example-encoded multivariate series in the
MultivariateSeries2WindowExamples operator
Fixed bug for ranking in TransformedRegression
Fixed bug in RapidMiner initialization for user defined
operators.xml streams
Configuration Wizard of ExampleSource did not use the correct
encoding from the process root operator
After deleting the contents of a password field it was
still part of the process setup (empty string in XML)
Changed result set scrolling type to "sensitive" which
is necessary for the Microsoft SQL Server 2005
Fixed bug in XrffExampleSetWriter which did not properly
escape XML characters
Using Save for a ParameterSet result did not work
Fixed Weka related bugs in the online tutorial
Fixed a possible stack overflow error in the
RepeatUntil meta chain
Fixed problem with Microsoft SQL Server 2005 with
respect to the scrolling / updating behavior
ProcessLog raised an error if the value "best_length" of a
feature operator was to be logged
Fixed error in the k-distance plot which calculated a wrong
x-axis offset for certain settings
SparseFormatExampleSource did not trim the sparse array, which
caused higher memory usage
Removed data view icon for some of the plotters since an
error in a third-party library caused problems after
activation
RuleLearner did not use numerical attributes twice
Fixed error in attribute editor which added empty
data lines after re-opening the edit dialog

New in RapidMiner Studio 4.0 (Jan 18, 2010)

New operators:
Performance (could be used in most cases instead of the
now deprecated PerformanceEvaluator)
ClassificationPerformance
BinominalClassificationPerformance
RegressionPerformance
UserBasedPerformance
SingleRuleWeighting
MPCKMeans
Almost all process setups will now also correctly work if the
nominal values of training and test data are not defined or
are not defined in the same order
The somewhat big operator "PerformanceEvaluator" is now
deprecated and was divided into several smaller operators
which now fit the different learning task types.
Added compatibility checks for the example sets for
prediction models between training and application data
Added a filter for the New Operator tab
Added learning for numerical attributes for rule learners
Renamed lowest verbosity level to "all"
Improved visualization of performance criteria
Added automatic ROC curve visualization for AUC criterion
Added averaged ROC curves
Added deviation plotter
Improved ExampleSetMerge
Improved rule learning on numerical data sets
Improved tree learning on numerical data sets
Added k-distance plot for similarity measures in the
similarity visualizations
Changed AUC calculation to a more pessimistic calculation
which better fits the ROC plots
Operator info is now available in context menu of
operator list in new operator tab
Added example visualizations after clicking a node
in the graph view of similarity visualizations
Improved the speed at which confidences are set for LibSVM models
Latest Weka version included (as of 30/07/07)
Implementation Details:
Revised clustering operators and introduced improved
abstract clustering
The global logging can now be specified either by
general properties or via the method
LogService.initGlobalLogging(...)
Attribute.getStatistics(...) is now deprecated, please
use ExampleSet.getStatistics(Attribute, ...) instead
Changed the log verbosity of the process information
at the beginning and the end of process executions
Plugins can now define own building blocks in their
resources directory (each bb file is described by a
line in the file "buildingblocks.txt")
Improved closing of streams in error cases
Bugfixes:
Removed unnecessary parameters from RandomForest
Fixed attribute name bug (not case sensitive) causing
errors in some preprocessing operators if features with
the same name but different cases exist
Fixed bug in Anova and T-Test calculation (wrong degrees
of freedom)
Fixed bug during weight normalization which led in many
cases to a concurrent modification exception that was
hidden behind a process change message
Removed possible bug in UPGMA-Clustering
Graph View of Cluster Models did not work
Added missing clone in discretization operator which
might have caused problems in cases where the discretization
was added into an iterating chain (like validation chains)
Streams are now not automatically closed during XML (de-)
serialization
Rule learners did not produce greater-or-equal conditions
Plotters now can handle missing values for plot columns
Micro-averages of attribute weights were not correctly
calculated
Fixed bug if a data set (.aml) is re-loaded containing
confidence attributes
Added missing option for k in the k-distance plots
(similarity visualizations)
Fixed notification error (double beeps) after a process
was stopped in a breakpoint
Fixed a bug which made it impossible to save neural net
models

New in RapidMiner Studio 4.0 Beta 2 (Jan 18, 2010)

New operators:
BatchXValidation
BatchSlidingWindowValidation
AttributeCopy
ExampleSetTranspose
AssociationRuleGenerator
RelevanceTree
CHAID
Tree2RuleConverter
Removed operators:
RegressionTree (may be re-added in later releases)
Ripper (replaced by RuleLearner)
Renamed operators (old operator names are deprecated now):
ExperimentEmbedder operator was renamed
to ProcessEmbedder (see below)
ExperimentLog operator was renamed to
ProcessLog operator (see below)
API change: Renamed Experiment to Process
(the old class Experiment is still available for
compatibility reasons but deprecated)
API change: OperatorService.createOperator(Class)
is now the preferred way for operator creation
and does no longer need a cast (generics)
Added correct file encodings to all IO operators
Renamed log verbosity "minimum" to "all" and log
verbosity "maximum" to "off"
Added meaningful default and range values for the
parameters of the ParameterOptimization operators
Replaced Tip of the Day dialog by the tip in the
Welcome screen
Changed all Weka parameters to non-expert parameters
(available in beginners mode)
SVMWeighting now supports more than 2 classes
All weighting schemes now return normalized results
Completely revised tree and rule learners
Completely revised tree, cluster model, and similarity
visualization
Latest release of LibSVM integrated (2.84)
Latest release of xstream integrated (1.2.2)
Latest release of Jung integrated (2.0alpha2)
Added table view for experiment log results
Added text views for learned tree models
Added text views for learner kernel models
Added text view for logistic regression model
Added Anova kernels for JMySVM and EvoSVM
Removed obsolete temp file service
CommandLineOperator now uses a higher log
verbosity for the output of the command
Improved output of Naive Bayes models
Improved context menu for attribute editor
Example visualization now automatically added after
IdTagging
Improved standard example visualization
Bugfixes:
Added missing dialog if more than one special attribute
with the same name was defined with the ExampleSource
configuration wizard
Log view panel was not resizable
Special attributes were no longer special after
AttributeSubsetPreprocessing on special attributes
Fixed LibSVM multi-class issues (no confidences)
Bugfix in the fast example set to sparse transformation
causing problems in Weka learners (and maybe the LibSVM)
Dichotomization did not properly work
Parallel plotter did not properly work for special
attributes
Fixed missing Id problem for top down clusterers
Fixed wrong nominal value writing for attribute editor
Column colors were not transferred if columns were
moved in data views
The AttributeConstructionLoader did not properly
create attributes for the identity function
(no construction at all)
Normalization did not work properly on nominal
attributes
AttributeSubsetPreprocessing did not properly keep
the old attributes
Replace operator (context menu) of operator chains
added (2) to the inner operators even if the names
were not used in the process setup
Spearman's Rho and Kendall's Tau now deliver 0 if not
defined (e.g. for default model) instead of NaN
Fixed problem with delegate attribute unwrapping in
some feature selection cases in combination with
cross validation operators

New in RapidMiner Studio 4.0 Beta (Jan 18, 2010)

"YALE" was renamed to "RapidMiner"
New operators:
DensityBasedOutlierDetection
LOFOutlierDetection
DistanceBasedOutlierDetection
PCAWeighting
SVMWeighting
Relief
InfoGainWeighting
InfoGainRatioWeighting
ChiSquaredWeighting
SymmetricalUncertaintyWeighting
PSOWeighting
FPGrowth
LinearRegression
NaiveBayes
NeuralNetLearner
LogisticRegression
DecisionStump
DecisionTree
ID3
ID3Numerical
RegressionTree
RandomTree
RandomForest
Prism
Ripper
OneR
NearestNeighbors
AdditiveRegression
Stacking
Vote
MetaCost
CostBasedThresholdLearner
Binary2MultiClassLearner
SVDReduction (from clustering plugin)
KMedoids (from clustering plugin)
KMeans (from clustering plugin)
KernelKMeans (from clustering plugin)
SupportVectorClustering (from clustering plugin)
AgglomerativeClustering (from clustering plugin)
AgglomerativeFlatClustering (from clustering plugin)
UPGMAClustering (from clustering plugin)
TopDownRandomClustering (from clustering plugin)
TopDownClustering (from clustering plugin)
DBScanClustering (from clustering plugin)
RandomFlatClustering (from clustering plugin)
ExampleSet2ClusterModel (from clustering plugin)
FlattenClusterModel (from clustering plugin)
ClusterModel2ExampleSet (from clustering plugin)
ExampleSet2Similarity (from clustering plugin)
ClusterModel2Similarity (from clustering plugin)
SimilarityComparator (from clustering plugin)
Bootstrapping
WeightedBootstrapping
BootstrappingValidation
WeightedBootstrappingValidation
MissingValueImputation
ExampleSetMerge
ExampleSetCartesian
XrffExampleSource
XrffExampleSetWriter
DatabaseExampleSetWriter
IOSelector
LinearCombination
AttributeSubsetPreprocessing
ModelVisualizer
ModelUpdater
LabelTrend2Classification
Sorting
AddNominalValue
ExampleRangeFilter
Numeric2Polynominal
PartialExampleSetLearner
SlidingWindowValidation
GroupBy
GroupedANOVA
ANOVAMatrix
Aggregation
Renamed operators:
AttributeSetWriter /-Loader into
AttributeConstructionsWriter / -Loader
Renamed all operators starting with Y- into
the names without this prefix
Added W- to all Weka operators, old experiments
can be loaded though
AverageLearner (was deprecated) now revised and
renamed into AttributeBasedVote
Deprecated operators:
Numeric2Binary (use Numeric2Binominal instead)
API CHANGES: please refer to
http://sourceforge.net/forum/forum.php?thread_id=1698583&forum_id=390413 and
http://sourceforge.net/forum/forum.php?thread_id=1730986&forum_id=390413
for details
The clustering plugin is now part of the YALE core
Drag'n'Drop for operator trees
New Icons (please refer to the license files for
information about the icons)
New Look and Feel (please refer to the license
files for information about the look and feel)
Improved general speed, most YALE runs now use
less 60% of the runtime needed before
Added page setup and print preview dialogs
Improved printing
New file chooser and added favorites to it (in the
left part of the dialog)
Tool tips can now be painted over multiple lines,
allowing more information about the operators and
parameters
New view menu
Result History viewer showing textual descriptions
of all experiment results in the session so far; it also
allows the calculation of ANOVA for different results
Parameter values are now always saved at focus losses
or during resizing operations
All tables (viewers) can be sorted by clicking on the
table headers (at least all tables where this makes sense)
Speed up of plotter initialization which was the reason
for the long times needed for displaying data sets
GUI is now able to immediately stop a running experiment
Improved capability to use YALE as a library, which makes
it necessary to perform the Ant target "copy-resources"
before starting (see implementation details
below)
All file formats were changed (sorry!) and are now
based on XML
Grid based parameter optimization / iteration operators
now support another format for parameter definition:
[start;end;step]
XVPrediction can now also handle confidences for
problems with more than two classes
Improved automatic closing of files and temp file
deletion after major experiment changes
Added graph view for BayesianNet models
Added textual and graphical view modes for models which
are capable of both, e.g. decision trees and Bayesian Nets
Added possibility to invert the result of an ExampleFilter
Added possibility to connect several attribute value
conditions for an ExampleFilter
Added new performance criteria: Spearman's rho and
Kendall's tau
Added option for AttributeWeightsApplier allowing for
changing just the data view instead of the actual data
table
The data representation type "sparse_array" was renamed
to "double_sparse_array"
Added new data representation types "short_array",
"short_sparse_array", and "boolean_sparse_array" allowing
for more efficient data handling
The univariate Series2WindowExamples operator now again
supports sets of examples if the time series is encoded
as attributes
The (meta) data tables now support text selection allowing
for copy and paste into other applications.
Performance Vector results can now be selected and copied
Example Set views can now be selected and copied
All displayed results now provide a "Save..." button
Use JTable for confusion matrices
Use JTable for correlation matrix (DataTable)
Added HSQLDB JDBC driver
Full platform compatible line feeds
ResultWriter can now also write results into single files
instead of the global result file defined in the experiment
Improved LearningCurveOperator now using better dynamically
growing training sets and a fixed test set
Allow the definition of number of digits for the
ExampleSetWriter format
Added log scale to usual scatter plotter
Added several chart plots (new bars 2D and 3D, pie charts
2D and 3D, bubble plotter)
ExampleSetWriter now supports zipped data files
Added initial support for updatable models; currently
only the updatable models from Weka are supported, others
will follow
Added another replenishment type 'zero' for the
MissingValueReplenishment operator
Added source definition for all IO objects, i.e. the
results do now show which operator was the creator
(only shown in result view if more than one result of
the same type was created)
The ExampleSource configuration wizard now allows a complete
data scan for value type guessing
Added weighted performance measures for weighted means
of the per-class recalls and precisions
Model writing and loading works for zipped files (gz)
Changed attribute statistics handling and displaying
Implementation Details:
The Ant target "copy-resources" must be performed
before YALE can be started
New initialization methods Yale.init(...) are available,
allowing the specification of which parts of YALE should
be initialized
Revised database access handling. Statements are now
always closed
changed name of method getVisualisationComponent into
getVisualizationComponent
no longer necessary to register operators in an
experiment (done automatically during adding)
no longer necessary to implement the abstract
OperatorChain method getNumberOfSteps()
Completely revised the example set / attribute /
example table data core of YALE which leads to much
better implementations of the core classes and more
possibilities for extensions. Please refer to the
YALE forum for an in-depth description of the
changes
attribute statistics are now handled in a different
way; all statistics are now queried with a statistics
name string
most actions are now part of own packages
replaced shuffled partition building by a version
reflecting the way Java shuffles collections
improved efficiency of WekaInstancesAdaptor by finding
the YALE weight attribute only once instead of anew for each
example
removed static field in class Yale for the current
experiment
The class Main was renamed into YaleCommandLine
Added possibility to define default values for
attributes
BinaryAttribute was renamed to BinominalAttribute
Newest versions of all libraries
PropertyValueCellEditor can now be registered in
PropertyTable allowing plugins to provide new
editors for new parameter types
The same applies for PropertyKeyCellEditor
Averagable: compareTo now implemented in subclasses
Averagable: cloneAveragable(Averagable) is now
deprecated, please use copy constructors
Added ParameterTypeText for longer text inputs
XML serialization now uses object streams
Bugfixes:
IOObjectWriter / - Reader did not work for Windows
executable due to library typo
LibSVM regression models could not be saved
Bugfix in PermutationOperator, which used all
attributes of the ExampleTable instead of only using
those currently selected in the ExampleSet
Exception in list property editors after one row was
deleted
Use default GUI properties in cases where loading of
properties did not work
Colons in attribute names were not supported by the
AttributeWeightsLoader / -Writer. Replacement by XML
format fixes this problem
Percent signs (%) in parameter strings were replaced by
the method expandString(String), which was not desired.
The new format for short commands is now %{a}
new CSV operator which better supports quoting and
column separators
fixed problem for category parameters if the check
value was a string of the index number
fixed bug for number of components = -1 in GHA models
fixed error for regular attributes with special names
when written into sparse format
Fixed bug for RVM model writing
Fixed bug for data transformation into the
association rule learning format of Weka
Removed error if a parameter for a non-existing
special attribute was used in the special format of the
ExampleSetWriter
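The note on shuffled partition building is brief. As a rough illustration only, the Java sketch below assumes that "the way Java shuffles collections" means java.util.Collections.shuffle with a seeded java.util.Random; the class ShuffledPartitions and its build method are hypothetical names, not YALE code. Example indices are shuffled once and then dealt round-robin into k partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration (not YALE code): shuffled partition building via Collections.shuffle. */
class ShuffledPartitions {

    /** Returns the partition number (0..k-1) for every example index 0..size-1. */
    static int[] build(int size, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        // A seeded Random makes the shuffled order reproducible across runs.
        Collections.shuffle(indices, new Random(seed));
        int[] partition = new int[size];
        for (int pos = 0; pos < size; pos++) {
            // Deal shuffled examples round-robin, so partition sizes differ by at most one.
            partition[indices.get(pos)] = pos % k;
        }
        return partition;
    }

    public static void main(String[] args) {
        // Ten examples split into three partitions with a fixed seed.
        System.out.println(Arrays.toString(build(10, 3, 42L)));
    }
}
```

Because the random source is seeded, the same seed always yields the same partitioning, and round-robin dealing keeps the partition sizes within one of each other.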

New in RapidMiner Studio 3.4 (Jan 18, 2010)

New operators:
MultivariateSeries2WindowExamples
EvolutionaryParameterOptimization
IOObjectReader
IOObjectWriter
AGA
YAGGA2
SPSSExampleSource
ExcelExampleSource
LiftChart
ROCChart
MacroDefinition (see below)
Removed operators:
NelderMeadParameterOptimization
PatternSearchParameterOptimization
Deprecated operators:
NaiveBayes, SimpleNaiveBayes, and NaiveBayesUpdateable
(replaced by Y-NaiveBayes)
LibSVM (use LibSVMLearner instead)
Changed parameters:
DatabaseExampleSource: replaced "driver", "urlprefix",
and "databasename" by "database_url" (can be easily
defined with help of the new configuration wizard, see
below)
ExampleSource now supports zipped data files
Added new data representations backed by non-double
arrays, which need less memory in cases where double
precision is not needed
All IO objects for which a loading operator exists can now
be saved directly from the result tab
SimpleExampleSource is now able to automatically guess the
value types
The Attribute Editor now has some additional features:
Context menu on row: "Use row as attribute names", which
is useful for CSV files, for example
Table Menu: "Guess all value types", which re-guesses
all value types; this is practical after declaring
one of the rows as names
Reminder on closing if the data file and attribute
description file have not been saved
New configuration wizards for more sophisticated input
operators like ExampleSource or DatabaseExampleSource
(available via the "Start configuration wizard..."
button of these operators)
New item in Tools menu "Show database drivers" which lists all
available JDBC drivers
JDBC drivers can now be defined via adding them to the
CLASSPATH or by copying them into lib/jdbc
Free JDBC drivers for MySQL, PostgreSQL, Microsoft SQL Server,
and Sybase included
The file resources/jdbc_properties.xml can be used to define
driver dependent settings like URL prefixes etc.
Improved the mode for working directly on databases (DatabaseES)
Improved data saving for ExampleSets
Added macro definitions. Macros can be defined with the
operator MacroDefinition and used with %{my_macro}. Several
predefined macros exist like %{experiment_name},
%{experiment_file}, and %{experiment_path}
The minimum and maximum colors for plotters can now be
specified in the properties dialog
Improved error messages for Weka learners and attribute
evaluators
Density and SOM plotters now support example visualization
Density and SOM plotters now use buffered images (more
efficient)
The Series2WindowExamples operators now allow both attribute
and example representations
Improved logging for both the message viewer and into files
Improved EvoSVM
Added several non-psd kernels for JMySVM and LibSVM as well
as support for returning the original optimization fitness
New operator dialog now shows deprecation information
Feature generation operators now provide a parameter for
the maximum total number of attributes
PerformanceEvaluator: improved handling of input
performances
Made plotters more robust in cases where the given data contain
missing values
The environment variable YALE_OPERATORS_ADDITIONAL is now
respected and set by the start scripts (for user-written
operators)
IOConsume operator now allows deletion type "delete_all_but"
Implementation Details:
the method getInput(Class) of Operator / IOContainer
now delivers the correctly cast instance (casts are
no longer necessary)
checkIO() of Experiment is now also able to check for
given input objects
Removed parameter number editors based on JSpinner
because of rounding and transformation problems (see
below)
Installer now uninstalls old versions
Windows launcher now allows external classpath settings
ExampleSet.getSize() is deprecated now, use size()
instead
ExampleSet.getExampleReader() is deprecated now, use
iterator() instead
Deprecation information is now defined in the
operator description files
Bugfixes:
Fixed bug in Windows start scripts which did not allow
for spaces in filenames and paths
Attribute weighting schemes now provide correct
error messages for missing labels
IOContainer reading and writing did not work
Description of the column separators did not match
the actual implementation of ExampleSource and
SimpleExampleSource
Export did not work for unnamed experiments
Numerical parameter fields rounded to zero for small
values (only in YALE 3.3)
Better error message in case of non-decomposable data
sets in RVMLearners
SOM is no longer applicable to data sets
containing missing values
Version 3.3 introduced a problem: starting YALE
via "java -jar yale.jar" no longer worked
without defining the property yale.home. This
should be fixed now
Additional performance criteria were not stored in XML
Added missing close statements for database handling
and prevented errors if resources were already closed
Fixed bug during statistics calculation if a column
only contains missing values
Exception was thrown by feature binary generators if
the generated value was NaN or infinite
Fixed LibSVM model application bug for high class
skews
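The macro mechanism from this release (MacroDefinition, referenced as %{my_macro}) can be pictured as plain text substitution. The following is a minimal sketch under assumptions: the class MacroExpander is hypothetical, and leaving undefined macros untouched is a choice made here, not documented YALE behaviour. Predefined macros such as %{experiment_name} would simply be additional entries in the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of %{name} macro substitution; not YALE's actual implementation. */
class MacroExpander {

    // Matches %{...} where the macro name contains no closing brace.
    private static final Pattern MACRO = Pattern.compile("%\\{([^}]+)\\}");

    /** Replaces every %{name} with its value; unknown macros are left unchanged (an assumption). */
    static String expand(String text, Map<String, String> macros) {
        Matcher m = MACRO.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = macros.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' or '\' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = new HashMap<>();
        macros.put("experiment_name", "iris_svm");
        macros.put("my_macro", "42");
        System.out.println(expand("results/%{experiment_name}_%{my_macro}.log", macros));
    }
}
```

Running main prints results/iris_svm_42.log; a string containing %{unknown} passes through unchanged.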

New in RapidMiner Studio 3.3 (Jan 18, 2010)

New operators:
Y-AdaBoost
Y-Bagging
MultiCriterionDecisionStumps
RVMLearner
Gaussian Process Learner
ExperimentEmbedder
OperatorEnabler
ExampleSetJoin
Numeric2Binary
Permutation
Removed operators:
JViToPlotter (the most important functionality was added
directly to YALE, the rest will follow)
Deprecated operators:
RenameAttribute (replaced by ChangeAttributeType
and ChangeAttributeName)
YALE is now available as exe-file for Windows systems
YALE now provides a Windows installer
Newest Weka version (CVS from 2006/08/04)
YALE now provides actual ensemble learners for more than
one inner learner
Search and Replace for XML tab
Save as "building block" in order to ease future experiment
setup
All validation operators are now able to optionally produce
the model of the complete data set
Changed log verbosity of command line operator from
MINIMUM to MAXIMUM
Reworked all parameter optimization operators
Double click on operator in tree view now toggles breakpoint
status
Users can now specify a search string and capabilities in the
new operator dialog
New operator dialog is no longer modal and provides an "add"
button. This allows for multiple operator insertions without
recreating the dialog (and its settings)
New operator tree properties allow filtering disabled
operators or expanding the complete tree
Debug mode which adds a breakpoint after each operator
Disabled operators are now more clearly marked
All IO files now have a default file extension
String property values will no longer be deleted when
editing is started, the value will be used after losing
the focus
Added support for automatic parameter optimization of
nominal parameters
Exceptions for feature filter (skip all ... but not ...)
(Meta) data views are now backed up by tables which are
much faster than old HTML views
Added new (high-dimensional) plotters and jitter function
for plotting, reworked old ones
More intelligent availability checks for plotters and
automatic downsampling if number of data points is too
high
Added support for plotting and logging nominal values
and parameters
Data set plotters can now also consider feature weights
Ranges of integer parameters now use the infinity symbol
Total number of parameter combinations is now logged
(parameter optimization operators)
(Almost) all randomized operators can now use their own local
random seeds
The current memory usage can now be logged as a value of
the experiment operator (root)
All internal kernel based methods now provide the same
data and plot view component
Faster conversion to Weka instances for sparse examples
Improved parameter guessing for Weka operators
Improved tutorial and added section about data creation
from Java applications
Implementation Details:
new package structure for feature operators
new package structure for operators
new package structure for GUI
new package structure for preprocessing
now using JUnit 4.1 for testing
code clean-up (no Eclipse-warnings)
ExampleTable is an interface now
copy-resources is no longer necessary,
plugins have to place their resources in
edu/udo/cs/yale/resources
Statistics was renamed to DataTable (in a new package
called datatable)
createName(...) of AttributeFactory now handles separate
counters for each name
prepareRun() is now automatically invoked and no longer
needs to be invoked before the run of an experiment
Bugfixes:
Validation check did not work in all cases
escaped XML characters for attribute description file
writing
JMySVM, EvoSVM, MyKLR, and MultiModel cannot be read
from files (fixed)
Result file was not resolved against experiment
location
Tooltips for string parameters were not always
shown
Streams for result output were not closed
Temporary directories are now deleted at the end of an
experiment if delete_temp_files is set to directly
(default)
Resize bug after changing the name of an operator
in tree view
Fixed problems if two ExampleSetGenerators with the
same target function were used in the same experiment
Removed unnecessary check during loading of sparse
examples
not all operators with inner loops did invoke
inApplyLoop()
Bugfix for IteratingOperatorChain if timeout was -1
Dynamic parameter %t did not work for filenames under
Windows
At the end of a PatternParameterOptimization run, the result was
not properly created
Windows start scripts did not work if spaces were
part of the paths

New in RapidMiner Studio 3.2 (Jan 18, 2010)

YALE now requires Java 1.5 or higher
New operators:
ThresholdCreator
AttributeWeightsCreator
WeightGuidedFeatureSelection
CFSFeatureSetEvaluator
ConsistencyFeatureSetEvaluator
AttributeCounter
WeightedPerformanceCreator
CompleteFeatureGeneration
Series2WindowExamples
TransformedRegression
SimpleExampleSource
PCA (new version)
FastICA
GHA
ComponentWeights
HyperplaneProjection
SplitSVMModel
RemoveCorrelatedFeatures
WeightOptimization
TFIDFFilter
MinimalEntropyPartitioning
EvoSVM
PsoSVM
EvolutionaryFeatureAggregation
PlattScaling
SplitChain
All operator chains now define conditions which must be
fulfilled by inner operators.
New model concept: models which are used for prediction
purposes (prediction models) can now be combined with
models for preprocessing, e.g. a z-transformation model.
This allows for fairer evaluations without using information
about the training data which might have been collected
during preprocessing.
Preprocessing models, e.g. a normalization model, can be applied
to the test set with the same parameters
Improved Operator Info Screen (F1) which now also shows
conditions for inner operators. This eases experiment design
for new users
PerformanceEvaluator adds new criteria to input
performance vectors now
Evolutionary feature operators now support multiobjective
optimization
Feature operators now allow an arbitrary number of
inner operators
Added new VectorGraphics package (freehep) version 1.2.2
New Weka version 3.5.2 (current CVS version of Weka)
The attribute type "string" of Weka is now also supported
Renamed two parameters of SparseFormatExampleSource:
"attributes" is now called "attribute_description_file",
"attribute_file" is now called "data_file"
AUC is now a parameter of PerformanceEvaluator instead of
ThresholdFinder
ExampleSetWriter now resolves the relative path of the
data file
Tutorial now reflects the development since Yale 3.0
More example filter types for ExampleFilter operator
Added filters for Data View
Added parameter sample_ratio to example source operators
Speed up of experiments by preventing IO logging if
not necessary
GUI does not hang any longer after stopping an
experiment and a message is shown that the current
operator will be finalized
all regression performance criteria can now handle nominal
labels regarding the confidence for the desired true class
relative_absolute_error now renamed to
normalized_absolute_error
Implementation Details:
YALE is now completely type safe, i.e. no warnings
occur when compiling with -Xlint:unchecked
Population operators now work on objects of class
Individual instead of directly working on
AttributeWeightedExampleSets
Added method getSpecialAttribute(String) to
ExampleSet interface. This allows a faster retrieval
of special attributes
UndefinedParameterError will be thrown if an
operator asks for the value of a non-optional
parameter with no default value and no user
defined value
The abstract method checkIO of OperatorChain was
replaced by getInnerOperatorCondition()
Removed deprecated method initApply()
added new check method performAdditionalChecks()
reworked package structure for feature operators
improved memory management for BayesianBoosting
Replaced method getValue() of averagables (like
performance criteria) by getMikroAverage().
Operators should use getAverage(), which returns
the macro average if possible and the micro
average otherwise
Bugfixes:
Update of Data View did not properly work
ThresholdApplier did not properly overwrite the
crisp predictions
error in root mean squared error calculation for
data sets with different sizes
wrong plotting of threshold values in ROC curves
new operator was not properly selected after replacing
an operator via the context menu. Therefore the old
parameters were not removed in the GUI
LibSVM used Math.random() and was therefore not
deterministic
Replaced " by &quot; in XML parameter descriptions
In some cases the variance of a performance
criterion became negative. Fixed now.
Bug in RemoveUselessAttributes since attribute stats
were no longer calculated
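The distinction between getMikroAverage() and the macro average returned by getAverage() follows the usual definitions: a macro average is the mean of per-run values, a micro average pools all examples first. The sketch below applies those textbook definitions to accuracy over validation folds; FoldAverages is a hypothetical helper, and YALE's Averagable classes may compute the values differently in detail.

```java
/** Illustrative sketch (not YALE code): micro vs. macro average of accuracy over validation folds. */
class FoldAverages {

    /** Macro average: mean of the per-fold accuracies, so every fold weighs the same. */
    static double macroAverage(int[] correct, int[] total) {
        double sum = 0.0;
        for (int i = 0; i < correct.length; i++) {
            sum += (double) correct[i] / total[i];
        }
        return sum / correct.length;
    }

    /** Micro average: pool all folds, then compute one accuracy, so every example weighs the same. */
    static double microAverage(int[] correct, int[] total) {
        int c = 0;
        int t = 0;
        for (int i = 0; i < correct.length; i++) {
            c += correct[i];
            t += total[i];
        }
        return (double) c / t;
    }

    public static void main(String[] args) {
        int[] correct = {9, 1};
        int[] total = {10, 2};
        System.out.println("macro: " + macroAverage(correct, total));
        System.out.println("micro: " + microAverage(correct, total));
    }
}
```

With folds of 10 and 2 examples (9 and 1 correct), the macro average is (0.9 + 0.5) / 2 = 0.7 while the micro average is 10 / 12 ≈ 0.83, which shows how a small fold weighs more heavily in the macro average.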

New in RapidMiner Studio 3.1 (Jan 18, 2010)

New operators:
IOMultiplier
PerformanceLoader
T-Test
Anova
DataStatistics (useful only for command line, see
implementation details)
Removed operators:
old parameter based Weka operators (were deprecated)
MultipleLabelLearner and
MultipleLabelPerformanceEvaluator (please use
MultipleLabelIterator instead)
Drastically reduced runtime (see implementation details)
Improved attribute editor (added views on data,
load series data, icons, nicer error messages)
Binary classification performance criteria mark the
positive class
Predict confidences for both binominal and polynominal
classification tasks
Confidences are now automatically set after applying
a classification model for all learners, the parameter
use_distribution is therefore no longer supported
ExampleSetWriter can also write prediction confidences
now. The dense data format and the special format was
slightly adapted to reflect this change
Attribute ranges can also be specified in meta data view
Split the default noise of NoiseOperator into label_noise
and default_attribute_noise
New Weka version 3.4.6 integrated
Nicer error messages for many data reading problems
IteratingPerformanceAverage can now handle all types of
averagable vectors and also more than one inner
performance vector
The Yale color plotter now shows a legend with a mapping
of the colors to the values for these colors. This also
applies to the scatter plot based on the color plotter
Sanity checks before learning if the used learner can
learn from the given data set (using the predefined
learner capabilities)
Uses (p) for initialization probability of feature
selection algorithms instead of (1-p)
The counter for the automatic creation of attribute names
is reset before an experiment is started
A new breakpoint type for breakpoints in operator apply
loops
CSVExampleSource now uses the first line for attribute
names
Implementation Details:
The position of the Weka Jar file can now be defined
via an environment variable WEKA_JAR
Removed the construction of attribute weights for
examples if this is not necessary (this drastically
decreases the time required for constructing examples)
Improved the calculation of example set statistics
Removed the recalculation of attribute statistics
after data changes. Statistics are now only calculated
if they are needed (including display purposes in the
graphical user interface)
Attribute is an interface now, different classes of
attributes were introduced. As a consequence, attributes
can only be constructed with help of the
AttributeFactory class
Added a FastExample2SparseTransform class which
provides methods for fast sparse representation
creation, especially for SparseArrayDataRows
Removed check if an attribute is already part of an
example set before it is added. This also improves
runtime
FilteredExampleSet is now called ConditionedExampleSet
A failure during operator initialization (at start-up)
no longer prevents the following operators from
being loaded
Bugfixes:
Bugs in SparseArrayDataRow
Copy of IOContainer was shallow. This bug might have led
to a wrong parameter optimization behavior for complex
feature selection experiments
Implemented missing method in ConditionedExampleSet
Fixed size bug in ConditionedExampleSet
Key strokes for cut, copy, and paste did not work
Syntax highlighting for description tag did not work
Opening a new experiment now kills the experiment thread
Saving of settings did not always work
Changing from XML view to other views caused empty
status bar
Error in change detection after modifying the
experiment in XML view
Range update in data view did not work for two changes
at the same time

New in RapidMiner Studio 3.0 (Jan 18, 2010)

New operators:
FeatureNameFilter (using regular expressions)
FeatureValueTypeFilter (replaces FeatureTypeFilter)
FeatureBlockTypeFilter
operators for all Weka tasks instead of specifying
the Weka operator with a parameter (see below)
MultipleLabelLearning
MultipleLabelPerformanceEvaluator
MultipleLabelIterator
AverageBuilder
RenameAttribute (renaming and type changing)
Data generators for testing purposes
MinMaxWrapper for linear combinations of average and
minimum values (which might lead to more stable
optimizations)
CorrelationMatrix (which can also produce feature
weights)
SimpleBinDiscretization
SimpleFrequencyDiscretization
Single2Series
PerformanceWriter (in addition to the ResultWriter)
ParameterCloner
ParameterSetWriter
GridParameterOptimization (replaces old ParameterOpt.)
NelderMeadParameterOptimization
PatternParameterOptimization
ParameterIteration (which simply iterates through
given parameter combinations instead of optimizing them)
IOConsumer (consumes unused outputs)
ARFFWriter
WrapperXValidation (replaces old MethodXValidation)
SimpleWrapperValidation (replaces old
SimpleMethodValidation)
NominalExampleSetGenerator
JViToPlotter (in addition to the built-in plotters)
Removed operators:
The external operators for the C versions of MySVM,
SVMLight, and C45 are no longer part of the Yale
core. Please use the Java implementations JMySVM,
LibSVM, and J48
LegalNumberExampleFilter was replaced by the
operator ExampleFilter. This operator can handle both
missing values and user defined value conditions
MethodXValidation was replaced by WrapperXValidation.
The old operator was not able to handle pure feature
weighting methods in addition to selection
ParameterOptimization (see above). In addition, the
parameter parameter_file was removed from all
parameter optimization operators
SimpleMethodValidation (see above)
FeatureTypeFilter was replaced by an improved
FeatureValueTypeFilter
BatchedValidationChain
Improved data management and statistics. Yale can handle
larger data sets now
Undo and Redo function
Several new performance criteria including MinMaxCriterion
for weighted linear combinations of the minimum and
the average of arbitrary criteria
Some operators are deprecated now. Deprecated operators
provide messages during application and validation and
should no longer be used
New plotter concept, introducing Yale color plotter,
GnuPlotPlotter for 3D plots, scatter plots, and
distribution plotter (histograms). Plots are only
automatically created for smaller data sets (settings)
In addition to the new plotter concept the operator
JViToPlotter can be used to plot some of the IOObjects
of Yale. The current version at least supports ExampleSet
and some numerical models
Syntax highlighting in message viewer and XML editor,
colors can be specified in the preferences dialog
New Weka version 3.4.5 integrated
New LibSVM version 2.8 integrated
Generic operator classes and operator sub types. This
allows the building of generic operators with one class
for several operators. This feature is used for the new
Weka operator style where each learning scheme matches
one Yale operator (and not a parameter of an operator)
Added Learner Capabilities. Each learning scheme can now
define which type of data set is supported by the learner
Added stratified sampling for cross validation on data
with a categorical label. This ensures that the subsets
provide the same class distribution as the whole data
set
Added several additional selection and crossover schemes
for evolutionary feature operators.
Learners and performance evaluators can now deliver
the input example set as output if this is desired.
This also applies for models and ModelApplier.
New structure of settings dialog
(Optional) Tip of the Day at startup
Automatic update check during start-up (once a month,
no personal data is transmitted or collected).
Command line version waits at breakpoints and can be
resumed by pressing enter
Only a user-defined number of lines is logged,
the default is 1000. This value can be changed in the
settings dialog
Since massive logging may slow down experiments the
default log verbosity for new experiments is "init"
Removed some verbosity levels which were not frequently
used
Plugins can also provide a GenericOperatorFactory in their
operator description file which can be used to register
additional generic operators
Improved operator group structure in GUI and package
structure
Improved Javadoc documentation, at least all classes
should have a class comment
Learners can no longer write the model directly into a
file. Please use the operator ModelWriter for this
purpose.
Implementation details:
ATTENTION: Since operators should know their own
operator description, the empty operator
constructor may no longer be used. Operators must be
created with
OperatorService.createOperator(String name)
Empty operator constructors are no longer
allowed for operator creation!
Using Arff loader from Weka instead of KDB package
Changed the method name getIdAttribute() to getId()
in ExampleSet, some methods from Example were removed
Added a copy method to Parameters
It is now possible to query examples by their id
It is also possible to query examples by their index.
This is only recommended for memory based example
tables and should not be used for iteration purposes.
Each operator which must iterate through complete
example sets should use ExampleReaders. However,
this change allows Yale to construct Weka instances
on the fly which drastically decreases memory usage
Operators can now define the default behavior for
input consumption and a parameter will be
automatically defined and queried. This allows
some operators (like validation chains or performance
evaluators) to pass their input (the example set
for example) to the following operators
Added two helper methods getDeliveredOutputClasses()
and getAllOutputClasses(Class[] innerOutput). One
of these methods should be used to return the
delivered output of an operator chain at the end of
checkIO(). These methods reflect the consumption
behaviour changes. Please refer to the Yale tutorial
for further information.
The implementation of the simple feature selection
operators was improved. The memory usage is reduced
especially in case of forward selection
SparseArrayDataRows need less memory than
SparseMapDataRows with the same runtime. This
data management type should be used if the data is
sparser than 50%
Using sparse array data rows after Nominal2Binary
filtering
Bugfixes:
bug in unix start scripts (plugins were not properly
loaded)
variance adaptation in feature weighting
wrong conversion from Weka instances to Yale example
sets for data sets with more attributes than examples
Bug in average handling of validation operators mixed
up weights and performance values for some feature
operation experiments
strange plotting of some example sets
validation of experiments containing disabled
operators
fixed bug in database handling which prevented feature
selection from working correctly on example sets based on
databases (csv and dBase too)
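Stratified sampling for cross validation, added in this release, keeps the class distribution of each subset close to that of the whole data set. A minimal Java sketch follows; StratifiedFolds is a hypothetical name, and grouping by label, shuffling each class with a seeded Random and dealing round-robin is one common way to stratify, not necessarily the one YALE uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration (not YALE code): stratified fold assignment for a nominal label. */
class StratifiedFolds {

    /** Returns the fold (0..k-1) of each example; every fold keeps roughly the overall class distribution. */
    static int[] assign(String[] labels, int k, long seed) {
        // Group example indices by label; LinkedHashMap keeps classes in first-seen order.
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], key -> new ArrayList<>()).add(i);
        }
        Random random = new Random(seed);
        int[] fold = new int[labels.length];
        int next = 0;
        for (List<Integer> members : byClass.values()) {
            // Shuffle within each class so fold membership is random but class counts stay balanced.
            Collections.shuffle(members, random);
            for (int index : members) {
                // Continuing the counter across classes also keeps total fold sizes balanced.
                fold[index] = next % k;
                next++;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no"};
        System.out.println(Arrays.toString(assign(labels, 3, 7L)));
    }
}
```

For six "yes" and three "no" examples split into three folds, each fold receives two "yes" and one "no", mirroring the 2:1 ratio of the whole data set.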

New in RapidMiner Studio 2.4.1 (Jan 18, 2010)

New in RapidMiner Studio 2.4 (Jan 18, 2010)

New operators:
LearningCurveOperator
StandardDeviationWeighting
PrincipalComponents
WekaAttributeWeighting
C45ExampleSource
Obfuscator
Deobfuscator
CorpusBasedWeighting
Removed operators: UPGMAClusterer and WekaClusterer are
now part of the Clustering plugin
Changed operators: the former implementation of
DecisionTreeLearner was removed since it was not able to
produce pruned decision trees. The internal representation
of Weka's J48 learner which was formerly known as Y45Learner
is now named DecisionTreeLearner.
Split the KDBExampleSource operator into four operators
which individually load ARFF, csv, bibtex, and dBase
files.
The parameter "mean_variance_scaling" of the
normalization operator is no longer of type category but
of type boolean.
The parameter $v[name] of the special format of
ExampleSetWriter can now be used for both regular and
special attributes
accuracy and classification error are now calculated for
both binary and multiclass problems. Additionally the
confusion matrix is displayed.
ThresholdFinder can deliver AUC (area under the ROC curve).
The maximum number of ROC-points which are plotted is
limited to 200.
All results are presented with the same number of digits
Forward selection (FeatureSelectionOperator) initially
checks whether the attribute to be used is useless, i.e. all its
values are equal, before it creates a new example set based on
this attribute.
Validation chains which split example sets recalculate
the attribute statistics. Therefore one data scan is
performed for each iteration. These additional costs are
accepted to clarify the values and to ease the use of inner
operators which make use of the statistics
Implementation details:
recalculation of attribute statistics can be done
directly with a method from example set now instead
of the example table
The method initApply() of operator is now deprecated
The method getSpecialAttribute(String) of ExampleSet
was removed. Use getAttribute(String) for both regular
and special attributes.
Bugfixes:
data writing of the experiment log operator at the end
of the experiment
statistics plot is removed at the beginning of a new
experiment
Y45Learner (now named DecisionTreeLearner) did not
allow to create unpruned trees
newline at the end of data files can now be omitted

New in RapidMiner Studio 2.3.3 (Jan 18, 2010)

New in RapidMiner Studio 2.3.2 (Jan 18, 2010)

New in RapidMiner Studio 2.3.1 (Jan 18, 2010)

New in RapidMiner Studio 2.3 (Jan 18, 2010)

New in RapidMiner Studio 2.2 (Jan 18, 2010)

New in RapidMiner Studio 2.1.1 (Jan 18, 2010)

New in RapidMiner Studio 2.1 (Jan 18, 2010)

New in RapidMiner Studio 2.0.3 (Jan 18, 2010)

New in RapidMiner Studio 2.0.2 (Jan 18, 2010)

New in RapidMiner Studio 2.0.1 (Jan 18, 2010)

New in RapidMiner Studio 2.0 (Jan 18, 2010)

New in RapidMiner Studio 2.0 Beta 2 (Jan 18, 2010)

Operators for concept drift simulation experiments and
several time window management and example weighting
approaches were added (see operators "ConceptDriftSimulator",
"BatchWindowLearner", "BatchWeightLearner").
Renamed global experiment parameter 'keep_temp_files' to
'delete_temp_files'.
LegalNumberExampleFilter replaced by more general operator
ExampleFilter. By implementing ConditionExampleReader.Condition
users can specify arbitrary conditions.
SparseFormatExampleSource: New parameter "attributes" allows
for an attribute description file similar to the ExampleSource.
If the old behaviour is desired, the parameter "format" must be
set to "separate_file".
DatabaseExampleSource: Separate query file replaced by new
parameters ("username", "databasename", ...). In case of long
queries the query (and only the query) can still be read from a
separate file (still specified by "query_file"). If the password
should not be written to the config file, it is queried when
needed.
Yale can now directly work on databases without copying the data
to memory first (alpha version!!!). If this behavior is desired,
the parameter "work_on_databases" must be set to true and the
parameter "table_name" must be the name of an existing table. Be
careful with this option since it will change the database.
FeatureGeneration: New parameter list "functions" allows
specification of attribute generation and selection in config
file. Was formerly specified in separate file (still working).
ExampleSetWriter: Output in sparse format and arbitrary user
defined format now possible.
PerformanceEvaluator: "comparator_class" allows for user defined
comparators of performance measures
Performance criteria measure micro and macro average and variance.
Special "id" attribute now supported (<id> tag in attribute
description files).
Memory of unused attributes (e.g. intermediately generated
attributes, predicted labels in crossvalidations) freed.
Weka models can now be displayed graphically.
Implementation details:
JUnit tests added
UserError introduced and exception handling improved.
Refactoring eases extensibility for user defined
custom operators.
Tutorial operator description automatically generated
from the JavaDoc comments in the operator source code
and the operator self description.