Workflows

Showing 294 results.

Workflow Using Remember / Recall for "tunneling" re... (1)

This process shows how the Remember and Recall operators can be used to pass results from one position in the process to another when it is impossible to make a direct connection. This process introduces another advanced RapidMiner technique: macro handling. We have used the predefined macro a, accessed by %{a}, which gives the apply count of the operator. So we are remembering each application of the models that are generated in the learning subprocess of the Split Validation. Af...
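The Remember/Recall idea above can be sketched in plain Python — a shared store keyed by name plus an apply count standing in for the %{a} macro. This is an illustrative analogue, not the RapidMiner implementation; all names are hypothetical.

```python
# Hypothetical sketch of Remember/Recall: a shared store keyed by name plus
# an apply count, which plays the role of RapidMiner's %{a} macro.
store = {}
apply_count = 0

def remember(name, result):
    """Stash a result under a key that includes the current apply count."""
    global apply_count
    apply_count += 1
    store[f"{name}_{apply_count}"] = result

def recall(name, count):
    """Fetch a result remembered during an earlier application."""
    return store[f"{name}_{count}"]

# Each 'training' iteration remembers its model; a later stage recalls one.
for fold in range(3):
    remember("model", {"fold": fold, "weights": [fold, fold + 1]})

print(recall("model", 2))  # the model remembered on the second application
```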

Created: 2010-04-29 | Last updated: 2012-01-16

Workflow Operator testing workflow (1)

This workflow is used for operator testing. It joins dataset metafeatures with execution times and performance measures of the selected recommendation operator. In the Extract train and Extract test Execute Process operators, the user should open the Metafeature extraction workflow. In the Loop operator, train/test data are used to evaluate the performance of the selected operator. The result is remembered and joined with the timing and metafeature information. This workflow can be used both for Item Recommend...

Created: 2012-01-29

Credits: User Matej Mihelčić User Matko Bošnjak

Workflow Model saving workflow (RP) (1)

This workflow trains and saves a model for a selected rating prediction operator.

Created: 2012-01-29 | Last updated: 2012-01-30

Credits: User Matej Mihelčić

Workflow Model testing workflow (RP) (1)

This workflow measures the performance of three models: a model learned on the train data and upgraded using online model updates; a model learned on the train data plus all query update sets; and a model learned on the train data only.

Created: 2012-01-29 | Last updated: 2012-01-30

Credits: User Matej Mihelčić

Workflow Connect to twitter and analyze the key words (1)

Hi all, this workflow connects RapidMiner to Twitter and downloads the timeline. It then creates a wordlist from the tweets and breaks them into the key words that are mentioned in the tweets. You can then visualize the key words mentioned in the tweets. This workflow can be further modified to review various key events that have been talked about in the twitterland. Do let me know your feedback and feel free to ask me any questions that you may have. Shaily web: http://advanced-analyti...

Created: 2010-07-26 | Last updated: 2010-07-26

Workflow Looping over Examples for doing de-aggrega... (1)

This process is based on (artificially generated) data that looks like it has been aggregated before. The integer attribute Qty specifies the quantity of the given item that is represented by the rest of the example. The process loops over every example and performs, on each example, another loop that appends the current example to a new example set. This example set has been created as an empty copy of the original example set, so that the attributes are equal. To get access to and rem...
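The de-aggregation loop described above can be sketched in Python: each row carries a Qty count, and we emit that many copies of the row into a new, initially empty set with the same attributes. The data and attribute names here are illustrative, not from the workflow.

```python
# Minimal sketch of de-aggregation: expand each row Qty times (with Qty
# reset to 1) into a new example set that is an empty copy of the original.
aggregated = [
    {"Item": "apple", "Qty": 3},
    {"Item": "pear", "Qty": 2},
]

deaggregated = []  # empty copy of the original structure
for example in aggregated:
    for _ in range(example["Qty"]):
        row = dict(example)
        row["Qty"] = 1
        deaggregated.append(row)

print(len(deaggregated))  # 5 rows: three apples and two pears
```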

Created: 2010-04-29

Workflow Iterate through datasets (1)

This is a dataset iteration workflow. It is part of the Experimentation workflow for the Recommender extension. The Loop Files operator iterates through datasets from a specified directory using the Read AML operator. Only datasets matching a proper regular expression are considered. Train and test data filenames must correspond, e.g. (train1.aml, test1.aml). In each iteration, Loop Files calls the specified operator testing workflow with the Execute Subprocess operator. Information about training and t...

Created: 2012-01-29

Credits: User Matej Mihelčić User Matko Bošnjak

Workflow Metafeature extraction (1)

This is a metafeature extraction workflow used in the Experimentation workflow for Recommender extension operators. This workflow extracts metadata from the train/test datasets (user/item counts, rating count, sparsity, etc.). This workflow is called from the operator testing workflow using the Execute Process operator.

Created: 2012-01-29 | Last updated: 2012-01-30

Credits: User Matko Bošnjak

Workflow Data iteration workflow (RP) (1)

This is a data iteration workflow used to iterate through query update sets.

Created: 2012-01-29

Credits: User Matej Mihelčić User Matko Bošnjak

Workflow Transforming user/item description dataset... (1)

This workflow transforms a user/item description attribute set into the format required by the attribute-based k-NN operators of the Recommender extension. See: http://zel.irb.hr/wiki/lib/exe/fetch.php?media=del:projects:elico:recsys_manual_v1.1.pdf to learn about the dataset formats required by the Recommender extension.

Created: 2012-01-30

Workflow Random recommender (1)

This process performs a random item recommendation: for a given item ID, it randomly recommends a desired number of items from the example set of items. The purpose of this workflow is to produce a random-recommendation baseline for comparison with different recommendation solutions on different retrieval measures. The inputs to the process are context-defined macros: %{id} defines an item ID for which we would like to obtain recommendations and %{recommender_no} defines the required number of ...
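The random baseline above can be sketched in Python, with an item id and a desired count standing in for the %{id} and %{recommender_no} macros. Names and data are illustrative.

```python
import random

# Sketch of a random-recommendation baseline: for a given item id, sample
# a desired number of other items uniformly at random.
def random_recommend(items, item_id, n, seed=None):
    candidates = [i for i in items if i != item_id]
    rng = random.Random(seed)  # seeded for reproducible experiments
    return rng.sample(candidates, n)

items = list(range(10))
recs = random_recommend(items, item_id=3, n=4, seed=42)
print(recs)  # four random items, never including item 3 itself
```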

Created: 2011-03-15 | Last updated: 2011-03-15

Workflow RCOMM 2011 Challenge 2: Vodka or President? (1)

This is a solution for Challenge 2 of a live data mining process design competition, "Who Wants to be a Data Miner?", held at RCOMM 2011 in Dublin. Those of you who loved "You Don't Know Jack" will remember this task: to tell whether a certain word is the name of a vodka or the name of a leader of the Soviet Union. The RapidMiner process was allowed to download data from Wikipedia to make this decision. One input file contains a list of words for which two attributes "Vodka" or "Leader" wi...

Created: 2011-11-02

Workflow RCOMM Challenge 3: Fibonacci Numbers (Inte... (1)

At RCOMM 2010 (www.rcomm2010.org), an unusual competition was held. Titled "Who Wants to Be a Data Miner?", three challenges were issued to the participants of the conference. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the original solution I had in mind for Challenge 2: "Fibonacci Numbers". It defines a macro n, recurses by applying itself using an "Embed Process" operator on n-1 and n-2, appends the results (so the length is F(n-1)...
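The recursive trick described above — appending the results for n-1 and n-2 so the row count itself is F(n) — can be sketched in Python:

```python
# Sketch of the recursion: the result for n is the concatenation of the
# results for n-1 and n-2, so len(fib_rows(n)) equals the n-th Fibonacci
# number, with F(0)=0 and F(1)=1.
def fib_rows(n):
    if n == 0:
        return []
    if n == 1:
        return [1]
    return fib_rows(n - 1) + fib_rows(n - 2)

print(len(fib_rows(10)))  # 55 = F(10)
```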

Created: 2010-09-17 | Last updated: 2010-09-17

Workflow Change Class Distribution of Your Training... (1)

This example process shows how to change the class distribution of your training data set (in this case the training data is whatever comes out of the "myData reader"). The given training set has a distribution of 10 "Iris-setosa" examples, 40 "Iris-versicolor" examples and 50 "Iris-virginica" examples. The aim is to get a data set with a different class distribution for the label, let's say 10 "Iris-setosa", 20 "Iris-versicolor" and 20 "Iris-virginica". Beware that this may change some propert...
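The downsampling step can be sketched in Python: keep examples of each label only up to a per-class target count. This is a simplified analogue of the process, with the label names and targets taken from the description above.

```python
# Sketch of changing the class distribution by downsampling each label
# to a target count.
def resample(examples, targets):
    out, seen = [], {}
    for ex in examples:
        label = ex["label"]
        if seen.get(label, 0) < targets[label]:
            seen[label] = seen.get(label, 0) + 1
            out.append(ex)
    return out

data = ([{"label": "Iris-setosa"}] * 10
        + [{"label": "Iris-versicolor"}] * 40
        + [{"label": "Iris-virginica"}] * 50)
targets = {"Iris-setosa": 10, "Iris-versicolor": 20, "Iris-virginica": 20}
sample = resample(data, targets)
print(len(sample))  # 50 examples with the 10/20/20 distribution
```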

Created: 2011-01-21 | Last updated: 2011-01-21

Uploader: Ch Juk

Workflow kddcup98 direct marketing (1)

RapidMiner supports meta learning by embedding one or several basic learners as children into a parent meta learning operator. In this example we generate a data set with the ExampleSetGenerator operator and apply an improved version of Stacking on this data set. The Stacking operator contains four inner operators; the first one is the learner which should learn the stacked model from the predictions of the other three child operators (base learners). Other meta learning schemes like Boosting ...

Created: 2012-03-15

Workflow RCOMM Challenge 3: Fibonacci Numbers (Impr... (1)

At RCOMM 2010 (www.rcomm2010.org), an unusual competition was held. Titled "Who Wants to Be a Data Miner?", three challenges were issued to the participants of the conference. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the winning process of Challenge 2, "Fibonacci Numbers", by Matko Bošnjak. This was the task: the n-th Fibonacci number is F(n)=F(n-1)+F(n-2), with F(0)=0 and F(1)=1. Create a process that creates an example set with F(n)...

Created: 2010-09-17 | Last updated: 2010-09-17

Workflow 2. Getting Started: Retrieve and Apply a M... (1)

This getting-started process demonstrates how to load (retrieve) a model from the repository and apply it to a data set. The result is a data set (at the lab output for "labeled data") which has a new "prediction" attribute that indicates the prediction for each example (i.e. row/record). You will need to adjust the path of the Retrieve operator to the actual location where the model was stored by a previous execution of the "1. Getting Started: Learn and Store a...

Created: 2011-01-17 | Last updated: 2011-01-19

Workflow Content based recommender system template (1)

As input, this workflow takes two distinct example sets: a complete set of items with IDs and appropriate textual attributes (the item example set) and a set of IDs of items our user had interaction with (the user example set). Also, a macro %{recommendation_no} is defined in the process context as the required number of outputted recommendations. The first steps of the workflow preprocess those example sets: select only the textual attributes of the item example set, and set ID roles on both of th...

Created: 2011-05-05 | Last updated: 2011-05-09

Credits: User Matko Bošnjak User Ninoaf

Attributions: Datasets for the pack: RCOMM2011 recommender systems workflow templates

Workflow Crossvalidation with SVM (1)

Performs a cross-validation on a given data set with a nominal label, using a Support Vector Machine as the learning algorithm. Inside the cross-validation, the first subprocess generates an SVM model, and the second subprocess evaluates it by applying it to a so-far unused subset of the data and counting the misclassifications.
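The cross-validation structure — train on k-1 folds, evaluate on the held-out fold — can be sketched in plain Python (the split logic only; the SVM itself is out of scope here):

```python
# Minimal k-fold split sketch: every example lands in exactly one test fold,
# and the corresponding training set is everything else.
def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

n, k = 10, 5
splits = list(kfold_indices(n, k))
print(len(splits))           # 5 train/test pairs
print(sorted(splits[0][1]))  # the first held-out fold
```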

Created: 2010-04-29

Workflow Item to item similarity matrix -based reco... (1)

This process executes a recommendation based on an item-to-item similarity matrix. The inputs to the process are context-defined macros: %{id} defines an item ID for which we would like to obtain recommendations and %{recommender_no} defines the required number of recommendations. The process internally uses an item-to-item similarity matrix written in pairwise form (id1, id2, similarity). The process essentially filters out appearances of the required ID in both of the columns of the pairwis...
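The pairwise-matrix lookup can be sketched in Python: keep rows where the query id appears in either column, take the other id, and rank by similarity. The data here is illustrative.

```python
# Sketch of the pairwise-form recommender: (id1, id2, similarity) triples.
pairs = [
    ("A", "B", 0.9), ("A", "C", 0.4), ("B", "C", 0.7), ("C", "D", 0.8),
]

def recommend(pairs, item_id, n):
    scored = []
    for id1, id2, sim in pairs:
        if id1 == item_id:
            scored.append((sim, id2))
        elif id2 == item_id:
            scored.append((sim, id1))
    scored.sort(reverse=True)          # highest similarity first
    return [item for _, item in scored[:n]]

print(recommend(pairs, "C", 2))  # the two items most similar to C
```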

Created: 2011-03-15 | Last updated: 2011-03-15

Workflow Collaborative filtering recommender (1)

This process executes a collaborative filtering recommender based on a user-to-item score matrix. This recommender predicts a user's score on some of his non-scored items based on similarity with other users. The inputs to the process are context-defined macros: %{id} defines an item ID for which we would like to obtain recommendations, %{recommender_no} defines the required number of recommendations, and %{number_of_neighbors} defines the number of the most similar users taken into a...

Created: 2011-03-15 | Last updated: 2012-03-06

Workflow Przykład metody Stacking (1)

The following workflow shows the use of the Stacking operator to create meta-classifiers. The Stacking operator allows nesting an arbitrary number of base models, which are trained in parallel on the training set. The second nested operator is the classifier model, which learns from the responses of the base models (i.e. it builds a model of the models' responses). In the example, the base models used are a decision tree, the k-NN algorithm, a neural netw...

Created: 2011-05-25 | Last updated: 2011-05-25

Workflow Iterate over Attribute Subsets and Store A... (1)

This process iterates over all possible feature subsets and stores a) the names of all attribute subsets, b) the number of used features, and c) the achieved performance in a log table which can then be further analyzed.

Created: 2011-07-07

Workflow Semantic clustering (with k-medoids) of SP... (1)

The workflow uses the RapidMiner extension RMonto (http://semantic.cs.put.poznan.pl/RMonto/) to perform clustering of SPARQL query results based on a chosen semantic similarity measure. Since the semantics of the background ontology is used in this way, we use the name "semantic clustering". The SPARQL query is entered in a parameter of the "SPARQL selector" operator. The clustering operator (k-medoids) allows specifying which of the query variables are to be used as clustering criteria. If more ...

Created: 2012-01-29

Workflow Semantic clustering (with AHC) of SPARQL q... (1)

The workflow uses the RapidMiner extension RMonto (http://semantic.cs.put.poznan.pl/RMonto/) to perform clustering of SPARQL query results based on a chosen semantic similarity measure. The measure used in this particular workflow is a kernel that exploits the membership of clustered individuals in OWL classes from a background ontology (the "Common classes" kernel from [1]). Since the semantics of the background ontology is used in this way, we use the name "semantic clustering". ...

Created: 2012-01-29 | Last updated: 2012-01-29

Workflow Semantic clustering (with alpha-clustering... (1)

The workflow uses the RapidMiner extension RMonto (http://semantic.cs.put.poznan.pl/RMonto/) to perform clustering of SPARQL query results based on a chosen semantic similarity measure. The measure used in this particular workflow is a kernel that exploits the membership of clustered individuals in OWL classes from a background ontology (the "Epistemic" kernel from [1]). Since the semantics of the background ontology is used in this way, we use the name "semantic clustering". This ...

Created: 2012-01-29 | Last updated: 2012-01-30

Workflow Loading OWL files (RDF version of videolec... (1)

The workflow uses the RapidMiner extension RMonto (http://semantic.cs.put.poznan.pl/RMonto/). The "Build knowledge base" operator is responsible for collecting data from OWL files, SPARQL endpoints, or RDF repositories and providing it to the subsequent operators in a workflow. In this workflow it is parameterized so that it builds a Sesame/OWLIM repository from the files specified in the "Load file" operators. Paths to OWL files are specified as parameter va...

Created: 2012-01-29 | Last updated: 2012-01-29

Workflow RCOMM Challenge 1: 99 bottles of beer (1)

At RCOMM 2010 (www.rcomm2010.org), an unusual competition was held. Titled "Who Wants to Be a Data Miner?", three challenges were issued to the participants of the conference. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the winning process of Challenge 1, "99 bottles of beer", by Sebastian Land. This was the task: design a process that produces an example set whose rows form the lyrics of the well-known song "99 bottles of beer...
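The task can be sketched in Python — generate the verses as rows of an example set. This sketch ignores the singular "bottle" edge case at the end of the song for brevity.

```python
# Sketch of the challenge: build the 99 rows of "99 bottles of beer"
# (simplified: no singular/plural handling).
def bottles(n=99):
    rows = []
    for i in range(n, 0, -1):
        rows.append(f"{i} bottles of beer on the wall, {i} bottles of beer. "
                    f"Take one down and pass it around, "
                    f"{i - 1} bottles of beer on the wall.")
    return rows

lyrics = bottles()
print(len(lyrics))  # 99 rows, one verse per row
```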

Created: 2010-09-17

Workflow Extended Operations for Nominal Values (1)

This process shows examples of the extended operations for nominal values coming with one of the next RapidMiner updates (5.0.011 or 5.1.000). The operations are performed with the operator "Generate Attributes" and can be used directly within the expressions for the new attributes. The supported functions include Number to String [str(x)], String to Number [parse(text)], Substring [cut(text, start, length)], Concatenation [concat(text1, text2, text3...)], Replace [replace(text, what, by)],...

Created: 2010-10-05

Workflow Content based recommender (1)

This process is a special case of the item-to-item similarity matrix based recommender, where the item-to-item similarity is calculated as cosine similarity over TF-IDF word vectors obtained from textual analysis of all the available textual data. The inputs to the process are context-defined macros: %{id} defines an item ID for which we would like to obtain recommendations and %{recommender_no} defines the required number of recommendations. The process internally uses an example set of...
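The cosine-similarity step can be sketched in Python over small word-vector dicts standing in for TF-IDF vectors; the documents and weights are illustrative.

```python
import math

# Sketch of cosine similarity over sparse word vectors (dicts), standing in
# for the TF-IDF vectors produced by the text-processing steps.
def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vectors = {
    "doc1": {"data": 1.0, "mining": 0.5},
    "doc2": {"data": 0.8, "mining": 0.6},
    "doc3": {"sports": 1.0},
}

query = "doc1"
ranked = sorted((cosine(vectors[query], vectors[k]), k)
                for k in vectors if k != query)
print(ranked[-1][1])  # the item most similar to doc1
```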

Created: 2011-03-15 | Last updated: 2011-03-15

Workflow Semantic clustering with k-Medoids and ALC... (1)

This workflow loads data from a configuration file for DL-Learner (http://dl-learner.org) and uses the ALCN Semantic Kernel [1] to cluster those data with the k-Medoids algorithm. [1] N. Fanizzi, C. d'Amato, F. Esposito. Learning with Kernels in Description Logics. ILP 2008

Created: 2012-05-30 | Last updated: 2012-06-07

Workflow Evaluating Multiple Models with Looped X-V... (1)

This process shows how multiple different models can be evaluated with cross-validation runs. This allows for the comparison of, for example, three prediction model types (e.g. ANN, SVM, and DT) all at once under an x-fold cross-validation. The process actually performs the same cross-validation several times, using a different modeling scheme each time. It makes use of loops, collections, subprocess selection and macros and is therefore also an interesting showcase for more ...

Created: 2010-05-15

Workflow Pivoting with Consideration of Example Wei... (1)

This process shows how the Pivoting operator can also consider example weights encoded by a special attribute with the role 'weight'. The first operators only create a demo data set for this purpose; the actual pivoting is performed by the last operator only. I recommend placing a breakpoint before the Pivoting operator and checking the data (structure) in order to see how the Pivoting handles the weights.

Created: 2010-07-07

Uploader: Gb Awc

Workflow Find and replace missing values with other... (1)

Sometimes it is helpful to replace a missing value in a numerical attribute with a value from another attribute in the same example row, rather than using a constant such as an average or maximum value. This process generates some missing values, detects them, and replaces them with a calculation based on the value of another attribute. (Purists will be alarmed at the mathematical impossibility of taking the square root of a negative number.) The process detects missing values by subtracti...
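The replacement step can be sketched in Python: where attribute 'a' is missing (None stands in for a missing value), fill it with a value computed from another attribute in the same row. Attribute names and the square-root formula are illustrative, matching the description's example.

```python
import math

# Sketch: fill a missing value from a calculation on another attribute
# in the same example row, instead of a constant like the mean.
data = [{"a": None, "b": 16.0}, {"a": 2.0, "b": 9.0}]

for row in data:
    if row["a"] is None:
        row["a"] = math.sqrt(row["b"])  # derived from the other attribute

print(data[0]["a"])  # 4.0, computed from b = 16.0
```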

Created: 2010-08-25 | Last updated: 2010-08-25

Workflow Cross tabulation via aggregation and pivoting (1)

Creates a contingency table using the Aggregate and Pivot operators.
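The aggregate-then-pivot idea can be sketched in Python: count (row, column) value pairs, then lay the counts out as a contingency table. The data is illustrative.

```python
from collections import Counter

# Sketch of cross tabulation: aggregate pair counts, then pivot them into
# a nested dict acting as the contingency table.
records = [("yes", "sunny"), ("yes", "rain"), ("no", "sunny"), ("yes", "sunny")]
counts = Counter(records)

rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})
table = {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}
print(table["yes"]["sunny"])  # 2
```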

Created: 2010-08-25 | Last updated: 2010-08-25

Workflow Constructing user defined linear regressio... (1)

This process shows how one can construct a user-defined linear regression model using the Execute Script operator.

Created: 2010-09-02 | Last updated: 2010-09-02

Workflow 111 (1)

This process shows how several different classifiers could be graphically compared by means of multiple ROC curves.

Created: 2010-09-04 | Last updated: 2010-09-04

Workflow 222 (1)

This process shows how several different classifiers could be graphically compared by means of multiple ROC curves.

Created: 2010-09-04

Workflow 11111 (1)

Reads collections of text from a set of directories, assigning each directory to a class (as specified by parameter text_directories), and transforms them into a TF-IDF or other word vector. Finally, an SVM is applied to model the input texts.

Created: 2010-09-04

Workflow Defining positive class with Remap Binominal (1)

This process shows how one can use the Remap Binominal operator to define which label value is treated as the positive class.

Created: 2010-05-11

Workflow Setting an attribute value in a specific E... (1)

This process first generates an artificial dataset and tags it with IDs. Then a Filter Examples operator is used to get a dataset with exactly one example, identified by its ID. Then a value is set in this example. Since the change in the data will be reflected in all views of the example set, a simple copy is passed to the process output. If you take a look at the attributes of the example with ID 5, you will find the value 12 there.

Created: 2010-05-14

Workflow Pivoting (1)

This process shows the basics of pivoting. A data set with three columns is loaded and partially generated. Afterwards, the data is rotated and missing values are replaced by zero.

Created: 2010-05-15

Workflow Convert Nominal to Binominal to Numerical (1)

This is a standard preprocessing subprocess that takes nominal (categorical) attributes and introduces binominal dummy attributes, which are then transformed to numerical attributes that can be used by learning schemes like SVM or Logistic Regression.
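The dummy-coding chain can be sketched in Python: each nominal value becomes a binominal indicator attribute, encoded numerically as 0/1. Attribute names and values are illustrative.

```python
# Sketch of nominal -> binominal -> numerical: replace a categorical
# attribute with one 0/1 indicator per observed value.
def dummy_encode(rows, attr):
    values = sorted({row[attr] for row in rows})
    out = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != attr}
        for v in values:
            new[f"{attr}={v}"] = 1 if row[attr] == v else 0
        out.append(new)
    return out

rows = [{"Outlook": "sunny"}, {"Outlook": "rain"}, {"Outlook": "sunny"}]
print(dummy_encode(rows, "Outlook")[0])  # {'Outlook=rain': 0, 'Outlook=sunny': 1}
```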

Created: 2010-05-15

Workflow Discard Attribute with More than x% Missin... (1)

This process loops over all attributes and calculates the fraction of missing values for each attribute. If this fraction is larger than the fraction defined in the first "Set Macro" operator (macro: max_unknown), the attribute is removed from the example set.
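The attribute loop can be sketched in Python: compute the missing fraction per attribute (None stands in for a missing value) and drop those above max_unknown. Data and threshold are illustrative.

```python
# Sketch: drop every attribute whose fraction of missing values exceeds
# the max_unknown threshold (here, attribute 'b' is 100% missing).
data = [
    {"a": 1, "b": None, "c": 3},
    {"a": None, "b": None, "c": 4},
    {"a": 2, "b": None, "c": 5},
]
max_unknown = 0.5

attrs = list(data[0])
keep = [a for a in attrs
        if sum(row[a] is None for row in data) / len(data) <= max_unknown]
filtered = [{a: row[a] for a in keep} for row in data]
print(keep)  # ['a', 'c'] — 'b' is dropped
```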

Created: 2010-05-15

Workflow Write/Store Correlation Matrix to an Excel... (1)

This process demonstrates how to write/store/export the correlation matrix of the "Correlation Matrix" operator to an Excel file. One needs to generate a report with the "Generate Report" operator and export the report to an Excel file.

Created: 2010-05-19

Workflow Apply Same Preprocessing to Learning and V... (1)

This process demonstrates how to apply the same preprocessing workflow to learning data and test/validation data. For clarity, the preprocessing is hidden in a Preprocessing subprocess. To perform the same preprocessing workflow on the learning and testing data, both are collected into a collection of data. Then the "Loop Collection" operator loops over each collection. The actual preprocessing is done inside the "Loop Collection" operator. In the example only a "Rename" is done...

Created: 2010-05-19

Workflow Filter Wrong Predicted/Classified Samples (1)

This process demonstrates how to filter validation samples which are predicted incorrectly. The key operator is "Filter Examples", set up with the condition "wrong_predictions".

Created: 2010-05-19 | Last updated: 2010-06-08

Workflow Exclude a Attribute/Variable (1)

This simple process demonstrates how to exclude an attribute or a set of attributes from an example set/data set. In particular, the first "Select Attributes" filter excludes the Outlook attribute. The second filter excludes the subset {Outlook, Humidity}.

Created: 2010-05-20

Workflow Missing value count (1)

This workflow enables users to filter examples by the number of missing values they contain; it inserts the numeric attribute 'Missings', representing, for each example, the count of attributes with missing values.
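The 'Missings' attribute can be sketched in Python: count missing values (None here) per example, then filter on that count. Data and threshold are illustrative.

```python
# Sketch: add a per-example 'Missings' count, then filter examples by it.
data = [
    {"a": 1, "b": None},
    {"a": None, "b": None},
    {"a": 3, "b": 4},
]

for row in data:
    row["Missings"] = sum(v is None for k, v in row.items() if k != "Missings")

filtered = [row for row in data if row["Missings"] <= 1]
print(len(filtered))  # 2 rows survive the at-most-one-missing filter
```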

Created: 2010-05-27 | Last updated: 2010-07-13

Workflow Linear Regression of Italian bookshops se... (1)

Could someone help me? Why is the correlation between the actual value and the predicted value of the attribute "Quantita" so low?

Created: 2010-05-28
