Who Wants to be a Data Miner?
<p>One of the most fun events at the annual RapidMiner Community Meeting and Conference (RCOMM) is the live data mining process design competition "Who Wants to be a Data Miner?". In this competition, participants must design RapidMiner processes for a given goal within a few minutes. The tasks are related to data mining and data analysis, but are rather uncommon. In fact, most of the challenges ask for things RapidMiner was never supposed to do.</p>
<p>This pack contains solutions for the challenges of 2010, 2011, and 2012; the 2012 solutions are grouped into their own pack. The tasks are quite tricky, so you might learn something by looking at them.</p>
2011-11-02T17:54:06Z
2013-09-09T16:22:11Z
1575
575
RCOMM 2011 Challenge 3: RapidDraw
This is a solution for Challenge 3 of the live data mining process design competition "Who Wants to be a Data Miner" held at RCOMM 2011 in Dublin.
The task was to generate a dataset that looks like a spiral when viewed in an appropriate plotter.
This process opens a file with three initial data points and subsequently adds more points to the data set in a loop, using macros to extract data values of the predecessors and a "Generate Attributes" operator to add new data points.
To view the result, use a "Scatter Multiple" plot, assign x and y to their respective axes, and click on "Points and Lines" to see the spiral.
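The loop described above can be sketched in Python: each new point is derived from the values of its predecessor. The update rule (rotate and scale), the function name, and all parameters are assumptions made for illustration; the original process starts from three points read from a file and may use a different rule.

```python
import math

def spiral_points(n_points, angle_step=0.3, growth=1.05, start=(1.0, 0.0)):
    """Return a list of (x, y) points, each derived from the previous one.

    Hypothetical update rule: rotate the predecessor around the origin by
    angle_step radians and scale its distance from the origin by growth.
    The list always contains at least the start point.
    """
    points = [start]
    cos_a, sin_a = math.cos(angle_step), math.sin(angle_step)
    while len(points) < n_points:
        x, y = points[-1]  # values of the predecessor, like a macro lookup
        points.append((growth * (x * cos_a - y * sin_a),
                       growth * (x * sin_a + y * cos_a)))
    return points

if __name__ == "__main__":
    for x, y in spiral_points(5):
        print(f"{x:.3f}\t{y:.3f}")
```

Plotting the x and y columns with connected points should show the spiral, analogous to the "Points and Lines" view.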
2011-11-02T17:47:11Z
2011-11-02T17:47:11Z
548
517
RCOMM 2011 Challenge 2: Vodka or President?
This is a solution for Challenge 2 of the live data mining process design competition "Who Wants to be a Data Miner" held at RCOMM 2011 in Dublin.
Those of you who loved "You Don't Know Jack" will remember this task: To tell whether a certain word is the name of a vodka or the name of a leader of the Soviet Union. The RapidMiner process was allowed to download data from Wikipedia to make this decision.
One input file contains a list of words. When the process runs, it adds two indicator attributes, "Vodka" and "Leader", to each word. A second utility file contains links to Wikipedia pages on vodka and on leaders of the Soviet Union.
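A minimal Python sketch of the indicator idea, assuming the two Wikipedia pages have already been downloaded as plain text. The function name and the simple substring test are assumptions; the RapidMiner process may match words differently.

```python
def add_indicators(words, vodka_text, leader_text):
    """Return one row per word with 0/1 indicators "Vodka" and "Leader".

    vodka_text and leader_text stand in for the downloaded page contents.
    A word is flagged if it occurs anywhere in the text (case-insensitive),
    so short words may produce false positives.
    """
    vodka_lower = vodka_text.lower()
    leader_lower = leader_text.lower()
    rows = []
    for word in words:
        needle = word.lower()
        rows.append({
            "word": word,
            "Vodka": int(needle in vodka_lower),
            "Leader": int(needle in leader_lower),
        })
    return rows

if __name__ == "__main__":
    # Placeholder strings, not actual Wikipedia content.
    vodka_page = "Brands include Smirnoff and Stolichnaya."
    leader_page = "Leaders include Lenin, Stalin and Khrushchev."
    for row in add_indicators(["Smirnoff", "Stalin"], vodka_page, leader_page):
        print(row)
```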
2011-11-02T17:44:45Z
2011-11-02T17:44:45Z
819
511
RCOMM 2011 Challenge 1: Hobbit Genealogy
<p>This is a solution for Challenge 1 of the live data mining process design competition "Who Wants to be a Data Miner" held at RCOMM 2011 in Dublin.</p>
<p>As you certainly know, Balbo Baggins is the common ancestor of Bilbo and Frodo Baggins. The file opened by the operator "Open Ancestor" contains a table with details about parentship in the Baggins family (insert a breakpoint after Read CSV to inspect it). Each example contains a parent and a child. Of course, the same parent can appear multiple times in the data set. The task of Challenge 1 was to find all descendants of Balbo, together with their respective levels.</p>
<p>This solution achieves this goal by starting with an example set with a single row (Balbo) and joining the parentship table. This yields all children of Balbo. Repeating this, all grandchildren are found. The process uses a loop to iterate down to level 6 (which is empty because no more information is provided for this level). Bilbo and Frodo are found on Levels 3 and 4, respectively.</p>
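The repeated join described above can be sketched in Python as follows. The function name is hypothetical, and the parentship table is only a small illustrative subset of the Baggins family tree, not the file used in the competition.

```python
def descendants_by_level(parentship, root, max_level=6):
    """Map each level (1 = children of root) to the names found there."""
    current = [root]  # the single-row start table
    levels = {}
    for level in range(1, max_level + 1):
        # "Join" the current names with the parent column of the table.
        current = [child for parent, child in parentship if parent in current]
        levels[level] = current
    return levels

# Illustrative subset of (parent, child) pairs.
PARENTSHIP = [
    ("Balbo", "Mungo"), ("Balbo", "Largo"),
    ("Mungo", "Bungo"), ("Bungo", "Bilbo"),
    ("Largo", "Fosco"), ("Fosco", "Drogo"), ("Drogo", "Frodo"),
]

if __name__ == "__main__":
    for level, names in descendants_by_level(PARENTSHIP, "Balbo").items():
        print(level, names)
```

With this subset, Bilbo appears on level 3 and Frodo on level 4, and the deeper levels come out empty because the table holds no further rows.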
2011-11-02T17:43:08Z
2011-11-02T17:44:17Z
553
536
RCOMM Challenge 3: Fibonacci Numbers (Intended simple solution)
At RCOMM 2010 (www.rcomm2010.org), an unusual competition titled "Who Wants to Be a Data Miner" was held, in which three challenges were issued to the conference participants. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the original solution I had in mind for Challenge 3: "Fibonacci Numbers". It defines a macro n, recurses by applying itself to n-1 and n-2 using an "Embed Process" operator, appends the results (so the length is F(n-1)+F(n-2)=F(n)), and uses a "Branch" operator to stop the recursion. The live solution from the challenge is in workflow "RCOMM Challenge 3: Fibonacci Numbers (Live solution)".
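The recursion can be mirrored in a few lines of Python. This is only a sketch of the idea: the function name and the placeholder row value are assumptions, and a list stands in for the example set.

```python
def fib_rows(n):
    """Return a list with F(n) rows, built by recursion and appending."""
    if n <= 1:
        # Branch that stops the recursion: F(0) gives 0 rows, F(1) gives 1 row.
        return ["row"] * n
    # Append the results for n-1 and n-2, so the length is F(n-1) + F(n-2).
    return fib_rows(n - 1) + fib_rows(n - 2)

if __name__ == "__main__":
    print([len(fib_rows(n)) for n in range(8)])
```

Like the recursive process, this recomputes the same subproblems many times, so it is only practical for small n.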
2010-09-17T08:57:42Z
2010-09-17T08:57:42Z
584
783
RCOMM Challenge 3: Fibonacci Numbers (Improved live solution)
At RCOMM 2010 (www.rcomm2010.org), an unusual competition titled "Who Wants to Be a Data Miner" was held, in which three challenges were issued to the conference participants. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the winning process of Challenge 3: "Fibonacci Numbers" by Matko Bošnjak. This was the task:
The n-th Fibonacci number is F(n)=F(n-1)+F(n-2), and F(0)=0, F(1)=1. Create a process that creates an example set with F(n) rows where n can be defined as a macro of the process.
This solution does even more than the intended result: It creates an example set where row i contains F(i).
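As a rough Python analogue of that result (not the winning process itself), an iterative loop can build a table in which row i holds F(i). The function name and the tuple layout are assumptions.

```python
def fib_table(n):
    """Return rows (i, F(i)) for i = 0..n, using F(0)=0 and F(1)=1."""
    rows = []
    a, b = 0, 1  # a = F(i), b = F(i+1)
    for i in range(n + 1):
        rows.append((i, a))
        a, b = b, a + b
    return rows

if __name__ == "__main__":
    for i, value in fib_table(10):
        print(i, value)
```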
2010-09-17T08:56:32Z
2010-09-17T08:56:32Z
614
782
RCOMM Challenge 2: Broken Iris
At RCOMM 2010 (www.rcomm2010.org), an unusual competition titled "Who Wants to Be a Data Miner" was held, in which three challenges were issued to the conference participants. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the winning process of Challenge 2: "Broken Iris" by Nico Piatkowski. This was the task:
You are given a decision tree model (M) designed on the well-known Iris data set and unlabelled data (U) on which the model is to be applied. Unfortunately, the unlabelled data set misses one of the four original attributes (a4), so the model is not immediately applicable. Even worse, you also cannot recreate the model, since in last night's database crash the label column of the original labelled data set (L) was lost. However, this example set contains all four columns. The task is: Given M, U, and L, find a way to apply M to U and convince the audience that this does something useful. (The audience does not accept just re-creating the missing column and filling it with constant or random values.)
The solution uses L to create a regression model that predicts a4 from a1, a2, and a3 and uses this to add an attribute a4 to U. Then, M can be applied to U.
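The following Python sketch illustrates the approach under several assumptions: a linear least-squares fit stands in for the unspecified regression model, toy_tree is a hypothetical stand-in for M, and all numbers are made-up toy values rather than Iris measurements.

```python
def fit_linear(rows, targets):
    """Least-squares fit of targets ~ w0 + w1*x1 + ... via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           + [sum(d[i] * t for d, t in zip(design, targets))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def predict(weights, row):
    """Evaluate the fitted linear model on one (a1, a2, a3) row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

def toy_tree(a1, a2, a3, a4):
    """Hypothetical stand-in for model M; it happens to split on a4 only."""
    return "class_A" if a4 < 1.0 else "class_B"

def apply_with_imputed_a4(labelled, unlabelled, model):
    """Fit a4 ~ a1, a2, a3 on L, add the predicted a4 to U, then apply M."""
    weights = fit_linear([r[:3] for r in labelled], [r[3] for r in labelled])
    results = []
    for row in unlabelled:
        a4 = predict(weights, row)
        results.append((a4, model(*row, a4)))
    return results

# Made-up toy data: L has columns a1..a4 (label lost), U has only a1..a3.
L_ROWS = [(5.0, 3.0, 1.0, 0.2), (6.0, 3.0, 4.0, 1.7), (7.0, 2.0, 5.0, 2.2),
          (5.0, 4.0, 2.0, 0.7), (6.0, 2.0, 6.0, 2.7)]
U_ROWS = [(6.0, 3.0, 1.5), (6.5, 2.5, 5.0)]

if __name__ == "__main__":
    for a4, label in apply_with_imputed_a4(L_ROWS, U_ROWS, toy_tree):
        print(round(a4, 3), label)
```

The point of the design is that L still contains a4 even though its label is gone, so it can train a model for the missing column instead of the lost label.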
The myExperiment workflow "RCOMM Challenge 2: Broken Iris (Preparation)" creates the original inputs M, U, and L from RapidMiner's Iris sample.
2010-09-17T08:55:15Z
2010-09-17T08:55:15Z
782
764
RCOMM Challenge 2: Broken Iris (Preparation)
This workflow creates the input for the RCOMM 2010 Challenge 2. The solution and description are in workflow "RCOMM Challenge 2: Broken Iris".
2010-09-17T08:54:13Z
2010-09-17T08:54:13Z
537
742
RCOMM Challenge 1: 99 bottles of beer
At RCOMM 2010 (www.rcomm2010.org), an unusual competition titled "Who Wants to Be a Data Miner" was held, in which three challenges were issued to the conference participants. In all challenges, participants had to design RapidMiner processes as quickly as possible. This is the winning process of Challenge 1: "99 bottles of beer" by Sebastian Land. This was the task:
Design a process that produces an example set the rows of which form the lyrics of the well-known song "99 bottles of beer". To those who do not know the lyrics, here they are:
99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall.
98 bottles of beer on the wall, 98 bottles of beer.
Take one down and pass it around, 97 bottles of beer on the wall.
...
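A short Python sketch that produces rows in the pattern shown above. The function name and the start and stop parameters are assumptions; the wording of the final verses is elided in the task text, so the sketch leaves it to the caller where to stop.

```python
def bottle_rows(start=99, stop=97):
    """Return lyric rows for the verses that begin with start down to stop + 1."""
    rows = []
    for n in range(start, stop, -1):
        rows.append(f"{n} bottles of beer on the wall, {n} bottles of beer.")
        rows.append(f"Take one down and pass it around, "
                    f"{n - 1} bottles of beer on the wall.")
    return rows

if __name__ == "__main__":
    for row in bottle_rows():
        print(row)
```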
2010-09-17T08:51:41Z
2010-09-17T08:51:41Z
675
787
Sudoku solving with RapidMiner (Who Wants to be a Data Miner 2012?)
<p>A fun event at the annual RapidMiner conference <a href="http://rcomm2012.org">RCOMM</a> is the live data mining challenge "Who wants to be a data miner?" where contestants solve data analysis tasks within a few minutes. In 2012 the task was to (partially) solve a Sudoku puzzle. Processes 1 to 3 in this pack correspond to the three tasks, whereas process 0 loads the initial data and process 4 is a bonus process that solves the entire Sudoku.</p>
<p>Make sure the processes are saved under the name they have on myExperiment because some are embedded as subprocesses by later processes.</p>
<p>See the comments in the processes to learn how they work. Process 0 imports an input data set with the initial predefined numbers. Process 4 outputs a 9x9 table that contains the solved Sudoku.</p>
2012-09-04T16:56:44Z
2012-09-04T16:58:26Z
1305
373
RCOMM 2013 Data Mining Challenge
<p>This pack contains the solution and the input data generator for one of the tasks of the RCOMM 2013 data mining challenge "Who Wants to be a Data Miner?". Participants had to solve the task within 10 minutes. The task was this: Given</p><ol><li>a variant of the Golf data set (found in the //Samples/data folder) where the attribute Outlook is missing,</li><li>a decision tree model built on the complete Golf data set, and</li><li>a utility data set containing only the three distinct values of Outlook,</li></ol><p>create an example set based on the incomplete data set from (1) containing all possible values for the Outlook attribute which are compatible with the given decision tree (2), i.e. for which the prediction made by the decision tree matches the true label. This is obviously a superset of the original Golf data set.</p>
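One way to read the task is as a cross join followed by a filter, sketched below in Python. Everything concrete here is an assumption made for illustration: the Outlook value names, the toy_tree stand-in for the given model, the reduced attribute set (Temperature is omitted), and the two made-up input rows.

```python
OUTLOOK_VALUES = ["sunny", "overcast", "rain"]  # stands in for data set (3)

def toy_tree(outlook, humidity, wind):
    """Hypothetical stand-in for the given decision tree (2)."""
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "no" if humidity > 77.5 else "yes"
    return "no" if wind else "yes"  # remaining case: rain

def compatible_rows(incomplete, outlook_values, tree):
    """Cross-join rows with all Outlook values; keep matching predictions."""
    result = []
    for humidity, wind, label in incomplete:
        for outlook in outlook_values:
            if tree(outlook, humidity, wind) == label:
                result.append((outlook, humidity, wind, label))
    return result

if __name__ == "__main__":
    # Made-up rows (humidity, wind, label) standing in for data set (1).
    incomplete = [(85, False, "no"), (70, True, "yes")]
    for row in compatible_rows(incomplete, OUTLOOK_VALUES, toy_tree):
        print(row)
```

Each input row can yield zero to three output rows, which is why the result can be larger than the original data set.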
2013-09-09T16:20:45Z
2013-09-09T16:24:00Z
798
314
RCOMM 2010
RapidMiner Community Meeting and Conference 2010, Dortmund
2011-11-02T17:57:25Z
2011-11-02T17:58:15Z
RCOMM 2011
RapidMiner Community Meeting and Conference 2011, Dublin
2011-11-02T17:57:34Z
2011-11-02T17:57:57Z