Content-based recommender system template

Created: 2011-05-05 21:06:32      Last updated: 2011-05-09 13:40:24

As input, this workflow takes two distinct example sets: a complete set of items with IDs and the corresponding textual attributes (the item example set), and a set of IDs of the items the user has interacted with (the user example set). In addition, a macro %{recommendation_no} is defined in the process context; it specifies the number of recommendations to output.
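Outside RapidMiner, the two input example sets can be pictured as a list of item records and a list of item IDs. In the minimal sketch below, the `Item` class, its fields, and the toy data are hypothetical; `candidate_items` shows the set difference that the workflow's exclusion step (dropping the items the user has already interacted with) amounts to.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    text: str  # hypothetical field: the item's textual attributes, concatenated


def candidate_items(items, user_item_ids):
    """Return the items the user has not interacted with, in input order."""
    seen = set(user_item_ids)
    return [item for item in items if item.item_id not in seen]


# Hypothetical toy data standing in for the item and user example sets.
items = [
    Item("i1", "A detective story set in London"),
    Item("i2", "A cookbook of Italian pasta recipes"),
    Item("i3", "A detective novel set in Paris"),
]
user_item_ids = ["i1"]

print([item.item_id for item in candidate_items(items, user_item_ids)])
```

Using a `set` for the seen IDs keeps each membership test constant-time, which matters when the user example set is large.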

The first steps of the workflow preprocess these example sets: only the textual attributes of the item example set are selected, and the ID role is set on the corresponding ID attribute of each example set. After preprocessing, all items the user has interacted with are excluded from the item example set (we assume the user does not want to see them recommended), and the remaining items are analyzed with text mining operators.

The text is first transformed to lower case, then tokenized by splitting on all non-letter characters, and the resulting tokens are filtered by length: tokens shorter than 3 or longer than 25 letters are removed. The remaining tokens then go through stopword filtering (using an English stopword list) and stemming with the Porter stemmer. Finally, tokens that occur in too few (less than 1%) or too many (more than 30%) documents are discarded. The result of the analysis is an example set of TF-IDF vectors of the items, together with the bag of words obtained during the analysis.
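The text analysis chain above can be sketched in plain Python. This is an approximation, not RapidMiner's implementation: the stopword set is a small hypothetical subset, stemming is a pluggable callable that defaults to the identity (if NLTK is installed, `nltk.stem.PorterStemmer().stem` could be passed instead), `min_df`/`max_df` stand in for the 1% and 30% pruning thresholds, and the TF-IDF weighting (raw counts times `log(N / df)`, then L2 normalization) is one common variant that may differ from RapidMiner's exact formula.

```python
import math
import re
from collections import Counter

# Small illustrative subset; RapidMiner ships its own, much longer English list.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "are", "was", "about"}


def tokenize(text, stem=lambda token: token, min_len=3, max_len=25):
    """Lower-case, split on non-letters, filter by length, drop stopwords, then stem."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # maximal runs of letters
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens]


def build_idf(docs_tokens, min_df=0.01, max_df=0.30):
    """Prune by document-frequency fraction and return an idf weight per kept token."""
    n_docs = len(docs_tokens)
    df = Counter(token for tokens in docs_tokens for token in set(tokens))
    return {
        token: math.log(n_docs / count)
        for token, count in df.items()
        if min_df <= count / n_docs <= max_df
    }


def tfidf_vector(tokens, idf):
    """Raw term counts times idf over the kept vocabulary, L2-normalised."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


# Hypothetical item descriptions.
texts = [
    "A cookbook of Italian pasta recipes",
    "A detective novel set in Paris",
    "Detective stories from the 1920s",
    "Gardening tips for small balconies",
]
docs_tokens = [tokenize(t) for t in texts]
idf = build_idf(docs_tokens)
item_vectors = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
print(sorted(idf))  # "detective" occurs in 2 of 4 documents (50%), so it is pruned
```

The `idf` dictionary plays the role of the bag of words: it fixes the vocabulary and weights that any later text (such as the user's combined descriptions) is projected onto.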

This bag of words is then used to analyze the combined text of the descriptions of all items the user has interacted with. The result of this analysis is also an example set of TF-IDF vectors, this time describing the user's items, and it is used to calculate similarities to all the remaining items in the database. The final step, executed inside the "Postprocessing" subprocess, picks out the %{recommendation_no} items with the highest similarity and formats them into a new example set in which each example consists of a recommended item and its similarity score.
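The profile-and-ranking stage can be sketched as follows, assuming the candidate items' TF-IDF vectors and per-token idf weights from the item analysis are already available as Python dicts. Cosine similarity is used here as a common choice for TF-IDF vectors; this page does not state which similarity measure the workflow configures. `profile_vector`, `cosine`, and `recommend` are hypothetical helpers, the numbers in the usage example are made up, and reusing the item corpus's idf weights for the user profile is an assumption about how the shared bag of words is applied.

```python
import heapq
import math
from collections import Counter


def profile_vector(user_docs_tokens, idf):
    """TF-IDF vector of the concatenated user documents, restricted to the item vocabulary."""
    combined = [t for tokens in user_docs_tokens for t in tokens]
    counts = Counter(t for t in combined if t in idf)  # out-of-vocabulary tokens are ignored
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {token: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(profile, item_vectors, recommendation_no):
    """Return (item_id, similarity) pairs for the most similar items, best first."""
    scored = ((item_id, cosine(profile, vec)) for item_id, vec in item_vectors.items())
    return heapq.nlargest(recommendation_no, scored, key=lambda pair: pair[1])


# Hypothetical idf weights and candidate-item vectors (user items already excluded).
idf = {"detective": 1.2, "paris": 1.5, "pasta": 1.5, "recipes": 1.5}
item_vectors = {
    "i2": {"pasta": 0.7, "recipes": 0.7},
    "i3": {"detective": 0.6, "paris": 0.8},
}
user_docs_tokens = [["detective", "london"], ["detective", "paris"]]

profile = profile_vector(user_docs_tokens, idf)
for item_id, score in recommend(profile, item_vectors, recommendation_no=1):
    print(item_id, round(score, 3))
```

`heapq.nlargest` avoids sorting every candidate when only the top `recommendation_no` items are needed, which helps when the item catalogue is large.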

 

Reference:

Bošnjak, M., Antulov-Fantulin, N., Šmuc, T. and Gamberger, D., Constructing recommender systems workflow templates in RapidMiner, Proceedings of the RapidMiner Community Meeting and Conference (RCOMM), 2011, Dublin, Ireland. In press.

Workflow type: RapidMiner

Version 1 (of 1)

Comments (1)


  • Monday 14 May 2012 17:42:16 (UTC)

    Where is the dataset to run this rapidminer processes?

    Thanks
