AIT_Matchbox_Scenario

orig_dirlist_file_path 0 0
/tmp/matchboxcollection/ 2012-11-24 11:09:19.449 UTC
Path to the directory on the server where the digital collection to be analysed is located. Note that the path must end with '/'. 2012-11-24 11:06:45.575 UTC

matchbox_parameter 0 0
One of the main matchbox parameters used for duplicate finding: all, extract, train, bowhist or compare. 2012-11-23 11:58:30.413 UTC
all 2012-11-23 11:58:02.457 UTC

results stderr stdout matches

matchbox target_collection_path 0 parameter 0 STDERR 0 0 STDOUT 0 0
This command starts the duplicate-finding process using the FindDuplicates Python script of the matchbox tool. The matchbox tool supports Python 2.7. Execution starts from the directory where the Python scripts are located; if you use the source code from GitHub, this is the scape/pc-qa-matchbox/Python/ directory. The Python script supports several parameters. Experienced users can pass extract, train, bowhist or compare to execute the associated step of the matchbox duplicate-search workflow. The order of the steps must not be changed, because each step requires the output of the previous one. For example, to repeat the comparison step you must already have the BOWHistogram files calculated by the bowhist step.
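The two inputs described above (a collection path ending in '/' and one of the five parameter values) can be checked before the script is called. The following is a minimal sketch, not part of the workflow: the helper name `build_command` and the default interpreter name are assumptions for illustration, and the argument order simply mirrors the command line stored in this workflow (`FindDuplicates.py %%target_collection_path%% %%parameter%%`).

```python
# Parameter values named in the workflow annotation.
VALID_PARAMETERS = ("all", "extract", "train", "bowhist", "compare")


def build_command(collection_path, parameter="all", python_bin="python2.7"):
    """Return the argument list for one FindDuplicates.py call.

    Hypothetical helper: it only assembles the arguments and does not
    run anything.
    """
    if parameter not in VALID_PARAMETERS:
        raise ValueError("unknown matchbox parameter: %r" % (parameter,))
    # The annotation asks for a trailing '/' on the collection path.
    if not collection_path.endswith("/"):
        collection_path += "/"
    return [python_bin, "FindDuplicates.py", collection_path, parameter]


if __name__ == "__main__":
    print(build_command("/tmp/matchboxcollection", "all"))
```

The returned list could be handed to `subprocess.call` with `cwd` set to the directory that contains the matchbox Python scripts, since the workflow changes into that directory before running the script.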
2012-11-23 11:50:29.432 UTC
net.sf.taverna.t2.activities external-tool-activity 1.4 net.sf.taverna.t2.activities.externaltool.ExternalToolActivity
D0A4CDEB-DD10-4A8E-A49C-8871003083D8 ffma
<?xml version="1.0" encoding="UTF-8"?> <sshInvocation><sshNode><host>62.218.164.173</host><port>22</port><directory>/tmp/</directory><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand><copyCommand>/bin/cp %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</copyCommand></sshNode></sshInvocation>
eb15da53-92d0-4dac-aa40-3101b4500997
cd /root/scape_training/scape/pc-qa-matchbox/Python/
python2.7 FindDuplicates.py %%target_collection_path%% %%parameter%%
1200 1800
parameter target_collection_path parameter parameter false false false windows-1252 false false false target_collection_path target_collection_path false false false windows-1252 false false false false true true 0 false
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 4
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke
parse_matchbox_stdout target_collection_path 0 matchbox_stdout 0 duplicates_result 0 0 duplicates_matches 0 0
net.sf.taverna.t2.activities beanshell-activity 1.4 net.sf.taverna.t2.activities.beanshell.BeanshellActivity
target_collection_path 0 text/plain java.lang.String true matchbox_stdout 0 text/plain java.lang.String true duplicates_result 0 0 duplicates_matches 0 0 workflow
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0
net.sf.taverna.t2.core workflowmodel-impl 1.4 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

matchbox.target_collection_path <- orig_dirlist_file_path
matchbox.parameter <- matchbox_parameter
parse_matchbox_stdout.target_collection_path <- orig_dirlist_file_path
parse_matchbox_stdout.matchbox_stdout <- matchbox.STDOUT
results <- parse_matchbox_stdout.duplicates_result
stderr <- matchbox.STDERR
stdout <- matchbox.STDOUT
matches <- parse_matchbox_stdout.duplicates_matches

Roman Graf 2012-11-20 09:40:50.269 UTC

In this scenario, matchbox finds duplicates in the digital collection passed to it. Each matchbox workflow step can be executed separately. The user gets a list of duplicates as the result. Matchbox is installed on a remote Linux VM, and the digital collection is stored on a Windows machine. This workflow starts the duplicate-finding process using the FindDuplicates Python script of the matchbox tool. The matchbox tool supports Python 2.7. Execution starts from the directory where the Python scripts are located; if you use the source code from GitHub, this is the scape/pc-qa-matchbox/Python/ directory. The Python script supports several parameters. Experienced users can pass extract, train, bowhist or compare to execute the associated step of the matchbox duplicate-search workflow. The order of the steps must not be changed, because each step requires the output of the previous one. For example, to repeat the comparison step you must already have the BOWHistogram files calculated by the bowhist step. 2012-11-24 22:55:21.450 UTC

AIT Matchbox Scenario Professional 2012-11-24 22:45:44.526 UTC
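The workflow description states that the individual steps depend on each other and must run in a fixed order. The sketch below shows one way to work out which earlier steps a given step needs. It assumes that the execution order is the order in which the description lists the steps (extract, train, bowhist, compare); the source only confirms explicitly that bowhist must precede compare. The function names are hypothetical and not part of matchbox.

```python
# Assumed execution order: the order in which the description lists the steps.
STEP_ORDER = ("extract", "train", "bowhist", "compare")


def prerequisites(step):
    """Return the steps that must have run before `step`, in order."""
    if step == "all":
        # "all" runs the whole chain itself, so nothing has to exist beforehand.
        return []
    if step not in STEP_ORDER:
        raise ValueError("unknown matchbox step: %r" % (step,))
    return list(STEP_ORDER[:STEP_ORDER.index(step)])


def missing_prerequisites(step, completed):
    """Return the prerequisite steps of `step` that are not in `completed`."""
    return [s for s in prerequisites(step) if s not in completed]


if __name__ == "__main__":
    # Repeating the comparison after only extract and train have been run.
    print(missing_prerequisites("compare", {"extract", "train"}))
```

A caller that wants to repeat only the comparison can use `missing_prerequisites` to decide whether the bowhist step has to be run again first.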