This workflow generates the additional files required for DOI creation: the DOI-to-URL mapping required for the DOI deposit, and a set of SQL UPDATE statements that insert the DOIs into an EPrints database.
Note: this workflow must use the same CSV file and the same seed number that were used with the DOI record generator; otherwise the DOIs generated here will not match those in the deposited records.
Location of the source CSV with data for creating records.
/file_location/eprint.csv
Shim that reads in the file at the location provided by the string constant.
net.sourceforge.taverna.scuflworkers.io.TextFileReader
Takes a flat CSV input file and splits it into a list.
\n
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
Takes a single string output and converts it to a list.
\n
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
Takes the list of rows and splits each row into its fields, creating a 2-deep list.
";"
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
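The two splitting steps above (file into rows, rows into fields) can be sketched in plain Java as follows. This is only an illustration: the class and method names are hypothetical, the sample data is invented, and it assumes the field separator is the three-character sequence `";"` exactly as given above. Under that assumption the first field keeps its leading quote and the last field keeps its trailing quote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the chained SplitByRegex processors.
class CsvSplitSketch {
    // Split the file contents into rows, then each row into fields.
    static List<List<String>> split(String contents, String rowRegex, String fieldRegex) {
        List<List<String>> rows = new ArrayList<>();
        for (String row : contents.split(rowRegex)) {
            if (row.isEmpty()) {
                continue; // assumption: blank lines carry no record
            }
            rows.add(Arrays.asList(row.split(fieldRegex)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample: two rows of quoted, semicolon-separated fields.
        String sample = "\"12\";\"2009\";\"4\"\n\"13\";\"2008\";\"11\"\n";
        for (List<String> row : split(sample, "\n", "\";\"")) {
            System.out.println(row);
        }
    }
}
```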
Creates a sequential integer series for assignment to DOIs.
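// A. Wiggins 4/29/2009
// Emit one integer per input row, starting at the seed, one number per line.
count = trigger.size();
delim = "\n";
start = Integer.parseInt(seed);
out = new StringBuffer();
for (i = 0; i < count; i++) {
  out.append(start + i).append(delim);
}
number_sequence = out.toString();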
seed
trigger
number_sequence
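A minimal sketch of why the seed and the CSV must stay fixed between runs: the number assigned to a row depends only on the seed and the row's position, so the same inputs always reproduce the same numbers. The class name and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the sequence generator's behaviour.
class SeedSequenceSketch {
    // Row i (zero-based) receives the number seed + i.
    static List<Integer> sequence(int seed, int rowCount) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            numbers.add(seed + i);
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Same seed and row count give the same numbers on every run.
        System.out.println(sequence(100, 3)); // prints [100, 101, 102]
        // A different seed shifts every number, and therefore every DOI.
        System.out.println(sequence(200, 3)); // prints [200, 201, 202]
    }
}
```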
Ensures two-digit month values.
// A. Wiggins 4/29/2009
import java.util.regex.Pattern;
// Pad single-digit months ("1" through "9") with a leading zero;
// pass every other value through unchanged.
if (Pattern.matches("[1-9]", month)) {
  formatted_month = "0" + month;
} else {
  formatted_month = month;
}
month
formatted_month
Aggregates the individual URL mappings into a single file to send to the registration agency.
// A. Wiggins 4/29/2009
delim = "\n";
count = doi.size();
// Start with a blank line, then write one DOI mapping per line.
out = "" + delim;
for (i = 0; i < count; i++) {
  out = out + doi.get(i) + delim;
}
import_file = out;
doi
import_file
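The layout of the aggregated file can be sketched in plain Java as below: a leading blank line, then one entry per line, each terminated by a newline. The class name, the DOIs, and the example.org URLs are hypothetical sample values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the aggregation script.
class AggregateSketch {
    // A leading delimiter, then each entry followed by a delimiter.
    static String aggregate(List<String> entries, String delim) {
        StringBuilder out = new StringBuilder(delim);
        for (String entry : entries) {
            out.append(entry).append(delim);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> pairs = Arrays.asList(
            "doi.base/FLOSSmole.SourceForge.2009-04.100 - http://example.org/a",
            "doi.base/FLOSSmole.SourceForge.2009-04.101 - http://example.org/b");
        System.out.print(aggregate(pairs, "\n"));
    }
}
```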
Creates a unique DOI string following the format <DOI.base>/FLOSSmole.<source>.<year>-<month>.<sequential number>.
// A. Wiggins 4/29/2009
// Remove any whitespace from the name of the source repository.
formatted_source = source.replaceAll("\\s","");
// Assemble the metadata elements and the sequence number into the DOI.
doi = "doi.base/FLOSSmole." + formatted_source + "." + year + "-" + month + "." + seed;
source
year
month
seed
doi
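As a worked example of the DOI format, the sketch below assembles one DOI from sample values. The class name, method names, and sample values are hypothetical, and it assumes the month has been zero-padded before assembly, which is done inline here for compactness.

```java
// Hypothetical illustration of the DOI assembly for a single record.
class DoiSketch {
    // Pad "1" through "9" with a leading zero; leave other values unchanged.
    static String padMonth(String month) {
        return month.matches("[1-9]") ? "0" + month : month;
    }

    // Strip whitespace from the source name and join the parts into a DOI.
    static String buildDoi(String source, String year, String month, int number) {
        String formattedSource = source.replaceAll("\\s", "");
        return "doi.base/FLOSSmole." + formattedSource + "."
            + year + "-" + padMonth(month) + "." + number;
    }

    public static void main(String[] args) {
        // prints doi.base/FLOSSmole.SourceForge.2009-04.100
        System.out.println(buildDoi("Source Forge", "2009", "4", 100));
    }
}
```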
Generates a paired list of DOIs and the associated URLs for the files.
// A. Wiggins 4/29/2009
doi_records = doi+" - "+url;
doi
url
doi_records
Reads the 2-deep input list and splits out the values into separate variables.
// A. Wiggins 4/29/2009
eprintid = file.get(0);
year = file.get(3);
month = file.get(4);
url = file.get(6);
source = file.get(11);
file
eprintid
year
month
url
source
Creates an SQL UPDATE statement for each new DOI.
// A. Wiggins 6/5/2009
sql_insert = "UPDATE `eprint` SET `id_number` = \"" + doi + "\" WHERE `eprintid` = \""+eprintid+"\";";
doi
eprintid
sql_insert
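The statement produced for one record can be sketched as follows. The escaping step is an addition that is not part of the original script: it assumes MySQL-style backslash escaping inside double-quoted literals and is included only as a precaution in case a value contains a quote. The class and method names and the sample values are hypothetical.

```java
// Hypothetical illustration of the UPDATE statement for a single record.
class SqlUpdateSketch {
    // Assumption: backslash escaping of backslashes and double quotes.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build the same statement shape as the script above.
    static String updateStatement(String doi, String eprintid) {
        return "UPDATE `eprint` SET `id_number` = \"" + escape(doi)
            + "\" WHERE `eprintid` = \"" + escape(eprintid) + "\";";
    }

    public static void main(String[] args) {
        System.out.println(
            updateStatement("doi.base/FLOSSmole.SourceForge.2009-04.100", "12"));
    }
}
```

For values without quotes or backslashes, the output is identical to what the original concatenation produces.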
Aggregates the individual SQL UPDATE statements into a single file.
// A. Wiggins 6/5/2009
delim = "\n";
count = sql.size();
// Start with a blank line, then write one UPDATE statement per line.
out = "" + delim;
for (i = 0; i < count; i++) {
  out = out + sql.get(i) + delim;
}
update_statements = out;
sql
update_statements
Seed number for generating unique DOIs.
SQL UPDATE statements that insert the DOIs into an existing EPrints database.