sig_1
A selection of 24 documents, created by jhermes
Merkels RE
ri-2123914534
ri1799463024
ri41608217
abf36670-8587-4e2f-aa37-10a1fe738b37
abf36670-8587-4e2f-aa37-10a1fe738b37
de.uni_koeln.spinfo.tesla.component.reader.tika.DefaultTikaReader
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.data.Url
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.access.UrlAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
1799463024
URL Detector
General information about this role: Detects URLs.
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Paragraph
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TParagraphAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-2123914534
Paragraph Detector
General information about this role: Detects paragraph boundaries.
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataImpl
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
41608217
Dublin Core Metadata Generator
General information about this role: Generates Dublin Core metadata annotations.
If enabled (default), the reader will detect URLs and generate corresponding annotations.
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
true
false
Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Department of Computational Linguistics, University of Cologne
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html
A general-purpose reader based on Apache Tika, which lets it support various formats such as RTF, PDF, ODF, HTML and MS Office. Note, however, that the structure of a document will not be extracted or annotated.
http://tika.apache.org
java.lang.String
de.uni_koeln.spinfo.tesla.component.simpletokenizer.SimpleTokenizer
de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
2043391654
Tokenizer
General information about this role: Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Sentence
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TSentenceTokenAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-551536549
Sentence Detector
General information about this role: Detects sentence boundaries.
The locale that will be used to determine word and sentence boundaries. For best results, set this value to the language of the texts which are being processed. See http://download.oracle.com/javase/6/docs/api/java/util/Locale.html and http://download.oracle.com/javase/6/docs/api/java/text/BreakIterator.html for technical details.
default
false
If true, the tokenizer will produce annotations for whitespace. Enabling this option generates nearly twice as many annotations, so it is set to false by default.
false
false
If true, the type id of an annotation is calculated from the underlying string in lowercase letters. This reduces the overall number of types produced (at least for texts that contain many capital letters); however, it might affect the quality of components that rely on the type ids generated by this component.
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
true
false
Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Department of Computational Linguistics, University of Cologne
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html
A quick and dirty layered tokenizer based on Java's java.text.BreakIterator.
The intention of this tokenizer was to test the exchangeability of SPre and to offer a tokenizer which is "failsafe", as it cannot be misconfigured.
Note, however, that other tokenizers will probably produce much better results than this one.
No external URL defined
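The layered approach described for this tokenizer can be illustrated with a minimal sketch built directly on java.text.BreakIterator. It is not the SimpleTokenizer source: the class name BreakIteratorSketch, its method names and the keepWhitespace flag are hypothetical and only mirror the locale and whitespace options described above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not part of Tesla.
final class BreakIteratorSketch {

    /** Splits text at word boundaries; whitespace-only pieces are kept only on request. */
    static List<String> words(String text, Locale locale, boolean keepWhitespace) {
        return split(BreakIterator.getWordInstance(locale), text, keepWhitespace);
    }

    /** Splits text at sentence boundaries determined for the given locale. */
    static List<String> sentences(String text, Locale locale) {
        return split(BreakIterator.getSentenceInstance(locale), text, false);
    }

    // Walks all boundaries reported by the iterator and cuts the text between them.
    private static List<String> split(BreakIterator it, String text, boolean keepWhitespace) {
        List<String> parts = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String part = text.substring(start, end);
            if (keepWhitespace || !part.trim().isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        System.out.println(words(text, Locale.ENGLISH, false));
        System.out.println(words(text, Locale.ENGLISH, true));
        System.out.println(sentences(text, Locale.ENGLISH));
    }
}
```

Running the word splitter with and without whitespace on the same text shows why the whitespace option is off by default: every gap between tokens becomes an extra piece.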
de.uni_koeln.spinfo.tesla.component.statistics.TfIdfCalculator
de.uni_koeln.spinfo.tesla.roles.document.impl.data.Frequencies
de.uni_koeln.spinfo.tesla.roles.document.impl.access.TunguskaStatisticsAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
402173111
TF/IDF
General information about this role: Provides access to the term frequency/inverse document frequency statistics.
-
Anchored Element Generator
Generates anchored elements.
de.uni_koeln.spinfo.tesla.roles.core.AnchoredElementGenerator
b470a2c7-f610-42ca-a159-5c5a5529a4a4
2043391654
de.uni_koeln.spinfo.tesla.roles.core.access.IAnchoredElementAccessAdapter
de.uni_koeln.spinfo.tesla.roles.core.data.IAnchoredElement
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
true
false
Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html
Calculates Term Frequency/Inverse Document Frequency of any data objects, based on their type id.
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
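As a rough illustration of what such a calculation involves, the sketch below computes TF/IDF weights over lists of type strings. It assumes the common weighting tf * ln(N / df) with term frequency normalised by document length; the description above does not state which variant TfIdfCalculator uses, and the class name TfIdfSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of Tesla.
final class TfIdfSketch {

    /** Returns, for each document, a map from type to tf * ln(N / df). */
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        // Document frequency: in how many documents does each type occur?
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String type : new HashSet<>(doc)) {
                df.merge(type, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw counts of each type within this document.
            Map<String, Integer> counts = new HashMap<>();
            for (String token : doc) {
                counts.merge(token, 1, Integer::sum);
            }
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / doc.size();
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                weights.put(e.getKey(), tf * idf);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("a", "c"));
        System.out.println(tfIdf(docs));
    }
}
```

Under this weighting, a type that occurs in every document gets a weight of zero, since ln(N / N) = 0.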
de.uni_koeln.spinfo.tesla.component.statistics.DocumentLogLikelihoodCalculator
de.uni_koeln.spinfo.tesla.roles.table.impl.TabularSummary
de.uni_koeln.spinfo.tesla.roles.table.impl.TabularSummaryAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
-1484471700
Tabular Summary Generator
General information about this role: Generates summaries (tables) from generated results.
-
Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
b470a2c7-f610-42ca-a159-5c5a5529a4a4
2043391654
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken
-
TF/IDF
Provides access to the term frequency/inverse document frequency statistics.
de.uni_koeln.spinfo.tesla.roles.document.statistics.TfIdf
b447d2a9-8c38-4ee3-bfd7-0eebbbe77b59
402173111
de.uni_koeln.spinfo.tesla.roles.document.access.ITfIdfAccessAdapter
de.uni_koeln.spinfo.tesla.roles.document.data.IFrequencies
Number of types (with best log likelihood value) to export in result table
100
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
true
false
jhermes
hermesj@uni-koeln.de
spinfo
Calculates log-likelihood values (http://en.wikipedia.org/wiki/Likelihood_function) for types of individual documents in relation to all documents of the experiment.
/LogLikelihood
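The description above does not say which log-likelihood formula DocumentLogLikelihoodCalculator applies. As a sketch only, the code below assumes the two-term log-likelihood ratio widely used for keyword comparison between a document and a reference collection, and then selects the highest-scoring types in the spirit of the export parameter (default 100). The class name LogLikelihoodSketch, its methods and all parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of Tesla.
final class LogLikelihoodSketch {

    /**
     * Assumed formula: 2 * (a * ln(a / E1) + b * ln(b / E2)), where
     * a = frequency of the type in the document, b = frequency in the reference,
     * c = size of the document, d = size of the reference,
     * E1 = c * (a + b) / (c + d) and E2 = d * (a + b) / (c + d).
     */
    static double logLikelihood(long a, long b, long c, long d) {
        double e1 = (double) c * (a + b) / (c + d);
        double e2 = (double) d * (a + b) / (c + d);
        return 2.0 * (term(a, e1) + term(b, e2));
    }

    // A zero observation contributes nothing (limit of x * ln x as x approaches 0).
    private static double term(long observed, double expected) {
        return observed == 0 ? 0.0 : observed * Math.log(observed / expected);
    }

    /** Returns the n types with the highest scores, best first. */
    static List<String> topTypes(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("merkel", logLikelihood(20, 5, 100, 100));
        scores.put("the", logLikelihood(10, 10, 100, 100));
        System.out.println(topTypes(scores, 1));
    }
}
```

A type whose relative frequency is the same in the document and in the reference scores zero under this formula, so only types that are unusually frequent or unusually rare in a document rise to the top of the table.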
Merkels RE LL
Experiment Description
jhermes
hermesj@uni-koeln.de
none
http://texperimentales.hypotheses.org