sig_1
A selection of 24 documents, created by jhermes
Merkels RE
ri-2123914534 ri1799463024 ri41608217
abf36670-8587-4e2f-aa37-10a1fe738b37
abf36670-8587-4e2f-aa37-10a1fe738b37

de.uni_koeln.spinfo.tesla.component.reader.tika.DefaultTikaReader

de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.data.Url
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.access.UrlAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
1799463024
URL Detector
General information about this role: Detects URLs.

de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Paragraph
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TParagraphAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-2123914534
Paragraph Detector
General information about this role: Detects paragraph boundaries.

de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataImpl
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
41608217
Dublin Core Metadata Generator
General information about this role: Generates Dublin Core metadata annotations.

If enabled (default), the reader will detect URLs and generate corresponding annotations.
true false

If false, this component will be executed whenever it is used in an experiment. If true, the annotations produced earlier by this component will be reused, provided the execution prerequisites did not change.
true false

Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Department of Computational Linguistics, University of Cologne
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html

A general-purpose reader which uses Apache Tika and therefore supports various formats, such as RTF, PDF, ODF, HTML and MS Office. Note, however, that the structure of a document will not be extracted or annotated.
http://tika.apache.org
java.lang.String

de.uni_koeln.spinfo.tesla.component.simpletokenizer.SimpleTokenizer

de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
2043391654
Tokenizer
General information about this role: Detects linguistic tokens.

de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Sentence
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TSentenceTokenAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-551536549
Sentence Detector
General information about this role: Detects sentence boundaries.

The locale that will be used to determine word and sentence boundaries. For best results, set this value to the language of the texts being processed. See http://download.oracle.com/javase/6/docs/api/java/util/Locale.html and http://download.oracle.com/javase/6/docs/api/java/text/BreakIterator.html for technical details.
default false

If true, the tokenizer will produce annotations for whitespace. If this option is enabled, nearly twice as many annotations will be generated, so it is set to false by default.
false false

If true, the type id of an annotation will be calculated from the underlying string in lowercase letters. This reduces the overall number of types produced (at least for texts which contain many capital letters); however, it might affect the quality of components which make use of the type ids generated by this component.
true false

If false, this component will be executed whenever it is used in an experiment. If true, the annotations produced earlier by this component will be reused, provided the execution prerequisites did not change.
true false

Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Department of Computational Linguistics, University of Cologne
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html

A quick and dirty layered tokenizer based on Java's java.text.BreakIterator. The intention of this tokenizer was to test the exchangeability of SPre and to offer a tokenizer which is "failsafe", as it cannot be misconfigured. Note, however, that other tokenizers will probably produce much better results than this one.
No external URL defined

de.uni_koeln.spinfo.tesla.component.statistics.TfIdfCalculator

de.uni_koeln.spinfo.tesla.roles.document.impl.data.Frequencies
de.uni_koeln.spinfo.tesla.roles.document.impl.access.TunguskaStatisticsAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
402173111
TF/IDF
General information about this role: Provides access to the term frequency/inverse document frequency statistics.

Anchored Element Generator
Generates anchored elements.
de.uni_koeln.spinfo.tesla.roles.core.AnchoredElementGenerator
b470a2c7-f610-42ca-a159-5c5a5529a4a4
2043391654
de.uni_koeln.spinfo.tesla.roles.core.access.IAnchoredElementAccessAdapter
de.uni_koeln.spinfo.tesla.roles.core.data.IAnchoredElement

If false, this component will be executed whenever it is used in an experiment. If true, the annotations produced earlier by this component will be reused, provided the execution prerequisites did not change.
true false

Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html

Calculates Term Frequency/Inverse Document Frequency of any data objects, based on their type id.
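To illustrate what a TF/IDF calculation over type ids can look like, here is a minimal sketch. It is not the TfIdfCalculator's code: the class name TfIdfSketch, the method tfIdf, the use of plain integers as type ids, and the weighting variant (relative term frequency multiplied by the natural logarithm of N/df) are all assumptions made for illustration. The component may use a different variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of Tesla.
class TfIdfSketch {

    /** Returns, for each document, a map from type id to its tf-idf weight. */
    static List<Map<Integer, Double>> tfIdf(List<List<Integer>> docs) {
        // Document frequency: the number of documents in which each type id occurs.
        Map<Integer, Integer> df = new HashMap<>();
        for (List<Integer> doc : docs) {
            for (Integer typeId : new HashSet<>(doc)) {
                df.merge(typeId, 1, Integer::sum);
            }
        }
        List<Map<Integer, Double>> result = new ArrayList<>();
        for (List<Integer> doc : docs) {
            // Raw counts of each type id within this document.
            Map<Integer, Integer> counts = new HashMap<>();
            for (Integer typeId : doc) {
                counts.merge(typeId, 1, Integer::sum);
            }
            Map<Integer, Double> weights = new HashMap<>();
            for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                // Assumed variant: relative frequency times ln(N / df).
                double tf = (double) e.getValue() / doc.size();
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                weights.put(e.getKey(), tf * idf);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two toy documents given as sequences of type ids.
        List<List<Integer>> docs = Arrays.asList(
                Arrays.asList(1, 1, 2),
                Arrays.asList(2, 3));
        System.out.println(tfIdf(docs));
    }
}
```

In this variant a type id that occurs in every document receives a weight of zero, since ln(N/N) = 0.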
http://en.wikipedia.org/wiki/Tf%E2%80%93idf

de.uni_koeln.spinfo.tesla.component.statistics.DocumentLogLikelihoodCalculator

de.uni_koeln.spinfo.tesla.roles.table.impl.TabularSummary
de.uni_koeln.spinfo.tesla.roles.table.impl.TabularSummaryAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
-1484471700
Tabular Summary Generator
General information about this role: Generates summaries (tables) from generated results.

Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
b470a2c7-f610-42ca-a159-5c5a5529a4a4
2043391654
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken

TF/IDF
Provides access to the term frequency/inverse document frequency statistics.
de.uni_koeln.spinfo.tesla.roles.document.statistics.TfIdf
b447d2a9-8c38-4ee3-bfd7-0eebbbe77b59
402173111
de.uni_koeln.spinfo.tesla.roles.document.access.ITfIdfAccessAdapter
de.uni_koeln.spinfo.tesla.roles.document.data.IFrequencies

Number of types (with the best log-likelihood values) to export to the result table
100 false

If false, this component will be executed whenever it is used in an experiment. If true, the annotations produced earlier by this component will be reused, provided the execution prerequisites did not change.
true false

jhermes
hermesj@uni-koeln.de
spinfo

Calculates log-likelihood values (http://en.wikipedia.org/wiki/Likelihood_function) for types of individual documents in relation to all documents of the experiment.
/LogLikelihood

Merkels RE LL
Experiment Description
jhermes
hermesj@uni-koeln.de
none
http://texperimentales.hypotheses.org
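
The description of the DocumentLogLikelihoodCalculator does not state which formula it uses. As a rough illustration only, the sketch below computes the two-term log-likelihood statistic (G2) that is common in corpus comparison: the frequency of a type in one document is compared with its frequency in a reference corpus. The class name LogLikelihoodSketch, the method names, and the choice of reference corpus are assumptions for illustration, not the component's actual implementation.

```java
// Hypothetical illustration; not part of Tesla.
class LogLikelihoodSketch {

    /** One summand of G2; 0 * ln(0) is treated as 0 by convention. */
    static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }

    /**
     * Two-term log-likelihood (G2) for a single type.
     *
     * @param a occurrences of the type in the document
     * @param c total number of tokens in the document
     * @param b occurrences of the type in the reference corpus
     * @param d total number of tokens in the reference corpus
     */
    static double logLikelihood(long a, long c, long b, long d) {
        // Expected counts if the type were equally frequent in both samples.
        double e1 = (double) c * (a + b) / (c + d);
        double e2 = (double) d * (a + b) / (c + d);
        return 2.0 * (term(a, e1) + term(b, e2));
    }

    public static void main(String[] args) {
        // Same relative frequency in both samples.
        System.out.println(logLikelihood(10, 1000, 10, 1000));
        // The type is much more frequent in the document than in the reference.
        System.out.println(logLikelihood(30, 1000, 10, 9000));
    }
}
```

Under this variant, a type with the same relative frequency in the document and in the reference corpus gets a value of zero, and larger values indicate a stronger deviation. Ranking the types of a document by this value and keeping the top entries would correspond to the "number of types to export" parameter above.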