Wednesday, 19 December 2012

Apache UIMA HMM Tagger FR Models

Download here: models for the Apache UIMA Hidden Markov Model Tagger Annotator [1] (from the UIMA Addons sandbox)

The models concern the following tasks:
* Part of speech tagging (POS)
* Grammatical subcategorization (Subcat)
* Morphological inflection analysis (Mph)
* Lemmatization (canonical form)
* Ee analysis (POS + Subcat + Mph)

Models have been built with version 2.4 of the addon, using the French Treebank corpus [2] (2010 version). The French Treebank licence does not prevent distributing analysis results under any licence, but it states that the corpus should be used for research purposes only. Consequently, we restrict the use of these models to research purposes only.

To get the '.dat' files, unzip the archive and look in the '/HMMTrainerTagger/french/' directory.
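If you only want the model files rather than the whole archive, the step can be scripted. The following is a minimal Python sketch, assuming the download is a standard zip file; the function name `extract_dat_models` and the output directory are hypothetical, and only the 'HMMTrainerTagger/french/' path and the '.dat' extension come from the instructions above. It matches the directory anywhere in the member path, in case the archive wraps everything in an extra top-level folder.

```python
import os
import zipfile


def extract_dat_models(archive_path, out_dir, model_dir="HMMTrainerTagger/french/"):
    """Copy every '.dat' file found under model_dir in the zip into out_dir.

    Hypothetical helper: returns the sorted list of extracted file paths.
    """
    extracted = []
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Zip member names always use '/' as separator, so a substring
            # test also works when the archive has a top-level folder.
            if model_dir in name and name.endswith(".dat"):
                # Keep only the base name: flat output, no path traversal.
                target = os.path.join(out_dir, os.path.basename(name))
                with zf.open(name) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                extracted.append(target)
    return sorted(extracted)
```

For example, `extract_dat_models("downloaded-models.zip", "models")` (both names are placeholders) writes the tagger models into a flat `models` directory, ready to be referenced from the annotator's configuration.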

[1] http://uima.apache.org/sandbox.html#tagger.annotator
[2] For more on the French Treebank, see Abeille, A., L. Clement, and F. Toussenel. 2003. "Building a treebank for French", in A. Abeille (ed.), Treebanks, Kluwer, Dordrecht. http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

Apache OpenNLP FR Models


Download here: the latest version of the models for several common Natural Language Processing tasks in French with Apache OpenNLP [1]

The models concern the following tasks: Sentence segmentation, Word tokenization, Part-of-Speech tagging, Morphological inflection analysis*, Lemmatization*, Chunking, and Person/Organization/Location Named Entity recognition**.

Except for the Named Entity models, the models have been built using the French Treebank corpus [2] (2010 version). Its licence does not prevent distributing analysis results under any licence, but it states that the corpus should be used for research purposes only. Consequently, we restrict the use of these models to research purposes only.


* To be used with the tagger 

** The Named Entity (Name Finder) models were built by Olivier Grisel. See [3] for more details.

To get the '.bin' files, unzip the archive and look in the '/opennlp/models/fr/' directory.
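Once the archive is unzipped, a short script can confirm which OpenNLP models are available before wiring them into a pipeline. This is a minimal Python sketch, assuming the archive has already been extracted somewhere on disk; the function name `find_opennlp_fr_models` is hypothetical, and the 'opennlp/models/fr' path is the only detail taken from the text above. It does not guess which file serves which task, since the file names are not listed here.

```python
from pathlib import Path


def find_opennlp_fr_models(unzipped_root):
    """Return {file name: path} for every '.bin' file in an opennlp/models/fr dir.

    Hypothetical helper: the search is recursive, so an extra top-level
    folder created by the unzip step does not matter.
    """
    models = {}
    for path in Path(unzipped_root).rglob("*.bin"):
        # Keep only files whose parent directory ends with opennlp/models/fr.
        if path.parent.parts[-3:] == ("opennlp", "models", "fr"):
            models[path.name] = path
    return dict(sorted(models.items()))
```

Each returned path can then be opened as a stream and handed to the matching OpenNLP model class (for instance `SentenceModel`, `TokenizerModel` or `POSModel` on the Java side).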

[2] For more on the French Treebank, see Abeille, A., L. Clement, and F. Toussenel. 2003. "Building a treebank for French", in A. Abeille (ed.), Treebanks, Kluwer, Dordrecht. http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

UIMA AS (Asynchronous Scaleout) Tutorial?


At the bottom of the "Getting Started: Apache UIMA Asynchronous Scaleout" page [1], you will find the following note:
See the README file [3] in the top level directory for instructions on deploying and testing standard UIMA example annotators as UIMA AS services.
For example, download the binary distribution of UIMA AS [2]; you will find the README in its top-level directory. It covers installation and setup, and gives examples of starting the ActiveMQ broker, deploying an Analysis Engine as a UIMA AS asynchronous service, calling a UIMA AS asynchronous service, migrating from CPM to UIMA AS, and more.
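Since deploying and calling services both depend on a running broker, it can help to check that the broker is up first. The following is a minimal Python sketch, assuming the broker listens on ActiveMQ's default OpenWire port 61616 on the local machine; the function name `broker_is_reachable` is hypothetical. It only tests that a TCP connection succeeds, not that the listener is really an ActiveMQ broker.

```python
import socket


def broker_is_reachable(host="localhost", port=61616, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: 61616 is ActiveMQ's default OpenWire port; change
    it if your broker configuration uses another one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host or timeout.
        return False
```

Running `broker_is_reachable()` before the deployment scripts gives a quick hint as to whether a failure comes from the broker or from the service descriptor.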

Again, here is the direct link to this README file [3].

[1]: http://uima.apache.org/doc-uimaas-what.html
[2]: http://uima.apache.org/downloads.cgi
[3]: http://svn.apache.org/repos/asf/uima/uima-as/trunk/README