What is it all about?
J-Safran (Java Syntaxico-semantic French Analyser) is a free, open-source,
100%-Java tool for manual, semi-automatic and automatic
syntactic dependency parsing in French. It supports other languages as
well, but only includes French models.
- Fast GUI for browsing and manually editing dependency
trees (everything can be done from the keyboard, with many keyboard
shortcuts for faster editing, but the mouse and menus are also supported)
- Includes state-of-the-art parsers with associated models (Malt Parser, MATE tools) and taggers (Treetagger, OpenNLP)
- Supports semi-automatic parsing: (a) pre-parse the whole corpus; (b) edit/correct dependency trees; (c) constrained re-parsing that preserves the previously edited links (only available with MATE for now)
- Supports multi-level annotations with an unlimited number of levels: e.g., dependency tree, semantic role labeling, coreference resolution, ...
- Standard I/O formats: text, CoNLL'06, CoNLL'08, CoNLL'09
- Smooth interface with JTrans, which lets you associate an audio speech file with the transcription you are editing, align the audio and the transcription automatically or semi-automatically, and listen to any segment you want to annotate. This greatly helps to disambiguate syntactic structures thanks to prosodic cues.
- Supports annotation with Directed Acyclic Graphs (DAG) in addition to trees
- Supports annotation of labeled word sequences, e.g., for chunks, named entities, ...
- Supports a rule-based semantic role labeler for verbs that are tagged and parsed with the French Treebank annotation schema.
- Includes a powerful rule-based query and transformation language, which is useful for implementing rule-based annotations, automatically transforming the dependency trees from one annotation schema to another, ... Simple standard rules/expressions are automatically built for the current word sequences you are editing.
- ... and many more!
Unfortunately, there is no up-to-date
documentation for JSafran, because it is constantly evolving and
improving over time (as of 2011). So, if you're interested, please
contact me directly, I'll be glad to hear from you...
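
To give an idea of what the CoNLL'06 format listed above looks like in practice, here is a minimal sketch of a reader for one sentence. It is not JSafran's own code: the class `ConllSketch`, its `Token` type and the sample sentence with its labels are hypothetical and only serve as an illustration. It assumes the usual CoNLL'06 layout, with one token per line and ten tab-separated columns, where column 7 holds the head index (0 for the root) and column 8 the dependency label, and it assumes the head column is always filled with an integer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConllSketch {
    /** One token of a CoNLL'06 sentence; only the columns used here are kept. */
    public static final class Token {
        public final int id;        // column 1: 1-based position in the sentence
        public final String form;   // column 2: word form
        public final int head;      // column 7: id of the governor, 0 for the root
        public final String deprel; // column 8: dependency label

        Token(int id, String form, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.head = head;
            this.deprel = deprel;
        }
    }

    /** Parses the lines of a single sentence; blank lines are skipped. */
    public static List<Token> parseSentence(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines separate sentences in CoNLL files
            }
            String[] cols = line.split("\t");
            if (cols.length < 8) {
                throw new IllegalArgumentException("Expected at least 8 columns: " + line);
            }
            tokens.add(new Token(Integer.parseInt(cols[0]), cols[1],
                                 Integer.parseInt(cols[6]), cols[7]));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative sentence only; the tags and labels are assumptions.
        List<String> sentence = Arrays.asList(
            "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_",
            "2\tchat\tchat\tN\tNC\t_\t3\tsuj\t_\t_",
            "3\tdort\tdormir\tV\tV\t_\t0\troot\t_\t_");
        for (Token t : parseSentence(sentence)) {
            System.out.println(t.form + " -> head " + t.head + " (" + t.deprel + ")");
        }
    }
}
```

Keeping only the head index and the label per token is enough to rebuild the dependency tree; the remaining columns (lemma, tags, features) can be added to `Token` in the same way if needed.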
Related resource: speech treebank for French
The main motivation for starting the development of JSafran in
2008 was to create a French speech treebank based on the ESTER
broadcast news corpus. We have annotated 50,000 words so far.
This is not much, but it is the largest manually parsed French treebank
of broadcast news speech we are aware of (please contact me if you know
of similar treebanks, I'll be grateful). It allowed us to train Malt and
MATE parsing models that reach a labeled attachment score (LAS) of 77%, which is far better
than what can be achieved when adapting a written-text parser (trained
for instance on the French Treebank) to the ESTER corpus (we then get
less than 60%). We're still working in this direction to improve and
enlarge our treebank.
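
For readers unfamiliar with the LAS figure quoted above, the sketch below shows the standard definition of the labeled attachment score: the fraction of tokens whose predicted head and predicted dependency label both match the gold annotation. The class `LasSketch` and the toy data are hypothetical, and the exact evaluation protocol behind the 77% figure (for instance how punctuation or disfluencies are counted) is not stated here, so this should be read as an illustration of the metric only.

```java
public class LasSketch {
    /**
     * Labeled attachment score: share of tokens with both the correct head
     * and the correct dependency label. All arrays are indexed by token.
     */
    public static double las(int[] goldHeads, String[] goldLabels,
                             int[] predHeads, String[] predLabels) {
        int n = goldHeads.length;
        if (n == 0 || goldLabels.length != n
                || predHeads.length != n || predLabels.length != n) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < n; i++) {
            if (goldHeads[i] == predHeads[i] && goldLabels[i].equals(predLabels[i])) {
                correct++;
            }
        }
        return (double) correct / n;
    }

    public static void main(String[] args) {
        // Toy example (assumed data): token 2 has a wrong label, token 4 a wrong head.
        int[] goldHeads = {2, 3, 0, 3};
        String[] goldLabels = {"det", "suj", "root", "mod"};
        int[] predHeads = {2, 3, 0, 2};
        String[] predLabels = {"det", "obj", "root", "mod"};
        System.out.println("LAS = " + las(goldHeads, goldLabels, predHeads, predLabels));
    }
}
```

In this toy example two of the four tokens are fully correct, so the score is 0.5. A token with the right head but the wrong label counts as an error for LAS, which is what makes it stricter than the unlabeled attachment score.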