J-Safran : Java Syntaxico-semantic French Analyser

What is it all about?

J-Safran (Java Syntaxico-semantic French Analyser) is free, open-source, 100%-Java software for manual, semi-automatic and automatic syntactic dependency parsing of French. It supports other languages as well, but only includes French models.

Main features

  • Fast GUI for browsing and manually editing dependency trees (everything can be done from the keyboard alone, using the many keyboard shortcuts for faster editing, but mouse and menus are also supported)
  • Includes state-of-the-art parsers with associated models (MaltParser, MATE tools) and taggers (TreeTagger, OpenNLP)
  • Supports semi-automatic parsing: (a) pre-parse the whole corpus; (b) edit/correct dependency trees; (c) constrained re-parsing that preserves the previously edited links (only available with MATE for now)
  • Supports multi-level annotations with an unlimited number of levels: e.g., dependency trees, semantic role labeling, coreference resolution, ...
  • Standard I/O formats: text, CoNLL'06, CoNLL'08, CoNLL'09
  • Smooth interface with JTrans, which lets you associate an audio speech file with the transcription you are editing, align the audio and the transcription automatically or semi-automatically, and listen to any segment you want to annotate. Prosodic cues greatly help to disambiguate syntactic structures.
  • Supports annotation with Directed Acyclic Graphs (DAG) in addition to trees
  • Supports annotation of labeled word sequences, e.g., for chunks, named entities, ...
  • Supports a rule-based semantic role labeler for verbs that are tagged and parsed with the French Treebank annotation schema.
  • Includes a powerful rule-based query and transformation language, which is useful for implementing rule-based annotations, automatically transforming the dependency trees from one annotation schema to another, ... Simple standard rules/expressions are automatically built for the current word sequences you are editing.
  • ... and many more! Unfortunately, there is no up-to-date documentation for JSafran, because it is constantly evolving and improving (as of 2011). So, if you're interested, please contact me directly; I'll be glad to hear from you.
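
For readers unfamiliar with the CoNLL'06 format listed among the I/O formats above, here is a minimal Java sketch of how such a file can be read. It is not J-Safran's own reader: the class `ConllXSketch`, its `Token` type and the `parse` method are hypothetical names introduced for illustration, and the tags and labels in the sample are illustrative only. The sketch assumes the usual CoNLL'06 layout: one token per line, ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line between sentences.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of J-Safran: reads CoNLL'06 text into token lists. */
final class ConllXSketch {

    /** The subset of the ten CoNLL'06 columns needed to rebuild a dependency tree. */
    static final class Token {
        final int id;        // column 1: 1-based position in the sentence
        final String form;   // column 2: word form
        final String postag; // column 5: fine-grained part-of-speech tag
        final int head;      // column 7: id of the governor, 0 for the root
        final String deprel; // column 8: dependency label

        Token(int id, String form, String postag, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.postag = postag;
            this.head = head;
            this.deprel = deprel;
        }
    }

    /** Splits the text into sentences (blank-line separated) and tokens (one per line). */
    static List<List<Token>> parse(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim(); // also removes a trailing '\r'
            if (trimmed.isEmpty()) {
                // A blank line closes the current sentence, if any.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] cols = trimmed.split("\t");
            if (cols.length < 8) {
                throw new IllegalArgumentException("Expected at least 8 columns: " + trimmed);
            }
            current.add(new Token(Integer.parseInt(cols[0]), cols[1], cols[4],
                                  Integer.parseInt(cols[6]), cols[7]));
        }
        if (!current.isEmpty()) {
            sentences.add(current); // last sentence without a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Illustrative sample; the tags and labels are assumptions, not J-Safran output.
        String sample = String.join("\n",
            "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_",
            "2\tchat\tchat\tN\tNC\t_\t3\tsuj\t_\t_",
            "3\tdort\tdormir\tV\tV\t_\t0\troot\t_\t_",
            "");
        for (List<Token> sentence : parse(sample)) {
            for (Token t : sentence) {
                System.out.println(t.form + " --" + t.deprel + "--> " + t.head);
            }
        }
    }
}
```

Each token keeps only its head index and label, which is enough to rebuild the dependency tree. The CoNLL'08 and CoNLL'09 formats use different column layouts and would need their own column indices.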

Related resource: speech treebank for French

The main motivation for starting the development of JSafran in 2008 was to create a French speech treebank based on the ESTER broadcast news corpus. We have annotated 50,000 words so far. This is not much, but it is the largest manually parsed French treebank of broadcast news speech we are aware of (please contact me if you know of similar treebanks; I would be grateful). It allowed us to train Malt and MATE parsing models that reach a labeled attachment score (LAS) of 77%, which is far better than what can be achieved when adapting a written-text parser (trained, for instance, on the French Treebank) to the ESTER corpus (we then get less than 60%). We are still working in this direction to improve and enlarge our treebank.
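
The LAS figure quoted above is the labeled attachment score: the proportion of words whose predicted head and dependency label both match the manual annotation. The Java sketch below shows how this metric is commonly computed. It is not the evaluation code used for the ESTER experiments; the class `AttachmentScore` and its methods are hypothetical names, and it assumes every token is scored (evaluation setups differ, for example on whether punctuation counts).

```java
/** Hypothetical helper, not part of J-Safran: attachment scores over parallel arrays. */
final class AttachmentScore {

    /** Labeled attachment score: head AND label must both match the gold annotation. */
    static double las(int[] goldHeads, String[] goldLabels,
                      int[] predHeads, String[] predLabels) {
        int n = goldHeads.length;
        if (n == 0 || goldLabels.length != n
                || predHeads.length != n || predLabels.length != n) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < n; i++) {
            if (goldHeads[i] == predHeads[i] && goldLabels[i].equals(predLabels[i])) {
                correct++;
            }
        }
        return (double) correct / n;
    }

    /** Unlabeled attachment score: only the head must match. */
    static double uas(int[] goldHeads, int[] predHeads) {
        int n = goldHeads.length;
        if (n == 0 || predHeads.length != n) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < n; i++) {
            if (goldHeads[i] == predHeads[i]) {
                correct++;
            }
        }
        return (double) correct / n;
    }

    public static void main(String[] args) {
        // Toy four-token sentence; the labels are illustrative only.
        int[] goldHeads = {2, 3, 0, 3};
        String[] goldLabels = {"det", "suj", "root", "ponct"};
        int[] predHeads = {2, 3, 0, 3};
        String[] predLabels = {"det", "obj", "root", "ponct"}; // one wrong label

        System.out.println("LAS = " + las(goldHeads, goldLabels, predHeads, predLabels));
        System.out.println("UAS = " + uas(goldHeads, predHeads));
    }
}
```

In this toy case all four heads are right but one label is wrong, so three of four tokens count for LAS while all four count for the unlabeled score. On a corpus, the counts are usually accumulated over all tokens of all sentences before dividing.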