
Note

This page describes programs that will be made available upon the second release of the ANC. Beta test versions of the programs can be downloaded from here.


ANC Tools


Since no tools currently support standoff annotations as used in the ANC, we provide several tools to convert the document content and standoff annotations into XML documents. We also provide several Gate processing resources that enable Gate to load and save ANC documents.

Tools For Standoff Annotations in Gate

Gate was used extensively during the development of the ANC. Since Gate represents documents as annotation graphs, it is a natural companion to the ANC. These plugins permit Gate to load ANC documents just like any other document type.

org.xces.creole.LoadStandoff

A processing resource that loads standoff annotations from a cesAna document and adds them to the selected document.

org.xces.gate.XCESDocument

A language resource that allows Gate to load the document content and one or more of the standoff annotations at the same time. This resource will appear in Gate under "Language Resource" -> "new" -> "XCES Document".

org.xces.creole.SaveContent

A processing resource that can be used to write the text content of a document to a file.

org.xces.creole.SaveStandoff

A processing resource that will save selected annotations to a standoff annotation (cesAna) XML file.

Tools for Converting the ANC to XML

We also provide a SAX-like parser, and a graphical front end to it, that can be used to generate XML documents from the ANC files. Care should be taken, however: the parser will produce invalid XML if asked to merge edge sets that contain overlapping annotations, and it is up to the user to ensure that the content and annotations form a valid XML document. The logical markup is always loaded with the content and should always form a valid XML document. The other standoff annotations are optional, but only one set of part-of-speech tags should be loaded at a time, since the token annotations of different sets are not guaranteed to be free of overlaps. These tools should be considered proof-of-concept quality rather than production-grade programs.
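Because the user is responsible for making sure the merged annotations nest properly, it can help to check a set of spans before handing it to the parser. The following is a minimal sketch of such a check, not part of the ANC tools: the class `SpanNesting` and its `Span` type are hypothetical, and it assumes each standoff annotation has already been reduced to a pair of character offsets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper for illustration; not shipped with the ANC tools. */
final class SpanNesting {

    /** A standoff annotation reduced to its character offsets [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns true if every pair of spans is either disjoint or nested,
     * which is the condition for the spans to be expressible as XML elements.
     */
    static boolean nestsCleanly(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        // Sort by start ascending, then by end descending so that
        // an enclosing span is visited before the spans it contains.
        sorted.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(b.end, a.end));

        Deque<Span> open = new ArrayDeque<>();
        for (Span s : sorted) {
            // Close every span that ends at or before the start of this one.
            while (!open.isEmpty() && open.peek().end <= s.start) {
                open.pop();
            }
            // The innermost span still open must fully contain the new span;
            // otherwise the two cross, as in <a><b></a></b>.
            if (!open.isEmpty() && s.end > open.peek().end) {
                return false;
            }
            open.push(s);
        }
        return true;
    }
}
```

A set that fails this check, for example two part-of-speech tokenizations whose tokens cross each other, is the kind of input that would lead to invalid XML when merged.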

xces_parsers.jar

This is a package that provides a JAXP-like set of classes, including implementations of SAXParserFactory, SAXParser, and XMLReader. Note: this package does not implement the full JAXP API, just enough to accomplish simple tasks. However, it is robust enough to be used as the "XML parser" with Saxon to apply XSLT style sheets to ANC documents.
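The general JAXP pattern for feeding a custom XMLReader to an XSLT processor looks roughly like the sketch below. It uses only standard JAXP classes; the class `ReaderToXslt` is hypothetical, and the JDK's default parser stands in for the ANC reader because the class names inside xces_parsers.jar are not given here. Whether the ANC classes can be dropped in exactly this way is an assumption.

```java
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical helper for illustration; not shipped with the ANC tools. */
final class ReaderToXslt {

    /**
     * Runs the transformer over the SAX events produced by the given reader
     * and returns the serialized result as a string.
     */
    static String transform(XMLReader reader, InputSource input, Transformer transformer)
            throws Exception {
        StringWriter out = new StringWriter();
        // SAXSource lets the XSLT processor pull events from any XMLReader,
        // so the reader does not have to be the processor's built-in parser.
        transformer.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    /**
     * Stand-in reader: the JDK's default parser. An XMLReader obtained from
     * the ANC parser package would be passed to transform() instead
     * (assumption: it behaves as a regular XMLReader for this purpose).
     */
    static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }
}
```

To apply a style sheet rather than copy the input, the transformer would be created with `TransformerFactory.newInstance().newTransformer(new StreamSource(stylesheetFile))`; with Saxon on the classpath, the same code can run against Saxon's TransformerFactory.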

anc_tool.jar

This is a graphical front end to the above parsers that can be used to preprocess the ANC files into XML documents.

Xoro

Xoro is a simple scripting language for Gate that was written during the development of the ANC. Xoro is not a deliverable of the ANC; however, it has proved so useful that we decided to make it available. Please note that Xoro is still at a very early stage of development. You can find more information on Xoro on our Xoro page.