Input Formats & Support Details
Simple Plain Text
Structured Plain Text (e.g., csv)
brat Annotation
The brat rapid annotation tool generates the brat standoff format. Annotations are stored in a secondary file (*.ann), while the original text is kept in a plain text file (*.txt). This standoff format uses character offsets to locate spans: “All offsets all [sic] indexed from 0 and include the character at the start offset but exclude the character at the end offset.” See the BioNLP Shared Task standoff format for a related format.
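The offset convention quoted above matches Python's half-open slicing, so a span can be recovered directly with `text[start:end]`. The following is a minimal sketch, not the extraction engine's own code: the helper name `parse_text_bound`, the sample sentence, and the `Allergy` type are all made up for illustration, and the parser assumes a continuous text-bound line of the form `ID<TAB>Type start end<TAB>covered text`.

```python
def parse_text_bound(line):
    """Parse one continuous text-bound line: 'T1<TAB>Type start end<TAB>text'.

    Hypothetical helper for illustration; it does not handle
    discontinuous spans or any other annotation kind.
    """
    ann_id, type_and_span, covered = line.rstrip("\n").split("\t")
    ann_type, start, end = type_and_span.split(" ")
    return ann_id, ann_type, int(start), int(end), covered


# Stand-ins for the contents of a *.txt file and one line of its *.ann file.
text = "Patient denies penicillin allergy."
line = "T1\tAllergy 15 25\tpenicillin"

ann_id, ann_type, start, end, covered = parse_text_bound(line)

# The start offset is inclusive and the end offset is exclusive,
# so a plain slice reproduces the covered text.
assert text[start:end] == covered
```

Because the end offset is exclusive, `end - start` is also the length of the span in characters.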
Limitations: The extraction engine currently handles only continuous text-bound annotations for evaluation. Binary attributes can be extracted and included in the evaluation dictionary but are not scored themselves. Discontinuous text-bound annotations, relations, events, multi-value attributes, normalizations, and notes are not supported.
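To see which lines of a given *.ann file fall under these limitations, the annotation kinds can be told apart by their ID prefix. The sketch below is illustrative only and is not how the extraction engine is implemented. The function name `classify_line` and its labels are assumptions. It relies on the brat standoff conventions that text-bound IDs start with `T`, attributes with `A` (or the legacy `M`), relations with `R` (or `*` for equivalences), events with `E`, normalizations with `N`, and notes with `#`, and that discontinuous spans separate their fragments with `;`.

```python
def classify_line(line):
    """Return a coarse label for one brat .ann line (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    ann_id = fields[0]
    body = fields[1] if len(fields) > 1 else ""

    if ann_id.startswith("T"):
        # Discontinuous spans list several fragments, e.g. "Problem 0 5;16 23".
        if ";" in body:
            return "discontinuous text-bound"
        return "continuous text-bound"

    if ann_id.startswith(("A", "M")):
        # "A1<TAB>Negated T1" is binary; a third token carries a value.
        if len(body.split(" ")) == 2:
            return "binary attribute"
        return "multi-value attribute"

    kinds = {
        "R": "relation",
        "*": "relation",  # equivalence relations use '*' as their ID
        "E": "event",
        "N": "normalization",
        "#": "note",
    }
    return kinds.get(ann_id[:1], "unknown")
```

Under the limitations described above, only lines labelled `continuous text-bound` would be scored, and lines labelled `binary attribute` would be carried along without being scored.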
Local sample configuration files (under config/):
- brat_problems_allergies_standoff.conf
XML Formats
UIMA CAS XMI
Local sample configuration files (under config/):
- CAS_XMI.conf
- i2b2_2016_track-1.conf
- uima_sentences.conf
- webanno_phi_xmi.conf
- webanno_problems_allergies_xmi.conf
- webanno_uima_xmi.conf
Other
Extra sample configuration files (via the ETUDE engine configs repository):
- i2b2/…
- n2c2/n2c2_2018_track-1.conf