text_extraction.py Functions
- text_extraction.create_annotation_entry(begin_pos=-1, begin_pos_mapped=None, end_pos=-1, end_pos_mapped=None, raw_text=None, pivot_attr=None, pivot_value=None, parity=None, tag_name=None)
- text_extraction.extract_annotations(ingest_file, namespaces, document_data, patterns, skip_chars=None, out_file=None)
- text_extraction.extract_annotations_brat_standoff(ingest_file, offset_mapping, type_prefix, tag_name, line_type, optional_attributes=[], normalization_engines=[])
- text_extraction.extract_annotations_csv(csv_file, delimiter, tag_name, begin_column=None, end_column=None, text_column=None, optional_attributes=[])
- text_extraction.extract_annotations_json(ingest_file, raw_content, offset_mapping, annotation_path, tag_name, begin_attribute=None, end_attribute=None, optional_attributes=[], normalization_engines=[])
- text_extraction.extract_annotations_plaintext(offset_mapping, raw_content, delimiter, tag_name)
- text_extraction.extract_annotations_semeval_pipes(ingest_file, offset_mapping, tag_name, optional_attributes=[])
- text_extraction.extract_annotations_tsv(tsv_file, raw_content, offset_mapping, tag_name, optional_attributes=[])
- text_extraction.extract_annotations_xml(ingest_file, offset_mapping, annotation_path, tag_name, namespaces={}, begin_attribute=None, end_attribute=None, text_attribute=None, optional_attributes=[], normalization_engines=[])
- text_extraction.extract_annotations_xml_spanless(ingest_file, annotation_path, tag_name, pivot_attribute, parity, namespaces={}, text_attribute=None, optional_attributes=[])
- text_extraction.extract_brat_event(ingest_file, annot_line, tag_name, optional_attributes=[])
- text_extraction.extract_brat_normalization(ingest_file, annot_line, normalization_engines=[])
- text_extraction.extract_brat_relation(ingest_file, annot_line, tag_name, optional_attributes=[])
- text_extraction.extract_brat_text_bound_annotation(ingest_file, annot_line, offset_mapping, tag_name, line_type, optional_attributes=[])
- text_extraction.map_position(offset_mapping, position, direction)
Convert a character position to the closest non-skipped position.
Use the offset mapping dictionary to convert a position to the closest valid character position. The mapping takes a direction because a start position should resolve to the closest valid position on its right, and an end position to the closest valid position on its left.
Parameters:
- offset_mapping – a dictionary mapping character positions to None if the character is in the skip list, or to an int otherwise
- position – current character position
- direction – 1 if moving right; -1 if moving left
Returns: the character position that results when all skipped characters are removed from the document and positions are re-assigned, or None on KeyError
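The behavior described for map_position can be illustrated with a minimal sketch. This is not the module's actual source: build_offset_mapping and map_position_sketch are hypothetical names introduced here, and the sketch assumes the mapping is keyed by int positions and that a skipped position is resolved by stepping one character at a time in the given direction.

```python
def build_offset_mapping(text, skip_chars):
    """Hypothetical helper: map each original position to its position
    after skipped characters are removed, or to None if it is skipped."""
    mapping = {}
    next_pos = 0
    for i, ch in enumerate(text):
        if ch in skip_chars:
            mapping[i] = None
        else:
            mapping[i] = next_pos
            next_pos += 1
    return mapping


def map_position_sketch(offset_mapping, position, direction):
    """Illustrative stand-in for map_position (assumed behavior).

    Steps in `direction` (1 = right, -1 = left) until a non-skipped
    position is found; returns None if the walk leaves the mapping."""
    try:
        while offset_mapping[position] is None:
            position += direction
        return offset_mapping[position]
    except KeyError:
        return None


text = "ab  cd"
mapping = build_offset_mapping(text, {" "})

# A begin position that lands on a skipped space moves right to "c".
begin = map_position_sketch(mapping, 2, 1)     # -> 2
# An end position that lands on a skipped space moves left to "b".
end = map_position_sketch(mapping, 3, -1)      # -> 1
# A position outside the mapping raises KeyError internally.
missing = map_position_sketch(mapping, 99, 1)  # -> None
```

Here the two spaces at positions 2 and 3 are skipped, so the same gap resolves to a different mapped position depending on whether a span start or a span end is being mapped.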