sasctl.services.text_parsing
- class sasctl._services.text_parsing.TextParsing
Bases: Service
The Text Parsing API parses natural language text documents.
Parsing is a key operation in understanding your data. Parsing a document involves the following analyses:
- Identifying terms used in the document
- Recognizing parts of speech for each term
- Identifying which terms are entities (person, country, and so on)
- Resolving synonyms, misspellings, and so on
The output tables that are generated during parsing can also be used in downstream analyses such as topic generation.
Methods
- delete(*args, **kwargs): Send a DELETE request.
- get(*args, **kwargs): Send a GET request.
- get_link(obj, rel): Get link information from a resource.
- head(*args, **kwargs): Send a HEAD request.
- info(): Version and build information for the service.
- is_available(): Check if the service is currently available.
- parse_documents(documents[, caslib, ...]): Performs natural language parsing on the input data.
- post(*args, **kwargs): Send a POST request.
- put(*args, **kwargs): Send a PUT request.
- request(verb, path[, session, format_]): Send an HTTP request with a session.
- request_link(obj, rel, **kwargs): Request a link from a resource.
- is_uuid
- classmethod parse_documents(documents, caslib=None, id_column=None, text_column=None, description=None, standard_entities=False, noun_groups=False, min_doc_count=10, concept_model=None, output_postfix=None, spell_check=False, override_list=None, stop_list=None, start_list=None, synonym_list=None, language='en')
Performs natural language parsing on the input data.
Creates a text parsing job that executes asynchronously. Parsing supports two modes: parsing documents that are already stored in a CAS table, and parsing documents that are uploaded directly.
- Parameters:
- documents : str or dict or list_like
Documents to parse. May be either the URI to a CAS table where the documents are currently stored, or an iterable of strings containing the documents’ text.
- caslib : str or dict, optional
URI of a caslib in which the documents will be stored. Required if documents is a list of strings.
- id_column : str, optional
The column in documents that contains a unique id for each document. Required if documents is a CAS table URI.
- text_column : str, optional
The column in documents that contains the document text to parse. Required if documents is a CAS table URI.
- description : str, optional
Description to add to the text parsing job.
- standard_entities : bool, optional
- noun_groups : bool, optional
- min_doc_count : int, optional
Minimum number of documents in which a term must appear to be kept. Defaults to 10.
- output_postfix : str, optional
Text to be added to the end of all output table names.
- spell_check : bool, optional
Whether spell checking should be performed during parsing.
- concept_model : str or dict, optional
URI of a table containing the concept LITI binaries to apply during parsing.
- override_list : str or dict, optional
URI of a table containing overrides for the keep and drop terms.
- language : str, optional
Two-letter ISO 639-1 code indicating the source language. Defaults to 'en'.
- Returns:
- RestObj
The submitted text parsing job.
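Example

The argument requirements described for parse_documents can be checked before a job is submitted. The following is a minimal sketch and not part of sasctl: `build_parse_kwargs` and `submit_parse_job` are hypothetical helpers, the table URI and column names are placeholders, and treating any `str` or `dict` as a reference to a CAS table is an assumption drawn from the parameter descriptions. The sasctl import is deferred so that the validation helper can run without a SAS Viya connection.

```python
def build_parse_kwargs(documents, caslib=None, id_column=None,
                       text_column=None, **options):
    """Hypothetical helper: validate the documented argument combinations."""
    if isinstance(documents, (str, dict)):
        # Assumption: a str or dict refers to an existing CAS table.
        missing = [name for name, value in
                   (("id_column", id_column), ("text_column", text_column))
                   if value is None]
        if missing:
            raise ValueError(
                "Required for a CAS table: " + ", ".join(missing))
    else:
        # Otherwise treat the input as an iterable of document strings.
        documents = list(documents)
        if caslib is None:
            raise ValueError(
                "caslib is required when documents is a list of strings")

    kwargs = dict(documents=documents, caslib=caslib,
                  id_column=id_column, text_column=text_column, **options)
    # Drop unset values so parse_documents falls back to its own defaults.
    return {key: value for key, value in kwargs.items() if value is not None}


def submit_parse_job(hostname, username, password, **kwargs):
    """Hypothetical wrapper: open a session and submit the parsing job."""
    # Imported here so build_parse_kwargs works without sasctl installed.
    from sasctl import Session
    from sasctl.services import text_parsing

    with Session(hostname, username, password):
        return text_parsing.parse_documents(**build_parse_kwargs(**kwargs))


# Placeholder URI and column names, for illustration only.
table_kwargs = build_parse_kwargs(
    "/casManagement/servers/example-server/caslibs/Public/tables/REVIEWS",
    id_column="review_id",
    text_column="review_text",
    min_doc_count=2,
)
```

Because the job executes asynchronously, the object returned by `submit_parse_job` describes the submitted job rather than the finished parse results.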
See also
cas_management.get_caslib
cas_management.get_table
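Notes

The min_doc_count parameter of parse_documents is a document-frequency threshold: a term is kept only if it appears in at least that many documents. The sketch below illustrates the idea on a handful of strings. It is not the service's implementation: `terms_meeting_min_doc_count` is a hypothetical function, and splitting on whitespace is a stand-in for the service's own term identification.

```python
from collections import Counter


def terms_meeting_min_doc_count(documents, min_doc_count=10):
    """Return the terms that occur in at least min_doc_count documents."""
    doc_freq = Counter()
    for text in documents:
        # set() ensures a term is counted once per document,
        # no matter how often it repeats within that document.
        doc_freq.update(set(text.lower().split()))
    return {term for term, count in doc_freq.items()
            if count >= min_doc_count}


docs = ["the cat sat", "the dog sat", "the bird flew"]
kept = terms_meeting_min_doc_count(docs, min_doc_count=2)
```

With only three documents, the default threshold of 10 would discard every term, which is why small test corpora usually need a lower value.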