sasctl.services.text_parsing

class sasctl._services.text_parsing.TextParsing

Bases: Service

The Text Parsing API parses natural language text documents.

Parsing is a key operation in understanding your data. Parsing a document involves the following analyses:

  • Identifying terms used in the document

  • Recognizing parts of speech for each term

  • Identifying which terms are entities (person, country, and so on)

  • Resolving synonyms, misspellings, and so on

The output tables that are generated during parsing can also be used in downstream analyses such as topic generation.

Methods

delete(*args, **kwargs)

Send a DELETE request.

get(*args, **kwargs)

Send a GET request.

get_link(obj, rel)

Get link information from a resource.

head(*args, **kwargs)

Send a HEAD request.

info()

Version and build information for the service.

is_available()

Check if the service is currently available.

parse_documents(documents[, caslib, ...])

Performs natural language parsing on the input data.

post(*args, **kwargs)

Send a POST request.

put(*args, **kwargs)

Send a PUT request.

request(verb, path[, session, format_])

Send an HTTP request with a session.

request_link(obj, rel, **kwargs)

Request a link from a resource.

is_uuid
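The get_link and request_link helpers above work with the links that SAS REST resources carry. The sketch below is a simplified, hypothetical illustration of looking up a link by its rel value. It assumes a resource is a dict with a "links" list whose entries each have "rel" and "href" keys. It is not sasctl's implementation, and the real get_link may behave differently, for example when several links share the same rel.

```python
from typing import Any, Dict, List, Optional


def find_link(resource: Dict[str, Any], rel: str) -> Optional[Dict[str, Any]]:
    """Return the first link in resource["links"] whose "rel" equals rel, else None.

    Hypothetical helper: assumes each link is a dict with at least "rel" and "href".
    """
    links: List[Dict[str, Any]] = resource.get("links", [])
    for link in links:
        if link.get("rel") == rel:
            return link
    return None


# Made-up job resource; the id and hrefs are placeholders, not real server output.
job = {
    "id": "example-job-id",
    "links": [
        {"rel": "self", "method": "GET", "href": "/parsing/jobs/example-job-id"},
        {"rel": "state", "method": "GET", "href": "/parsing/jobs/example-job-id/state"},
    ],
}

print(find_link(job, "state")["href"])  # /parsing/jobs/example-job-id/state
print(find_link(job, "delete"))  # None
```

Returning None for a missing rel lets callers check whether an action is currently offered by the resource before attempting it.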

classmethod parse_documents(documents, caslib=None, id_column=None, text_column=None, description=None, standard_entities=False, noun_groups=False, min_doc_count=10, concept_model=None, output_postfix=None, spell_check=False, override_list=None, stop_list=None, start_list=None, synonym_list=None, language='en')

Performs natural language parsing on the input data.

Creates a text parsing job that executes asynchronously. Two interactions are supported: parsing documents already stored in a CAS table, and parsing documents that are uploaded directly.

Parameters:
documents : str or dict or list-like

Documents to parse. May be either the URI to a CAS table where the documents are currently stored, or an iterable of strings containing the documents’ text.

caslib : str or dict, optional

URI of a caslib in which the documents will be stored. Required if documents is a list of strings.

id_column : str, optional

The column in documents that contains a unique ID for each document. Required if documents is a CAS table URI.

text_column : str, optional

The column in documents that contains the document text to parse. Required if documents is a CAS table URI.

description : str, optional

Description to add to the text parsing job.

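standard_entities : bool, optional

Whether to identify standard entities (person, country, and so on) during parsing. Defaults to False.

noun_groups : bool, optional

Whether to identify noun groups during parsing. Defaults to False.

min_doc_count : int, optional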

Minimum number of documents in which a term must appear to be kept. Defaults to 10.

output_postfix : str, optional

Text to be added to the end of all output table names.

spell_check : bool, optional

Whether spell checking should be performed during parsing.

concept_model : str or dict, optional

URI of a table containing the concept LITI binaries to apply during parsing.

override_list : str or dict, optional

URI of a table containing overrides for the keep and drop terms.

language : str, optional

Two-letter ISO 639-1 code indicating the source language. Defaults to 'en'.

Returns:
RestObj

The submitted job
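Because the job runs asynchronously, the returned RestObj describes a submitted job, not its finished output. The sketch below polls until the job reaches a terminal state. Both fetch_state (a caller-supplied callable that re-reads the job and returns its state string) and the state names in TERMINAL_STATES are assumptions; check the SAS Text Parsing REST API reference for the values your server actually reports.

```python
import time
from typing import Callable

# Assumed terminal job states; verify against the Text Parsing REST API reference.
TERMINAL_STATES = frozenset({"completed", "failed", "canceled"})


def wait_for_job(
    fetch_state: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Call fetch_state() until it returns a terminal state, then return that state.

    fetch_state is a hypothetical callable supplied by the caller.
    Raises TimeoutError if no terminal state is seen before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Simulated job that moves through two non-terminal states before finishing.
states = iter(["pending", "running", "completed"])
print(wait_for_job(lambda: next(states), poll_interval=0))  # completed
```

Against a live server, fetch_state could wrap a call such as text_parsing.request_link(job, 'self') and read the state from the response, assuming the job exposes a self link and a state field. Injecting the callable keeps the polling loop testable without a server.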

See also

cas_management.get_caslib
cas_management.get_table
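Several parameters are required for only one of the two interactions. The hypothetical helper below encodes the rules stated in the parameter descriptions as a client-side pre-flight check before calling parse_documents. Treating any str or dict as a CAS table reference and anything else as uploaded text is an assumption inferred from the documented types; parse_documents itself may validate differently.

```python
from typing import Any, Optional


def check_parse_args(
    documents: Any,
    caslib: Optional[Any] = None,
    id_column: Optional[str] = None,
    text_column: Optional[str] = None,
    language: str = "en",
) -> str:
    """Hypothetical pre-flight check mirroring the documented parameter rules.

    Returns "table" or "upload" to name the interaction that applies,
    or raises ValueError describing the first unmet requirement.
    """
    if documents is None:
        raise ValueError("documents is required")
    # ISO 639-1 codes are two lowercase letters, e.g. "en".
    if len(language) != 2 or not (language.isalpha() and language.islower()):
        raise ValueError("language must be a two-letter ISO 639-1 code")
    # Assumption: a str or dict refers to an existing CAS table.
    if isinstance(documents, (str, dict)):
        if not id_column or not text_column:
            raise ValueError("id_column and text_column are required for a CAS table")
        return "table"
    # Assumption: anything else is an iterable of document strings to upload.
    if caslib is None:
        raise ValueError("caslib is required when uploading documents directly")
    return "upload"


# Made-up URIs, for illustration only.
table_uri = "/casManagement/servers/cas-shared-default/caslibs/Public/tables/REVIEWS"
caslib_uri = "/casManagement/servers/cas-shared-default/caslibs/Public"

print(check_parse_args(table_uri, id_column="id", text_column="text"))  # table
print(check_parse_args(["first document", "second document"], caslib=caslib_uri))  # upload
```

If the check passes, the actual call would be made inside an active sasctl Session, for example text_parsing.parse_documents(docs, caslib=caslib_uri).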