swat.cas.table.CASTable

class swat.cas.table.CASTable(name, **table_params)

Bases: swat.cas.utils.params.ParamManager, swat.cas.utils.params.ActionParamManager

Object for interacting with CAS tables

CASTable objects can be used in multiple ways. They can serve simply as containers of table parameters that are passed as CAS action parameter values. If a connection is associated with the object (either by instantiating it from CAS.CASTable() or by calling set_connection()), it can be used to call CAS actions on the table. Finally, it supports much of the pandas.DataFrame API, so you can interact with CAS tables in much the same way you interact with local data.

The parameters below are a superset of all available table parameters. Some CAS actions do not support every parameter; consult the help for each CAS action to see which ones it supports.

Parameters
name : string or CASTable

specifies the name of the table to use.

caslib : string, optional

specifies the caslib containing the table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

where : string, optional

specifies an expression for subsetting the input data.

groupby : list of dicts, optional

specifies the names of the variables to use for grouping results.

groupbyfmts : list, optional

specifies the format to apply to each group-by variable. To avoid specifying a format for a group-by variable, use “” (no format).
Default: []

orderby : list of dicts, optional

specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with groupby variables when groupbymode is set to REDISTRIBUTE.

computedvars : list of dicts, optional

specifies the names of the computed variables to create. Specify an expression for each variable in the computedvarsprogram parameter.

computedvarsprogram : string, optional

specifies an expression for each variable that you included in the computedvars parameter.

groupbymode : string, optional

specifies how the server creates groups.
Default: NOSORT
Values: NOSORT, REDISTRIBUTE

computedondemand : boolean, optional

when set to True, the computed variables specified in the computedvars parameter are created when the table is loaded instead of when the action begins.
Default: False

singlepass : boolean, optional

when set to True, no transient table is created in the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
Default: False

importoptions : dict, optional

specifies the settings for reading a table from a data source.

ondemand : boolean, optional

when set to True, table access is less aggressive with virtual memory use.
Default: True

vars : list of dicts, optional

specifies the variables to use in the action.

timestamp : string, optional

specifies the timestamp to apply to the table. Specify the value in the form that is appropriate for your session locale. Used only on output table definitions.

compress : boolean, optional

when set to True, data compression is applied to the table. Used only on output table definitions.
Default: False

replace : boolean, optional

specifies whether to overwrite an existing table with the same name. Used only on output table definitions.
Default: False

replication : int32, optional

specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Used only on output table definitions.
Default: 1
Note: Value range is 0 <= n < 2147483647

threadblocksize : int64, optional

specifies the number of bytes to use for blocks that are read by threads. Increase this value only if you have a large table and CPU utilization by threads shows thread starvation. Used only on output table definitions.
Note: Value range is 0 <= n < 9223372036854775807

label : string, optional

specifies the descriptive label to associate with the table.

maxmemsize : int64, optional

specifies the maximum amount of physical memory, in bytes, to allocate for the table. After this threshold is reached, the server uses temporary files and operating system facilities for memory management. Used only on output table definitions.
Default: 0

promote : boolean, optional

when set to True, the output table is added with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope. Used only on output table definitions.
Default: False

ondemand : boolean, optional

when set to True, table access is less aggressive with virtual memory use. Used only on output table definitions.
Default: True

Returns
CASTable

Examples

Create a CASTable registered to conn.

>>> import swat
>>> conn = swat.CAS()
>>> iris = conn.CASTable('iris')

Use the table as a CAS action parameter.

>>> summ = conn.summary(table=iris)
>>> print(summ)

Call a CAS action directly on the CASTable.

>>> summ = iris.summary()
>>> print(summ)

Use a CASTable as an output table definition.

>>> summout = conn.summary(table=iris,
...                        casout=swat.CASTable('summout', replace=True))
>>> print(summout)
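
The output table parameters described above (for example replace, compress, and label) can be collected in a dictionary and passed when the output table definition is created. The following is a minimal sketch rather than part of the original documentation: the names OUTPUT_PARAMS and summarize_to_table and the label text are hypothetical, and conn is assumed to be an open swat.CAS connection with an iris table loaded.

```python
# Hypothetical output-table settings; the keys are parameter names listed above.
OUTPUT_PARAMS = {
    'replace': True,      # overwrite an existing table with the same name
    'compress': True,     # apply data compression to the output table
    'label': 'Summary statistics for iris',  # illustrative label text
}


def summarize_to_table(conn, source='iris', target='summout'):
    """Run the summary action on `source` and write the results to `target`.

    `conn` is assumed to be an open swat.CAS connection.
    """
    table = conn.CASTable(source)
    casout = conn.CASTable(target, **OUTPUT_PARAMS)
    return conn.summary(table=table, casout=casout)
```

Keeping the settings in one dictionary makes it easy to reuse the same output definition across several actions.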

Use a CASTable like a pandas.DataFrame.

>>> print(iris.head())
>>> print(iris[['petal_length', 'petal_width']].describe())
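
Table parameters such as where, computedvars, and computedvarsprogram can be supplied as keyword arguments when the table object is created. The following is a minimal sketch under stated assumptions: the helper name make_filtered_table, the computed column petal_area, and the filter threshold are hypothetical; conn is assumed to be an open swat.CAS connection, and the iris table is assumed to contain petal_length and petal_width columns.

```python
def make_filtered_table(conn, name='iris'):
    """Return a CASTable that filters rows and adds one computed column.

    `conn` is assumed to be an open swat.CAS connection.
    """
    params = {
        # Expression for subsetting the input data.
        'where': 'petal_length > 1.5',
        # Name of the computed variable to create (hypothetical column).
        'computedvars': [{'name': 'petal_area'}],
        # Expression that defines the computed variable.
        'computedvarsprogram': 'petal_area = petal_length * petal_width;',
    }
    return conn.CASTable(name, **params)
```

Actions and DataFrame-style methods called on the returned object should then operate on the subset, with the computed column available alongside the stored ones.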

Methods

__init__(self, name, **table_params)

Initialize self.

abs(self)

Return a new CASTable with absolute values of numerics

all(self[, axis, bool_only, skipna, level])

Return True for each column with only elements that evaluate to true

any(self[, axis, bool_only, skipna, level])

Return True for each column with at least one true element

append(self, other[, ignore_index, …])

Append rows of other to self

append_columns(self, *items, **kwargs)

Append variable names to action inputs parameter

append_computed_columns(self, names, code[, …])

Append computed columns as specified

append_computedvars(self, *items, **kwargs)

Append variable names to computedvars parameter

append_computedvarsprogram(self, *items, …)

Append code to computedvarsprogram parameter

append_groupby(self, *items, **kwargs)

Append variable names to groupby parameter

append_orderby(self, *items, **kwargs)

Append orderby parameters

append_where(self, *items, **kwargs)

Append code to where parameter

as_matrix(self[, columns, n])

Represent CASTable as a NumPy array

boxplot(self[, column, by])

Make a boxplot from the table data

clip(self[, lower, upper, axis])

Clip values at thresholds

clip_lower(self, threshold[, axis])

Clip values at lower threshold

clip_upper(self, threshold[, axis])

Clip values at upper threshold

copy(self[, deep, exclude])

Make a copy of the CASTable object

corr(self[, method, min_periods])

Compute pairwise correlation of columns

count(self[, axis, level, numeric_only])

Return total number of non-missing values in each column

css(self[, casout])

Return the corrected sum of squares of the values of each column

cv(self[, casout])

Return the coefficient of variation of the values of each column

datastep(self, code[, casout])

Execute DATA step code against the table

del_action_params(self, *names)

Delete parameters for specified action names

del_param(self, *keys)

Delete parameters

del_params(self, *keys)

Delete parameters

describe(self[, percentiles, include, …])

Get descriptive statistics

drop(self, labels[, axis, level, inplace, …])

Return a new CASTable object with the specified columns removed

dropna(self[, axis, how, thresh, subset, …])

Drop rows that contain missing values

eval(self, expr[, inplace, kwargs])

Evaluate a CAS table expression

exists(self)

Return True if table exists in the server

fillna(self[, value, method, axis, inplace, …])

Fill missing values using the specified method

from_csv(connection, path[, casout])

Create a CASTable from a CSV file

from_dict(connection, data[, casout])

Create a CASTable from a dictionary

from_items(connection, items[, casout])

Create a CASTable from (key, value) pairs

from_records(connection, data[, casout])

Create a CASTable from records

get(self, key[, default])

Get item from object for given key (ex: DataFrame column)

get_action_names(self)

Return a list of available CAS actions

get_action_params(self, name, *default)

Return parameters for specified action name

get_actionset_names(self)

Return a list of available actionsets

get_connection(self)

Get the registered connection object

get_dtype_counts(self)

Retrieve the frequency of CAS table column data types

get_fetch_params(self)

Return options to be used during the table.fetch action

get_ftype_counts(self)

Retrieve the frequency of CAS table column data types

get_groupby_vars(self)

Return a list of By group variable names

get_inputs_param(self)

Return the column names for the inputs= action parameter

get_param(self, key, *default)

Return the value of a parameter

get_params(self, *keys)

Return the values of one or more parameters

get_value(self, index, col, **kwargs)

Retrieve a single scalar value

groupby(self, by[, axis, level, as_index, …])

Specify grouping variables for the table

has_groupby_vars(self)

Return True if the table has By group variables configured

has_param(self, *keys)

Return True if the specified parameters exist

has_params(self, *keys)

Return True if the specified parameters exist

head(self[, n, columns, bygroup_as_index, …])

Retrieve first n rows

hist(self[, column, by])

Make a histogram from the table data

info(self[, verbose, buf, max_cols, …])

Print summary of CASTable information

invoke(self, _name_, **kwargs)

Invoke an action on the registered connection

iteritems(self)

Iterate over column names and CASColumn objects

iterrows(self[, chunksize])

Iterate over the rows of a CAS table as (index, pandas.Series) pairs

itertuples(self[, index, chunksize])

Iterate over rows as tuples

kurt(self[, axis, skipna, level, …])

Return the kurtosis of the values of each column

kurtosis(self[, axis, skipna, level, …])

Return the kurtosis of the values of each column

lookup(self, row_labels, col_labels)

Retrieve values indicated by row_labels, col_labels positions

max(self[, axis, skipna, level, …])

Return the maximum value of each column

mean(self[, axis, skipna, level, …])

Return the mean value of each column

median(self[, axis, skipna, level, …])

Return the median value of each numeric column

merge(self, right[, how, on, left_on, …])

Merge CASTable objects using a database-style join on a column

min(self[, axis, skipna, level, …])

Return the minimum value of each column

mode(self[, axis, numeric_only, max_tie, skipna])

Return the mode of each column

next(self)

Return next item in the iteration

nlargest(self, n, columns[, keep, casout])

Return the n largest values ordered by columns

nmiss(self[, axis, level, numeric_only, casout])

Return total number of missing values in each column

nsmallest(self, n, columns[, keep, casout])

Return the n smallest values ordered by columns

nth(self, n[, dropna, bygroup_as_index, casout])

Return the nth row

pop(self, colname)

Remove a column from the CASTable and return it

probt(self[, casout])

Return the p-value of the T-statistics of the values of each column

quantile(self[, q, axis, numeric_only, …])

Return values at the given quantile

query(self, expr[, inplace, engine])

Query the table with a boolean expression

replace(self[, to_replace, value, inplace, …])

Replace values in the data set

reset_index(self[, level, drop, inplace, …])

Reset the CASTable index

retrieve(self, _name_, **kwargs)

Invoke an action on the registered connection and retrieve results

sample(self[, n, frac, replace, weights, …])

Return a random sample of the table rows

select_dtypes(self[, include, exclude, inplace])

Return a subset CASTable including/excluding columns based on data type

set_action_params(self, name, **kwargs)

Set parameters for specified action name

set_connection(self, connection)

Set the connection to use for action calls

set_param(self, *args, **kwargs)

Set parameters according to key-value pairs

set_params(self, *args, **kwargs)

Set parameters according to key-value pairs

skew(self[, axis, skipna, level, …])

Return the skewness of the values of each column

skewness(self[, axis, skipna, level, …])

Return the skewness of the values of each column

slice(self[, start, stop, columns, …])

Retrieve the specified rows

sort(self, by[, axis, ascending, inplace, …])

Specify sort parameters for data in a CASTable

sort_values(self, by[, axis, ascending, …])

Specify sort parameters for data in a CASTable

std(self[, axis, skipna, level, ddof, …])

Return the standard deviation of the values of each column

stderr(self[, casout])

Return the standard error of the values of each column

sum(self[, axis, skipna, level, …])

Return the sum of the values of each column

tail(self[, n, columns, bygroup_as_index, …])

Retrieve last n rows

to_clipboard(self, *args, **kwargs)

Write the table data to the clipboard

to_csv(self, *args, **kwargs)

Write table data to comma-separated values (CSV)

to_datastep_params(self)

Create a DATA step table specification

to_dense(self, *args, **kwargs)

Return dense representation of table data

to_dict(self, *args, **kwargs)

Convert table data to a Python dictionary

to_excel(self, *args, **kwargs)

Write table data to an Excel spreadsheet

to_frame(self[, sample_pct, sample_seed, …])

Retrieve entire table as a SASDataFrame

to_gbq(self, *args, **kwargs)

Write table data to a Google BigQuery table

to_hdf(self, *args, **kwargs)

Write table data to HDF

to_html(self, *args, **kwargs)

Render the table data to an HTML table

to_input_datastep_params(self)

Create an input DATA step table specification

to_json(self, *args, **kwargs)

Convert the table data to a JSON string

to_latex(self, *args, **kwargs)

Render the table data to a LaTeX tabular environment

to_msgpack(self, *args, **kwargs)

Write table data to msgpack object

to_outtable(self)

Create a copy of the CASTable object with only output table parameters

to_outtable_params(self)

Create a copy of the CASTable parameters using only the output table parameters

to_params(self)

Return parameters of CASTable object

to_pickle(self, *args, **kwargs)

Pickle (serialize) the table data

to_records(self, *args, **kwargs)

Convert table data to record array

to_sparse(self, *args, **kwargs)

Convert table data to SparseDataFrame

to_sql(self, *args, **kwargs)

Write table records to SQL database

to_stata(self, *args, **kwargs)

Write table data to Stata file

to_string(self, *args, **kwargs)

Render the table to a console-friendly tabular output

to_table(self)

Create a copy of the CASTable object with only input table parameters

to_table_name(self)

Return the name of the table

to_table_params(self)

Create a copy of the table parameters containing only input table parameters

to_view(self, *args, **kwargs)

Create a view using the current CASTable parameters

to_xarray(self, *args, **kwargs)

Represent table data as an xarray object

tvalue(self[, casout])

Return the T-statistics for hypothesis testing of the values of each column

uss(self[, casout])

Return the uncorrected sum of squares of the values of each column

var(self[, axis, skipna, level, ddof, …])

Return the variance of the values of each column

with_params(self, **kwargs)

Create copy of table with kwargs inserted as parameters

xs(self, key[, axis, level, copy, drop_level])

Return a cross-section from the CASTable
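
As listed above, with_params() creates a copy of the table with the given keyword arguments inserted as parameters, so the original object keeps its own settings. The following is a minimal sketch, assuming table is a CASTable and that the underlying CAS table has a species column (an assumption, not stated in this reference); the function name species_subset is hypothetical.

```python
def species_subset(table, species):
    """Return a copy of `table` restricted to rows for one species.

    `table` is assumed to be a CASTable; `species` names a value in a
    column called `species`, which is assumed to exist in the CAS table.
    """
    # Build the expression used for the `where` table parameter.
    expression = 'species = "{}"'.format(species)
    return table.with_params(where=expression)
```

The returned copy can be passed to a CAS action or used with the DataFrame-style methods in the same way as the original table.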

Attributes

all_params

at

axes

List of the row axis labels and column axis labels

columns

The visible columns in the table

created_date

Return the created date of the table in the server

dtypes

Series of the data types in the table

ftypes

Series of the ftypes (indication of sparse/dense and dtype) in the table

getdoc

iat

iloc

Integer-based indexer for selecting by position

index

The table index

ix

Label-based indexer with integer position fallback

last_accessed_date

Return the last access date of the table in the server

last_modified_date

Return the last modified date of the table in the server

loc

Label-based indexer

ndim

Number of axes dimensions

outtable_params

param_names

plot

Plot the data in the table

shape

Return a tuple representing the dimensionality of the table

size

Number of elements in the table

table_params

values

NumPy representation of the table