swat.cas.table.CASTable
class swat.cas.table.CASTable(name, **table_params)
Bases: swat.cas.utils.params.ParamManager, swat.cas.utils.params.ActionParamManager
Object for interacting with CAS tables
CASTable objects can be used in several ways. They can serve simply as a container of table parameters that is passed as a CAS action parameter value. If a connection is associated with the object (either by instantiating it from CAS.CASTable() or by calling set_connection()), it can be used to call CAS actions on the table. Finally, CASTable supports much of the pandas.DataFrame API, so you can work with CAS tables in much the same way you work with local data.
The parameters below are a superset of all available parameters. Some CAS actions may not support every parameter. See the help for each CAS action to find out which parameters it supports.
- Parameters
- name : string or CASTable
specifies the name of the table to use.
- caslib : string, optional
specifies the caslib containing the table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
- where : string, optional
specifies an expression for subsetting the input data.
- groupby : list of dicts, optional
specifies the names of the variables to use for grouping results.
- groupbyfmts : list, optional
specifies the format to apply to each group-by variable. To avoid specifying a format for a group-by variable, use "" (no format).
Default: []
- orderby : list of dicts, optional
specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables or it can be combined with groupBy variables when groupByMode is set to REDISTRIBUTE.
- computedvars : list of dicts, optional
specifies the names of the computed variables to create. Specify an expression for each parameter in the computedvarsprogram parameter.
- computedvarsprogram : string, optional
specifies an expression for each variable that you included in the computedvars parameter.
- groupbymode : string, optional
specifies how the server creates groups.
Default: NOSORT
Values: NOSORT, REDISTRIBUTE
- computedondemand : boolean, optional
when set to True, the computed variables specified in the compVars parameter are created when the table is loaded instead of when the action begins.
Default: False
- singlepass : boolean, optional
when set to True, the data does not create a transient table in the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
Default: False
- importoptions : dict, optional
specifies the settings for reading a table from a data source.
- ondemand : boolean, optional
when set to True, table access is less aggressive with virtual memory use.
Default: True
- vars : list of dicts, optional
specifies the variables to use in the action.
- timestamp : string, optional
specifies the timestamp to apply to the table. Specify the value in the form that is appropriate for your session locale. Used only on output table definitions.
- compress : boolean, optional
when set to True, data compression is applied to the table. Used only on output table definitions.
Default: False
- replace : boolean, optional
specifies whether to overwrite an existing table with the same name. Used only on output table definitions.
Default: False
- replication : int32, optional
specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Used only on output table definitions.
Default: 1
Note: Value range is 0 <= n < 2147483647
- threadblocksize : int64, optional
specifies the number of bytes to use for blocks that are read by threads. Increase this value only if you have a large table and CPU utilization by threads shows thread starvation. Used only on output table definitions.
Note: Value range is 0 <= n < 9223372036854775807
- label : string, optional
specifies the descriptive label to associate with the table.
- maxmemsize : int64, optional
specifies the maximum amount of physical memory, in bytes, to allocate for the table. After this threshold is reached, the server uses temporary files and operating system facilities for memory management. Used only on output table definitions.
Default: 0
- promote : boolean, optional
when set to True, the output table is added with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope. Used only on output table definitions.
Default: False
- ondemand : boolean, optional
when set to True, table access is less aggressive with virtual memory use. Used only on output table definitions.
Default: True
- Returns
Examples
Create a CASTable registered to conn.
>>> conn = swat.CAS()
>>> iris = conn.CASTable('iris')
Use the table as a CAS action parameter.
>>> summ = conn.summary(table=iris)
>>> print(summ)
Call a CAS action directly on the CASTable.
>>> summ = iris.summary()
>>> print(summ)
Use a CASTable as an output table definition.
>>> summout = conn.summary(table=iris,
...                        casout=swat.CASTable('summout', replace=True))
>>> print(summout)
Use a CASTable like a pandas.DataFrame.
>>> print(iris.head())
>>> print(iris[['petal_length', 'petal_width']].describe())
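The table parameters described in the Parameters list can be pictured as a plain dictionary that travels with an action call. The following is a minimal sketch in pure Python, not SWAT code: `build_table_params` is a hypothetical helper, and the `species` column and the computed `petal_ratio` variable are assumptions made for illustration. It only shows how the documented names, defaults, and value ranges fit together; a real CASTable manages this for you.

```python
# Hypothetical helper, NOT part of swat: collects table parameters into a
# plain dict and checks the integer ranges stated in the parameter list.

INT32_MAX = 2147483647
INT64_MAX = 9223372036854775807


def build_table_params(name, **table_params):
    """Return a dict of table parameters keyed by the documented names."""
    params = {'name': name}
    params.update(table_params)

    # replication: documented default is 1, range 0 <= n < 2147483647
    replication = params.get('replication', 1)
    if not 0 <= replication < INT32_MAX:
        raise ValueError('replication must satisfy 0 <= n < 2147483647')

    # threadblocksize: range 0 <= n < 9223372036854775807
    blocksize = params.get('threadblocksize')
    if blocksize is not None and not 0 <= blocksize < INT64_MAX:
        raise ValueError('threadblocksize is out of the documented range')

    # Each name in computedvars needs an expression in computedvarsprogram.
    if params.get('computedvars') and not params.get('computedvarsprogram'):
        raise ValueError('computedvars requires computedvarsprogram')

    return params


# Input table definition: subset, group, and add one computed variable.
input_params = build_table_params(
    'iris',
    where='petal_length > 5',
    groupby=[{'name': 'species'}],          # assumed column name
    computedvars=[{'name': 'petal_ratio'}],  # assumed computed variable
    computedvarsprogram='petal_ratio = petal_length / petal_width;',
)

# Output table definition: only output-side parameters such as replace.
output_params = build_table_params('summout', replace=True, replication=1)

print(input_params)
print(output_params)
```

Keeping input-side parameters (where, groupby, computedvars) apart from output-side ones (replace, replication, promote) mirrors the "Used only on output table definitions" notes in the parameter list.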
__init__(self, name, **table_params)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(self, name, **table_params)
Initialize self.
abs(self)
Return a new CASTable with absolute values of numerics
all(self[, axis, bool_only, skipna, level])
Return True for each column with only elements that evaluate to true
any(self[, axis, bool_only, skipna, level])
Return True for each column with at least one true element
append(self, other[, ignore_index, …])
Append rows of other to self
append_columns(self, *items, **kwargs)
Append variable names to action inputs parameter
append_computed_columns(self, names, code[, …])
Append computed columns as specified
append_computedvars(self, *items, **kwargs)
Append variable names to computedvars parameter
append_computedvarsprogram(self, *items, …)
Append code to computedvarsprogram parameter
append_groupby(self, *items, **kwargs)
Append variable names to groupby parameter
append_orderby(self, *items, **kwargs)
Append orderby parameters
append_where(self, *items, **kwargs)
Append code to where parameter
as_matrix(self[, columns, n])
Represent CASTable as a Numpy array
boxplot(self[, column, by])
Make a boxplot from the table data
clip(self[, lower, upper, axis])
Clip values at thresholds
clip_lower(self, threshold[, axis])
Clip values at lower threshold
clip_upper(self, threshold[, axis])
Clip values at upper threshold
copy(self[, deep, exclude])
Make a copy of the CASTable object
corr(self[, method, min_periods])
Compute pairwise correlation of columns
count(self[, axis, level, numeric_only])
Return total number of non-missing values in each column
css(self[, casout])
Return the corrected sum of squares of the values of each column
cv(self[, casout])
Return the coefficient of variation of the values of each column
datastep(self, code[, casout])
Execute Data step code against the table
del_action_params(self, *names)
Delete parameters for specified action names
del_param(self, *keys)
Delete parameters
del_params(self, *keys)
Delete parameters
describe(self[, percentiles, include, …])
Get descriptive statistics
drop(self, labels[, axis, level, inplace, …])
Return a new CASTable object with the specified columns removed
drop_duplicates(self, casout[, subset])
Remove duplicate rows from a CASTable.
dropna(self[, axis, how, thresh, subset, …])
Drop rows that contain missing values
eval(self, expr[, inplace, kwargs])
Evaluate a CAS table expression
exists(self)
Return True if table exists in the server
fillna(self[, value, method, axis, inplace, …])
Fill missing values using the specified method
from_csv(connection, path[, casout])
Create a CASTable from a CSV file
from_dict(connection, data[, casout])
Create a CASTable from a dictionary
from_items(connection, items[, casout])
Create a CASTable from (key, value) pairs
from_records(connection, data[, casout])
Create a CASTable from records
get(self, key[, default])
Get item from object for given key (ex: DataFrame column)
get_action_names(self)
Return a list of available CAS actions
get_action_params(self, name, *default)
Return parameters for specified action name
get_actionset_names(self)
Return a list of available actionsets
get_connection(self)
Get the registered connection object
get_dtype_counts(self)
Retrieve the frequency of CAS table column data types
get_fetch_params(self)
Return options to be used during the table.fetch action
get_ftype_counts(self)
Retrieve the frequency of CAS table column data types
get_groupby_vars(self)
Return a list of By group variable names
get_inputs_param(self)
Return the column names for the inputs= action parameter
get_param(self, key, *default)
Return the value of a parameter
get_params(self, *keys)
Return the values of one or more parameters
get_value(self, index, col, **kwargs)
Retrieve a single scalar value
groupby(self, by[, axis, level, as_index, …])
Specify grouping variables for the table
has_groupby_vars(self)
Return True if the table has By group variables configured
has_param(self, *keys)
Return True if the specified parameters exist
has_params(self, *keys)
Return True if the specified parameters exist
head(self[, n, columns, bygroup_as_index, …])
Retrieve first n rows
hist(self[, column, by])
Make a histogram from the table data
info(self[, verbose, buf, max_cols, …])
Print summary of CASTable information
invoke(self, _name_, **kwargs)
Invoke an action on the registered connection
iteritems(self)
Iterate over column names and CASColumn objects
iterrows(self[, chunksize])
Iterate over the rows of a CAS table as (index, pandas.Series) pairs
itertuples(self[, index, chunksize])
Iterate over rows as tuples
kurt(self[, axis, skipna, level, …])
Return the kurtosis of the values of each column
kurtosis(self[, axis, skipna, level, …])
Return the kurtosis of the values of each column
lookup(self, row_labels, col_labels)
Retrieve values indicated by row_labels, col_labels positions
max(self[, axis, skipna, level, …])
Return the maximum value of each column
mean(self[, axis, skipna, level, …])
Return the mean value of each column
median(self[, axis, skipna, level, …])
Return the median value of each numeric column
merge(self, right[, how, on, left_on, …])
Merge CASTable objects using a database-style join on a column
min(self[, axis, skipna, level, …])
Return the minimum value of each column
mode(self[, axis, numeric_only, max_tie, skipna])
Return the mode of each column
next(self)
Return next item in the iteration
nlargest(self, n, columns[, keep, casout])
Return the n largest values ordered by columns
nmiss(self[, axis, level, numeric_only, casout])
Return total number of missing values in each column
nsmallest(self, n, columns[, keep, casout])
Return the n smallest values ordered by columns
nth(self, n[, dropna, bygroup_as_index, casout])
Return the nth row
nunique(self[, dropna, casout])
Return number of unique elements per column in the CASTable
pop(self, colname)
Remove a column from the CASTable and return it
probt(self[, casout])
Return the p-value of the T-statistics of the values of each column
quantile(self[, q, axis, numeric_only, …])
Return values at the given quantile
query(self, expr[, inplace, engine])
Query the table with a boolean expression
rename(self, columns[, errors])
Rename columns of the CASTable.
replace(self[, to_replace, value, inplace, …])
Replace values in the data set
reset_index(self[, level, drop, inplace, …])
Reset the CASTable index
retrieve(self, _name_, **kwargs)
Invoke an action on the registered connection and retrieve results
sample(self[, n, frac, replace, weights, …])
Return a random sample of the table rows
select_dtypes(self[, include, exclude, inplace])
Return a subset CASTable including/excluding columns based on data type
set_action_params(self, name, **kwargs)
Set parameters for specified action name
set_connection(self, connection)
Set the connection to use for action calls
set_param(self, *args, **kwargs)
Set parameters according to key-value pairs
set_params(self, *args, **kwargs)
Set parameters according to key-value pairs
skew(self[, axis, skipna, level, …])
Return the skewness of the values of each column
skewness(self[, axis, skipna, level, …])
Return the skewness of the values of each column
slice(self[, start, stop, columns, …])
Retrieve the specified rows
sort(self, by[, axis, ascending, inplace, …])
Specify sort parameters for data in a CASTable
sort_values(self, by[, axis, ascending, …])
Specify sort parameters for data in a CASTable
std(self[, axis, skipna, level, ddof, …])
Return the standard deviation of the values of each column
stderr(self[, casout])
Return the standard error of the values of each column
sum(self[, axis, skipna, level, …])
Return the sum of the values of each column
tail(self[, n, columns, bygroup_as_index, …])
Retrieve last n rows
to_clipboard(self, *args, **kwargs)
Write the table data to the clipboard
to_csv(self, *args, **kwargs)
Write table data to comma-separated values (CSV)
to_datastep_params(self)
Create a data step table specification
to_dense(self, *args, **kwargs)
Return dense representation of table data
to_dict(self, *args, **kwargs)
Convert table data to a Python dictionary
to_excel(self, *args, **kwargs)
Write table data to an Excel spreadsheet
to_frame(self[, sample_pct, sample_seed, …])
Retrieve entire table as a SASDataFrame
to_gbq(self, *args, **kwargs)
Write table data to a Google BigQuery table
to_hdf(self, *args, **kwargs)
Write table data to HDF
to_html(self, *args, **kwargs)
Render the table data to an HTML table
to_input_datastep_params(self)
Create an input data step table specification
to_json(self, *args, **kwargs)
Convert the table data to a JSON string
to_latex(self, *args, **kwargs)
Render the table data to a LaTeX tabular environment
to_msgpack(self, *args, **kwargs)
Write table data to msgpack object
to_outtable(self)
Create a copy of the CASTable object with only output table parameters
to_outtable_params(self)
Create a copy of the CASTable parameters using only the output table parameters
to_params(self)
Return parameters of CASTable object
to_pickle(self, *args, **kwargs)
Pickle (serialize) the table data
to_records(self, *args, **kwargs)
Convert table data to record array
to_sparse(self, *args, **kwargs)
Convert table data to SparseDataFrame
to_sql(self, *args, **kwargs)
Write table records to SQL database
to_stata(self, *args, **kwargs)
Write table data to Stata file
to_string(self, *args, **kwargs)
Render the table to a console-friendly tabular output
to_table(self)
Create a copy of the CASTable object with only input table parameters
to_table_name(self)
Return the name of the table
to_table_params(self)
Create a copy of the table parameters containing only input table parameters
to_view(self, *args, **kwargs)
Create a view using the current CASTable parameters
to_xarray(self, *args, **kwargs)
Represent table data as an xarray object
tvalue(self[, casout])
Return the T-statistics for hypothesis testing of the values of each column
uss(self[, casout])
Return the uncorrected sum of squares of the values of each column
var(self[, axis, skipna, level, ddof, …])
Return the variance of the values of each column
with_params(self, **kwargs)
Create copy of table with kwargs inserted as parameters
xs(self, key[, axis, level, copy, drop_level])
Return a cross-section from the CASTable
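Several of the methods listed above (uss, css, cv, stderr, tvalue) name statistics that pandas users may not recognize. The sketch below works them out on a small local list with the Python standard library only. It assumes the conventional SAS-style definitions (sample standard deviation with n - 1, coefficient of variation expressed as a percentage, t-statistic testing a mean of zero); the CASTable methods run on the server and may differ in detail. The sample values are made up for illustration.

```python
import math
import statistics

# Made-up local sample; a CASTable would compute these per column in CAS.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = statistics.mean(values)

uss = sum(x * x for x in values)            # uncorrected sum of squares
css = sum((x - mean) ** 2 for x in values)  # corrected sum of squares
std = math.sqrt(css / (n - 1))              # sample standard deviation
cv = 100.0 * std / mean                     # coefficient of variation (%)
stderr = std / math.sqrt(n)                 # standard error of the mean
tvalue = mean / stderr                      # t-statistic for H0: mean = 0

print('uss    =', uss)
print('css    =', css)
print('std    =', round(std, 4))
print('cv     =', round(cv, 4))
print('stderr =', round(stderr, 4))
print('tvalue =', round(tvalue, 4))
```

The two sums of squares are related by css = uss - n * mean**2, which is a quick consistency check on any pair of results. The p-value reported by probt needs the Student t distribution, which the standard library does not provide, so it is left out of this sketch.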
Attributes
all_params
at
List of the row axis labels and column axis labels
columns
The visible columns in the table
created_date
Return the created date of the table in the server
Series of the data types in the table
Series of the ftypes (indication of sparse/dense and dtype) in the table
getdoc
iat
Integer-based indexer for selecting by position
index
The table index
Label-based indexer with integer position fallback
last_accessed_date
Return the last access date of the table in the server
last_modified_date
Return the last modified date of the table in the server
Label-based indexer
Number of axes dimensions
outtable_params
param_names
Plot the data in the table
Return a tuple representing the dimensionality of the table
Number of elements in the table
table_params
Numpy representation of the table