Programming Gotchas

There are a few areas of SWAT where you can get tripped up. This section outlines the most common issues to watch out for.

Return Values

Because the SWAT API blends CAS and pandas into a single interface, you need to be aware of whether you are calling a CAS action or a method from the pandas API. CAS actions always return a CASResults object (a subclass of Python's dict).

In [1]: out = tbl.summary()

In [2]: type(out)
Out[2]: swat.cas.results.CASResults

In [3]: out = tbl.serverstatus()

Note: Grid node action status report: 1 nodes, 11 total actions executed.

In [4]: type(out)
Out[4]: swat.cas.results.CASResults

Methods from the pandas API typically return a CASTable, CASColumn, pandas.DataFrame, or pandas.Series object.

In [5]: out = tbl.head()

In [6]: type(out)
Out[6]: swat.SASDataFrame

In [7]: out = tbl.mean()

In [8]: type(out)
Out[8]: pandas.core.series.Series

In [9]: out = tbl.Make

In [10]: type(out)
Out[10]: swat.cas.table.CASColumn
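Because the return type depends on which API you called, code that post-processes results sometimes has to accept both kinds. The following is a minimal sketch of a hypothetical helper, `as_table` (not part of SWAT), that relies only on the point above that CASResults subclasses dict. Plain dicts and lists stand in for SWAT objects so the sketch runs without a CAS server.

```python
def as_table(out, key=None):
    """Return one table-like object from either kind of result.

    Hypothetical helper, not part of SWAT. CAS actions return CASResults,
    a dict subclass keyed by result-table name; pandas-style methods return
    the table, column, DataFrame, or Series directly.
    """
    if isinstance(out, dict):
        if key is not None:
            return out[key]                      # caller named the result table
        if len(out) == 1:
            return next(iter(out.values()))      # only one result table
        raise ValueError(
            f'action returned several result tables; pass key= one of {sorted(out)}')
    return out                                   # already table-like (pandas API)


print(as_table({'Summary': 'summary table'}))    # summary table
print(as_table([1, 2, 3]))                       # [1, 2, 3]
```

With a live connection you might call `as_table(tbl.summary())` to pull out the result table, assuming the action returned just one, while `as_table(tbl.head())` passes the SASDataFrame through unchanged.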

Name Collisions

Much as pandas.DataFrame lets you access columns as attributes as well as keys, some objects in the SWAT package map multiple namespaces onto their attributes. This is especially true of CASTable objects.

CASTable objects have attributes that can come from various sources. These include real object attributes and methods, CAS action names, CAS table parameter names, and CAS table column names. Mapping all of these attributes into one namespace can create collisions. The most notable collisions on CASTable objects are groupby (method, CAS action, table parameter) and promote (CAS action, table parameter).

These collisions can manifest themselves in ways that seem confusing. Here is an example.

In [11]: tbl.groupby
Out[11]: <bound method CASTable.groupby of CASTable('TMPDY_Y2KZS', caslib='CASUSER(castest)')>

In [12]: tbl.groupby = ['Origin']

In [13]: tbl.groupby
Out[13]: <bound method CASTable.groupby of CASTable('TMPDY_Y2KZS', caslib='CASUSER(castest)', groupby=['Origin'])>

In [14]: tbl.params
Out[14]: {'caslib': 'CASUSER(castest)', 'groupby': ['Origin'], 'name': 'TMPDY_Y2KZS'}

As you can see, getting the attribute returns the groupby method, but setting the attribute sets the groupby table parameter. This is because the dynamic look-through for CAS actions and table parameters happens only if there isn't a real Python attribute or method with that name. On CASTable objects, the groupby method is defined to match the pandas.DataFrame.groupby() method.

When setting attributes, the name of the attribute is checked against the valid parameter names for a table. If it matches a table parameter name, it is set as a CAS table parameter; otherwise, it is set on the object as a standard Python attribute.
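The asymmetry comes from how Python resolves attributes: `__getattr__` runs only when normal lookup fails, while `__setattr__` runs on every assignment. The toy class below is a minimal sketch of that rule, not SWAT's implementation. `ToyTable` and its `TABLE_PARAMS` set are hypothetical, and `params` here is a plain dict, whereas the text above shows that SWAT also allows `tbl.params.groupby = ...`.

```python
class ToyTable:
    """Toy model of the CASTable get/set rules (hypothetical, not SWAT code)."""

    # Assumed subset of table parameter names, for illustration only.
    TABLE_PARAMS = {'name', 'caslib', 'groupby', 'where', 'promote'}

    def __init__(self, name, **params):
        # Bypass our own __setattr__ while creating the params dict.
        object.__setattr__(self, 'params', dict(name=name, **params))

    def groupby(self, by):
        """A real method, standing in for the pandas-style groupby()."""
        return ('grouped', by)

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, so the real groupby
        # method always wins over a groupby table parameter.
        params = object.__getattribute__(self, 'params')
        if attr in params:
            return params[attr]
        raise AttributeError(attr)

    def __setattr__(self, attr, value):
        # Every assignment lands here: parameter names are checked first,
        # even when a method with the same name exists.
        if attr in self.TABLE_PARAMS:
            self.params[attr] = value
        else:
            object.__setattr__(self, attr, value)


tbl = ToyTable('cars', caslib='casuser')
tbl.groupby = ['Origin']         # name matches a table parameter
print(callable(tbl.groupby))     # True: getting still returns the method
print(tbl.params['groupby'])     # ['Origin']: setting changed the parameter
tbl.note = 'scratch'             # not a parameter: ordinary attribute
```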

While this attribute syntax is convenient for interactive programming, the possibility of collisions makes it safer to use the explicit params namespace for table parameters when writing programs.

In [15]: tbl.params.groupby = ['Origin']

In [16]: tbl.params
Out[16]: {'caslib': 'CASUSER(castest)', 'groupby': ['Origin'], 'name': 'TMPDY_Y2KZS'}

To call the simple.groupby CAS action instead, qualify it explicitly with the simple action set name.

In [17]: tbl[['Origin', 'Cylinders']].simple.groupby()
Out[17]: 
[Groupby]

 Groupby for TMPDY_Y2KZS
 
     Origin Origin_f  Cylinders   Cylinders_f  Rank
 0     Asia     Asia        NaN             .    14
 1     Asia     Asia        3.0             3    13
 2     Asia     Asia        4.0             4    12
 3     Asia     Asia        6.0             6    11
 4     Asia     Asia        8.0             8    10
 5   Europe   Europe        4.0             4     9
 6   Europe   Europe        5.0             5     8
 7   Europe   Europe        6.0             6     7
 8   Europe   Europe        8.0             8     6
 9   Europe   Europe       12.0            12     5
 10     USA      USA        4.0             4     4
 11     USA      USA        6.0             6     3
 12     USA      USA        8.0             8     2
 13     USA      USA       10.0            10     1

+ Elapsed: 0.0196s, user: 0.018s, sys: 0.007s, mem: 5.28mb

Getting the groupby attribute always returns the real Python groupby method, which corresponds to the pandas.DataFrame.groupby() method.

In [18]: tbl.groupby('Origin')
Out[18]: <swat.cas.table.CASTableGroupBy at 0x7f09494a5080>

Case-Sensitivity

While Python is case-sensitive, CAS is not. CAS action names and parameters can be specified in any mix of upper and lower case, which can cause problems in case-sensitive clients such as Python. For example, the following CAS action call is valid Python:

conn.summary(subset=['Max', 'Min'],
             subSet=['N'])

While you would never actually type code like that, you may build parameters programmatically and accidentally end up with keys that differ only in case. When you send such an action call to CAS, the action fails because CAS treats subset and subSet as duplicate keys.
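When parameters are assembled programmatically, a check before the call turns this server-side failure into a clear client-side error. The sketch below uses two hypothetical helpers, `find_case_collisions` and `lower_keys` (not part of SWAT). It recurses into nested dicts but, as a simplifying assumption, not into dicts stored inside lists.

```python
from collections import defaultdict


def find_case_collisions(params):
    """Return lists of keys that differ only by case (hypothetical helper)."""
    collisions = []
    groups = defaultdict(list)
    for key, value in params.items():
        groups[key.lower()].append(key)
        if isinstance(value, dict):
            collisions.extend(find_case_collisions(value))   # nested parameters
    collisions.extend(sorted(keys) for keys in groups.values() if len(keys) > 1)
    return collisions


def lower_keys(params):
    """Lower-case every key, raising instead of letting CAS reject the call."""
    clashes = find_case_collisions(params)
    if clashes:
        raise ValueError(f'parameter names differ only by case: {clashes}')
    return {key.lower(): lower_keys(value) if isinstance(value, dict) else value
            for key, value in params.items()}


params = {'subset': ['Max', 'Min'], 'subSet': ['N']}
print(find_case_collisions(params))   # [['subSet', 'subset']]
```

With clean input, the result of `lower_keys` can be passed on as keyword arguments, for example `conn.summary(**lower_keys(params))`.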

The SWAT client automatically lower-cases all action and parameter names in the help content to encourage you to use lower case as well. It also uses lower-cased parameter names in the operations it performs behind the scenes. This convention comes as close as possible to the Python convention of lower-cased, underscore-delimited names, although it makes some longer names harder to read.

The one place where case does make a difference is the first letter of a CAS action name. If it is capitalized (e.g., conn.Summary()), an instance of a CAS action object is returned rather than the action being called. This lets you build action parameters in a more object-oriented way and call the same action multiple times.
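The naming rule can be modelled with a single `__getattr__` that checks the first letter. This is a toy sketch of the dispatch only, not SWAT's implementation: `ToyConnection` and `ToyAction` are hypothetical, and the "server reply" is just an echo of the parameters that would be sent.

```python
class ToyAction:
    """Hypothetical stand-in for a CAS action object."""

    def __init__(self, conn, name, params):
        self.conn, self.name, self.params = conn, name, dict(params)

    def __call__(self, **overrides):
        # Run with the stored parameters, letting call-time values override them.
        return self.conn.run(self.name, **{**self.params, **overrides})


class ToyConnection:
    """Toy model of the capitalization rule (hypothetical, not SWAT code)."""

    def run(self, name, **params):
        return {'action': name, 'params': params}   # pretend server reply

    def __getattr__(self, attr):
        if attr.startswith('_'):
            raise AttributeError(attr)   # leave private and dunder lookups alone
        name = attr.lower()
        if attr[0].isupper():
            # Capitalized: build an action object without running it.
            return lambda **params: ToyAction(self, name, params)
        # Lower case: run the action immediately.
        return lambda **params: self.run(name, **params)


conn = ToyConnection()
conn.summary(subset=['N'])                   # runs at once
summ = conn.Summary(subset=['Max', 'Min'])   # action object, not run yet
summ()                                       # run with the stored parameters
summ(subset=['N'])                           # run again with an override
```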

SASDataFrame vs pandas.DataFrame

For the most part, you don't need to worry about the difference between a SASDataFrame and a pandas.DataFrame; they work the same way. The only difference is that a SASDataFrame has extra attributes that store SAS metadata, such as title, label, and name, as well as colinfo for the column metadata. You only need to be concerned about the difference when an operation on a SASDataFrame, such as pandas.concat(), returns a plain pandas.DataFrame. In that case, the result loses the SAS metadata.
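One way to avoid losing that metadata is to collect it from the inputs before the pandas operation and reattach it afterwards. The sketch below is a hypothetical helper, `collect_sas_metadata` (not part of SWAT), that reads only the attribute names mentioned above. SimpleNamespace objects and strings stand in for SASDataFrames and their column specs. Reattaching with something like `swat.SASDataFrame(result, **meta)` assumes the constructor accepts these keyword arguments; check that against your SWAT version.

```python
from types import SimpleNamespace

SAS_META_ATTRS = ('name', 'label', 'title')   # frame-level metadata named above


def collect_sas_metadata(*frames):
    """Gather SAS metadata from the inputs of a pandas operation.

    Hypothetical helper. Frame-level attributes come from the first frame
    that has them; colinfo dicts are merged, earlier frames winning.
    """
    meta = {}
    colinfo = {}
    for frame in frames:
        for attr in SAS_META_ATTRS:
            value = getattr(frame, attr, None)
            if value is not None and attr not in meta:
                meta[attr] = value
        for col, info in (getattr(frame, 'colinfo', None) or {}).items():
            colinfo.setdefault(col, info)
    meta['colinfo'] = colinfo
    return meta


a = SimpleNamespace(title='Cars', label=None, colinfo={'Make': 'info-a'})
b = SimpleNamespace(name='CARS2', colinfo={'Make': 'info-b', 'MSRP': 'info-c'})
meta = collect_sas_metadata(a, b)
print(meta)
```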

Weak CAS Object References

CASTable objects can only call CAS actions on CAS connections that still exist in your Python session (that is, connections that haven't been deleted or overwritten with another object). If you delete the CAS connection object that a CASTable is associated with and then call a CAS action on that CASTable, the action call fails. This is because CASTable objects keep only a weak reference to the CAS connection. You can re-associate a connection with the CASTable using the CASTable.set_connection() method.
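The behavior follows from how weak references work: they do not keep their target alive, so once the last strong reference to a connection is gone, the table can no longer reach it. The toy sketch below models this with `weakref.ref`. `ToyCAS` and `WeakTable` are hypothetical, and the exact weak-reference mechanism SWAT uses internally is an assumption; only the method name `set_connection` mirrors the text.

```python
import gc
import weakref


class ToyCAS:
    """Hypothetical stand-in for a CAS connection."""

    def run(self, action):
        return f'ran {action}'


class WeakTable:
    """Toy table that holds only a weak reference to its connection."""

    def __init__(self, conn):
        self.set_connection(conn)

    def set_connection(self, conn):
        # weakref.ref does not keep conn alive by itself.
        self._conn_ref = weakref.ref(conn)

    def run(self, action):
        conn = self._conn_ref()   # None once the connection has been freed
        if conn is None:
            raise RuntimeError('connection no longer exists; call set_connection()')
        return conn.run(action)


conn = ToyCAS()
tbl = WeakTable(conn)
print(tbl.run('summary'))   # ran summary
del conn                    # drop the only strong reference
gc.collect()                # CPython frees it immediately; collect() for safety
try:
    tbl.run('summary')
except RuntimeError as err:
    print(err)
conn2 = ToyCAS()
tbl.set_connection(conn2)   # re-associate with a live connection
print(tbl.run('summary'))   # ran summary
```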