Getting Started

Before you can use the SWAT package, you will need a running CAS server. The SWAT package can connect to either the binary port or the HTTP port. If you have the option of either, the binary protocol will give you better performance.

Other than the CAS host and port, you just need a user name and password to connect. User names and passwords can be implemented in various ways, so you may need to ask your system administrator how to get an account.

To connect to a CAS server, you simply import SWAT and use the swat.CAS class to create a connection. This has a couple of different forms. The most basic is to pass the hostname, port, username, and password.

In [1]: import swat

In [2]: conn = swat.CAS(host, port, username, password)

However, if you are using a REST connection to CAS, a URL is the more natural way to specify a host, port, and protocol.

In [3]: conn = swat.CAS('https://my-cas-host.com:443/cas-shared-default-http/',
   ...:                 username='...', password='...')
   ...: 

Notice that in the URL case, username and password must be specified as keyword parameters because the port parameter is being skipped. Also, in this case we are using a proxy server that requires the base path of ‘cas-shared-default-http’. If you are connecting directly to a CAS server, this is typically not required.
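To see why a single URL can stand in for the separate host, port, and protocol arguments, here is a small sketch that splits the URL from the example above using Python's standard urllib.parse module. It only illustrates what the URL encodes; it is not meant to show how SWAT parses URLs internally, and the host name is the same placeholder used above.

```python
from urllib.parse import urlsplit

# Placeholder URL taken from the example above.
url = 'https://my-cas-host.com:443/cas-shared-default-http/'

parts = urlsplit(url)
protocol = parts.scheme     # the protocol (http or https)
host = parts.hostname       # the host name
port = parts.port           # the port, as an integer
base_path = parts.path      # the base path required by the proxy

print(protocol, host, port, base_path)
```

Everything that would otherwise be passed as separate arguments is recoverable from the one string, which is why only username and password remain as keyword parameters.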

Now that we have a connection to CAS, we can run some actions on it.

Running CAS Actions

To test your connection, you can run the serverstatus action.

In [4]: out = conn.serverstatus()

Note: Grid node action status report: 1 nodes, 8 total actions executed.

In [5]: out
Out[5]: 
[About]

 {'CAS': 'Cloud Analytic Services',
  'Version': '4.00',
  'VersionLong': 'V.04.00M0D11102022',
  'Copyright': 'Copyright © 2014-2022 SAS Institute Inc. All Rights Reserved.',
  'ServerTime': '2022-11-11T14:07:36Z',
  'System': {'Hostname': 'snap018',
   'OS Name': 'Linux',
   'OS Family': 'LIN X64',
   'OS Release': '2.6.32-358.2.1.el6.x86_64',
   'OS Version': '#1 SMP Wed Feb 20 12:17:37 EST 2013',
   'Model Number': 'x86_64',
   'Linux Distribution': 'Red Hat Enterprise Linux Server release 6.2 (Santiago)'},
  'Documentation': 'http://mycompany.com:8080/job/Actions_ref_doc/ws/casaref/index.html',
  'license': {'site': 'SAS Institute Inc.',
   'siteNum': 1,
   'expires': '05Jan2023:00:00:00',
   'gracePeriod': 62,
   'warningPeriod': 31},
  'CASHostAccountRequired': 'OPTIONAL',
  'Transferred': 'NO',
  'CASCacheLocation': 'CAS Disk Cache'}

[server]

 Server Status

    nodes  actions
 0      1        8

[nodestatus]

 Node Status

       name        role  uptime  running  stalled
 0  snap018  controller   0.269        0        0

+ Elapsed: 0.0014s, user: 0.001s, mem: 0.304mb

Handling the Output

All CAS actions return a CASResults object. This is simply an ordered Python dictionary with a few extra methods and attributes added. In the output above, you’ll see the keys of the dictionary surrounded by square brackets. They are ‘About’, ‘server’, and ‘nodestatus’. Since this is a dictionary, you can access the keys in the standard way.

In [6]: out['nodestatus']
Out[6]: 
Node Status

      name        role  uptime  running  stalled
0  snap018  controller   0.269        0        0

In addition, you can access the keys as attributes. This convenience was added to keep your code looking a bit cleaner. However, be aware that if the name of a key collides with a standard Python attribute or method, you’ll get that attribute or method instead. So this form is fine for interactive programming, but you may want to use the syntax above for actual programs.

In [7]: out.nodestatus
Out[7]: 
Node Status

      name        role  uptime  running  stalled
0  snap018  controller   0.269        0        0
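The collision caveat is easier to see with a toy example. The class below is a hypothetical stand-in written only for illustration (it is not the CASResults implementation); it exposes dictionary keys as attributes and shows what happens when a key shares its name with a built-in dictionary method such as items.

```python
class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real dict methods such as .items always win over keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


out = AttrDict(nodestatus='node table', items='my items value')

by_attr = out.nodestatus    # no collision: returns the stored value
by_key = out['items']       # key syntax always returns the stored value
collides = out.items        # collision: returns the dict method, not the value

print(by_attr, by_key, callable(collides))
```

Key syntax is unaffected by the collision, which is why it is the safer choice in programs that have to handle arbitrary result keys.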

The types of the result keys can vary as well. In this case, the ‘About’ key holds a dictionary. The ‘server’ and ‘nodestatus’ keys hold SASDataFrame objects (a subclass of pandas.DataFrame).

In [8]: for key, value in out.items():
   ...:     print(key, type(value))
   ...: 
About <class 'dict'>
server <class 'swat.SASDataFrame'>
nodestatus <class 'swat.SASDataFrame'>

Since the values in the result are standard Python (and pandas) objects, you can work with them as you normally do.

In [9]: out.nodestatus.role
Out[9]: 
0    controller
Name: role, dtype: object

In [10]: out.About['Version']
Out[10]: '4.00'

Simple Statistics

We can’t have a getting started section without doing some sort of statistical analysis. First, we need to see what CAS action sets are loaded. We can get a listing of all of the action sets and actions using the help CAS action. If you run help without any arguments, it will display all of the loaded actions and their descriptions. Rather than printing that large listing, we’ll specifically ask for the simple action set since we already know that’s the one we want.

In [11]: conn.help(actionset='simple');

Let’s start with the summary action. Of course, we first need to load some data. The simplest way to load data is to do it from the client side. Note that while this is the simplest way, it’s probably not the best way for large data sets. Those should be loaded from the server side if possible.

The CAS.read_csv() method works just like the pandas.read_csv() function. In fact, CAS.read_csv() uses pandas.read_csv() in the background. When pandas.read_csv() finishes parsing the CSV file into a pandas.DataFrame, CAS.read_csv() uploads that DataFrame to a CAS table and returns a CASTable object.

In [12]: tbl = conn.read_csv('https://raw.githubusercontent.com/'
   ....:                     'sassoftware/sas-viya-programming/master/data/cars.csv')
   ....: 

Note: Cloud Analytic Services made the uploaded file available as table TMPC8DLCINH in caslib CASUSER(castest).

Note: The table TMPC8DLCINH has been created in caslib CASUSER(castest) from binary data uploaded to Cloud Analytic Services.

CASTable objects are essentially client-side views of the table of data in the CAS server. You can interact with them using CAS actions as well as many of the pandas.DataFrame methods and attributes. The pandas.DataFrame API is mirrored as much as possible; the only difference is that, behind the scenes, the real work is being done by CAS.

If you don’t want the difficult-to-read generated name for a table, you can specify one using the casout= parameter.

In [13]: tbl = conn.read_csv('https://raw.githubusercontent.com/'
   ....:                     'sassoftware/sas-viya-programming/master/data/cars.csv',
   ....:                     casout='cars')
   ....: 

Note: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(castest).

Note: The table CARS has been created in caslib CASUSER(castest) from binary data uploaded to Cloud Analytic Services.

Since we started down this path with the intent to use the summary action, let’s do that first.

In [14]: out = conn.summary(table=tbl)

In [15]: out
Out[15]: 
[Summary]

 Descriptive Statistics for CARS
 
         Column      Min       Max      N  NMiss          Mean         Sum           Std      StdErr           Var           USS           CSS         CV      TValue          ProbT  Skewness   Kurtosis
 0         MSRP  10280.0  192465.0  428.0    0.0  32774.855140  14027638.0  19431.716674  939.267478  3.775916e+08  6.209854e+11  1.612316e+11  59.288490   34.894059  4.160412e-127  2.798099  13.879206
 1      Invoice   9875.0  173560.0  428.0    0.0  30014.700935  12846292.0  17642.117750  852.763949  3.112443e+08  5.184789e+11  1.329013e+11  58.778256   35.196963  2.684398e-128  2.834740  13.946164
 2   EngineSize      1.3       8.3  428.0    0.0      3.196729      1368.2      1.108595    0.053586  1.228982e+00  4.898540e+03  5.247754e+02  34.679034   59.656105  3.133745e-209  0.708152   0.541944
 3    Cylinders      3.0      12.0  426.0    2.0      5.807512      2474.0      1.558443    0.075507  2.428743e+00  1.540000e+04  1.032216e+03  26.834946   76.913766  1.515569e-251  0.592785   0.440378
 4   Horsepower     73.0     500.0  428.0    0.0    215.885514     92399.0     71.836032    3.472326  5.160415e+03  2.215110e+07  2.203497e+06  33.275059   62.173176  4.185344e-216  0.930331   1.552159
 5     MPG_City     10.0      60.0  428.0    0.0     20.060748      8586.0      5.238218    0.253199  2.743892e+01  1.839580e+05  1.171642e+04  26.111777   79.229235  1.866284e-257  2.782072  15.791147
 6  MPG_Highway     12.0      66.0  428.0    0.0     26.843458     11489.0      5.741201    0.277511  3.296139e+01  3.224790e+05  1.407451e+04  21.387709   96.729204  1.665621e-292  1.252395   6.045611
 7       Weight   1850.0    7190.0  428.0    0.0   3577.953271   1531364.0    758.983215   36.686838  5.760555e+05  5.725125e+09  2.459757e+08  21.212776   97.526890  5.812547e-294  0.891824   1.688789
 8    Wheelbase     89.0     144.0  428.0    0.0    108.154206     46290.0      8.311813    0.401767  6.908624e+01  5.035958e+06  2.949982e+04   7.685150  269.196577   0.000000e+00  0.962287   2.133649
 9       Length    143.0     238.0  428.0    0.0    186.362150     79763.0     14.357991    0.694020  2.061519e+02  1.495283e+07  8.802687e+04   7.704349  268.525733   0.000000e+00  0.181977   0.614725

+ Elapsed: 0.015s, user: 0.012s, sys: 0.003s, mem: 4.51mb

You can also call the summary action directly on the CASTable object. It will automatically populate the table= parameter.

In [16]: out = tbl.summary()

In [17]: out
Out[17]: 
[Summary]

 Descriptive Statistics for CARS
 
         Column      Min       Max      N  NMiss          Mean         Sum           Std      StdErr           Var           USS           CSS         CV      TValue          ProbT  Skewness   Kurtosis
 0         MSRP  10280.0  192465.0  428.0    0.0  32774.855140  14027638.0  19431.716674  939.267478  3.775916e+08  6.209854e+11  1.612316e+11  59.288490   34.894059  4.160412e-127  2.798099  13.879206
 1      Invoice   9875.0  173560.0  428.0    0.0  30014.700935  12846292.0  17642.117750  852.763949  3.112443e+08  5.184789e+11  1.329013e+11  58.778256   35.196963  2.684398e-128  2.834740  13.946164
 2   EngineSize      1.3       8.3  428.0    0.0      3.196729      1368.2      1.108595    0.053586  1.228982e+00  4.898540e+03  5.247754e+02  34.679034   59.656105  3.133745e-209  0.708152   0.541944
 3    Cylinders      3.0      12.0  426.0    2.0      5.807512      2474.0      1.558443    0.075507  2.428743e+00  1.540000e+04  1.032216e+03  26.834946   76.913766  1.515569e-251  0.592785   0.440378
 4   Horsepower     73.0     500.0  428.0    0.0    215.885514     92399.0     71.836032    3.472326  5.160415e+03  2.215110e+07  2.203497e+06  33.275059   62.173176  4.185344e-216  0.930331   1.552159
 5     MPG_City     10.0      60.0  428.0    0.0     20.060748      8586.0      5.238218    0.253199  2.743892e+01  1.839580e+05  1.171642e+04  26.111777   79.229235  1.866284e-257  2.782072  15.791147
 6  MPG_Highway     12.0      66.0  428.0    0.0     26.843458     11489.0      5.741201    0.277511  3.296139e+01  3.224790e+05  1.407451e+04  21.387709   96.729204  1.665621e-292  1.252395   6.045611
 7       Weight   1850.0    7190.0  428.0    0.0   3577.953271   1531364.0    758.983215   36.686838  5.760555e+05  5.725125e+09  2.459757e+08  21.212776   97.526890  5.812547e-294  0.891824   1.688789
 8    Wheelbase     89.0     144.0  428.0    0.0    108.154206     46290.0      8.311813    0.401767  6.908624e+01  5.035958e+06  2.949982e+04   7.685150  269.196577   0.000000e+00  0.962287   2.133649
 9       Length    143.0     238.0  428.0    0.0    186.362150     79763.0     14.357991    0.694020  2.061519e+02  1.495283e+07  8.802687e+04   7.704349  268.525733   0.000000e+00  0.181977   0.614725

+ Elapsed: 0.014s, user: 0.013s, sys: 0.002s, mem: 4.51mb

Again, the output is a CASResults object (a subclass of a Python dictionary), so we can pull off the keys we want (there is only one in this case). This key contains a SASDataFrame, but since it’s a subclass of pandas.DataFrame, you can do all of the standard DataFrame operations on it.

In [18]: summ = out.Summary

In [19]: summ = summ.set_index('Column')

In [20]: summ.loc['Cylinders', 'Max']
Out[20]: 12.0

Loading CAS Action Sets

While CAS comes with a few pre-loaded action sets, you will likely want to load action sets with other capabilities such as percentiles, Data step, SQL, or even machine learning. Most action sets require a license to run, so make sure the licensing is in place before you load them.

The action used to load action sets is called loadactionset.

In [21]: conn.loadactionset('percentile')

Note: Added action set 'percentile'.
Out[21]: 
[actionset]

 'percentile'

+ Elapsed: 0.000695s, mem: 0.203mb

Once you load an action set, its actions will be automatically added as methods to the CAS connection and any CASTable objects associated with that connection.

In [22]: tbl.percentile()
Out[22]: 
[Percentile]

 Percentiles for CARS
 
        Variable  Pctl     Value  Converged
 0          MSRP  25.0  20329.50        1.0
 1          MSRP  50.0  27635.00        1.0
 2          MSRP  75.0  39215.00        1.0
 3       Invoice  25.0  18851.00        1.0
 4       Invoice  50.0  25294.50        1.0
 5       Invoice  75.0  35732.50        1.0
 6    EngineSize  25.0      2.35        1.0
 7    EngineSize  50.0      3.00        1.0
 8    EngineSize  75.0      3.90        1.0
 9     Cylinders  25.0      4.00        1.0
 10    Cylinders  50.0      6.00        1.0
 11    Cylinders  75.0      6.00        1.0
 12   Horsepower  25.0    165.00        1.0
 13   Horsepower  50.0    210.00        1.0
 14   Horsepower  75.0    255.00        1.0
 15     MPG_City  25.0     17.00        1.0
 16     MPG_City  50.0     19.00        1.0
 17     MPG_City  75.0     21.50        1.0
 18  MPG_Highway  25.0     24.00        1.0
 19  MPG_Highway  50.0     26.00        1.0
 20  MPG_Highway  75.0     29.00        1.0
 21       Weight  25.0   3103.00        1.0
 22       Weight  50.0   3474.50        1.0
 23       Weight  75.0   3978.50        1.0
 24    Wheelbase  25.0    103.00        1.0
 25    Wheelbase  50.0    107.00        1.0
 26    Wheelbase  75.0    112.00        1.0
 27       Length  25.0    178.00        1.0
 28       Length  50.0    187.00        1.0
 29       Length  75.0    194.00        1.0

+ Elapsed: 0.0423s, user: 0.08s, sys: 0.02s, mem: 10.8mb

Note that the percentile action set has an action called percentile in it. You can call the action either as tbl.percentile or tbl.percentile.percentile.

CAS Tables as DataFrames

As we mentioned previously, CASTable objects implement many of the pandas.DataFrame methods and properties. This means that you can use the familiar pandas.DataFrame API, but use it on data that is far too large for pandas to handle. Here are a few simple examples.

In [23]: tbl.head()
Out[23]: 
Selected Rows from Table CARS

    Make           Model   Type Origin DriveTrain     MSRP  Invoice  EngineSize  Cylinders  Horsepower  MPG_City  MPG_Highway  Weight  Wheelbase  Length
0  Acura             MDX    SUV   Asia        All  36945.0  33337.0         3.5        6.0       265.0      17.0         23.0  4451.0      106.0   189.0
1  Acura  RSX Type S 2dr  Sedan   Asia      Front  23820.0  21761.0         2.0        4.0       200.0      24.0         31.0  2778.0      101.0   172.0
2  Acura         TSX 4dr  Sedan   Asia      Front  26990.0  24647.0         2.4        4.0       200.0      22.0         29.0  3230.0      105.0   183.0
3  Acura          TL 4dr  Sedan   Asia      Front  33195.0  30299.0         3.2        6.0       270.0      20.0         28.0  3575.0      108.0   186.0
4  Acura      3.5 RL 4dr  Sedan   Asia      Front  43755.0  39014.0         3.5        6.0       225.0      18.0         24.0  3880.0      115.0   197.0

In [24]: tbl.describe()
Out[24]: 
                MSRP        Invoice  EngineSize   Cylinders  Horsepower    MPG_City  MPG_Highway       Weight   Wheelbase      Length
count     428.000000     428.000000  428.000000  426.000000  428.000000  428.000000   428.000000   428.000000  428.000000  428.000000
mean    32774.855140   30014.700935    3.196729    5.807512  215.885514   20.060748    26.843458  3577.953271  108.154206  186.362150
std     19431.716674   17642.117750    1.108595    1.558443   71.836032    5.238218     5.741201   758.983215    8.311813   14.357991
min     10280.000000    9875.000000    1.300000    3.000000   73.000000   10.000000    12.000000  1850.000000   89.000000  143.000000
25%     20329.500000   18851.000000    2.350000    4.000000  165.000000   17.000000    24.000000  3103.000000  103.000000  178.000000
50%     27635.000000   25294.500000    3.000000    6.000000  210.000000   19.000000    26.000000  3474.500000  107.000000  187.000000
75%     39215.000000   35732.500000    3.900000    6.000000  255.000000   21.500000    29.000000  3978.500000  112.000000  194.000000
max    192465.000000  173560.000000    8.300000   12.000000  500.000000   60.000000    66.000000  7190.000000  144.000000  238.000000

In [25]: tbl[['MSRP', 'Invoice']].describe(percentiles=[0.3, 0.7])
Out[25]: 
                MSRP        Invoice
count     428.000000     428.000000
mean    32774.855140   30014.700935
std     19431.716674   17642.117750
min     10280.000000    9875.000000
30%     22000.000000   20284.000000
50%     27635.000000   25294.500000
70%     35940.000000   32997.000000
max    192465.000000  173560.000000

For more information about CASTable, see the API Reference.

Closing the Connection

When you are finished with the connection, it’s always a good idea to close it.

In [26]: conn.close()

Authentication

While it is possible to put your username and password in the CAS constructor, it’s generally not a good idea to have a password in your code. To get around this issue, the CAS class supports authinfo files. An authinfo file stores username and password information for a specified hostname and port. It is protected by file permissions so that only you can read it. This allows you to set and protect your passwords in one place and have them used by all of your programs.

The format of the file is as follows:

host HOST user USERNAME password PASSWORD port PORT

machine is a synonym for host, login and account are synonyms for user, and protocol is a synonym for port.

You can specify as many host lines as you need. The port field is optional. If it is left off, all ports will use the same password. Hostnames must match the hostname used in the CAS constructor exactly. No DNS expansion of the names is done, so ‘host1’ and ‘host1.my-company.com’ are considered two different hosts.

Here is an example for a user named ‘user01’ and password ‘!s3cret’ on host ‘cas.my-company.com’ and port 12354:

host cas.my-company.com port 12354 user user01 password !s3cret
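To make the format and the matching rules concrete, here is a minimal sketch of how such lines could be read and looked up. The function names parse_authinfo_line and find_password are made up for this illustration, the sketch ignores quoting and comments, and it is not the parser that SWAT actually uses.

```python
# Synonyms described above, mapped onto the canonical field names.
SYNONYMS = {'machine': 'host', 'login': 'user',
            'account': 'user', 'protocol': 'port'}


def parse_authinfo_line(line):
    """Turn one 'key value key value ...' line into a dict (illustration only)."""
    tokens = line.split()
    entry = {}
    # Tokens alternate between a field name and its value.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        entry[SYNONYMS.get(key, key)] = value
    return entry


def find_password(entries, host, port):
    """Return the password of the first entry matching host and port, else None."""
    for entry in entries:
        if entry.get('host') != host:
            continue  # exact string comparison, no DNS expansion
        if 'port' in entry and entry['port'] != str(port):
            continue  # an entry without a port applies to every port
        return entry.get('password')
    return None


entries = [
    parse_authinfo_line(
        'host cas.my-company.com port 12354 user user01 password !s3cret'
    ),
]

print(find_password(entries, 'cas.my-company.com', 12354))
print(find_password(entries, 'cas', 12354))
```

The second lookup finds nothing because ‘cas’ and ‘cas.my-company.com’ are compared as plain strings.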

By default, authinfo files are looked for in your home directory under the name .authinfo. You can also use the name .netrc, which is the name of an older specification that authinfo was based on.

The file must be readable and writable by the owner only. You can set this with the following command:

chmod 0600 ~/.authinfo
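If you would rather check the permissions from Python, something like the following sketch works on POSIX systems. The helper name is_owner_only is made up for this example, and it runs against a throwaway temporary file rather than a real ~/.authinfo.

```python
import os
import stat
import tempfile


def is_owner_only(path):
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Create a temporary file to experiment on.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o600)      # same effect as: chmod 0600 <file>
private = is_owner_only(demo_path)

os.chmod(demo_path, 0o644)      # group and others can now read the file
shared = is_owner_only(demo_path)

os.remove(demo_path)
print(private, shared)
```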

If you don’t want to use an authinfo file in your home directory, you can specify the name of a file explicitly using the authinfo= parameter.

In [27]: conn = swat.CAS('cas.my-company.com', 12354, authinfo='/path/to/authinfo.txt')