Sorting¶
Since data in CAS can be spread across many machines and may even be redistributed depending on events that occur, the data is not stored in an ordered form. In general, when using statistical actions, this doesn’t make much difference since the CAS actions doing the work will handle the data regardless of the order that it is in. However, when you are bringing a table of data back to the client from CAS using the fetch action, you may want to have it come back in a sorted form.
In [1]: conn = swat.CAS(host, port, username, password) In [2]: tbl = conn.read_csv('https://raw.githubusercontent.com/' ...: 'sassoftware/sas-viya-programming/master/data/cars.csv') ...:
Note: Cloud Analytic Services made the uploaded file available as table TMPRZ6BHYVL in caslib CASUSER(castest).
Note: The table TMPRZ6BHYVL has been created in caslib CASUSER(castest) from binary data uploaded to Cloud Analytic Services. In [3]: tbl.fetch(to=5) Out[3]: [Fetch] Selected Rows from Table TMPRZ6BHYVL Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length 0 Acura MDX SUV Asia All 36945.0 33337.0 3.5 6.0 265.0 17.0 23.0 4451.0 106.0 189.0 1 Acura RSX Type S 2dr Sedan Asia Front 23820.0 21761.0 2.0 4.0 200.0 24.0 31.0 2778.0 101.0 172.0 2 Acura TSX 4dr Sedan Asia Front 26990.0 24647.0 2.4 4.0 200.0 22.0 29.0 3230.0 105.0 183.0 3 Acura TL 4dr Sedan Asia Front 33195.0 30299.0 3.2 6.0 270.0 20.0 28.0 3575.0 108.0 186.0 4 Acura 3.5 RL 4dr Sedan Asia Front 43755.0 39014.0 3.5 6.0 225.0 18.0 24.0 3880.0 115.0 197.0 + Elapsed: 0.00744s, user: 0.006s, sys: 0.001s, mem: 2.04mb In [4]: tbl.fetch(to=5, sortby=['MSRP']) Out[4]: [Fetch] Selected Rows from Table TMPRZ6BHYVL Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length 0 Kia Rio 4dr manual Sedan Asia Front 10280.0 9875.0 1.6 4.0 104.0 26.0 33.0 2403.0 95.0 167.0 1 Hyundai Accent 2dr hatch Sedan Asia Front 10539.0 10107.0 1.6 4.0 103.0 29.0 33.0 2255.0 96.0 167.0 2 Toyota Echo 2dr manual Sedan Asia Front 10760.0 10144.0 1.5 4.0 108.0 35.0 43.0 2035.0 93.0 163.0 3 Saturn Ion1 4dr Sedan USA Front 10995.0 10319.0 2.2 4.0 140.0 26.0 35.0 2692.0 103.0 185.0 4 Kia Rio 4dr auto Sedan Asia Front 11155.0 10705.0 1.6 4.0 104.0 25.0 32.0 2458.0 95.0 167.0 + Elapsed: 0.012s, user: 0.009s, sys: 0.002s, mem: 5.62mb
Of course, it is possible to set the direction of the sorting as well.
In [5]: tbl.fetch(to=5, sortby=[{'name':'MSRP', 'order':'descending'}])
Out[5]:
[Fetch]
Selected Rows from Table TMPRZ6BHYVL
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Porsche 911 GT2 2dr Sports Europe Rear 192465.0 173560.0 3.6 6.0 477.0 17.0 24.0 3131.0 93.0 175.0
1 Mercedes-Benz CL600 2dr Sedan Europe Rear 128420.0 119600.0 5.5 12.0 493.0 13.0 19.0 4473.0 114.0 196.0
2 Mercedes-Benz SL600 convertible 2dr Sports Europe Rear 126670.0 117854.0 5.5 12.0 493.0 13.0 19.0 4429.0 101.0 179.0
3 Mercedes-Benz SL55 AMG 2dr Sports Europe Rear 121770.0 113388.0 5.5 8.0 493.0 14.0 21.0 4235.0 101.0 179.0
4 Mercedes-Benz CL500 2dr Sedan Europe Rear 94820.0 88324.0 5.0 8.0 302.0 16.0 24.0 4085.0 114.0 196.0
+ Elapsed: 0.0119s, user: 0.011s, mem: 5.59mb
If you are using the pandas.DataFrame style API for CASTable, you can also use the sort_values() method on CASTable objects.
In [6]: sorttbl = tbl.sort_values(['MSRP'])
In [7]: sorttbl.head()
Out[7]:
Selected Rows from Table TMPRZ6BHYVL
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Kia Rio 4dr manual Sedan Asia Front 10280.0 9875.0 1.6 4.0 104.0 26.0 33.0 2403.0 95.0 167.0
1 Hyundai Accent 2dr hatch Sedan Asia Front 10539.0 10107.0 1.6 4.0 103.0 29.0 33.0 2255.0 96.0 167.0
2 Toyota Echo 2dr manual Sedan Asia Front 10760.0 10144.0 1.5 4.0 108.0 35.0 43.0 2035.0 93.0 163.0
3 Saturn Ion1 4dr Sedan USA Front 10995.0 10319.0 2.2 4.0 140.0 26.0 35.0 2692.0 103.0 185.0
4 Kia Rio 4dr auto Sedan Asia Front 11155.0 10705.0 1.6 4.0 104.0 25.0 32.0 2458.0 95.0 167.0
In [8]: sorttbl = tbl.sort_values(['MSRP'], ascending=False)
In [9]: sorttbl.head()
Out[9]:
Selected Rows from Table TMPRZ6BHYVL
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Porsche 911 GT2 2dr Sports Europe Rear 192465.0 173560.0 3.6 6.0 477.0 17.0 24.0 3131.0 93.0 175.0
1 Mercedes-Benz CL600 2dr Sedan Europe Rear 128420.0 119600.0 5.5 12.0 493.0 13.0 19.0 4473.0 114.0 196.0
2 Mercedes-Benz SL600 convertible 2dr Sports Europe Rear 126670.0 117854.0 5.5 12.0 493.0 13.0 19.0 4429.0 101.0 179.0
3 Mercedes-Benz SL55 AMG 2dr Sports Europe Rear 121770.0 113388.0 5.5 8.0 493.0 14.0 21.0 4235.0 101.0 179.0
4 Mercedes-Benz CL500 2dr Sedan Europe Rear 94820.0 88324.0 5.0 8.0 302.0 16.0 24.0 4085.0 114.0 196.0
As previously mentioned, this doesn’t affect anything in the table on the CAS server itself, it merely stores away the sort keys and applies them when data is fetched (either through fetch directly, or any method that calls fetch in the background).