SQL to Alteryx
Reference comparison between Alteryx functions and SQL and pandas. My perspective on the functionality of these tools.
The idea of this post is to map constructs from SQL to their equivalent node in Alteryx.
SQL | Alteryx | Remarks |
---|---|---|
WHERE | Filter | In Alteryx, we have access to both matched and unmatched datasets. By match we mean where the condition was met. |
ORDER BY | Sort | |
WINDOW FUNCTIONS LEAD LAG | Multi-row formula | |
Forumla such as Multiplication | Formula | |
Formual among multiple fields | Multi-field formula | In SQL we'd reference each field individually. In Alteryx, we use the expression builder |
ROW_NUMBER | Record ID | |
DISTINCT | Unique | In Alteryx, we are also provided access to the non-unique values as output. Calculating non-unique values in SQL would require another step of subtracting the output of DISTINCT from the original dataset. |
LIMIT | Sample | In Alteryx, the Sample tool also provides other functionality that would require additional steps in SQL. For e.g. Last N, 1 of every N, first N% etc. It also allows grouping, so features like Group by a field and then take the first N rows of each are performed in a single click. |
Unknown | Random % Sample | Random sampling by number or percentage |
Not Present | Auto Field | This detects the optimal field types and lengths. This usually takes a long time to do in Python. Pandas does make it possible to autodetect by setting a parameter when reading in a DataFrame from a file, but it takes a lot longer, so the Alteryx implementation is pretty optimized |
NVL and CASE along with aggregate functions | Imputation | |
GROUP BY and Unknown | Tile | This is a much more advance tool and performs a lot of different grouping functionality that is not natively supported by SQL or would take multiple steps to achieve. This essentially performs a lot of common clustering operations with ease. |
Unknown | Generate Rows | This allows generation of data and looks like a visual representation of a for loop in procedural programming with the addition of using database rows as input for concatentation. |
I can see, that through splitting data output to both matched/unmatched, Alteryx is trying to reduce the number of steps a human would need to do by anticipating that ahead of time.