SQL to Alteryx
The idea of this post is to map constructs from SQL to their equivalent node in Alteryx.
SQL | Alteryx | Remarks |
---|---|---|
WHERE | Filter | In Alteryx, we have access to both matched and unmatched datasets. By match we mean where the condition was met. |
ORDER BY | Sort | |
WINDOW FUNCTIONS LEAD LAG | Multi-row formula | |
Forumla such as Multiplication | Formula | |
Formual among multiple fields | Multi-field formula | In SQL we'd reference each field individually. In Alteryx, we use the expression builder |
ROW_NUMBER | Record ID | |
DISTINCT | Unique | In Alteryx, we are also provided access to the non-unique values as output. Calculating non-unique values in SQL would require another step of subtracting the output of DISTINCT from the original dataset. |
LIMIT | Sample | In Alteryx, the Sample tool also provides other functionality that would require additional steps in SQL. For e.g. Last N, 1 of every N, first N% etc. It also allows grouping, so features like Group by a field and then take the first N rows of each are performed in a single click. |
Unknown | Random % Sample | Random sampling by number or percentage |
Not Present | Auto Field | This detects the optimal field types and lengths. This usually takes a long time to do in Python. Pandas does make it possible to autodetect by setting a parameter when reading in a DataFrame from a file, but it takes a lot longer, so the Alteryx implementation is pretty optimized |
NVL and CASE along with aggregate functions | Imputation | |
GROUP BY and Unknown | Tile | This is a much more advance tool and performs a lot of different grouping functionality that is not natively supported by SQL or would take multiple steps to achieve. This essentially performs a lot of common clustering operations with ease. |
Unknown | Generate Rows | This allows generation of data and looks like a visual representation of a for loop in procedural programming with the addition of using database rows as input for concatentation. |
I can see, that through splitting data output to both matched/unmatched, Alteryx is trying to reduce the number of steps a human would need to do by anticipating that ahead of time.