Aggregate Counts of Known Categorical Values

Comments

2 comments

  • Avatar
    Gareth Penten

    Hi Adrian,

    This is really useful, thanks for posting.

    I've got a somewhat related question so hoping you can please help. I'm hitting my head against the wall trying to figure out how to replicate a SQL window function within the Transform node (python).

    Essentially what I want to do is order by DATE ascending and count each instance of the ID that occurs. The rough code in SQL would be:

    ROW_NUMBER() OVER(PARTITION BY "ID" ORDER BY "DATE") as Instance_Count

    The easy option would be for me to adjust my code within the DB node however I know I will need similar functionality in future and figure there's definitely a way to do this within Dataverse/Python.

    If you can assist that would be great - thanks.

    0
    Comment actions Permalink
  • Avatar
    Adrian Williams

    There are a number of ways you can partition your data to allow you to then work with these sub-groups and apply custom window functions. In all cases you will need to sort your data using the group by fields before using the data in the Transform node.

    One method is to explicitly create your own aggregate and reset the aggregate variable at the end of each sub-group. An example of this is shown in the top Transform node in the attached data flow file.

    You can also leverage the GrouBy functionality in the Transform node and the Python API's 'grouping' functions. This functionality is described in the Dataverse Help Python scripting -> Python module support. It is also described in the online documentation here. An example of using the grouping functions is shown in the middle Transform node in the data flow.

    If you just want to calculate the aggregate records you can also use the grouping functions in the Transform node. However, it may be more straightforward to use the Aggregate node - especially when using the Aggregate node's script-less configuration functionality that was introduced in Dataverse 3.1.6. An example of using the grouping functions to calculate aggregate values for each group is shown in the lower Transform node in the data flow. Typically this approach would be used if you wanted to subsequently join, say, the field containing the group average value to the original data set.

     

    Regards,

    Adrian

     

    Attached files

    Partitioning_Data_Using_a_Window_Function - 15 Nov 2017.lna

    0
    Comment actions Permalink

Please sign in to leave a comment.



Powered by Zendesk