Adding row counter with the new python transform node

Comments

  • Nick Lancaster

    Solved my own question

    Use a Transform node.

    [ConfigureFields]
    out1 += in1

    out1.rowid = int

    [ProcessRecords]
    out1 += in1

    for x in range(node.execCount):
        out1.rowid = x + 1

    If you want rowid to start at 0 then delete "+ 1"

  • Adrian Williams

    Hi Nick, I'm glad you found a solution.

    If you just want to add a record count field to the data, here are two suggestions for simplifying the script and making it perform better with larger data sets:


    1. Changing the data type of the 'rowid' field to long instead of int will reduce the likelihood of counter overflow with very large data sets.


    2. The Transform node will perform better if it is not looping over the list generated by the range() function for each input record.
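To give a feel for the cost of that loop, here is a minimal sketch in plain Python, run outside the node. The function names and the `exec_count` variable are hypothetical stand-ins for node.execCount, which is assumed here to be 1 for the first record. The sketch only counts how many assignments each approach makes.

```python
def assignments_with_loop(n_records):
    """Count assignments when every record loops over range(exec_count)."""
    total = 0
    for exec_count in range(1, n_records + 1):  # stand-in for node.execCount
        for x in range(exec_count):
            rowid = x + 1  # overwritten on every pass; only the last value is kept
            total += 1
    return total


def assignments_direct(n_records):
    """Count assignments when the counter is assigned once per record."""
    total = 0
    for exec_count in range(1, n_records + 1):
        rowid = exec_count
        total += 1
    return total


print(assignments_with_loop(1000))  # 500500
print(assignments_direct(1000))     # 1000
```

Under these assumptions the looping version does n(n+1)/2 assignments for n records, so the work grows with the square of the record count, while the direct version does one assignment per record.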


    There are a number of ways this could be achieved, for example:


    #### Option 1

    [ConfigureFields]
    out1 += in1
    out1.rowid = long

    [ProcessRecords]
    out1 += in1
    out1.rowid = node.execCount

    #### End Option 1

     

    #### Option 2

    [ConfigureFields]
    out1 += in1
    out1.rowid = long
    x = 1  # Set to 0 to start the count at zero

    [ProcessRecords]
    out1 += in1
    out1.rowid = x
    x += 1

    #### End Option 2

     

    If you want the record count field to be output as the first (left-most) column of the data, put the out1.rowid = long statement before the out1 += in1 statement in the ConfigureFields script.
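The same two choices (where the count starts, and whether the counter comes first or last) can be sketched in plain Python outside the node. This is only an illustration: `add_rowid` and its `start` and `first` parameters are hypothetical names, and plain dictionaries stand in for records.

```python
def add_rowid(records, start=1, first=False):
    """Return copies of the records with a 'rowid' field added.

    start: first counter value (use 0 for a zero-based count).
    first: if True, put 'rowid' before the existing fields.
    """
    result = []
    for rowid, record in enumerate(records, start=start):
        if first:
            row = {"rowid": rowid, **record}   # counter as the first field
        else:
            row = {**record, "rowid": rowid}   # counter as the last field
        result.append(row)
    return result


records = [{"color": "red"}, {"color": "green"}, {"color": "blue"}]
print(add_rowid(records))
print(add_rowid(records, start=0, first=True))
```

As in the ConfigureFields script, the position of the counter depends only on whether it is declared before or after the existing fields are copied across.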


    In some situations you may need to know the total number of records in the data set so that you can perform a calculation. Hence, the total record count needs to be available when each record is being processed. You can achieve this by using an Aggregate node to count the records and then joining the value to each record using a Lookup node. In the example (see attached screenshots), the Aggregate node is configured to count the 'color' field and output the count as the 'record_count' field. This is input to the Lookup node on its lookup pin. Leaving the Left Field and Right Field Match Key properties at their default values will result in the node joining the record count to every record in the data.
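The idea behind the Aggregate and Lookup combination can be sketched in plain Python. This is not the nodes' implementation: `count_records` and `join_count` are hypothetical helpers, the 'fraction' field is made up for illustration, and counting only non-null 'color' values is an assumption about how the count behaves.

```python
def count_records(records, field):
    """Aggregate step: count the records that have a value in `field`."""
    return sum(1 for record in records if record.get(field) is not None)


def join_count(records, record_count):
    """Lookup step: attach the same total to every record."""
    return [{**record, "record_count": record_count} for record in records]


records = [{"color": "red"}, {"color": "green"}, {"color": "blue"}]
total = count_records(records, "color")

for position, row in enumerate(join_count(records, total), start=1):
    # With the total on each record, per-record calculations become possible,
    # for example the record's position as a fraction of the whole data set.
    row["fraction"] = position / row["record_count"]
    print(row)
```

Because the total is computed once and then attached to every record, no record has to wait for the rest of the data to be processed before the calculation can run.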

     

     

    Attached files

    Adding_Record_Count_to_Data_Set_1.png
    Adding_Record_Count_to_Data_Set_2.png
    Adding_Record_Count_to_Data_Set_3.png

  • Nick Lancaster

    Thanks, Adrian. I'll look at implementing your suggestion.

  • Adrian Williams

    Update: when using Data3Sixty Analyze v.3.2.7 you can also use the Calculate Fields node to generate a row counter.

    See the node help or online documentation here:

    https://preview-desktop.lavastorm.com/#e-node-help/Aggregation_and_Transformation/calculate-fields.html%3FTocPath%3DNode%2520help%7CAggregation%2520and%2520Transformation%7C_____2

     
