We recommend switching to the latest versions of Edge, Firefox, Chrome or Safari. Using Internet Explorer will result in a loss of website functionality.

Preparing Data – Renaming Fields

Follow

Comments

12 comments

  • Avatar
    Jeffery Brown

    How does the newer node for "Modify Fields" add to being able to rename fields?

    1
    Comment actions Permalink
  • Avatar
    Adrian Williams

    Hi Jeff, thanks for the question.

    Yes you can rename individual fields or multiple fields using the Modify Fields node - without using any code.

    I have added a link at the start of this article to another article that describes the capabilities of the Modify Fields node.

    1
    Comment actions Permalink
  • Avatar
    Jeffery Brown

    Thanks Adrian! 

    0
    Comment actions Permalink
  • Avatar
    Daniel Rempel

    I have a dataset where all the field names contain spaces. Rather than entering a new name for each field, could I somehow use python replace(" ","_") to change all the spaces in all the field names at once?

    0
    Comment actions Permalink
  • Avatar
    Adrian Williams

    A combination of built in nodes and some scripting will perform the replacement.

    You can use the Get Metadata node to obtain the metadata for all fields in the input data set.

    A Transform node can then be used to replace the space characters in the metadata. 

    The node also outputs a 'newFieldType' field where all the values are set to Null.

    The output of the Transform node is input to the Change Metadata node on its 'metadata changes' input.

    The properties of the Change Metadata node can be left blank as the default fieldnames used by the Change Metadata node align with the field names output by the Transform node.

    When the Change Metadata node is run it modifies the metadata of the data on its 'input data' pin.

     

     

    0
    Comment actions Permalink
  • Avatar
    Adrian Williams

    It is not possible to attach a data flow file to a comment so I've added the example data flow for replacing all space characters in the field names to the end of the main article.

    0
    Comment actions Permalink
  • Avatar
    Daniel Rempel

    Excellent thank you!

    0
    Comment actions Permalink
  • Avatar
    Daniel Rempel

    Hi, I've been playing around with this some more and was able to accomplish this with a Transform node with the following lines in the Configure Fields property:

    for field in fields:
    out1[field.replace(" ","_")] = in1[field]

    Although frankly I'm a little surprised that it works because in ProcessRecords I don't assign anything to the new out1 fields, it still just has the default "out1 += in1"

    Then I got a little more ambitious and decided I also want to change camelCase fieldnames to lower_case fieldnames, and output the fields in alphabetical order which I find very handy on datasets where I have 80-100 fields:

    import re

    field_conversions = {re.sub("([a-z])([A-Z])","\g<1>_\g<2>",field.replace(" ","_")).lower():field for field in fields}

    for new_field in sorted(field_conversions.keys()):
    out1[new_field] = in1[field_conversions[new_field]]
    0
    Comment actions Permalink
  • Avatar
    Scott Perry

    Hi Adrian

    I currently use the Change Metadata node extensively to rename columns but the performance is causing an issue which must be due to the change in data type specified in the NewFieldType column. I assume the performance issue is due to the fact that it processes every record and does a type conversion hence is slow for my datasets as they consist of 1 million+ records. 

    Is there any way to tell the Change Metadata node to only do the column name change and not change the type? I checked the online help but it doesn't seem to be possible so I was thinking of building a custom node using the python node that works in exactly the same way (except for doing the type conversion) but can't get it to work. 

    Would really be appreciated if you could share an example python node?

    kind regards

    Scott

    0
    Comment actions Permalink
  • Avatar
    Adrian Williams

    For the Change Metadata node if the NewFieldType value is Null then the node does not change the data type for that field. This should provide the best performance if only the field names are to be changed.

    A typical use case for this node was to use a Get Metadata node to retrieve the metadata for the input data set. This could then be passed into a Transform node that inserted the NewFieldName and NewFieldType fields into it's output data set. The process records script is configured to set the relevant values in the output data where a field name and/or field type needs to be changed. The data can then be passed into the Change Metadata node's metadataChanges input pin

    For example to change only the 'color' field to 'colour' the Transform nodes scripts would be:

    Which results in the following output data:

    Alternatively, you could also consider using the more recent Modify Fields node that may offer better performance and require less configuration, e.g.

    0
    Comment actions Permalink
  • Avatar
    Scott Perry

    Hi Adrian 

    thanks for the info. I had previously tried leaving NewFieldType blank but it caused the node to fail. Should have thought to try NULL. (fyi I it also works when you set it to Null which D360 treats as a string input). 

    Regarding the modify fields node, just to double check it's not possible to use this node by passing it data that contains the FieldName,NewFieldName mapping? It looks like it is more for manually configuring it in the GUI. 

    Thanks again

    Scott 

    0
    Comment actions Permalink
  • Avatar
    Adrian Williams

    Hi Scott,

    The Modify Fields node does not have a 'From field' source option for the 'OutputFields' property grid so you cannot pass in the metadata changes from an input pin.

    The limits on parameterizing the node would be to use a textual property substitution as the value for the New Name cell in the required cell in the grid, i.e.

    If you were renaming all the fields in the data set to have a common prefix/suffix you could insert the textual property substitution into the 'rename by Pattern' cell in the grid:

    The textual property substitution can be a Data Flow or Run property. The Data Flow and Run properties behaves like a global literals so it cannot be changed from within data flow itself. You can pass in the Run property values to the data flow when it is being run via an Execute Data Flow node (as discussed elsewhere in the community forum).

     

    0
    Comment actions Permalink

Please sign in to leave a comment.