
Using the Python-based Transform Node

Comments

3 comments

  • Daniel Rempel

    Using the Transform node to manipulate field names

    To replace spaces in all field names:

    ConfigureFields:

    for field in fields:
        out1[field.replace(" ","_")] = in1[field]

    Frankly, I'm a little surprised that this works: in ProcessRecords I don't assign anything to the new out1 fields, it still just has the default "out1 += in1".

    Convert field names to lower case

    ConfigureFields:

    for field in fields:
        out1[field.lower()] = in1[field]

    Append a prefix or suffix to field names

    ConfigureFields:

    # To add a prefix:
    for field in fields:
        out1["prefix_"+field] = in1[field]

    # To add a suffix:
    for field in fields:
        out1[field+"_suffix"] = in1[field]

    Convert camelCase field names to lower_case

    ConfigureFields:

    import re
    for field in fields:
        out1[re.sub(r"([a-z])([A-Z])", r"\g<1>_\g<2>", field).lower()] = in1[field]

    This one needs a little explaining. We are using regular expressions (import re) to look for lowercase characters ([a-z]) that are immediately followed by uppercase characters ([A-Z]). 

    The ( ) create groups in the search pattern that can be referenced in the replacement pattern using the \g<#> syntax, where # is the group number. So in this case we take the first group, the lowercase character, and add an _ between it and the second group, the uppercase character. Finally, since the re module doesn't support the \L switch to convert text to lowercase, we apply .lower() to the result of the regex operation.
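
To see what the substitution does outside the node, here is a minimal standalone sketch. The sample names and the camel_to_snake helper are hypothetical, made up for illustration. Note that a run of capitals such as "HTTPStatus" is not split, because the pattern only matches a lowercase letter followed by an uppercase one.

```python
import re

# Hypothetical sample names, used only to illustrate the substitution.
sample_names = ["customerId", "orderTotalAmount", "already_lower", "HTTPStatus"]

def camel_to_snake(name):
    # Insert "_" between a lowercase letter (group 1) and the uppercase
    # letter that immediately follows it (group 2), then lowercase the result.
    return re.sub(r"([a-z])([A-Z])", r"\g<1>_\g<2>", name).lower()

converted = [camel_to_snake(name) for name in sample_names]
# converted is ["customer_id", "order_total_amount", "already_lower", "httpstatus"]
```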

    Rearrange fields into alphabetical order

    ConfigureFields:

    for field in sorted(fields):
        out1[field] = in1[field]

    More advanced methods

    These methods can be combined, as in this example that converts to lowercase, replaces spaces with underscores and sorts the fields:

    for field in sorted(fields):
        out1[field.lower().replace(" ","_")] = in1[field]

    Python's sort is case-sensitive, however, so all fields starting with uppercase characters will come before all fields starting with lowercase characters. In this example the sorted() function is applied to the list of incoming fields, so we don't yet have our lowercase names available for sorting there.
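
The ordering issue is easy to reproduce with a plain list. The names below are hypothetical. The sketch also shows sorted() with a key function, which is a different technique from the dictionary approach in this comment; whether it fits the node depends on fields behaving like a list of strings, which is an assumption here.

```python
# Hypothetical field names with mixed case.
names = ["Zip", "address", "City", "balance"]

# Default ordering compares character codes, so uppercase sorts first.
default_order = sorted(names)
# default_order is ["City", "Zip", "address", "balance"]

# A key function compares the lowercase form but keeps the original names.
case_insensitive_order = sorted(names, key=lambda name: name.lower())
# case_insensitive_order is ["address", "balance", "City", "Zip"]
```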

    One option would be to run one transform node to convert to lowercase, followed by another to sort the fields. But it can also be done in one step - we just need to convert the fields to lowercase while still remembering which original field they identified. We use a dictionary to do this:

    field_conversions = {}
    for field in fields:
        field_conversions[field.lower()] = field

    This can also be done in a single line:

    field_conversions = {field.lower():field for field in fields}

    Once we have this dictionary, we sort its keys, output the new fields in that order, and link each one back to its original field on in1:

    for new_field in sorted(field_conversions.keys()):
        out1[new_field] = in1[field_conversions[new_field]]
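
One thing to watch with the dictionary: if two incoming names differ only by case (say "Zip" and "ZIP"), they produce the same key, and with plain assignment the later one silently replaces the earlier. A minimal sketch of a check for that, using a hypothetical build_conversions helper and made-up names, could look like this:

```python
def build_conversions(names):
    """Map lowercase names to original names and report any clashes."""
    conversions = {}
    collisions = []
    for name in names:
        new_name = name.lower()
        if new_name in conversions:
            # Two originals map to the same output name; keep the first.
            collisions.append((conversions[new_name], name))
        else:
            conversions[new_name] = name
    return conversions, collisions

# Hypothetical incoming field names, including one clash.
conversions, collisions = build_conversions(["Zip", "address", "City", "ZIP"])
ordered_pairs = [(new, conversions[new]) for new in sorted(conversions)]
# ordered_pairs is [("address", "address"), ("city", "City"), ("zip", "Zip")]
# collisions is [("Zip", "ZIP")]
```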

    In the end, I'm using this code which combines many of these elements to give me sorted, lower_case fieldnames:

    import re
    # Key: cleaned-up name (spaces to "_", camelCase split, lowercase). Value: original name.
    field_conversions = {re.sub(r"([a-z])([A-Z])", r"\g<1>_\g<2>", field.replace(" ","_")).lower():field for field in fields}
    for new_field in sorted(field_conversions.keys()):
        out1[new_field] = in1[field_conversions[new_field]]
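
For anyone who wants to check the resulting names before putting this in a node, the same steps can be traced on an ordinary list. This is only a sketch: the normalize helper and the incoming names are hypothetical, and nothing here touches in1 or out1.

```python
import re

def normalize(name):
    # Spaces to underscores, split camelCase with "_", then lowercase.
    with_underscores = name.replace(" ", "_")
    return re.sub(r"([a-z])([A-Z])", r"\g<1>_\g<2>", with_underscores).lower()

# Hypothetical incoming field names.
incoming = ["Order Date", "customerId", "Zip Code", "accountBalance"]

# Cleaned-up name -> original name, as in the dictionary above.
field_conversions = dict((normalize(name), name) for name in incoming)
output_order = sorted(field_conversions)
# output_order is ["account_balance", "customer_id", "order_date", "zip_code"]
```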

  • Jaroslaw Stary

    Hey,

    We are happy that we can use Python without having to write classes, as was needed in Lavastorm. That is an advantage of this node, but there are still a few things that could be added.

    Is there a reason why the Transform node works only with the standard Python libraries?

    Is it possible to extend the Jython implementation with additional libraries such as pandas, numpy or cx_Oracle?

    Thanks 

    Jaroslaw

  • Adrian Williams

    The Transform node is built using a Java implementation of the Python language - Jython. The Jython instance includes equivalents for the functionality included in the Python standard library.

    You can leverage third-party Python packages that are written in Python - i.e. they are Pure Python. However, it is not possible to use Python packages that are written in the C language.

    Please see this article for additional details:

    https://support.infogix.com/hc/en-us/articles/360051959533-How-to-install-3rd-party-Python-packages-on-Analyze
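
As a rough way to check what a given interpreter can load, a sketch along these lines may help. The can_import helper is hypothetical, and where you run it (for example, inside a Transform node) is an assumption. It only relies on importlib and platform from the standard library. On Jython, platform.python_implementation() reports "Jython", and a package that depends on C extensions would typically be expected to fail the import.

```python
import importlib
import platform

def can_import(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

# Reports the interpreter in use, e.g. "Jython" or "CPython".
implementation = platform.python_implementation()

# "json" is in the standard library; "numpy" relies on C extensions.
report = dict((name, can_import(name)) for name in ["json", "numpy"])
```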
