A common function to use in this case would be the replace function. The replace function allows you to perform a "find and replace" of text and doesn't require you to change your source data. This article is only relevant when using our older propriety scripting language: BrainScript. We recommend using the new Transform node which supports Python. See this article for an example of how to use the new Transform node and the Transform node section of the online documentation for further details. For details of how to replace text using the Transform node see this article.
replace(source, find, replace)
OR
source.replace(find, replace)
You will likely need to use the replace function in conjunction with the emit or override emit statement in order to use the corrected data for the next part of your analysis.
Example 1:
In my data, the country code field 'Country' contains the value "USA" but I want to replace it with "US" in order to match with a reference table that I intend to join with. I can use the following to perform this replace:
'Country'.replace("USA", "US")
or
replace('Country', "USA","US")
So then in a Transform node, I might have the following:
emit *
override emit 'Country'.replace("USA", "US") as "Country"
Example 2:
In my data, the country code field 'Country' sometimes contains periods in the country name abbreviations. For example, some country values appear as "U.S.A." or "U.S." but I want these values without the period i.e. "USA" and "US" respectively. I can use the replace function to replace "." with an empty string "":
'Country'.replace(".", "")
or
replace('Country', ".","")
Following from Example 1, what if I needed to remove periods in the data but also replace "USA" with "US"? I can apply the replace function twice using the following:
'Country'.replace(".", "").replace("USA", "US")
Advanced:
Sometimes a direct string replacement may not be enough if, say, the position you want to find or replace is important e.g. only replace "rd" with "road" if it is at the end of the string but only then.
Another function that is useful is regexSubstitute (and the case-insensitive version regexSubstituteI), which allows you to perform a find and replace of string patterns using Regular Expressions.
Example:
My address data contains the common abbreviation "rd" but I want to replace it with the full word "road".
The address values all end with the street suffix and I want to make sure I'm only replacing instances where there is a space before "rd" and where it is at the end of the string. This is important so that I don't change street names that contain "rd" nor change suffixes like "Boulevard"
I can perform this pattern replacement using the following:
'Address'.regexSubstituteI(" rd$"," Road")
or
regexSubstituteI('Address'," rd$"," Road")
Regular Expressions can be extremely powerful so I do encourage you to explore the other Regular Expression operators in Data360 Analyze. If you're not familiar with Regular Expressions, don't fret! There are some great free references online to help you get started.
Comments
0 comments
Please sign in to leave a comment.