Searching with Unicode regular expressions
I have a bunch of Unicode records containing various Unicode characters (e.g. € and †) which I want to find and replace using D360.
I need to use the \u notation to specify Unicode code points (e.g. \u20AC for the euro sign €).
If I set up my search patterns in a Create Data node as follows:
pattern:unicode,replace:unicode
\u2020,AAAAA
\u20AC,BBBBB
then the output pin of the Create Data node renders the characters as expected († and €).
And my search and replace (using Python's re package) works as expected as well.
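For reference, here is a minimal sketch of a pattern-table search and replace with Python's re module. It assumes the patterns arrive as (pattern, replacement) string pairs that already hold the real characters, as the Create Data output does. The names PATTERNS and apply_patterns are hypothetical and are not part of D360. Note that inside a Python string literal, "\u2020" is already the single character †.

```python
import re

# Hypothetical pattern table mirroring the Create Data rows above.
# In a normal Python string literal, "\u2020" is one character (the dagger).
PATTERNS = [
    ("\u2020", "AAAAA"),  # dagger
    ("\u20AC", "BBBBB"),  # euro sign
]

def apply_patterns(text, patterns):
    """Apply each (pattern, replacement) pair in order using re.sub."""
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text)
    return text

print(apply_patterns("Price: \u20AC10 \u2020", PATTERNS))  # Price: BBBBB10 AAAAA
```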
However, if I put the above pattern text into a UTF-8 encoded CSV file and bring in the data via the CSV/Delimited node (with Typed Headers = TRUE), the output pin does not render the characters as above, but instead I see:
And my Python search and replace does not work (it doesn't find the desired patterns).
How do I get the CSV input to work? I need to provide a client with an externally configurable list of Unicode patterns, so I can't hard-code them into a Create Data node.
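One possible workaround, if the file must keep the \u notation, is to convert the escape sequences into real characters after the rows are read. The sketch below is plain Python and makes some assumptions: each pattern cell holds text such as \u20AC (a backslash, a "u" and four hex digits), and decode_u_escapes is a hypothetical helper. It uses a regex callback instead of the unicode_escape codec, so any real characters already in the cell are left untouched.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits, e.g. \u20AC.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def decode_u_escapes(text):
    """Replace each \\uXXXX sequence with the character at that code point."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"\u20AC"  # what the CSV cell actually contains: six plain characters
print(len(raw))  # 6

decoded = decode_u_escapes(raw)
print(decoded == "\u20AC", len(decoded))  # True 1
```

This only handles four-digit escapes. Characters outside the Basic Multilingual Plane, such as emoji, would need an eight-digit or surrogate-pair convention.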
-
You can simply paste the actual Unicode characters (e.g. € and †) into your UTF-8 encoded CSV file and then import them using the CSV/Delimited node.
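As a rough illustration in plain Python (not the D360 CSV/Delimited node), the sketch below reads a UTF-8 file that contains the real characters. Each pattern then comes back as one character that re can match directly. The in-memory file and the plain header names "pattern,replace" are assumptions made for this sketch. With a real file you would use open(path, encoding="utf-8", newline="") instead.

```python
import csv
import io

# Simulated contents of a UTF-8 encoded CSV file holding the real characters
# (dagger and euro sign), not \u escape sequences. Header names are assumed.
csv_bytes = "pattern,replace\n\u2020,AAAAA\n\u20AC,BBBBB\n".encode("utf-8")

# Decode the bytes as UTF-8, matching the encoding the file was saved with.
with io.TextIOWrapper(io.BytesIO(csv_bytes), encoding="utf-8", newline="") as f:
    rows = list(csv.DictReader(f))

# Each pattern is a single character, ready to be passed to re.sub.
print([len(r["pattern"]) for r in rows])  # [1, 1]
print(rows[1]["pattern"] == "\u20AC")     # True
```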
-
I will confer with the team to confirm, but I believe the nodes are operating correctly. The CSV/Delimited node expects the data in the file to be encoded in the character set defined by the FileCharacterSet property. The node does not support escaped Unicode characters, so a sequence such as \u20AC is read as literal text.
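To show why the character set matters, here is a small plain-Python sketch (not the D360 node itself) that decodes the same bytes two ways. Windows-1252 is picked only as an example of a charset that does not match the file. The sketch also shows that an escape sequence written in the file stays literal text whichever ASCII-compatible charset is used.

```python
# The three UTF-8 bytes of the euro sign (E2 82 AC).
euro_bytes = "\u20AC".encode("utf-8")
print(euro_bytes)  # b'\xe2\x82\xac'

# Decoding with the matching character set recovers a single character.
print(euro_bytes.decode("utf-8") == "\u20AC")  # True

# Decoding with a mismatched character set (here Windows-1252) yields
# three unrelated characters instead of one.
wrong = euro_bytes.decode("cp1252")
print(len(wrong), ascii(wrong))  # 3 '\xe2\u201a\xac'

# An escape sequence in the file is plain ASCII in every ASCII-compatible
# charset, so no charset setting turns it into the euro sign.
escaped = rb"\u20AC"
print(escaped.decode("utf-8"), escaped.decode("cp1252"))  # \u20AC \u20AC
```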
In the meantime, does this provide you with a solution for escaped Unicode characters?