Combine columns into a single output column with a delimiter
Hi
I am trying to output multiple columns as a single column, delimited with a pipe.
There is a way, but it seems clunky:
emit 'column1'+"|"+'column2'+"|"+'column3' as Data
It works, but if I add a new column upstream I have to amend the expression.
Is there a quicker, more flexible way to do this?
Thanks
Martin
-
Have you written this in BrainScript? If you switch to a Transform node using Jython, you can do the following. There is probably a more elegant way using a dict, but this will do what you described. Note that it checks that the field is a string or unicode and is not null before appending it to the data variable. You could modify this to add blank strings instead, or to convert everything to a string if you want all values included.
data = ''
for i,fn in enumerate(in1):
    field = inputs[0].metadata().field(i)
    ft = in1.metadata().typeName(field.type())
    fv = in1[fn]
    if (ft == 'string' or ft == 'unicode') and fv is not Null:
        data += '|' + fv
out1.Data = data.strip('|')
-
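Outside the Transform node, the same skip-nulls, strings-only pattern can be sketched in plain Python on an ordinary dict. This is just an illustration of the logic, not the node API; the function and field names below are made up for the example:

```python
# Minimal sketch of the strings-only join (illustrative names, not node API).
def join_string_fields(row, delimiter='|'):
    # Keep only string values; nulls and non-string fields are skipped.
    parts = [v for v in row.values() if isinstance(v, str)]
    return delimiter.join(parts)

row = {'column1': 'a', 'column2': None, 'column3': 'b', 'count': 7}
print(join_string_fields(row))  # a|b
```

Using str.join here avoids the build-with-leading-delimiter-then-strip step, at the cost of collecting the values into a list first.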
Works lovely, thanks Gerry. You mentioned the ability to amend this to pick up null or empty fields.
I tried to amend the script myself but failed miserably. I want to pass all fields through to the next node whether they are blank or not. Could you help me with that one please?
Thanks
Martin
-
To clarify a few things above, and to give a little background in case it helps: in the for loop we cycle through every field on the input pin. Some fields could be boolean, double, unicode, date etc., so if you want to concatenate them all together as a string you first need to convert them to a string/unicode. Python also does not like it when you don't handle nulls explicitly. In BrainScript you could just write something like "emit string1 + string2 as joinedstring", and if string1 was null it would not care, basically treating it as an empty string.
Back to the for loop: the variable fn is the field name, fv is the field value and ft is the field type. It sounds like you just want to concatenate all of your fields together, so we don't really care about the field type per se, but we do need to handle any nulls that may be present. That's my long-winded explanation; hopefully it helps you understand the Transform node a little better. Here is the modified script:
out1 += in1
data = ''
# fn is the field name, fv is the field value, ft is the field type
for i,fn in enumerate(in1):
    field = inputs[0].metadata().field(i)
    ft = in1.metadata().typeName(field.type())
    fv = in1[fn]
    fv_uni = unicode(fv) if fv is not Null else ''  # convert every field to unicode, '' for nulls
    data += '|' + fv_uni  # concatenate all fields together
out1.Data = data.strip('|')
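Outside the node, this convert-everything version boils down to the following pattern (again with illustrative names; nulls become empty strings so every field keeps its slot in the output):

```python
# Sketch of the convert-all-fields join (illustrative names, not node API).
def join_all_fields(row, delimiter='|'):
    # Convert every value to text, substituting '' for nulls, then join.
    return delimiter.join('' if v is None else str(v) for v in row.values())

row = {'column1': 'a', 'column2': None, 'column3': 7}
print(join_all_fields(row))  # a||7
```

Note the empty slot between the two delimiters where the null field was, which is what lets every field pass through whether it is blank or not.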