joining on all fields in the join node
Originally posted by: residnt
Hey There,
Is there a simple way of joining data on all fields? I have tried various ways looking at inputFields and trying to replace that as a string variant but doesn't work. There has to be a way to do this with out me specifying this functionality every time.
Thanks
Jason
Hey There,
Is there a simple way of joining data on all fields? I have tried various ways looking at inputFields and trying to replace that as a string variant but doesn't work. There has to be a way to do this with out me specifying this functionality every time.
Thanks
Jason
-
Originally posted by: awilliams1024
Hi Jason,
Assuming the use of the Join Inner node you can do the following:
1. Configure both the LeftInputKey & RightInputKey with the following value: map(&field,inputFields(1))
2. Remove name field conflicts by changing the second line in the Join Inner node's Output Script to be default emit 2:*
3. Set the SortLeftInput and SortRightInput properties to true -
Originally posted by: awilliams1024
I think your expression should be the following as you want to remove only one field from the list (i.e. with the splice function's length attribute set to 1 rather than -1):
map(&field,inputFields(1).splice(inputFields(1).fi nd(toLower("{{^sheetName^}}_created_ts")),1))
However, I do not believe the inputFields() function states there is any guarantee that the order of the fields in the field list corresponds with the order of the fields in the input data - though it does appear to align in my ad-hoc test. -
Originally posted by: residnt
Sorry it took me a while to respond. I'm using the -1 to return the list from the beginning assuming that the created_ts field is the last field in the list. How can we filter out the list to not include it as that would be the better way to go.
Also on a side note. Is there a way to use the referenceFields function in the join node or any other node to pull a list that i've added to my data to only emit out those columns. I've tried map and that didn't work as it's looking for that column name as opposed to the list data associated within that field.
Even trying the above expression in the referenceFields results in 0 records
Thanks
Jason -
Originally posted by: awilliams1024
I'm not sure I know how to filter the list to exclude a specific field. If you are not going to use that field in the later analysis, could you use a Filter node to exclude the field prior to the join?
After some discussion the conclusion was that you cannot use the inputFields + map functions within referenceFields as referenceFields is a macro that is evaluated at compile time where as the inputField and map functions are only evaluated during node execution, meaning the values would not be available to referenceFields (as that would have already been evaluated prior to the node execution) -
Originally posted by: residnt
Ok is it a possibility to put in say a brainscript parameter where I can read in the column to use for the reference fields. I can put the comma separated values manually in a parameter which will work. I just cannot find any documentation on how to use each parameter type. -
Originally posted by: stonysmith
Think for a moment about the AggEx node.
In it you you have the "GroupBy" parameter. You can place expressions into that field, like this:
MyField1.left(3), MyField2.date("CCYYMMDD")
===========
Then, this expression
emit referencedFields(1,{{^GroupBy^}})
For this example, that expression translates to this:
emit MyField1,MyField2
This is documented if you put your cursor on the referenceFields function and hit F1. -
Originally posted by: residnt
Yes referenceFields is documented. It more so how to use the different parameter types. Ie Brainscript, BrainscriptExp, BrainscriptOutput, Java, Python.
So that make sense with the field being output is the cell contents of the referenced field. However what I'm trying to do is dynamically setup the output which I don't always know what fields I'll need outputted. Basically I have a column that is joined on to get an alternative name. so lets say column name = card_id_1000 is changed to "Joe Smith" ${field1} ${field2}. field1 and field2 are referencing the field to replace that text with which I am able to handle with regex expressions however I don't need the entire data source from this point. I only need the name column, field1, field2. So by extracting those field names I aggregated the list into a comma separated string. Now this is where I want to filter out the data source to only those columns. I can't use reference field with a variable as it needs to be a column found on the input.
So I'm looking for a way to emit only the columns that I find need to be replaced. -
Originally posted by: stonysmith
Take a look at the ChangeMetaData node.
Depending upon how you set the options and what data you send into the 2nd pin, you can:
1) Include all columns
2) Include just some columns
3) Rename any column
4) Change the data type of any column
Generally the order of the records on the 2nd pin sets the order used for output by ChangeMetaData.
What I very commonly do is:
- use the GetMetaData node to get a list of column names
- use a Filter to setup the OldName, NewName, NewType values
- send this into the 2nd pin of ChangeMetaData
But, also in a lot of cases, I just use a StaticData node to setup the values I need instead of the GetMetaData
Please sign in to leave a comment.
Comments
11 comments