We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.
To elaborate, we have dataset as below:
Dataset1:
“Pay String” is the complete string
Col1, Col2, Col3, Col4 has the comma separated values from column “Pay String”
E.G.
|Pay string| |COL1| |COL2| |COL3| |COL4|
Bill,Vod,ISH,123 Bill Vod ISH 123
Dataset2:
Data dump which is used for identifying accounts.
“Bank String” has the complete string
Bank_col1, Bank_col2, Bank_col3, Bank_col4 has the comma separated values from “Bank String” column.
|Bank string| |Bank_col1| |Bank_col2| |Bank_col3| |Bank_col4|
Bill,Vod,EMA,1234 Bill Vod EMA 1234
We want to know how to search for the “COL1” string with the entire “Bank string” with unique matches. Similarly for 'COL2','COL3','COL4'
By using Join node we are getting two matching records for eg. value=Bill in above dataset
X-ref node gives two matches.
Lookup gives First match.
We want those records that have unique matches and ignore those having two or more matches.
Kindly help.
-
While using X-ref node
left input key = 'Bank string'
right input key = 'COL1'
But as it is gives output's that are Exact matches in my dataset it is giving 0 output.
Hence I want something that will do the Contains operation
I want something that will check if 'Bank string' contains 'COL1' value
-
I've re-read this a couple of times and I don't think I fully understand the ask so I hope I give you something to think about at least. I do see a request to have a partial match within the Merge (or X-Ref) node being mentioned. This is how I have interpreted that:
1. Bank String = Bill,Vod,EMA,1234 .... this will be our right input data.
This means that it would match to Col1 = Bill or Col2 = Vod. I made the presumption we did not know how many Col1, Col2, Col3 etc. fields were going to be in our left input data.
2. We can cycle though all of the elements in our list Bank String against all of Col1, Col2, Col3 etc. values.
3. We can do this with a little python, depending on how big the data set is, it may end up being slow to cycle through all the values.
Attached is my attempt at solving this problem. I used Join node just because the node allows us to have 2 input pins, we're not really "joining" anything.
Also, I did this within a couple of for loops. If you wanted to just do "does Col1 exist within Bank string", you could removed one of them and replace it with an "if col1 in bankString" type of logic but I think you'll end up with some bad matches, especially the longer your Bank String value gets.
Attached files
-
Hi Gerry, How can I count number of matches found
For eg I have 2 strings
BILL PAY,ISHWARI,123
EMA,BILL PAY,123
So when I match both string on BILL PAY,123 I am getting both strings as outputs,but when I match on Ishwari I should get one value.
So I think I will be able to do find this if I get count of number of matching values.
Please sign in to leave a comment.
Comments
6 comments