
DB Execute is running too long; I need to find a new way to solve this!

Comments

11 comments

  • Gerry Mullin

    You could try changing your query from = to in. Oracle allows up to 1,000 entries in an in condition; I believe SQL Server has no limit, and other databases will vary in what they allow. This way you would perform only 1 query per thousand entries (if it's Oracle), as opposed to 1,000 individual queries. Afterward you could filter on the NN_Order_Item_No field to limit your dataset to exactly what you want, or perhaps incorporate it into the query, though I'm not sure how 2 in conditions in one query would behave.

    e.g. select NN_Agent_No, NN_Order_Item_No from SALES_ORDER where NN_Agent_No in (1,2,3,4,5)
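    The chunked-IN idea above can be sketched outside Analyze. The following is a minimal illustration using SQLite in place of Oracle; the table and column names come from this thread, and the data is invented.

```python
import sqlite3

# Stand-in table (names from the thread, data invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_ORDER (NN_Agent_No INTEGER, NN_Order_Item_No INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES_ORDER VALUES (?, ?)",
    [(n, n * 10) for n in range(1, 5001)],
)

agent_nos = list(range(1, 2501))  # the keys we want to look up

# Oracle's IN-list limit is 1000; 999 also stays under the
# bound-variable cap of older SQLite builds.
CHUNK = 999

rows = []
for i in range(0, len(agent_nos), CHUNK):
    chunk = agent_nos[i:i + CHUNK]
    placeholders = ",".join("?" * len(chunk))
    rows.extend(conn.execute(
        "SELECT NN_Agent_No, NN_Order_Item_No FROM SALES_ORDER "
        f"WHERE NN_Agent_No IN ({placeholders})", chunk).fetchall())

print(len(rows))  # 2500 rows fetched with 3 queries instead of 2500
```

    The point is the query count: 2,500 keys cost 3 round trips instead of 2,500 individual `=` queries.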

  • Yusairi

    Hi Gerry,
    I am aware of the 1,000-entry limit for IN. The problem is that if I change from "=" to "IN", the query in the DB Execute node runs far too long because the table has more than 500 million records. The only way is to loop, feeding 1 record into the DB Execute node per iteration to produce the output. So far I have not found a loop that can feed records in one by one... There must be a way to process each record going into the DB Execute node.

  • Gerry Mullin

    Sorry, yes, you are correct! I forgot that the scenario I described would require a loop with DB Execute. I ran into this issue before and replaced it with a JDBC Query node. Here's a library node I built for this purpose; I believe it will execute faster than your current setup.

    Also, has a DBA added any indexes to your columns?

     

    Attached files

    JDBC IN Condition Query - Library Node.lna

     

  • Yusairi

    Hi Mr. Gerry,
    I can't open your .lna file due to an incompatible version. I'm using Data3Sixty Analyze v3.4.3.5100.

  • Gerry Mullin

    I would recommend you consider upgrading, but in the meantime I recreated the solution in 3.4.3.

     

    Attached files

    example.png
    JDBC In Condition - v3.4.3.lna

     

  • Yusairi

    Hi Mr. Gerry,
    Appreciate your assistance. Currently the JDBC class is not available, since I can't simply upload the Oracle JAR or related files to the Linux server.

    I'm looking for a way to use a loop, but I'm not familiar with how a loop works when it is driven by the records fed into it. Say the source contains 70k records: I would need to loop 70k times, feeding each record into the DB Execute node, which I believe would return its result very quickly even though it loops 70k times.

    The While loop sample has not been much help with this.


  • Gerry Mullin

    I don't understand why you can't use JDBC, as the Oracle JDBC driver is shipped with Analyze. Maybe you hit a version compatibility problem or some other issue? In any case, if you want to continue with OCI and DB Execute, then rather than using a loop, which will work but will take more time, you can apply the same principle in the data flow and simply replace the JDBC Query node with a DB Execute node. I have added an example to the data flow.

     

    Attached files

    example_dbexecute.png
    JDBC and DB Ececute In Condition - 3.4.3.lna

     

  • Yusairi

    Hi Mr. Gerry,
    I'm still trying to understand how the sample flow works. It seems the flow executes 1,000 records at a time in that DB Execute node. I have 2 questions:

    1. Which node performs the loop if the input has 70k records?
    2. Will it loop 70 times?

  • Yusairi

    Hi Mr. Gerry,
    My aim in handling 70k records is that the input should supply only 1 record (containing 2 columns) each time the DB Execute node runs, then loop again for the remaining 69,999 records.

    e.g. select NN_Agent_No, NN_Order_Item_No from SALES_ORDER where (NN_Agent_No,NN_Order_Item_No) in (:1,:2)
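    The one-record-per-execution loop described here can be sketched as a parameterized query run once per input row. SQLite again stands in for Oracle, with names from the thread and invented data; the two-column row-value IN is written as an equivalent AND of equalities for portability.

```python
import sqlite3

# Stand-in table (names from the thread, data invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_ORDER (NN_Agent_No INTEGER, NN_Order_Item_No INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES_ORDER VALUES (?, ?)",
    [(n, n * 10) for n in range(1, 101)],
)

# One bind-variable query per input record, as in the proposed
# DB Execute loop: each iteration is a separate round trip.
pairs = [(1, 10), (2, 20), (3, 30)]
out = []
for agent_no, item_no in pairs:
    out.extend(conn.execute(
        "SELECT NN_Agent_No, NN_Order_Item_No FROM SALES_ORDER "
        "WHERE NN_Agent_No = ? AND NN_Order_Item_No = ?",
        (agent_no, item_no)).fetchall())

print(out)  # [(1, 10), (2, 20), (3, 30)]
```

    Each of the 70k records would cost one round trip this way, which is why the batched approach below it in the thread is faster.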

  • Gerry Mullin

    There is no looping in the example, which is why it should be quicker than using a Do While node. The data flow I provided only allows one SqlQueryFieldBinding value, not 2 as you want, but I don't see that as a major issue for your situation. It will group the 1 column into batches of 1,000, so for 70k rows it will execute 70 queries. You can then use a Lookup or Merge node to filter the results down to the 2 columns you actually care about. Using one column, your query in the attached data flow would look like this:

    select NN_Agent_No, NN_Order_Item_No from SALES_ORDER where NN_Agent_No in {{^InList^}}

    The output of this query will contain more data than you want, so you would follow it directly with a Lookup node and add a second pin to split matches from non-matches. The matches are then the dataset you actually want. From past experience doing something similar, the total execution time came down dramatically.
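    This batch-then-filter approach can be sketched as follows, with SQLite in place of Oracle and an in-memory set playing the role of the Lookup node's match pin; names are from the thread and the data is invented.

```python
import sqlite3

# Stand-in table: each agent has 5 order items (invented data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_ORDER (NN_Agent_No INTEGER, NN_Order_Item_No INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES_ORDER VALUES (?, ?)",
    [(a, i) for a in range(1, 101) for i in range(1, 6)],
)

# The 2-column pairs we actually need, and the single-column IN list.
wanted = {(a, 1) for a in range(1, 101)}
agents = sorted({a for a, _ in wanted})

placeholders = ",".join("?" * len(agents))
rows = conn.execute(
    "SELECT NN_Agent_No, NN_Order_Item_No FROM SALES_ORDER "
    f"WHERE NN_Agent_No IN ({placeholders})", agents).fetchall()

# The IN query over-fetches; the Lookup-style filter keeps only the
# exact pairs, like splitting the match pin from the non-match pin.
matches = [r for r in rows if r in wanted]
print(len(rows), len(matches))  # 500 100
```

    One bulk query over-fetches 500 rows, and the cheap in-memory filter reduces them to the 100 pairs wanted, mirroring the DB Execute batch plus Lookup node design.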

  • Yusairi

    Hi Mr. Gerry,
    I performed HashSplit 3 times on the full set of 900k records: 900k >> 90k >> 9k >> 900.
    Each DB Execute node now runs its query with fewer than 1k records on average.
    I cut the processing time from 15 hours to 4 hours. I appreciate your advice on 1,000 records per query and the SQL change from "=" to "in".

    The issue has now been resolved.
    Thank you very much.

