R Node: Unexpected error while importing data into data frames

Comments

4 comments

  • Adrian Williams

    It's a bit difficult to say without being able to understand the structure of the data. Would you be able to provide a small example data set that exhibits the problem?

     

    Is there any additional error information provided after the unexpected error message?

    You can have issues with importing data if the data set contains two columns with the same field name.
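As a quick illustration of the duplicate-name issue in plain R (the node's own import may behave differently): with the default check.names = TRUE, R silently deduplicates repeated column names by appending a suffix, so a data set that truly has two identically named columns can change shape on import:

```r
# Two columns both named "a"; with check.names = TRUE (the default for
# data.frame and read.csv) the second is renamed to "a.1".
df <- data.frame(a = 1, a = 2, check.names = TRUE)
names(df)
```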

     

If you do the following, does the data get passed through OK?

    - add an R node to the canvas

    - go to the Define tab, scroll down the properties and add an input called 'in1' and an output called 'out1' (without the quotes)

    - switch to the Configure tab and configure the RScript property with the following statement:

    out1 <- in1

    - run the node

     

    If you insert a Head node between the R node and the previous node and configure it to just output the first record does the data import ok? What about when you set the number of records to be output to 0 - do you just get the metadata output?

     

  • rakshit bhargava

    Hi Adrian,

    Thank you for your response. As per your advice, I inserted a Head node and it ran perfectly for 100K records, but it gives an error for the complete data set, which has over 9 million records and 23 columns.

    Please advise how I can work around this and also improve the flow's run time.

    Thank you in advance

    Best,

    Rakshit

  • Adrian Williams

    You may want to use the Tail node to test whether the issue is with the data records at the end of the file.

     

    You could also use a combination of the Tail and Head nodes in series to generate a 'window' of, say, 1000 records that can be stepped through the data set to try to localise the record(s) that are causing the issue. For example, initially set the Tail node and Head node to produce 1000 records, which will give you the last 1000 records in the data set. Test whether those records cause the error. If they do, change the record counts to focus in on the relevant records. If the initial window did not fail, change the Tail node to provide more records until the R node generates the error.
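The stepping logic can be sketched in R. This is purely illustrative: in Data3Sixty Analyze you would adjust the Tail/Head record counts in the node properties rather than run code like this, and test_fn here stands in for "does the R node succeed on this chunk?":

```r
# Step a fixed-size window backwards through the records to localise
# the failing one(s). test_fn(chunk) should return TRUE if the chunk
# imports cleanly and FALSE if it triggers the error.
find_bad_window <- function(df, window = 1000, test_fn) {
  end <- nrow(df)
  while (end >= 1) {
    start <- max(1, end - window + 1)
    chunk <- df[start:end, , drop = FALSE]
    if (!test_fn(chunk)) {
      return(c(first = start, last = end))  # failing window located
    }
    end <- start - 1                        # step back one window
  }
  NULL  # every window passed
}
```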

    You can also remove any unnecessary columns to try to identify the issue more quickly.

    The R node uses the Rserve protocol to connect with your R environment. If you have a lot of data then this can cause delays as the data needs to be piped over the associated TCP connection. You should remove unnecessary columns to speed up the data transfer process.

    Alternatively, if your Data3Sixty Analyze and R environments are on the same machine, you could publish the data set from the upstream node as a CSV file using the Output Delimited node.

    You can then configure the R node to read the data directly from the CSV file with read.csv().
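A sketch of the read-back, using a temp file so it is self-contained. The file path and sample columns are made up; in practice you would point the path at the file your Output Delimited node writes:

```r
# Stand-in for the Output Delimited node: write a small sample CSV.
path <- file.path(tempdir(), "mydata.csv")
write.csv(data.frame(id = 1:3, name = c("a", "b", "c")),
          path, row.names = FALSE)

# In the R node's script, the read is a single read.csv() call.
# check.names = FALSE preserves the original column names.
out1 <- read.csv(path, header = TRUE,
                 stringsAsFactors = FALSE, check.names = FALSE)
```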

    Note that you should also limit the data you return to other nodes via an output pin. Consider writing the data to a file and then reading it into the data flow. You can synchronize the running of the R node with an upstream node if required; see the help topic on Run Dependencies.
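A minimal sketch of that write-to-file pattern, with a made-up result_df and a temp-file path standing in for a real location:

```r
# result_df stands in for whatever the R script actually computed.
result_df <- data.frame(id = 1:5, score = seq(0.1, 0.5, by = 0.1))

# Write the bulk of the data to disk rather than pushing it all
# through the output pin over the Rserve TCP connection...
out_path <- file.path(tempdir(), "r_node_result.csv")
write.csv(result_df, out_path, row.names = FALSE)

# ...and return only the file location downstream.
out1 <- data.frame(path = out_path, stringsAsFactors = FALSE)
```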

  • Adrian Williams

    Note: the read.csv() function call should be entered on a single line; the text may appear wrapped in the editor window due to the width of the panel.

     

