Full details on this Machine Learning example can be found in the following parent article:
Machine Learning with Data360 Analyze and R
We are going to download the example data directly from the UCI repository. To do this, start Analyze and create a new data flow. From the ‘Interfaces and Adapters’ category of the node palette, drag an R node onto the canvas.
By default, the R node does not have any input or output pins. We are going to add an output pin for the downloaded data. With the node selected in the canvas, click on the ‘Define’ tab.
Scroll down to the bottom of the properties panel so that the ‘Inputs’ and ‘Outputs’ sections are visible. Click into the ‘Output Name’ text area (where the “Type to add a new output” placeholder is displayed). Type the name for the output, in this case ‘out1’ (without the quotes), and press Enter. The panel will change to show a new placeholder line, and the R node on the canvas will be updated to display the new output pin.
Now scroll back to the top of the properties panel and select the ‘Configure’ tab.
Type a descriptive name in the ‘Name’ property. The indication area of the node properties shows that one mandatory value is missing: the contents of the ‘RScript’ property, into which we will insert the required R script.
Before we get into the specifics of the script for this example, it is worth noting that when the R node is run it will, by default, generate an error if it cannot locate a variable with the same name as an output pin (‘out1’ in our case). When developing an R script within the node, it is useful to assign a placeholder to each output variable as soon as you create the pins. Any data that is output must be an R object of class ‘data.frame’. In the simplest case this can be a data.frame holding a single string value, for example:
out1 <- data.frame(placeholder="Hello world!")
Alternatively, if you had a node with an input pin (called ‘in1’) you could pass the data through the R node using:
out1 <- in1
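Because an output must be a data.frame, a value of any other class needs to be wrapped before it is assigned to the output variable. The following is a minimal sketch intended to be run in a plain R session rather than in the node; `as_output` is a hypothetical helper name used only for illustration and is not part of Data360 Analyze.

```r
# Hypothetical helper: return x unchanged if it is already a data.frame,
# otherwise wrap it in a one-column data.frame named 'value'.
as_output <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  data.frame(value = x, stringsAsFactors = FALSE)
}

# A character vector is not a data.frame, so it gets wrapped.
out1 <- as_output(c("a", "b", "c"))
print(class(out1))
print(nrow(out1))
```

A value that is already a data.frame passes through the helper untouched, so it should be safe to apply to every output.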
Note that in R, variables are case sensitive. The variables you use must be valid R names.
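As a quick illustration of both points, the sketch below shows that names differing only in case refer to different variables, and uses base R's `make.names()` to test whether a candidate name is syntactically valid. The candidate names are arbitrary examples chosen for this sketch.

```r
# Case sensitivity: 'out1' and 'Out1' are two different variables.
out1 <- 1
Out1 <- 2
print(out1 == Out1)

# make.names() returns its input unchanged only when the input is
# already a syntactically valid R name.
candidates <- c("out1", "x-box", "2nd_output")
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))
```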
OK, back to our example. The R node can download data from a web source, so the first statement in the script defines the URL of the file (lines starting with # are comments and are ignored).
## Define the URL for the source data from the UCI Machine Learning Repository
data_URL <- "https://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.data"
The next R statement downloads the data from the specified URL and assigns it to the ‘df’ variable:
df <- read.csv(url(data_URL), header=FALSE)
Because the file in the repository does not include the variable names, the ‘header’ argument is set to ‘FALSE’.
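The effect of the ‘header’ argument can be seen on a small inline sample. This is only a sketch: the two rows below are made up for illustration and are not taken from the UCI file, and the `text` argument of `read.csv` is used so that no download is needed.

```r
# Two made-up rows, for illustration only (not taken from the UCI file).
sample_text <- "A,1,2\nB,3,4"

no_header <- read.csv(text = sample_text, header = FALSE)
with_header <- read.csv(text = sample_text, header = TRUE)

print(names(no_header))   # placeholder names generated by R
print(nrow(no_header))    # both rows are kept as data
print(nrow(with_header))  # the first row was consumed as column names
```

With `header = FALSE`, R generates placeholder column names, which is why the script assigns meaningful names in the next step.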
Next, we assign the correct names to the variables (see the data description for details). The names are defined as a vector, and the values in the vector are then assigned as the data.frame column names:
var_Names <- c("lettr", "x-box", "y-box", "width",
               "high",  "onpix", "x-bar", "y-bar",
               "x2bar", "y2bar", "xybar", "x2ybr",
               "xy2br", "x-ege", "xegvy", "y-ege", "yegvx")
colnames(df) <- var_Names
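When assigning names this way, it can help to confirm that the number of names matches the number of columns before the assignment. The sketch below uses a hypothetical helper, `set_col_names`, and a small made-up data frame; neither is part of the original script.

```r
# Hypothetical helper: assign column names only if the counts match.
set_col_names <- function(data, new_names) {
  if (length(new_names) != ncol(data)) {
    stop("Expected ", ncol(data), " names but got ", length(new_names))
  }
  colnames(data) <- new_names
  data
}

# Small made-up frame for illustration only.
demo <- data.frame(V1 = c("A", "B"), V2 = c(1, 3), V3 = c(2, 4))
demo <- set_col_names(demo, c("lettr", "x-box", "y-box"))
print(names(demo))
```

In the script above, the equivalent call would pass `df` and `var_Names`, assuming the downloaded file has 17 columns to match the 17 names.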
Because we want to output the downloaded data, we assign the ‘df’ data.frame to the ‘out1’ variable:
out1 <- data.frame(df)
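One detail worth knowing about this statement: `data.frame()` has a `check.names` argument that defaults to `TRUE`, so column names that are not syntactically valid R names (such as those containing a hyphen) are adjusted when an existing data.frame is wrapped again. The sketch below demonstrates this on a small made-up frame; whether the adjusted or the original names are preferable downstream depends on your data flow.

```r
# Small made-up frame for illustration only.
demo <- data.frame(V1 = c("A", "B"), V2 = c(1, 3))
colnames(demo) <- c("lettr", "x-box")

# check.names = TRUE is the default, so names are passed through make.names()
rewrapped <- data.frame(demo)
# check.names = FALSE keeps the names exactly as assigned
preserved <- data.frame(demo, check.names = FALSE)

print(names(rewrapped))
print(names(preserved))
```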
Because the R environment in this example is on the local machine with a default Rserve configuration, there is no need to set the ‘Rusername’ or ‘Rpassword’ properties or to change the ‘Rport’ value.
Running the node retrieves the data and outputs it on the ‘out1’ pin.
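If the source cannot be read when the node runs, the script stops before ‘out1’ is assigned. One possible way to handle this, sketched below under the assumption that a one-row data.frame describing the failure is an acceptable output, is to wrap the read in `tryCatch`. The helper name `read_or_report` is hypothetical, and a non-existent local path stands in for an unreachable URL so that the sketch runs without network access.

```r
# Hypothetical helper: read a headerless CSV, or return a one-row
# data.frame describing the failure so the output variable still exists.
read_or_report <- function(source) {
  tryCatch(
    read.csv(source, header = FALSE),
    error = function(e) {
      data.frame(error = conditionMessage(e), stringsAsFactors = FALSE)
    }
  )
}

# A path that does not exist stands in for an unreachable URL.
missing_path <- file.path(tempdir(), "no-such-file.data")
out1 <- suppressWarnings(read_or_report(missing_path))
print(names(out1))
```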