Random sample of 10
Hi,
I want to select a random sample of 10 records.
Depending on the input data, the number of records going into this node might not always reach 10.
Any ideas for a workaround?
-
Here is one solution. Use an Aggregate node to count the total number of records, which is output as the 'record_Count' field. Feed the count into the first input pin of a Cat node and your data set into the second input pin. Configure the Cat node to generate the union of the fields.
The combined data set is then input to a Transform node, which uses the Python sample() function. The number of records is read from the first record and used to build a list of the indexes of all the data records (starting at 2, because the first record carries the count). sample() then picks 10 unique indexes from that list. If the data has more than 10 records, only the records with the matching indexes are output. If it has 10 or fewer records, all of them are output.
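The node configuration itself is only in the attached data flow, so here is a rough stand-alone Python sketch of the same selection logic, in case it helps to see it outside the tool. It is not the Transform node script: `sample_records`, `k` and `seed` are hypothetical names, and it assumes the records are already in a plain list rather than arriving one at a time with the count in the first record.

```python
import random


def sample_records(records, k=10, seed=None):
    """Return up to k records chosen at random, without replacement.

    Hypothetical helper for illustration only; not part of the product's API.
    """
    rng = random.Random(seed)  # optional seed makes the selection repeatable
    n = len(records)

    # With k or fewer records there is nothing to sample: pass everything through.
    if n <= k:
        return list(records)

    # random.sample on a range yields k unique indexes.
    chosen = set(rng.sample(range(n), k))

    # Keep the records whose index was chosen, preserving input order.
    return [rec for i, rec in enumerate(records) if i in chosen]


if __name__ == "__main__":
    print(len(sample_records(list(range(25)))))  # more than 10 records in
    print(sample_records(["a", "b", "c"]))       # fewer than 10 records in
```

The `n <= k` check matters because random.sample() raises a ValueError when asked for more items than the population holds, which is exactly the case the original question is worried about.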
The corresponding example data flow is attached (requires Analyze v.3.5.1).
The Head node is only in the example to control the number of data records, so that you can test how the remaining nodes behave.