In this post, I'm going to take a Rcpp example that call a C++ function to find kth smallest element from an array. A partition-based selection algorithm could be used for implementation.
A most basic partition-based selection algorithm, quickselect, is able to achieve linear performance to find the kth element in an unordered list. Quickselect is a variant of quicksort, both of which choose a pivot and then partitions the data by it. The procedure of quickselect is to firstly move all elements smaller than the pivot to the left and what greater than the pivot the the right by exchanging the location of them, given a pivot such as the last element in the list; and then to move the elements in the left or right sublist again according to a new pivot until getting exact kth elements. The difference from quicksort is that quickselect only need to recurses on one side where the desired kth element is, instead of recursing on both sides of the partition which is what quicksort does. That is why it could reduce the average complexity from O(n log n) to O(n). Given a set A of n elements, and an integer k, we want to find the kth smallest element in A. The pseudocode of quickselect is as follows:
and the pseudocode of partition is
There is a function "nth_element" in C++ for the partition-based selection algorithm, that we can use in R by calling Rcpp.
First, let's write a C++ function, with the input (x, r) which means we want to find the rth smallest elements in an array x. A sample code is as follows:
I save this code with a file name "nth.cpp".
In order to call this function in R, we should type in library(Rcpp) and sourceCpp(" "), for example,
library(Rcpp)
setwd("/Users/Desktop")
sourceCpp("nth.cpp")
Then we can run this function and compare with sort() function in R. Given a list with size of 10^8, and we want the smallest 10 numbers. The R program output is as follows:
> x = sample(1:10^8,10^8,replace=FALSE)
>
> ini.time <- Sys.time()
> y = nth(x,10)
> Sys.time() - ini.time
Time difference of 2.061583 secs
> y[1:10]
[1] 9 2 6 5 4 8 7 1 3 10
>
> ini.time <- Sys.time()
> z = sort(x)
> Sys.time() - ini.time
Time difference of 9.954409 secs
> z[1:10]
[1] 1 2 3 4 5 6 7 8 9 10
We can find that both the "nth" function using partition-based selection algorithm and the sort() function are able to get the correct answer. Although the result by "nth" function is not sorted, it does not matter because we care about the 10th numbers but not their orders. Then let's compare the computational efficiency. Obviously, the partition-based selection algorithm is more efficient than the quicksort algorithm.
A most basic partition-based selection algorithm, quickselect, is able to achieve linear performance to find the kth element in an unordered list. Quickselect is a variant of quicksort, both of which choose a pivot and then partitions the data by it. The procedure of quickselect is to firstly move all elements smaller than the pivot to the left and what greater than the pivot the the right by exchanging the location of them, given a pivot such as the last element in the list; and then to move the elements in the left or right sublist again according to a new pivot until getting exact kth elements. The difference from quicksort is that quickselect only need to recurses on one side where the desired kth element is, instead of recursing on both sides of the partition which is what quicksort does. That is why it could reduce the average complexity from O(n log n) to O(n). Given a set A of n elements, and an integer k, we want to find the kth smallest element in A. The pseudocode of quickselect is as follows:
and the pseudocode of partition is
There is a function "nth_element" in C++ for the partition-based selection algorithm, that we can use in R by calling Rcpp.
First, let's write a C++ function, with the input (x, r) which means we want to find the rth smallest elements in an array x. A sample code is as follows:
#include <iostream>
#include <vector>
#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector nth(NumericVector x, int r) {
NumericVector y = clone(x);
std::nth_element(y.begin(), y.begin()+r-1, y.end());
return y;
}
I save this code with a file name "nth.cpp".
In order to call this function in R, we should type in library(Rcpp) and sourceCpp(" "), for example,
library(Rcpp)
setwd("/Users/Desktop")
sourceCpp("nth.cpp")
> x = sample(1:10^8,10^8,replace=FALSE)
>
> ini.time <- Sys.time()
> y = nth(x,10)
> Sys.time() - ini.time
Time difference of 2.061583 secs
> y[1:10]
[1] 9 2 6 5 4 8 7 1 3 10
>
> ini.time <- Sys.time()
> z = sort(x)
> Sys.time() - ini.time
Time difference of 9.954409 secs
> z[1:10]
[1] 1 2 3 4 5 6 7 8 9 10
We can find that both the "nth" function using partition-based selection algorithm and the sort() function are able to get the correct answer. Although the result by "nth" function is not sorted, it does not matter because we care about the 10th numbers but not their orders. Then let's compare the computational efficiency. Obviously, the partition-based selection algorithm is more efficient than the quicksort algorithm.
Comments
Post a Comment