3 @brief Module provides various random sampling algorithms. 5 @authors Andrei Novikov (pyclustering@yandex.ru) 7 @copyright GNU Public License 9 @cond GNU_PUBLIC_LICENSE 10 PyClustering is free software: you can redistribute it and/or modify 11 it under the terms of the GNU General Public License as published by 12 the Free Software Foundation, either version 3 of the License, or 13 (at your option) any later version. 15 PyClustering is distributed in the hope that it will be useful, 16 but WITHOUT ANY WARRANTY; without even the implied warranty of 17 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 GNU General Public License for more details. 20 You should have received a copy of the GNU General Public License 21 along with this program. If not, see <http://www.gnu.org/licenses/>. 32 @brief Performs data sampling using Reservoir Algorithm R. 33 @details Algorithm complexity O(n). Implementation is based on paper @cite article::utils::sampling::1. Average 34 number of uniform random variates: \f$N - n\f$. 36 @param[in] data (list): Input data for sampling. 37 @param[in] n (uint): Size of sample that should be extracted from 'data'. 39 @return (list) Sample with size 'n' from 'data'. 41 Generate random samples with 5 elements and with 3 elements using Reservoir Algorithm R: 43 data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] 44 sample = reservoir_r(data, 5) # generate sample with 5 elements for 'data'. 47 sample = reservoir_r(data, 3) # generate sample with 3 elements for 'data'. 51 Output example for the code above: 59 raise ValueError(
"Incorrect sampling value 'n' (%d) that should be bigger then data size (%d).")
64 for i
in range(n, len(data)):
65 m = random.randrange(0, i + 1)
67 reservoir[m] = data[i]
74 @brief Performs data sampling using Reservoir Algorithm X. 75 @details Algorithm complexity O(n). Implementation is based on paper @cite article::utils::sampling::1. Average 76 number of uniform random variates: 77 \f[\approx 2n\ln \left (\frac{N}{n} \right)\f] 79 @param[in] data (list): Input data for sampling. 80 @param[in] n (uint): Size of sample that should be extracted from 'data'. 82 @return (list) Sample with size 'n' from 'data'. 84 Generate random sample with 5 elements using Reservoir Algorithm X: 86 data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] 87 sample = reservoir_x(data, 10) # generate sample with 5 elements for 'data'. 91 Output example for the code above: 93 [0, 20, 2, 16, 13, 15, 19, 18, 10, 9] 97 def calculate_skip_value(t, size, skip):
98 return pow(t + 1 - size, skip + 1) / pow(t + 1, skip + 1)
100 def generate_skip_value(t, size):
101 threshold, skip = random.random(), 0
102 while calculate_skip_value(t, size, skip) > threshold:
107 raise ValueError(
"Incorrect sampling value 'n' (%d) that should be bigger then data size (%d).")
110 reservoir = data[0:n]
114 i += generate_skip_value(i, n)
117 m = random.randrange(0, n)
118 reservoir[m] = data[i]
def reservoir_r(data, n)
Performs data sampling using Reservoir Algorithm R.
def reservoir_x(data, n)
Performs data sampling using Reservoir Algorithm X.