¶
subsample¶
.subsample(
file, n_trees, n_required, subp = True
)
subsample a set of trees considering their distribution in the n_trees dimensional space. It tries to maximize the distance between the points in the sample considering the pairwise distance with respect to the furthest points found at a certain step. If the distance of a sample point P is not greater than the one between MD1 & MD2, then a random value is retrieved from a uniform distribution {0,1}. If the value is greater than 0.5, then the point is kept, else discarded. This allows to sample also considering the density of the points.
Args
- file (str) : name of file containing the set of trees in newick format.
- n_trees (int) : number of trees in set.
- n_required (int) : number of trees in subsample.
Returns
- points (list) : list of trees subsampled.
- idxs (list) : list of indexes of the trees subsampled.
Last update: 2024-04-22