Skip to content

Question: How do we ensure TDigest::merge never causes numCentroids over centroidsCapacity and thus index out of bound? #703

@tisonkun

Description

@tisonkun

numCentroids_ = 0;
Sort.stableSort(values, weights, num);
if (reverseMerge_) { // this might be avoidable if stableSort could be implemented with a boolean parameter to invert the logic
Sort.reverse(values, num);
Sort.reverse(weights, num);
}
centroidMeans_[0] = values[0];
centroidWeights_[0] = weights[0];
numCentroids_++;
int current = 1;
double weightSoFar = 0;
while (current != num) {
final double proposedWeight = centroidWeights_[numCentroids_ - 1] + weights[current];
boolean addThis = false;
if ((current != 1) && (current != (num - 1))) {
final double q0 = weightSoFar / centroidsWeight_;
final double q2 = (weightSoFar + proposedWeight) / centroidsWeight_;
final double normalizer = ScaleFunction.normalizer(k_ * 2, centroidsWeight_);
addThis = proposedWeight <= (centroidsWeight_ * Math.min(ScaleFunction.max(q0, normalizer), ScaleFunction.max(q2, normalizer)));
}
if (addThis) { // merge into existing centroid
centroidWeights_[numCentroids_ - 1] += weights[current];
centroidMeans_[numCentroids_ - 1] += ((values[current] - centroidMeans_[numCentroids_ - 1])
* weights[current]) / centroidWeights_[numCentroids_ - 1];
} else { // copy to a new centroid
weightSoFar += centroidWeights_[numCentroids_ - 1];
centroidMeans_[numCentroids_] = values[current];
centroidWeights_[numCentroids_] = weights[current];
numCentroids_++;
}
current++;
}

I don't find any constraint that numCentroids will never exceed centroidsCapacity here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions