Compare a prediction to a reference label using embedding distance.
The labeled criteria evaluator, which evaluates a model based on a custom set of criteria and a reference label.
The labeled pairwise string evaluator, which predicts which of two models' predictions is preferred, based on a ground-truth reference label.
Compare two predictions using embedding distance.
The pairwise string evaluator, which predicts which of two models' predictions is preferred.
The agent trajectory evaluator, which grades the agent's intermediate steps.
The criteria evaluator, which evaluates a model based on a custom set of criteria without any reference labels.
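The two embedding-distance comparisons above (prediction vs. reference label, and prediction vs. prediction) reduce to the same computation: embed each string, then measure the distance between the vectors. The following is a minimal sketch of that idea. It assumes cosine distance as the metric and uses a hypothetical bag-of-words `toy_embed` helper in place of a real embedding model; neither choice comes from the text above.

```python
import math
from collections import Counter


def toy_embed(text, vocab):
    # Hypothetical stand-in for a real embedding model:
    # a word-count vector over a shared vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compute cosine distance for an empty embedding")
    return 1.0 - dot / (norm_a * norm_b)


def embedding_distance(text_a, text_b):
    # Works for both cases: prediction vs. reference label,
    # or prediction vs. prediction (the pairwise variant).
    vocab = sorted(set(text_a.lower().split()) | set(text_b.lower().split()))
    return cosine_distance(toy_embed(text_a, vocab), toy_embed(text_b, vocab))


if __name__ == "__main__":
    # Prediction compared to a reference label.
    print(embedding_distance("the cat sat", "the cat"))
    # Two predictions compared to each other.
    print(embedding_distance("the cat", "the dog"))
```

Smaller distances mean the two strings are closer in embedding space. Swapping `toy_embed` for a real embedding model changes the vectors but not the comparison logic.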