python - Plotting Precision-Recall curve when using cross-validation in scikit-learn


I'm using cross-validation to evaluate the performance of a classifier with scikit-learn, and I want to plot the precision-recall curve. I found an example on the scikit-learn website that plots the PR curve, but it doesn't use cross-validation for the evaluation.

How can I plot the precision-recall curve in scikit-learn when using cross-validation?

I did the following, but I'm not sure it's the correct way to do it (pseudocode):

  for each k-fold:
      precision, recall, _ = precision_recall_curve(y_test, probs)
      mean_precision += precision
      mean_recall += recall

  mean_precision /= num_folds
  mean_recall /= num_folds

  plt.plot(mean_recall, mean_precision)

What do you think?

Edit:

This does not work because the precision and recall arrays have a different size after each fold.
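A minimal sketch of the problem, assuming two made-up folds (the labels and scores below are invented for illustration): `precision_recall_curve` returns a number of points that depends on how many distinct score values a fold contains, so the per-fold arrays generally cannot be added element-wise.

```python
from sklearn.metrics import precision_recall_curve

# Two hypothetical folds with the same number of test samples.
y_fold_1 = [0, 0, 1, 1]
scores_fold_1 = [0.1, 0.4, 0.35, 0.8]   # four distinct score values
y_fold_2 = [0, 1, 0, 1]
scores_fold_2 = [0.2, 0.9, 0.2, 0.9]    # only two distinct score values

precision_1, recall_1, _ = precision_recall_curve(y_fold_1, scores_fold_1)
precision_2, recall_2, _ = precision_recall_curve(y_fold_2, scores_fold_2)

# The curve lengths depend on the distinct scores in each fold,
# so they differ here even though both folds have four samples.
print(len(precision_1), len(precision_2))
```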

Anyone?

Instead of recording the precision and recall values after each fold, store the predictions on the test samples after each fold. Then collect all the test (i.e. out-of-bag) predictions and compute precision and recall from them.

  ## let test_samples[k] = test samples for the kth fold (list of lists)
  ## let train_samples[k] = training samples for the kth fold (list of lists)

  for k in range(0, K):  # K = number of folds
      model = train(parameters, train_samples[k])
      predictions_fold[k] = predict(model, test_samples[k])

  # collect predictions
  predictions_combined = [p for preds in predictions_fold for p in preds]

  ## let predictions = predictions_combined rearranged so that they are in the original order

  ## use predictions and labels to compute lists of TP, FP, FN
  ## use TP, FP, FN to compute precisions and recalls for one run of k-fold cross-validation

Under a single, complete run of k-fold cross-validation, the predictor makes one and only one prediction for each sample. Given n samples, you should have n test predictions.

(Note: These predictions are different from training predictions, because the predictor makes the prediction for each sample without having seen it during training.)
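With scikit-learn, the pooled out-of-fold approach can be sketched roughly as follows. The synthetic dataset, the logistic regression model and the helper name `plot_pr_curve` are assumptions made for illustration; `cross_val_predict` returns one out-of-fold prediction per sample, already in the original sample order.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced data standing in for the real dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One out-of-fold probability per sample; column 1 is the positive class.
oof_proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

# A single PR curve computed from all n pooled test predictions.
precision, recall, thresholds = precision_recall_curve(y, oof_proba)
ap = average_precision_score(y, oof_proba)
print(f"pooled average precision: {ap:.3f}")


def plot_pr_curve(recall, precision, label):
    """Hypothetical helper: draw the pooled PR curve with matplotlib."""
    import matplotlib.pyplot as plt
    plt.step(recall, precision, where="post", label=label)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.legend()
    plt.show()
```

Calling `plot_pr_curve(recall, precision, f"AP = {ap:.2f}")` would then draw the curve for this one run of k-fold cross-validation.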

Unless you are using leave-one-out cross-validation, k-fold cross-validation generally requires a random partitioning of the data. Ideally, you would do repeated (and stratified) k-fold cross-validation. Combining precision-recall curves from different rounds, however, is not straightforward, since, unlike with ROC curves, you cannot use simple linear interpolation between precision-recall points (see Davis and Goadrich, 2006).
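To see why a straight line between two PR points is misleading, here is a minimal sketch of interpolating in terms of true and false positive counts, the approach Davis and Goadrich describe. The function name `interpolate_pr` and the counts are illustrative assumptions, and the sketch assumes B has more true positives than A and that A has at least one positive prediction.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Return (recall, precision) points from A to B, both ends included.

    Each step adds one true positive and a constant number of false
    positives (the local skew between A and B).
    """
    local_skew = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for x in range(tp_b - tp_a + 1):
        tp = tp_a + x
        fp = fp_a + local_skew * x
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Illustrative counts: 20 positives in total,
# point A has TP=5, FP=5 and point B has TP=10, FP=30.
points = interpolate_pr(5, 5, 10, 30, total_pos=20)
for recall, precision in points:
    print(f"recall={recall:.2f}  precision={precision:.3f}")

# Straight-line interpolation in PR space at recall 0.30, for comparison.
linear_at_030 = 0.5 + (0.25 - 0.5) * (0.30 - 0.25) / (0.50 - 0.25)
print(f"linear: {linear_at_030:.3f}  count-based: {points[1][1]:.3f}")
```

At recall 0.30 the count-based precision is 0.375, while the straight line gives 0.45, so linear interpolation overestimates the curve between the two points.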

I personally calculated AUC-PR using the Davis-Goadrich method for interpolation in PR space (followed by numerical integration) and compared the classifiers using the AUC-PR estimates from repeated stratified 10-fold cross-validation.
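A rough sketch of a repeated stratified 10-fold procedure with scikit-learn follows. Note that it summarises each repeat with `average_precision_score`, a step-wise summary of the PR curve, and not with the Davis-Goadrich interpolation mentioned above. The synthetic data, the logistic regression model and the helper name `repeated_cv_average_precision` are assumptions.

```python
import statistics

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced data standing in for the real dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8, 0.2], random_state=0)


def repeated_cv_average_precision(estimator, X, y, n_splits=10, n_repeats=5):
    """Return one pooled average-precision estimate per repeat."""
    scores = []
    for repeat in range(n_repeats):
        # A fresh random stratified partition for every repeat.
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=repeat)
        proba = cross_val_predict(estimator, X, y, cv=cv,
                                  method="predict_proba")[:, 1]
        scores.append(float(average_precision_score(y, proba)))
    return scores


ap_per_repeat = repeated_cv_average_precision(LogisticRegression(max_iter=1000), X, y)
print(f"mean={statistics.mean(ap_per_repeat):.3f} "
      f"stdev={statistics.stdev(ap_per_repeat):.3f}")
```

Running the same function for each classifier gives per-repeat estimates that can be compared across classifiers.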

For a nice plot, I showed a representative PR curve from one of the cross-validation rounds.

There are, of course, many other ways of assessing classifier performance, depending on the nature of your dataset. For instance, if the proportion of (binary) labels in your dataset is not skewed (i.e. it is roughly 50-50), you could use the simpler ROC analysis with cross-validation:

Collect predictions from each fold and construct ROC curves (as before), collect all the TPR-FPR points (i.e. take the union of all TPR-FPR tuples), then plot the combined set of points, possibly with smoothing. Optionally, compute AUC-ROC using simple linear interpolation and the composite trapezoid method for numerical integration.
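As a small illustration of the trapezoid step, the sketch below pools hypothetical out-of-fold predictions from two folds, builds the ROC points and integrates them with the composite trapezoid rule. It is one possible reading of the procedure (it pools predictions rather than merging per-fold ROC points), and the function names and data are made up.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), one per distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample of a tied-score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Composite trapezoid rule over linearly interpolated ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical out-of-fold labels and scores from two folds.
fold_labels = [[0, 1], [0, 1]]
fold_scores = [[0.1, 0.35], [0.4, 0.8]]
labels = [label for fold in fold_labels for label in fold]
scores = [score for fold in fold_scores for score in fold]

points = roc_points(labels, scores)
auc = trapezoid_auc(points)
print(points, auc)
```

Linear interpolation is acceptable here because, unlike in PR space, a straight line between two ROC points is achievable.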

