We investigate the application of 17 machine learning models to predict recovery rates of Italian non-performing loans (NPLs). Our data set comprises loan-level data, including historical collections and original recovery expectations with the expected timing and amounts of recovery cash flows.
We investigate whether original recovery expectations provide more accurate forecasts than independently derived statistical models of recoveries. We also test whether original recovery expectations should be used as features, i.e., added as input variables to machine learning models alongside other predictors.
We test the accuracy of recovery predictions in the first, second and third year after the creation of the original recovery expectations. Our data does not allow us to test the accuracy of predicting the ultimate recovery rates as most cases are still in workout at the last observation date.
Our findings are as follows:
- As in other recent academic studies, we find that nonlinear machine learning techniques such as neural networks and ensemble models can outperform traditional parametric regressions.
- Historical collection data is an important feature for creating and updating recovery forecasts for open cases, i.e., loans that are still in workout.
- Interestingly, while original recovery expectations are not reliable as standalone predictors, they enhance the models’ performance when used as features.
- Independently derived models that exclude the original recovery expectations show a lower prediction error than the original recovery expectations themselves in the first, second and third year after the expectations were created.
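
The year-by-year comparison behind these findings can be illustrated with a minimal sketch. It assumes mean absolute error as the accuracy measure, which this summary does not specify, and it uses made-up toy values rather than the study's data. The names `mae`, `compare_by_year`, `original_expectation` and `model` are hypothetical and chosen for illustration only.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def compare_by_year(loans, forecast_keys, years=(1, 2, 3)):
    """Return {year: {forecast_key: MAE}} for each forecast horizon."""
    result = {}
    for year in years:
        actual = [loan["actual"][year] for loan in loans]
        result[year] = {
            key: mae(actual, [loan[key][year] for loan in loans])
            for key in forecast_keys
        }
    return result


# Toy values only (assumption): cumulative recovery rates per loan and year.
loans = [
    {
        "actual": {1: 0.10, 2: 0.20, 3: 0.30},
        "original_expectation": {1: 0.20, 2: 0.40, 3: 0.50},
        "model": {1: 0.15, 2: 0.20, 3: 0.35},
    },
    {
        "actual": {1: 0.00, 2: 0.10, 3: 0.20},
        "original_expectation": {1: 0.10, 2: 0.20, 3: 0.40},
        "model": {1: 0.05, 2: 0.10, 3: 0.15},
    },
]

errors = compare_by_year(loans, ["original_expectation", "model"])
for year, by_forecast in errors.items():
    print(year, by_forecast)
```

To benchmark a model that also uses the original expectations as a feature, its predictions could be added under a further key and passed in `forecast_keys`.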
Our findings should be of interest to NPL investors, banks, and loan servicing companies that want to improve the accuracy of projected recovery cash flows and independently verify original recovery expectations. Such verification can include benchmarking against models that use historical collections and, where available, original recovery expectations as model features.
Read the full article: Enhancing recovery rate predictions with machine learning