Sep 12, 2023
Sep 12, 2023 01:39 PM
Residual analysis is a vital part of model assessment. A residual is the difference between an observed value and the value the model predicts for it. By examining residual plots and the histogram of residuals, we can identify problems with the model as well as certain characteristics of the data. Here are the fundamental steps for conducting a residual analysis:
  1. Pattern Inspection:
      • Plot the residuals against the fitted (predicted) values and check for any pattern. Ideally, the points are scattered randomly around 0 with no evident structure.
      • If the residual plot shows a pattern, the model has probably not captured some of the structure in the data. For example, the model may be linear while the relationship in the data is nonlinear, or it may need additional features or a more flexible form to capture that structure.
  2. Check the Residuals for Normality:
      • Compare the histogram of residuals with a normal distribution curve (a normal Q-Q plot is another common check) to see whether the residuals are roughly normally distributed.
      • If the residuals clearly depart from normality, a transformation of the response or the features, such as a logarithmic transformation, may help.
  3. Check for Constant Variance:
      • The spread of the residuals should stay roughly the same across the whole range of fitted values; this property is called "homoscedasticity".
      • If the spread widens or narrows over part of the range (for example, a funnel shape), the variance of the error terms is probably not constant, a condition called heteroscedasticity.
  4. Look for Outliers and High-Leverage Points:
      • On the residual plot, look for points that lie far from the rest of the data; these may be outliers. High-leverage points are observations with unusual predictor values and do not necessarily have large residuals, so also examine leverage (hat) values or Cook's distance.
      • Investigate these points further, and remove them only if there is good reason to, such as a data-entry error.
  5. Time Series Analysis (if applicable):
      • If the data is a time series, also plot the residuals against time to make sure there is no remaining trend, seasonality, or autocorrelation.
  6. Multicollinearity:
      • Highly correlated predictors usually do not show up as patterns in the residuals. Instead, they inflate the variance of the coefficient estimates, making them unstable and hard to interpret. Check the correlation matrix of the predictors (and, with more than two predictors, the variance inflation factors) to rule out multicollinearity.
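The constant-variance, outlier/leverage, and time-series checks above can be supported with numbers as well as plots. The following is a minimal, dependency-free Python sketch. It assumes a simple linear regression with a single predictor; real models usually have several predictors and would rely on a statistics library. The helper names `fit_line` and `residual_diagnostics` are hypothetical. The cutoffs (|studentized residual| > 2, leverage h_i > 2p/n) are common rules of thumb, not standards. The upper-half/lower-half variance ratio is a crude heuristic, not a formal test.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_diagnostics(x, y, outlier_cutoff=2.0):
    """Numeric residual checks for a one-predictor linear regression.

    The cutoffs are common rules of thumb (assumptions), not fixed standards.
    """
    n, p = len(x), 2  # p = number of fitted parameters (intercept and slope)
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ssr = sum(r * r for r in resid)
    s = math.sqrt(ssr / (n - p))  # residual standard error

    # Leverage (hat values) for one predictor: h_i = 1/n + (x_i - x_bar)^2 / Sxx.
    x_bar = statistics.fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Internally studentized residuals and Cook's distance.
    student = [r / (s * math.sqrt(1 - h)) for r, h in zip(resid, leverage)]
    cooks = [t * t / p * h / (1 - h) for t, h in zip(student, leverage)]

    # Crude constant-variance check: residual variance for the upper half of the
    # fitted values divided by that for the lower half (far from 1 is suspect).
    order = sorted(range(n), key=lambda i: fitted[i])
    half = n // 2
    lower = [resid[i] for i in order[:half]]
    upper = [resid[i] for i in order[n - half:]]
    variance_ratio = statistics.pvariance(upper) / statistics.pvariance(lower)

    # Durbin-Watson statistic for time-ordered data: always between 0 and 4;
    # values near 2 suggest no first-order autocorrelation.
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / ssr

    return {
        "residuals": resid,
        "leverage": leverage,
        "cooks_distance": cooks,
        "outliers": [i for i, t in enumerate(student) if abs(t) > outlier_cutoff],
        "high_leverage": [i for i, h in enumerate(leverage) if h > 2 * p / n],
        "variance_ratio": variance_ratio,
        "durbin_watson": dw,
    }


# Hypothetical data: a straight line with small alternating noise and one
# injected outlier at index 10 (x values are assumed to be in time order).
x = list(range(20))
y = [2 + 3 * xi + (0.5 if xi % 2 == 0 else -0.5) for xi in x]
y[10] += 20
report = residual_diagnostics(x, y)
print(report["outliers"])  # prints [10]
```

Leverage depends only on the predictor values, so an evenly spaced x produces no high-leverage points even when an outlier is present. For formal tests of constant variance, the Breusch-Pagan and Goldfeld-Quandt tests are the usual choices.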
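For the normality check, a normal Q-Q comparison can be condensed into one number: the correlation between the sorted residuals and the matching standard normal quantiles. This value is close to 1 when the residuals are roughly normal. The sketch below is a rough indicator, not a formal test. It uses Python's `statistics.NormalDist` (Python 3.8+); the plotting positions (i - 0.5)/n and the function names `normal_qq_pairs` and `qq_correlation` are illustrative assumptions.

```python
import math
import statistics


def normal_qq_pairs(residuals):
    """Pair each sorted residual with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    n = len(residuals)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


def qq_correlation(residuals):
    """Correlation of the Q-Q points; values near 1 suggest rough normality."""
    pairs = normal_qq_pairs(residuals)
    t = [q for q, _ in pairs]
    o = [r for _, r in pairs]
    t_bar, o_bar = statistics.fmean(t), statistics.fmean(o)
    sto = sum((a - t_bar) * (b - o_bar) for a, b in zip(t, o))
    stt = sum((a - t_bar) ** 2 for a in t)
    soo = sum((b - o_bar) ** 2 for b in o)
    return sto / math.sqrt(stt * soo)


# Hypothetical residuals: mostly zero with one extreme value, i.e. strongly skewed.
skewed = [0.0] * 19 + [10.0]
print(round(qq_correlation(skewed), 2))
```

Plotting the pairs from `normal_qq_pairs` gives the usual Q-Q plot. Points that bend away from a straight line indicate skewness or heavy tails.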
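For the multicollinearity check, the correlation matrix of the predictors can be scanned for strongly correlated pairs. The minimal standard-library sketch below does this; the 0.8 threshold and the helper names are illustrative assumptions. With exactly two predictors, the variance inflation factor (VIF) of each is 1 / (1 - r^2). With three or more, collinearity can involve several predictors at once without showing up in any single pairwise correlation, so VIFs computed from auxiliary regressions are the more thorough check.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between predictor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def correlated_pairs(columns, threshold=0.8):
    """(i, j, r) for each predictor pair with |r| above an illustrative threshold."""
    corr = correlation_matrix(columns)
    k = len(columns)
    return [
        (i, j, corr[i][j])
        for i in range(k)
        for j in range(i + 1, k)
        if abs(corr[i][j]) > threshold
    ]


def vif_two_predictors(r):
    """VIF of either predictor when the model has exactly two predictors."""
    return 1 / (1 - r * r)


# Hypothetical predictors: x2 is almost exactly 2 * x1, x3 is only weakly related.
x1 = list(range(10))
x3 = [1 if i % 2 == 0 else -1 for i in range(10)]
x2 = [2 * a + 0.1 * b for a, b in zip(x1, x3)]
print([(i, j) for i, j, _ in correlated_pairs([x1, x2, x3])])  # prints [(0, 1)]
```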
After the residual analysis, adjust the model or the data according to the issues found. For instance, if the residuals reveal a nonlinear relationship, consider adding polynomial features or switching to a model that can capture nonlinear relationships.
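To illustrate the polynomial-feature remedy, the sketch below fits hypothetical, noise-free quadratic data twice: once with a straight line, and once with an added x^2 column. To stay dependency-free, it solves the normal equations with a small hand-written Gaussian elimination (an assumption made for the example; QR- or SVD-based solvers are numerically safer). The function names `least_squares` and `residuals` are illustrative. The linear fit leaves the U-shaped residual pattern that the pattern-inspection step warns about, and the quadratic fit removes it.

```python
def least_squares(rows, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination.

    Fine for a few well-scaled columns; a QR- or SVD-based solver is safer in general.
    """
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, k):
            factor = aug[i][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[i][c] -= factor * aug[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (aug[i][k] - tail) / aug[i][i]
    return coef


def residuals(rows, y, coef):
    """Observed minus fitted values."""
    return [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(rows, y)]


# Hypothetical data with a purely quadratic relationship (no noise, for clarity).
x = [i / 2 for i in range(-10, 11)]
y = [1.0 + 0.5 * xi + 2.0 * xi ** 2 for xi in x]

linear_rows = [[1.0, xi] for xi in x]
quad_rows = [[1.0, xi, xi ** 2] for xi in x]  # add x^2 as a polynomial feature

lin_res = residuals(linear_rows, y, least_squares(linear_rows, y))
quad_res = residuals(quad_rows, y, least_squares(quad_rows, y))

# Linear fit: positive residuals at both ends, negative in the middle (a U shape).
print([round(r, 1) for r in lin_res[::5]])  # prints [31.7, -5.8, -18.3, -5.8, 31.7]
print(max(abs(r) for r in quad_res) < 1e-9)  # prints True
```

With real, noisy data the quadratic residuals would not vanish. The point is that they should no longer show a systematic shape, which can be re-checked with the same residual plots.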