Python decision tree prediction model: negative cross_val_score with a decision tree regression model
TL;DR:
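1) No, not unless you specify it explicitly or it happens to be the estimator's default .score method. Since you did not specify it, it defaults to DecisionTreeRegressor.score, which returns the coefficient of determination, i.e. R^2. That value can be negative.
2) Yes, that is a problem. It also explains why you are getting a negative coefficient of determination.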
Details:
The function you are calling is:
scores = cross_val_score(simple_tree, df.loc[:,'system':'gwno'], df['gdp_growth'], cv=cv)
So you are not passing a `scoring` argument explicitly. Let's look at the docs:
scoring : string, callable or None, optional, default: None
A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
The docs do not say so explicitly, but this presumably means it falls back to the estimator's default .score method.
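A quick way to check this without reading the source is to compare the default against an explicit `scoring="r2"`. The sketch below uses synthetic `X`/`y` (an assumption, since the asker's `df` is not available) in place of `df.loc[:,'system':'gwno']` and `df['gdp_growth']`, and a `KFold` splitter stands in for the undefined `cv`. If the default really is R^2, the two score arrays should match. It also shows `scoring="neg_mean_squared_error"`, which is what you would pass if you wanted an error metric instead (scikit-learn negates it so that higher is always better).

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for the asker's data: 40 rows, 5 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=40)

tree = DecisionTreeRegressor(random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# No `scoring` argument: falls back to DecisionTreeRegressor.score (R^2).
default_scores = cross_val_score(tree, X, y, cv=cv)

# Explicitly requesting R^2 should give the same numbers.
r2_scores = cross_val_score(tree, X, y, cv=cv, scoring="r2")

# An error metric instead; scikit-learn negates MSE so that higher is better.
neg_mse_scores = cross_val_score(tree, X, y, cv=cv, scoring="neg_mean_squared_error")

print("default:", default_scores)
print("r2:     ", r2_scores)
print("neg MSE:", neg_mse_scores)
```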
To confirm this hypothesis, let's dig into the source code. We see that the scorer that is ultimately used is determined as follows:
def check_scoring(estimator, scoring=None, allow_none=False):
    has_scoring = scoring is not None
    if not hasattr(estimator, 'fit'):
        raise TypeError("estimator should be an estimator implementing "
                        "'fit' method, %r was passed" % estimator)
    if isinstance(scoring, six.string_types):
        return get_scorer(scoring)
    elif has_scoring:
        # Heuristic to ensure user has not passed a metric
        module = getattr(scoring, '__module__', None)
        if hasattr(module, 'startswith') and \
           module.startswith('sklearn.metrics.') and \
           not module.startswith('sklearn.metrics.scorer') and \
           not module.startswith('sklearn.metrics.tests.'):
            raise ValueError('scoring value %r looks like it is a metric '
                             'function rather than a scorer. A scorer should '
                             'require an estimator as its first parameter. '
                             'Please use `make_scorer` to convert a metric '
                             'to a scorer.' % scoring)
        return get_scorer(scoring)
    elif hasattr(estimator, 'score'):
        return _passthrough_scorer
    elif allow_none:
        return None
    else:
        raise TypeError(
            "If no scoring is specified, the estimator passed should "
            "have a 'score' method. The estimator %r does not." % estimator)
Note that scoring is left at its default of None, so
has_scoring = scoring is not None
means has_scoring == False. Also, the estimator has a .score attribute, so we go through this branch:
elif hasattr(estimator, 'score'):
    return _passthrough_scorer
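The same dispatch can be observed from the outside. `sklearn.metrics.check_scoring` is publicly importable, but the snippet above reflects an older scikit-learn release (note `six.string_types`), and newer versions may return a scorer object rather than a plain function. So treat this as a sketch that checks behaviour, not internals: with `scoring` left as `None`, the returned scorer should give the same value as calling `.score` directly. The data is synthetic.

```python
import numpy as np
from sklearn.metrics import check_scoring
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data.
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=30)

tree = DecisionTreeRegressor(random_state=0)

# scoring=None, and the estimator has .score, so we get the passthrough branch.
scorer = check_scoring(tree)

tree.fit(X, y)
via_scorer = scorer(tree, X, y)   # scorer(estimator, X, y) signature
via_score = tree.score(X, y)      # the estimator's own R^2
print(via_scorer, via_score)
```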
That function is trivial:
def _passthrough_scorer(estimator, *args, **kwargs):
    """Function that wraps estimator.score"""
    return estimator.score(*args, **kwargs)
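Because the scorer is only a thin wrapper, you can reproduce it with any callable that has the `scorer(estimator, X, y)` signature quoted from the docs and pass it as `scoring=`. `my_passthrough_scorer` below is a hypothetical name and the data is synthetic; the point is only that it yields the same cross-validation scores as the default.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor


def my_passthrough_scorer(estimator, X, y):
    """Hypothetical hand-written equivalent of the private _passthrough_scorer."""
    return estimator.score(X, y)


# Hypothetical synthetic data.
rng = np.random.RandomState(1)
X = rng.normal(size=(40, 4))
y = 2 * X[:, 1] + rng.normal(size=40)

tree = DecisionTreeRegressor(random_state=0)
cv = KFold(n_splits=4, shuffle=True, random_state=0)

default_scores = cross_val_score(tree, X, y, cv=cv)
custom_scores = cross_val_score(tree, X, y, cv=cv, scoring=my_passthrough_scorer)
print(default_scores)
print(custom_scores)
```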
Finally, we now know that the scorer is simply your estimator's default score method. Let's check the docs for the estimator, which state clearly:
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual
sum of squares ((y_true - y_pred) ** 2).sum() and v is the total
sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible
score is 1.0 and it can be negative (because the model can be
arbitrarily worse). A constant model that always predicts the expected
value of y, disregarding the input features, would get a R^2 score of
0.0.
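To make the formula concrete, here is `1 - u/v` computed by hand and checked against `sklearn.metrics.r2_score` on two tiny, made-up prediction vectors: one that predicts the mean everywhere (R^2 = 0) and one that gets the trend backwards (R^2 = -3), which shows how R^2 drops below zero. `r2_by_hand` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_by_hand(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - u/v, as in the docstring quoted above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0]

# Predicting the mean (2.0) everywhere: u = 2, v = 2, so R^2 = 0.
mean_pred = [2.0, 2.0, 2.0]
# Predictions with the trend reversed: u = 8, v = 2, so R^2 = 1 - 8/2 = -3.
bad_pred = [3.0, 2.0, 1.0]

for pred in (mean_pred, bad_pred):
    print(r2_by_hand(y_true, pred), r2_score(y_true, pred))
```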
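So your score is indeed the coefficient of determination. Basically, a negative R^2 means your model is performing very poorly: worse than simply predicting the expected value (i.e. the mean) for every input. That makes sense because, as you said:
I have a small sample of ~40 observations and ~70 variables. Might
this be the problem?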
It is a problem. With only 40 observations, making meaningful predictions in a 70-dimensional feature space is almost hopeless.
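The effect can be reproduced with synthetic data of the same shape (40 rows, 70 features). In this sketch the target is pure noise, an assumption chosen to exaggerate the problem, so an unrestricted decision tree has nothing real to learn and its out-of-fold R^2 should come out negative. A mean-only `DummyRegressor` baseline is included for comparison. Under cross-validation even that baseline scores at or slightly below 0, because `v` is computed around the test fold's own mean while the baseline predicts the training fold's mean.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data with the question's shape: 40 rows, 70 features,
# and a target that is pure noise (unrelated to every feature).
rng = np.random.RandomState(42)
X = rng.normal(size=(40, 70))
y = rng.normal(size=40)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Default scoring, i.e. R^2 on each held-out fold.
tree_scores = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=cv)
# Baseline that ignores X and predicts the training-fold mean of y.
mean_scores = cross_val_score(DummyRegressor(strategy="mean"), X, y, cv=cv)

print("tree R^2 per fold:", np.round(tree_scores, 2), "mean:", tree_scores.mean())
print("baseline R^2 per fold:", np.round(mean_scores, 2), "mean:", mean_scores.mean())
```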