我知道用stata和R的更多,此处仅基于兴趣摸索,本文数据来自于朱顺全老师的教程,感谢本杰明·安德森教授!!!
检验异方差 1 2 3 4 5 6 7 8 bpg_test = het_breuschpagan(fit.resid, X) print ("Breusch-Pagan Test:" , bpg_test) white_test = het_white(fit.resid, X) print ("White Test:" , white_test) gq_test=het_goldfeldquandt(fit.resid, X) print ("Goldfeld-Quandt Test:" ,gq_test)
使用statsmodels里面的三个内置函数来检验异方差(教材里手算Goldfeld-Quandt已经不需要了)
三种方法返回的数据一般都只需要关注第二位,即p值,小于0.05则认为存在异方差
WLS 1 2 3 4 5 6 wgt = 1 / fit.resid**2 wls_model = sm.WLS(y, X, weights=wgt) wls_fit = wls_model.fit()print (wls_fit.summary())
此处我是用残差平方的倒数作为权重(我记得李旭晖老师当时是这么教的,当然这地方有可以深挖的,所以关于异方差的处理,后面还要将继续深入)
完整代码 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.outliers_influence import variance_inflation_factorfrom statsmodels.stats.diagnostic import het_breuschpagan, het_white,het_goldfeldquandtfrom patsy import dmatrices data=pd.DataFrame() data=pd.DataFrame(pd.read_excel('C:\\Users\\cyx94a\\Desktop\\Python\\data\\al8-2.xls' ))vars =['x' ,'y' ] df=data[vars ] y,X=dmatrices('y~x' ,data=df,return_type='dataframe' )print (data.describe())print (data.corr()) model=sm.OLS(y,X) fit=model.fit()print (fit.summary()) X_with_const = sm.add_constant(X) vif_values = [variance_inflation_factor(X_with_const.values, i) for i in range (X_with_const.shape[1 ])] print ("VIF values:" , vif_values) bpg_test = het_breuschpagan(fit.resid, X) print ("Breusch-Pagan Test:" , bpg_test) white_test = het_white(fit.resid, X) print ("White Test:" , white_test) gq_test=het_goldfeldquandt(fit.resid, X) print ("Goldfeld-Quandt Test:" ,gq_test) wgt = 1 / fit.resid**2 wls_model = sm.WLS(y, X, weights=wgt) wls_fit = wls_model.fit()print (wls_fit.summary())