井出草平の研究ノート (Sohei Ide's Research Notes)

Running and comparing Lasso, Adaptive Lasso, and Plugin Lasso for multiple regression [Stata]

Exporting the data from R

We start by creating the data set.

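# The CPS1985 data set ships with the AER package
library(AER)
data(CPS1985)
# rio::export() chooses the Stata .dta format from the file extension
library(rio)
export(CPS1985, "CPS1985.dta")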

Stata do-file

// Load the data
use "CPS1985.dta", clear

// Fit OLS and store the estimates in memory as ols_model
quietly regress wage education experience age ethnicity region gender occupation sector union married
estimates store ols_model

// Compute the in-sample MSE
lassogof ols_model

// Select the penalty parameter by 10-fold cross-validation
lasso linear wage education experience age ethnicity region gender occupation sector union married, selection(cv, folds(10)) nolog rseed(123)

// Display the lambda chosen by cross-validation
di "Best lambda (penalty parameter) is: " e(lambda)

// Plot the cross-validation function
cvplot, minmax

// Run the lasso linear regression with the selected penalty parameter
lasso linear wage education experience age ethnicity region gender occupation sector union married, lambda(`e(lambda)') rseed(123)
estimates store CV

// Display the standardized lasso coefficients, sorted by size
lassocoef CV, sort(coef,standardized) display(coef,standardized)

// Adaptive Lasso
lasso linear wage education experience age ethnicity region gender occupation sector union married, selection(adaptive) rseed(123)
estimates store adaptive
lassocoef adaptive, sort(coef,standardized) display(coef,standardized)

// Plugin Lasso
lasso linear wage education experience age ethnicity region gender occupation sector union married, selection(plugin) rseed(123)
estimates store plugin
lassocoef plugin, sort(coef,standardized) display(coef,standardized)

// Model comparison
// Compute the MSE of each estimator's predictions. No holdout sample is defined here, so these are in-sample MSEs; lower is better.
lassogof CV adaptive plugin

Analysis

We begin with ordinary OLS as a baseline. This is a plain multiple regression, so no explanation is needed.

// Load the data
use "CPS1985.dta", clear

// Fit OLS and store the estimates in memory as ols_model
quietly regress wage education experience age ethnicity region gender occupation sector union married
estimates store ols_model

// Compute the in-sample MSE
lassogof ols_model

n-fold Cross Validation

We run a lasso whose penalty is chosen by cross-validation, here n-fold cross-validation with n = 10 folds. First, we compute the optimal λ.

// Select the penalty parameter by 10-fold cross-validation
lasso linear wage education experience age ethnicity region gender occupation sector union married, selection(cv, folds(10)) nolog rseed(123)

// Display the lambda chosen by cross-validation
di "Best lambda (penalty parameter) is: " e(lambda)

// Plot the cross-validation function
cvplot, minmax

Results.

Lasso linear model                          No. of obs        =        534
                                            No. of covariates =         10
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
       1 |    first lambda    1.960896         0      -0.0011     26.38862
      61 |   lambda before    .0073826         9       0.2530     19.69139
    * 62 | selected lambda    .0067268         9       0.2530     19.69136
      63 |     last lambda    .0061292         9       0.2530     19.69153
--------------------------------------------------------------------------
* lambda selected by cross-validation.
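
For reference, the criterion behind this table can be written out. This is a sketch of the standard lasso formulation for a linear model, not a quotation from the Stata manual. The notation ($N$, $p$, $\kappa_j$, $F_k$, $\mathrm{CV}(\lambda)$) is chosen here for illustration, and the exact scaling and fold weighting Stata uses may differ in detail. In this run $N = 534$, $p = 10$ and $K = 10$.

```latex
% Penalized least squares solved for each lambda on the grid
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i\beta'\bigr)^2
  + \lambda\sum_{j=1}^{p}\kappa_j\,\lvert\beta_j\rvert ,
  \qquad \kappa_j = 1 \text{ for the ordinary lasso}

% K-fold CV: predict each fold F_k from a fit that excludes it
\mathrm{CV}(\lambda) \approx \frac{1}{N}\sum_{k=1}^{K}\sum_{i \in F_k}
  \bigl(y_i - \hat{y}_i^{(-k)}(\lambda)\bigr)^2 ,
  \qquad \lambda^{*} = \arg\min_{\lambda}\ \mathrm{CV}(\lambda)
```

In the table, the "CV mean prediction error" column plays the role of $\mathrm{CV}(\lambda)$, and the row marked with * (ID 62, 19.69136) is the minimum over the grid.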

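Now that the optimal λ has been computed, we use it to run the lasso multiple regression. The table printed by this command is the same as in the cross-validation run above; the standardized coefficients follow it.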

. lasso linear wage education experience age ethnicity region gender occupation sector union marrie, lambda(`e(lambda)') rseed(123)

Lasso linear model                          No. of obs        =        534
                                            No. of covariates =         10
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
       1 |    first lambda    1.960896         0      -0.0011     26.38862
      61 |   lambda before    .0073826         9       0.2530     19.69139
    * 62 | selected lambda    .0067268         9       0.2530     19.69136
      63 |     last lambda    .0061292         9       0.2530     19.69153
--------------------------------------------------------------------------
* lambda selected by cross-validation.


. lassocoef CV, sort(coef,standardized) display(coef,standardized)

------------------------
             |        CV
-------------+----------
   education |   2.08908
         age |  1.122071
      gender | -1.076161
       union |  .5882659
      sector |  -.460368
   ethnicity | -.3305098
      region |  .2912564
  occupation |  .2659422
     married |  .1929359
       _cons |         0
------------------------

Adaptive Lasso

Next, we run the Adaptive Lasso. The only difference from the previous command is the selection(adaptive) option.

lasso linear wage education experience age ethnicity region gender occupation sector union married, selection(adaptive) rseed(123)
estimates store adaptive
lassocoef adaptive, sort(coef,standardized) display(coef,standardized)

Results.

. lasso linear wage education experience age ethnicity region gender occupation sector union marri, selection(adaptive) rseed(123)

Lasso linear model                         No. of obs         =        534
                                           No. of covariates  =         10
Selection: Adaptive                        No. of lasso steps =          2

Final adaptive step results
--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
      64 |    first lambda    9.819472         0      -0.0011     26.38862
     109 |   lambda before    .1492472         7       0.2565     19.60032
   * 110 | selected lambda    .1359885         7       0.2565      19.5987
     111 |    lambda after    .1239077         7       0.2564     19.60266
     146 |     last lambda    .0047748         9       0.2530     19.69093
--------------------------------------------------------------------------
* lambda selected by cross-validation in final adaptive step.

. lassocoef adaptive, sort(coef,standardized) display(coef,standardized)

------------------------
             |  adaptive
-------------+----------
   education |  2.133081
         age |   1.18055
      gender | -1.002929
       union |  .4719569
      sector | -.2850779
   ethnicity | -.1704469
      region |  .1324732
       _cons | -3.55e-15
------------------------
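
The two lasso steps reported in the header ("No. of lasso steps = 2") can be summarized as follows. This is a sketch of the usual adaptive lasso construction, not a statement of Stata's exact internals. The exponent $\delta$ and the superscripts are notation introduced here, and $\delta = 1$ is assumed to be the default.

```latex
% Step 1: ordinary lasso (all penalty weights equal to 1), lambda chosen by CV
\hat{\beta}^{(1)} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i\beta'\bigr)^2
  + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert

% Step 2: variables with a zero step-1 coefficient are dropped;
% the rest are penalized in inverse proportion to their step-1 size
\kappa_j = \frac{1}{\bigl\lvert\hat{\beta}^{(1)}_j\bigr\rvert^{\delta}} ,
\qquad
\hat{\beta}^{(2)} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i\beta'\bigr)^2
  + \lambda\sum_{j:\,\hat{\beta}^{(1)}_j \neq 0}\kappa_j\,\lvert\beta_j\rvert
```

Small first-step coefficients get large penalties in the second step. This fits the result above: the 9 variables kept by the CV lasso shrink to 7, with occupation and married, the two smallest CV coefficients, dropping out.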

Plugin Lasso

Next is the Plugin Lasso. Again, the only change in the code is selection(plugin).

lasso linear wage education experience age ethnicity region gender occupation sector union married, selection(plugin) rseed(123)
estimates store plugin
lassocoef plugin, sort(coef,standardized) display(coef,standardized)

Results.

. lasso linear wage education experience age ethnicity region gender occupation sector union marri, selection(plugin) rseed(123)

Computing plugin lambda ...
Iteration 1:     lambda = .1502937   no. of nonzero coef. =  4
Iteration 2:     lambda = .1502937   no. of nonzero coef. =  4

Lasso linear model                          No. of obs        =        534
                                            No. of covariates =         10
Selection: Plugin heteroskedastic

--------------------------------------------------------------------------
         |                                No. of
         |                               nonzero    In-sample
      ID |     Description      lambda     coef.    R-squared          BIC
---------+----------------------------------------------------------------
     * 1 | selected lambda    .1502937         4       0.1910     3180.827
--------------------------------------------------------------------------
* lambda selected by plugin formula assuming heteroskedastic errors.

. lassocoef plugin, sort(coef,standardized) display(coef,standardized)

------------------------
             |    plugin
-------------+----------
   education |  1.301289
         age |  .4256132
      gender | -.4170235
       union |  .0935011
       _cons |         0
------------------------
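
Unlike the two methods above, the plugin λ comes from a closed-form expression rather than a grid search. The following worked check assumes the default constants of the heteroskedastic plugin as I understand them ($c = 1.1$, $\gamma = 0.1/\ln\max(p, N)$); these are assumptions, not values printed in the log. The intermediate numbers are rounded.

```latex
\lambda = \frac{c}{\sqrt{N}}\;\Phi^{-1}\!\Bigl(1 - \frac{\gamma}{2p}\Bigr),
\qquad c = 1.1,\quad \gamma = \frac{0.1}{\ln\max(p, N)}

% With N = 534 and p = 10:
\gamma \approx \frac{0.1}{6.280} \approx 0.0159,
\qquad
\Phi^{-1}(1 - 0.000796) \approx 3.157,
\qquad
\frac{c}{\sqrt{N}} \approx \frac{1.1}{23.11} \approx 0.0476

\lambda \approx 0.0476 \times 3.157 \approx 0.150
```

This agrees with the value lambda = .1502937 reported in the iteration log. Because the plugin method also applies variable-specific penalty loadings, this λ is not directly comparable with the λ values selected by cross-validation.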

Model comparison

Having fitted the three lassos, we now compare the models.

. lassogof CV adaptive plugin

Penalized coefficients
-------------------------------------------------
       Name |         MSE    R-squared        Obs
------------+------------------------------------
         CV |    18.90849       0.2827        534
   adaptive |    19.08944       0.2758        534
     plugin |     21.3259       0.1910        534
-------------------------------------------------

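lassogof reports the MSE of each estimator's predictions, and the model with the lowest MSE is preferred. Note that no holdout sample was set aside here, so the values above are in-sample MSEs computed on the same 534 observations used for estimation, not out-of-sample MSEs. In this case CV has the lowest value, so the lasso selected by CV fits best by this criterion. Since it also keeps the most variables (9, against 7 for adaptive and 4 for plugin), a lower in-sample MSE is to be expected; a genuine out-of-sample comparison would require evaluating the models on data not used for fitting.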
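
As a rough sketch of an out-of-sample comparison, the data can be split into a training and a holdout group before fitting. This was not run for this post, so no output is shown. The variable name sample, the local macro xvars, and the stored names ending in _tr are all hypothetical choices for illustration.

```stata
// Sketch only: hold out part of the data to obtain out-of-sample MSEs.
// "sample", "xvars" and the *_tr names are illustrative, not from the post.
use "CPS1985.dta", clear

// Covariates used throughout the post
local xvars education experience age ethnicity region gender occupation sector union married

// Randomly split the observations into two groups (1 = training, 2 = holdout)
splitsample, generate(sample) nsplit(2) rseed(123)

// Fit each lasso on the training group only
lasso linear wage `xvars' if sample == 1, selection(cv, folds(10)) nolog rseed(123)
estimates store cv_tr

lasso linear wage `xvars' if sample == 1, selection(adaptive) nolog rseed(123)
estimates store adaptive_tr

lasso linear wage `xvars' if sample == 1, selection(plugin) nolog
estimates store plugin_tr

// Report the MSE separately for the training (1) and holdout (2) groups
lassogof cv_tr adaptive_tr plugin_tr, over(sample)
```

The rows for sample 2 are then the out-of-sample MSEs, and the model with the smallest value there would be the one to prefer.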
