"Max -1" is the maximum temperature one day earlier. Likewise, "Min -6" is the minimum temperature six days earlier.

X=df.drop(["Max_temp","Min_temp"],axis=1)
Y_Max=df["Max_temp"]
Y_Min=df["Min_temp"]
X_train=X[:-365]
X_test=X[-365:]
Y_Max_train=Y_Max[:-365]
Y_Max_test=Y_Max[-365:]
Y_Min_train=Y_Min[:-365]
Y_Min_test=Y_Min[-365:]

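Split the data into X and Y. Y holds the same-day maximum and minimum temperatures, each cut out as its own target, and the last 365 rows are held out as the test set.

X holds the temperature history for the seven days up to the previous day.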
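To make the layout of X concrete, here is a minimal sketch in plain Python of how lag features like "Max -1" to "Max -7" can be built from a daily series. The function name `make_lag_rows` is hypothetical, and this is only an illustrative stand-in: the `week_dataset` helper used in this post is not shown in this section, so it may well be implemented differently (for example with pandas `shift`). The numbers are the maximum temperatures that appear in the sample rows for early January 2011.

```python
def make_lag_rows(values, n_lags=7):
    """Return (target, [lag 1, ..., lag n_lags]) for every day with enough history."""
    rows = []
    for t in range(n_lags, len(values)):
        # lag k is the value observed k days before day t
        lags = [values[t - k] for k in range(1, n_lags + 1)]
        rows.append((values[t], lags))
    return rows

# Daily maximum temperatures in chronological order (2010-12-25 .. 2011-01-05).
max_temp = [15.6, 13.9, 9.2, 9.5, 10.3, 11.6, 5.6, 10.7, 11.4, 9.3, 10.0, 11.1]

rows = make_lag_rows(max_temp)
target, lags = rows[0]
print(target, lags)  # 10.7 [5.6, 11.6, 10.3, 9.5, 9.2, 13.9, 15.6]
```

The first pair corresponds to the 2011-01-01 row: the target is that day's maximum, and the features are the seven preceding days, newest first.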

forest_Max = LGBMRegressor(min_samples_leaf=3, random_state=42)
forest_Max.fit(X_train, Y_Max_train)
forest_Min = LGBMRegressor(min_samples_leaf=3, random_state=42)
forest_Min.fit(X_train, Y_Min_train)

Y_Max_pred=forest_Max.predict(X_test)
Y_Min_pred=forest_Min.predict(X_test)

print("MaxTemp pred r2score",r2_score(Y_Max_test, Y_Max_pred))
print("MinTemp pred r2score",r2_score(Y_Min_test, Y_Min_pred))

MaxTemp pred r2score 0.8308272576471176
MinTemp pred r2score 0.9529302111487752

The previous scores were

MaxTemp pred r2score 0.8336991228198057
MinTemp pred r2score 0.9558456419510862

so, if anything, this is slightly worse.
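For reference, the numbers being compared are R² values (coefficient of determination), where 1.0 is a perfect fit. The sketch below shows, in plain Python and with made-up temperatures, the formula that `r2_score` is based on. It is an illustration of the definition, not scikit-learn's actual code.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up values, purely for illustration.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 17.0]

score = r2(y_true, y_pred)
print(round(score, 2))  # 0.85
```

Because a higher R² is better, a minimizer needs the quantity flipped, for example 1 - R².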

This is where Optuna comes in.

def opt(trial):
    # Search space for the hyperparameters.
    # Note: recent Optuna releases deprecate suggest_discrete_uniform in favor of
    # trial.suggest_float(name, low, high, step=0.1).
    n_estimators = trial.suggest_int('n_estimators', 0, 1000)
    max_depth = trial.suggest_int('max_depth', 1, 20)
    min_child_weight = trial.suggest_int('min_child_weight', 1, 20)
    # Note: in LightGBM, subsample only takes effect when subsample_freq > 0.
    subsample = trial.suggest_discrete_uniform('subsample', 0.5, 0.9, 0.1)
    colsample_bytree = trial.suggest_discrete_uniform('colsample_bytree', 0.5, 0.9, 0.1)
    model_opt = LGBMRegressor(
        random_state=42,
        n_estimators = n_estimators,
        max_depth = max_depth,
        min_child_weight = min_child_weight,
        subsample = subsample,
        colsample_bytree = colsample_bytree,
    )
    model_opt.fit(X_train,Y_Max_train)
    # score() returns R^2; Optuna minimizes by default, so return 1 - R^2.
    return (1.0 - (model_opt.score(X_test, Y_Max_test)))

study = optuna.create_study()
study.optimize(opt, n_trials=100)

[I 2019-04-02 00:43:51,852] Finished trial#0 resulted in value: 0.15806235655863032. Current best value is 0.15806235655863032 with parameters: {'n_estimators': 44, 'max_depth': 2, 'min_child_weight': 15, 'subsample': 0.5, 'colsample_bytree': 0.5}.
[I 2019-04-02 00:43:53,161] Finished trial#1 resulted in value: 0.18335921288895263. Current best value is 0.15806235655863032 with parameters: {'n_estimators': 44, 'max_depth': 2, 'min_child_weight': 15, 'subsample': 0.5, 'colsample_bytree': 0.5}.
******* (omitted) ********
[I 2019-04-02 00:44:45,399] Finished trial#98 resulted in value: 0.1737700041293272. Current best value is 0.15399476054354166 with parameters: {'n_estimators': 99, 'max_depth': 1, 'min_child_weight': 15, 'subsample': 0.8, 'colsample_bytree': 0.9}.
[I 2019-04-02 00:44:46,120] Finished trial#99 resulted in value: 0.18618166444654893. Current best value is 0.15399476054354166 with parameters: {'n_estimators': 99, 'max_depth': 1, 'min_child_weight': 15, 'subsample': 0.8, 'colsample_bytree': 0.9}.

Let's look at the result.

print(study.best_params)
print(1-study.best_value)

{'n_estimators': 99, 'max_depth': 1, 'min_child_weight': 15, 'subsample': 0.8, 'colsample_bytree': 0.9}
0.8460052394564583

These parameters look like a good fit. The score is up a little as well (R² of about 0.846, versus 0.831 before tuning).
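One caveat worth keeping in mind: the objective scores every trial on X_test, the same rows used for the final score, so the tuned R² is probably a little optimistic. A common alternative is to carve a validation period out of the training data and tune against that instead. Below is a minimal sketch of such a chronological split in plain Python. The helper name `chronological_split` and the 365-day validation length are assumptions made for illustration, not something this post does.

```python
def chronological_split(n_rows, n_valid=365, n_test=365):
    """Return (train, valid, test) slices over rows that are sorted by date."""
    if n_rows <= n_valid + n_test:
        raise ValueError("not enough rows for the requested split")
    test_start = n_rows - n_test          # newest rows: final evaluation only
    valid_start = test_start - n_valid    # the year before that: used for tuning
    return (
        slice(0, valid_start),
        slice(valid_start, test_start),
        slice(test_start, n_rows),
    )

# Example with a hypothetical 2000-row dataset.
train, valid, test = chronological_split(2000)
print(train, valid, test)
```

With pandas, the slices could be applied as `X.iloc[train]`, `X.iloc[valid]`, and `X.iloc[test]`. The objective would then score on the validation rows, and the test rows would be touched only once at the end.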

Let's apply the same search to the minimum temperature.

def opt(trial):
    # Same search space as before; only the target changes to the minimum temperature.
    n_estimators = trial.suggest_int('n_estimators', 0, 1000)
    max_depth = trial.suggest_int('max_depth', 1, 20)
    min_child_weight = trial.suggest_int('min_child_weight', 1, 20)
    subsample = trial.suggest_discrete_uniform('subsample', 0.5, 0.9, 0.1)
    colsample_bytree = trial.suggest_discrete_uniform('colsample_bytree', 0.5, 0.9, 0.1)
    model_opt = LGBMRegressor(
        random_state=42,
        n_estimators = n_estimators,
        max_depth = max_depth,
        min_child_weight = min_child_weight,
        subsample = subsample,
        colsample_bytree = colsample_bytree,
    )
    model_opt.fit(X_train,Y_Min_train)
    # score() returns R^2; Optuna minimizes by default, so return 1 - R^2.
    return (1.0 - (model_opt.score(X_test, Y_Min_test)))

study = optuna.create_study()
study.optimize(opt, n_trials=100)

[I 2019-04-02 00:44:46,555] Finished trial#0 resulted in value: 0.047794481939270494. Current best value is 0.047794481939270494 with parameters: {'n_estimators': 195, 'max_depth': 15, 'min_child_weight': 4, 'subsample': 0.8, 'colsample_bytree': 0.8}.
[I 2019-04-02 00:44:47,809] Finished trial#1 resulted in value: 0.05033750001841275. Current best value is 0.047794481939270494 with parameters: {'n_estimators': 195, 'max_depth': 15, 'min_child_weight': 4, 'subsample': 0.8, 'colsample_bytree': 0.8}.
******* (omitted) ********
[I 2019-04-02 00:45:26,039] Finished trial#98 resulted in value: 0.048526892555636025. Current best value is 0.04475752694493973 with parameters: {'n_estimators': 268, 'max_depth': 2, 'min_child_weight': 11, 'subsample': 0.7, 'colsample_bytree': 0.6}.
[I 2019-04-02 00:45:27,011] Finished trial#99 resulted in value: 0.04961441756195517. Current best value is 0.04475752694493973 with parameters: {'n_estimators': 268, 'max_depth': 2, 'min_child_weight': 11, 'subsample': 0.7, 'colsample_bytree': 0.6}.

print(study.best_params)
print(1-study.best_value)

{'n_estimators': 268, 'max_depth': 2, 'min_child_weight': 11, 'subsample': 0.7, 'colsample_bytree': 0.6}
0.9552424730550603

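The best parameters differ quite a bit between the minimum and the maximum temperature.

With Optuna you can find them fairly quickly: going by the log timestamps, each 100-trial study finished in about a minute. Handy.

Now for the fun part: the predictions. Note that the hyperparameters in the code below differ from the best_params printed above. They presumably come from a separate Optuna run, since the search is randomized and no sampler seed was set.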

forest_Max = LGBMRegressor(n_estimators= 572,max_depth= 1, min_child_weight= 14, subsample= 0.9, colsample_bytree= 0.7,random_state=42)
forest_Max.fit(X_train, Y_Max_train)
forest_Min = LGBMRegressor(n_estimators= 69, max_depth= 4, min_child_weight= 20, subsample= 0.8, colsample_bytree=0.8 ,random_state=42)
forest_Min.fit(X_train, Y_Min_train)
Y_Max_pred=forest_Max.predict(X_test)
Y_Min_pred=forest_Min.predict(X_test)

print("MaxTemp pred r2score",r2_score(Y_Max_test, Y_Max_pred))
print("MinTemp pred r2score",r2_score(Y_Min_test, Y_Min_pred))

MaxTemp pred r2score 0.8435216030218544
MinTemp pred r2score 0.9553404953510781

result=pd.DataFrame([Y_Max_pred,Y_Min_pred],index=["MaxTemp_pred","MinTemp_pred"]).T
result.index=X[-len(X_test):].index
result["MaxTemp_act"]=Y_Max_test
result["MinTemp_act"]=Y_Min_test

This collects the results into a single DataFrame.

result.plot(figsize=(16,6))

[Figure: line plot of result, showing predicted and actual maximum and minimum temperatures over the 365-day test period]

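XX_pred is the predicted value and XX_act is the measured value.

Compared with last time... is it better? Hard to tell by eye, though the scores did improve.
Let's compare concretely, using the temperature on 1/27 as the example.

To keep it comparable with the previous post, I'll predict 1/27 from the data up to 1/26.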
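The feature row for the day to be predicted follows a simple rule: the latest observed day becomes lag 1, every older lag moves back by one day, and the oldest lag drops out. Here is a minimal sketch of that shift in plain Python. The function name `next_day_features` is hypothetical, and the numbers are the maximum temperatures from the 2011-01-05 row of the sample data, used only to show the mechanics.

```python
def next_day_features(today_value, today_lags):
    """Tomorrow's lags: today's value first, then today's lags without the oldest one."""
    return [today_value] + list(today_lags[:-1])

# 2011-01-05: Max_temp, followed by "Max -1" .. "Max -7".
today_max = 11.1
today_max_lags = [10.0, 9.3, 11.4, 10.7, 5.6, 11.6, 10.3]

features = next_day_features(today_max, today_max_lags)
print(features)  # [11.1, 10.0, 9.3, 11.4, 10.7, 5.6, 11.6]
```

The pandas code in this post does the same thing column by column, for both the "Max" and the "Min" features.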

df_pred=pd.DataFrame()
dfp=pd.read_csv("data2.csv",encoding="shiftjis",header=3).drop([0,1],axis=0).reset_index(drop=True)
df_pred["datetime"]=pd.to_datetime(dfp["年月日"])
df_pred["Max_temp"]=dfp["最高気温(℃)"]
df_pred["Min_temp"]=dfp["最低気温(℃)"]
df_pred=df_pred.set_index("datetime",drop=True)

df_pred=week_dataset(df_pred)

# Build a single feature row for 2019-01-27.
pred=pd.DataFrame()
pred["datetime"]=pd.to_datetime(["2019-01-27"])
pred.index=pred.datetime
pred=pred.drop("datetime",axis=1)

# The latest observed day (1/26) becomes lag 1.
pred["Max -1"]=df_pred.iloc[-1]["Max_temp"]
pred["Min -1"]=df_pred.iloc[-1]["Min_temp"]

# Each older lag shifts back by one day.
for i in range(2,8):
    pred["Max -"+str(i)]=df_pred.iloc[-1]["Max -"+str(i-1)]
    pred["Min -"+str(i)]=df_pred.iloc[-1]["Min -"+str(i-1)]

Y_Max_pred=forest_Max.predict(pred)
Y_Min_pred=forest_Min.predict(pred)

print("Max temp",Y_Max_pred)
print("Min temp",Y_Min_pred)

Max temp [9.09020509]
Min temp [-1.41623157]

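So, according to the model, 1/27 should have a maximum of about 9.09 °C and a minimum of about -1.42 °C.

The actual values on 1/27 were a maximum of 10.2 °C and a minimum of -1.5 °C.
Let's check the previous result.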

> Max temp [11.14383333]
> Min temp [-1.28191667]

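The minimum temperature got noticeably closer. The maximum is about as accurate as before; the only difference is that it now undershoots where it used to overshoot. Even so, this is about as close as the forecast Google shows (for this particular day, at least).
This script took a little under an hour to write, so for that cost the accuracy seems quite good.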
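To put numbers on that comparison, the absolute errors can be computed directly from the values quoted above. This is just arithmetic on the figures in this post; the variable names are illustrative.

```python
# Values for 2019-01-27, as quoted in this post (degrees Celsius).
actual = {"max": 10.2, "min": -1.5}
current = {"max": 9.09020509, "min": -1.41623157}    # this post (LightGBM + Optuna)
previous = {"max": 11.14383333, "min": -1.28191667}  # previous post

def abs_errors(pred, truth):
    """Absolute error per target."""
    return {key: abs(pred[key] - truth[key]) for key in truth}

current_err = abs_errors(current, actual)
previous_err = abs_errors(previous, actual)

for key in ("max", "min"):
    print(key, round(previous_err[key], 2), round(current_err[key], 2))
# max 0.94 1.11
# min 0.22 0.08
```

The minimum-temperature error shrinks from about 0.22 °C to about 0.08 °C, while the maximum-temperature error stays around 1 °C and changes sign.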

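What this suggests is that an error of this size is probably about the limit for gradient-boosting prediction based on the temperature numbers alone. (Ensembling with XGBoost or CatBoost seems unlikely to improve the accuracy visibly.)
Weather forecasters reportedly use things like cloud shapes and solar activity, so adding that kind of data appropriately should make more accurate predictions possible.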

	Max_temp	Min_temp	Max -1	Min -1	Max -2	Min -2	Max -3	Min -3	Max -4	Min -4	Max -5	Min -5	Max -6	Min -6	Max -7	Min -7
datetime
2011-01-01	10.7	-3.0	5.6	0.2	11.6	0.1	10.3	-2.3	9.5	-3.9	9.2	-1.3	13.9	0.9	15.6	4.5
2011-01-02	11.4	-0.6	10.7	-3.0	5.6	0.2	11.6	0.1	10.3	-2.3	9.5	-3.9	9.2	-1.3	13.9	0.9
2011-01-03	9.3	-1.0	11.4	-0.6	10.7	-3.0	5.6	0.2	11.6	0.1	10.3	-2.3	9.5	-3.9	9.2	-1.3
2011-01-04	10.0	0.0	9.3	-1.0	11.4	-0.6	10.7	-3.0	5.6	0.2	11.6	0.1	10.3	-2.3	9.5	-3.9
2011-01-05	11.1	-2.6	10.0	0.0	9.3	-1.0	11.4	-0.6	10.7	-3.0	5.6	0.2	11.6	0.1	10.3	-2.3

晴耕雨読オンライン

A blog where an otaku engineer jots down day-to-day thoughts as a personal memo.

Trying out LightGBM and Optuna to predict temperatures