Demonstration of Dynamic Treatment Regimes

tmarumt

2023-09-28

Q学習のデモンストレーション

DTRの推定

\begin{aligned} & Y\mid\left(X=x,A=a\right)\sim \mathrm{N}\left(Q\left(x,a\right),1^2\right) & X\sim \mathrm{U}\left(0,1.5\right)\\ & Q\left(x,a\right)=\mathrm{E}\left[Y\mid X=x,A=a\right]=x+a-ax & X\perp A \end{aligned}

シミュレーションデータの生成

set.seed(20230928)
n <- 10000
x <- runif(n, 0, 1.5)
a <- rbinom(n, 1, 0.5)
y <- rnorm(n, x + a * (1 - x), 1)

a=0 or 1 のときの Y の条件付き期待値 \begin{cases} Q\left(x,0\right)&=\mathrm{E}\left[Y\mid X=x,A=0\right] \\ &=x+0-0 \cdot x \\ &=x \qquad \text{if $a=0$,} \\ Q\left(x,1\right)&=\mathrm{E}\left[Y\mid X=x,A=1\right] \\ &=x+1-1 \cdot x \\ &=1 \qquad \text{if $a=1$.} \end{cases}

交互作用モデルの当てはめ

fit <- lm(y ~ x * a)
co <- coef(fit)

Variable	Estimate
(Intercept)	0.0746816
x	0.9446059
a	0.9082840
x:a	-0.9046616

推定した最適DTR（ポリシー）

\hat{d}\left(x\right) = \mathrm{I}\left(0.90828−0.90466x > 0\right)

価値の評価

\begin{aligned} V\left(d\right) &= \mathrm{E}\left[Q\left(X,1\right)\mathrm{I}\left(d\left(X\right) = 1\right) + Q\left(X,0\right)\mathrm{I}\left(d\left(X\right) = 0\right)\right] \\ \hat{V}\left(\hat{d}\right) &= \frac{1}{n}\sum_i\left[Q\left(x_i,1\right)\mathrm{I}\left(\hat{d}\left(x_i\right) = 1\right) + Q\left(x_i,0\right)\mathrm{I}\left(\hat{d}\left(x_i\right) = 0\right)\right] \end{aligned}

policy_1dtr <- function(x, co1, co2) {
  return(as.integer(co1 + x * co2 > 0))
}

ite_1dtr <- tibble(
  x = x, a = a, y = y,
  a_opt = policy_1dtr(x, co["a"], co["x:a"]),
  q1 = co[1] + x * co["x"] + 1 * co["a"]
       + 1 * x * co["x:a"],
  q0 = co[1] + x * co["x"] + 0 * co["a"]
       + 0 * x * co["x:a"],
  q_opt = ifelse(a_opt == 1, q1, q0)
) |> mutate(i = row_number())

q1、q0はそれぞれ a=1 、a=0 に強制的に割り当てたときの推定期待値

ate_1dtr <- ite_1dtr |>
  summarise(mean_y = mean(y), v_hat = mean(q_opt))

mean_y	v_hat
0.89965	1.08629

推定した最適DTR（ポリシー）に従えば、アウトカムが 0.2 ほど改善
※ あくまで学習データに基づく推測

i	x	a	y	a_opt	q1	q0	q_opt
1	0.643	0	2.932	1	1.009	0.682	1.009
2	1.025	1	0.758	0	1.024	1.043	1.043
3	0.867	0	−0.274	1	1.018	0.894	1.018
4	0.874	0	0.322	1	1.018	0.901	1.018
5	0.180	0	1.252	1	0.990	0.245	0.990
6	1.277	1	1.721	0	1.034	1.281	1.281

推定価値の評価

推定したDTRで治療を行った場合のテストデータの生成

set.seed(20230928 + 1)
n <- 10000
x_test <- runif(n, 0, 1.5)
a_test <- policy_1dtr(x_test, co["a"], co["x:a"])
y_test <- rnorm(n, x_test + a_test * (1 - x_test), 1)

テストデータに対して推定DTRによる治療を行った場合の価値の推定

ate_test_1dtr <- ate_1dtr |>
  mutate(mean_y_test = mean(y_test))

mean_y	v_hat	mean_y_test
0.89965	1.08629	1.08627

mean_y_testが学習データに基づく推定価値（ =\hat{V}\left(\hat{d}\right)= v_hat）に近い値を示したため、価値関数の推定はある程度正しかったと言える。

2段階DTR推定のデモンストレーション

使用するR-package

library(DTRlearn2)
library(DynTxRegime)
library(DTRreg)

使用するデータ

data(adhd)
attach(adhd)

id	o11	o12	o13	o14	a1	r	o21	o22	a2	y
1	0	1.13883	0	1	-1	0	2	1	-1	3
2	0	0.19749	0	1	-1	0	6	1	-1	3
3	1	0.35080	0	0	-1	0	2	0	1	1
4	1	−0.40761	0	1	-1	0	6	1	1	4
5	0	−0.70404	1	0	-1	1	NA	0	-1	4
6	1	−1.33982	0	1	-1	0	1	0	-1	5
7	0	−0.04403	0	0	-1	0	7	0	1	1

2段階Q学習（from R-package {DynTxRegime}）

ステージ1、2の会議モデルの変数（履歴）を定義

H1 <- cbind(o11, o12, o13, o14)
H2 <- cbind(H1, a1, H1 * a1, r, o22, r * a1, o22 * a1)

d^{opt}_2\left(h_2\right) = d^{opt}_2\left(x_1,a_1,x_2\right) = \underset{a_2} {\operatorname{argmax}}\ \mathrm{E}\left[Y\left(a_1,a_2\right)\mid X_1 = x_1,X_2\left(a_1\right)=x_2\right]

q_main2 <- buildModelObj(model = ~ H2, solver.method = "lm")
q_cont2 <- buildModelObj(model = ~ H2, solver.method = "lm")
ql2 <- qLearn(moMain = q_main2, moCont = q_cont2, data = adhd, response = adhd["y"], txName = "a2")

yに対する線形回帰モデルの最小二乗推定量を求めて、ステージ2の最適（潜在）アウトカムql2を計算

d^{opt}_1\left(h_1\right) = d^{opt}_1\left(x_1\right) = \underset{a_1} {\operatorname{argmax}}\ \mathrm{E} \left[\underset{a_2} {\operatorname{max}}\ \mathrm{E}\left[Y\left(a_1,a_2\right)\mid X_1,X_2\left(a_1\right)\right]\mid X_1 = x_1\right]

q_main1 <- buildModelObj(model = ~ H1, solver.method = "lm")
q_cont1 <- buildModelObj(model = ~ H1, solver.method = "lm")
ql1 <- qLearn(moMain = q_main1, moCont = q_cont1, data = adhd, response = ql2, txName = "a1")

ql2に対する線形回帰モデルの最小二乗推定量を求める。