1 Generate Data for Choice Model

First we generate data in the following procedure.

  1. Generate \(\lambda_{imt}\). \(\lambda_{imt}\) is one of the most significant variables that represents consumer’s risk (observable for consumers but not econometrician)
  2. Generate choice in \(t = 0\) according to the choice model. In \(t = 0\), consumers have four options: the firm with and without monitoring, competitor 1 and 2.
  3. Generate realization of cost (claims count, severity and monitoring score) in \(t = 0\) according to the cost model.
  4. Generate choice in \(t = 1\) based on renewal prices realized by claims and scores. Now consumers have three options: the firm, competitor 1 and 2.

1.1 Generate \(\lambda\)

  • The rate parameter, \(\lambda_{imt}\) of Poisson distributed claims count, has mean \(\mu_{\lambda, imt}\) and an additive error \(\varepsilon_{\lambda, i}\) which is log-normally distributied with spread \(\sigma_{\lambda}\)
  • We assume that \(\mu_{\lambda, imt} = \theta_{\lambda1}x_{1, imt} + \cdots + \theta_{\lambda4}x_{4, imt}\)
rm(list = ls())

N <- 10^3
T <- 1
D <- 4

# set the seed
set.seed(1)

## True Parameters
sigma_lambda <- 0.1
theta_lambda <- c(-3, -0.5, 1, -1, 1, -0.5)

## Covariates
mu_x_t0 <- rep(0, 4)
mu_x_t1 <- rep(0, 4)
sigma_x <- diag(0.25, 4)

X_t0 <- mvrnorm(N, mu_x_t0, sigma_x)
X_t1 <- 0.5*X_t0 + 0.5*mvrnorm(N, mu_x_t1, sigma_x)

X <- X_t0 %>% 
  rbind(X_t0) %>% 
  rbind(X_t1) %>% 
  rbind(X_t1)

colnames(X) <- c(paste("x", 1:4, sep = "_"))

X_lambda <- expand.grid(i = 1:N, m = 0:1, t = 0:1) %>% 
  tibble::as_tibble() %>% 
  cbind(X)

## mu_lambda
mu_lambda <- X_lambda %>% 
  dplyr::mutate(mu_lambda = theta_lambda[1]
                + theta_lambda[2]*x_1 + theta_lambda[3]*x_2
                + theta_lambda[4]*x_3 + theta_lambda[5]*x_4
                + theta_lambda[6]*ifelse(m == 1 & t == 0, 1, 0))

## epsilon_lambda
epsilon_lambda <- expand.grid(i = 1:N) %>% 
  tibble::as_tibble() %>% 
  dplyr::mutate(epsilon_lambda = rnorm(N, 0, sigma_lambda))

## lambda
lambda <- mu_lambda %>% 
  dplyr::left_join(epsilon_lambda, by = c("i")) %>% 
  dplyr::mutate(lambda = exp(mu_lambda + epsilon_lambda))
  • The distribution of generated \(\lambda\) is as follows
## Visualize 
g_lambda <- lambda %>% 
  ggplot(aes(x = lambda)) + 
  geom_histogram(binwidth = 0.01)

plot(g_lambda)
  • We also check opt-in drivers tend to have lower risk
## Visualize lambda and monitoring opt-in
g_lambda_monitoring <- lambda %>% 
  dplyr::mutate(monitoring = ifelse(m == 1, "Monitored", "Unmonitored")) %>% 
  ggplot(aes(x = monitoring, y = lambda)) + 
  geom_boxplot() + 
  scale_y_continuous(limits=c(0, 0.1))

plot(g_lambda_monitoring)

1.2 Generate (Expectation of) Monitoring Score and Out-of-Pocket

  • Then, we generate expectation of monitoring score and severity, which is necessary for expectation of out-of-pocket, \(e(C, y_{d})\) and renewal price multiplier \(R_{idt}(C, s)\).
  • Score \(s\) follows log-normal distribution with an individual mean \(\mu_{s, i}\) and precision \(\sigma_{s}\)
  • Assuming that \(\mu_{s, i} = \theta_{s, 1} + \theta_{s, 2}\log(\lambda_{i}) + \theta_{s, 3}x_{1, i} + \theta_{s, 4}x_{2, i}\), the expectation of score is given by \(\exp(\mu_{s, i} + \frac{\sigma_{s}^{2}}{2})\)
  • Out-of-pocket (OOP), is that consumers have to pay when an accident occurs due to deductible and/or policy limit
  • We assume that every plan has only policy limit
  • In the paper, for simplicity, consumers only consider the possibility of one claim occurrence per term in expectation
  • Then, expectation of OOP is given by:

\[ \begin{align} E[e(C, y_{d})] &= \Pr(C = 1) \times \int_{y_{0}}^{\infty} (x - y_{0})a_{l} l_{0}^{a_{l}} x^{- a_{l} - 1}dx \\ &= \lambda \exp(-\lambda) \times \left\{ \frac{l_{0}^{a_{l}}}{a_{l} - 1} y_{0}^{-a_{l} + 1} \right\} \end{align} \]

## Generate expectations of score
theta_score <- c(-3, -0.5, 2, 1)
sigma_score <- 0.25

score <- lambda %>% 
  dplyr::filter(t == 0, m == 1) %>% 
  dplyr::mutate(mu_score = theta_score[1] + theta_score[2]*log(lambda)
                + theta_score[3]*x_1 + theta_score[4]*x_2,
                E_score = exp(mu_score + sigma_score^2 / 2)) %>% 
  dplyr::select(i, m, t, mu_score, E_score)

## Visualize Score
g_mu_score <- score %>% 
  ggplot(aes(x = mu_score)) + 
  geom_histogram(binwidth = 0.1)

plot(g_mu_score)
## Expectation of Out-of-Pocket
policy_limit <- 1000
l0 <- 500
a_l <- 2.5

E_oop <- lambda %>% 
  dplyr::mutate(prob_C1 = lambda * exp(-lambda),
                E_oop = prob_C1 * (l0^(a_l)/(a_l - 1))*policy_limit^(-a_l + 1),
                E_oop_sq = prob_C1 * ((2 * l0^(a_l)) / ((a_l - 1) * (a_l - 1))) * policy_limit^(-a_l + 2)) %>% 
  dplyr::select(i, m, t, E_oop, E_oop_sq)

## Visualize OOP
g_oop <- E_oop %>% 
  ggplot(aes(x = E_oop)) + 
  geom_histogram(binwidth = 0.5)

plot(g_oop)

1.3 Generate Choice in \(t = 0\)

  • In the paper, consumer’s realized choice utility is modeled as follows:

\[ \begin{align} u_{idt}(C, s) &= u_{\gamma}(w_{it} + h_{idt}(C, s)) \\ h_{idt}(C, s) &= -p_{idt} - \mathbf{1}_{d, t-1} \cdot \psi_{idt} - e(C, y_{d}) - p_{idt} \cdot R_{idt}(C, s) \\ \text{where } \psi_{idt} &= \mathbf{1}_{d, t-1} \cdot \eta_{0} + \mathbf{1}_{f_{d}, t-1} \cdot \eta_{it} + \mathbf{1}_{m_{d}} \cdot \mathbf{1}_{t=0} \cdot \xi_{it} \end{align} \]

  • At each period \(t\), consumer \(i\) chooses \(d\) so as to maximize her expected utility:

\[ \begin{align} d_{it} &= \mathop{\rm argmax}\limits_{d \in D_{it}} \left\{ v_{idt} + \varepsilon_{idt} \right\} \\ \text{where } v_{idt} &= \mathbb{E}_{C, s}[u_{idt}(C, s)] = \mathbb{E}[h_{idt}] -\frac{\gamma}{2}\mathbb{E}[h_{idt}^{2}] \end{align} \]

  • Thus, we start with generating each component of \(h_{idt}\).
price_par_t0_d1 <- c(15, 3, 2, 3, -2)
price_par_t0_d2 <- c(15, 3, 2, 3, -2)
price_par_t0_d3 <- c(15, 1, 3, 2, -2)
price_par_t0_d4 <- c(15, 4, 5, 1, -2)

price_gen_d <- function(choice_t0, D, price_par_t0){
  output <- choice_t0 %>% 
    dplyr::filter(d == D) %>% 
    dplyr::mutate(price = 
                    price_par_t0[1] +
                    price_par_t0[2]*x_1 + 
                    price_par_t0[3]*x_2 + 
                    price_par_t0[4]*x_3 +
                    price_par_t0[5]*m +
                    rnorm(N, 1, 1)) %>% 
    dplyr::select(i, d, t, m, f, price)
}

## Generate Price
price_gen_t0 <- function(choice_t0,
                         price_par_t0_d1, price_par_t0_d2,
                         price_par_t0_d3, price_par_t0_d4){
  
  price <- price_gen_d(choice_t0, 1, price_par_t0_d1) %>% 
    rbind(price_gen_d(choice_t0, 2, price_par_t0_d2)) %>% 
    rbind(price_gen_d(choice_t0, 3, price_par_t0_d3)) %>% 
    rbind(price_gen_d(choice_t0, 4, price_par_t0_d4))
  
  output <- choice_t0 %>% 
    dplyr::left_join(price, by = c("i", "d", "t", "m", "f"))
  
  return(output) 
}


## Generate Prior Firm
prior_firm_t0 <- expand.grid(i = 1:N) %>% 
  tibble::as_tibble() %>% 
  dplyr::mutate(prior_firm = sample(1:3, size = N, replace = T, prob = c(0.4, 0.35, 0.25)))


## True Parameters
### Inertia
eta_f <- c(5, 1, 2, 2)
xi <- c(1, -2.5)

### Renewal Price
alpha_m0 <- c(20, 5, 3)
alpha_m1 <- c(20, 5, 3, -1)
beta <- 20

### Risk Aversion
risk_aversion <- 10^(-4)


## Generate Choice
choice_t0 <- expand.grid(i = 1:N, d = 1:D, t = 0) %>% 
  tibble::as_tibble() %>% 
  dplyr::mutate(m = ifelse(d == 1, 1, 0),
                f = dplyr::case_when(
                  d <= 2 ~ 1,
                  d == 3 ~ 2,
                  d == 4 ~ 3)) %>% 
  dplyr::left_join(lambda, by = c("i", "m", "t")) %>% 
  dplyr::left_join(score, by = c("i", "m", "t")) %>% 
  dplyr::left_join(E_oop, by = c("i", "m", "t")) %>% 
  price_gen_t0(price_par_t0_d1, price_par_t0_d2, price_par_t0_d3, price_par_t0_d4) %>% 
  dplyr::left_join(prior_firm_t0, by = c("i")) %>% 
  dplyr::mutate(inertia = eta_f[1] + eta_f[2]*x_1 + eta_f[3]*x_2 + eta_f[4]*x_3,
                monitoring_disutility = xi[1] + xi[2]*log(lambda),
                demand_friction = ifelse(f == prior_firm, 0, inertia) + 
                  ifelse(d == 1, monitoring_disutility, 0)) %>% 
  dplyr::mutate(E_R_s = ifelse(m == 0, 
                           (alpha_m0[1] + alpha_m0[2]*x_1 + alpha_m0[3]*x_2) / beta,
                           (alpha_m1[1] + alpha_m1[2]*x_1 + alpha_m1[3]*x_2 + alpha_m1[4]*E_score) / beta),
                E_R_s_sq = E_R_s / beta,
                E_R_C = 0.98 * exp(-lambda) + 1.2 * (1 - exp(-lambda)),
                E_R_C_sq = 0.98^2 * exp(-lambda) + 1.2^2 * (1 - exp(-lambda)),
                E_renewal_price = price * E_R_s * E_R_C,
                E_renewal_price_sq = price^2 * E_R_s_sq * E_R_C_sq) %>% 
  dplyr::mutate(h = - price - demand_friction - E_oop - E_renewal_price,
                h_sq = price^2 - demand_friction^2 + E_oop_sq + E_renewal_price_sq
                + 2*price*demand_friction + 2*price*E_oop + 2*price*E_renewal_price
                + 2*demand_friction*E_oop + 2*demand_friction*E_renewal_price
                + 2*E_oop*E_renewal_price)

# draw idiosyncratic shocks
e_t0 <- evd::rgev(dim(choice_t0)[1])

choice_t0 <- cbind(choice_t0, e_t0) %>% 
  dplyr::mutate(v = h - (risk_aversion / 2)*h_sq,
                u = v + e_t0) %>% 
  dplyr::group_by(i, t) %>% 
  dplyr::mutate(choice = ifelse(u == max(u), 1, 0)) %>% 
  dplyr::ungroup()

summary(choice_t0)

summarize_choice <- choice_t0 %>% 
  dplyr::group_by(d) %>% 
  dplyr::summarise(share = mean(choice))

summarize_choice
  • The figure shows generated prices of each plan.
  • Monitoring plan provided by firm 1 is slightly lower than other plans, representing the opt-in discount
## Visualize Price in t = 0
g_price_t0 <- choice_t0 %>% 
  dplyr::mutate(plan = dplyr::case_when(d == 1 ~ "Firm1 Monitoring",
                                        d == 2 ~ "Firm1 No Monitoring",
                                        d == 3 ~ "Firm2",
                                        d == 4 ~ "Firm3")) %>% 
  ggplot(aes(x = price, fill = plan)) + 
  geom_histogram(position = "identity", alpha = 0.6, binwidth = 0.25)

plot(g_price_t0)
  • The following figure shows the expectation of \(R_{idt}(s)\), which implies that monitored consumers tend to have smaller expected renewal prices, incentivizing them to opt-in
## Visualize E_R_s in t = 0
g_E_R_s_t0 <- choice_t0 %>% 
  dplyr::filter(f == 1) %>% 
  dplyr::mutate(plan = dplyr::case_when(m == 1 ~ "Monitoring",
                                        m == 0 ~ "No Monitoring")) %>% 
  ggplot(aes(x = E_R_s, fill = plan)) + 
  geom_histogram(position = "identity", alpha = 0.6, binwidth = 0.01)

plot(g_E_R_s_t0)

1.4 Generate Realization of Cost in \(t = 0\)

  • Since consumer’s choice is path-dependent, i.e. price in next period depends on claims and monitoring score, we need to generate realization of costs in \(t = 0\)
  • Claim occurs according to Poisson distribution with mean \(\lambda\)
  • The monitoring score \(s\) is drawn according to a log-normal distribution with mean \(\mu_{s, i}\) and precision \(\sigma_{s}\)
  • Note that we do not need to generate realization of severity since it does not affect consumer’s choice in \(t = 1\)
## Claims Count
cost_t0 <- choice_t0 %>% 
  dplyr::filter(choice == 1) %>% 
  dplyr::mutate(claims = rpois(N, lambda = lambda))

## Severity
# max_claims <- max(cost_t0$claims)
# 
# claims_mat <- matrix(rep(0, N * max_claims), nrow = N)
# severity <- matrix(l0*runif(N * max_claims)^(-1/a_l), nrow = N)
# 
# colnames(severity) <- paste("sev_", 1:max_claims, sep = "")
# 
# cost_t0 <- cbind(cost_t0, severity)

## Monitoring Score
cost_t0 <- cost_t0 %>% 
  dplyr::mutate(score = rlnorm(N, mu_score, sigma_score),
                score = ifelse(m == 1, score, 1),
                log_score = log(score)) %>% 
  dplyr::select(i, claims, matches("sev."), score, log_score)
## Visualize realized claims count
g_realized_claims_count <- cost_t0 %>%
  ggplot(aes(x = claims)) + 
  geom_histogram(binwidth = 0.5)

plot(g_realized_claims_count)
  • The following figure shows realized monitoring scores (in logarithm) for opt-in drivers
## Visualize realized score
g_realized_score <- cost_t0 %>%
  dplyr::filter(score != 1) %>% 
  ggplot(aes(x = log_score)) + 
  geom_histogram(binwidth = 0.25)

plot(g_realized_score)

1.5 Generate Choice in \(t = 1\)

## Prior Choice and Firm
prior_firm_t1 <- choice_t0 %>% 
  dplyr::filter(choice == 1) %>% 
  dplyr::mutate(prior_choice = d * choice,
                prior_firm = f * choice) %>% 
  dplyr::select(i, prior_choice, prior_firm)

### Prior Price
prior_price_t1 <- choice_t0 %>% 
  dplyr::left_join(prior_firm_t1, by = c("i")) %>% 
  dplyr::mutate(prior_price = price,
                flg = dplyr::case_when(prior_choice == 1 & d != 2 ~ 1,
                                       prior_choice > 1 & d != 1 ~ 1,
                                       TRUE ~ 0)) %>% 
  dplyr::filter(flg == 1) %>% 
  dplyr::select(i, f, prior_price) %>% 
  dplyr::arrange(i)

## Choice in t = 1
choice_t1 <- expand.grid(i = 1:N, d = 1:(D - 1), t = 1) %>% 
  tibble::as_tibble() %>% 
  dplyr::mutate(m = 0,
                f = d) %>% 
  dplyr::left_join(prior_firm_t1, by = c("i")) %>% 
  dplyr::left_join(prior_price_t1, by = c("i", "f")) %>% 
  dplyr::left_join(lambda, by = c("i", "m", "t")) %>% 
  dplyr::left_join(E_oop, by = c("i", "m", "t")) %>% 
  dplyr::left_join(cost_t0, by = c("i")) %>% 
  dplyr::mutate(gamma_alpha = ifelse(score == 1,
                                     alpha_m0[1] + alpha_m0[2]*x_1 + alpha_m0[3]*x_2,
                                     alpha_m1[1] + alpha_m1[2]*x_1 + alpha_m1[3]*x_2 + alpha_m1[4]*score)) %>% 
  dplyr::mutate(R_s = rgamma(N * (D - 1), gamma_alpha, beta),
                R_C = ifelse(claims == 0, 0.98, 1.2),
                price = prior_price * R_s * R_C) %>% 
  dplyr::mutate(inertia = eta_f[1] + eta_f[2]*x_1 + eta_f[3]*x_2 + eta_f[4]*x_3,
                demand_friction = ifelse(d == prior_firm, 0, inertia)) %>%   
  dplyr::mutate(E_R_s = ifelse(m == 0, 
                           (alpha_m0[1] + alpha_m0[2]*x_1 + alpha_m0[3]*x_2) / beta,
                           (alpha_m1[1] + alpha_m1[2]*x_1 + alpha_m1[3]*x_2 + alpha_m1[4]*E_score) / beta),
                E_R_s_sq = E_R_s / beta,
                E_R_C = 0.98 * exp(-lambda) + 1.2 * (1 - exp(-lambda)),
                E_R_C_sq = 0.98^2 * exp(-lambda) + 1.2^2 * (1 - exp(-lambda)),
                E_renewal_price = price * E_R_s * E_R_C,
                E_renewal_price_sq = price^2 * E_R_s_sq * E_R_C_sq) %>% 
  dplyr::mutate(h = - price - demand_friction - E_oop - E_renewal_price,
                h_sq = price^2 - demand_friction^2 + E_oop_sq + E_renewal_price_sq
                + 2*price*demand_friction + 2*price*E_oop + 2*price*E_renewal_price
                + 2*demand_friction*E_oop + 2*demand_friction*E_renewal_price
                + 2*E_oop*E_renewal_price) %>% 
  dplyr::arrange(i)

# draw idiosyncratic shocks
e_t1 <- evd::rgev(dim(choice_t1)[1])

choice_t1 <- cbind(choice_t1, e_t1) %>% 
  dplyr::mutate(v = h - (risk_aversion / 2)*h_sq,
                u = v + e_t1) %>% 
  dplyr::group_by(i, t) %>% 
  dplyr::mutate(choice = ifelse(u == max(u), 1, 0)) %>% 
  dplyr::ungroup()

summary(choice_t1)

summarize_choice_t1 <- choice_t1 %>% 
  dplyr::group_by(d) %>% 
  dplyr::summarise(share = mean(choice))

summarize_choice_t1
## Visualize Price in t = 1
g_price_t1 <- choice_t1 %>% 
  dplyr::mutate(plan = dplyr::case_when(d == 1 ~ "Firm1",
                                        d == 2 ~ "Firm2",
                                        d == 3 ~ "Firm3"),
                prior_plan = dplyr::case_when(prior_choice == 1 ~ "Firm1 Monitoring",
                                              prior_choice == 2 ~ "Firm1 No Monitoring",
                                              prior_choice == 3 ~ "Firm2",
                                              prior_choice == 4 ~ "Firm3")) %>% 
  ggplot(aes(x = price, fill = prior_plan)) + 
  geom_histogram(position = "identity", alpha = 0.6, binwidth = 0.25)

plot(g_price_t1)
## Visualize Realized R_s in t = 1
g_R_s_t1 <- choice_t1 %>% 
  dplyr::mutate(plan = dplyr::case_when(d == 1 ~ "Firm1",
                                        d == 2 ~ "Firm2",
                                        d == 3 ~ "Firm3"),
                prior_plan = dplyr::case_when(prior_choice == 1 ~ "Firm1 Monitoring",
                                              prior_choice == 2 ~ "Firm1 No Monitoring",
                                              prior_choice == 3 ~ "Firm2",
                                              prior_choice == 4 ~ "Firm3")) %>% 
  ggplot(aes(x = R_s, fill = prior_plan)) + 
  geom_histogram(position = "identity", alpha = 0.4, binwidth = 0.025)

plot(g_R_s_t1)

1.6 Merge Choice Data

df_t0 <- choice_t0 %>% 
  dplyr::select(i, d, t, m, f, x_1, x_2, x_3, x_4, price, prior_firm, choice)
df_t1 <- choice_t1 %>% 
  dplyr::select(i, d, t, m, f, x_1, x_2, x_3, x_4, price, prior_firm, choice, claims, score, R_s, R_C)

df <- dplyr::bind_rows(df_t0, df_t1) %>% 
  dplyr::arrange(t, i, d)