Our objective is to explain variation in the binary satisfaction measure, “Very Satisfied” versus “Not Very Satisfied”. Let \(Y_i \in \lbrace VS, NVS \rbrace\) denote respondent \(i\)’s outcome. To account for the dichotomous outcome measure we use a probit model of the form \[ {\rm Pr}(Y_i={\rm VS} |\theta_{c[i]},\beta,X_i) = F(\mu + \theta_{c[i]} + \beta'X_i), \] where \(F\) is the standard normal cdf, \(c[i]\) is \(i\)’s county of residence, and \(X_i\) is a vector of predictor variables. \(\theta_{c[i]}\) is a county effect for \(i\)’s county, while \(\beta\) captures the effects of the predictor variables.

A Multilevel Model for \(\theta,\beta\)

All of the predictor variables are coded as categorical using the categories shown in the data section. To illustrate the basic structure of the model, assume - for simplicity - that \(X\) consists only of INCOME and AGE. In this case the dimension of \(X_i\) equals the sum of the number of categories of INCOME and AGE, and \(X_i\) has two 1s in the appropriate locations. We can write this model as \[ {\rm Pr}(Y_i={\rm VS} |\theta_{c[i]},\beta,X_i) = F(\mu + \theta_{c[i]} + \beta^{inc}_{j[i]} + \beta^{age}_{k[i]}), \] where \(j[i]\) is the income category to which \(i\) belongs, while \(k[i]\) is the age category to which respondent \(i\) belongs. Expressing the model this way (i.e., in “effects” notation) is equivalent to the “X” notation above.
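The equivalence of the two notations can be checked with a small numerical sketch; the category counts and coefficient values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical illustration: 4 INCOME categories and 3 AGE categories.
K_inc, K_age = 4, 3
j, k = 1, 2                 # respondent i's income and age categories (0-indexed)

# "X" notation: a dummy vector with exactly two 1s
x = np.zeros(K_inc + K_age)
x[j] = 1.0                  # income dummy
x[K_inc + k] = 1.0          # age dummy

beta = np.array([0.1, 0.3, 0.5, 0.7,    # beta^inc
                 -0.2, 0.0, 0.2])       # beta^age

# "effects" notation: pick out the two relevant coefficients directly
effects = beta[j] + beta[K_inc + k]

assert np.isclose(x @ beta, effects)    # the two notations agree
```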

We impose a multilevel structure on the effects of the model. This serves two purposes. First, while the full sample size of the data is very large, we wish to explore the effect of possibly deep interactions of the variables. For example, what is the difference in the probability of reporting being “Very Satisfied” for a Hispanic Male aged 20-24 and a Black Female aged 30-34? The sample cell sizes quickly become rather small as we increase the number of interactions in the model. A multilevel structure on \(\beta\) and \(\theta\) allows us to use efficient shrinkage estimation by pooling information within exchangeable groups of parameters (this book is a good introduction to multilevel modeling). Second, by relying on a multilevel hierarchy of the parameters, we can employ fast Bayesian computational algorithms for parameter estimation. This second point is important since both the sample size and the parameter dimensionality are huge in this application.

To illustrate the multilevel specification, consider the basic model above with county, income and age effects. For the county effects we assume that \[ \theta_c \sim {\rm N}(\mu_\theta + \gamma'Z_c,\tau_\theta^{-1}), \] where \(Z_c\) is a vector of county-level predictors. Note that we can simply roll the \(Z_c\) vector into the \(X_i\) vector and, without loss of generality, assume that \[ \theta_c \sim {\rm N}(\mu_\theta,\tau^{-1}_\theta), \qquad c=1,\dots,K_c, \] where \(K_c\) is the number of counties. Similarly, for INCOME and AGE we assume \[ \beta^{inc}_j \sim {\rm N}(\mu_\mbox{inc},\tau^{-1}_\mbox{inc}), \qquad j=1,\dots,K_\mbox{inc}, \] \[ \beta^{age}_k \sim {\rm N}(\mu_\mbox{age},\tau^{-1}_\mbox{age}), \qquad k=1,\dots,K_\mbox{age}. \]
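The generative structure of these exchangeable priors can be sketched as follows; all hyperparameter values and group sizes here are made up for illustration, and note that the model is parameterized in precisions, so the normal standard deviation is \(\tau^{-1/2}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up hyperparameter values and group sizes, for illustration only.
mu_theta, tau_theta = 0.0, 4.0      # county-effect mean and precision
K_c = 10                            # number of counties

# County effects are exchangeable normal draws; sd = tau**(-1/2).
theta = rng.normal(mu_theta, tau_theta ** -0.5, size=K_c)

# INCOME and AGE effects follow the same pattern, each with its own (mu, tau).
mu_inc, tau_inc = 0.1, 2.0
K_inc = 5
beta_inc = rng.normal(mu_inc, tau_inc ** -0.5, size=K_inc)
```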

If the model also has an interaction, e.g., \[ {\rm Pr}(Y_i={\rm VS} |\theta_{c[i]},\beta,X_i) = F(\mu + \theta_{c[i]} + \beta^\mbox{female} X^\mbox{female}_i + \beta^\mbox{inc}_{j[i]} + \beta^\mbox{age}_{k[i]} + \beta^\mbox{female,inc}_{j[i]} X^\mbox{female}_i ), \] then the interaction is modelled in a similar fashion: \[ \beta^\mbox{female,inc}_j \sim {\rm N}(\mu_\mbox{female,inc},\tau^{-1}_\mbox{female,inc}), \qquad j=1,\dots,K_\mbox{inc}, \] where the gender effect is given a vague prior, e.g., \(\beta^\mbox{female} \sim N(0,10)\). We note that the model as specified above is over-parameterized: for example, we can add a constant to the intercept \(\mu\) and subtract the same constant from \((\mu_\theta,\mu_\mbox{inc},\mu_\mbox{age})\) without changing the likelihood function. We estimate the model in the over-parameterized form and then post-process the estimates to obtain identified quantities. This is similar to the procedure used by Shirley and Gelman (2014). The section on estimation below provides the details.
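A small numerical sketch of the over-parameterization (all values are arbitrary, and for brevity the constant is subtracted from the county effects directly rather than from their group-level mean):

```python
import numpy as np

# Arbitrary made-up values
mu = 0.5
theta = np.array([0.2, -0.1])       # county effects
c = 0                               # respondent's county
eta = mu + theta[c]                 # linear predictor (other terms omitted)

# Shift a constant from the county effects into the intercept:
delta = 1.7
mu_shift = mu + delta
theta_shift = theta - delta
eta_shift = mu_shift + theta_shift[c]

assert np.isclose(eta, eta_shift)   # linear predictor, hence likelihood, unchanged
```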

We estimate two different specifications. In the first model we include Gender, Age and Income with all interactions (for Income and Age we include only the interactions between low Income and Age and between high Income and Age). In the second model we add employment and health outcomes to the model.


Table 3. Model Specifications.
Model 1                     Model 2                     Multilevel?
County                      County                      Yes
Female                      Female                      No
Race                        Race                        No
Income                      Income                      Yes
Age                         Age                         Yes
Female x Race               Female x Race               No
Female x Income             Female x Income             Yes
Female x Age                Female x Age                Yes
Race x Income               Race x Income               Yes
Race x Age                  Race x Age                  Yes
Income_(0,10) x Age         Income_(0,10) x Age         Yes
Income_75+ x Age            Income_75+ x Age            Yes
Female x Race x Income      Female x Race x Income      Yes
Female x Race x Age         Female x Race x Age         Yes
                            Employment                  Yes
                            Race x Employment           Yes
                            Health                      Yes
                            Income_(0,10) x Health      Yes
                            Income_75+ x Health         Yes
                            Marital                     Yes
                            Female x Marital            Yes
                            County.Gini                 Yes
                            County.Income               Yes
                            County.Unemp                Yes
                            County.Religion             Yes
                            County.Race                 Yes
                            Race x County.Race          Yes


The total number of parameters contained in \((\theta,\beta)\) is 2122 for model 1 and 2240 for model 2.

Estimation

Estimating a probit model with more than 2,000 parameters and close to 1.4 million observations is a non-trivial task that conventional statistical software suites cannot handle. We use a customized Markov chain Monte Carlo (MCMC) algorithm for estimation. This section provides the technical details; readers mainly interested in the results can skip it and move on to the results section.

MCMC Algorithm

We use an algorithm of the Gibbs sampler variety. We start with the well-known observation that the probit model can be represented as \[ Y_i = \begin{cases} VS & \mbox{if } Y^*_i \geq 0, \\ NVS & \mbox{if } Y^*_i < 0 \end{cases}, \qquad {\rm where}~~ Y^*_i \sim {\rm N}(\mu + \theta_{c[i]} + \beta'X_i,1). \]
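The equivalence between the latent-variable representation and the probit probability can be checked by simulation; the value of the linear predictor below is made up:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

eta = 0.3            # made-up value of the linear predictor mu + theta + beta'x
n = 200_000

# Latent-variable representation: Y* ~ N(eta, 1), Y = VS iff Y* >= 0
y_star = rng.normal(eta, 1.0, size=n)
p_sim = np.mean(y_star >= 0)

# Direct probit probability: Pr(Y = VS) = F(eta), F the standard normal cdf
p_exact = 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

assert abs(p_sim - p_exact) < 0.01   # the two representations agree
```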

We take advantage of this representation by augmenting the parameter vector by \(Y^* = \lbrace Y^*_i \rbrace_{i=1}^n\). To finish the specification of the full model we need to specify a list of prior distributions for the remaining model parameters.

For the mean and precision parameters for the exchangeable groups (e.g., \(\mu_\mbox{inc}\) and \(\tau_\mbox{inc}\) above), we assume \[ \mu_\mbox{var} \sim {\rm N}(0,\tau^{-1}_0),~~\tau_\mbox{var} \sim {\rm G}(a_0,b_0), \] where \(var\) is a predictor variable or interaction of variables (e.g., \(\theta\), income, age, female x income etc.), and \(G(a_0,b_0)\) is the Gamma distribution with shape \(a_0\) and rate \(b_0\). We set \(\tau_0=0.1\) and \(a_0=3,b_0=0.02\). This is a mildly informative prior for \(\tau\) which encourages shrinkage without ruling out the possibility of very low values of \(\tau\). Finally, for the \(\beta\) elements without a multilevel structure (gender and race), we assume \[ \beta^\mbox{female} \sim {\rm N}(0,100), ~\beta^\mbox{black} \sim {\rm N}(0,100),~ \beta^\mbox{female,black} \sim {\rm N}(0,100) , ~~\mbox{etc.} \]
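A quick way to see what the \(G(3, 0.02)\) prior implies for the scale of the group effects is to simulate \(\tau\) and inspect the implied standard deviation \(\tau^{-1/2}\); note that NumPy's gamma sampler takes a scale parameter, the inverse of the rate:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hyperparameters from the text: tau_var ~ Gamma(shape a0 = 3, rate b0 = 0.02)
a0, b0 = 3.0, 0.02
tau = rng.gamma(shape=a0, scale=1.0 / b0, size=100_000)  # NumPy scale = 1/rate
sd = tau ** -0.5        # implied prior standard deviation of the group effects

assert abs(tau.mean() - a0 / b0) < 5.0   # E[tau] = a0 / b0 = 150
print("implied effect sd, 5%/50%/95%:",
      np.round(np.quantile(sd, [0.05, 0.5, 0.95]), 3))
```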

Next we describe the details of the Gibbs sampler. This involves characterizing each of the conditional distributions of the joint data augmented posterior distribution.

  1. Sampling \(Y\). The conditional for \(Y^*_i\) is a truncated normal distribution: \[ Y^*_i \sim \begin{cases} {\rm TN}_{(0,\infty)}(\mu + \theta_{c[i]} + \beta'X_i,1) & \mbox{if } Y_i=\mbox{VS}, \\ {\rm TN}_{(-\infty,0)}(\mu + \theta_{c[i]} + \beta'X_i,1) & \mbox{if } Y_i=\mbox{NVS.} \\ \end{cases} \]

  2. Sampling \((\mu,\beta)\). The conditional for \(\tilde \beta \equiv (\mu,\beta)\) is a multivariate normal distribution: \[ \begin{eqnarray} \tilde \beta & \sim & {\rm N} \left( V \times ( X'Z(Y^*,\theta) + \Omega_{\tilde \beta} \pi_{\tilde \beta}), V \right), \\ V & = & \left( X'X + \Omega_{\tilde \beta} \right)^{-1}. \end{eqnarray} \] where \(X\) is the full \(N \times K\) matrix of predictor variables (including a column of ones for the intercept \(\mu\)), \(Z(Y^*,\theta)\) is an \(N\)-vector with element \(i\) equal to \(Z_i \equiv Y^*_i - \theta_{c[i]}\), and \(\Omega_{\tilde \beta}\) is a diagonal matrix of prior precisions arranged to be conformable with \(\tilde \beta\). For example, in the part of \(\tilde \beta\) where \(\beta^\mbox{inc}_j, j=1,\dots,K_\mbox{inc}\) are located, \(\Omega_{\tilde \beta}\) will have a corresponding \(\tau_\mbox{inc}\) (and \(\pi_{\tilde \beta}\) will have \(\mu_\mbox{inc}\) in the same locations).

  3. Sampling \(\theta\). To sample \(\theta_c\), note that conditional on \(Y^*\) and \(\tilde \beta\), the quantities \(W_i \equiv Y^*_i - \mu - \beta'X_i\) for respondents with \(c[i]=c\) are independent \({\rm N}(\theta_c,1)\) draws. Combining this with the prior \(\theta_c \sim {\rm N}(\mu_\theta,\tau^{-1}_\theta)\) gives the conditional \[ \theta_c \sim {\rm N}\left( \frac{\sum_{i:c[i]=c} W_i + \tau_\theta \mu_\theta}{n_c + \tau_\theta},\ (n_c + \tau_\theta)^{-1} \right), \] where \(n_c\) is the number of respondents in county \(c\).

  4. Sampling \(\tau,\mu\). To sample the group-level hyperparameters, note that each pair \((\mu_\mbox{var},\tau_\mbox{var})\) has a standard conjugate update. With \(K_\mbox{var}\) effects \(\beta^\mbox{var}_1,\dots,\beta^\mbox{var}_{K_\mbox{var}}\) in the group, \[ \mu_\mbox{var} \sim {\rm N}\left( \frac{\tau_\mbox{var} \sum_j \beta^\mbox{var}_j}{K_\mbox{var}\tau_\mbox{var} + \tau_0},\ (K_\mbox{var}\tau_\mbox{var} + \tau_0)^{-1} \right), \qquad \tau_\mbox{var} \sim {\rm G}\left( a_0 + \tfrac{K_\mbox{var}}{2},\ b_0 + \tfrac{1}{2}\sum_j (\beta^\mbox{var}_j - \mu_\mbox{var})^2 \right). \]
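The steps above can be sketched in a toy version of the sampler. The sketch below is not the authors' Fortran implementation: it simulates data from a stripped-down model with only an intercept and county effects (no \(\beta\)s) and runs the corresponding Gibbs updates; all sizes, seeds and true values are made up, and the truncated normal draws use simple rejection sampling rather than a production method:

```python
import numpy as np

rng = np.random.default_rng(3)

# --- simulate data from a stripped-down model (all values made up) ---------
K_c, n = 8, 4000
mu_true = 0.3
theta_true = rng.normal(0.0, 0.5, K_c)
county = rng.integers(0, K_c, n)
y = rng.normal(mu_true + theta_true[county], 1.0) >= 0   # True = "VS"

# --- prior hyperparameters from the text -----------------------------------
tau0, a0, b0 = 0.1, 3.0, 0.02

def draw_latent(mean, y, rng):
    """Step 1: Y*_i ~ N(mean_i, 1) truncated to the sign implied by Y_i."""
    out = np.empty_like(mean)
    todo = np.ones(mean.shape[0], dtype=bool)
    while todo.any():                        # simple rejection sampling
        d = rng.normal(mean[todo], 1.0)
        ok = (d >= 0) == y[todo]
        idx = np.flatnonzero(todo)[ok]
        out[idx] = d[ok]
        todo[idx] = False
    return out

mu, theta = 0.0, np.zeros(K_c)               # initial values
mu_theta, tau_theta = 0.0, 1.0
n_c = np.bincount(county, minlength=K_c)
keep = []

for it in range(400):
    # 1. latent utilities
    y_star = draw_latent(mu + theta[county], y, rng)
    # 2. intercept mu, with a vague N(0, 100) prior
    v = 1.0 / (n + 0.01)
    mu = rng.normal(v * np.sum(y_star - theta[county]), np.sqrt(v))
    # 3. county effects: precision n_c + tau_theta, data term = residual sums
    r = np.bincount(county, weights=y_star - mu, minlength=K_c)
    v_c = 1.0 / (n_c + tau_theta)
    theta = rng.normal(v_c * (r + tau_theta * mu_theta), np.sqrt(v_c))
    # 4. hyperparameters: conjugate normal and gamma updates
    v_m = 1.0 / (K_c * tau_theta + tau0)
    mu_theta = rng.normal(v_m * tau_theta * theta.sum(), np.sqrt(v_m))
    rate = b0 + 0.5 * np.sum((theta - mu_theta) ** 2)
    tau_theta = rng.gamma(a0 + K_c / 2.0, 1.0 / rate)   # NumPy scale = 1/rate
    if it >= 200:
        keep.append(mu + theta.mean())       # an identified quantity

print("posterior mean of mu + mean(theta):", round(float(np.mean(keep)), 3))
```

Keeping the identified combination \(\mu + \bar\theta\) rather than \(\mu\) alone mirrors the post-processing step: the two components drift under the over-parameterization, but their sum is pinned down by the data.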

Due to the large sample size we implemented the algorithm in Fortran using the Intel Visual Fortran XE Compiler 12.1, taking advantage of its parallel processing capabilities. The source code is available here.

Post-processing