17: Better MTC MNL Mode Choice

For this example, we’re going to create a richer and more sophisticated mode choice model, using the same MTC data. We’ll jump straight to the preferred model 17 from the Self Instructing Manual.

To build that model, we are going to have to create some variables that we don’t already have: cost divided by income, and out of vehicle travel time divided by distance. The tricky part is that cost and time are idca Format variables, and income and distance are idco Format variables, in a different table. Fortunately, we can use SQL to pull the data from one table to the other, but first we’ll set ourselves up to do so efficiently.

d = larch.DB.Example('MTC')
d.execute("CREATE INDEX IF NOT EXISTS data_co_casenum ON data_co (casenum);")

The index we create here on the idco Format table will allow SQLite to grab the correct row from the data_co table almost instantly (more or less) each time, instead of having to search through the whole table for the matching caseid. Once we have this index, we can write a couple UPDATE queries to build our two new idca Format variables:

d.add_column("data_ca", "costbyincome FLOAT")
qry1="UPDATE data_ca SET costbyincome = 1.0*totcost/(SELECT hhinc FROM data_co WHERE data_co.casenum=data_ca.casenum)"
d.execute(qry1)

d.add_column("data_ca", "ovtbydist FLOAT")
qry2="UPDATE data_ca SET ovtbydist = 1.0*ovtt/(SELECT dist FROM data_co WHERE data_co.casenum=data_ca.casenum)"
d.execute(qry2)

In each block, we first add a new column to the data_ca table, and then populate that column with the calculated values. Now we are ready to build our model.

m = larch.Model(d)

from larch.roles import P,X
m.utility.ca = (
        + X("costbyincome") * P("costbyincome")
        + X("tottime * (altnum IN (1,2,3,4))") * P("motorized_time")
        + X("tottime * (altnum IN (5,6))") * P("nonmotorized_time")
        + X("ovtbydist * (altnum IN (1,2,3,4))") * P("motorized_ovtbydist")
)

The costbyincome data is already computed above so we can add it to the model very simply. In our preferred specification, we want to differentiate the total travel time by motorized modes (1 to 4) and non-motorized modes (5 and 6), which we can do by specifying some math inside the data string. Often the data string is just the name of a column as we have seen before, but it can also be any valid SQLite expression that can be evaluated on the relevant master query (either larch_idca or larch_idco).

m.utility.co[4] = P("hhinc#4") * X("hhinc")
m.utility.co[5] = P("hhinc#5") * X("hhinc")
m.utility.co[6] = P("hhinc#6") * X("hhinc")

Since the model we want to create groups together DA, SR2 and SR3+ jointly as reference alternatives with respect to income, we can simply omit all of these alternatives from the block that applies to hhinc.

For vehicles per worker, the preferred model include a joint parameter on SR2 and SR3+, but not including DA and not fixed at zero. Here we might use a shadow_parameter, which allows us to specify one or more parameters that are simply a fixed proportion of another parameter. For example, we can say that vehbywrk_SR2 will be equal to 1.0 times vehbywrk_SR.

m.parameter("vehbywrk_SR")
m.shadow_parameter.vehbywrk_SR2 = m.parameter.vehbywrk_SR
m.shadow_parameter["vehbywrk_SR3+"] = m.parameter.vehbywrk_SR

Having defined these parameter aliases, we can then loop over all alternatives (skipping DA in the index-zero position) to add vehicles per worker to the utility function:

for a,name in m.df.alternatives()[1:]:
        m.utility[a] += X("vehbywrk") * P("vehbywrk_"+name)

We can also run similar loops over workplace in CBD, etc:

for a,name in m.df.alternatives()[1:]:
        m.utility[a] += X("wkccbd+wknccbd") * P("wkcbd_"+name)

for a,name in m.df.alternatives()[1:]:
        m.utility[a] += X("wkempden") * P("wkempden_"+name)

for a,name in m.df.alternatives()[1:]:
        m.utility[a] += P("ASC_"+name)

m.option.calc_std_errors = True

We didn’t explicitly define our parameters first, which is fine; Larch will find them in the utility functions (or elsewhere in more complex models). But they may be found in a weird order that is hard to read in reports. We can define an ordering scheme by assigning to the parameter_groups attribute, like this:

m.parameter_groups = (
        "costbyincome",
        ".*time",
        ".*dist",
        "hhinc.*",
        "vehbywrk.*",
        "wkcbd.*",
        "wkempden.*",
        "ASC.*",
)

Having created this model, we can then estimate it:

>>> result = m.maximize_loglike()
>>> result.message
'Optimization terminated successfully...

>>> m.loglike()
-3444.1...

>>> print(m)
====================================================================================================
Model Parameter Estimates
----------------------------------------------------------------------------------------------------
Parameter               InitValue       FinalValue      StdError        t-Stat          NullValue
costbyincome             0              -0.0524213       0.0104042      -5.03849         0
motorized_time           0              -0.0201867       0.00381463     -5.2919          0
nonmotorized_time        0              -0.045446        0.00576857     -7.87821         0
motorized_ovtbydist      0              -0.132869        0.0196429      -6.76423         0
hhinc#4                  0              -0.00532375      0.00197713     -2.69266         0
hhinc#5                  0              -0.00864285      0.00515439     -1.67679         0
hhinc#6                  0              -0.00599738      0.00314859     -1.90478         0
vehbywrk_SR              0              -0.316638        0.0666331      -4.75196         0
vehbywrk_Tran            0              -0.946257        0.118293       -7.99925         0
vehbywrk_Bike            0              -0.702149        0.258287       -2.71849         0
vehbywrk_Walk            0              -0.72183         0.169392       -4.26131         0
wkcbd_SR2                0               0.259828        0.123353        2.10638         0
wkcbd_SR3+               0               1.06926         0.191275        5.59018         0
wkcbd_Tran               0               1.30883         0.165697        7.89889         0
wkcbd_Bike               0               0.489274        0.361098        1.35496         0
wkcbd_Walk               0               0.101732        0.252107        0.403529        0
wkempden_SR2             0               0.00157763      0.000390357     4.04152         0
wkempden_SR3+            0               0.00225683      0.000451972     4.9933          0
wkempden_Tran            0               0.00313243      0.00036073      8.68358         0
wkempden_Bike            0               0.00192791      0.00121547      1.58614         0
wkempden_Walk            0               0.00289023      0.000742102     3.89465         0
ASC_SR2                  0              -1.80782         0.106123       -17.035          0
ASC_SR3+                 0              -3.43374         0.151864       -22.6106         0
ASC_Tran                 0              -0.684817        0.247816       -2.7634          0
ASC_Bike                 0              -1.62885         0.427399       -3.81108         0
ASC_Walk                 0               0.0682096       0.348001        0.196004        0
====================================================================================================
Model Estimation Statistics
----------------------------------------------------------------------------------------------------
Log Likelihood at Convergence           -3444.19
Log Likelihood at Null Parameters       -7309.60
----------------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters      0.529
====================================================================================================
...

Tip

If you want access to the model in this example without worrying about assembling all the code blocks together on your own, you can load a read-to-estimate copy like this:

m = larch.Model.Example(17)