Logit Models

The basic tool for analysis in Larch is a discrete choice model. A model is a structure that interacts data with a set of ModelParameters.

Creating Model Objects

class larch.Model([d])
Parameters:d (Fountain) – The source data used to automatically populate model arrays. This can be either a DB or DT object (or another data provider that inherits from the abstract Fountain class). This parameter can be omitted, in which case data will not be loaded automatically and validation checks will not be performed when specifying data elements of the model.

This object represents a discrete choice model. In addition to the methods described below, a Model also acts a bit like a list of ModelParameter objects.

Model.Example(number=1)

Generate an example model object.

Parameters:number (int) – The code number of the example model to load. Valid numbers include {1,17,22,101,102,104,109,111,114}.

Larch comes with a few example models, which are used in documentation and testing. Models with numbers greater than 100 are designed to align with the example models given for Biogeme.

Adding Parameters

Model.parameter(name[, value, null_value, initial_value, max, min, holdfast])

Add a parameter to the model, or access an existing parameter.

Parameters:
  • name (str or larch.roles.ParameterRef) – The name for the parameter to add or access
  • value (float (optional)) – The value to set for the parameter. If initial_value is not given, this value is also used as the initial value. If not given, 0 is assumed.
  • null_value (float (optional)) – This is the assumed value for a “null” or no information model. For utility parameters, this is typically 0 (the default). For logsum parameters, the null value should usually be set to 1.
  • initial_value (float (optional)) – It is possible to set initial_value separately from the current value. This can be useful to reconstruct an already-estimated model.
  • max (float (optional)) – If given, set a max bound for the parameter during the estimation process.
  • min (float (optional)) – If given, set a min bound for the parameter during the estimation process.
  • holdfast (int (optional)) – If nonzero, the parameter will be held fast (constrained) at the current value during estimation.
Returns:

ModelParameter

Defining Utility

Model.utility

A core.LinearBundle object representing the qualitative utility of alternatives in the model.

This is the fundamental building block of a discrete choice model based on utility maximization. You’ll likely want to use this in nearly every discrete choice model you create.

Model.quantity

A core.LinearFunction object representing the quantitative value of alternatives in the model.

Because the quantitative measure must be directly related to the alternatives and cannot be a function exclusively based on the decision maker, this term is not a complete core.LinearBundle but only a core.LinearFunction, which is applied on idca data only.

Also, the usual representation of a core.LinearFunction is the dot product of a data vector (which here we denote as \(Z\)) and a parameter vector (denoted with \(\gamma\)), giving \(\sum_{k} \gamma_k Z_k\). In the case of the quantity function, the data vector is treated similarly, but the parameter vector that enters the function is the element-wise exponential of the parameters, so the function becomes \(\sum_{k} \exp(\gamma_k) Z_k\). This ensures that if all \(Z_k\) are non-negative and at least one \(Z_k\) is strictly positive, then the computed value of the core.LinearFunction will also be strictly positive.

When combined with the qualitative utility, this allows you to write a combined systematic utility function of the form:

\[V_i = \sum_{j} \beta_j X_{ij} + \theta \log \left( \sum_{k} \exp (\gamma_k) Z_{ik} \right)\]
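As a rough numerical illustration of the formula above, the combined utility can be computed by hand. This is plain Python rather than Larch code; the function name systematic_utility and all of the sample numbers are made up for this sketch.

```python
import math

def systematic_utility(beta, x, gamma, z, theta=1.0):
    """V_i = sum_j beta_j * X_ij + theta * log(sum_k exp(gamma_k) * Z_ik)."""
    qualitative = sum(b * xv for b, xv in zip(beta, x))
    # exp(gamma_k) is always positive, so the quantity is strictly positive
    # whenever all Z_k are non-negative and at least one is strictly positive.
    quantity = sum(math.exp(g) * zv for g, zv in zip(gamma, z))
    return qualitative + theta * math.log(quantity)

# Hypothetical values for a single alternative.
beta = [-0.5, 0.25]
x = [2.0, 4.0]        # qualitative attributes X_ij
gamma = [0.0, -1.0]   # a negative gamma is fine; exp() keeps the weight positive
z = [10.0, 0.0]       # size measures Z_ik, one of which is zero

v = systematic_utility(beta, x, gamma, z, theta=1.0)
```

Here the qualitative part sums to zero and the quantity is 10, so the result is simply log(10). The zero entry in z causes no trouble because the other size measure is strictly positive.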

See also

Estimating N in Aggregate Choice Models
More details on the mathematical implications for using quantity.
quantity_scale
The \(\theta\) in the equation above.

Nesting / GEV Network

Nested logit and Network GEV models have an underlying network structure.

Model.nest(id, name=None, parameter=None)

A function-like object mapping node codes to names and parameters.

This can be called as if it was a normal method of Model. It also is an object that acts like a dict with integer keys representing the node code numbers and larch.core.LinearComponent values.

Parameters:
  • id (int) – The code number of the nest. Must be unique to this nest among the set of all nests and all elemental alternatives.
  • name (str or None) – The name of the nest. This name is used in various reports. It can be any string but generally something short and descriptive is useful. If None, the name is set to “nest_{id}”.
  • parameter (str or None) – The name of the parameter to associate with this nest. If None, the name is used.
Other Parameters:
 
  • parent (int, optional) – The code number of the parent node of the nest, for which a link will automatically be created.
  • parents (list of ints, optional) – A list of code numbers for the parent nodes of the nest, for which links will automatically be created.
  • children (list of ints, optional) – A list of code numbers for the child nodes of the nest, for which links will automatically be created.
Returns:

The component object for the designated node

Return type:

larch.core.LinearComponent

Notes

Earlier versions of this software required node code numbers to be non-negative. They can now be any 64 bit signed integer.

Because the id and name are distinct data types, Larch can detect (and silently allow) when they are transposed (i.e. with name given before id).

Model.node

an alias for nest

Model.new_nest(nest_name=None, param_name='', branch=None, **kwargs)

Generate a new nest with a new unique code.

If you don’t want to bother managing the code numbers for nests and instead just work with them more abstractly, this handy function allows you to create a new nest node without worrying about the code number; an otherwise unused number will be selected for you (and returned by this method, so you can use it elsewhere).

Parameters:
  • nest_name (str or None) – The name of the nest. This name is used in various reports. It can be any string but generally something short and descriptive is useful. If None, the name is set to “nest_{id}”, although since you’re not picking your own id, this might not be the best way to go.
  • param_name (str) – The name of the parameter to associate with this nest. If not given, or given as an empty string, the nest_name is used.
  • branch (str or other immutable) – An optional label for the branch of the network that this nest is in. The new code will be populated into the set at model.branches[branch].
Other Parameters:
 
  • parent (int, optional) – The code number of the parent node of the nest, for which a link will automatically be created.
  • parents (list of ints, optional) – A list of code numbers for the parent nodes of the nest, for which links will automatically be created.
  • children (list of ints, optional) – A list of code numbers for the child nodes of the nest, for which links will automatically be created.
Returns:

The code for the newly created nest.

Return type:

int

Notes

It may be convenient to give all of the parent and child linkages when calling this function, but it is not necessary, as linkages can be created separately later.

Model.new_node()

an alias for new_nest()

Model.link(up_id, down_id)

A function-like object defining links between network nodes.

Parameters:
  • up_id (int) – The code number of the upstream (i.e. closer to the root node) node on the link. This should never be an elemental alternative.
  • down_id (int) – The code number of the downstream node on the link. This can be an elemental alternative.
Model.edge

an alias for link

Model.root_id

The root_id is the code number for the root node in a nested logit or network GEV model. The default value for the root_id is 0. It is important that the root_id be different from the code for every elemental alternative and intermediate nesting node. If it is convenient for one of the elemental alternatives or one of the intermediate nesting nodes to have a code number of 0 (e.g., for a binary logit model where the choices are yes and no), then this value can be changed to some other integer.
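The rules about node codes and links described above can be checked with a small stand-alone sketch. This is plain Python rather than Larch code; the function check_network and the example codes are hypothetical, and Larch's own validation may differ.

```python
def check_network(alternatives, nests, links, root_id=0):
    """Return a list of problems found in a nesting structure.

    alternatives: set of elemental alternative codes
    nests:        set of intermediate nest codes
    links:        iterable of (up_id, down_id) pairs
    """
    problems = []
    if root_id in alternatives or root_id in nests:
        problems.append(f"root_id {root_id} collides with another node code")
    if alternatives & nests:
        problems.append("a nest code duplicates an elemental alternative code")
    for up_id, down_id in links:
        if up_id in alternatives:
            problems.append(
                f"link ({up_id}, {down_id}) starts at an elemental alternative"
            )
    return problems

# Hypothetical example: alternatives 1-4, with 3 and 4 grouped under nest 5.
# Only the links that involve the nest are listed here.
ok = check_network({1, 2, 3, 4}, {5}, [(0, 5), (5, 3), (5, 4)])

# A binary yes/no model coded 1/0 clashes with the default root_id of 0,
# so the root has to be moved to some other integer.
clash = check_network({0, 1}, set(), [], root_id=0)
fixed = check_network({0, 1}, set(), [], root_id=99)
```

The first call returns an empty list, the second reports the collision with the default root code, and the third shows that picking a different root code resolves it.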

Model.graph

A networkx.DiGraph() representing the nesting structure.

You can use this DiGraph to explore the network structure, and use standard networkx tools to describe and iterate over the graph. Note that this is a read-only attribute; changes to network (nesting) structure must be made using Model.link and Model.nest.

Raises:ImportError – If the networkx module is not installed.
Model.quantity_scale

The scale (logsum) coefficient on the quantity term.

Assign a ParameterRef, or more generally a string naming the parameter, to this attribute to define it. The parameter is the coefficient applied outside the log of quantity in a model that includes a quantity term, serving as \(\theta\) in the expression:

\[V_i = \sum_{j} \beta_j X_{ij} + \theta \log ( \sum_{k} \exp (\gamma_k) Z_{ik} )\]

Note that this parameter is meaningless if the quantity is not defined or otherwise empty.

If the quantity_scale attribute is not defined for a model that does include a quantity function, the coefficient is assumed to be equal to one.

If you create a discrete choice model that includes both a quantity_scale and other GEV network elements (i.e. nests), you must constrain the value of this parameter to be smaller than any other logsum parameter in the model, so that the model remains consistent with utility maximization.

See also

Non-Arbitrary Boundaries in Aggregate Choice Models
More details on the mathematical implications for using quantity_scale
quantity
The quantity function

Using Model Objects

Model.maximize_loglike()

Find the likelihood maximizing parameters of the model, using the scipy.optimize module. Depending on the model type and structure, various different optimization algorithms may be used.

Model.roll(filename=None, loglevel=20, cats='-', use_ce=False, sourcecode=True, maxlik_args=(), cache_data=False, **format)

Estimate a model and generate a report.

This method rolls together model estimation, reporting, and saving results into a single handy function.

Parameters:
  • filename (str, optional) – The filename into which the output report will be saved. If not given, a temporary file will be created. If the given file already exists, a new file will be created with a number appended to the base filename.
  • loglevel (int, optional) – The log level that will be used while estimating the model. Smaller numbers result in a more verbose log, the contents of which appear at the end of the HTML report. See the standard Python logging module for more details.
  • cats (list of str, or '-' or '*') – A list of report sections to include in the report. The default is ‘-’, which includes a minimal list of report sections. Giving ‘*’ will dump every available report section, which could be a lot and might take a lot of time (and computer memory) to compute.
  • sourcecode (bool) – If true (the default), this method will attempt to access the source code of the file where this function was called, and insert the contents of that file into a section of the resulting report. This is done because the source code may be more instructive as to how the model was created, and how different (but related) future models might be created.
Model.estimate()

Deprecated since version 3.3: Use Model.maximize_loglike() instead

Find the likelihood maximizing parameters of the model using the deprecated Larch optimization engine. This engine has fewer algorithms available than scipy.optimize and may perform poorly for some model types, particularly cross-nested and network GEV models. Users should almost always prefer the Model.maximize_loglike() function instead.

Model.loglike([values])

Find the log likelihood of the model.

Parameters:values (array-like, optional) – If given, an array-like vector of values should be provided that will replace the current parameter values. The vector must be exactly as long as the number of parameters in the model (including holdfast parameters). If any holdfast parameter values differ in the provided values, the new values are ignored and a warning is emitted to the model logger.
Model.d_loglike([values])

Find the first derivative of the log likelihood of the model, with respect to the parameters.

Parameters:values (array-like, optional) – If given, an array-like vector of values should be provided that will replace the current parameter values. The vector must be exactly as long as the number of parameters in the model (including holdfast parameters). If any holdfast parameter values differ in the provided values, the new values are ignored and a warning is emitted to the model logger.
Returns:An array of partial first derivatives with respect to the parameters, thus matching the size of the parameter array.
Return type:array
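The handling of the values argument in loglike and d_loglike can be pictured with the following stand-alone sketch. It is not Larch's implementation: the function name apply_values, the logger name, and the choice to raise ValueError on a length mismatch are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("model_sketch")  # hypothetical logger name

def apply_values(current, holdfast, values):
    """Return the parameter vector after applying `values`.

    current:  list of current parameter values
    holdfast: list of flags, nonzero where a parameter is held fast
    values:   candidate replacement values, same length as `current`
    """
    if len(values) != len(current):
        raise ValueError("values must have one entry per model parameter")
    updated = []
    for cur, held, new in zip(current, holdfast, values):
        if held:
            if new != cur:
                logger.warning(
                    "ignoring new value %r for a holdfast parameter", new
                )
            updated.append(cur)  # holdfast parameters keep their current value
        else:
            updated.append(new)
    return updated

# The second parameter is held fast, so its proposed value of 9.0 is ignored.
result = apply_values([1.0, 2.0, 3.0], [0, 1, 0], [1.5, 9.0, 2.5])
```

Note that the vector passed in still needs an entry for every parameter, including the holdfast ones, even though those entries cannot change anything.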
Model.preserve_casewise_logsums

A bool that indicates if case-wise logsums shall be preserved when calculating the log likelihood.

If you don’t need them for anything in particular, there’s no reason to use memory to save them. But if you do need them, it’s much easier to save them than recreate them. The default value for this attribute is False.

Model.setUp()

Set up the model for estimation or calculation.

This method will set up the data arrays and other structures needed to efficiently process calculations using the model. It should generally be called just before estimation (it is called for you if needed by the maximize_loglike() method) and after all model parameters and attributes are given.

Reporting Tools

Model.title

This string is a descriptive title to attach to this model. It is used in certain reports, and can be set to any string. It has no bearing on the numerical representation of the model.

Model.reorder_parameters(*ordering)

Reorder the parameters in the model.

This method reorders the model parameters as they appear in the model. It should have no material impact on the model results, although it may be convenient for presentation.

The ordering is defined by a series of regular expressions (regex). For each regex, all of the parameter names matching that regex are grouped together and moved to the front of the list, retaining their original ordering within the group. Any subsequent matches for the same parameter are ignored. All unmatched parameters retain their original ordering and move to the end of the list as a group.

Parameters:ordering (list or tuple of str) – A list of regex expressions.

Examples

>>> import larch
>>> m = larch.Model()
>>> m.parameter("A1", value=1)
ModelParameter('A1', value=1.0)
>>> m.parameter("B1", value=2)
ModelParameter('B1', value=2.0)
>>> m.parameter("C1", value=3)
ModelParameter('C1', value=3.0)
>>> m.parameter("A2", value=4)
ModelParameter('A2', value=4.0)
>>> m.parameter("B2", value=5)
ModelParameter('B2', value=5.0)
>>> m.parameter("C2", value=6)
ModelParameter('C2', value=6.0)
>>> m.parameter_names()
['A1', 'B1', 'C1', 'A2', 'B2', 'C2']
>>> m.parameter_values()
(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
>>> m.reorder_parameters('A','C')
>>> m.parameter_names()
['A1', 'A2', 'C1', 'C2', 'B1', 'B2']
>>> m.parameter_values()
(1.0, 4.0, 3.0, 6.0, 2.0, 5.0)
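The grouping rule can also be written out as a few lines of stand-alone Python. This is a sketch of the behavior described above, not Larch's implementation; the function name reorder is hypothetical, and treating a name as a match when the pattern is found anywhere in it (re.search) is an assumption.

```python
import re

def reorder(names, *ordering):
    """Reorder `names` by a sequence of regular expressions."""
    remaining = list(names)
    front = []
    for pattern in ordering:
        regex = re.compile(pattern)
        # Assumption: a name matches if the pattern is found anywhere in it.
        matched = [n for n in remaining if regex.search(n)]
        front.extend(matched)
        # Names already placed are not considered by later patterns.
        remaining = [n for n in remaining if n not in matched]
    # Unmatched names keep their original relative order at the end.
    return front + remaining

names = ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']
new_order = reorder(names, 'A', 'C')
```

With these inputs the sketch gives the same ordering as the doctest above: the two A parameters, then the two C parameters, then the unmatched B parameters.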

Troubleshooting

Model.doctor(clash=None)

Analyze the model and data and look for problems.

This function will look for problems with your model or the underlying data, and alert you to them. Exactly what is checked for may vary (generally expand) in future versions. These checks may be computationally expensive, so they are not completed automatically on every model run, but if you are experiencing difficulty converging or errors in estimation, try this.

Parameters:clash ({None, '+', '-'}) – When a case has an alternative that is chosen but not available, it can be repaired by making the chosen alternative available (‘+’) or by making it not chosen (‘-’), or it can be left alone (None, the default). Note that making it not chosen will quite possibly result in the entire case having no chosen alternative, which can introduce numerical problems.
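The two repair options can be illustrated on a single case with a stand-alone sketch. This is plain Python, not the code Larch runs; the function repair_clash and the boolean-list data layout are hypothetical.

```python
def repair_clash(chosen, available, clash=None):
    """Repair one case where an alternative is chosen but not available.

    chosen, available: lists of booleans, one entry per alternative.
    Returns new (chosen, available) lists; the inputs are not modified.
    """
    chosen, available = list(chosen), list(available)
    for i, (ch, av) in enumerate(zip(chosen, available)):
        if ch and not av:
            if clash == '+':
                available[i] = True   # make the chosen alternative available
            elif clash == '-':
                chosen[i] = False     # make the unavailable alternative not chosen
            # clash=None leaves the inconsistency in place
    return chosen, available

# One case with three alternatives; the second is chosen but unavailable.
chosen = [False, True, False]
available = [True, False, True]

plus = repair_clash(chosen, available, '+')
minus = repair_clash(chosen, available, '-')
```

After the ‘-’ repair this case has no chosen alternative at all, which is the situation the note above warns about.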