4.4 Model Calibration and Validation

Model Calibration and Validation is vital to producing defensible travel demand forecasts. Florida standards for Model Calibration and Validation were initially defined as part of the Model Update series of studies in the early 1980s. Different model applications require different validation checks and, in some cases, different accuracy standards and guidelines. The FDOT has led the development of a validation checklist organized by model application type and by the four steps generally used in Travel Demand Modeling. The checklist identifies calibration and validation checks, standards, and benchmarks for LRTPs, subarea studies, FTA New Starts, and corridor studies. Models serve engineering and planning applications, each with distinct requirements for sophistication and accuracy, so the standards are driven by the needs of the application. Through this process, Travel Demand Models gain the credibility needed to inform decisions in Transportation Planning and Project Development. Model Calibration and Validation serves several purposes: providing a level of comfort to modelers, planners, policy and decision makers, and, to some extent, the general public that the model is able to produce accurate results; providing evidence that model results are accurate enough to be used for the desired planning analyses; and accounting for the errors in the observed data used for comparisons. Balancing model complexity and fidelity through this process allows decision makers to rely on the model when shaping future mobility and project development.

4.4.1 Overview of Model Calibration and Validation

In Florida, the terms “calibration” and “validation” have typically been distinguished as follows:

Model Calibration – A process where models are adjusted to simulate or match observed household travel behavior in the study area for a base (calibration) year.
Model Validation – The procedure used to adjust models to simulate base year observed data, such as traffic counts and transit ridership figures.

Model Calibration implies the availability of household travel survey data to adjust the model constants and parameters to match observed trip generation rates, trip length frequency distributions, aggregate trip movements, and mode shares. Model validation could include some components of calibration if household survey data is available; however, survey data is not required in adjusting the model to match traffic counts. The calibration and validation guidelines and standards represent optimum levels of accuracy. Achieving the accuracy standards and benchmarks does not ensure that the model was developed correctly, as all assumptions and adjustments to model parameters during calibration and validation must be defensible and documented.

Validation also consists of reasonableness and sensitivity checks beyond matching base year travel conditions. The standards therefore include checks such as the reasonableness of model outputs and the elasticities of demand with respect to input variables. The purpose of the travel model is to estimate or forecast travel conditions for one or more alternative scenarios other than the existing situation. Including factors, constants, or parameters that do not vary between the base and alternative scenarios implies that whatever those parameters represent does not change between the scenarios. The more a model relies on such parameters, the less explanatory capability it has.
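A sensitivity check of this kind can be illustrated with an arc (midpoint) elasticity, which is one common way to express the elasticity of demand with respect to an input variable. The sketch below is illustrative only: the function name `arc_elasticity` and the fare and ridership values are hypothetical and are not taken from the Florida standards.

```python
def arc_elasticity(q_base, q_alt, x_base, x_alt):
    """Arc (midpoint) elasticity of demand q with respect to input x.

    Percent changes are measured against the midpoint of the base and
    alternative values, so the result does not depend on which scenario
    is treated as the base.
    """
    if x_base == x_alt:
        raise ValueError("input variable must differ between scenarios")
    pct_change_q = (q_alt - q_base) / ((q_alt + q_base) / 2.0)
    pct_change_x = (x_alt - x_base) / ((x_alt + x_base) / 2.0)
    return pct_change_q / pct_change_x


# Hypothetical sensitivity run: a fare increase from 2.00 to 2.50
# lowers modeled daily ridership from 10,000 to 9,000.
fare_elasticity = arc_elasticity(10000, 9000, 2.00, 2.50)
print(round(fare_elasticity, 3))
```

A value near zero would suggest the model barely responds to the input, which is the situation the paragraph above warns about when fixed factors dominate the model.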

4.4.2 Subarea Model Validation for Project Traffic Forecasting

Subarea transportation studies are becoming increasingly popular in addressing growth management issues at the local level, including Local Government Comprehensive Plans (LGCPs), master plans, subarea studies, proportionate share, and impact fees. Subarea transportation models often include splitting of the regional model TAZs, reevaluating base year and future year socioeconomic estimates, and adding roadways to the model network that are important for local traffic circulation, but not necessarily needed at the regional level.

Validation of the Regional Transportation Model should be completed and approved for use by the FDOT and the local MPO prior to developing a subarea model. Not all model validation checks required for LRTPs and FTA New Starts projects are needed at the subarea level as some of these would potentially be redundant. The subarea should be defined within the model by designating districts and sectors to summarize TAZ and network information for the subarea. Some statistics should be compared between the subarea and regional level to ensure the subarea model validation does not disrupt regional model accuracy should the subarea model be used later for other purposes. A sample of validation measures to compare between subarea and regional levels may include the following:

Input Data – A primary focus of validating a subarea model, including review of socioeconomic data and of highway and transit networks.
Trip Generation – Review and comparison of subarea against the regional model based on aggregate trip rates (e.g., trips/person, trips/DU, Home Based Work trips/employee).
Trip Distribution – Comparisons of average trip length and percent intrazonal trips by purpose.
Mode Choice – If the subarea includes transit access, mode shares within the subarea should be reviewed against local data or, where local data is not available, using professional judgment.
Trip Assignment – Highway validation checks on volume-over-count, VMT-over-count, VHT-over-count, screenline volume-over-count, RMSE, and percent error.

It may be desirable to add cutlines or modify screenlines to better assess trip patterns into, out of, and through the subarea. If the subarea has major freight generators, a review of percent trucks or truck VMT should also be conducted. If there is significant growth between the model base year and the existing year, it is recommended to use the existing year for subarea model validation.
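The trip distribution measures listed above can be computed directly from a trip table and a matching impedance matrix. The following is a minimal sketch under stated assumptions: `trips` and `minutes` are hypothetical square matrices (nested lists) indexed by origin and destination zone for a single trip purpose, and travel time is used as the trip length measure. The same function could be run once for the subarea zones and once for the full region to compare the two.

```python
def trip_distribution_summary(trips, minutes):
    """Return (average trip length, percent intrazonal trips).

    trips[i][j]   : trips from zone i to zone j for one purpose
    minutes[i][j] : travel time from zone i to zone j
    """
    n = len(trips)
    total = sum(sum(row) for row in trips)
    if total == 0:
        raise ValueError("trip table is empty")
    # Trip-weighted average of the impedance matrix.
    weighted = sum(trips[i][j] * minutes[i][j]
                   for i in range(n) for j in range(n))
    # Intrazonal trips are the diagonal of the trip table.
    intrazonal = sum(trips[i][i] for i in range(n))
    return weighted / total, 100.0 * intrazonal / total


# Hypothetical two-zone example.
trips = [[10, 30],
         [20, 40]]
minutes = [[2, 10],
           [10, 4]]
avg_length, pct_intrazonal = trip_distribution_summary(trips, minutes)
```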

4.4.3 Model Validation Standards in Florida

The accuracy of the Base Year Model is measured by the difference between the model’s outputs and existing conditions. There are many tests to determine the level of accuracy of a model, but for project-level travel forecasting purposes, the focus is on the quality of the traffic volumes produced by the model. The FSUTMS-Cube Framework Phase II – Model Calibration and Validation Standards establishes guidelines for model validation at both the regional and corridor levels. Two measures are often used to quantify the differences between model volumes and traffic counts. One is the Volume-Over-Count (V/C) Ratio, expressed as a decimal or a percent. V/C ratios can be summarized by area type, facility type, and number of lanes; by daily or peak periods; by screenlines, cutlines, and cordon lines; and using estimates based on Vehicle Miles Traveled (VMT) and Vehicle Hours Traveled (VHT) calculations.
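As a minimal sketch of these summaries, the example below computes volume-over-count ratios by group and an areawide VMT-over-count ratio. The link records and their field names (`ft`, `volume`, `count`, `miles`) are hypothetical and do not reflect any particular FSUTMS network format; only links with a traffic count enter the comparison.

```python
from collections import defaultdict


def volume_over_count(links, key="ft"):
    """Return {group: sum(volume) / sum(count)} over counted links."""
    vol = defaultdict(float)
    cnt = defaultdict(float)
    for link in links:
        if link["count"] > 0:  # skip links without a count
            vol[link[key]] += link["volume"]
            cnt[link[key]] += link["count"]
    return {group: vol[group] / cnt[group] for group in cnt}


def vmt_over_count(links):
    """Ratio of assigned VMT to count-based VMT on counted links."""
    counted = [l for l in links if l["count"] > 0]
    model_vmt = sum(l["volume"] * l["miles"] for l in counted)
    count_vmt = sum(l["count"] * l["miles"] for l in counted)
    return model_vmt / count_vmt


# Hypothetical links: two freeway segments and one collector.
links = [
    {"ft": "freeway", "volume": 52000, "count": 50000, "miles": 1.0},
    {"ft": "freeway", "volume": 47000, "count": 50000, "miles": 2.0},
    {"ft": "collector", "volume": 4400, "count": 4000, "miles": 0.5},
]
ratios = volume_over_count(links)
areawide_vmt_ratio = vmt_over_count(links)
```

In this example the two freeway links are individually 4 percent high and 6 percent low, yet the freeway group ratio is 0.99, which shows how errors of opposite sign offset each other in an aggregate ratio.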

The other measure used to quantify the difference between model volumes and traffic counts is the Root Mean Square Error (RMSE). RMSE is a measure of dispersion and tends to normalize model error better than volume-over-count ratios, in which high ratios can offset low ratios. The RMSE is often calculated as percent RMSE, that is, RMSE relative to the average traffic count. The formula for calculating %RMSE is as follows:

$$\%RMSE = \frac{\sqrt{\frac{\sum_{i = 1}^{n}(V_i - C_i)^2}{n-1}}}{\frac{\sum_{i=1}^{n}{C_i}}{n}}\times{100}$$

Equation 4-1

Where:

$V_i$ = model volume for roadway segment $i$
$C_i$ = traffic count for the same roadway segment $i$
$n$ = number of roadway segments with traffic counts
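Equation 4-1 can be sketched in a few lines of code. The function name `percent_rmse` and the sample values are hypothetical; the arithmetic follows the equation as written, including the $n-1$ divisor.

```python
import math


def percent_rmse(volumes, counts):
    """Percent RMSE per Equation 4-1.

    sqrt(sum((V_i - C_i)^2) / (n - 1)) divided by the mean count,
    expressed as a percent.
    """
    if len(volumes) != len(counts):
        raise ValueError("volumes and counts must have the same length")
    n = len(counts)
    if n < 2:
        raise ValueError("at least two counted segments are required")
    squared_error = sum((v - c) ** 2 for v, c in zip(volumes, counts))
    rmse = math.sqrt(squared_error / (n - 1))
    mean_count = sum(counts) / n
    return rmse / mean_count * 100.0


# Two hypothetical segments whose errors cancel in a volume-over-count ratio.
offsetting = percent_rmse([110, 90], [100, 100])
print(round(offsetting, 1))
```

For these two segments the aggregate volume-over-count ratio is exactly 1.00, while the percent RMSE is about 14 percent, because squared errors cannot offset one another.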

4.4.3.1 Regionwide Model Accuracy Assessment

4.4.3.1.1 Volume-Over-Count Ratios by Facility Types and Screenlines

Table 4-1 presents the acceptable and preferable V/C ratios expressed as percentages for regionwide model validations as recommended in the FSUTMS-Cube Framework Phase II Model Calibration and Validation Standards. Prior to using a travel demand forecasting model for project traffic analysis, it is important to verify that the model has been validated to meet the validation standards. The Highway Evaluation Report (HEVAL) module or similar routines are included in FSUTMS models to perform system evaluation activities and to assist in validating a model. The output includes information such as VMT, VHT, average travel speed, comparisons of model volumes with observed traffic counts, and summary statistics that can be used to evaluate the model validation results.

4.4.3.1.2 Percent RMSE by Volume Groups

Percent errors have historically reflected a “plus or minus one lane” criterion in Florida. This concept means that highway assignment should be accurate enough to minimize incorrect future lane calls resulting from projected traffic. Percent error standards are typically established by volume groups, with smaller percent errors allowed for high-volume groups and larger percent errors allowed for low-volume groups. Table 4-2 presents acceptable and preferable accuracy levels for eight (8) volume groups, as recommended in the FSUTMS-Cube Framework Phase II Model Calibration and Validation Standards. RMSE can also be summarized by screenlines, if needed. In addition, volume differences can be reviewed visually using scatter plots of model-estimated volumes versus counts.
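A minimal sketch of the volume group summary follows, using the acceptable and preferable limits from Table 4-2. Several details are assumptions rather than part of the standards: links are grouped by their traffic count (not by model volume), a value equal to a limit is treated as meeting it, groups with fewer than two counted links are skipped because Equation 4-1 divides by $n-1$, and all names are hypothetical.

```python
import math
from bisect import bisect_right

# (lower bound of range in vehicles per day, acceptable %RMSE, preferable %RMSE)
# Limits are taken from Table 4-2.
GROUPS = [
    (0, 100, 45),
    (5000, 45, 35),
    (10000, 35, 27),
    (15000, 30, 25),
    (20000, 27, 15),
    (30000, 25, 15),
    (50000, 20, 10),
    (60000, 19, 10),
]
LOWER_BOUNDS = [g[0] for g in GROUPS]


def percent_rmse(pairs):
    """Percent RMSE per Equation 4-1 for a list of (volume, count) pairs."""
    n = len(pairs)
    squared_error = sum((v - c) ** 2 for v, c in pairs)
    mean_count = sum(c for _, c in pairs) / n
    return math.sqrt(squared_error / (n - 1)) / mean_count * 100.0


def rmse_by_volume_group(pairs):
    """Return {range lower bound: (percent RMSE, rating)}."""
    buckets = {}
    for v, c in pairs:
        idx = bisect_right(LOWER_BOUNDS, c) - 1  # assumption: group by count
        buckets.setdefault(idx, []).append((v, c))
    report = {}
    for idx, members in sorted(buckets.items()):
        if len(members) < 2:
            continue  # Equation 4-1 is undefined for a single link
        lower, acceptable, preferable = GROUPS[idx]
        value = percent_rmse(members)
        if value <= preferable:
            rating = "preferable"
        elif value <= acceptable:
            rating = "acceptable"
        else:
            rating = "exceeds standard"
        report[lower] = (value, rating)
    return report


# Hypothetical (model volume, count) pairs.
pairs = [(3000, 2000), (3200, 4500), (62000, 60000), (69000, 70000), (7000, 6000)]
report = rmse_by_volume_group(pairs)
```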

Table 4-1 Regionwide Model Accuracy Assessment Volume-Over-Count Ratios

Volume-Over-Count Ratios	Acceptable Standard	Preferable Standard
Facility Type
Freeway Volume-over-Count (FT1x, FT8x, FT9x)	+/- 7%	+/- 6%
Divided Arterial Volume-over-Count (FT2x)	+/- 15%	+/- 10%
Undivided Arterial Volume-over-Count (FT3x)	+/- 15%	+/- 10%
Collector Volume-over-Count (FT4x)	+/- 25%	+/- 20%
One way/Frontage Road Volume-over-Count (FT6x)	+/- 25%	+/- 20%
Peak Period
Freeway Peak Volume-over-Count	75% of links @ +/- 20%	50% of links @ +/- 10%
Major Arterial Peak Volume-over-Count	75% of links @ +/- 30%	50% of links @ +/- 15%
VMT/VHT
Assigned VMT-over-Count Areawide	+/- 5%	+/- 2%
Assigned VHT-over-Count Areawide	+/- 5%	+/- 2%
Assigned VMT-over-Count by FT/AT/NL	+/- 25%	+/- 15%
Assigned VHT-over-Count by FT/AT/NL	+/- 25%	+/- 15%
Screenlines/Cutlines
External Model Cordon Lines	+/- 1%	-
Screenlines with greater than 70,000 AADT	+/- 10%	-
Screenlines with 35,000 to 70,000 AADT	+/- 15%	-
Screenlines with less than 35,000 AADT	+/- 20%	-

Source: FSUTMS-Cube Framework Phase II Model Calibration and Validation Standards, Table 2.9, “Volume-Over-Count Ratios and Percent Error”, and discussions on Page 2-19.
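The peak period rows of Table 4-1 are stated as a share of links within a tolerance rather than as an aggregate ratio. The sketch below shows one way such a check could be computed for freeway links. Reading each criterion as "at least this share of counted links" is an assumption, as are the function names and the sample data; the two criteria are reported separately because they use different tolerances.

```python
def share_within(links, tolerance):
    """Fraction of counted links with volume within +/- tolerance of the count.

    links     : list of (peak volume, peak count) pairs
    tolerance : allowed deviation as a decimal, e.g. 0.20 for +/- 20%
    """
    counted = [(v, c) for v, c in links if c > 0]
    if not counted:
        raise ValueError("no counted links")
    hits = sum(1 for v, c in counted if abs(v - c) <= tolerance * c)
    return hits / len(counted)


def freeway_peak_check(links):
    """Evaluate the freeway peak criteria from Table 4-1."""
    return {
        # Acceptable: 75% of links within +/- 20%
        "acceptable": share_within(links, 0.20) >= 0.75,
        # Preferable: 50% of links within +/- 10%
        "preferable": share_within(links, 0.10) >= 0.50,
    }


# Hypothetical peak period (volume, count) pairs on freeway links.
peak_links = [(1050, 1000), (1150, 1000), (1250, 1000), (900, 1000)]
peak_result = freeway_peak_check(peak_links)
```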

Table 4-2 Regionwide Model Accuracy Assessment Percent RMSE

Volume Range, Vehicles Per Day	Acceptable Standard	Preferable Standard
Less than 5,000	100%	45%
5,000-9,999	45%	35%
10,000-14,999	35%	27%
15,000-19,999	30%	25%
20,000-29,999	27%	15%
30,000-49,999	25%	15%
50,000-59,999	20%	10%
60,000+	19%	10%
Areawide	45%	35%

Source: FSUTMS-Cube Framework Phase II Model Calibration and Validation Standards, Table 2.11, "Root Mean Square Error (RMSE)", Page 2-21.

4.4.3.2 Project Level Model Accuracy Assessment

Project level model validation is typically focused on network details within the project Area of Influence (AOI). Many of the same validation checks for regional models still apply. Highway validation checks will require more stringent accuracy standards for volume-over-count ratios for various facilities and screenlines. Table 4-3 shows the link volume-over-count accuracy standards for validation by facility type within a project study area. These standards are based on the recommendations in the FSUTMS-Cube Framework Phase II Model Calibration and Validation Standards for corridor level validation. It is also recommended that the percent RMSE by volume groups be compared between the project/corridor and regional levels to ensure the Project Level Model Validation does not disrupt regional model accuracy.

Table 4-3 Project Level Model Accuracy Assessment V/C Ratios

Volume-Over-Count Ratios	Acceptable Standard	Preferable Standard
Facility Type
Freeway Volume-over-Count (FT1x, FT8x, FT9x)	+/- 6%	+/- 5%
Divided Arterial Volume-over-Count (FT2x)	+/- 10%	+/- 7%
Undivided Arterial Volume-over-Count (FT3x)	+/- 10%	+/- 7%
Collector Volume-over-Count (FT4x)	+/- 15%	+/- 10%
One way/Frontage Road Volume-over-Count (FT6x)	+/- 20%	+/- 15%
Screenlines/Cutlines
External Model Cordon Lines	+/- 0%	-
Screenlines with greater than 70,000 AADT	+/- 5%	-
Screenlines with 35,000 to 70,000 AADT	+/- 10%	-
Screenlines with less than 35,000 AADT	+/- 15%	-
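The difference between the regional and project level screenline standards can be illustrated with a short sketch. The tolerances come from Tables 4-1 and 4-3. Treating the sum of counts on the links crossing a screenline as its AADT, and treating a deviation equal to the limit as passing, are assumptions, as are the names used.

```python
# Allowed deviation of screenline volume from count, by AADT class.
# Regional values are from Table 4-1; project values are from Table 4-3.
SCREENLINE_TOLERANCE = {
    "regional": {"high": 0.10, "mid": 0.15, "low": 0.20},
    "project": {"high": 0.05, "mid": 0.10, "low": 0.15},
}


def aadt_class(total_count):
    """Classify a screenline by its total count (assumed to be AADT)."""
    if total_count > 70000:
        return "high"
    if total_count >= 35000:
        return "mid"
    return "low"


def screenline_passes(link_pairs, level):
    """Check one screenline against the acceptable standard.

    link_pairs : list of (model volume, count) for links crossing the line
    level      : "regional" or "project"
    """
    total_volume = sum(v for v, _ in link_pairs)
    total_count = sum(c for _, c in link_pairs)
    tolerance = SCREENLINE_TOLERANCE[level][aadt_class(total_count)]
    return abs(total_volume - total_count) <= tolerance * total_count


# Hypothetical screenline: 59,000 assigned versus 52,000 counted (about 13% high).
screenline = [(33000, 28000), (26000, 24000)]
regional_ok = screenline_passes(screenline, "regional")
project_ok = screenline_passes(screenline, "project")
```

In this example the same screenline meets the regional standard of +/- 15 percent but not the project level standard of +/- 10 percent, which is the kind of case that would call for further review within the project study area.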