EcoDataLab Logo

Consumption-Based Emissions Inventory Methodology

General Overview

The consumption-based emissions inventory (CBEI) is not a direct measurement of individual households' consumption or behavior. Instead, we use a model (a series of complex calculations) to estimate consumption of goods and services, and associated emissions. Our approach uses a combination of real-world consumption or emissions data where available, along with predictions based upon demographic, regional, and national averages.

This model is based upon an approach first developed by the CoolClimate Network at the University of California, Berkeley, and published extensively in multiple scientific journals. For a more technical description of this methodology, see Consumption-Based Greenhouse Gas Inventory of San Francisco from 1990 to 2015, Appendix 2: Detailed Methodology. For additional related research and references, see the full set of publications from our research partners at the CoolClimate Network at UC Berkeley.


Preparing a complete CBEI involves multiple sub-models, but each one follows the same general formula:

Model Preparation:

We select a nationwide survey, conducted by the US federal government, that focuses on an important element of the inventory. Presently, our US sub-models are built using the Consumer Expenditure Survey (CE), the National Household Travel Survey (NHTS), and the Residential Energy Consumption Survey (RECS).

These surveys are used to build the full suite of models. CE provides data for all categories of consumption except for gasoline and home energy use. NHTS provides data for the vehicle miles traveled model (which translates to gasoline usage), and RECS provides data for the home energy use models (including electricity, natural gas, and other heating fuels).

Next, we look at the household characteristics recorded in the survey and identify those for which nationwide data are also available from the US Census and other sources. These include variables like household size, income, and vehicle ownership. We also include geography, climate, and other relevant data where applicable.

Using the nationwide survey and selected household and geographic characteristics, we calculate how strongly each of those demographic variables correlates with each category of consumption in the survey results. This involves an advanced statistical technique called multiple linear regression, and it produces an equation that can take in each of those household characteristics as variables and generate an estimate of consumption for US households with those characteristics.

A single linear regression might take this form:

y = mx + b


where y is the survey result (dependent variable), x is a household or geographic characteristic (independent variable), m is the estimated change in y for each one-unit change in x (slope), and b is the fixed value that y takes when x is equal to 0 (intercept).

In multiple linear regression, the equation takes on a more complex form:

y = m1x1 + m2x2 + m3x3 + ... + b


where each x (x1, x2, x3, etc.) is a different household characteristic with its own coefficient (m1, m2, m3, etc.), and the terms add up to produce the overall result. The number of x variables depends on the survey and available data. Most EcoDataLab consumption models use at least six variables (x1 through x6), with some using a dozen or more to get the most accurate prediction possible.
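As an illustration only, the fitting step can be sketched in a few lines of Python. This is a minimal ordinary least squares example, not EcoDataLab's actual code: the function names (`solve_linear_system`, `fit_ols`, `predict`), the three variables, and all data values are hypothetical, and the survey results are generated from known coefficients so the fit can be checked by eye.

```python
# Minimal sketch of multiple linear regression (ordinary least squares).
# All names and data below are hypothetical and for illustration only.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations apply to both.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_ols(rows, y):
    """Return [m1, m2, ..., b] for y = m1*x1 + m2*x2 + ... + b."""
    design = [list(r) + [1.0] for r in rows]  # trailing 1.0 carries the intercept b
    k = len(design[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    """Apply the fitted equation to one household's characteristics."""
    *slopes, intercept = coef
    return sum(m * x for m, x in zip(slopes, row)) + intercept


# Hypothetical survey rows: (household size, income in $1000s, vehicles owned)
survey_x = [(1, 30, 0), (2, 55, 1), (3, 70, 2), (4, 90, 2), (2, 120, 1), (5, 60, 3)]
# Hypothetical annual spending in one category, built from known coefficients
survey_y = [500 * s + 20 * inc + 300 * v + 1000 for s, inc, v in survey_x]

coef = fit_ols(survey_x, survey_y)      # approximately [500, 20, 300, 1000]
estimate = predict(coef, (3, 80, 1))    # spending estimate for a new household
```

Because the made-up survey results contain no noise, the fit recovers the generating coefficients. Real survey data would leave residual error, which is why the methodology later compares model output against real-world data.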

In addition, many of the characteristics we are including do not scale linearly. Instead, the models often look more like this:

ln(y) = m1x1 + m2*ln(x2) + m3x3 + ... + b


where the survey result might actually be scaled as a natural log variable, and some of the demographic data is also calculated using its natural log. This is generally done in cases where there are nonlinear effects from demographic values, and smaller values have different implications than larger values. For example, a household of 2 is typically two adults, whereas a household of 3 typically includes a child, which can significantly change consumption patterns. Similarly, consumption patterns based on income change significantly once basic needs are met and "luxury goods" start being consumed.
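To show what the log form does in practice, here is a small Python sketch. The coefficients are hypothetical placeholders, not values from any EcoDataLab model, and `predict_consumption` is an illustrative name. The point is only that a coefficient on ln(income) makes consumption grow less than proportionally with income.

```python
import math

# Hypothetical coefficients for a model of the form
#   ln(y) = m1*x1 + m2*ln(x2) + b
# with x1 = household size and x2 = household income (not real model values).
M_SIZE = 0.10
M_LN_INCOME = 0.60
INTERCEPT = 2.0


def predict_consumption(household_size, income):
    """Evaluate the log-form equation, then undo the log on y."""
    ln_y = M_SIZE * household_size + M_LN_INCOME * math.log(income) + INTERCEPT
    return math.exp(ln_y)


base = predict_consumption(2, 50_000)
doubled = predict_consumption(2, 100_000)

# Doubling income multiplies predicted consumption by 2 ** M_LN_INCOME,
# which is about 1.52 here rather than 2.
ratio = doubled / base
```

With these placeholder values, a household with twice the income is predicted to consume roughly 1.5 times as much, which is the kind of nonlinear effect the log terms are meant to capture.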

Generating results:

With these multiple linear regression models built (see above), we then collect over 200 points of local data, including data from the US Census Bureau, National Oceanic and Atmospheric Administration, and Energy Information Administration. Those values are transformed to fit the required inputs to the model, and then the model is run with that local data as the independent (x) variables.

After calculating consumption using the models, we then calculate emissions. Most consumption emissions are calculated using the US EPA's USEEIO Model, which bridges the gap between consumption (dollars) and emissions (tons of CO2e). This model includes data on emissions by sector and supply chain stage, allowing us to differentiate between emissions associated with production, transport, wholesale, and retail, for all US emissions.

For electricity emissions, we use EPA's eGRID emission factors, detailed at the zip code level and then scaled to any geography. For all other direct consumption of fuels (natural gas / methane, gasoline, etc.), we use the latest IPCC GWP values and best available academic literature to estimate life-cycle emissions. This includes fugitive and non-CO2 GHG emissions, as well as any radiative forcing effects from other emissions (such as particulate matter or contrails).

When working with local jurisdictions, we always replace these national or grid average emission factors with the best available local data. We contact state agencies to procure detailed vehicle registration data, which we combine with US DOE fuel economy data to get the most granular and accurate estimate for fuel economy of local residents' vehicles. We work with local jurisdictions to identify local utilities and their geographic coverage, and their local emission factors for electricity, water & wastewater, or methane leakage rates.

While the multiple linear regression model helps us estimate consumption and related emissions, the model doesn't perfectly resemble reality. We adjust for these discrepancies by comparing the model's predicted results with real-world data wherever available, and scaling the model outputs accordingly where real-world data isn't available.

To achieve this, we compare the model results with the actual results for the most granular level of data available. This can be national-level data (in the case of surveys), state-level data (in the case of transportation), or locality-level data (in the case of energy or water consumption). For cases where real-world data is available at the geographic scale of interest, we use the real-world data; otherwise, we run the model at the same geographic level at which data is available and use that to create a scaling factor, which we use to correct the locally modeled data. For example, we compare modeled state-level energy use with real state-level energy data, and then use that scaling factor to adjust each census tract's modeled energy use. This scaling correction is usually on the order of 10%.
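The scaling correction described above amounts to a single ratio applied to every local estimate. The Python sketch below uses made-up state and census tract numbers, and the names `scaling_factor`, `tract_modeled`, and `tract_adjusted` are illustrative only.

```python
def scaling_factor(observed_total, modeled_total):
    """Ratio of real-world data to model output at the same geographic level."""
    return observed_total / modeled_total


# Hypothetical state-level electricity use (MWh per year)
state_observed = 9_000_000.0   # real-world state data
state_modeled = 10_000_000.0   # model run at the state level

factor = scaling_factor(state_observed, state_modeled)  # 0.9, a 10% correction

# Hypothetical modeled electricity use for two census tracts (MWh per year)
tract_modeled = {"tract_a": 12_000.0, "tract_b": 8_500.0}

# Apply the state-level correction to each tract's modeled value
tract_adjusted = {tract: value * factor for tract, value in tract_modeled.items()}
```

In this example the model overestimates state energy use by about 10%, so each tract's modeled value is scaled down by the same proportion.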

Model Input Variables

The consumption models use the following six variables: household size, average income, vehicle ownership, home ownership, share of household respondents with a bachelor’s degree or higher (educational attainment), and number of rooms (home size).


The vehicle miles traveled model uses household size, average income, vehicle ownership, home ownership, and educational attainment, along with commute time to work, drive alone to work, number of homes per square mile, number of employed people per square mile, employed people per household, family status, children per household, youth per household, adults per household, and Census region.


The home energy models use household size, average income, home ownership, and home size as well as detached home status, heating and cooling degree days, statewide average price of electricity, statewide average price of natural gas, and census division.


Technical Details


The Consumer Expenditure Survey (CE) is the only annual national survey of household consumption in the United States. Within the CE, there are a total of 95 categories and subcategories of expenditures covering everything US households consume, including detailed breakdowns of food, utilities, home construction, transportation, and household goods and services.


We start with the CE as the initial basis for our consumption models across all categories of expenditures. Because the smaller sub-categories have more uncertainty and error associated with them, we generally develop our models at either first- or second-tier category level across the CE dataset. After running the models at the local level, we normalize local consumption estimates to national data by using a scaling factor based upon the ratio of national modeled results to real-world national survey results, across each category of consumption.


We then map CE expenditure categories to Personal Consumption Expenditures (PCE) developed by the Bureau of Economic Analysis (BEA). Each PCE maps to one or more sectors of the US economy, and each sector has associated full supply chain emissions available through the US EPA’s USEEIO model. We use BEA’s PCE Bridge Tables for 2012 to assign emissions to cradle-to-gate, transportation to market, and trade. We then create custom emission factors (grams CO2e per dollar of CE expenditure) based on our detailed mapping of sectors, PCE and CE categories. This table converts average US household expenditures to total US emissions, broken down by each CE category and in total.
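One way to picture a custom emission factor is as a weighted average of sector-level factors, where the weights are the share of each CE dollar that flows to each sector. The Python sketch below assumes that form. The sector names, shares, and factors are invented for illustration and are not taken from USEEIO or the PCE Bridge Tables.

```python
# Hypothetical share of each dollar in one CE category that reaches each sector
sector_shares = {
    "farming": 0.35,
    "food_manufacturing": 0.40,
    "truck_transport": 0.10,
    "retail": 0.15,
}

# Hypothetical supply chain emission factors (grams CO2e per dollar of sector output)
sector_factors = {
    "farming": 1200.0,
    "food_manufacturing": 600.0,
    "truck_transport": 900.0,
    "retail": 200.0,
}


def category_emission_factor(shares, factors):
    """Weighted average of sector factors, in grams CO2e per CE dollar."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("sector shares must sum to 1")
    return sum(share * factors[sector] for sector, share in shares.items())


ce_factor = category_emission_factor(sector_shares, sector_factors)  # 780 g/$

# Emissions for a hypothetical household spending $5,000 in this category
household_spending = 5_000.0
household_tonnes = household_spending * ce_factor / 1_000_000  # grams to metric tons
```

With these placeholder inputs, the category factor is 780 grams CO2e per dollar, so $5,000 of spending corresponds to 3.9 metric tons CO2e.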


These custom emission factors are then increased to account for fixed capital investments (buildings and infrastructure). Emissions from fixed capital are attributed to each sector based upon that sector's economic weight. This results in a new, final emission factor (grams CO2e per dollar of CE expenditure) that accounts for all lifecycle emissions associated with that category of expenditure.


However, these lifecycle emission factors based upon USEEIO data are only available for the year 2012. To calculate emissions in other years, we adjust them backwards and forwards in time as needed using an average decarbonization rate (assumed 1%). Prior to calculating emissions, we also normalize all modeled and real-world household expenditures to 2012 US dollars using the category-specific Consumer Price Index (CPI) for each category.
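The two adjustments in this paragraph can be sketched as follows. The text gives the 1% rate but not how it is applied, so this example assumes a compounding annual rate, which is an assumption of the sketch. The function names, CPI values, and spending figure are also hypothetical.

```python
BASE_YEAR = 2012
DECARBONIZATION_RATE = 0.01  # 1% from the text; annual compounding is assumed here


def emission_factor_for_year(factor_2012, year, rate=DECARBONIZATION_RATE):
    """Shift a 2012 emission factor forwards or backwards in time."""
    return factor_2012 * (1 - rate) ** (year - BASE_YEAR)


def to_2012_dollars(amount, cpi_in_year, cpi_in_2012):
    """Deflate spending to 2012 dollars using a category-specific CPI."""
    return amount * cpi_in_2012 / cpi_in_year


# Hypothetical inputs for one category of expenditure
factor_2012 = 500.0     # grams CO2e per 2012 dollar
spending_2013 = 1_010.0  # dollars spent in 2013
cpi_2012, cpi_2013 = 100.0, 101.0

factor_2013 = emission_factor_for_year(factor_2012, 2013)            # 495 g/$
real_spending = to_2012_dollars(spending_2013, cpi_2013, cpi_2012)   # 1,000 (2012 $)
emissions_grams = real_spending * factor_2013                        # 495,000 g CO2e
```

Under this assumption, years after 2012 get a slightly lower factor and years before 2012 a slightly higher one, while spending is always expressed in 2012 dollars before the factor is applied.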


While our models started with the CE, we can achieve greater accuracy in calculating emissions by using other household surveys for specific sub-categories: namely, by using the National Household Travel Survey (NHTS) to model household vehicle miles traveled (VMT), and by using the Residential Energy Consumption Survey (RECS) to model household energy usage. These models are the most robust models we could construct using recent and relevant data, and in many cases are a very strong fit. For instance, at the state level, our electricity and natural gas models have goodness-of-fit R² values of about 0.87 and 0.72, respectively, meaning they explain about 87% and 72% of the variation in household energy use for their respective categories of energy. When comparing with specific city and county-level data, we typically find that these modeled results are within ~10% of the real-world data, providing sufficient accuracy for historical back-casting and local tract-level estimates of variation.


In preparing our inventories, we directly replace CE-modeled estimates of expenditures on gasoline, electricity, natural gas, and other fuels with results from these other sub-models. With these models, we apply direct and indirect (well-to-pump) emission factors for both fossil fuels and electricity consumed directly by households.


Gasoline emissions are based on US national average vehicle fuel economy data from the Department of Transportation in locations for which we have not yet collected local vehicle registration data (as of 2022, this is all locations outside the state of Washington). For locations with vehicle registration data, we match vehicle make, model, and year to FuelEconomy.gov data on vehicle emission factors to prepare local estimates of fuel economy. We use a population-based weighting to convert from zip-code level data to other geographies.
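As a rough sketch of how registration data might turn into a gasoline emissions estimate, the Python example below combines vehicle counts and fuel economy into a fleet average, then converts modeled VMT into gallons and emissions. It assumes every vehicle is driven the same distance, so the fleet average is a registration-weighted harmonic mean. That choice, the function name, the registration counts, and the per-gallon factor are all placeholders rather than the method or values used in the inventory.

```python
def fleet_fuel_economy(registrations):
    """Registration-weighted harmonic mean mpg, assuming equal miles per vehicle.

    registrations: list of (vehicle_count, miles_per_gallon) pairs.
    """
    total_vehicles = sum(count for count, _ in registrations)
    # Averaging gallons per mile (not miles per gallon) keeps fuel use consistent.
    gallons_per_mile = sum(count / mpg for count, mpg in registrations) / total_vehicles
    return 1.0 / gallons_per_mile


# Hypothetical local fleet: 100 vehicles at 20 mpg and 100 vehicles at 30 mpg
registrations = [(100, 20.0), (100, 30.0)]
fleet_mpg = fleet_fuel_economy(registrations)  # 24 mpg, not the simple mean of 25

# Hypothetical modeled annual VMT for one household
household_vmt = 12_000.0
gallons = household_vmt / fleet_mpg  # 500 gallons

# Placeholder life-cycle factor (kg CO2e per gallon); not a sourced value
KG_CO2E_PER_GALLON = 10.0
gasoline_kg_co2e = gallons * KG_CO2E_PER_GALLON
```

The harmonic mean matters because fuel use scales with gallons per mile: averaging mpg directly would overstate fleet efficiency whenever the fleet mixes efficient and inefficient vehicles.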


Electricity emission factors are based on US EPA eGRID region emission factors for each zip code in the country, and scaled to other geographies based on population.
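Scaling zip-code values to another geography based on population can be read as a population-weighted average. The Python sketch below shows that calculation with invented zip codes, populations, and emission factors. The function name is illustrative, and the exact weighting used in the inventory may differ.

```python
def population_weighted_average(zip_values, zip_population):
    """Combine zip-code level values into one value for a larger geography."""
    total_population = sum(zip_population[z] for z in zip_values)
    weighted_sum = sum(zip_values[z] * zip_population[z] for z in zip_values)
    return weighted_sum / total_population


# Hypothetical electricity emission factors (grams CO2e per kWh) by zip code
zip_emission_factors = {"zip_a": 400.0, "zip_b": 600.0}

# Hypothetical resident populations of the same zip codes within one city
zip_population = {"zip_a": 3_000, "zip_b": 1_000}

city_factor = population_weighted_average(zip_emission_factors, zip_population)
# (400 * 3000 + 600 * 1000) / 4000 = 450 grams CO2e per kWh
```

The larger zip code pulls the city value toward its own factor, so the result is 450 rather than the unweighted midpoint of 500.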


Because of our combination of local characteristics to inform regression modeling and scaling based on real-world national data to capture general trends, this methodology allows us to consistently track changes in the quantity of household consumption over time, and to estimate the impact of consumption on emissions using best-available sources.


As reported in the San Francisco CBEI from 1990 to 2015 (Jones 2020), this consumption-based approach accounts for essentially all GHG emissions in the US economy, allocated to households and government. Figure 7 in that report shows that the CBEI agrees closely with the traditional inventory (within 10%). One limitation of this approach is that we currently assume imports are produced with the same carbon intensity as domestic production; future work will likely include a proprietary multi-regional input-output (MRIO) model (such as Eora or Exiobase3) to account for the carbon intensity of imports. MRIO models allow for more granular analysis of trade between geographic regions, including between US counties and with other countries.


Limitations


Unlike other CBEI approaches, this model approach allows for some ability to see the effect of policy and to track changes over time. The current approach offers this improved tracking by including more policy-relevant variables, including home size, household size, home ownership, education, income, population density, and vehicle ownership.

However, local changes in policy, behavior, infrastructure, and technology which might affect consumption or emissions in ways beyond the model variables are not included in the current approach. If a local policy changed consumption patterns or the carbon intensity of products or services consumed, we would not be able to monitor this with the current methodology. Additional data could supplement the approach in future studies.

The current approach does not include an estimate of total error. Ideally, each estimate of consumption and emissions would include uncertainty bounds and analysis of error. Potential sources of error include reporting error in household survey data, sampling error, model error, categorization error, and other errors typically associated with input-output models (in this case, the USEEIO). Most of these errors are known and could be propagated through formulas in the study in future research.

We also assume the carbon intensity of imported goods to be the same as that of domestically produced goods. The current model is unable to track the countries of origin of emissions associated with local consumption. This assumption may affect individual products, such as computers, but is unlikely to have a large impact overall since the United States has a large, fairly carbon-intensive production system, with considerable electricity production from coal, similar to many exporting countries. Future studies could incorporate a multi-regional input-output model to provide better data on the effect of international supply chains on consumption-based emissions.

Lastly, we also assume that price corresponds with “value added” economic activity. If residents of an area purchase higher priced goods, then the methodology will linearly scale emissions up with prices. This scaling is appropriate if higher prices are the result of additional economic activity, such as importing products from abroad, but is problematic when prices are artificially raised, such as for branding purposes. Conversely, cheaper products will result in lower emissions in the model. Generally, we assume that price differences average out over thousands of households.


CBEI and CE Categories and Sub-Categories


The table below shows the relationships between the listed CBEI categories, sub-categories, and corresponding Consumer Expenditure Survey categories from the Bureau of Labor Statistics:

| Category | Sub-Category | Corresponding CE Sub-Category(s) |
|---|---|---|
| Transportation | Gasoline* | Gasoline and Motor Oil |
| Transportation | Vehicle Purchases | Vehicle Purchases (net outlay) |
| Transportation | Other Vehicle Expenses | Other vehicle expenses |
| Transportation | Air Travel** | Public and other transportation |
| Housing | Shelter | Shelter, less Other Lodging |
| Housing | Natural gas* | Natural gas |
| Housing | Electricity* | Electricity |
| Housing | Other Lodging | Other Lodging |
| Housing | Other Heating Fuels* | Fuel oil and other fuels |
| Food | Eating Out | Food away from home |
| Food | Meats, Poultry, Fish, & Eggs | Meats, poultry, fish, and eggs |
| Food | Other Food | Other food at home |
| Food | Dairy | Dairy products |
| Food | Alcoholic Beverages | Alcoholic Beverages |
| Food | Fruits & Vegetables | Fruits and vegetables |
| Food | Cereals & Bakery Products | Cereals and bakery products |
| Goods | Furnishings & Appliances | Household furnishings and equipment |
| Goods | Apparel | Apparel and services |
| Goods | Housekeeping Supplies | Housekeeping supplies |
| Goods | Personal Care Products | Personal care products and services |
| Goods | Entertainment Goods | Audio and visual equipment and services |
| Goods | Misc Goods | Reading, Tobacco and smoking supplies |
| Services | Healthcare | Healthcare |
| Services | Entertainment Services | Fees and admissions |
| Services | Education | Education |
| Services | Misc Services | Miscellaneous, Personal Care Services, Household Operations, Cash contributions |
| Services | Insurance & Pensions | Personal insurance and pensions |

*Not calculated using a CE-derived model. Either uses real-world data, or modeled estimates based on NHTS or RECS data.

**Air travel expenditures make up the bulk of the CE "Public and other transportation" category. Government-operated public transportation emissions are attributed separately (see section below), and so only air travel remains. Air travel is modeled exclusively based upon household income, as no other household characteristics were statistically significant.


Other Consumption-Based Emissions


Government

In the consumption-based inventory, government agencies are considered final demand the same way households are, and so government emissions are not attributed directly to households. These emissions are not insignificant – based on GDP data and the same USEEIO emission factors discussed above, federal, state, and local governments across the US had emissions totaling over 660 million MTCO2e. Of this total, roughly 69% came from state & local governments, with the remaining 31% from the federal government split between defense (24%) and non-defense sectors (7%). Like households, government emissions include transportation, buildings, food, and procurement of goods & services.

Government consumption, and associated emissions, are not linked to particular household characteristics or activities in readily traceable ways. While some government activities can be linked to certain households – such as direct cash transfers for unemployment insurance or social security, and healthcare coverage through Medicare, Medicaid, or veterans' benefits – other government activities, like infrastructure construction and maintenance, national defense and public safety (police & fire), R&D spending, and parks maintenance, cannot be attributed directly to households based on any discernible characteristics.

As a result, these emissions can only be effectively allocated to households on a flat average basis. If these emissions were allocated to households, it would be an average of 5.5 MTCO2e per household. These “hidden” emissions are not otherwise captured in the consumption-based emissions inventory, but still contribute to overall emissions nationally and globally.
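The figures in this section can be checked with simple arithmetic. The Python sketch below treats "over 660 million" as exactly 660 million for illustration. The household count is not stated in the text, so it is back-calculated from the two published numbers and should be read as an implied value only.

```python
TOTAL_GOV_EMISSIONS = 660e6        # MTCO2e; the text says "over 660 million"
PER_HOUSEHOLD_ALLOCATION = 5.5     # MTCO2e per household, from the text

# Shares of government emissions by level, from the text
shares = {
    "state_and_local": 0.69,
    "federal_defense": 0.24,
    "federal_non_defense": 0.07,
}

# Emissions attributed to each level of government (MTCO2e)
emissions_by_level = {level: TOTAL_GOV_EMISSIONS * share for level, share in shares.items()}
# state_and_local: about 455 million; federal_defense: about 158 million;
# federal_non_defense: about 46 million

# Number of households implied by a flat 5.5 MTCO2e allocation (not from the source)
implied_households = TOTAL_GOV_EMISSIONS / PER_HOUSEHOLD_ALLOCATION  # 120 million
```

The three shares sum to 100%, and a flat allocation of 5.5 MTCO2e per household is consistent with roughly 120 million US households.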