Concentration Analysis between EPA’s Continuous and Gravimetric PM2.5 Monitors
Project Overview
In the U.S. EPA national PM2.5 monitoring program, the use of Federal Equivalent Method (FEM) continuous monitors continues to increase relative to Federal Reference Method (FRM) gravimetric monitors. There is a known bias in 24-hour ambient PM2.5 concentrations between these two methods. Monitoring agencies need to understand these biases and their causes, and to determine whether continuous PM2.5 monitoring is appropriate for their network. Using historical concentration data from 2016 to 2020, this project investigates these biases, identifies the contributing factors, and attempts to predict FRM concentrations from collocated FEM concentrations and those factors.
Monitoring and concentration data were extracted from the EPA’s Air Quality System (AQS) through an API (https://aqs.epa.gov/aqsweb/documents/data_api.html). The two critical datasets downloaded were:
- Daily Summary data: Returns data summarized at the daily level. All daily summaries are calculated on a midnight to midnight basis in local time. Variables returned include the date, mean value, maximum value, etc. Data are at the monitor level and may include more than one entry per monitor. There may be multiple entries for different (1) sample durations and (2) pollutant standards.
- Monitor data: Returns operational information about the samplers (monitors) used to collect the data. Includes identifying information, operational dates, operating organizations, etc.
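As a rough illustration of how the two downloads can be requested, the sketch below only builds request URLs and does not call the network. The base URL, the service names (dailyData/byState, monitors/byState), and the query parameter names are assumptions taken from the public AQS API documentation. The email, key, and state code are placeholders, and 88101 is the AQS parameter code for PM2.5 at local conditions. One request per calendar year is built on the assumption that a single request may not span multiple years.

```python
from urllib.parse import urlencode

# Assumed base URL, per the AQS API documentation linked above.
AQS_BASE = "https://aqs.epa.gov/data/api"


def build_aqs_url(service, email, key, **params):
    """Return a request URL for an AQS service such as 'dailyData/byState'."""
    query = {"email": email, "key": key, **params}
    return f"{AQS_BASE}/{service}?{urlencode(query)}"


def yearly_requests(service, email, key, first_year, last_year, **params):
    """Build one URL per calendar year (assumes a request cannot span years)."""
    urls = []
    for year in range(first_year, last_year + 1):
        urls.append(
            build_aqs_url(
                service, email, key,
                bdate=f"{year}0101", edate=f"{year}1231", **params,
            )
        )
    return urls


# Placeholder credentials and state FIPS code, for illustration only.
daily_urls = yearly_requests(
    "dailyData/byState", "user@example.com", "YOUR_KEY",
    2016, 2020, param="88101", state="37",
)
monitor_urls = yearly_requests(
    "monitors/byState", "user@example.com", "YOUR_KEY",
    2016, 2020, param="88101", state="37",
)
```

Each URL can then be fetched with any HTTP client, and the JSON responses concatenated across years and states.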
To distinguish between the various PM2.5 monitor types (makes/models), EPA assigns each a unique ‘method code’ (a three-digit value), which is included in the monitor dataset described above. EPA has designated various monitors as both FRM and FEM (40 CFR Part 53). To determine which method codes are designated as FRM/FEM, the EPA method code description dataset can be downloaded from EPA’s website at: https://aqs.epa.gov/aqsweb/documents/codetables/methods_all.html.
The last dataset wrangled included meteorological conditions for each site/date to assess their influence on the concentration bias. Given a monitoring site’s latitude, longitude, elevation, and the sampling date, the local weather conditions (temperature, pressure, wind speed/direction, rain, etc.) were imported through the Meteostat Python library (https://dev.meteostat.net/python/).
Problem Statement
Extracting national U.S. concentration and monitoring site data for 2016–2020 from the EPA’s AQS API (https://aqs.epa.gov/aqsweb/documents/data_api.html), we found a total of 116,658 FRM and FEM monitor pairs with concentration data available from the same monitoring site and date; each such pair is herein referred to as a collocation. To determine bias, we calculate a percent difference (PD) for each collocation, relative to the FRM concentration, as:

$$\mathrm{PD} = \frac{C_{\mathrm{FEM}} - C_{\mathrm{FRM}}}{C_{\mathrm{FRM}}} \times 100\%$$
By averaging the PDs over various time periods, geographic regions, etc., a bias estimate can be produced over said aggregation.
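A minimal sketch of this aggregation is shown below. It assumes PD is taken relative to the FRM value, (FEM − FRM) / FRM × 100, which matches the sign convention used in this write-up (positive when FEM exceeds FRM). The record layout and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def percent_difference(fem, frm):
    """Percent difference of a collocated pair, relative to the FRM value."""
    return 100.0 * (fem - frm) / frm


def bias_by_group(pairs, key):
    """Average the PDs of all pairs that share the same value of `key`."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[key]].append(percent_difference(pair["fem"], pair["frm"]))
    return {group: mean(pds) for group, pds in groups.items()}


# Illustrative collocations (concentrations in µg/m3), not real data.
pairs = [
    {"year": 2016, "fem": 11.0, "frm": 10.0},
    {"year": 2016, "fem": 6.0, "frm": 5.0},
    {"year": 2017, "fem": 9.0, "frm": 10.0},
]
bias_per_year = bias_by_group(pairs, "year")
```

The same function gives a monthly or regional bias estimate by passing a different key.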
The objective of this project is to investigate what contributing factors influence the concentration bias and by how much. Given that FEM concentration results are available in near real-time and the FRM results typically take multiple weeks to be determined, a linear regression model is developed through machine learning to predict FRM concentrations given the FEM concentration and the other contributing factors. Therefore, this model will give us an opportunity to estimate the FRM concentrations prior to the results being made available.
Metrics
An R2 regression score will be used to measure the performance of the model, which is an appropriate score for a linear model. To set a baseline for how well our model predicts, we first simply assume that the FRM concentration equals the FEM concentration and derive an R2 regression score. A model with an R2 higher than the baseline indicates an improvement in prediction capability in which the contributing factors are useful in predicting FRM concentrations.
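The baseline can be sketched as follows: the FEM value is used directly as the prediction of the FRM value and scored with the usual coefficient of determination, R2 = 1 − SS_res / SS_tot. The concentrations here are made-up values for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative collocated concentrations (µg/m3), not real data.
frm = [4.0, 6.0, 8.0, 10.0, 12.0]
fem = [5.0, 7.0, 8.5, 10.0, 12.5]

# Baseline: predict FRM = FEM, with no model fitted.
baseline_r2 = r2_score(frm, fem)
```

Because nothing is fitted, this baseline R2 can be negative when the FEM values are a poor stand-in for the FRM values.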
Data Exploration and Visualization
The overall bias estimate from the entire collocated dataset is +12.3%, indicating that the measured FEM concentrations are on average ~12% higher than their collocated FRM counterparts. The boxplot below illustrates the PD statistics aggregated per year, which shows the consistency in FEM’s concentrations being larger than FRM’s.

We investigated various potential contributing factors to the concentration bias, including:
- Concentration level, Monitor Make/Model, Monitor Probe Height Difference, Geographical location, Time of Year, Meteorological Conditions, among others.
To consider concentration level impacts on the bias, we present below the PD statistics for all sample pairs within non-overlapping FRM concentration categories (bins). Each bin represents a quintile of the total distribution of observed FRM sample concentrations. In other words, of the 116,658 sample pairs, in 20% of these pairings, the FRM concentration was ≤ 3.8 µg/m3, another 20% were greater than 3.8 µg/m3 and less than or equal to 5.6 µg/m3, and so on. The boxplot shows a clear relationship between bias and concentration level. At higher concentration levels, there is much better agreement between the FRM and FEM monitors. At lower concentrations, the positive bias increases. The variability also increases as ambient concentrations decrease, as indicated by the larger interquartile range and spread between whiskers.
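The quintile binning can be sketched as below. The cut points come from Python's statistics.quantiles, whose default interpolation may differ slightly from whatever was used for the figures here, so this is an approximation of the approach rather than a reproduction. PD is assumed to be relative to the FRM value, and the sample pairs are synthetic.

```python
from bisect import bisect_left
from statistics import mean, quantiles


def bias_by_quintile(pairs):
    """Group (fem, frm) pairs into FRM quintiles and average the PD per bin.

    A value equal to a cut point falls in the lower bin (the "<=" side).
    """
    edges = quantiles([frm for _, frm in pairs], n=5)  # four cut points
    bins = [[] for _ in range(5)]
    for fem, frm in pairs:
        bins[bisect_left(edges, frm)].append(100.0 * (fem - frm) / frm)
    return edges, [mean(pds) if pds else None for pds in bins]


# Synthetic pairs with a constant +1 µg/m3 FEM offset, so the percent
# difference shrinks as the FRM concentration grows.
pairs = [(frm + 1.0, frm) for frm in (2.0, 4.0, 6.0, 8.0, 10.0,
                                      12.0, 14.0, 16.0, 18.0, 20.0)]
edges, bias_per_bin = bias_by_quintile(pairs)
```

With a constant absolute offset, the relative bias is largest in the lowest bin, which is one simple mechanism that would produce the pattern described above.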

Our dataset consists of 40 unique method code combinations between FRM and FEM monitors. The bar plot below shows the bias estimate ± 95% confidence intervals for only those method code combinations with at least 500 pairings. Results indicate that bias varies greatly by monitor type. While the overall bias is positive (FEM > FRM), that isn’t the case for every make/model combination. Of the 40 unique combinations, 10 exhibited a negative bias, and the bias estimates ranged from -47% to +257%.
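One way to obtain a bias estimate with a 95% confidence interval per method code combination is sketched below. It uses a normal approximation (mean ± 1.96 standard errors), which is an assumption on our part and not necessarily how the plotted intervals were computed. The combination labels and PD values are placeholders.

```python
from math import sqrt
from statistics import mean, stdev


def bias_with_ci(pds, z=1.96):
    """Return (mean PD, half-width of the normal-approximation 95% CI)."""
    half_width = z * stdev(pds) / sqrt(len(pds))
    return mean(pds), half_width


def summarize(pd_by_combo, min_pairs=500):
    """Keep only combinations with at least `min_pairs` collocations."""
    return {
        combo: bias_with_ci(pds)
        for combo, pds in pd_by_combo.items()
        if len(pds) >= min_pairs
    }


# Placeholder data; min_pairs is lowered so the toy example keeps a group.
pd_by_combo = {"A": [8.0, 10.0, 12.0, 14.0], "B": [1.0, 2.0]}
summary = summarize(pd_by_combo, min_pairs=3)
```

The minimum-pairings filter matters because the interval becomes very wide, and the normal approximation unreliable, for combinations with few collocations.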

We next assess whether a temporal trend exists in the bias estimates. The time series below shows the bias estimate per month, which clearly indicates a seasonal pattern in the concentration differences. Bias is highest in the spring, peaking in April. The bias then decreases into the summer months, with the FRM and FEM concentrations being most similar in July and August. This can generally be explained by the differences in concentration levels from season to season. PM2.5 concentrations are higher in the summer than the winter. Recall from the concentration-bin boxplot above that bias decreases as concentrations increase. Therefore, in the summer, when concentrations are high, we observe better agreement between the FRM and FEM monitors.

Of the meteorological data examined, both temperature and pressure showed a strong correlation with bias. The 2D heat map below shows that the influence of temperature on bias is greater than that of pressure, but both parameters have an impact. Bias clearly decreases with increasing temperature regardless of pressure. The bias appears to decrease more sharply with increasing temperature when the pressure is low than when it is high. In other words, the largest bias occurs when both the pressure and the temperature are low, and the smallest bias occurs when the pressure is low and the temperature is high.
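The values behind such a heat map can be sketched by assigning each collocation to a temperature/pressure cell and averaging the PD within each cell. The cell widths, the units (°C and hPa), and the sample values below are illustrative assumptions, not the settings used for the figure.

```python
from collections import defaultdict
from statistics import mean


def grid_bias(samples, temp_step=10.0, pres_step=20.0):
    """Average PD per (temperature, pressure) cell.

    Each cell is labeled by its lower edge, e.g. (0.0, 1000.0) covers
    0 to 10 degrees and 1000 to 1020 pressure units.
    """
    cells = defaultdict(list)
    for temp, pres, pd in samples:
        key = (temp // temp_step * temp_step, pres // pres_step * pres_step)
        cells[key].append(pd)
    return {key: mean(pds) for key, pds in cells.items()}


# (temperature, pressure, PD) triples; made-up values for illustration.
samples = [
    (5.0, 1013.0, 20.0),
    (7.0, 1005.0, 10.0),
    (25.0, 1013.0, 4.0),
]
heat = grid_bias(samples)
```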

Data Preprocessing
A total of 1,464,416 daily concentration data records were downloaded from the API. Monitoring site and method code data were merged with the concentration data. Individual concentration data that were nullified through EPA’s quality control method were then dropped from the dataset.
The dataset, which contained one row per individual concentration value, was then pivoted so that concentration data from the same site and date were paired on the same row. Multiple collocations could exist at a given site/date, depending on how many monitors were active at the site. All unique collocations were then determined, separate data columns were created for each monitor in a pairing, and the dataset was filtered to include only collocations between an individual FRM concentration and an individual FEM concentration. A total of 116,712 FRM vs FEM collocations were included in the dataset for 2016–2020. Meteorological data were then extracted for each collocation site and merged with the concentration dataset.
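The pairing step can be sketched in plain Python as follows: records are grouped by site and date, and every FRM record in a group is paired with every FEM record in the same group. The field names and sample records are hypothetical and are not the AQS column names.

```python
from collections import defaultdict
from itertools import product


def collocate(records):
    """Pair every FRM record with every FEM record from the same site and date."""
    by_site_date = defaultdict(lambda: {"FRM": [], "FEM": []})
    for rec in records:
        by_site_date[(rec["site"], rec["date"])][rec["type"]].append(rec)

    pairs = []
    for (site, date), group in by_site_date.items():
        for frm, fem in product(group["FRM"], group["FEM"]):
            pairs.append({
                "site": site,
                "date": date,
                "frm_method": frm["method"],
                "fem_method": fem["method"],
                "frm": frm["conc"],
                "fem": fem["conc"],
            })
    return pairs


# Hypothetical records: site S1 has two FRM monitors and one FEM monitor,
# site S2 has only an FEM monitor and therefore yields no collocation.
records = [
    {"site": "S1", "date": "2016-01-01", "type": "FRM", "method": "118", "conc": 8.0},
    {"site": "S1", "date": "2016-01-01", "type": "FRM", "method": "145", "conc": 8.4},
    {"site": "S1", "date": "2016-01-01", "type": "FEM", "method": "170", "conc": 9.1},
    {"site": "S2", "date": "2016-01-01", "type": "FEM", "method": "170", "conc": 5.0},
]
collocations = collocate(records)
```

A site with several monitors of each type contributes one row per FRM/FEM combination, which is why the number of collocations can exceed the number of site/date combinations.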
To prepare for the model evaluation, any row with a NaN value was dropped from the dataset. This reduced the dataset to 62,827 rows. For categorical columns (e.g., method code) used in the regression model, dummy variables of zeros and ones were created for each unique category. The first dummy variable of each categorical column was dropped from the dataset, as it is implied by the others.
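These two preparation steps can be sketched without any libraries as shown below. The row layout and column names are hypothetical. Passing an explicit category list is our suggestion, not part of the original workflow: it fixes the set of dummy columns up front so that they stay identical however the rows are later split.

```python
import math


def drop_incomplete(rows):
    """Drop any row that contains None or NaN in any column."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


def one_hot_drop_first(rows, column, categories=None):
    """Replace `column` with 0/1 dummy columns, omitting the first category."""
    if categories is None:
        categories = sorted({row[column] for row in rows})
    kept = list(categories)[1:]  # the first category is implied by all zeros
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in kept:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows; the second one is dropped because of the NaN.
rows = [
    {"fem": 9.1, "temp": 3.0, "method": "170"},
    {"fem": 7.2, "temp": float("nan"), "method": "183"},
    {"fem": 6.5, "temp": 12.0, "method": "183"},
    {"fem": 4.8, "temp": 20.0, "method": "209"},
]
clean = drop_incomplete(rows)
encoded = one_hot_drop_first(clean, "method")
```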
Implementation and Refinement
We next implement a machine learning model to predict the FRM concentrations provided the FEM concentrations and other factors. Using Python’s Scikit-learn package, we employ a linear regression model on the dataset. We utilize various predictor variables (those described above), split the dataset into separate training (70%) and testing (30%) datasets, and fit the model to the training data. The final parameter settings for the model were as follows:
- fit_intercept = True (Y-intercept was calculated),
- normalize = True (the regressors X were normalized before regression by subtracting the mean and dividing by the l2-norm),
- copy_X = True (X was copied rather than overwritten),
- n_jobs = None (only required for larger problems), and
- positive = False (does not force the coefficients to be positive).
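A sketch of a comparable setup on synthetic data is given below. Note that the normalize argument was deprecated in scikit-learn 1.0 and removed in 1.2, so this sketch scales the predictors with StandardScaler in a pipeline instead. That is a different scaling (unit variance rather than unit l2-norm), but for ordinary least squares with an intercept the predictions do not depend on how the predictors are scaled. The data, coefficients, and variable names are invented for illustration and do not reproduce the project's dataset or scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the collocated dataset (not real measurements).
rng = np.random.default_rng(0)
n = 2000
fem = rng.uniform(2.0, 30.0, n)      # FEM concentration, µg/m3
temp = rng.uniform(-10.0, 35.0, n)   # daily average temperature
frm = 0.9 * fem + 0.05 * temp - 0.5 + rng.normal(0.0, 1.0, n)

X = np.column_stack([fem, temp])
X_train, X_test, y_train, y_test = train_test_split(
    X, frm, test_size=0.30, random_state=42
)

# Scaling replaces the removed `normalize=True` option.
model = make_pipeline(
    StandardScaler(),
    LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False),
)
model.fit(X_train, y_train)

model_r2 = r2_score(y_test, model.predict(X_test))
baseline_r2 = r2_score(y_test, X_test[:, 0])  # baseline: FRM = FEM
```

Comparing model_r2 against baseline_r2 on the same test rows keeps the two scores directly comparable.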
The only complications encountered with the model implementation occurred when rarely used/unique parameter values were included. In some instances, this led to predictors being included in the training dataset but not in the testing dataset or vice versa. Parameters to include were carefully selected to ensure only those useful to improving the model and readily available were included.
To refine the model, various predictor parameters were included/excluded from the dataset in various model runs. Model R2 scores varied based on predictors used, ranging between ~0.81–0.89.
Model Evaluation and Validation
To set a baseline for how well our model predicts, we first simply assume that the FRM concentration equals the FEM concentration. Under this assumption, the R2 regression score = 0.816. Just by knowing the FEM concentration, we can predict relatively well the FRM concentration. We next consider how much our model improves based on the inclusion of various predictor variables. The top-performing model included the following predictors:
- FEM concentration, probe height difference, month, method code combination (split into various dummy variables), average daily temperature, wind speed, pressure, and daily total precipitation.
Using these predictors, our model improved with an updated R2 score = 0.885. The scatterplot below illustrates the model’s predicted FRM concentration vs the actual FRM concentration from the test dataset. It shows that at higher concentrations the model performs quite well, with the scatter along the line of equality. At lower concentrations, the predictions wander. This can be expected, however, given the higher variability in PDs at lower concentrations as previously noted.

To demonstrate that the optimized model is robust, we performed a time-series cross-validation using Scikit-learn’s TimeSeriesSplit. With five splits, the R2 regression score proved to be stable (ranging between 0.863 and 0.884) and was always above the baseline score. Given this stability, we can argue that the model is robust against small perturbations in the training dataset.
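The validation scheme can be sketched as below, again on synthetic data. TimeSeriesSplit assumes the rows are already in chronological order: each fold trains on an expanding window of earlier rows and tests on the block that follows. The data and the single-predictor model are placeholders, not the tuned model described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic data; rows are assumed to be sorted by sampling date.
rng = np.random.default_rng(1)
n = 1200
fem = rng.uniform(2.0, 30.0, n)
frm = 0.9 * fem + rng.normal(0.0, 1.0, n)
X = fem.reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=5)

# One R2 score per fold; every training window ends before its test block.
scores = cross_val_score(LinearRegression(), X, frm, cv=tscv, scoring="r2")
fold_sizes = [(len(train), len(test)) for train, test in tscv.split(X)]
```

A narrow spread in scores across folds is the kind of stability referred to above.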
Justification
A known bias exists between FRM and FEM PM2.5 monitor concentrations. Herein, we set out to:
- quantify this bias,
- determine what, if any, parameters influence this bias, and
- determine whether we could successfully predict FRM concentrations from the FEM concentrations and these other parameters.
We were successful in showing that there are various parameters influencing concentration bias. We also created a model that can, with relatively good accuracy (R2=0.885), predict the FRM concentration from the FEM concentration and other factors.
Reflection
In this project, PM2.5 ambient air monitoring data were retrieved from an EPA API along with supplemental meteorological data. The dataset was cleaned, and ‘collocated’ FRM and FEM monitors were identified and paired for further evaluation. After pairing the monitor types, the percent differences in concentrations were calculated and explored, with the specific aim of identifying other parameters that influence the concentration bias. Various parameters proved to be important, including monitor method code, time of year, ambient temperature, and barometric pressure (among others). A linear regression machine learning model was then employed to predict FRM concentrations given the FEM concentration and the influential parameters. The set of predictors was refined to produce the best R2 in predicted vs actual FRM concentrations. The tuned model produced an R2 = 0.885, a decent improvement over the baseline (R2 = 0.816).
Improvement
Future iterations of this project could improve upon this analysis by:
- Evaluating a larger dataset beyond the timeframe investigated here.
- Investigating further what we define as a collocated monitor. This assessment considered any two monitors to be collocated if they ran at the same site on the same date. However, two monitors may be located at the same site but not be close to one another. In these situations, we would not expect concentrations to be equivalent. Results could be negatively influenced if a substantial number of these cases existed in the dataset (it is unlikely that there were many such cases).
- Considering different methods for dealing with NaNs in the model implementation. Recall that in this assessment, if any parameter included a NaN value, the entire row was dropped from the dataset. This cut the dataset used to train and test the model roughly in half. Other methods of handling NaNs include imputing missing values or building models that can work with NaNs.
- Considering other models. Some of these data hinted that the relationship between the bias and other parameters is not linear. Investigating other, nonlinear models could improve the predictive capability.