A new Health Affairs paper "proves" mask mandates slow COVID-19
But does it live up to the hype?
A new analysis is out, and it claims that mask mandates slowed COVID-19 spread between March and November of 2020: the pre-vaccine era! Right off the bat, let me say it is a more sophisticated analysis than many others. But in the years to come, I expect still more complex analyses will emerge, reaching both conclusions: some will “prove” that masking and/or mask mandates saved lives, while others will prove that they didn’t help.
I hope to persuade you that, although well done, this analysis fails to show that the policy of mask mandates slows SARS-CoV-2 spread. It also fails to answer a related but different question: whether asking willing people to wear a mask slows SARS-CoV-2 spread. This essay has three parts: some introductory basics; a theoretical framework for why observational studies will face difficulty answering the question; and finally some specific issues with this paper and the individual vs. policy question.
The Basics:
Here are the basics of this paper.
Mask mandates had to happen between April and August 2020 (pre-vaccine; the height of panic)
Follow-up and baseline measurements occurred between March and October
Counties were matched to one another within the same geographic region
Counties included “had to have at least three consecutive days of daily case count exceeding 5 as of August 31, 2020, and meet at least one of the following criteria: contain a city with a population exceeding 100,000 people, contain a state capital, be the most populated county in the state, or have an average daily case incidence that exceeded twenty during July 1–August 31, 2020”
A county could be a control county (no mask mandate) but actually implement a mask mandate later, as long as it did so at least three weeks after the counties it was matched with. (I will return to this.)
Finally, and most importantly, there is something different about places that instituted mask mandates versus those that did not. Proof of that: some places implemented the mandate while others in the “exact same predicament” did not, and that is not a random choice. (We will also return to this point.)
Conceptual framework
Before I get into the paper, let me explain why this is such a hard nut to crack.
The analysis is trying to simulate a randomized trial in which we enroll places where a politician is thinking about implementing a mask mandate, randomize them to mandate or no mandate, and then measure spread. That is the core idea. Of course, we didn’t do that, so how can we re-create it?
Comparability: The first thing you have to do is convince the reader that you are comparing places that instituted a mandate to otherwise similar places that did not. The control group is supposed to be the counterfactual: what would have happened to the mask-mandate counties had they not implemented the mandate. A randomized trial would have assigned the intervention at random, which is perfect; how do you approximate that here?
In order to prove comparability, the authors cleverly use cases and Rt (the instantaneous reproduction number). Specifically, they match mask-mandate counties with counties that did not mandate masks for at least three more weeks on these variables: “population density, total population, presidential election voting patterns in 2016, case incidence and instantaneous reproduction number (Rt) in the two weeks before time zero”
Because they do this, they can produce a pretty figure showing that, pre-mandate, the matched counties have equal COVID-19 spread. The lines superimpose because they literally matched on these variables.
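To make the mechanics concrete, here is a minimal sketch (in Python, with entirely synthetic data and hypothetical column names) of this style of matching: nearest neighbor on standardized versions of the paper's matching variables. The authors' actual algorithm may well differ.

```python
# Minimal sketch of county matching on pre-period covariates.
# All data are synthetic and column names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
counties = pd.DataFrame({
    "mandate": rng.integers(0, 2, n).astype(bool),
    "pop_density": rng.lognormal(5, 1, n),
    "total_pop": rng.lognormal(11, 1, n),
    "gop_vote_2016": rng.uniform(0.2, 0.8, n),
    "case_incidence": rng.gamma(2, 10, n),
    "rt_pre": rng.normal(1.1, 0.3, n),  # Rt in the 2 weeks before time zero
})

covs = ["pop_density", "total_pop", "gop_vote_2016", "case_incidence", "rt_pre"]
z = (counties[covs] - counties[covs].mean()) / counties[covs].std()

treated, controls = z[counties["mandate"]], z[~counties["mandate"]]

# For each mandate county, pick the closest non-mandate county.
matches = {i: ((controls - row) ** 2).sum(axis=1).idxmin()
           for i, row in treated.iterrows()}

# By construction, matched pairs share near-identical pre-period case
# incidence and Rt, which is why the pre-mandate curves superimpose.
```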
It is clever, but it confuses two things. What they want to do is match counties based on the actual rate of spread (the pandemic trajectory); what they are doing is matching based on the rate of reported cases. That’s not the same.
The best way to measure pandemic trajectory would be random, serial seroprevalence assays in the counties, or random, repeated testing. They are using the case numbers submitted to the CDC. It’s a HUGE problem.
Their method assumes that, among all people getting sick in these two counties, people are sending in tests at equal rates and at the same time: that the average Joe in these counties who had a runny nose or cough was equally likely to test for COVID. But that is almost surely not true. Places that implement mask mandates are likely testing more on the margin than places that didn’t (well, at least for three more weeks). Mask mandates are a general marker of caring more about COVID-19, and people who care more test more, and test more on the margin.
Moreover, people in no-mask counties might also test later, on the margin. The average Joe in I-don’t-care-for-masks-ville might test a few days later than Mr. Mask Smith, and only if and when symptoms worsen; this again means you are not matching actual trajectories (time delay). Likely both of these things are happening!
All this means they think they are matching trajectories, but the actual pandemic is almost surely doing different things. If I were to bet, it is steeper and brisker in no-mask-mandate places. Then the mask mandate is implemented, and of course COVID-19 spread, which is non-linear, may grow substantially over time; but that is going to happen disproportionately more in no-mandate counties, as they started time zero with much brisker epidemic spread.
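A toy calculation shows how this failure mode works. All numbers below are invented for illustration; the point is only that a briskly growing epidemic plus declining relative testing can produce the exact same reported-case curve as a slower epidemic with steady testing.

```python
# Two counties with different TRUE epidemic trajectories can look
# identical in REPORTED cases if testing propensity differs.
# All parameters are invented for illustration.
import numpy as np

days = np.arange(14)  # two-week pre-mandate matching window

# True infections: the no-mandate county spreads faster underneath.
true_mandate = 100 * np.exp(0.08 * days)   # growth rate 0.08/day
true_control = 100 * np.exp(0.12 * days)   # growth rate 0.12/day

# Testing propensity: mandate-inclined places test more on the margin;
# the control county tests relatively less, and less over time.
p_test_mandate = 0.45
p_test_control = 0.45 * np.exp(-0.04 * days)

reported_mandate = true_mandate * p_test_mandate
reported_control = true_control * p_test_control

# Reported curves are identical (0.12 - 0.04 = 0.08), so the counties
# "match" -- yet the control county's true epidemic grows 50% faster.
print(np.allclose(reported_mandate, reported_control))  # True
```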
This is a damning limitation that, in my opinion, thwarts the entire paper. It cannot be saved by back-calculations; trust me, I wasted an afternoon in Excel trying. Finally, this likely explains why the effect size seen here is too good to be true. Free surgical masks plus strong advocacy achieved an 11% relative risk reduction in Bangladesh, and cloth masks failed entirely. How can a mask mandate (mostly cloth masks, let's be honest) achieve a 20-35% reduction in cases? I suspect mismatching is the root reason why the effect is too good to be true.
The paper also has the biggest non-randomized masking research problem
These authors also face the biggest masking research problem, one that will plague the literature for decades to come (PS: only the non-randomized literature; RCTs were the solution).
It’s this:
Mask mandates might lead people to do two things. One, filter the air coming out of their blowholes; I hear that is how cloth works. Two, change their behavior: get them to stand further apart, spend less time in stores, etc. Mask mandates SHOULD get credit for both, insofar as they lead to both. Alternatively, if the mandate makes you more cavalier, you own that too. The policy is enacting the mandate, and whether it works via behavior change or via filtration, it gets the points.
But mask mandates could also be occurring in a second scenario. People are scared. They go on TV and panic. Politicians shout: we are mandating masks! Things are crazy here! People soil themselves. They stay home more, stand far apart, et cetera. And they wear a mask. But it wasn’t the masking that changed their behavior; it was the lunatic on TV who led to both. Fear drove both. Mask mandates SHOULD NOT get credit for this behavior change, as even without the mandate, people would have changed their behavior. They’re scared!
Any mask mandate study needs to separate these scenarios, and in my opinion that is impossible to do with non-randomized data. This study also fails to disambiguate the scenarios.
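To see why the data cannot arbitrate, consider a minimal simulation (all parameters invented) of the two scenarios. In the first, the mandate causes the distancing; in the second, fear causes both the mandate and the distancing. Both worlds show mandate counties distancing more and logging fewer cases.

```python
# Two data-generating processes, observationally alike in pattern:
# (A) mandate -> distancing -> fewer cases      (mediation)
# (B) fear -> mandate, fear -> distancing -> fewer cases (confounding)
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Scenario A: the mandate itself drives the distancing.
mandate_a = rng.random(n) < 0.5
distancing_a = 0.8 * mandate_a + rng.normal(0, 0.5, n)
cases_a = 50 - 20 * distancing_a + rng.normal(0, 5, n)

# Scenario B: fear drives both; the mandate does nothing on its own.
fear = rng.random(n)
mandate_b = rng.random(n) < fear           # scared places mandate
distancing_b = 1.6 * fear + rng.normal(0, 0.5, n)
cases_b = 50 - 20 * distancing_b + rng.normal(0, 5, n)

# In BOTH scenarios, mandate counties show fewer cases -- yet only in
# Scenario A does the mandate deserve the credit.
for name, m, c in [("A", mandate_a, cases_a), ("B", mandate_b, cases_b)]:
    print(name, c[m].mean() - c[~m].mean())
```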
Yet I believe the authors understand that the second scenario might be in play, and they do try to address it; specifically, they write:
“Social distancing was quantified using daily cellphone movement provided by Unacast (which collects and aggregates human mobility data), measuring the percentage change in visits to nonessential businesses within each county compared with visits in a four-week prepandemic baseline period between February 10 and March 8, 2020.”
They then put this variable in the model to adjust for it, and it is a significant covariate.
The data come from Unacast’s website, which offers three ways to measure social distancing (shown in a very dull video).
But this creates a few problems.
Insofar as distancing is a downstream effect of masking, you don’t want to adjust for it. You are adjusting for a mediator. Not good. They will say this only biases the estimate toward the null, but that is a longer conversation.
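For readers who want to see the mediator problem concretely, here is a hedged sketch using ordinary least squares on simulated data where, by assumption, the mandate works only through distancing.

```python
# Why adjusting for a mediator attenuates the total effect.
# Assume (for illustration only) the mandate works ONLY by increasing
# distancing; all numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
mandate = (rng.random(n) < 0.5).astype(float)
distancing = 0.8 * mandate + rng.normal(0, 0.5, n)   # the mediator
cases = 50 - 20 * distancing + rng.normal(0, 5, n)

# Total effect: regress cases on mandate alone.
X1 = np.column_stack([np.ones(n), mandate])
print(np.linalg.lstsq(X1, cases, rcond=None)[0][1])  # ~ -16 (= 0.8 * -20)

# Adjusted for the mediator: the mandate coefficient collapses to ~0,
# even though the mandate's total effect is real.
X2 = np.column_stack([np.ones(n), mandate, distancing])
print(np.linalg.lstsq(X2, cases, rcond=None)[0][1])  # ~ 0
```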
Of the three distancing metrics, why did they pick change in visits to non-essential places? One of a thousand opportunities for multiple analytic plans. Why not the other metrics (change in average mobility, for example)?
The metric has “social distancing” in it, but it does not capture the “holy crap, I am scared” distancing that comes from fear. It doesn’t capture how far apart we stand in a store, or how fast I run if I hear a throat clear in Home Depot. It doesn’t capture whether I get scared in a hot, packed room with stagnant air and split, versus being a bit more relaxed, and lingering, in an open foyer with the doors open and a breeze. How many people gather at my mother’s house? How far apart do we sit? Is the window open? In other words, if fear drives both mask mandates and the many choices around social distancing, this metric is insensitive to that. It is only a crude measure (PS: is it even accurate?) of visiting non-essential businesses, not of how far I stand, how long I stay, etc.
In fact, the more you think about it, social distancing is not a single number but many, many numbers, and the authors are not capturing any of that.
This is, in short, the second major flaw of the paper. It does not offer, nor can it provide, a conceptual framework to separate behavioral choices that happen AS A RESULT of mask mandates (they remind me to keep my distance) from those that happen ALONGSIDE mask mandates (someone scared the shit out of me). This is key because if the cause is being scared, then mask or no mask, the fear was enough to do the trick; but we want to know if the mask mandate was necessary. Finally, the variable they use to adjust for distancing is a very crude and insensitive measure of behavior and merely gives the illusion they are considering this.
Now allow me to shift to some very specific comments (these are also damning).
No mandate for 3 weeks
The control arm includes counties that did implement mask mandates, but didn’t start them for three more weeks. This introduces a new issue. The type of place that didn’t mandate at, say, 15 cases per 100k but does mandate later includes places that needed to see more carnage before implementing a mandate. As long as some or many control counties ultimately implement mandates, this almost surely guarantees that control counties would see greater rises in cases immediately after the matched mandate, because that is what it takes for them to be persuaded to mandate. Of course, the authors do not say a control county must eventually have a mandate, which would make it a totally tautological exercise, but the mere fact that many do is enough, I think, to bias the results. [PS: I am open to further thoughts on this point. Am I wrong? Talk to me in the comments!]
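A small simulation (invented numbers) captures the selection effect: if each county mandates only once cases cross its own tolerance threshold, then the controls that eventually mandate are precisely the counties whose epidemics grew fastest after time zero.

```python
# Selection sketch: controls that "later mandate" are, by construction,
# the counties with the briskest post-time-zero growth.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n = 5000

growth = rng.normal(0.03, 0.03, n)     # latent daily growth rates
# All counties sit at 15 cases/100k at time zero; none has mandated yet.
cases_3wk = 15 * np.exp(growth * 21)

# Suppose a county mandates once cases cross ~30/100k. The controls
# that later mandate are exactly the fast-growing ones.
later_mandaters = cases_3wk > 30
print(growth[later_mandaters].mean(), growth[~later_mandaters].mean())
# The later-mandating controls have systematically brisker epidemics,
# making the control arm look worse right after time zero.
```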
Diabetes?
Why is the percent of people in the county with diabetes a covariate?
“County-level covariates with the potential to confound the analysis results were considered for each model, including social distancing, population density, wet-bulb temperature, proportion of county residents with diabetes, and proportion of county residents earning less than 200 percent of the federal poverty level.”
First, this also assumes equal screening for diabetes, which is almost surely not true. Second, diabetes is a risk factor for bad COVID outcomes, but does COVID spread faster among diabetics? And if so, does it also spread faster in counties with higher average BMI? Then why is BMI not a covariate?
Masks don’t work in the burbs
The authors downplay the finding that mask mandates only worked in urban counties. No effect in the suburbs! It reminds me of how masks didn’t work when used by daycare providers, yet no one discussed it.
A strange analytic choice for a super-spreader disease
It is well known that COVID spread can be explosive; single superspreader events can lead to dozens of cases or more. As long as that is true, why do this? You might be throwing away real data:
“Data were assessed for outliers relative to the Rt before analysis. Days with an Rt value outside of the 2.5–97.5 percentiles (Rt of 0.35 at the 2.5 percentile and 3.50 at the 97.5 percentile) were excluded from the analysis.”
Some of those outliers are real data. Why discard them?
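Here is a quick sketch of why this matters for a heavy-tailed quantity like Rt under superspreading (the lognormal distribution and its parameters are invented for illustration): trimming at the 2.5/97.5 percentiles discards exactly the explosive days and pulls the average down.

```python
# For a heavy-tailed quantity like Rt under superspreading, the top
# percentiles are not noise. Distribution/parameters invented.
import numpy as np

rng = np.random.default_rng(4)
rt = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)  # illustrative Rt values

lo, hi = np.percentile(rt, [2.5, 97.5])
trimmed = rt[(rt >= lo) & (rt <= hi)]

print(rt.mean(), trimmed.mean())
# The trimmed mean is noticeably lower: the excluded "outlier" days are
# exactly the explosive days that drive a superspreading epidemic.
```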
The study has nothing to do with present day
We can’t forget that none of this has anything to do with the present day. This is all pre-vaccine and pre-Omicron. We have no idea if it applies post-vaccination, and no idea if it applies to the current, highly transmissible strains. It is of historical interest only.
The last thing I want to say is this:
Whether masking works is a prerequisite for a mask mandate to work, but it is not the same thing. This study tests whether mask mandates work; that is a reasonable policy question. The Bangladesh RCT tested whether free masks and encouragement worked; that is different. In this study, almost all masks used were likely cloth, and cloth masking failed in Bangladesh. I think that makes it highly unlikely that the large effect seen here is due to the masking. It is almost surely the first objection I raised.
I could write more (I found some other issues), but I don’t have any more time for this paper. I see patients, work on a bunch of academic articles, am writing my new book, and make podcasts and videos, and I am presently sleepy. So that is all for now!
I think by now everyone knows that masks are of limited utility, if any.
That won't stop some people swearing by them, or some places mandating them no matter what.
The mask has become an identity, ranging from the "I DON'T WANNA DIE" person to the "I care SO much" person.
This is no longer a matter of science or efficacy; seriously, if the mask worked, it would have worked by now. It's a psycho-political-cultural matter now. Time to move on.
Thank you for this interesting analysis, and for highlighting the potentially confounding effect of a modulation of fear that is independent of that induced by wearing a mask.
When reading the Bangladesh trial, I did wonder why this wasn’t a potential confounder there, too, but you didn’t seem to comment on it (or maybe you did and I missed it; apologies if so!). The arm that received the mask in the Bangladesh trial also received “… information about the importance of mask-wearing”. I would have thought that, in a well-designed RCT, the intervention should be presented identically, and neutrally, to both arms: i.e., “We are doing this trial as we do not currently have high-quality information as to whether or not masks are effective”. Presenting information about “the importance of mask-wearing” seems to pre-judge the outcome, and presenting this only to the mask-wearing group might be expected to increase the level of fear in this group. As such, it is not clear to me that the significant increase in physical distancing shown by the study can necessarily be entirely attributed to fear/apprehension that is secondary to mask-wearing (and so legitimately included as an effect of mask-wearing) rather than secondary to fear/apprehension due to the instruction that mask-wearing is important: an instruction given only to the mask-wearing participants (i.e., if you don’t wear a mask you are at risk). The true effect size of masking is still then somewhat uncertain, but could presumably be less than that found in the study.
Of course, compliance with the intervention in the Bangladesh study is “important”, and this should rightly be communicated to participants. However, a better instruction would be “It is important you wear your mask as we have instructed, so that the results of the trial [either positive OR negative] are valid”, rather than “mask-wearing is important [i.e., mask-wearing is effective]”. Strictly, even this modified instruction creates a potential asymmetry; much better would be to have the non-intervention arm also actively instructed NOT to wear masks, so that identical instructions (and, presumably, identical levels of non-mask-induced fear) are given to both groups: “It is important you wear / do not wear your mask as we have instructed, so that the results of the trial are valid”.