Walking It Back

My last post was surprisingly popular—and not just among people who know me personally. I even managed to pick up a few new followers, who I’m afraid may be put off a bit when they discover travel writing isn’t among the usual subjects of this blog (but hopefully not!).

Anyway, as you may or may not recall, the last post incorporated a graph of the distance I’d walked in the days before, during, and after various legs of my trip through Italy:

[Figure: miles walked]

In the graph’s caption, I glibly blamed my apparent sedentarism on my office job and commute. I like to think of myself as a decently fit person, you see. Surely, I reasoned, my desk job must be impeding an otherwise active lifestyle. I mean, I have a standing desk—clearly I’m a man who values his physical fitness.

It occurred to me a few days later that my hypothesis was actually pretty testable: if work and commuting were really to blame, my weekends should be significantly more active (measured by distance walked/run) than average. Apple has, for some reason, elected to make exporting health data from iPhones an incredibly difficult process. So, with the zeal of an intern, I manually entered 242 days’ worth of mileage in an attempt to evidence my claim.

Looking back, my naiveté was almost cute. In the era of “binge-watching,” I really believed myself exceptional.

The raw data is pretty depressing. The mean distance walked is 1.54 miles. But the data is right-skewed, meaning outliers on the upper end of the distribution are pulling the mean higher. (The median distance walked over this period is a shockingly low 0.985 miles.) It’s also telling that the distribution isn’t bimodal, which would have indicated two distinct populations—under my hypothesis, weekdays and weekends.
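The mean-versus-median gap is the classic signature of right skew, and it’s easy to see in miniature. Here’s a sketch with invented numbers (not my actual export)—many short days plus a couple of long ones:

```python
import statistics

# Hypothetical daily mileage, invented for illustration: mostly short
# days, with two long outliers on the upper end of the distribution.
miles = [0.6, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.5, 4.0, 5.5]

mean = statistics.mean(miles)      # dragged upward by the two long days
median = statistics.median(miles)  # resistant to outliers

print(f"mean={mean:.2f}, median={median:.2f}")  # mean=1.76, median=1.05
```

The two outliers pull the mean well above the median, just as a handful of long vacation walks did in my data.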

[Figure: Miles Walked — histogram and box plot]

I could have quit here, but I’ve touched on the importance of publishing negative results before and therefore had a cross to bear. To make the distribution more normal, I removed outliers (in this case, all values greater than 3.73 miles) and applied a square-root transformation:

[Figure: Square Root Miles Walked, no outliers]

The means of our new, outlier-free population and the “weekend” sample (n=61) are, respectively, 1.023² miles and 1.046² miles, and the population standard deviation is .377² miles. At the 95% confidence level, the sample mean would have to be about 1.106² miles to be significantly higher than the population average.
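That critical value comes from a one-tailed z-test on the square-root scale. A minimal sketch using the numbers above (the small gap between my result and the post’s 1.106 is presumably rounding in the inputs):

```python
import math

# Values from the post, all on the square-root-miles scale.
mu = 1.023      # population mean (outliers removed)
sigma = 0.377   # population standard deviation
n = 61          # "weekend" sample size
xbar = 1.046    # weekend sample mean

z_crit = 1.645  # one-tailed critical z at the 95% confidence level

# Smallest sample mean that would count as significantly higher than mu:
threshold = mu + z_crit * sigma / math.sqrt(n)
print(f"threshold = {threshold:.3f} sqrt-miles")

# Observed z-score for the weekend sample:
z = (xbar - mu) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")  # far short of 1.645, so not significant
```

The observed z of roughly 0.48 doesn’t come close to the 1.645 cutoff, which is the arithmetic behind the apology that follows.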

It is with great shame that I fail to reject the null hypothesis. And I do hereby humbly apologize to office life for blaming it for what is clearly a personal shortcoming.

A few caveats, in case my health insurance provider is reading:

  • I do exercise most days before work. But mostly pull-ups, lunges, and other anaerobic stuff. I only run sporadically—and when I do, I don’t always bring my phone with me.
  • I can’t vouch for the accuracy of the iPhone’s pedometer. Anecdotally, I’ve heard it isn’t great, and light research confirms it has trouble measuring steps under some common conditions, like being held or kept in a backpack.
  • The combination of the above suggests iPhone health data offers a convenient but incomplete metric to assess one’s activity. For example, July 31, a day my phone credits me with walking 4.7 miles, also happens to be a day I went for a 30-mile bike ride.
  • Including Fridays in the “weekend” sample raises the mean distance slightly, to 1.08 miles, but still not enough to achieve statistical significance.
  • Uh, I will try to do better.

Science Has a Reproducibility Crisis

If your Facebook feed is anything like mine, you may have recently heard about how Bill Nye—the Science Guy himself—“slammed” Tucker Carlson on the latter’s evening show on Fox. THIS. (If you live somewhere else you may have been treated to an equally smug reaction from people claiming that Carlson “won.”)

However you feel about it, the timing, coupled with Nye’s reliance on scientific consensus as a proxy for objective correctness, is somewhat serendipitous. Mounting evidence that the results of scientific studies are often not replicable has caused Nature, a preeminent scientific journal, to very publicly tighten its standards for submissions as of its latest issue.

In May of 2016, a survey by Nature revealed that over two-thirds of researchers surveyed had tried and failed to reproduce the results of another scientist’s study. Over half of them had been unable to reproduce their own results. Fifty-two percent of researchers polled said there was a “significant crisis” of reproducibility.

This is a big deal. The ability to replicate the results of studies is crucial to both scientific integrity and progress. Clinical researchers, for example, depend on reliable results from prior trials to form the building blocks of new drug advancements. In the field of cancer biology, merely 10% of results from published literature were found to be reproducible. Meanwhile, the credibility of scientific literature is understandably compromised by dubious, often sensational findings.

The problem, according to Dame Ottoline Leyser, director of the Sainsbury Laboratory at the University of Cambridge, is rooted in today’s scientific culture. As quoted by the BBC, she cites “a culture that promotes impact over substance, flashy findings over the dull, confirmatory work that most of science is about.”

Others blame a pressure to publish. There has also been, in recent years, doubt cast on the integrity of the peer review process, especially with regard to climate science.

Whatever the culprit, plans to combat issues of reproducibility are emerging. Nature has developed a checklist to guide authors submitting work to the publication. Efforts shouldn’t end there, the journal argues. Reform at all levels of the scientific process could go a long way:

Renewed attention to reporting and transparency is a small step. Much bigger underlying issues contribute to the problem, and are beyond the reach of journals alone. Too few biologists receive adequate training in statistics and other quantitative aspects of their subject. Mentoring of young scientists on matters of rigour and transparency is inconsistent at best. In academia, the ever increasing pressures to publish and chase funds provide little incentive to pursue studies and publish results that contradict or confirm previous papers. Those who document the validity or irreproducibility of a published piece of work seldom get a welcome from journals and funders, even as money and effort are wasted on false assumptions.

Tackling these issues is a long-term endeavour that will require the commitment of funders, institutions, researchers and publishers. It is encouraging that NIH institutes have led community discussions on this topic and are considering their own recommendations. We urge others to take note of these and of our initiatives, and do whatever they can to improve research reproducibility.

Census Data Are Weird

For those of you with better things to do than scroll through Paul Krugman’s Twitter feed, I have news: last Tuesday the Census Bureau released its annual report on Income and Poverty, and people are stoked.

Here’s the upshot: Median household income increased 5.4% from last year after nine years of general decline. It’s now only 1.6% lower than it was in 2007, the year before the recession, and 2.4% lower than its historic peak in 1999.

While Asian households didn’t see a significant increase, black, white, and Hispanic households did. Median household incomes increased in all regions of the country and, for the first time since the recession, real income gains are distributed beyond the top earners.

Sounds like great news! It might well be, but before you celebrate there are some things to note about these statistics. The following isn’t a refutation of the conclusion that the economy is improving. Rather, it’s an indictment of the statistics that lead us to such conclusions. Here are three things to consider:

  • Household income data aren’t all they’re cracked up to be

All statistics have limits, but median household income is particularly misleading in the wrong hands. For years now, economists and politicians have cited median household income data to paint grim pictures of the American economic landscape. While the story is nicer this year, the logic behind the choice to measure households, rather than individuals, is still suspect.

A positive or negative change in median household income doesn’t imply a similar change in individuals. That’s because the characteristics of households vary across time and population.

Average household size has decreased from 3.6 to 2.5 people since 1940. Demographic shifts can also affect household incomes, because average household sizes differ between races.

Another limitation of household income data is that individuals aren’t equally distributed among households of different income levels. There are far more individuals—let alone workers—in the top quintile of income-earning households than in the bottom. People with a vested interest in portraying an economically lopsided America tend to cite household data for this reason, usually without noting it.

Households expand and contract as more people are able to afford their own places. This can strangely cause median household income to rise while people are making less money. For example, if I were demoted and had to move in with my mom because I was now making half as much money, the median household income would increase as our two households merged, despite less aggregate income for the individuals involved.

The same works in reverse. When I started making enough money, I moved out of my mom’s house. Even though our combined income was greater, median household income fell.
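The paradox is easy to verify with a toy example. All the incomes below are invented for illustration (in thousands): three households become two when I move back in with my mom at half pay, and the median rises even as total income falls.

```python
import statistics

# Before: me (30), my mom (50), and a neighbor (100). Invented numbers.
before = [30, 50, 100]

# After: I'm demoted to half pay (15) and move in with my mom, so our
# two households merge into one earning 65. Only two households remain.
after = [50 + 15, 100]

print(statistics.median(before))  # 50
print(statistics.median(after))   # 82.5 -- the median went *up*...
print(sum(before), sum(after))    # 180 vs 165 -- ...while total income fell
```

Removing a low-income household from the pool shifts the median upward, regardless of what happened to the individuals inside it.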

Speaking of which…

  • Millennials are living at home longer and in greater numbers than previous generations

Fully 32% of 18-to-34-year-old Americans live with their parents, making it the most common living arrangement for that group. There are several reasons for this: higher unemployment among young adults; an accompanying delay in or aversion to marriage; and the changing ethnic makeup of America, among others.

While Millennials are more likely to live with mom and dad, we’ve also become the largest generation in the workforce. A larger part of the workforce consolidating in fewer households could explain part of the rise in household income.

This probably isn’t too big a factor, but since we’re measuring households, it’s worth keeping in mind.

  • “Low-income households” and poor people aren’t necessarily the same

This is a big one. Part of the elation about the Census data comes from the fact that lower-earning households saw a bigger bump in income than they have in recent years.

The problem is that income isn’t the same as wealth. It’s closer to a derivative of wealth, as a still frame is to a film. It’s a simplistic gauge of standard of living, hobbled by the fact that it ignores government transfers, assets, and liabilities. Economists would probably argue that consumption data are more informative indicators of standard of living.

A wealthy elderly couple and a part-time minimum-wage earner might both be in the lowest income quintile in a given year. That doesn’t mean their standards of living are similar.

Rising incomes of the lowest earners might indicate lots of things: for example, that people are being forced back into the labor market after retiring. As I’ve noted here before, most poor households have no income earners, according to data from the Federal Reserve Bank of San Francisco. Unless the rate of employment among the poor grew at the same time, there’s reason to believe the rising income of low-earning households is due to something other than increased income of “the poor.”

Another common assumption is that households’ positions within income brackets are stagnant, as if we lived in a world without job churn. The households in the bottom 10% of income earners this year aren’t necessarily the same ones that were there in 2008.

We’re used to seeing data based on groups of income earners, not individuals, because that’s how the Census Bureau reports it. Studying individuals, however, tells a more relevant story.

The United States Treasury tracked individuals’ tax returns from 1990 to 2005 and found that over half of the people in the bottom quintile in 1990 had moved to a higher quintile by 2005.

[Figure: Treasury income quintile mobility, 1990–2005]

The Census statistics measure exactly what they measure: nothing more. That doesn’t mean the information is useless; it just means we shouldn’t lose our heads over it. Extrapolating a verdict about America’s economic health from median household income data invites mistakes based on a deceptively simple figure.


Don’t mistake my skepticism of stats for pessimism about the American economy. Where long-term trends in the American economy are concerned, optimism is never a bad idea.

Smith College Protests: Beneath Outrage, Statistical Confusion

Two leaked letters between staff and administrators at the Smith College School for Social Work have led to mass student protests of perceived institutional racism.

Professors alleged that admissions staff were doing a disservice–particularly to minority students–by admitting unprepared students to the program, despite “overwhelming data that demonstrates that many…students, including white­-identified students, cannot offer clients a social work intervention that is based upon competence, skills and ethics.”

The unnamed source of the leak, as well as students, took umbrage at some of the terminology and implications of the letters, citing “violent, racist rhetoric.”

Protests at Smith College are something of a perennial occurrence, but this case is particularly interesting because it deals in part with matters that can be verified through existing data. Moreover, arguments from the students present good opportunities to debunk common fallacious assumptions and underscore the importance of viewing statistics in proper context.

The first such assumption is that members of differing groups should be expected to achieve similar results and outside factors are to blame when this isn’t the case.

Contemporary politics are inundated with references to various forms of inequality. Given that differences across ethnic groups show up in nearly every measurable domain, why would we expect academic performance to be the exception? Indeed, racial achievement gaps are a widely acknowledged phenomenon.

For this reason, student Chris Watkins’ statement that a “disproportionate amount of black and Latino students” are under review, which can endanger their chances of graduation, isn’t enough to indicate racism. There’s no reason to assume that black and Latino students as a group would do as well as whites or Asians, other than that we might find the assumption ideologically appealing.

The second assumption is that multivariate groups of people can be divided neatly by single variables. Speaking of black, Latino, and white students assumes that the students who fall into these categories share all the other variables that might affect educational performance.

Students could just as easily be separated by family income or some other variable that correlates with academic success. We wouldn’t expect black students from poor, single-parent households and upper-class black students whose parents are Ivy League alumni to succeed at the same rates, even though both groups are black. Any discussion of racial outcomes that doesn’t take other factors into account is too blunt to deserve much weight.

Even if racism were a factor in determining which students are put on review, as Watkins seems to allege, the proportion of students on review by race wouldn’t tell us that, which brings me to my third point: Gross statistics are easily digestible, but can rarely be trusted to convey the nuances of a situation. Discrimination could be inferred statistically, but Watkins is looking in the wrong place for evidence.

In 1991 the Federal Reserve Bank of Boston found that, after adjusting for several factors, black loan applicants were rejected about 17% of the time, compared with 11% for white applicants. The Boston Fed felt this was enough to infer racial discrimination on the part of lenders—a conclusion that confirmed the researchers’ existing beliefs.

It was only later that a writer at Forbes pointed out that racial discrimination would be evidenced not in the percentage of rejected applicants, but in the default rates of the borrowers. Lower rates of default among black borrowers would indicate that their applications were being held to tighter standards than comparable white applications. Since black and white default rates were even, it appeared that race was not a deciding factor in the lenders’ decision-making process.
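A toy simulation makes the Forbes argument concrete. All the numbers below—group score distributions, the approval threshold, the default curve—are invented for illustration. A colorblind lender applying one uniform standard will still reject the lower-scoring group more often, yet borrowers who just clear the bar default at the same rate in both groups, because the bar is the same:

```python
import random

random.seed(0)

# Simulate a colorblind lender: one approval threshold for everyone,
# and default risk driven only by the applicant's credit score.
def simulate(mean_score, n=200_000, threshold=600, band=20):
    rejected = 0
    marginal = 0           # approved applicants just above the cutoff
    marginal_defaults = 0
    for _ in range(n):
        score = random.gauss(mean_score, 100)
        if score < threshold:
            rejected += 1
        elif score < threshold + band:
            marginal += 1
            # default probability falls as the score rises
            if random.random() < (800 - score) / 2000:
                marginal_defaults += 1
    return rejected / n, marginal_defaults / marginal

# Group A has lower average scores than group B, so one uniform
# standard still rejects group A applicants more often...
rej_a, def_a = simulate(mean_score=640)
rej_b, def_b = simulate(mean_score=700)
print(f"rejection rates: {rej_a:.0%} vs {rej_b:.0%}")

# ...but marginal borrowers from both groups default at about the same
# rate, because they cleared the same bar. A stricter bar for group A
# would instead show up as a *lower* default rate among its borrowers.
print(f"marginal default rates: {def_a:.1%} vs {def_b:.1%}")
```

Equal default rates are what a single standard predicts; unequal rejection rates alone predict nothing either way.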

We can apply the same logic to this accusation of faculty racism at Smith College. It’s not enough to show that a higher proportion of black and Latino students are placed under review; we would need to know that they outperform white students who are also placed on review—or, inversely, that among students not under review, white students have lower GPAs than black and Latino students.

Similarly, we can evaluate the complaints of Professor Dennis Miehls and the “Concerned Adjuncts.”

If, as the letters from staff seem to imply, unqualified students were being admitted because of an administrative predilection for non-academic qualities,[1] we would probably find some evidence of that in the incoming GPAs (or other metric of academic preparedness) of students under review relative to their successful peers. Since the Smith College MSW program doesn’t require applicants to take the GRE, work experience, undergraduate GPA, and SAT scores might be the best such indicators.

In a school so often embattled by protests and accusations of racism, students and faculty should take this chance to quantitatively assess whether or not racism is affecting the performance of minority students on campus. With any luck, someone with access to the right data will perform a competent analysis.

[1] It would hardly be the first time a university admitted students based on preferential characteristics that had nothing to do with academic success. Among students admitted to medical schools between 2013 and 2016, black and Latino students had lower median GPAs and MCAT scores than white students, who in turn trailed Asian students. Among applicants with comparable MCAT scores and GPAs, black and Latino applicants are far more likely to be accepted, indicating an admissions preference.