🤖 This example from the Google AI blog has been bothering me for ages. It can’t possibly be right. The density of living tissue is always nearly 1, and pears feel heavier in the hand than most fruit.
So today I resolved to ¡SCIENCE IT! and bought a pear and can report that this one, at least, sinks in water. (Just barely. Its density must be a hair over 1.)
The camera angle isn’t great, but it’s the best I could do: it’s looking through the side of a glass bowl, from below the water’s surface.
So what’s going on here? I took the Google AI blog as saying that the answer was generated by one of their AI systems, and that’s been the assumption of other analyses on the internet, but they don’t actually say that! It’s just an example problem. https://t.co/RCdv5KF6NH
The problem comes from the StrategyQA benchmark, which I downloaded. Here it is!
The answer according to the benchmark includes a density claim of 0.59.
The plot thickens…
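For a rough sense of how far apart those two numbers are, here is a minimal sketch using Archimedes' principle: a floating object sits with a fraction of its volume underwater equal to its density divided by the water's. It assumes the 0.59 figure is in g/cm³ and that fresh water is about 1 g/cm³. The value 1.01 is only a hypothetical stand-in for "a hair over 1", not something I measured.

```python
# Assumption: densities are in g/cm^3; fresh water is roughly 1.0.
WATER_DENSITY = 1.0


def sinks(density, fluid_density=WATER_DENSITY):
    """An object sinks when it is denser than the fluid it sits in."""
    return density > fluid_density


def submerged_fraction(density, fluid_density=WATER_DENSITY):
    """Fraction of the object's volume below the surface (Archimedes).

    Capped at 1.0, because anything denser than the fluid is fully under.
    """
    return min(density / fluid_density, 1.0)


cases = [
    ("benchmark claim", 0.59),
    ("hypothetical 'hair over 1'", 1.01),  # illustrative value, not a measurement
]

for label, density in cases:
    verdict = "sinks" if sinks(density) else "floats"
    share = submerged_fraction(density)
    print(f"{label}: density {density} -> {verdict}, {share:.0%} of volume underwater")
```

If a pear really had a density of 0.59, it would float with roughly two fifths of its volume above the waterline, which is nothing like what the pear in the bowl did.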
Oh, they crowdsourced the data set from Mechanical Turk (a platform which pays random people in poor countries a few cents to do mindless tasks as quickly as possible). They were supposed to use only Wikipedia to generate Q/A pairs. But Wikipedia doesn’t give pear density, so …
