We’ve recently seen research about so-called “bots” and misinformation on Twitter and wanted to share our perspective on why findings that might seem remarkable at first are likely inaccurate. We’re working on a more detailed explanation, but here are some comments for now.

We continue to be excited by the research opportunities that Twitter data provides. Our service is the largest source of real-time social media data, and we make this data available to the public for free through our public API. No other major service does this.
Many researchers, academics, and journalists use our public API — a set of tools for programmatically accessing information on Twitter. We make all public Twitter content available via our APIs. You can learn more about them here: https://t.co/QJQ0USRvI2
The basic issue with much of the research based on our public APIs is simple: The APIs don't provide insight into our defensive actions to protect Twitter from manipulation, including bots.
Because of this, API-based research can't distinguish between accounts we've already identified as bad (and hidden or removed) and real, authentic ones.
This means that our primary actions here — challenging, filtering, and removing bad actors before they have a chance to disrupt people's experience on Twitter — are not reflected.
Why not include this data? Because doing so would make it easier for bad actors to get around our defenses. https://t.co/Q5yweOXc1x
Let’s take a step back and look at the issue of “bots” in general. Even among researchers, there’s little agreement about what “bot” means. It’s a term used to refer to everything from accounts that post automatically to spammers to real people who Tweet something controversial.
The lack of understanding of what a “bot” is and is not contributes to fear, uncertainty, and distrust — in short, unhealthy conversations.
The same way we sometimes see people dismissing facts as "fake news," we also see real people labeling each other as bots rather than engaging with each other — to the detriment of the public conversation.
We've also seen bot detectors and dashboards created by commercial entities, which claim conversations are full of bots, seemingly in an effort to boost their own business models.
When we talk about bots, we mean accounts engaged in platform manipulation and spam. Even then, identifying bots using only public data is very difficult.
Since nobody other than Twitter can see non-public, internal account data, third parties using Twitter data to identify bots are doing so based on probability, not certainty.
One of the most common signals used to predict if someone is a bot is how often they Tweet, or how many times they Retweet. The obvious problem is that people who are passionate about politics, sports, or music also Tweet a lot.
Some people only Retweet. There are lots of different ways to use Twitter, and labeling certain uses “bot-like” is unhelpful. Other signals, like political views, the presence of a profile photo, or number of followers, seem obvious but are not clear-cut.
These behaviors differ globally, across age groups, language usage, and people’s individual choices about their own privacy and self-expression online.
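To make the problem with volume-based signals concrete, here is a minimal sketch of a naive rule that flags accounts on Tweet volume or Retweet share. Everything in it is hypothetical: the account names, the thresholds, and the `is_automated` ground-truth field (which a third party working from public data would not have) are invented for illustration and do not describe any real detector.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    tweets_per_day: float
    retweet_ratio: float  # share of posts that are Retweets, 0.0 to 1.0
    is_automated: bool    # hypothetical ground truth, not visible in public data


def naive_bot_flag(account, max_tweets_per_day=50.0, max_retweet_ratio=0.9):
    """Flag an account as 'bot-like' using only volume-style signals."""
    return (
        account.tweets_per_day > max_tweets_per_day
        or account.retweet_ratio > max_retweet_ratio
    )


# Invented sample accounts covering different ways of using the service.
accounts = [
    Account("sports_fan", 120.0, 0.40, False),
    Account("retweet_only_reader", 15.0, 1.00, False),
    Account("casual_user", 3.0, 0.20, False),
    Account("spam_network_node", 200.0, 0.95, True),
    Account("low_volume_spammer", 10.0, 0.30, True),
]

flagged = [a for a in accounts if naive_bot_flag(a)]
false_positives = [a.name for a in flagged if not a.is_automated]
missed = [a.name for a in accounts if a.is_automated and not naive_bot_flag(a)]

print("Real people flagged as bots:", false_positives)
print("Automated accounts missed:", missed)
```

In this toy data the rule flags a passionate fan and a Retweet-only reader while missing a low-volume spammer, which is the kind of error the signals above invite.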
Many of the common “bot detectors” or “troll hunters” use machine learning techniques to return a “bot score.” What does this actually mean? The answer is very little.
In order to train a machine learning model, you have to start with a training set of users you “know” are bots, so the model can predict whether other users are similar to or different from them.
These tools and approaches are deeply flawed. In our experience, most people aren’t very good at identifying bots from public information alone.
The end result is a staggering margin of error, and one that builds in preconceptions and biases about Tweet volume, political beliefs, and user behavior. These issues rarely make it into media reports, but are often the reasons why some numbers are surprisingly large.
Much of what is being presented as categorical findings is in fact an extrapolated guess and not even close to being accurate. There isn’t really a bot behind every flag. This concern was articulated by one leading researcher in this Buzzfeed piece: https://t.co/WqydQjiYIE
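A short worked calculation shows why a flag is not the same as a bot. The numbers below are assumptions chosen only for illustration (a 5% share of bots among the accounts examined, and a detector that is right 90% of the time on both bots and real people); they are not measurements of Twitter or of any actual tool.

```python
def flag_precision(bot_share, sensitivity, specificity):
    """Return the fraction of flagged accounts that are actually bots.

    bot_share:   assumed fraction of examined accounts that are bots
    sensitivity: assumed fraction of bots the detector flags
    specificity: assumed fraction of real people the detector leaves unflagged
    """
    true_flags = bot_share * sensitivity
    false_flags = (1.0 - bot_share) * (1.0 - specificity)
    return true_flags / (true_flags + false_flags)


# Hypothetical inputs: 5% bots, detector 90% accurate on each group.
precision = flag_precision(bot_share=0.05, sensitivity=0.90, specificity=0.90)
print(f"{precision:.0%} of flagged accounts are actually bots")
```

Under these assumed inputs, roughly two out of three flagged accounts belong to real people, even though the detector sounds accurate on paper. Multiplying raw flag counts up into headline totals carries that error along with it.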
We continue to be committed to enabling academic research, at scale, using Twitter data. Our policies are written to support this work — including when the results are unflattering to Twitter.
However, we believe that to protect our efforts promoting healthy public conversations, there’s a need to speak up here: a lot of this “bot research” is not peer-reviewed and not reflective of the facts on any level.
These types of studies, which are covered widely in the media, do not stand up to scrutiny and undermine healthy public conversation, our singular mission as a company.
Oh, and if you see a suspicious account, use our new reporting feature and let us know. It helps our work to make this place better for everyone. Thanks for reading. https://t.co/kypOkCyWk9
