Problem Statement: Prevalence of Spam Accounts on Twitter
Twitter's policy defines four types of spam behavior: commercially-motivated spam, inauthentic engagements, coordinated activity, and coordinated harmful activity. These behaviors can be carried out by promotional accounts, false/fake accounts, bots, malicious accounts (trolls), and compromised accounts.
This report examines the prevalence of spam accounts on Twitter, evaluates Twitter's spam detection processes, and analyzes the effects of spam accounts on the platform. We use four methodologies to estimate the true rate of spam accounts and to assess whether Twitter's reported figures are accurate.

by Trevor Davis

Methodology for Estimating Spam Account Prevalence
1. Count and Extrapolate: Sample monetizable daily active user (mDAU) accounts, identify spam in the sample, and extrapolate the rate to the full mDAU population
2. mDAU Modeling: Create a machine learning model to distinguish spam from legitimate users
3. Twitter's Other Spam Systems: Apply Twitter's Smyte and Botmaker workflows to mDAU data
4. Scientific Consensus: Use state-of-the-art tools like Botometer to establish a baseline
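The count-and-extrapolate step can be illustrated with a minimal sketch. It assumes a simple random sample of mDAU accounts and uses a Wilson score interval to express sampling uncertainty; the report does not say which interval, if any, was used. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
import math


def wilson_interval(spam_count: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the spam proportion in a random sample.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= spam_count <= sample_size:
        raise ValueError("spam_count must be between 0 and sample_size")
    p_hat = spam_count / sample_size
    denom = 1 + z**2 / sample_size
    center = (p_hat + z**2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / sample_size + z**2 / (4 * sample_size**2))
        / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


def extrapolate(spam_count: int, sample_size: int, population_size: int) -> dict[str, float]:
    """Scale the sampled spam rate and its interval up to the full population."""
    low, high = wilson_interval(spam_count, sample_size)
    rate = spam_count / sample_size
    return {
        "rate": rate,
        "estimated_spam": rate * population_size,
        "low": low * population_size,
        "high": high * population_size,
    }


# Hypothetical inputs: 100 spam accounts found in a sample of 1,000,
# extrapolated to a population of 1,000,000 accounts.
result = extrapolate(spam_count=100, sample_size=1_000, population_size=1_000_000)
print(result)
```

The interval only captures sampling error. It says nothing about labeling error, which is a separate concern when reviewers disagree about what counts as spam.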
Key Findings on Spam Account Prevalence
1. Manual Review: Re-estimated mDAU spam account rate of 10.32% (vs Twitter's 0.72%)
2. Machine Learning Model: 8.01% of active mDAU accounts predicted to be spam
3. Scientific Literature: Studies consistently report 9-21% spam accounts
4. Spam Activity: Spam accounts nearly twice as active as non-spam accounts
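To see how account prevalence and activity level combine, the sketch below computes the share of total activity attributable to spam accounts. It assumes every spam account is exactly `activity_ratio` times as active as an average non-spam account. The report says "nearly twice", so a ratio of 2.0 is an upper-end approximation, and the function name is hypothetical.

```python
def spam_activity_share(spam_rate: float, activity_ratio: float) -> float:
    """Fraction of all activity generated by spam accounts.

    spam_rate: fraction of accounts that are spam (0 to 1).
    activity_ratio: average spam-account activity divided by average
        non-spam-account activity (assumed, not measured here).
    """
    if not 0.0 <= spam_rate <= 1.0:
        raise ValueError("spam_rate must be between 0 and 1")
    if activity_ratio <= 0:
        raise ValueError("activity_ratio must be positive")
    spam_activity = spam_rate * activity_ratio
    legit_activity = 1.0 - spam_rate
    return spam_activity / (spam_activity + legit_activity)


# Re-estimated rate from the manual review, with an assumed ratio of 2.0.
print(f"{spam_activity_share(0.1032, 2.0):.1%}")
# Twitter's reported rate under the same assumed ratio, for comparison.
print(f"{spam_activity_share(0.0072, 2.0):.1%}")
```

Under these assumptions, a 10.32% spam account rate corresponds to roughly 18.7% of activity, compared with about 1.4% at the reported 0.72% rate.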
Issues with Twitter's Spam Evaluation Process
1. Uncertainty Handling: Process collapses uncertainty toward "Good" classification
2. Limited Information: Evaluators lack access to crucial non-public account data
3. Coordination Detection: Insufficient investigation of coordinated inauthentic behavior
4. Off-Platform Activity: Overlooks common cross-platform amplification tactics
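The uncertainty-handling issue can be made concrete with a small sketch. It assumes reviewers can assign one of three labels, "spam", "good", or "unsure"; these label names and the counts are hypothetical and do not describe Twitter's actual annotation scheme. Treating every uncertain account as good yields the lowest possible spam rate, while treating every uncertain account as spam yields the highest.

```python
from collections import Counter

ALLOWED_LABELS = {"spam", "good", "unsure"}  # hypothetical label set


def spam_rate_bounds(labels: list[str]) -> tuple[float, float]:
    """Return (lower, upper) bounds on the spam rate in a labeled sample.

    lower: every "unsure" account is counted as good.
    upper: every "unsure" account is counted as spam.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    unknown = set(counts) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    lower = counts["spam"] / total
    upper = (counts["spam"] + counts["unsure"]) / total
    return lower, upper


# Hypothetical sample: 5 spam, 10 unsure, 85 good.
sample = ["spam"] * 5 + ["unsure"] * 10 + ["good"] * 85
print(spam_rate_bounds(sample))  # (0.05, 0.15)
```

A process that always resolves "unsure" to "good" reports only the lower bound, so the gap between the two numbers indicates how much the headline rate could be understated.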
Effects of Spam Accounts on Twitter
1. Scams and Fraud: Cryptocurrency schemes, counterfeit products, phishing
2. Stock Manipulation: Coordinated efforts to influence market perceptions
3. Political Disinformation: Distortion of discourse, amplification of false narratives
4. Public Health Misinformation: Spread of unverified medical claims, conspiracy theories
Recommendations for Improving Spam Detection
1. Update Training Materials: Regularly revise to address evolving spam tactics
2. Enhance Coordination Detection: Implement tools to identify inauthentic networks
3. Improve Annotator Assessment: Regularly evaluate and retrain spam detection staff
4. Cross-Platform Analysis: Consider off-Twitter activity in spam determination
Implications for Twitter's Business Model
A spam prevalence higher than Twitter reports has significant implications for Twitter's advertising-based business model. Spam accounts generate a disproportionate amount of activity, potentially inflating the engagement metrics presented to advertisers.
Additionally, the presence of spam accounts degrades the user experience for legitimate users, potentially impacting user retention and growth. Twitter must balance aggressive spam removal with the risk of falsely flagging legitimate accounts.
Conclusions and Next Steps
Our analysis suggests that Twitter's reported spam account rate significantly underestimates the true prevalence of spam on the platform. The company's spam evaluation process has several flaws that lead to systematic undercounting of spam accounts.
To address these issues, Twitter should:
1. Revise Methodology: Implement more robust spam detection techniques
2. Increase Transparency: Provide more detailed reporting on spam account prevalence
3. Invest in Technology: Develop advanced AI/ML tools for spam detection