The tech community is engaged in a heated discussion about X's recent terms of service update, which explicitly allows the platform to use user content for AI training. While some view this as a natural evolution of social media platforms, others raise concerns about data quality and the implications for creative communities.
The Reality of Data Scraping
Community discussions reveal a pragmatic view of data collection: many users acknowledge that public content is likely being scraped for AI training regardless of any terms of service. As one community member points out, posting on X is essentially equivalent to publishing a blog post, and the platform is arguably being more transparent about its intentions than other social media platforms.
Quality Concerns and Bot Content
A significant concern emerging from the community is the quality of the training data itself. Users question what share of X's dataset consists of genuine human posts rather than bot-generated content. This raises doubts about how effective AI models trained on such data can be, with some comparing the dataset to the mortgage-backed securities of 2006, whose underlying quality was hard to judge from the outside.
Impact on Creative Communities
The artistic community's response to this policy change has been particularly negative. Artists have long been strongly opposed to generative AI, and this move could prompt creative communities to leave the platform. It highlights a growing tension between content creators and platforms seeking to use user-generated content for AI development.
Privacy and Data Control
While the new terms of service focus primarily on public posts, discussion continues about the distinction between public and private content. The platform's previous policy explicitly excluded posts from private accounts from being used to train Grok, its AI chatbot, but the new terms do not clearly draw this distinction.
Future Implications
The community discussion extends beyond X to broader industry trends, including speculation about how other tech giants will approach AI training and whether machine learning might run locally on personal devices. Apple, for instance, is mentioned as potentially well positioned to offer personalized AI features through its unified memory architecture and on-device vector databases.
Legal and Copyright Considerations
An interesting point raised in the community discussion is the asymmetry in copyright enforcement between large corporations and individual users. While companies can freely use public internet content for AI training, individual users face severe consequences for copyright infringement.
The new terms take effect on November 15, 2024, and users who continue to use the platform after that date automatically agree to these conditions. The debate continues over whether this represents a necessary evolution in social media or a worrying trend for data rights and privacy.