How X's "For You" algorithm actually works

Updated: January 21, 2026

Here’s the inside scoop that changes everything about how X’s recommendations work now. X didn’t just swap out one algorithm for another. They open-sourced the entire “For You” feed architecture, and what’s inside reveals a fundamentally different approach to content discovery than any other platform. I’ve reviewed the entire technical documentation, and the architecture is more sophisticated than the initial announcements suggested. This isn’t engagement metrics versus AI. It’s a multi-stage system that eliminates hand-engineered features entirely and lets the Grok-based transformer learn what matters from your behaviour alone.

What X actually built

The system combines three distinct components that work together to generate your “For You” feed. Thunder handles in-network content from accounts you follow. Phoenix retrieval finds relevant out-of-network posts using a two-tower model. Then the Phoenix ranking transformer scores everything using predictions for 15 different engagement types. X’s technical team eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting by understanding your engagement history and using that to determine what content is relevant to you.

This is genuinely different from what Facebook, Instagram and TikTok are running. Those platforms layer machine learning on top of traditional engagement frameworks. X rebuilt the entire foundation.

Thunder solves the in-network problem

Thunder is an in-memory post store that tracks recent posts from every user on the platform. It consumes post-create and delete events from Kafka in real time and maintains per-user stores for original posts, replies, reposts and video posts. When you request your For You feed, Thunder serves posts from accounts you follow without hitting an external database. The system automatically trims posts older than the retention period.

This enables sub-millisecond lookups for in-network content. Instead of querying a database for “show me recent posts from the 500 accounts this user follows,” Thunder already has that data indexed in memory and ready to serve. For creators with engaged followers, this means your posts reach their feeds faster than traditional database-backed systems can.
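To make the idea concrete, here is a minimal sketch of a per-author in-memory store with retention trimming. It is not X's implementation: the names `Post`, `InMemoryPostStore`, `on_create`, `on_delete` and `recent_from` are hypothetical, the Kafka consumer is replaced by direct method calls, and the sketch assumes events for each author arrive in time order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: float  # seconds since epoch


class InMemoryPostStore:
    """Toy per-author store of recent posts (hypothetical, not X's code)."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._by_author = defaultdict(deque)  # author_id -> posts, oldest first

    def on_create(self, post: Post) -> None:
        # Stand-in for handling a post-create event from the event stream.
        self._by_author[post.author_id].append(post)

    def on_delete(self, author_id: int, post_id: int) -> None:
        # Stand-in for handling a delete event.
        posts = self._by_author.get(author_id)
        if posts is not None:
            self._by_author[author_id] = deque(
                p for p in posts if p.post_id != post_id
            )

    def trim(self, now: float) -> None:
        # Drop posts older than the retention period.
        cutoff = now - self.retention_seconds
        for posts in self._by_author.values():
            while posts and posts[0].created_at < cutoff:
                posts.popleft()

    def recent_from(self, followed: list[int], now: float) -> list[Post]:
        # Serve in-network candidates straight from memory, newest first.
        self.trim(now)
        found = [p for a in followed for p in self._by_author.get(a, ())]
        return sorted(found, key=lambda p: p.created_at, reverse=True)
```

The point of the design is that the feed request only touches dictionaries and deques that are already in memory, so the cost scales with the number of accounts you follow rather than with a database round trip.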

Phoenix retrieval discovers what you don’t know exists

The retrieval component uses a two-tower model that encodes both users and posts into embeddings. The user tower processes your features and engagement history into a single embedding. The candidate tower encodes every post on the platform into embeddings. The system then runs a dot-product-based similarity search to find the top posts that match your interests.

This is how X surfaces relevant content from accounts you don’t follow. The similarity search operates across the global corpus of posts, not just your network. A user interested in AI research who has never followed anyone in that space will still see relevant AI content because the two-tower model learned to encode “interested in AI” from their engagement patterns on other content.
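The retrieval step is easier to picture with a toy example. The sketch below skips the towers entirely and treats the embeddings as given vectors, then does an exhaustive dot-product search. The function names and the two-dimensional vectors are made up for illustration, and a production system would presumably use an approximate nearest-neighbour index rather than scanning every post.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_embedding, post_embeddings, k):
    """Return the ids of the k posts whose embeddings score highest
    against the user embedding (exhaustive search, for illustration)."""
    scored = (
        (dot(user_embedding, emb), post_id)
        for post_id, emb in post_embeddings.items()
    )
    return [post_id for _, post_id in heapq.nlargest(k, scored)]


# Hypothetical embeddings: dimension 0 ~ "AI", dimension 1 ~ "cooking".
user = [0.9, 0.1]
posts = {
    "ai_paper": [1.0, 0.0],
    "cooking": [0.0, 1.0],
    "ml_thread": [0.8, 0.2],
}
top_two = retrieve_top_k(user, posts, k=2)
```

Nothing in the search looks at who the user follows. Only the geometry of the two embedding spaces matters, which is why out-of-network posts can surface.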

A notable design choice is that X uses hash-based embeddings instead of a dedicated embedding row for every ID. Multiple hash functions map each ID to rows in smaller shared tables, which reduces model size and speeds up inference while maintaining relevance quality.
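Here is a rough sketch of the general hash-embedding technique, under several assumptions: the table sizes, the use of SHA-256 as the hash, and summing the looked-up rows are all illustrative choices, not details taken from X's release. In a trained model the table rows would be learned parameters; here they are fixed placeholder numbers.

```python
import hashlib

NUM_BUCKETS = 8   # rows per table (tiny, for illustration)
DIM = 4           # embedding width
NUM_HASHES = 2    # number of independent hash functions


def bucket(entity_id: str, seed: int) -> int:
    """Map an id to a row index; each seed acts as a different hash function."""
    digest = hashlib.sha256(f"{seed}:{entity_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


# One small table per hash function, filled with placeholder values.
tables = [
    [
        [(seed + 1) * 0.01 * (row * DIM + col) for col in range(DIM)]
        for row in range(NUM_BUCKETS)
    ]
    for seed in range(NUM_HASHES)
]


def hash_embedding(entity_id: str) -> list[float]:
    """Look up one row per hash function and sum them (an assumed way
    of combining the rows)."""
    rows = [tables[seed][bucket(entity_id, seed)] for seed in range(NUM_HASHES)]
    return [sum(values) for values in zip(*rows)]
```

The memory cost is `NUM_HASHES * NUM_BUCKETS * DIM` numbers no matter how many distinct IDs exist, and two IDs only share an embedding if they collide under every hash function at once.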

Candidate isolation makes ranking consistent

After Thunder and Phoenix retrieval generate candidate posts, the Phoenix transformer ranks everything together. This is where the Grok-based architecture becomes critical. The transformer uses special attention masking to prevent candidates from attending to each other during scoring. Each candidate attends only to your context and engagement history, never to the other candidates in the batch.

This design choice solves a fundamental problem with batch scoring systems. If candidate A’s score depends on candidate B being in the same batch, scores become inconsistent across requests. You can’t cache scores. You can’t parallelize scoring efficiently. Candidate isolation means the score for a specific post, given your context, remains constant regardless of which other posts are in the batch. This makes scores cacheable and consistent.
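A small sketch shows what such a mask could look like. It assumes the input sequence is laid out as context positions (user and engagement history) followed by one position per candidate, and that context positions attend to each other freely. Both are assumptions for illustration, and `candidate_isolation_mask` is a hypothetical name.

```python
def candidate_isolation_mask(num_context: int, num_candidates: int) -> list[list[bool]]:
    """Build a boolean attention mask where mask[i][j] is True if
    position i may attend to position j.

    Positions 0..num_context-1 hold the user context and history.
    The remaining positions hold one candidate each.
    """
    n = num_context + num_candidates
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < num_context:
                # Every position may look at the user context.
                mask[i][j] = True
            elif i == j:
                # A candidate may look at itself, but at no other candidate.
                mask[i][j] = True
    return mask


mask = candidate_isolation_mask(num_context=3, num_candidates=2)
```

Because a candidate's row in the mask never includes another candidate's column, adding or removing other candidates from the batch cannot change what that candidate sees.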

The transformer predicts probabilities for 15 different engagement types: favourite, reply, repost, quote, click, profile click, video view, photo expand, share, dwell, follow author, not interested, block author, mute author and report. Each prediction represents the model’s estimate of how likely you are to take that action on this specific post.

Weighted scoring combines predictions into relevance

The weighted scorer combines the 15 probability predictions into a final relevance score. Positive actions like favourites, reposts, and shares receive positive weights. Negative actions like block, mute and report get negative weights. The formula is straightforward: final score equals the sum of each weight multiplied by its corresponding probability.

This weighted combination means the system can balance different types of engagement based on what actually matters for user satisfaction. A post you’re 80% likely to like but 5% likely to report scores differently than a post you’re 60% likely to like but 0% likely to report. The model learned these patterns from millions of user engagement sequences.
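The like-versus-report comparison can be worked through in a few lines. The weights below are invented for illustration; the article does not list X's actual values, and only four of the 15 actions are included to keep the example short.

```python
# Hypothetical weights for illustration only, not X's real values.
WEIGHTS = {
    "favourite": 1.0,
    "repost": 1.0,
    "share": 1.0,
    "report": -10.0,
}


def weighted_score(probabilities: dict[str, float],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Final score = sum over actions of weight * predicted probability.
    Actions missing from the predictions count as probability 0."""
    return sum(w * probabilities.get(action, 0.0) for action, w in weights.items())


post_a = {"favourite": 0.80, "report": 0.05}  # more likes, some report risk
post_b = {"favourite": 0.60, "report": 0.00}  # fewer likes, no report risk

score_a = weighted_score(post_a)
score_b = weighted_score(post_b)
```

With these made-up weights, post B outranks post A, because the penalty on a 5% report probability outweighs the extra 20 points of like probability. Different weights would shift that trade-off.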

After weighted scoring, the author diversity scorer attenuates repeated author scores to ensure feed diversity. If your top 20 candidates all come from the same three accounts, the diversity scorer reduces scores for repeated appearances to prevent your feed from becoming monotonous.
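One simple way to attenuate repeated authors is a geometric decay, sketched below. The decay factor of 0.5 and the exact formula are assumptions for illustration, since the article only says that repeated author scores are reduced. The sketch also assumes scores are non-negative.

```python
from collections import Counter


def attenuate_repeated_authors(candidates, decay=0.5):
    """candidates: list of (post_id, author_id, score) tuples.

    Walk candidates from best to worst and multiply each score by
    decay ** (number of earlier posts by the same author), then
    re-sort by the adjusted score.
    """
    seen = Counter()
    rescored = []
    for post_id, author_id, score in sorted(
        candidates, key=lambda c: c[2], reverse=True
    ):
        rescored.append((post_id, author_id, score * decay ** seen[author_id]))
        seen[author_id] += 1
    return sorted(rescored, key=lambda c: c[2], reverse=True)


ranked = attenuate_repeated_authors([
    ("p1", "alice", 0.9),
    ("p2", "alice", 0.8),
    ("p3", "bob", 0.7),
])
```

In this example Alice's second post is halved, so Bob's post moves into second place even though its raw score was lower.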

Filtering happens at two stages

Pre-scoring filters run before the model sees candidates. These remove duplicates, posts older than the threshold, your own posts, content from blocked or muted accounts, posts containing your muted keywords, posts you’ve already seen and subscription content you can’t access. The repost deduplication filter removes multiple reposts of the same underlying content.

Post-selection filters run after the system selects the top candidates. The visibility filter removes posts flagged as deleted, spam, violence or gore. The conversation deduplication filter handles multiple branches of the same conversation thread to prevent showing you five replies to the same original post.

According to X’s technical documentation, this two-stage filtering approach reduces unnecessary computation. Running expensive ML predictions on posts that will eventually get filtered wastes resources. Pre-scoring filters eliminate ineligible content before it reaches the model.
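The two stages can be sketched as a pair of plain functions around a scoring step. Everything here is a simplified stand-in: the `Candidate` fields, the single `flagged` boolean, and the function names are hypothetical, and only a handful of the filters described above are included.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: int
    author_id: int
    age_hours: float
    text: str
    flagged: bool = False  # stand-in for deleted/spam/violence flags


def pre_scoring_filter(candidates, viewer_id, blocked, muted_words,
                       seen_ids, max_age_hours):
    """Drop ineligible candidates before any model call."""
    kept, kept_ids = [], set()
    for c in candidates:
        if c.post_id in kept_ids or c.post_id in seen_ids:
            continue  # duplicate or already seen
        if c.author_id == viewer_id or c.author_id in blocked:
            continue  # viewer's own post, or a blocked/muted author
        if c.age_hours > max_age_hours:
            continue  # older than the threshold
        if any(word in c.text.lower() for word in muted_words):
            continue  # contains a muted keyword
        kept_ids.add(c.post_id)
        kept.append(c)
    return kept


def post_selection_filter(selected):
    """Visibility check applied only to the candidates that were selected."""
    return [c for c in selected if not c.flagged]


def build_feed(candidates, score, k, **viewer):
    eligible = pre_scoring_filter(candidates, **viewer)
    ranked = sorted(eligible, key=score, reverse=True)  # the expensive step
    return post_selection_filter(ranked[:k])
```

The `score` callable stands in for the ranking model. It only ever runs on candidates that survived the first stage, which is the computation saving the documentation describes.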

No hand-engineered features change everything

Traditional recommendation systems rely heavily on engineered features. Follower count, engagement rate, author authority scores, content freshness multipliers, topic category assignments and dozens of other manually designed signals get fed into the model. Each feature represents an assumption about what matters.

X eliminated all of that. The Grok-based transformer receives your engagement history sequence and candidate post data. No follower count features. No manually tuned freshness multipliers. No engineered authority scores. The model learns directly from patterns in your behaviour what signals matter for predicting your future engagements.

This approach significantly reduces the complexity of data pipelines and serving infrastructure. Every engineered feature requires computation, storage and maintenance. When platform dynamics shift, hand-engineered features need manual retuning. A pure learning approach adapts automatically as user behaviour changes.

For creators, this shift is significant. The old system rewarded optimizing for specific engineered features. Game the follower count. Maximize early engagement velocity. Hit the right topic categories. The new system rewards content that actually drives genuine engagement from people interested in your topic. You can’t game features that don’t exist.

Multi-action prediction reveals true relevance

Predicting 15 distinct engagement types rather than a single relevance score provides the system with a much richer signal about content quality. A post that gets likes but also gets reports tells you something different than a post that gets likes without negative signals. A post that drives profile clicks and follows reveals strong creator-viewer alignment. A post that generates long dwell time but no explicit engagement still indicates value.

The weighted scorer can balance these signals according to what correlates with user satisfaction. Industry research on recommendation system design patterns suggests that optimizing for multiple signals produces better long-term outcomes than optimizing for engagement alone.

For creators building audiences, understanding this multi-action framework matters. The system isn’t just asking “will people like this?” It’s asking “will people like, share, click through, follow me and not report this?” Content that scores well across multiple positive signals while avoiding negative signals ranks higher than content that maximizes just one metric.

What this means for small accounts

The original article predicted that Grok would solve the new account visibility problem by evaluating content quality directly. The reality is more nuanced but still significant. The ranking model doesn’t use the author’s follower count or past engagement totals as features. A brilliant thread from someone with 200 followers gets scored using the same process as content from verified accounts with millions of followers.

However, the two-tower retrieval model still needs to find your content during similarity search. If the candidate tower’s embedding of your post doesn’t land close to the embeddings of the users who would enjoy it, your content might not surface during retrieval, even if it would score well during ranking. This means small accounts still face a discovery challenge, just at a different stage of the pipeline.

The advantage is that once your content reaches the ranking stage, it competes fairly. The system can’t penalize you for having a small following because follower count isn’t a feature. Your content gets scored based on predicted engagement from the specific user viewing it, nothing else. For niche experts and community builders, this creates a genuine opportunity if you can solve the initial retrieval problem.

Creator strategy shifts from signals to substance

The old playbook was clear: optimize for engagement signals that the algorithm could measure. Get replies early. Drive shares within the first hour. Maximize quote tweets. These tactics worked because traditional algorithms used engagement velocity as a proxy for quality.

The new system learns what constitutes quality directly from user behaviour sequences. If users who engage with your content tend to follow you, dwell longer, share more and report less, the model learns to surface your content to similar users. You can’t fake this pattern by gaming individual metrics.

This doesn’t mean engagement tactics become irrelevant. Content that generates genuine engagement still performs better because the model predicts engagement probabilities. But the system is built to distinguish between engagement that predicts long-term user satisfaction and engagement that just generates clicks. A controversial take that gets 500 angry replies might generate short-term engagement without positive downstream effects.

The practical guidance is straightforward. Create content that genuinely serves your audience’s interests. Write threads that people want to share because they found them valuable. Post insights that make people want to follow you for more. Build content that viewers engage with positively across multiple action types. The weighted scorer rewards breadth of positive signals, not just volume of any single metric.

Technical architecture enables real innovation

The composable pipeline architecture X built enables rapid iteration on the recommendation system. The candidate-pipeline framework separates pipeline execution from business logic. Sources, hydrators, filters, scorers and selectors are independent components that can be added, removed or modified without rebuilding the entire system.

This matters because recommendation systems need constant refinement. User behaviour shifts. Content formats evolve. Platform dynamics change. A tightly coupled architecture makes iteration slow and risky. X’s framework enables parallel execution of independent stages with graceful error handling and easy addition of new components.
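A stripped-down version of that separation might look like the sketch below. The `Pipeline` class and its component signatures are hypothetical, candidates are plain dictionaries, and stages run sequentially here rather than in parallel. The part worth noticing is that the runner knows nothing about what any individual source, filter or scorer does.

```python
class Pipeline:
    """Toy candidate pipeline: the runner owns execution order,
    the components own the business logic."""

    def __init__(self, sources, hydrators, filters, scorers, selector):
        self.sources = sources      # query -> list of candidates
        self.hydrators = hydrators  # (query, candidate) -> enriched candidate
        self.filters = filters      # (query, candidate) -> bool (keep?)
        self.scorers = scorers      # (query, candidate) -> scored candidate
        self.selector = selector    # (query, candidates) -> final list

    def run(self, query):
        candidates = []
        for source in self.sources:
            try:
                candidates.extend(source(query))
            except Exception:
                # A failing source degrades the feed instead of failing it.
                continue
        for hydrate in self.hydrators:
            candidates = [hydrate(query, c) for c in candidates]
        for keep in self.filters:
            candidates = [c for c in candidates if keep(query, c)]
        for score in self.scorers:
            candidates = [score(query, c) for c in candidates]
        return self.selector(query, candidates)


# Example components (all made up for illustration).
def in_network(query):
    return [{"id": 1, "author": "a"}, {"id": 2, "author": "b"}]


def broken_source(query):
    raise RuntimeError("backend down")


def not_blocked(query, c):
    return c["author"] not in query["blocked"]


def score_by_id(query, c):
    return {**c, "score": float(c["id"])}


def top_k(query, candidates):
    ordered = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return ordered[: query["k"]]


pipeline = Pipeline([in_network, broken_source], [], [not_blocked],
                    [score_by_id], top_k)
feed = pipeline.run({"blocked": {"b"}, "k": 5})
```

Adding a new filter or swapping a scorer means changing one list passed to the constructor, and the failing source in the example is skipped without taking the whole request down.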

For platform observers and competitors, this architectural approach signals serious long-term investment in recommendation quality. Building a flexible framework rather than a monolithic system indicates X expects to iterate frequently. The open-source release suggests confidence that the architecture itself provides a competitive advantage, not just the specific model weights.

What competitors should actually be concerned about

X didn’t just build a better algorithm. They rebuilt the entire recommendation infrastructure to enable continuous learning and rapid iteration. The elimination of hand-engineered features means the system adapts automatically as user behaviour changes. The candidate isolation approach makes scoring consistent and cacheable. The multi-stage filtering reduces wasted computation. The composable architecture enables fast experimentation.

If this system works well in production, it creates a structural competitive advantage. Better discovery leads to more valuable time spent, which attracts creators, which improves content quality, which in turn feeds better discovery. The flywheel effect is real if the foundation works.

The four-to-six-week timeline mentioned in the original announcement has passed. The system is now live, and the technical implementation is open source. Early reports from creators suggest feed quality has improved, though quantitative validation will take longer to establish. The real test is whether this architecture sustains improvement over months as user behaviour adapts to the new system.

The payment infrastructure problem remains

The original article identified creator payment as a critical gap, and that observation still holds. Algorithm improvements don’t retain top creators if they can’t earn meaningfully from their audience. X acknowledged payment infrastructure needs work, and until that improves, expect creators to continue hedging across multiple platforms.

Research on creator economy trends shows payment structure ranks as the second most important factor creators consider when choosing platforms, after audience size. The creator economy market is projected to grow significantly, with platforms that master both discovery and monetization gaining an advantage.

An excellent recommendation system supporting creators who can’t earn is fundamentally incomplete. The technical foundation X built enables great discovery. Payment infrastructure determines whether that discovery translates to creator retention.

You’re ahead of the curve now. Implement immediately.

The technical architecture X released represents a genuine shift in how social platforms approach content discovery. Moving from hand-engineered features to pure learning-based systems enables faster adaptation and reduces the gaming that plagues traditional algorithms. Multi-action prediction provides richer signals about content quality than single-metric optimization. Candidate isolation makes scoring consistent and cacheable.

For creators building audiences on X, the strategic shift is clear. Focus on content that generates genuine positive engagement across multiple action types. Don’t optimize for gaming features that no longer exist. Build substance that makes viewers want to like, share, follow and engage deeply without triggering negative signals.

For platform builders and competitors, the open-source release provides a detailed blueprint of where social recommendation systems are heading. The elimination of hand-engineered features, the two-tower retrieval approach, the Grok-based ranking transformer, and the composable pipeline architecture all represent technical decisions worth studying closely.

The next several months will reveal whether this architecture delivers sustained improvement in discovery quality and creator satisfaction. Early signals look promising. The technical foundation is sound. Whether execution matches the architecture’s potential will determine if other platforms follow X’s lead or maintain their current approaches.

Watch how your content performs under this system. Test what drives genuine engagement versus superficial metrics. Start optimizing for substance over signals today. The recommendation landscape is shifting, and understanding the technical foundation puts you ahead of competitors still optimizing for the old playbook.

You’ve got the inside scoop now. Implement immediately.