Introduction
LinkedIn's semantic search marks a shift from traditional keyword matching toward understanding user intent: what a searcher means rather than only the words they type. This case study explores how LinkedIn built a semantic job search system serving hundreds of millions of users.
The Evolution of Search
Traditional Keyword Search
- Exact matching: A query for "software engineer" finds only postings that contain those exact terms
- Synonym expansion: Manual mappings of related terms
- TF-IDF ranking: Statistical relevance scoring
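To make the TF-IDF bullet concrete, here is a minimal sketch of TF-IDF scoring, assuming whitespace tokenization and a smoothed IDF. The smoothed formula is one common variant; the source does not say which formula LinkedIn used, and the function name `tfidf_scores` and the sample postings are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(query: str, docs: list[str]) -> list[float]:
    """Score each document against the query by summing TF-IDF weights of query terms."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                # Smoothed IDF: rarer terms get more weight; the +1 keeps it positive.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


jobs = ["senior software engineer", "software engineer intern", "registered nurse"]
print(tfidf_scores("software engineer", jobs))  # the nurse posting scores 0.0
print(tfidf_scores("coding jobs", jobs))        # [0.0, 0.0, 0.0]: no shared terms
```

The second call shows the vocabulary gap that motivates semantic search: "coding jobs" scores zero against every software posting because no terms overlap.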
Semantic Search
- Intent understanding: A query for "coding jobs" surfaces software engineering roles
- Embedding similarity: Neural representations capture meaning
- Hybrid retrieval: Combines lexical and semantic signals
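One common way to combine lexical and semantic signals is reciprocal rank fusion (RRF), which merges ranked lists without calibrating their raw scores against each other. The sketch below assumes toy three-dimensional embeddings and a hypothetical keyword ranking; the source does not say which fusion method LinkedIn uses, and `k = 60` is simply the constant commonly used with RRF.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranking lists doc ids, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings for the query "coding jobs" and three postings.
query_vec = [0.9, 0.1, 0.0]
job_vecs = {
    "swe": [0.8, 0.2, 0.1],
    "data_eng": [0.6, 0.3, 0.2],
    "nurse": [0.0, 0.1, 0.9],
}
semantic = sorted(job_vecs, key=lambda j: cosine(query_vec, job_vecs[j]), reverse=True)
lexical = ["data_eng"]  # assume only this posting literally contains "coding"
print(rrf_fuse([lexical, semantic]))  # ['data_eng', 'swe', 'nurse']
```

A posting that ranks well in both lists rises to the top, while the semantic list still surfaces `swe` even though it never matched the keyword.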
Architecture Overview
Query Understanding
- Query encoding: Transform queries into dense vectors
- Intent classification: Identify search intent (job title, skill, location)
- Query expansion: Add semantically related terms
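A toy sketch of two of these steps, assuming a small hypothetical table of term embeddings: `encode_query` mean-pools term vectors as a stand-in for a trained query encoder, and `expand_query` adds vocabulary terms whose cosine similarity to a query term meets a threshold. The vectors, the 0.95 threshold, and both function names are illustrative assumptions rather than LinkedIn's implementation, and intent classification is not covered.

```python
import math

# Hypothetical term embeddings; a production system would use a trained encoder.
TERM_VECS: dict[str, list[float]] = {
    "coding": [0.9, 0.1, 0.0],
    "programming": [0.85, 0.15, 0.05],
    "software": [0.8, 0.2, 0.1],
    "nursing": [0.0, 0.1, 0.95],
    "jobs": [0.1, 0.9, 0.1],
}
DIM = 3


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def encode_query(query: str) -> list[float]:
    """Mean-pool the vectors of known terms into one dense query vector."""
    vecs = [TERM_VECS[t] for t in query.lower().split() if t in TERM_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def expand_query(query: str, threshold: float = 0.95) -> list[str]:
    """Append vocabulary terms that are close in embedding space to a query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        vec = TERM_VECS.get(term)
        if vec is None:
            continue
        for cand, cand_vec in TERM_VECS.items():
            if cand not in expanded and cosine(vec, cand_vec) >= threshold:
                expanded.append(cand)
    return expanded


print(encode_query("coding jobs"))  # approximately [0.5, 0.5, 0.05]
print(expand_query("coding jobs"))  # ['coding', 'jobs', 'programming', 'software']
```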
Document Indexing
- Job embedding generation: Pre-compute embeddings for all jobs
- Multi-field indexing: Separate embeddings for title, description, skills
- ANN index building: Approximate nearest neighbor (ANN) structures for efficient similarity search
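The source does not name the ANN structure LinkedIn uses. As one illustration, random-hyperplane locality-sensitive hashing (LSH) approximates cosine-similarity search by hashing each vector to a bit signature and scoring only the vectors that share the query's bucket. The sketch keeps one index per field, in the spirit of the multi-field bullet; the class and helper names, the toy vectors, and the single hash table are assumptions (real deployments typically use several tables, or graph-based indexes such as HNSW).

```python
import math
import random


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Single-table random-hyperplane LSH for approximate cosine search."""

    def __init__(self, dim: int, n_planes: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: dict[tuple[bool, ...], list[tuple[str, list[float]]]] = {}

    def _signature(self, vec: list[float]) -> tuple[bool, ...]:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, doc_id: str, vec: list[float]) -> None:
        self.buckets.setdefault(self._signature(vec), []).append((doc_id, vec))

    def search(self, query: list[float], k: int = 10) -> list[str]:
        # Exact scoring only within the query's bucket; other buckets are skipped.
        candidates = self.buckets.get(self._signature(query), [])
        ranked = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]


# One index per field (hypothetical 3-d embeddings for illustration).
FIELDS = ("title", "description", "skills")
indexes = {field: HyperplaneLSH(dim=3, seed=i) for i, field in enumerate(FIELDS)}


def index_job(job_id: str, field_vecs: dict[str, list[float]]) -> None:
    """Pre-compute step: add each field's embedding to that field's index."""
    for field, vec in field_vecs.items():
        indexes[field].add(job_id, vec)


def search_all_fields(query: list[float], k: int = 10) -> set[str]:
    """Union of candidates retrieved from every field index."""
    return {job for idx in indexes.values() for job in idx.search(query, k)}


index_job("swe", {"title": [0.8, 0.2, 0.1], "skills": [0.9, 0.1, 0.0]})
index_job("nurse", {"title": [0.0, 0.1, 0.9], "skills": [0.1, 0.0, 0.9]})
print(search_all_fields([0.8, 0.2, 0.1]))
```

Because `add` touches only one bucket, newly posted jobs can be indexed incrementally without a rebuild. The trade-off is recall: a near neighbor that hashes to a different bucket is never scored.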
Retrieval Pipeline
Query -> Query Encoder -> ANN Search -> Candidate Jobs -> Ranker -> Results
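The pipeline above can be read as a composition of stages. This minimal sketch wires them together with pluggable functions; exact top-n cosine search stands in for the ANN step, and the encoder, the recency-based ranker, and all names are hypothetical placeholders rather than LinkedIn's components.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    encode: Callable[[str], Vector],
    job_vecs: dict[str, Vector],
    rank: Callable[[str, list[str]], list[str]],
    n_candidates: int = 100,
) -> list[str]:
    """Query -> Query Encoder -> ANN Search -> Candidate Jobs -> Ranker -> Results."""
    q = encode(query)
    # Stand-in for ANN search: exact top-n by cosine similarity.
    candidates = sorted(job_vecs, key=lambda j: cosine(q, job_vecs[j]), reverse=True)
    return rank(query, candidates[:n_candidates])


# Hypothetical components for illustration.
job_vecs = {"swe": [0.8, 0.2, 0.1], "data_eng": [0.6, 0.3, 0.2], "nurse": [0.0, 0.1, 0.9]}
posted_days_ago = {"swe": 10, "data_eng": 2, "nurse": 1}


def toy_encoder(query: str) -> Vector:
    return [0.9, 0.1, 0.0]  # pretend every query encodes to a "software" direction


def recency_ranker(query: str, candidates: list[str]) -> list[str]:
    return sorted(candidates, key=lambda j: posted_days_ago[j])


print(retrieve("coding jobs", toy_encoder, job_vecs, recency_ranker, n_candidates=2))
# ['data_eng', 'swe']
```

Separating cheap candidate generation from the ranker lets the ranker apply richer features to a small candidate set; here the nurse posting never reaches the ranker even though it is the most recent.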
Technical Challenges
Scale Considerations
- Billions of job-query pairs for training
- Real-time embedding updates for new jobs
- Sub-millisecond latency requirements
Quality Improvements
- Contrastive learning on click data
- Hard negative mining for better discrimination
- Multi-task learning for diverse signals
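A minimal sketch of the first two bullets, assuming clicked (query, job) pairs as positives: an InfoNCE-style contrastive loss treats every other job in the batch as a negative, and `mine_hard_negative` adds the highest-scoring job the user did not click. The source does not give LinkedIn's loss, temperature, or mining rule; the function names, the 0.05 temperature, and the toy vectors are assumptions, and a real trainer would compute gradients with a framework such as PyTorch or JAX.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negative(query_vec: Vector, clicked_id: str, pool: dict[str, Vector]) -> str:
    """Return the non-clicked job the current model scores highest for this query."""
    negatives = [j for j in pool if j != clicked_id]
    return max(negatives, key=lambda j: dot(query_vec, pool[j]))


def info_nce_loss(
    query_vecs: list[Vector],
    pos_vecs: list[Vector],
    hard_neg_vecs: list[Vector],
    temperature: float = 0.05,
) -> float:
    """Mean InfoNCE loss; pos_vecs[i] is the clicked job for query_vecs[i]."""
    docs = pos_vecs + hard_neg_vecs  # other queries' positives double as negatives
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(query_vecs)


pool = {"swe": [1.0, 0.0], "data_eng": [0.9, 0.1], "nurse": [0.0, 1.0]}
queries = [[1.0, 0.0], [0.0, 1.0]]
hard = [pool[mine_hard_negative(queries[0], "swe", pool)]]  # picks "data_eng"
aligned = info_nce_loss(queries, [pool["swe"], pool["nurse"]], hard)
swapped = info_nce_loss(queries, [pool["nurse"], pool["swe"]], hard)
print(aligned < swapped)  # True: matching pairs get lower loss
```

Mining by current model score picks negatives that look relevant but were not chosen, which forces finer discrimination; the catch is that some of them may be genuinely relevant jobs the user simply skipped.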
Results and Impact
- 30% increase in job application rates
- Improved matching for non-obvious queries
- Better experience for international users
Lessons Learned
- Hybrid search outperforms both pure semantic and pure keyword search
- User feedback is essential for training
- Incremental deployment reduces risk