Case study · 2024-12-18 · 11 min read

Building LinkedIn's Semantic Search: From Keywords to Understanding

Explore how LinkedIn transformed its job search from keyword matching to semantic understanding using embeddings and neural retrieval.

Tags: LinkedIn, semantic search, embeddings, retrieval, NLP

Introduction

LinkedIn's semantic search moves job search away from matching the literal words in a query and toward modeling what the searcher is actually looking for. This case study explores how LinkedIn built a semantic job search system that serves hundreds of millions of users.

The Evolution of Search

Traditional Keyword Search

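  • Exact matching: "software engineer" finds only postings that contain those exact terms
  • Synonym expansion: Manual mappings of related terms
  • TF-IDF ranking: Statistical relevance scoring that weights how often a query term appears in a document by how rare that term is across all documents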
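To make the keyword baseline concrete, here is a minimal sketch of TF-IDF scoring with an optional synonym table. It assumes whitespace tokenization and the common smoothed IDF, log((1 + N) / (1 + df)) + 1; the `synonyms` mapping is a hypothetical stand-in for the manually curated mappings described above, not LinkedIn's actual data.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def build_idf(docs):
    """Smoothed IDF for every term that appears in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency, not raw counts
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_score(query, doc, idf, synonyms=None):
    """Sum of tf * idf over query terms, optionally expanded via a synonym table."""
    synonyms = synonyms or {}  # hypothetical manual mapping: term -> related terms
    terms = []
    for term in tokenize(query):
        terms.append(term)
        terms.extend(synonyms.get(term, []))
    tf = Counter(tokenize(doc))
    # Terms never seen in the corpus get idf 0 and contribute nothing.
    return sum(tf[t] * idf.get(t, 0.0) for t in terms)
```

With documents such as "senior software engineer" and "data analyst", the query "coding jobs" scores zero everywhere, because no query token appears in any posting. It only matches once someone hand-writes a mapping like `{"coding": ["software", "engineer"]}`, which is exactly the maintenance burden semantic search aims to remove.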

Semantic Search

  • Intent understanding: A query for "coding jobs" surfaces software engineering roles even when the postings never use the word "coding"
  • Embedding similarity: Neural representations capture meaning
  • Hybrid retrieval: Combines lexical and semantic signals
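The article does not say how LinkedIn merges lexical and semantic signals. One widely used way to combine two ranked lists is reciprocal rank fusion (RRF), sketched below as an illustrative substitute. The constant `k=60` is the value commonly used with RRF, and the job IDs are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["job_a", "job_b", "job_c"]   # e.g. from a keyword/TF-IDF retriever
semantic = ["job_c", "job_a", "job_d"]  # e.g. from embedding similarity
fused = reciprocal_rank_fusion([lexical, semantic])
```

RRF uses only ranks, so it sidesteps the problem that TF-IDF scores and cosine similarities live on different scales. Documents that rank well in both lists (here `job_a` and `job_c`) rise to the top.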

Architecture Overview

Query Understanding

  1. Query encoding: Transform queries into dense vectors
  2. Intent classification: Identify search intent (job title, skill, location)
  3. Query expansion: Add semantically related terms
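The three query-understanding steps can be illustrated with a toy sketch. A production system would use a trained neural encoder. Here, a hypothetical table of hand-written 3-dimensional vectors stands in for it, intents are hypothetical prototype vectors in the same space, and the 0.7 expansion threshold is an arbitrary illustrative choice.

```python
import math

# Hypothetical toy embeddings standing in for a trained query encoder.
# Dimensions loosely mean (title-ness, skill-ness, location-ness).
TOY_EMBEDDINGS = {
    "coding": [0.6, 0.8, 0.0],
    "software": [0.9, 0.4, 0.0],
    "engineer": [1.0, 0.2, 0.0],
    "python": [0.2, 1.0, 0.0],
    "london": [0.0, 0.0, 1.0],
    "remote": [0.1, 0.0, 0.9],
}

# Hypothetical intent prototypes living in the same toy space.
INTENT_PROTOTYPES = {
    "job_title": [1.0, 0.0, 0.0],
    "skill": [0.0, 1.0, 0.0],
    "location": [0.0, 0.0, 1.0],
}


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encode_query(query):
    """Step 1: mean-pool known token vectors into one unit-length query vector."""
    vecs = [TOY_EMBEDDINGS[t] for t in query.lower().split() if t in TOY_EMBEDDINGS]
    if not vecs:
        raise ValueError(f"no known tokens in {query!r}")
    mean = [sum(dim) / len(vecs) for dim in zip(*vecs)]
    return normalize(mean)


def classify_intent(query_vec):
    """Step 2: pick the intent whose prototype is most similar to the query."""
    return max(INTENT_PROTOTYPES,
               key=lambda name: dot(query_vec, normalize(INTENT_PROTOTYPES[name])))


def expand_query(query, query_vec, threshold=0.7):
    """Step 3: add vocabulary terms whose embeddings are close to the query vector."""
    seen = set(query.lower().split())
    scored = [(dot(query_vec, normalize(vec)), term)
              for term, vec in TOY_EMBEDDINGS.items() if term not in seen]
    return [term for sim, term in sorted(scored, reverse=True) if sim >= threshold]
```

With these toy vectors, "coding jobs" is classified as a skill query and expanded with python, software, and engineer. The unknown token "jobs" is simply ignored, which a real subword-based encoder would not need to do.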

Document Indexing

  1. Job embedding generation: Pre-compute embeddings for all jobs
  2. Multi-field indexing: Separate embeddings for title, description, skills
  3. ANN index building: Approximate nearest neighbor (ANN) structures for efficient similarity search over the job embeddings
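Multi-field indexing means a job has several precomputed embeddings rather than one. A simple way to use them at query time is a weighted sum of per-field similarities, sketched below. The field weights are hypothetical and only for illustration; the article does not say how LinkedIn combines fields.

```python
import math

# Hypothetical weights; in practice these would be tuned or learned.
FIELD_WEIGHTS = {"title": 0.5, "skills": 0.3, "description": 0.2}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_field_score(query_vec, job_field_vecs, weights=FIELD_WEIGHTS):
    """Weighted sum of cosine similarities between the query and each precomputed field embedding."""
    return sum(w * cosine(query_vec, job_field_vecs[field])
               for field, w in weights.items()
               if field in job_field_vecs)  # tolerate jobs with missing fields


# Precomputed (offline) per-field embeddings for two hypothetical jobs.
jobs = {
    "job_1": {"title": [1.0, 0.0], "skills": [1.0, 0.0], "description": [0.0, 1.0]},
    "job_2": {"title": [0.0, 1.0], "skills": [1.0, 0.0], "description": [1.0, 0.0]},
}
```

Keeping fields separate lets a title match count for more than a passing mention in a long description, which a single pooled embedding cannot express.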

Retrieval Pipeline

Query -> Query Encoder -> ANN Search -> Candidate Jobs -> Ranker -> Results
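The two-stage pipeline above can be sketched as follows. The `encode` and `rank` callables are hypothetical placeholders for the query encoder and the ranker. Exact top-k by dot product stands in for ANN search, which is fine for small examples but not for a full job index.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(query_vec, job_vecs, k):
    """Exact top-k by dot product; a stand-in for the ANN search stage."""
    return heapq.nlargest(k, job_vecs, key=lambda job_id: dot(query_vec, job_vecs[job_id]))


def search(query, encode, job_vecs, rank, k=100, n_results=10):
    """Query -> encoder -> candidate retrieval -> ranker -> results."""
    query_vec = encode(query)                                 # query encoder
    candidates = retrieve_candidates(query_vec, job_vecs, k)  # candidate jobs
    # The ranker rescores only the retrieved candidates, so it can afford richer features.
    return sorted(candidates, key=lambda job_id: rank(query, job_id), reverse=True)[:n_results]


# Hypothetical usage with toy vectors and a toy ranking signal.
job_vecs = {"a": [0.9, 0.1], "b": [0.2, 0.9], "c": [0.8, 0.3]}
boosts = {"a": 0.1, "b": 1.0, "c": 0.5}
results = search("coding jobs", lambda q: [1.0, 0.0], job_vecs,
                 lambda q, job_id: boosts[job_id], k=2, n_results=2)
```

Note that job "b" never reaches the ranker even though the ranker would score it highest: anything the retrieval stage misses is lost, which is why recall of the candidate stage matters so much.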

Technical Challenges

Scale Considerations

  • Billions of job-query pairs for training
  • Real-time embedding updates for new jobs
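  • Strict latency requirements: query encoding, ANN search, and ranking all run on the request path and must fit together within an interactive response budget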
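Real-time updates and low latency both depend on the ANN index. The article does not name LinkedIn's index type, so the sketch below uses random-hyperplane locality-sensitive hashing (LSH) as an illustrative substitute. A query scores only the jobs that share its bucket, and `upsert`/`remove` let new or expired postings change the index without a rebuild. The class and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSHIndex:
    """Single-table random-hyperplane LSH index that supports in-place updates."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the bucket key.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(set)  # bucket key -> job ids
        self.vectors = {}                # job id -> embedding
        self.keys = {}                   # job id -> bucket key

    def _bucket_key(self, vec):
        # Record which side of each hyperplane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes)

    def upsert(self, job_id, vec):
        """Insert a new job or re-index an edited one without rebuilding the index."""
        self.remove(job_id)
        key = self._bucket_key(vec)
        self.buckets[key].add(job_id)
        self.vectors[job_id] = vec
        self.keys[job_id] = key

    def remove(self, job_id):
        """Drop a job (e.g. an expired posting); a no-op if it is not indexed."""
        key = self.keys.pop(job_id, None)
        if key is not None:
            self.buckets[key].discard(job_id)
            del self.vectors[job_id]

    def query(self, vec, k):
        """Score only jobs in the query's bucket, then return the top k by cosine."""
        candidates = self.buckets.get(self._bucket_key(vec), set())
        return sorted(candidates, key=lambda j: cosine(vec, self.vectors[j]), reverse=True)[:k]
```

A single table trades recall for speed: similar jobs that land in a neighboring bucket are missed. Practical systems use multiple hash tables or graph-based indexes such as HNSW to recover recall while keeping query cost low.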

Quality Improvements

  • Contrastive learning on click data
  • Hard negative mining for better discrimination
  • Multi-task learning for diverse signals
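The article does not specify LinkedIn's training objective. A common contrastive setup for click data is the InfoNCE loss with in-batch negatives, paired with hard-negative mining that picks high-scoring jobs the user did not engage with. The sketch below assumes L2-normalized vectors and a hypothetical temperature of 0.05.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_info_nce(query_vecs, positive_vecs, temperature=0.05):
    """Mean InfoNCE loss: query i's positive is positive_vecs[i] (e.g. a clicked job),
    and the other positives in the batch act as negatives."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, p) / temperature for p in positive_vecs]
        # Numerically stable log-sum-exp over every candidate in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(query_vecs)


def mine_hard_negatives(query_vec, candidate_vecs, positive_ids, n):
    """Return the n highest-scoring candidates that are not known positives."""
    scored = [(dot(query_vec, vec), job_id)
              for job_id, vec in candidate_vecs.items()
              if job_id not in positive_ids]
    scored.sort(reverse=True)
    return [job_id for _, job_id in scored[:n]]
```

Hard negatives teach the model to separate near-misses from true matches, which random negatives rarely do. With click data, some mined "negatives" may be relevant jobs the user simply did not click, so this step usually needs filtering.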

Results and Impact

  • 30% increase in job application rates
  • Improved matching for non-obvious queries
  • Better experience for international users

Lessons Learned

  1. Hybrid search outperforms both pure semantic and pure keyword search
  2. User feedback is essential for training
  3. Incremental deployment reduces risk
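The article does not describe how LinkedIn staged its rollout. A common mechanism for incremental deployment is deterministic percentage bucketing, sketched below. The function name and the experiment string are hypothetical. Hashing the user ID keeps each user's assignment stable across requests, so the rollout percentage can be ramped up without users flipping between old and new search.

```python
import hashlib


def in_rollout(user_id: str, experiment: str, percent: float) -> bool:
    """Deterministically place a user in the first `percent`% of 10,000 buckets."""
    # Salting with the experiment name gives each rollout an independent split.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100
```

Because a user's bucket never changes, everyone exposed at 5% remains exposed at 50%. That makes metric comparisons between ramp stages easier to interpret and lets a problem be caught while it affects only a small slice of traffic.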
