Learning to Rank
Pradeep Ravikumar

CS 371R

Adapted from Hang Li, MSR Asia, '09

Ranking

• Which do you prefer: Coke, Pepsi, or Sprite?
• Ranking chess players, golf players, etc.

Whither Machine Learning?

• Input: a list of objects (Coke, Pepsi, Sprite)
• Output: a ranking of the objects (Coke > Sprite > Pepsi)
• A machine-learning "magic box" maps the list of objects to a ranking, learned from "training" data

Ranking

• Objects (offerings): O = {o_1, o_2, ..., o_N}
• Request (query): s
• Output: a ranking of the offerings, e.g., induced by scores f(s, o_1), f(s, o_2), ..., f(s, o_N)

Ranking via Machine Learning (Learning to Rank)

• "Training" data: requests s_1, ..., s_m, each paired with its objects o_{i,1}, ..., o_{i,n_i} and their desired ranking/scores
• Learning System: learns a scoring function f(s, o) from the training data
• Ranking System: given a new request s and objects o_1, ..., o_N from O, outputs the objects ranked by f(s, o_1), ..., f(s, o_N) (see the sketch below)
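As a concrete sketch of this setup (not from the slides): the learned model boils down to a scoring function applied to every (request, object) pair, and the objects are sorted by score. The function names and the toy word-overlap scorer below are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

def rank_objects(score: Callable[[str, str], float],
                 request: str,
                 objects: Sequence[str]) -> List[Tuple[str, float]]:
    """Ranking System: sort objects by the learned score f(s, o), best first."""
    scored = [(o, score(request, o)) for o in objects]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-in for a learned f(s, o): count shared words between request and object.
def toy_score(request: str, obj: str) -> float:
    return float(len(set(request.lower().split()) & set(obj.lower().split())))

print(rank_objects(toy_score, "cheap soft drink",
                   ["Coke zero drink", "Pepsi", "Sprite soft drink"]))
```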

Is learning to rank necessary for IR?

• Has been successfully used in web-document search
• Has been the focus of increasing research at the intersection of machine learning + IR

Example: Machine Translation

• Input: sentence f in the source language
• A generative model produces ranked candidate sentences e_1, e_2, ..., e_1000 in the target language
• A re-ranking model re-ranks the candidates and outputs the top sentence ẽ in the target language

Example: Collaborative Filtering

[Table: a user-item rating matrix with rows User1, ..., UserM and columns Item1, Item2, Item3, ...; some cells contain observed ratings (e.g., 5, 4, 1, 2, 3) and the remaining cells are unknown ("?").]

• Input: a user's ratings on some items
• Output: the user's ratings on other items
• Any user is a query; based on this query, rank the set of items (based on user-item features)

Example: Key Phrase Extraction

• Input: document
• Output: key phrases in the document
• Pre-processing step: extraction of candidate phrases from the document
• Ranking approach: rank the extracted phrases

  

Other IR applications of learning to rank

• Question Answering
• Document Summarization
• Opinion Mining
• Sentiment Analysis
• Machine Translation
• Sentence Parsing
• ...

Our focus: ranking documents

• Documents: D = {d_1, d_2, ..., d_N}
• Query: q
• Output: a ranking of documents based on relevance, importance, or preference, e.g., induced by scores f(q, d_1), f(q, d_2), ..., f(q, d_N)

Traditional approach to "learning to rank": Probabilistic Model

• Given a query q and documents d_1, d_2, ..., d_N, rank the documents by their probability of relevance: d_i ~ P(r | q, d_i), where r is a relevance grade (e.g., r ∈ {1, ..., R})
• query + documents → ranking of documents

Modern approach: Learning to Rank

• "Training" data: queries q_1, ..., q_m, each with its documents d_{i,1}, ..., d_{i,n_i} and associated relevance information
• Learning System: learns a scoring function f(q, d) from the training data
• Ranking System: given a new query q and documents d_1, ..., d_n, ranks the documents by f(q, d_1), ..., f(q, d_n)

  

Learning to Rank: Training Process

1. Data Labeling (ranks): for each training query q_i (i = 1, ..., m), its documents d_{i,1}, d_{i,2}, ..., d_{i,n_i} are labeled with relevance judgments y_{i,1}, y_{i,2}, ..., y_{i,n_i}.
2. Feature Extraction: each query-document pair (q_i, d_{i,j}) is mapped to a feature vector x_{i,j}; the features are functions of BOTH the query AND the document.
3. Learning: a ranking function f(x) is learned from the labeled feature vectors {(x_{i,j}, y_{i,j})} (see the sketch below).
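A minimal sketch of how the labeling and feature-extraction steps might be wired together before learning. The helper name `extract_features` and the two toy features are illustrative assumptions, not from the slides.

```python
import math
from typing import Dict, List, Tuple

def extract_features(query: str, doc: str) -> List[float]:
    """Step 2: features are functions of BOTH the query and the document."""
    q_terms, d_terms = query.lower().split(), doc.lower().split()
    overlap = float(sum(1 for t in q_terms if t in d_terms))  # query-document feature
    doc_len = math.log(1 + len(d_terms))                      # document-only feature
    return [overlap, doc_len]

# Step 1: labeled training data — per query, documents with relevance judgments y.
labeled_data: Dict[str, List[Tuple[str, int]]] = {
    "learning to rank": [("rank documents by learning", 2),
                         ("cooking recipes", 0)],
    "chess players":    [("top chess players ranking", 2),
                         ("golf players", 1)],
}

# Assemble the (x, y) pairs that a learning algorithm (step 3) would consume.
X, y = [], []
for query, docs in labeled_data.items():
    for doc, label in docs:
        X.append(extract_features(query, doc))
        y.append(label)
print(X, y)
```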

  

Learning to Rank: Testing Process

1. Data Labeling (ranks): a held-out test query q with documents d_1, ..., d_n and ground-truth relevance judgments y_1, ..., y_n.
2. Feature Extraction: each test query-document pair is mapped to a feature vector x_j.
3. Ranking with f(x): the documents are ranked by the learned scores f(x_1), f(x_2), ..., f(x_n) (see the sketch below).
4. Evaluation: the predicted ranking is compared against the ground-truth ranks to produce the evaluation result.
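A small sketch of the testing step, assuming a toy linear scoring function with made-up weights; the only point is that test documents are sorted by f(x) and the ground-truth labels are carried along for evaluation.

```python
from typing import Callable, List, Sequence, Tuple

def test_ranking(f: Callable[[Sequence[float]], float],
                 feature_vectors: List[Sequence[float]],
                 ground_truth: List[int]) -> List[Tuple[int, float, int]]:
    """Rank test documents by the learned f(x); keep ground-truth ranks for evaluation."""
    scores = [f(x) for x in feature_vectors]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Each row: (document index, predicted score, ground-truth relevance).
    return [(i, scores[i], ground_truth[i]) for i in order]

# Toy "learned" function: a linear scorer with assumed weights.
linear_f = lambda x: 2.0 * x[0] + 0.5 * x[1]
print(test_ranking(linear_f, [[1, 0.7], [3, 0.2], [0, 1.0]], [1, 2, 0]))
```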

Issues in learning to rank

• Data Labeling
• Feature Extraction
• Evaluation Measure
• Learning Method (Model, Loss Function, Algorithm)

Data Labeling

• Ground truth/supervision: "ranks" of a set of documents/objects
• E.g., relevance of documents (Doc A, Doc B, Doc C) w.r.t. a query

Data Labeling: Supervision takes different forms

• Multiple levels (e.g., relevant, partially relevant, irrelevant)
  ‣ Widely used in IR
• Ordered pairs
  ‣ e.g., ordered pairs between documents (e.g., A > B, B > C)
  ‣ e.g., implicit relevance judgments derived from click-through data (see the sketch below)
• Fully ordered list (i.e., a permutation) of documents is given
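A hedged illustration of how ordered pairs might be derived from click-through logs. The "skip-above" heuristic used here (a clicked document is preferred over unclicked documents shown above it) is one common choice, not necessarily the exact rule intended in the slides.

```python
from typing import List, Set, Tuple

def pairs_from_clicks(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Skip-above heuristic: a clicked document is preferred over
    any unclicked document shown above it (e.g., B > A)."""
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for above in ranking[:pos]:
                if above not in clicked:
                    prefs.append((doc, above))  # (preferred, less preferred)
    return prefs

# Search system showed A, B, C; users often clicked on Doc B only.
print(pairs_from_clicks(["Doc A", "Doc B", "Doc C"], {"Doc B"}))  # [('Doc B', 'Doc A')]
```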

Data Labeling: Implicit Relevance Judgement

• Ranking of documents at the search system: Doc A, Doc B, Doc C
• Users often clicked on Doc B → ordered pair B > A

Feature Extraction

• For each query-document pair, extract a feature vector containing query-document features (e.g., BM25) and document features (e.g., PageRank)
• E.g., Doc A, Doc B, Doc C each get a feature vector [BM25, PageRank, ...] w.r.t. the query

Feature Extraction: Example Features

• BM25 (a classical ranking score function; see the sketch below)
• Proximity/distance measures between query and document
• Whether the query exactly occurs in the document
• PageRank
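A minimal sketch of two of these features. The BM25 formula below is one common Okapi variant (with the log(1 + ...) idf smoothing), and the corpus statistics passed in are assumed, not taken from the slides.

```python
import math
from typing import Dict, List

def bm25_score(query_terms: List[str], doc_terms: List[str],
               df: Dict[str, int], n_docs: int, avg_len: float,
               k1: float = 1.5, b: float = 0.75) -> float:
    """Simplified Okapi BM25 score of one document for a query,
    given per-term document frequencies df over a corpus of n_docs documents."""
    score = 0.0
    for t in query_terms:
        tf = doc_terms.count(t)
        if tf == 0:
            continue
        idf = math.log(1 + (n_docs - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avg_len))
    return score

def exact_match(query: str, doc: str) -> float:
    """1.0 if the query string occurs verbatim in the document, else 0.0."""
    return 1.0 if query.lower() in doc.lower() else 0.0

doc = "learning to rank documents with machine learning".split()
print(bm25_score(["learning", "rank"], doc, {"learning": 40, "rank": 5}, 1000, 20.0))
print(exact_match("learning to rank", "learning to rank documents with machine learning"))
```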

Evaluation Measures

• Allow us to quantify how good the predicted ranking is when compared with the ground-truth ranking
• Typical evaluation metrics care more about ranking the top results correctly
• Popular measures:
  ‣ NDCG (Normalized Discounted Cumulative Gain) — see the sketch below
  ‣ MAP (Mean Average Precision)
  ‣ MRR (Mean Reciprocal Rank)
  ‣ WTA (Winners Take All)
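A minimal NDCG@k sketch under one common gain/discount convention (gain 2^rel − 1, discount log2(position + 1)); other conventions exist, so treat this as illustrative rather than the definition used in any particular benchmark.

```python
import math
from typing import List

def dcg(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg(relevances_in_predicted_order: List[float], k: int) -> float:
    """NDCG@k: DCG of the predicted ranking normalized by the ideal DCG."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True), k)
    return dcg(relevances_in_predicted_order, k) / ideal if ideal > 0 else 0.0

# The predicted ranking placed documents with grades [3, 2, 0, 1] at positions 1..4.
print(round(ndcg([3, 2, 0, 1], k=4), 3))
```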

Kendall's Tau

• Compares the agreement between two permutations P and Q of the same n items
• τ(P, Q) = (number of concordant pairs − number of discordant pairs) / (n(n − 1)/2)
• τ = +1: the permutations completely agree; τ = −1: they completely disagree
• Example: reference order A > B > C vs. predicted order C > A > B (see the sketch below)
  ‣ Of the 3 pairs, 1 is concordant and 2 are discordant, so τ = (1 − 2)/3 = −1/3
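A small sketch computing Kendall's τ directly from the pair counts, matching the formula above; the permutations used are the ones from the example.

```python
from itertools import combinations
from typing import Sequence

def kendall_tau(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Kendall's tau between two permutations of the same items."""
    pos = {item: i for i, item in enumerate(predicted)}
    concordant = discordant = 0
    for a, b in combinations(reference, 2):  # a is ranked above b in the reference
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    n = len(reference)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau(["A", "B", "C"], ["C", "A", "B"]))  # -1/3 ≈ -0.333
```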

  

Learning to Rank Methods

• Three major approaches:
  ‣ Pointwise approach
  ‣ Pairwise approach
  ‣ Listwise approach

Pointwise Approach

• Transforms ranking into regression, classification, or ordinal regression on individual documents (see the sketch below)
• Query-document group structure is ignored

Pointwise Approach: Learning

• Input: feature vector x
• Output: real number y = f(x) (regression), category y = sign(f(x)) (classification), or ordered category y = threshold(f(x)) (ordinal regression)
• Model: ranking function f(x)
• Loss function: regression loss, classification loss, or ordinal regression loss
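A minimal pointwise sketch, assuming a linear model fit by least-squares regression on (feature vector, relevance grade) pairs; at ranking time documents are simply sorted by f(x). The features, labels, and weights are toy assumptions.

```python
import numpy as np

# Pointwise approach: regress relevance grades directly on feature vectors.
X = np.array([[3.0, 0.9], [1.0, 0.4], [0.0, 0.8], [2.0, 0.1]])  # one row per query-document pair
y = np.array([2.0, 1.0, 0.0, 1.0])                               # relevance grades (labels)

# Fit a linear scoring function f(x) = w . x + b by least squares.
A = np.hstack([X, np.ones((len(X), 1))])
w = np.linalg.lstsq(A, y, rcond=None)[0]
f = lambda x: np.append(x, 1.0) @ w

# Rank (new) documents for a query by sorting on f(x), best first.
docs = np.array([[2.5, 0.5], [0.5, 0.9]])
order = np.argsort([-f(x) for x in docs])
print(order, [round(float(f(x)), 3) for x in docs])
```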

Pairwise Approach

• Transforms ranking into pairwise classification: for each document pair, classify which document should be ranked higher (see the sketch below)
• Query-document group structure is ignored

Pairwise Approach: Learning

• Input: ordered feature vector pair (x_i, x_j)
• Output: classification on the order of the pair, y_{i,j} = sign(f(x_i) − f(x_j))
• Model: ranking function f(x)
• Loss function: pairwise classification loss

Pairwise Ranking: Converting a document list to document pairs

• For a query with Doc A (Rank 1), Doc B (Rank 2), Doc C (Rank 3), form the ordered pairs A > B, A > C, B > C

Transformed Pairwise Classification Problem (with f(x; w))

• Positive examples: (x_1, x_3), (x_2, x_3), (x_1, x_2)
• Negative examples: (x_3, x_1), (x_2, x_1), (x_3, x_2)
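A hedged sketch of the pairwise idea: turn ranked documents into difference vectors x_i − x_j labeled +1/−1 and train a linear classifier on them. A plain perceptron is used here for brevity; pairwise methods such as RankSVM or RankNet use hinge or logistic losses instead. Feature values are toy assumptions.

```python
import numpy as np

def pairwise_examples(X, ranks):
    """Build (x_i - x_j, +1) for every pair where i is ranked better than j, plus the reverse with -1."""
    diffs, labels = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if ranks[i] < ranks[j]:           # smaller rank number = better document
                diffs += [X[i] - X[j], X[j] - X[i]]
                labels += [1, -1]
    return np.array(diffs), np.array(labels)

# Docs A, B, C for one query: feature vectors x_1, x_2, x_3 with ranks 1, 2, 3.
X = np.array([[3.0, 0.9], [2.0, 0.4], [0.5, 0.7]])
D, y = pairwise_examples(X, ranks=[1, 2, 3])

# Train a linear scorer f(x; w) = w . x with a perceptron on the pairwise classification problem.
w = np.zeros(X.shape[1])
for _ in range(50):
    for d, label in zip(D, y):
        if label * (w @ d) <= 0:
            w += label * d

print(np.argsort(-(X @ w)))  # documents sorted by f(x; w): expect order [0, 1, 2]
```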

Listwise Approach

• Takes the entire list of documents for a query as the input instance (see the sketch below)
• Query-document group structure is used
• Represents the learning-to-rank problem most directly
  ‣ Many state-of-the-art methods: ListNet, ListMLE, etc.

Listwise Approach: Learning & Ranking

• Input: feature vectors {x_i}_{i=1}^{n}
• Output: a permutation of the feature vectors, sort({f(x_i)}_{i=1}^{n})
• Model: ranking function f(x)
• Loss function: listwise loss function (e.g., based on a ranking evaluation measure)
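A minimal listwise sketch in the spirit of ListNet's top-one loss: both the relevance labels and the model scores for a query's document list are turned into probability distributions via softmax, and the loss is their cross-entropy. The linear model, learning rate, and data are illustrative assumptions, not the ListNet algorithm as published.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def listnet_top1_loss(scores, labels):
    """Cross-entropy between the label-induced and score-induced top-one distributions."""
    p_label = softmax(labels.astype(float))
    p_score = softmax(scores)
    return -np.sum(p_label * np.log(p_score + 1e-12))

# One query: documents as feature vectors with relevance labels.
X = np.array([[3.0, 0.9], [2.0, 0.4], [0.5, 0.7]])
y = np.array([2.0, 1.0, 0.0])

# Train a linear scoring function f(x) = w . x by gradient descent on the listwise loss.
w = np.zeros(2)
lr = 0.1
for _ in range(200):
    scores = X @ w
    grad = X.T @ (softmax(scores) - softmax(y))   # gradient of the top-one loss w.r.t. w
    w -= lr * grad

print(np.argsort(-(X @ w)), round(float(listnet_top1_loss(X @ w, y)), 4))
```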