Learning to Rank
Pradeep Ravikumar
CS 371R
Adapted from Hang Li, MSR Asia, '09
Ranking
- Which do you prefer: Coke, Pepsi, or Sprite?
- Ranking chess players, golf players, etc.

Whither Machine Learning?
- Input: a list of objects (Coke, Pepsi, Sprite)
- A machine learning "magic box"
- Output: a ranking of the objects (Coke > Sprite > Pepsi), learned from "training" data
Ranking
- Offerings (objects): O = {o_1, o_2, ..., o_N}
- Request (query): s
- Ranking function: f(s, o)
- Ranking of offerings: o_{s,1}, o_{s,2}, ..., o_{s,n_s}, ordered by f(s, o)
Ranking via Machine Learning (Learning to Rank)
- "Training" data: m requests, each with its objects already ranked:
  (s_1: o_{1,1}, o_{1,2}, ..., o_{1,n_1}), ..., (s_m: o_{m,1}, o_{m,2}, ..., o_{m,n_m})
- Learning System: learns a ranking function f(s, o) from the training data
- Ranking System: given a new request s and objects O = {o_1, o_2, ..., o_N}, ranks them by f(s, o)
Is learning to rank necessary for IR?
- Has been successfully used in web-document search
- Has been the focus of increasing research at the intersection of machine learning and IR
Example: Machine Translation
- Input: sentence f in the source language
- A generative model produces ranked sentence candidates e_1, e_2, ..., e_1000 in the target language
- A re-ranking model outputs the re-ranked top sentence ẽ in the target language
Example: Collaborative Filtering

        Item1  Item2  Item3  ...
User1     5      4      ?
User2     1      2      2
...
UserM     4      ?      3

- Input: user ratings on some items
- Output: user ratings on the other items (the "?" cells)
- Any user is a query; based on this query, rank the set of items (based on user-item features)
Example: Key Phrase Extraction
- Input: a document
- Output: key phrases in the document
- Pre-processing step: extraction of candidate phrases from the document
- Ranking approach: rank the extracted phrases
Other IR applications of learning to rank
- Question Answering
- Document Summarization
- Opinion Mining
- Sentiment Analysis
- Machine Translation
- Sentence Parsing
- ...
Our focus: ranking documents
- Documents: D = {d_1, d_2, ..., d_N}
- Query: q
- Ranking function: f(q, d)
- Ranking of documents: d_{q,1}, d_{q,2}, ..., d_{q,n_q}, based on relevance, importance, preference

Traditional approach to "learning to rank": Probabilistic Model
- Rank each document by its probability of relevance to the query:
  d_1 ~ P(r | q, d_1), d_2 ~ P(r | q, d_2), ..., d_n ~ P(r | q, d_n)
- Relevance grades: r ∈ {1, ..., R}

Modern approach to learning to rank
- Documents: D = {d_1, d_2, ..., d_N}
- Training data: m queries, each with its documents already ranked:
  (q_1: d_{1,1}, d_{1,2}, ..., d_{1,n_1}), ..., (q_m: d_{m,1}, d_{m,2}, ..., d_{m,n_m})
- Learning System: learns a ranking function f(q, d) from the training data
- Ranking System: given a new query q and documents d_1, d_2, ..., d_N, ranks them by f(q, d)
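As a concrete sketch of this framework, the pipeline below trains a tiny pointwise linear scorer as the "Learning System" and then ranks a new query's documents with it as the "Ranking System". The features, toy data, and helper names (`learn`, `rank`) are illustrative assumptions, not from the slides:

```python
# Minimal learning-to-rank pipeline sketch (illustrative, not the slides' method).
# "Learning System": fit a linear scorer f(x) = w . x on labeled feature vectors.
# "Ranking System": score and sort a new query's documents by f.

def learn(X, y, lr=0.01, epochs=500):
    """Least-squares fit of w via per-example gradient descent (pointwise)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

def rank(w, docs):
    """Return document ids sorted by descending score f(q, d) = w . x."""
    score = lambda x: sum(wj * xj for wj, xj in zip(w, x))
    return [d for d, _ in sorted(docs.items(), key=lambda kv: -score(kv[1]))]

# Toy training data: feature vectors (e.g., [BM25, PageRank]) with graded labels.
X = [[3.0, 0.5], [1.0, 0.2], [0.2, 0.1]]
y = [2.0, 1.0, 0.0]
w = learn(X, y)

# A new query's documents, ranked by the learned f.
docs = {"docA": [0.1, 0.1], "docB": [2.5, 0.4]}
print(rank(w, docs))  # docB outranks docA
```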
Learning to Rank: Training Process
1. Data Labeling (rank): for each training query q_i, its documents d_{i,1}, ..., d_{i,n_i} are labeled with ranks y_{i,1}, ..., y_{i,n_i}
2. Feature Extraction: each (query, document) pair is mapped to a feature vector x_{i,j}; features are functions of the query AND the document
3. Learning: a ranking function f(x) is learned from the labeled feature vectors

q_1: (d_{1,1}, y_{1,1}) -> (x_{1,1}, y_{1,1}), ..., (d_{1,n_1}, y_{1,n_1}) -> (x_{1,n_1}, y_{1,n_1})
...
q_m: (d_{m,1}, y_{m,1}) -> (x_{m,1}, y_{m,1}), ..., (d_{m,n_m}, y_{m,n_m}) -> (x_{m,n_m}, y_{m,n_m})
Learning to Rank: Testing Process
1. Data Labeling (rank): for a held-out query q_{m+1}, its documents d_{m+1,1}, ..., d_{m+1,n_{m+1}} are labeled with ground-truth ranks y_{m+1,1}, ..., y_{m+1,n_{m+1}}
2. Feature Extraction: each (query, document) pair is mapped to a feature vector x_{m+1,j}
3. Ranking with f(x): the documents are sorted by their scores f(x_{m+1,1}), f(x_{m+1,2}), ..., f(x_{m+1,n_{m+1}})
4. Evaluation: the predicted ranking is compared against the ground-truth ranks
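Steps 3 and 4 can be sketched as follows; the scorer `f` and the toy test documents are hypothetical stand-ins for a learned model and a labeled held-out query:

```python
# Testing-process sketch: apply a (hypothetical, already-learned) scorer f
# to a test query's feature vectors and sort (step 3), then compare the
# predicted order against the ground-truth ranks (step 4).

def f(x):
    # Stand-in for the learned ranking function; here a fixed linear scorer.
    return 2.0 * x[0] + 0.5 * x[1]

# Test query q_{m+1}: (feature vector, ground-truth relevance grade) per doc.
test_docs = {
    "d1": ([0.9, 0.3], 3),   # most relevant
    "d2": ([0.4, 0.8], 2),
    "d3": ([0.1, 0.2], 1),   # least relevant
}

predicted = sorted(test_docs, key=lambda d: -f(test_docs[d][0]))
truth = sorted(test_docs, key=lambda d: -test_docs[d][1])
print("predicted:", predicted)
print("truth:    ", truth)
```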
Issues in learning to rank
- Data Labeling
- Feature Extraction
- Evaluation Measure
- Learning Method (model, loss function, algorithm)
Data Labeling
- E.g., relevance of documents (Doc A, Doc B, Doc C) w.r.t. a query
- Ground truth/supervision: "ranks" of a set of documents/objects
Data Labeling: Supervision takes different forms
- Multiple levels (e.g., relevant, partially relevant, irrelevant)
  ‣ Widely used in IR
- Ordered pairs
  ‣ e.g. ordered pairs between documents (A > B, B > C)
  ‣ e.g. implicit relevance judgments derived from click-through data
- Fully ordered list (i.e. a permutation) of documents
Data Labeling: Implicit Relevance Judgments
- The search system showed the ranking Doc A, Doc B, Doc C, but users often clicked on Doc B
- This yields the ordered pair B > A

Feature Extraction
- Query-document features (e.g., BM25) and document features (e.g., PageRank)
- Each of Doc A, Doc B, Doc C gets a feature vector (BM25, PageRank, ...)
Feature Extraction: Example Features
- BM25 (a classical ranking score function)
- Proximity/distance measures between query and document
- Whether the query occurs exactly in the document
- PageRank
- ...
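The BM25 feature mentioned above can be sketched as below; the `k1` and `b` values are common defaults, and the toy collection is invented for illustration:

```python
import math

def bm25(query_terms, doc_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query, usable as a ranking
    feature. docs: the full collection (list of term lists), needed for
    document frequencies and the average document length."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in docs if t in d)                # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # smoothed IDF
        tf = doc_terms.count(t)                            # term frequency
        denom = tf + k1 * (1.0 - b + b * len(doc_terms) / avgdl)
        score += idf * tf * (k1 + 1.0) / denom
    return score

docs = [["learning", "to", "rank"],
        ["rank", "documents", "by", "relevance"],
        ["collaborative", "filtering"]]
print(bm25(["rank"], docs[0], docs))  # matching doc gets a positive score
print(bm25(["rank"], docs[2], docs))  # non-matching doc scores 0.0
```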
Evaluation Measures
- Allow us to quantify how good the predicted ranking is when compared with the ground truth
- Typical evaluation metrics care more about ranking the top results correctly
- Popular measures:
  ‣ NDCG (Normalized Discounted Cumulative Gain)
  ‣ MAP (Mean Average Precision)
  ‣ MRR (Mean Reciprocal Rank)
  ‣ WTA (Winner Takes All)
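As one concrete measure, NDCG can be sketched as follows, using the common `2^rel - 1` gain and `log2` rank discount (the relevance-grade lists are illustrative):

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains):
    """NDCG: DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))  # ideal ordering -> 1.0
print(ndcg([1, 2, 3]))  # worst ordering -> below 1.0
```

The log discount is what makes NDCG care more about the top of the ranking: a misplaced relevant document costs much more at rank 1 than at rank 10.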
Kendall's Tau
- Compares the agreement between two permutations of n items:
  τ = (P − Q) / (n(n−1)/2)
  where P is the number of concordant pairs (ordered the same way in both permutations), Q is the number of discordant pairs, and n(n−1)/2 is the total number of pairs
- τ ∈ [−1, 1]: τ = 1 when the permutations completely agree; τ = −1 when they completely disagree
- Example: comparing the ranking C A B against A B C (n = 3, so 3 pairs): 1 concordant pair (A before B) and 2 discordant pairs, so τ = (1 − 2)/3 = −1/3
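The formula and the example above can be checked with a small sketch (a naive O(n^2) pair count; the helper name `kendall_tau` is hypothetical):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two permutations of the same items."""
    pos_a = {x: i for i, x in enumerate(rank_a)}
    pos_b = {x: i for i, x in enumerate(rank_b)}
    n = len(rank_a)
    p = q = 0  # concordant / discordant pair counts
    for x, y in combinations(rank_a, 2):
        # Concordant if x and y appear in the same relative order in both.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            p += 1
        else:
            q += 1
    return (p - q) / (n * (n - 1) / 2)

print(kendall_tau(["C", "A", "B"], ["A", "B", "C"]))  # -1/3, as in the example
print(kendall_tau(["A", "B", "C"], ["A", "B", "C"]))  # 1.0, complete agreement
```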
Learning to Rank Methods
- Three major approaches:
  ‣ Pointwise approach
  ‣ Pairwise approach
  ‣ Listwise approach
Pointwise Approach
- Transforms ranking into regression, classification, or ordinal regression
- The query-document group structure is ignored

Pointwise Approach: Learning
- Input: feature vector x
- Output: a real number y = f(x) (regression), a category y = sign(f(x)) (classification), or an ordered category y = threshold(f(x)) (ordinal regression)
- Model: ranking function f(x)
- Loss function: regression loss, classification loss, or ordinal regression loss
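The three output types above can be sketched with one shared scorer; `f` and the threshold values here are illustrative assumptions:

```python
# Pointwise outputs sketch: the same ranking function f(x) feeds regression,
# classification, and ordinal regression (f and the thresholds are made up).

def f(x):
    return sum(x)  # stand-in linear ranking function

def regression_output(x):
    return f(x)                     # real number y = f(x)

def classification_output(x):
    return 1 if f(x) > 0 else -1    # category y = sign(f(x))

def ordinal_output(x, thresholds=(0.5, 1.5)):
    # Ordered category y = threshold(f(x)): count thresholds the score clears,
    # mapping it into the ordered categories 0 < 1 < 2.
    s = f(x)
    return sum(s > t for t in thresholds)

x = [0.7, 0.6]
print(regression_output(x), classification_output(x), ordinal_output(x))
```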
Pairwise Approach
- Transforms ranking into pairwise classification
- The query-document group structure is ignored

Pairwise Approach: Learning
- Input: an ordered feature-vector pair (x_i, x_j)
- Output: a classification of the order of the pair: y_{i,j} = sign(f(x_i) − f(x_j))
- Model: ranking function f(x)
- Loss function: pairwise classification loss

Pairwise Ranking: Converting a document list to document pairs
- A query with Doc A (rank 1), Doc B (rank 2), Doc C (rank 3) yields the ordered pairs A > B, A > C, B > C

Transformed Pairwise Classification Problem (with a scoring model f(x; w))
- Positive examples (+1): the correctly ordered pairs (x_1, x_3), (x_2, x_3), (x_1, x_2)
- Negative examples (−1): the reversed pairs (x_3, x_1), (x_3, x_2), (x_2, x_1)
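The list-to-pairs conversion above can be sketched as follows; the feature vectors are invented, the +1/−1 labels follow the positive/negative examples on the slide, and the helper name `to_pairs` is hypothetical:

```python
from itertools import combinations

def to_pairs(docs):
    """Turn one query's ranked document list into pairwise examples.
    docs: list of (feature_vector, rank) with rank 1 = best.
    Yields ((x_better, x_worse), +1) and the mirrored pair with label -1."""
    examples = []
    for (xi, ri), (xj, rj) in combinations(docs, 2):
        if ri == rj:
            continue  # no preference between equally ranked documents
        better, worse = (xi, xj) if ri < rj else (xj, xi)
        examples.append(((better, worse), +1))
        examples.append(((worse, better), -1))
    return examples

docs = [([1.0, 0.2], 1),   # x1, rank 1 (best)
        ([0.6, 0.1], 2),   # x2, rank 2
        ([0.1, 0.0], 3)]   # x3, rank 3
pairs = to_pairs(docs)
print(len(pairs))  # 3 ordered pairs, each as one positive and one negative
```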
Listwise Approach
- Takes the entire list of documents as the input instance
- The query-document group structure is used
- Straightforwardly represents the learning-to-rank problem
  ‣ Many state-of-the-art methods: ListNet, ListMLE, etc.

Listwise Approach: Learning & Ranking
- Input: feature vectors {x_i}_{i=1}^n
- Output: a permutation of the feature vectors: sort({f(x_i)}_{i=1}^n)
- Model: ranking function f(x)
- Loss function: listwise loss (based on a ranking evaluation measure)
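A listwise loss in the spirit of ListNet's top-one cross entropy can be sketched as below; this is a simplified illustration with invented scores and labels, not the exact algorithm from the ListNet paper:

```python
import math

def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def listwise_loss(scores, labels):
    """Cross entropy between the top-one probability distributions induced
    by the ground-truth labels and by the model scores f(x_i)."""
    p_true = softmax(labels)
    p_model = softmax(scores)
    return -sum(pt * math.log(pm) for pt, pm in zip(p_true, p_model))

labels = [3.0, 1.0, 0.0]
good = [2.5, 0.8, 0.1]   # scores ordered like the labels -> lower loss
bad = [0.1, 0.8, 2.5]    # scores in the reverse order -> higher loss
print(listwise_loss(good, labels) < listwise_loss(bad, labels))
```

Because the loss is defined over the whole score list at once, it uses the query-document group structure that the pointwise and pairwise approaches discard.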