LexRank algorithm explained: a step-by-step tutorial with examples
LexRank is a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. I used LexRank to summarize and extract the most relevant information from hundreds of articles, e.g. analysis on how to lose weight and how to succeed on a Medium as a writer. LexRank proved to work effectively — in my humble opinion.
At the first glance, the theory behind the algorithm seems complicated, but in reality, it’s very simple.
Let’s start with the official general description of the concept.
We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences.
Let’s divide LexRank algorithm into pieces so that step-by-step, with explained mathematical concepts, we get to the model output: