
LexRank algorithm explained: a step-by-step tutorial with examples

Maciej Zalwert
6 min read · May 5, 2021



LexRank is a stochastic graph-based method for computing the relative importance of textual units in Natural Language Processing. I have used LexRank to summarize and extract the most relevant information from hundreds of articles, for example, analyses of how to lose weight and how to succeed on Medium as a writer. LexRank proved to work effectively, in my humble opinion.

At first glance, the theory behind the algorithm seems complicated, but in reality it is quite simple.

Let’s start with the official general description of the concept.

We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences.

Let’s break the LexRank algorithm into pieces so that, step by step, with the mathematical concepts explained, we get to the model output:

  1. Input to the model
  2. Word embeddings
  3. Intra-sentence cosine similarity
  4. Adjacency matrix
  5. Connectivity matrix
  6. Eigenvector centrality
  7. Output of the model

Input to the model

The basic input to the model may be a single article or a set of articles, depending on whether your goal is to summarize one article or many.

LexRank can consume any text, for example:

  • “This is an example sentence of the article. This is the second sentence. This is the third sentence, which is the most important because it says other sentences are just examples.”

When we run LexRank on the example above, it would most likely return only the last sentence as the summary of the text.
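To make the seven steps above concrete, here is a minimal pure-Python sketch of the whole pipeline applied to the example text. It is a simplification, not the paper’s exact method: plain term-frequency vectors stand in for word embeddings and the IDF-modified cosine, a PageRank-style damping factor is added so the power iteration converges, and the 0.1 similarity threshold is an arbitrary choice for illustration.

```python
import math
import re

def sentence_vectors(sentences):
    # Bag-of-words term-frequency vectors. The original LexRank paper
    # uses an IDF-modified cosine; plain counts keep the sketch short.
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokens for w in toks})
    return [[toks.count(w) for w in vocab] for toks in tokens]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0

def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=100):
    vecs = sentence_vectors(sentences)
    n = len(sentences)
    # Adjacency matrix: an edge wherever intra-sentence cosine
    # similarity exceeds the threshold.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize into a stochastic (connectivity) matrix.
    stoch = []
    for row in adj:
        total = sum(row)
        stoch.append([a / total for a in row] if total else [1.0 / n] * n)
    # Power iteration for the principal eigenvector (eigenvector
    # centrality, with PageRank-style damping for convergence).
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(stoch[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores

article = [
    "This is an example sentence of the article.",
    "This is the second sentence.",
    "This is the third sentence, which is the most important "
    "because it says other sentences are just examples.",
]
scores = lexrank_scores(article)
```

The scores form a probability distribution over the sentences (they sum to one), and a summary is produced by picking the top-scoring sentences.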
