TOP 3 Artificial Intelligence tips on how to really lose weight, based on knowledge from hundreds of articles (Natural Language Processing techniques with Python code)

Image by Cats Coming from Pexels

It’s easy to get lost among thousands of articles when we’re looking for ways to lose weight. The problem is that most of the articles claim to have an ideal recipe for fast and straightforward fat reduction.

In this article I use Natural Language Processing (NLP) techniques to compare, extract and analyse hundreds of articles in search of the holy grail of natural, safe and long-lasting weight loss.

Note: remember to always consult your doctor before kickstarting any rigorous diet.

There are three parts to the article:

  1. Part 1: a summary for readers interested only in the findings
  2. Part 2: the technical part, with Python code
  3. Part 3: the theory behind the algorithm

OK, but what’s Natural Language Processing?

In short, NLP (a subfield of AI) is about algorithms that allow computers to understand human language. For example, NLP is used to:

  • translate foreign languages
  • help with text searching
  • create digital assistants (like Siri or Alexa)
  • spell check
  • filter spam messages
  • analyse text (semantics, topics, duplicates; a toy sketch follows this list)
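
To make the last bullet more concrete, here is a toy, deliberately simplified sketch of text analysis in Python: it just counts the most frequent words in a made-up snippet, which is roughly the intuition behind the sentence scoring used later in this article.

from collections import Counter
import re

text = "Losing weight requires patience. Weight loss is slow, and losing fat takes time."

# Lower-case the text and split it into word tokens
words = re.findall(r"[a-z']+", text.lower())

# Drop a few very common words (a hand-made stopword list just for this toy example)
stopwords = {"is", "and", "a", "the", "to", "it"}
words = [w for w in words if w not in stopwords]

# The three most frequent remaining words
print(Counter(words).most_common(3))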

Part 1

Let’s go straight to the summary of the AI findings.

Note (for the curious reader): the success and behaviour of AI depend on the data. In general, the more data and the better its quality, the better the AI.

Here is the list of the top 3 things our AI advises in order to get a six pack ;)

Image by Dani Alejandro from Pexels

Number 1. The top piece of advice

Getting regular physical activity

Number 2. The second piece of advice

Cut back on refined carbs

Number 3. The third item is not straight advice; instead, it tries to engage our minds for a change by making us ask ourselves questions.

Am I willing to change activity habits?
Am I willing to change eating habits?

Extras. The “advice” below was very common in positions 8–32, which is why I put it here as an extra.

intermittent fasting

Part 2

The technical part of the article, with Python code

Data

I collected 304 articles found via the duckduckgo.com search engine by searching for titles like “ideal diet to lose weight”, “how to lose weight” or “natural fat reduction diet” (here, here and here are examples of the articles I used).
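
For completeness, the snippet below is only a rough sketch of how such articles could be downloaded and saved as .txt files into a data folder; the URLs are placeholders and real pages usually need extra per-site cleaning, so treat it as an illustration rather than the exact script I used.

import os
import requests
from bs4 import BeautifulSoup

# Placeholder URLs -- replace with the article links found via duckduckgo.com
urls = ["https://example.com/article-1", "https://example.com/article-2"]

os.makedirs("data", exist_ok=True)

for i, url in enumerate(urls):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Keep only paragraph text; real pages usually need extra cleaning
    text = "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    with open(os.path.join("data", f"article_{i:03d}.txt"), "w", encoding="utf-8") as fp:
        fp.write(text)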

Algorithm

For the algorithm I used LexRank, which is available here. See Part 3 to find out more about the theory behind it.
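
Before running the code, the two third-party packages it relies on need to be installed; assuming the standard PyPI distributions (lexrank for the algorithm and path for the Path class used below), this should be enough:

pip install lexrank path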

Let’s start with the tutorial code:

Loading the data: Python looks for every file with the .txt extension and appends its contents to one list variable, documents = []

import os
from path import Path  # the "path" package from PyPI

DATA_FOLDER = 'data'

documents = []
data_path = Path(os.path.join(os.getcwd(), DATA_FOLDER))

# Read every .txt file in the data folder; each document becomes a list of lines
for file_path in data_path.files('*.txt'):
    with file_path.open(mode='rt', encoding='utf-8', errors="ignore") as fp:
        documents.append(fp.readlines())
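
A quick sanity check of what got loaded could look like this (the exact output depends on the contents of your data folder):

print(len(documents))      # number of articles loaded, e.g. 304
print(documents[0][:2])    # first two lines of the first article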

The next step is to loop over the sentences in every article in the documents list. This makes sure that we end up with a flat structure of sentences like:

all_example = ['sentence one', 'sentence two', ..., 'sentence xxx']

# Flatten the documents into a single list of sentences
# (named all_sentences to avoid shadowing Python's built-in all)
all_sentences = []
for doc in documents:
    for sentence in doc:
        all_sentences.append(sentence)

Finally, we feed the algorithm and print the results:

from lexrank import STOPWORDS, LexRank

# Build the LexRank model from the documents and summarise the flattened sentences
lxr = LexRank(documents, stopwords=STOPWORDS['en'])

summary = lxr.get_summary(all_sentences, summary_size=15, threshold=.1)
print(summary)
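
The summary comes back as a list of sentences, so it can be printed one item per line, for example:

for i, sentence in enumerate(summary, start=1):
    print(i, sentence)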

If you are interested in more NLP techniques, check out this LINK, where I examine how AI understands Steve Jobs’ legendary speeches.

Part 3

The theory behind the algorithm (in short; it will get a separate article)

The algorithm used in the code is an unsupervised, graph-based approach whose main goal is to summarize a text. A nicely detailed treatment of the theory is available here.

In simple words, it calculates the importance of sentences and picks out the most critical ones. Scoring is based on the concept of eigenvector centrality in a graph representation of the sentences, where the sentences x are the vertices and the weights wx are assigned to the edges connecting them.

A sentence’s score is based on how many of its words are common across the collection. In simple words, the highest-scoring sentences are those containing the most frequent words.

Each sentence is represented as a bag of words vector in the form:

example_word = [0, 0, 0, 1, 1, ..., 0, 1, 1]

where 1 means a word is present and zero means the opposite.
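
As a concrete, made-up illustration, here is a minimal sketch that builds such a binary bag-of-words vector for one sentence against a small vocabulary:

vocabulary = ["eat", "less", "sugar", "walk", "every", "day", "sleep"]
sentence = "walk every day and eat less sugar"

tokens = set(sentence.split())
# 1 if the vocabulary word appears in the sentence, 0 otherwise
vector = [1 if word in tokens else 0 for word in vocabulary]
print(vector)  # [1, 1, 1, 1, 1, 1, 0]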

Finally, the centrality calculation boils down to finding the eigenvector of the sentence similarity matrix that corresponds to its largest eigenvalue (more details soon, if anyone is still reading ;))
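
To make the eigenvector idea a bit more tangible, here is a small self-contained sketch (not part of the lexrank library) that computes the dominant eigenvector of a toy sentence similarity matrix with power iteration; the matrix values are invented purely for illustration.

import numpy as np

# Toy similarity matrix for 3 sentences (made-up values),
# normalised so that every row sums to 1
similarity = np.array([
    [0.4, 0.4, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
])

# Power iteration: repeatedly multiply a score vector by the
# transposed matrix until the scores stop changing
scores = np.ones(3) / 3
for _ in range(100):
    scores = similarity.T @ scores
    scores /= scores.sum()

print(scores)  # higher score = more "central" sentence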

All the best and have a nice day!

Appendix 1. Complete Python code

from lexrank import STOPWORDS, LexRank
import os
from path import Path

DATA_FOLDER = 'data'

documents = []
data_path = Path(os.path.join(os.getcwd(), DATA_FOLDER))

# Load every .txt file from the data folder
for file_path in data_path.files('*.txt'):
    with file_path.open(mode='rt', encoding='utf-8', errors="ignore") as fp:
        documents.append(fp.readlines())

# Flatten the documents into a single list of sentences
all_sentences = []
for doc in documents:
    for sentence in doc:
        all_sentences.append(sentence)

# Build the LexRank model and extract a 15-sentence summary
lxr = LexRank(documents, stopwords=STOPWORDS['en'])

summary = lxr.get_summary(all_sentences, summary_size=15, threshold=.1)
print(summary)
