Introduction to NLP with Traditional Language Models: N-grams and Hidden Markov Models (Part I)

Warren Chris Randhall LATA
9 min read · Jul 10, 2024

Introduction

Language models are fundamental to natural language processing (NLP). They predict the likelihood of word sequences and are crucial for various applications, including speech recognition, machine translation, and text generation. Prior to the rise of deep learning models, traditional language models such as n-grams and hidden Markov models were prevalent. This article will introduce “N-grams,” discussing their limitations, benefits, and implementation in Python.

1. N-grams

N-grams are a fundamental concept in NLP, and a term you have surely heard of if you are interested in the field. An n-gram is a contiguous sequence of N items extracted from a text. Those items can be words (the most common case), characters, or even syllables, depending on the desired granularity. N-grams help capture patterns and relationships within a sequence of words and have various applications in NLP. The N, as said earlier, is simply the number of items you look at together in order to find patterns and analyse words within their context.
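To make this concrete, here is a minimal sketch in plain Python (no external libraries) that extracts word-level n-grams from a sentence. The function name extract_ngrams and the naive whitespace tokenization are illustrative choices, not a fixed API:

```python
def extract_ngrams(text, n):
    """Return the list of word-level n-grams in `text` as tuples."""
    tokens = text.lower().split()  # naive whitespace tokenization
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "the cat sat on the mat"
print(extract_ngrams(sentence, 1))  # unigrams: [('the',), ('cat',), ...]
print(extract_ngrams(sentence, 2))  # bigrams: [('the', 'cat'), ('cat', 'sat'), ...]
```

Setting n to 1, 2, or 3 yields unigrams, bigrams, or trigrams of the same sentence.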

Example

A “unigram” is a sequence of a single word; a unigram language model is therefore based solely on the frequencies of individual words.
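As a hedged sketch of this idea, the snippet below estimates each word's probability as its relative frequency in a toy corpus (the corpus and variable names are invented for illustration):

```python
from collections import Counter

# Toy corpus, tokenized by whitespace (invented for illustration)
corpus = "the cat sat on the mat the cat slept".split()

# Unigram model: estimate P(word) as count(word) / total token count
counts = Counter(corpus)
total = sum(counts.values())
unigram_prob = {word: count / total for word, count in counts.items()}

print(unigram_prob["the"])  # 3 occurrences out of 9 tokens -> 0.333...
```

Under this model, the probability of a word sequence is just the product of the individual word probabilities, since a unigram model ignores context entirely.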
