The Porter Stemming Algorithm is a rule-based stemming technique used in text preprocessing to reduce words to their root or base form. It works by iteratively removing common suffixes such as -ing, -ed, -s, and -ies based on a set of linguistic rules. This helps in reducing vocabulary size and improves the efficiency of information retrieval and text mining systems.
def porter_stem(word):
word = word.lower()
# Step 1a
if word.endswith("sses"):
word = word[:-2]
elif word.endswith("ies"):
word = word[:-2]
elif word.endswith("ss"):
pass
elif word.endswith("s"):
word = word[:-1]
# Step 1b
if word.endswith("ing"):
word = word[:-3]
elif word.endswith("ed"):
word = word[:-2]
# Step 1c
if word.endswith("y"):
word = word[:-1] + "i"
return word
# input words
words = ["horsess", "flies", "agreed", "running", "happy", "studies"]
for w in words:
print(w, "->", porter_stem(w))
$ python porter_stemmer_alg.py
horsess -> horsess
flies -> fli
agreed -> agre
running -> runn
happy -> happi
studies -> studi
1. Write a Python Program to perform following tasks on text.
a) Tokenization
b) Stop word Removal
View Solution
2. Write a Python program to implement Porter stemmer algorithm for stemming. View Solution
3. Write Python Program for
a) Word Analysis
b) Word Generation. View Solution
4. Create a Sample list for at least 5 words with ambiguous sense and Write a Python program to implement WSD. View Solution
5. Install NLTK tool kit and perform stemming. View Solution
6. Create Sample list of at least 10 words POS tagging and find the POS for any given word. View Solution
7. Write a Python program to
a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing. View Solution
8. Using NLTK package to convert audio file to text and text file to audio files. View Solution