
NATURAL LANGUAGE PROCESSING [ Lab Programs ]


Aim:

  Write a Python program to perform the following tasks on text:
a) Tokenization
b) Stop word removal

Solution :

DESCRIPTION:

Tokenization :
Tokenization is the process of breaking text into smaller units called tokens, usually words or terms. It is a basic and essential preprocessing step in natural language processing, because later steps such as stop word removal and stemming operate on individual tokens rather than on raw text.
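
Splitting on whitespace is the simplest way to tokenize, but it leaves punctuation attached to words. The sketch below compares it with a regular-expression tokenizer; the helper name tokenize and the sample sentence are illustrative assumptions, not part of the lab program.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens without punctuation."""
    # \w+ matches runs of letters, digits, and underscores,
    # so commas, full stops, and exclamation marks are dropped.
    return re.findall(r"\w+", text.lower())

sample = "Hello, world! NLP is fun."

# Whitespace splitting keeps punctuation attached to the tokens.
print(sample.lower().split())

# The regex tokenizer returns only the word characters.
print(tokenize(sample))
```

Note that this pattern also splits contractions such as "isn't" into two tokens, so it is only a starting point for real text.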

Stop Word Removal :
Stop word removal is the process of eliminating commonly used words, such as "is", "the", "and", and "in", that carry little meaningful information on their own. Removing them reduces the amount of text to process and keeps the focus on the words that contribute most to the meaning of the text.

PROGRAM: ( tokenization_stopword_removal.py )

 
# To read text from user
text = input("Enter the text: ")

# Convert text to lowercase
text = text.lower()

# Tokenization: split the lowercased text on whitespace
tokens = text.split()

print("\nTokens after Tokenization:")
print(tokens)

# Stop word removal

# List of stop words
stop_words = [
    "is", "am", "are", "was", "were", "be",
    "a", "an", "the", "and", "or", "but", "if", "in", "on",
    "at", "to", "for", "with", "by", "of", "as", "from"
]

filtered_tokens = []
for word in tokens:
    if word not in stop_words:
        filtered_tokens.append(word)

print("\nTokens after Stop Word Removal:")
print(filtered_tokens)


OUTPUT:

 
$ python tokenization_stopword_removal.py
Enter the text: NLP is a powerful tool to work with text data

Tokens after Tokenization:
['nlp', 'is', 'a', 'powerful', 'tool', 'to', 'work', 'with', 'text', 'data']

Tokens after Stop Word Removal:
['nlp', 'powerful', 'tool', 'work', 'text', 'data']
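
As a possible variant, the filtering step can be wrapped in a reusable function. This is a minimal sketch that assumes the same stop word list as the program above; the names STOP_WORDS and remove_stop_words are illustrative. It stores the stop words in a set, which makes each membership check faster than scanning a list.

```python
# Same stop words as in the program above, stored in a set for fast lookups.
STOP_WORDS = {
    "is", "am", "are", "was", "were", "be",
    "a", "an", "the", "and", "or", "but", "if", "in", "on",
    "at", "to", "for", "with", "by", "of", "as", "from",
}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words, keeping their order."""
    return [token for token in tokens if token not in stop_words]

# Tokenize the sample sentence the same way as the program: lowercase, then split.
tokens = "NLP is a powerful tool to work with text data".lower().split()
print(remove_stop_words(tokens))
```

For the sample sentence this prints the same filtered list as the program output above.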



