NATURAL LANGUAGE PROCESSING [ Lab Programs ]

Aim:

☛ Write a Python program to perform the following tasks on text:
a) Tokenization
b) Stop Word Removal

Solution :

DESCRIPTION:

Tokenization :
Tokenization is the process of breaking text into smaller units called tokens, usually words or terms. It is a basic and essential preprocessing step in natural language processing, because later steps such as stop word removal work on individual tokens rather than on the raw string. The simplest approach, used in this program, is to lowercase the text and split it on whitespace.
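
Whitespace splitting leaves punctuation attached to words, so "text." and "text" become different tokens. The sketch below shows one possible alternative that uses a regular expression to keep only word characters. The function name tokenize and the sample sentence are illustrative assumptions, not part of the lab program.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens without punctuation.

    Hypothetical helper for illustration: \\w+ matches runs of letters,
    digits, and underscores, so a contraction such as "isn't" is split
    into "isn" and "t".
    """
    return re.findall(r"\w+", text.lower())


sample = "Hello, world! NLP is fun."

# Plain whitespace splitting keeps the punctuation on the tokens
print(sample.lower().split())  # ['hello,', 'world!', 'nlp', 'is', 'fun.']

# Regex-based tokenization drops it
print(tokenize(sample))        # ['hello', 'world', 'nlp', 'is', 'fun']
```

For input without punctuation, both approaches give the same tokens.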

Stop Word Removal :
Stop word removal is the process of eliminating very common words, such as "is", "the", "and", and "in", that carry little meaning on their own. Removing them reduces the number of tokens to process and keeps the focus on the words that contribute to the actual meaning of the text. Each token is compared against a predefined list of stop words and is kept only if it is not in that list.
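
The filtering step can also be written more compactly. The following is a minimal sketch, assuming a small illustrative stop word set. The names STOP_WORDS and remove_stop_words are hypothetical and the set is only a subset of a realistic list. A set is used instead of a list because membership tests on a set take constant time on average.

```python
# Illustrative subset of stop words (an assumption, not a complete list)
STOP_WORDS = {"is", "a", "an", "the", "and", "in", "to", "with"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in stop_words, preserving order."""
    return [token for token in tokens if token not in stop_words]


tokens = ["nlp", "is", "a", "powerful", "tool", "to", "work", "with", "text", "data"]
print(remove_stop_words(tokens))
# ['nlp', 'powerful', 'tool', 'work', 'text', 'data']
```

The comparison is case-sensitive, so the tokens should be lowercased before filtering.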

PROGRAM: ( tokenization_stopword_removal.py )

 
# To read text from user
text = input("Enter the text: ")

# Convert text to lowercase
text = text.lower()

# Tokenization: split the lowercased text on whitespace
tokens = text.split()

print("\nTokens after Tokenization:")
print(tokens)

# Stop word removal

# List of stop words
stop_words = [
    "is", "am", "are", "was", "were", "be",
    "a", "an", "the", "and", "or", "but", "if", "in", "on",
    "at", "to", "for", "with", "by", "of", "as", "from"
]

filtered_tokens = []
for word in tokens:
    if word not in stop_words:
        filtered_tokens.append(word)

print("\nTokens after Stop Word Removal:")
print(filtered_tokens)

OUTPUT:

 
$ python tokenization_stopword_removal.py
Enter the text: NLP is a powerful tool to work with text data

Tokens after Tokenization:
['nlp', 'is', 'a', 'powerful', 'tool', 'to', 'work', 'with', 'text', 'data']

Tokens after Stop Word Removal:
['nlp', 'powerful', 'tool', 'work', 'text', 'data']
