Module 4: Semantic Analysis

Questions:

What is Word Sense Disambiguation (WSD)? Explain the dictionary based approach to Word Sense Disambiguation. (10 marks)
Explain the Lesk algorithm for Word Sense Disambiguation. (10 marks)
Demonstrate lexical semantic analysis using an example. (10 marks)
Explain semantic analysis in Natural Language Processing. (5 marks)
Explain, with examples, the ambiguities that arise at each level of Natural Language Processing. (10 marks)

Module 5: Pragmatic & Discourse Processing

Questions:

Explain Discourse reference resolution in detail. (10 marks)
Describe in detail Centering Algorithm for reference resolution. (10 marks)
Illustrate the reference phenomena for solving the pronoun problem. (10 marks)
Explain Anaphora Resolution using the Hobbs and Centering Algorithms. (10 marks)
Explain three types of referents that complicate the reference resolution problem. (5 marks)

Module 6: Applications

Questions:

Demonstrate the working of machine translation systems. (10 marks)
Explain the Information retrieval system. (10 marks)
Explain information retrieval versus information extraction systems. (10 marks)
Explain Machine Translation Approaches used in NLP. (5 marks)

Semantic Analysis in NLP – Module 4

1. Word Sense Disambiguation (WSD)

What is WSD?

• Word Sense Disambiguation is the process of identifying the correct meaning of a word in a given context when the word has multiple meanings.

Dictionary-based Approach to WSD:

• Uses dictionary definitions (glosses) to determine the correct sense of a word

• Steps involved:

1. Collect all possible definitions of the ambiguous word from a dictionary

2. Gather context words around the ambiguous word

3. Compare dictionary definitions with context words

4. Select the definition that has maximum overlap with context

Example: “bank”
– Context: “I will deposit the money in the bank tomorrow”
– Possible meanings:
1. Financial institution
2. River bank
3. To rely on something
→ The context words “deposit” and “money” overlap with the definition of the financial institution sense, so that sense is selected

2. Lesk Algorithm for WSD

• An algorithm developed by Michael Lesk in 1986

• Works on the principle that words used together in a text are related to each other, and that this relation shows up as shared words in their dictionary definitions

Steps of Lesk Algorithm:

1. Identify the word to be disambiguated

2. Get all definitions (senses) from the dictionary

3. For each sense:

– Create a bag of words from its definition

– Compare with context words

– Count overlapping words

4. Select the sense with maximum overlap

Example: “pine cone”
Pine definition 1: “type of evergreen tree”
Pine definition 2: “to long for something”
Context: “forest, tree, needle”
→ Definition 1 shares the word “tree” with the context (overlap 1), while definition 2 shares no words (overlap 0), so definition 1 is selected
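
The steps above can be sketched in Python as a simplified overlap count. This is a minimal sketch, not Lesk's original implementation: the stopword list, the `tokenize` helper, and the sense labels `pine_1` / `pine_2` are assumptions made for illustration, while the glosses and context words come from the example above.

```python
import re

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "for", "in", "on", "and", "or", "with"}

def tokenize(text):
    """Lowercase the text and return its set of non-stopword words (bag of words)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(senses, context):
    """Count gloss/context overlap per sense and return (best_sense, overlaps)."""
    context_words = tokenize(context)
    overlaps = {
        name: len(tokenize(gloss) & context_words)
        for name, gloss in senses.items()
    }
    best = max(overlaps, key=overlaps.get)
    return best, overlaps

pine_senses = {
    "pine_1": "type of evergreen tree",
    "pine_2": "to long for something",
}
best, overlaps = simplified_lesk(pine_senses, "forest, tree, needle")
print(best, overlaps)  # pine_1 {'pine_1': 1, 'pine_2': 0}
```

Ties are broken by dictionary order here; a fuller version would need an explicit tie-breaking rule such as preferring the most frequent sense.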

3. Lexical Semantic Analysis Example

• Lexical semantics deals with the meaning of individual words and their relationships

Example Sentence: “The mouse clicked”
Lexical Analysis:
1. “mouse” has multiple meanings:
– Computer device
– Small rodent
2. “clicked” helps disambiguate:
– Related to computer operations
3. Therefore: “mouse” refers to computer device

4. Semantic Analysis in NLP

• Process of understanding the meaning of text by analyzing:

1. Word meanings

2. Sentence structure

3. Context

4. Relationships between words

• Key Components:

– Named Entity Recognition

– Semantic Role Labeling

– Sentiment Analysis

– Contextual Understanding

5. Ambiguities in NLP

Lexical Ambiguity:

Words with multiple meanings:
“I am going to the bank”
(Financial institution or river bank?)

Syntactic Ambiguity:

Sentence structure confusion:
“I saw a man with a telescope”
(Who has the telescope – speaker or man?)

Semantic Ambiguity:

Meaning-based confusion:
“The chicken is ready to eat”
(Is the chicken going to eat or be eaten?)

Pragmatic Ambiguity:

Context-dependent confusion:
“It’s cold” could mean:
– Request to close window
– Comment about weather
– Food temperature

Pragmatic & Discourse Processing – Module 5

1. Discourse Reference Resolution

Definition:

• Process of determining what entities are being referred to by referring expressions in text

Key Components:

1. Identifying referring expressions (pronouns, names, descriptions)

2. Finding potential antecedents

3. Selecting the correct antecedent

Important Aspects:

• Context analysis

• Grammatical agreement

• Semantic compatibility

• Distance between reference and antecedent

Example:
“John bought a car. He drove it to work.”
– “He” refers to “John”
– “it” refers to “car”
Resolution based on context and gender/number agreement
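
The agreement check in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the feature tables `ENTITY_FEATURES` and `PRONOUN_FEATURES` are hand-written and hypothetical (a real system would derive them from a parser or lexicon), and the "most recent agreeing candidate" rule is only one simple way to use distance.

```python
# Hypothetical, hand-written feature tables for the example sentence.
ENTITY_FEATURES = {
    "John": {"gender": "male", "number": "singular"},
    "car": {"gender": "neuter", "number": "singular"},
}
PRONOUN_FEATURES = {
    "he": {"gender": "male", "number": "singular"},
    "she": {"gender": "female", "number": "singular"},
    "it": {"gender": "neuter", "number": "singular"},
}

def resolve(pronoun, candidates):
    """Return the most recent candidate that agrees in gender and number."""
    wanted = PRONOUN_FEATURES[pronoun.lower()]
    for entity in reversed(candidates):  # most recent mention first
        if ENTITY_FEATURES.get(entity) == wanted:
            return entity
    return None  # no agreeing antecedent found

antecedents = ["John", "car"]  # entities from "John bought a car."
print(resolve("He", antecedents))  # John
print(resolve("it", antecedents))  # car
```

Agreement alone is not enough when several candidates share the same features; that is where semantic compatibility and discourse-level methods come in.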

2. Centering Algorithm for Reference Resolution

Basic Concepts:

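• Forward-looking centers (Cf): Entities mentioned in the current utterance, ranked by prominence; these are the potential referents for the next utterance

• Backward-looking center (Cb): Main entity being talked about; the highest-ranked entity of the previous utterance’s Cf that is mentioned again in the current utterance

• Preferred center (Cp): Highest-ranked entity in Cf; the most likely next topic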

Algorithm Steps:

1. Identify all entities in current utterance

2. Rank entities based on grammatical role:

– Subject > Object > Others

3. Create forward-looking centers list (Cf)

4. Determine backward-looking center (Cb)

5. Carry the current Cf and Cb forward and repeat for the next utterance

Example:
Text: “John went to the store. He bought some milk.”
Utterance 1:
– Cf = [John, store]
– Cp = John (subject)
Utterance 2:
– Cb = John (referenced by “He”)
– Cf = [John, milk]
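
The bookkeeping in this example can be sketched in Python. This is a minimal sketch under two assumptions: grammatical roles are supplied by hand rather than by a parser, and the pronoun “He” has already been resolved to John. The names `ROLE_RANK`, `forward_centers`, and `backward_center` are illustrative only.

```python
# Ranking from the steps above: Subject > Object > Others.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}

def forward_centers(entities):
    """Order (entity, role) pairs by grammatical role to build Cf."""
    ranked = sorted(entities, key=lambda pair: ROLE_RANK[pair[1]])
    return [entity for entity, _ in ranked]

def backward_center(prev_cf, current_cf):
    """Cb: highest-ranked entity of the previous Cf that appears again now."""
    for entity in prev_cf:
        if entity in current_cf:
            return entity
    return None  # nothing carried over from the previous utterance

# "John went to the store."  /  "He bought some milk." (He = John, assumed resolved)
u1 = [("John", "subject"), ("store", "other")]
u2 = [("John", "subject"), ("milk", "object")]

cf1 = forward_centers(u1)
cf2 = forward_centers(u2)
cp1 = cf1[0]                      # preferred center of utterance 1
cb2 = backward_center(cf1, cf2)   # backward-looking center of utterance 2

print(cf1, cp1)  # ['John', 'store'] John
print(cf2, cb2)  # ['John', 'milk'] John
```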

3. Reference Phenomena for Pronoun Resolution

Types of Reference:

1. Anaphoric Reference

• Refers back to previously mentioned entity

2. Cataphoric Reference

• Refers forward to entity mentioned later

3. Zero Reference

• Implied but not explicitly stated

Examples:
Anaphoric: “Mary had a book. She read it.”
Cataphoric: “Before she arrived, Mary called.”
Zero: “[You] Please close the door.”

Resolution Factors:

• Syntactic constraints

• Semantic compatibility

• Discourse structure

• World knowledge

4. Anaphora Resolution: Hobbs and Centering Algorithms

Hobbs Algorithm:

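1. Start at the noun phrase (NP) node that dominates the pronoun in the parse tree

2. Walk up the tree and search the branches to the left of the path, left to right and breadth-first, for candidate noun phrases

3. If no antecedent is found in the current sentence, search the parse trees of previous sentences, most recent first

4. Check for gender/number agreement

5. Apply binding constraints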

Centering Algorithm for Anaphora:

• Tracks focus of attention

• Uses transition states:

– Continue: Same center, and it is also the preferred center

– Retain: Same center, but it is no longer the preferred center (a shift is likely next)

– Shift: Center changes to a different entity

Comparison:
Hobbs: Syntax-based, tree traversal
Centering: Discourse-based, focus tracking
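
The transition states can be expressed as a small classifier over the backward-looking center (Cb) and the preferred center (Cp). This is a minimal sketch using the usual definitions: Continue when Cb is unchanged and equals Cp, Retain when Cb is unchanged but differs from Cp, and Shift when Cb changes. Treating an undefined previous Cb (`None`) as "unchanged" is a simplifying assumption, and `classify_transition` is an illustrative name.

```python
def classify_transition(cb_prev, cb_cur, cp_cur):
    """Classify the transition between two utterances.

    cb_prev: Cb of the previous utterance (None if undefined)
    cb_cur:  Cb of the current utterance
    cp_cur:  Cp (highest-ranked forward-looking center) of the current utterance
    """
    if cb_prev is None or cb_cur == cb_prev:
        # The center is unchanged; check whether it is still preferred.
        return "Continue" if cb_cur == cp_cur else "Retain"
    return "Shift"

print(classify_transition("John", "John", "John"))  # Continue
print(classify_transition("John", "John", "Mary"))  # Retain
print(classify_transition("John", "Mary", "Mary"))  # Shift
```

When several antecedent assignments are possible for a pronoun, centering-based resolution prefers the assignment whose transition ranks highest, with Continue preferred over Retain and Retain over Shift.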

5. Three Types of Referents Complicating Resolution

1. Inferential References:

• Requires world knowledge

“I went to a wedding. The bride was beautiful.”
(Bride not previously mentioned but inferred)

2. Non-referential Pronouns:

• Don’t refer to specific entities

“It is raining.”
“They say exercise is good.”

3. Split Antecedents:

• Pronoun refers to multiple entities that were introduced in separate noun phrases

“John met Mary. They went to dinner.”
(“They” refers to both John and Mary)

NLP Applications – Module 6

1. Working of Machine Translation Systems

Components of MT System:

1. Input Processing

• Text preprocessing

• Tokenization

• Part-of-speech tagging

2. Analysis Phase

• Morphological analysis

• Syntactic analysis

• Semantic analysis

3. Transfer Phase

• Source to target language mapping

• Rule application

4. Generation Phase

• Target language synthesis

• Morphological generation

• Word ordering

Working Example:
Input: “The cat sits on the mat” (English to Spanish)
1. Analysis: [The(det) cat(noun) sits(verb) on(prep) the(det) mat(noun)]
2. Transfer: Mapping words and structure to Spanish
3. Generation: “El gato se sienta en la alfombra”
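
The three phases of this example can be sketched as a toy transfer-style pipeline. Everything here is an assumption for illustration: the five-word `LEXICON`, the `ARTICLES` table, and the function names are hypothetical, and word order is left unchanged because English and Spanish happen to agree for this sentence.

```python
# Hypothetical toy lexicon: English word -> (POS tag, Spanish form, gender).
LEXICON = {
    "the": ("det", None, None),       # article chosen later, during generation
    "cat": ("noun", "gato", "m"),
    "mat": ("noun", "alfombra", "f"),
    "sits": ("verb", "se sienta", None),
    "on": ("prep", "en", None),
}
ARTICLES = {"m": "el", "f": "la"}

def analyze(sentence):
    """Analysis: tokenize and tag each word by lexicon lookup."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]

def transfer(tagged):
    """Transfer: map each source word to (target form, POS, gender)."""
    return [(LEXICON[word][1], pos, LEXICON[word][2]) for word, pos in tagged]

def generate(transferred):
    """Generation: make each article agree with the next noun, then join."""
    out = []
    for i, (form, pos, _) in enumerate(transferred):
        if pos == "det":
            noun_gender = next(g for _, p, g in transferred[i + 1:] if p == "noun")
            form = ARTICLES[noun_gender]
        out.append(form)
    text = " ".join(out)
    return text[0].upper() + text[1:]

tagged = analyze("The cat sits on the mat")
translation = generate(transfer(tagged))
print(translation)  # El gato se sienta en la alfombra
```

The article cannot be translated word by word because its form depends on the gender of the following noun, which is why that decision is deferred to the generation phase.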

2. Information Retrieval System

Components:

1. Document Processing

• Indexing

• Term weighting

• Document representation

2. Query Processing

• Query formulation

• Query expansion

• Query optimization

3. Matching Process

• Similarity calculation

• Ranking algorithms

• Result presentation

Example Process:
1. User Query: “effects of climate change”
2. System Processing:
• Tokenization
• Stopword removal
• Term matching
3. Returns: Ranked list of relevant documents
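
The example process can be sketched with a small TF-IDF ranker. This is a minimal sketch, not a production retrieval system: the three documents, the stopword list, and the smoothed IDF formula `log((1 + N) / df)` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumed stopword list for the sketch.
STOPWORDS = {"of", "the", "a", "an", "and", "in", "on", "is", "to"}

def preprocess(text):
    """Tokenize, lowercase, and remove stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def rank(query, documents):
    """Return ids of matching documents, highest TF-IDF score first."""
    doc_terms = {doc_id: Counter(preprocess(text)) for doc_id, text in documents.items()}
    n_docs = len(documents)
    scores = {}
    for doc_id, counts in doc_terms.items():
        score = 0.0
        for term in preprocess(query):
            # Document frequency: number of documents containing the term.
            df = sum(1 for c in doc_terms.values() if term in c)
            if df:
                score += counts[term] * math.log((1 + n_docs) / df)
        scores[doc_id] = score
    matching = [d for d in scores if scores[d] > 0]
    return sorted(matching, key=lambda d: scores[d], reverse=True)

docs = {
    "d1": "Climate change effects on coastal cities",
    "d2": "A recipe for tomato soup",
    "d3": "Climate policy and the economy",
}
results = rank("effects of climate change", docs)
print(results)  # ['d1', 'd3']
```

Document d1 matches all three query terms and ranks first, d3 matches only “climate”, and d2 matches nothing and is not returned.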

3. Information Retrieval vs Information Extraction

Information Retrieval

• Finds relevant documents

• Returns whole documents

• Query-based search

• General information need

• Example: Google Search

Information Extraction

• Extracts specific facts

• Returns structured data

• Pattern matching

• Specific information need

• Example: Named Entity Recognition

Comparative Example:
Text: “Apple Inc. reported $100B revenue in 2023”
IR: Returns the entire document
IE: Extracts: {Company: Apple Inc., Revenue: $100B, Year: 2023}
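
The IE side of this example can be sketched with a pattern-matching extractor. This is a minimal sketch: the regular expression is hand-written for sentences of exactly this shape and is an assumption for illustration, whereas real extraction systems combine many patterns or trained models.

```python
import re

# Hand-written pattern for "<Company> reported <amount> revenue in <year>".
PATTERN = re.compile(
    r"(?P<company>[A-Z][\w.]*(?: [A-Z][\w.]*)*) reported "
    r"(?P<revenue>\$\d+(?:\.\d+)?[MBT]?) revenue in (?P<year>\d{4})"
)

def extract(text):
    """Return a structured record, or None if the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {
        "Company": match.group("company"),
        "Revenue": match.group("revenue"),
        "Year": match.group("year"),
    }

record = extract("Apple Inc. reported $100B revenue in 2023")
print(record)  # {'Company': 'Apple Inc.', 'Revenue': '$100B', 'Year': '2023'}
```

An IR system would stop at returning the document that contains this sentence; the extractor goes one step further and turns the sentence into fields that can be stored in a database.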

4. Machine Translation Approaches

1. Rule-Based MT:

• Uses linguistic rules

• Dictionary-based translation

• Good for similar languages

2. Statistical MT:

• Uses statistical models

• Trained on parallel corpora

• Probability-based translation

3. Neural MT:

• Uses neural networks

• End-to-end learning

• Better context understanding

4. Hybrid Approaches:

• Combines multiple methods

• Uses best features of each

Comparison Example:
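Input: “It’s raining cats and dogs”
Rule-Based: Tends to translate word by word, giving a literal result unless the idiom is listed in its dictionary
Statistical: Can translate the idiom correctly if the phrase occurs in its parallel training texts
Neural: Uses the whole sentence as context, so it generally handles idioms better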