Intelligent Agents - Unit 8: Understanding Natural Language Processing (NLP)

Overview

As per the course website, "This week’s seminar aims to support your understanding of the underpinning theories and concepts in NLP. This week provides an opportunity to investigate a demonstration of Word2Vec in action with a simple worked example, using Python. This is followed by a formative activity to reinforce the way various parse tree structures are created."

My Reflection

Overall Reflection

This week extended the discussion of the previous week on Natural Language Processing (NLP), with a seminar, readings and a formative activity.

The seminar covered fundamental NLP terminology, most of which I already knew or was familiar with. Still, I found a few things to refresh or to learn. For example, I was familiar with the NLTK and spaCy Python libraries, but I did not know that Transformers is a third library that belongs on the same list. I had also forgotten the difference between stemming and lemmatisation, and it was good to see word embeddings in this context again, reinforcing what I learnt about them from the extended readings that I did last week.

Meanwhile, the readings and the formative activity were focused on parse trees: hierarchical tree structures that represent the syntactic structure of a piece of text. The only required reading was an article on the topic from Towards Data Science, but the unit invited students to do self-directed reading, so I did my own research and came across a few useful and interesting resources. One was a set of lecture notes from Princeton University on constituency parsing, a form of parsing that describes syntactic structure as nested phrases defined by a context-free grammar, rather than as dependencies between the components of the sentence. I also came across a website and a GitHub repo that aim to track advancements in the domain of NLP. Finally, I came across Stanza, the Python library for NLP developed by researchers at Stanford University, which I used to complete my formative activity.

Artefacts

Formative Activity

The formative activity was to create constituency-based parse trees for three sentences: "The government raised interest rates.", "The internet gives everyone a voice." and "The man saw the dog with the telescope."

I used Stanza to do the constituency parsing, converted Stanza's tree objects to NLTK tree objects to obtain ASCII tree structures, and used the svgling library to create those trees as SVGs as well.
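The workflow described above can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: the helper names (constituency_brackets, to_nltk_tree, show_tree) are hypothetical, and it assumes that stanza, nltk and svgling are installed and that Stanza's English models can be downloaded. It relies on Stanza's constituency tree printing as a bracketed string, which NLTK can read back with Tree.fromstring.

```python
# Hypothetical helpers illustrating a Stanza -> NLTK -> svgling workflow.
# Third-party imports are done lazily inside each function, so the helpers
# can be defined even before the heavy dependencies are loaded.


def constituency_brackets(text):
    """Parse `text` with Stanza; return one bracketed tree string per sentence."""
    import stanza

    stanza.download("en")  # fetches the English models
    # Building the pipeline is slow; reuse it if you parse many texts.
    nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,constituency")
    doc = nlp(text)
    return [str(sentence.constituency) for sentence in doc.sentences]


def to_nltk_tree(bracketed):
    """Convert a bracketed string such as '(ROOT (S ...))' to an nltk.Tree."""
    from nltk import Tree

    return Tree.fromstring(bracketed)


def show_tree(bracketed):
    """Print the tree as ASCII art and return an svgling drawing of it."""
    import svgling

    tree = to_nltk_tree(bracketed)
    tree.pretty_print()  # ASCII rendering on stdout
    return svgling.draw_tree(tree)  # displays inline in a Jupyter notebook
```

In a notebook, looping over constituency_brackets("The government raised interest rates.") and calling show_tree on each string would print the ASCII tree and display the SVG version below it.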

The full practice notebook can be seen here. The notebook is published as part of a repository that I have just created, with the intention of using it to practise and learn more NLP in the future.

ASCII Parse Trees
        

>>> The government raised interest rates.

                      ROOT
                       |
                       S
     __________________|___________________
     |                   VP               |
     |             ______|_______         |
     NP            |            NP        |
 ____|____         |        ____|_____    |
 DT      NN       VBD       NN      NNS   .
 |       |         |        |        |    |
The  government  raised  interest  rates  .
        

>>> The internet gives everyone a voice.

                      ROOT
                       |
                       S
    ___________________|___________________
    |                    VP               |
    |            ________|_________       |
    NP           |       NP       NP      |
 ___|____        |       |      __|___    |
 DT     NN      VBZ      NN     DT   NN   .
 |      |        |       |      |    |    |
The  internet  gives  everyone  a  voice  .
        

>>> The man saw the dog with the telescope.

                        ROOT
                         |
                         S
   ______________________|______________________
   |                 VP                        |
   |       __________|__________               |
   |       |      |            PP              |
   |       |      |       _____|_____          |
   NP      |      NP      |         NP         |
 __|___    |    __|___    |     ____|____      |
 DT   NN  VBD   DT   NN   IN    DT      NN     .
 |    |    |    |    |    |     |       |      |
The  man  saw  the  dog  with  the  telescope  .

Notes and Interpretation

As a note, the final sentence is an example of structural ambiguity, as the prepositional phrase "with the telescope" can be attached either to the verb "saw" (the man used the telescope to see the dog) or to the noun phrase "the dog" (the dog has the telescope). Stanza outputs the parse it considers most probable, which attaches the prepositional phrase under the branch of the verb "saw".
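The two readings can be written out as bracketed trees and compared. The sketch below uses only the Python standard library. The verb-attachment string mirrors the tree shown above, while the noun-attachment string is my own illustration of the alternative reading and not Stanza output. The helper names (parse_bracketed, parent_of, leaves) are hypothetical.

```python
# Compare the two attachment sites for "with the telescope".
VERB_ATTACHMENT = (
    "(ROOT (S (NP (DT The) (NN man)) "
    "(VP (VBD saw) (NP (DT the) (NN dog)) "
    "(PP (IN with) (NP (DT the) (NN telescope)))) (. .)))"
)
# Illustrative alternative: the PP sits inside the object noun phrase.
NOUN_ATTACHMENT = (
    "(ROOT (S (NP (DT The) (NN man)) "
    "(VP (VBD saw) (NP (NP (DT the) (NN dog)) "
    "(PP (IN with) (NP (DT the) (NN telescope))))) (. .)))"
)


def parse_bracketed(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    Words (leaves) are kept as plain strings inside the children lists.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return read_node()


def parent_of(tree, target_label, parent=None):
    """Return the label of the parent of the first node labelled target_label."""
    label, children = tree
    if label == target_label:
        return parent
    for child in children:
        if isinstance(child, tuple):
            found = parent_of(child, target_label, label)
            if found is not None:
                return found
    return None


def leaves(tree):
    """Return the words of the tree from left to right."""
    words = []
    for child in tree[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


verb_tree = parse_bracketed(VERB_ATTACHMENT)
noun_tree = parse_bracketed(NOUN_ATTACHMENT)
print(parent_of(verb_tree, "PP"))  # VP
print(parent_of(noun_tree, "PP"))  # NP
print(leaves(verb_tree) == leaves(noun_tree))  # True
```

Both trees cover exactly the same words, so the ambiguity lies purely in where the PP node hangs.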

To make sure I understand the trees and the meanings of the tags clearly, I decided to note down here the meaning of each tag used in the trees above:

Tag   Meaning
S     Simple declarative clause
NP    Noun phrase (e.g. the tall man, the internet)
VP    Verb phrase (e.g. saw the dog, gives everyone a voice)
PP    Prepositional phrase, a phrase that consists of a preposition and the object of the preposition (e.g. "with": preposition, "the telescope": the object of the preposition)
DT    Determiner (e.g. the, this, that, a, an)
NN    Noun, singular (e.g. man, dog, information)
NNS   Noun, plural (e.g. men, dogs)
VBD   Verb, past tense (e.g. gave, saw)
VBZ   Verb, 3rd-person singular present (e.g. "she gives", "he sees")
IN    Preposition or subordinating conjunction (e.g. in, with, on)