Intelligent Agents - Unit 7: Natural Language Processing (NLP)
Overview
As per the course website, "This week’s lecturecast focuses on one of the most rapidly growing technologies in
intelligent agents, which is Natural Language Processing (or NLP). The lecturecast covers some of the background
to NLP and why it is hard for machines to understand the way that we talk before diving into some examples of
methods and technologies that can be deployed."
My Reflection
Overall Reflection
This first week of the second half of the module moves on to the topic of Natural Language Processing (NLP). I
hoped that this would be a transition towards modern-day agents, after spending the first six weeks focusing
mainly on symbolic agents, satisfying my eagerness for the coverage of language-model-based agents that I
consider inevitable in any true MSc module about intelligent agents today. But frankly, after all of my
experience with this program, I doubt there will be such a transition. My doubt is doubled by my
disappointment that this unit's materials did not even mention large language models (LLMs) and all the
advancements that they have been witnessing lately. This unit, and the module overall, would have been
perfect to study in 2016. Unfortunately, I am studying it in 2026.
The unit included a lecturecast, three papers to read and the requirement to write a summary post for the second
collaborative learning discussion, which started in Unit 5. The lecturecast covered a few historical approaches
to making machines capable of dealing with natural language, and why that is a difficult task. It went through
the difference between the syntax, semantics and pragmatics of a language, and mentioned some characteristics
that make natural language tricky for machines to 'understand'. These characteristics include ambiguity,
vagueness, polysemy (a word can have multiple senses depending on the context) and the fact that sentences can
be grammatically (or syntactically) correct but still have no meaning (no semantics or pragmatics). I found
Noam Chomsky's example of "colourless green ideas sleep furiously" interesting and I intend to look into it
further in the future. The lecturecast also touched on different approaches to machine processing of natural
language, such as Word2Vec, the Google News corpus, the mathematics of language and Hearst patterns. It also
introduced the concept of hyponymy, for which several automatic detection approaches have been proposed, and
which is also covered by one of the papers in the reading list.
Readings Reflection
The first paper, by Mikolov et al. (2013), was a follow-up to an earlier paper introducing the skip-gram
model, which made it possible to predict the neighbouring words of a given word. This paper introduced three
main enhancements to the model:
Negative Sampling (NEG): a simplified training objective in which the model learns to distinguish the target
word from a few sampled 'noise' words, making training cheaper and faster than the originally proposed
hierarchical softmax method.
Subsampling of Frequent Words: a way to deal with very frequent words, like stop words, which occur the most
often and yet provide the least information. Each very frequent word is probabilistically dropped from the
training data, so that it is not treated equally with rarer, more meaningful words. As far as I understand,
before this enhancement, the skip-gram model treated frequent words equally with all other words.
Phrase Vectors: dealing with phrases rather than single words only, enabling the detection of phrases like
"Boston Globe" as a newspaper instead of treating it as a natural compositional combination of "Boston" and
"Globe".
These enhancements to the skip-gram model were the foundation that made the word2vec package usable at scale,
and they form one of the roots of the evolution of NLP that eventually led, after the transformer architecture
appeared in 2017, to Generative Pre-trained Transformers (GPT) and their continuous development since.
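The subsampling rule from Mikolov et al. (2013) is simple enough to sketch directly: each word is discarded with probability 1 − √(t/f(w)), where f(w) is its relative frequency in the corpus and t is a small threshold (around 10⁻⁵ in the paper). A minimal Python sketch of the keep-probability this implies (the example frequencies are illustrative):

```python
import math

def subsample_keep_prob(word_freq, t=1e-5):
    """Probability of KEEPING a word under Mikolov et al.'s (2013)
    subsampling rule: each word w is discarded with probability
    1 - sqrt(t / f(w)), where f(w) is its relative corpus frequency."""
    return min(1.0, math.sqrt(t / word_freq))

# A very frequent stop word (say ~7% of all tokens) is almost always
# dropped, while a rarer content word is almost always kept.
print(subsample_keep_prob(0.07))    # ~0.012
print(subsample_keep_prob(0.0001))  # ~0.316
print(subsample_keep_prob(1e-6))    # 1.0 (rare words are never dropped)
```

This is why stop words stop dominating the training signal: their keep-probability falls with the square root of their frequency.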
The second paper was the original paper in which Marti Hearst introduced her Hearst patterns to detect
hyponyms. It was interesting for me to see how she was able to formulate lexical rules to detect and extract
hyponyms in natural language, and I could see that this must have required a deep understanding of linguistics
and semantics in addition to computer science.
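The best-known of Hearst's patterns is "X such as Y1, Y2 and Y3", which yields (Y, X) hyponym pairs. As a toy illustration only (Hearst matched noun phrases identified by a parser; this regex over bare words is my own simplification):

```python
import re

# One Hearst pattern, "X such as Y1, Y2 and Y3", approximated with a
# regular expression over raw words. Real systems match parsed noun
# phrases, not bare tokens, so this is a deliberate simplification.
PATTERN = re.compile(r"(\w+)\s+such as\s+([\w\s,]+)")

def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the pattern."""
    pairs = []
    for hypernym, tail in PATTERN.findall(text):
        for hyponym in re.split(r",\s*|\s+and\s+|\s+or\s+", tail):
            if hyponym.strip():
                pairs.append((hyponym.strip(), hypernym))
    return pairs

print(extract_hyponyms("She visited countries such as France, Spain and Italy"))
# [('France', 'countries'), ('Spain', 'countries'), ('Italy', 'countries')]
```

Even this crude version shows the appeal of the approach: hyponym pairs come out of plain text with no labelled training data at all.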
The third paper was a more recent one by Aqab and Tariq (2020) that introduced the use of Convolutional
Neural Networks (CNNs), within a workflow of pre-processing and post-display, as an enhanced technique for
English handwriting recognition. Before this approach, previous work included implementing the Hidden Markov
Model (HMM) statistical model and other machine learning models like Support Vector Machines (SVMs), as well
as a few attempts with Artificial Neural Networks (ANNs). It was nice to revisit CNNs, deep learning and
machine learning in general for the first time since the Machine Learning module. It was also noteworthy for
me that the paper was based on an empirical study, comprising university students and professors, which I have
not seen very frequently in similar computer science studies.
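The operation that makes CNNs suited to handwriting images is convolution: sliding a small kernel over the pixel grid so that low-level stroke features (edges, curves) are detected wherever they appear. This is not the network from the paper, just a dependency-free sketch of the operation itself:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the
    image and sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(cols)]
            for i in range(rows)]

# A hand-written vertical-edge kernel. A trained CNN learns many such
# kernels automatically as its first-layer stroke detectors.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks exactly where pixel intensity changes left-to-right, i.e. a vertical stroke edge.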
Reference List
Aqab, S. and Tariq, M.U. (2020) ‘Handwriting Recognition using Artificial Intelligence Neural Network and Image
Processing’, International Journal of Advanced Computer Science and Applications, 11(7). Available at:
https://doi.org/10.14569/IJACSA.2020.0110719.
Hearst, M. (1992) ‘Automatic Acquisition of Hyponyms from Large Text Corpora’, in COLING. The 14th
International Conference on Computational Linguistics. Available at: https://aclanthology.org/C92-2082.pdf
(Accessed: 13 March 2026).
Mikolov, T. et al. (2013) ‘Distributed Representations of Words and Phrases and Their Compositionality’,
arXiv [Preprint]. Available at: https://doi.org/10.48550/arxiv.1310.4546.
Artefacts
Collaborative Discussion 2
In this unit, we were required to conclude the second discussion of the module with a 300-word summary post,
summarising our initial post and the peer responses we received. Noting that I received only one response,
below is my summary post:
Summary Post
In my initial post, I compared Agent Communication Languages (ACLs) like KQML to method invocation techniques
in programming languages such as Python or Java. I used the analogy of making a request within the same
organisation, where the requester knows exactly who should carry it out and how, versus sending a request to
an independent organisation, stating the intent and leaving execution to them. The former illustrates method
invocation, while the latter reflects how ACLs like KQML operate.
With this analogy, I wanted to show that KQML focuses on the pragmatics rather than the syntax or semantics of
code, separating the message from its content. This enables collaboration between diverse agents in distributed
systems (Finin et al., 1994). Conversely, method invocation in programming demands identical code structure,
syntax and semantics (Mayfield, Labrou and Finin, 2005).
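The contrast can be made concrete. A KQML ask-one message of the kind discussed by Finin et al. (1994) states only the performative, the content, the content language and the ontology, while a method call binds the caller to a concrete object and signature. A small sketch (the agent names and the price value are illustrative):

```python
# Method invocation: the caller must know the object, the method name
# and its signature, and must share the same code base and language.
class StockService:
    def price(self, ticker):
        return 123.45  # illustrative value

print(StockService().price("IBM"))

# KQML-style message: the sender states its intent via a performative
# (ask-one) and leaves interpretation and execution to the receiver,
# which may be implemented in any language, provided both agents share
# the ontology named in the :ontology field.
kqml_message = """(ask-one
  :sender   trader-agent
  :receiver stock-agent
  :language KIF
  :ontology NYSE-prices
  :content  (price IBM ?p))"""
print(kqml_message)
```

The :ontology field is exactly where the alignment problem discussed below arises: the message is only meaningful if both agents interpret "NYSE-prices" the same way.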
However, I highlighted a challenge with ACLs and KQML: KQML-based agents must share the same or interoperable
ontologies. If not, performatives, which are the foundation of KQML messages, can be interpreted differently
by each agent.
Ross Bulat’s peer response addressed this issue, suggesting several solutions for ontological alignment among
KQML-based agents. These include explicitly documenting and validating agents’ ontological commitments before
deployment, adopting common ontology standards like Web Ontology Language (OWL) or Resource Description
Framework (RDF) to ensure consistent message interpretation, and introducing middleware agents to facilitate
communication and ontology translation, acting as semantic translators. Overall, the response stressed the
importance of good ontology governance by developers.
This discussion was valuable for understanding ACLs, KQML, and their differences from method invocation. I
appreciated how KQML addresses the heterogeneity of coding structures by focusing on simple syntax to transmit
any message content. It was also insightful to consider solutions when heterogeneity occurs at the ontological
level itself.
Reference List
Finin, T. et al. (1994) ‘KQML as an Agent Communication Language’, in Proceedings of the Third International
Conference on Information and Knowledge Management. Conference on Information and Knowledge Management,
New York, NY, USA: Association for Computing Machinery, pp. 456–463. Available at:
https://doi.org/10.1145/191246.191322.
Mayfield, J., Labrou, Y. and Finin, T. (2005) ‘Evaluation of KQML as an Agent Communication Language’, in
Intelligent Agents II: Agent Theories, Architectures, and Languages. IJCAI’95-ATAL Workshop, Berlin,
Heidelberg: Springer.
Additional Reading
To ease a bit of my frustration with how dated the unit's content is, I decided to learn (or rather revisit)
a bit about LLMs and transformers. The following well-known video was one way to do so: