Intelligent Agents - Unit 7: Natural Language Processing (NLP)
Overview
As per the course website, "This week’s lecturecast focuses on one of the most rapidly growing technologies in
intelligent agents, which is Natural Language Processing (or NLP). The lecturecast covers some of the background
to NLP and why it is hard for machines to understand the way that we talk before diving into some examples of
methods and technologies that can be deployed."
My Reflection
Overall Reflection
This first week of the second half of the module moves on to the topic of Natural Language Processing (NLP). I
hoped that this would be a transition towards modern-day agents, after spending the first six weeks focusing
mainly on symbolic agents, satisfying my eagerness for the coverage of language-model-based agents that I
consider inevitable in any true MSc module about intelligent agents today. But frankly, after all of my
experience with this program, I doubt there will be such a transition. My doubt is doubled by my
disappointment that this unit's materials did not even mention large language models (LLMs) and all the
advancements that they have been witnessing lately. This unit, and the module overall, would have been
perfect to study in 2016. Unfortunately, I am studying it in 2026.
The unit included a lecturecast, three papers to read and the requirement to write a summary post for the second
collaborative learning discussion, which started in Unit 5. The lecturecast covered a few historical approaches
to making machines capable of dealing with natural language, and why that is a difficult task. It went through
the difference between the syntax, semantics and pragmatics of a language, and mentioned some characteristics
that make natural language tricky for machines to 'understand'. These characteristics include ambiguity,
vagueness, polysemy (a word can have multiple senses depending on the context) and the fact that sentences can
be grammatically (or syntactically) correct but still have no meaning (no semantics or pragmatics). I found
Noam Chomsky's example of "colourless green ideas sleep furiously" interesting and I intend to look into it
further in the future. The lecturecast also touched on different approaches to machine processing of natural
language, such as Word2Vec, the Google News corpus, the mathematics of language and Hearst patterns. It also
introduced the concept of hyponymy, for which several automatic detection approaches have been proposed, and
which is also covered by one of the papers in the reading list.
Readings Reflection
The first paper, by Mikolov et al. (2013), was a follow-up to an earlier paper introducing the skip-gram
model, which made it possible to predict the neighbouring words of a given word. This paper introduced three
main enhancements to the model:
Negative Sampling (NEG): a simplified training objective in which the model learns to distinguish the target
word from a few sampled 'noise' words, making training cheaper and faster than the originally proposed
hierarchical softmax method.
Subsampling of Frequent Words: a way to deal with very frequent words, like stop words, which occur the most
often and yet provide the least information. Each very frequent word is probabilistically dropped from the
training data, so that it is not treated equally with rarer, more meaningful words. As far as I understand,
before this enhancement, the skip-gram model treated frequent words equally with all other words.
Phrase Vectors: dealing with phrases rather than single words only, enabling the detection of phrases like
"Boston Globe" as a newspaper instead of treating it as a natural compositional combination of "Boston" and
"Globe".
These enhancements to the skip-gram model were the foundation that made the word2vec package usable at scale,
and they form one of the roots of the evolution of NLP that eventually led, after the transformer architecture
appeared in 2017, to Generative Pre-trained Transformers (GPT) and their continuous development since.
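The subsampling rule from Mikolov et al. (2013) is simple enough to sketch directly: each word is discarded with probability 1 − √(t/f(w)), where f(w) is its relative frequency in the corpus and t is a small threshold (around 10⁻⁵ in the paper). A minimal Python sketch of the keep-probability this implies (the example frequencies are illustrative):

```python
import math

def subsample_keep_prob(word_freq, t=1e-5):
    """Probability of KEEPING a word under Mikolov et al.'s (2013)
    subsampling rule: each word w is discarded with probability
    1 - sqrt(t / f(w)), where f(w) is its relative corpus frequency."""
    return min(1.0, math.sqrt(t / word_freq))

# A very frequent stop word (say ~7% of all tokens) is almost always
# dropped, while a rarer content word is almost always kept.
print(subsample_keep_prob(0.07))    # ~0.012
print(subsample_keep_prob(0.0001))  # ~0.316
print(subsample_keep_prob(1e-6))    # 1.0 (rare words are never dropped)
```

This is why stop words stop dominating the training signal: their keep-probability falls with the square root of their frequency.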
The second paper was the original paper in which Marti Hearst introduced her Hearst patterns to detect
hyponyms. It was interesting for me to see how she was able to formulate lexical rules to detect and extract
hyponyms in natural language, and I could see that this must have required a deep understanding of linguistics
and semantics in addition to computer science.
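The best-known of Hearst's patterns is "X such as Y1, Y2 and Y3", which yields (Y, X) hyponym pairs. As a toy illustration only (Hearst matched noun phrases identified by a parser; this regex over bare words is my own simplification):

```python
import re

# One Hearst pattern, "X such as Y1, Y2 and Y3", approximated with a
# regular expression over raw words. Real systems match parsed noun
# phrases, not bare tokens, so this is a deliberate simplification.
PATTERN = re.compile(r"(\w+)\s+such as\s+([\w\s,]+)")

def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the pattern."""
    pairs = []
    for hypernym, tail in PATTERN.findall(text):
        for hyponym in re.split(r",\s*|\s+and\s+|\s+or\s+", tail):
            if hyponym.strip():
                pairs.append((hyponym.strip(), hypernym))
    return pairs

print(extract_hyponyms("She visited countries such as France, Spain and Italy"))
# [('France', 'countries'), ('Spain', 'countries'), ('Italy', 'countries')]
```

Even this crude version shows the appeal of the approach: hyponym pairs come out of plain text with no labelled training data at all.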
The third paper was a more recent one by Aqab and Tariq (2020) that introduced the use of Convolutional
Neural Networks (CNNs), within a workflow of pre-processing and post-display, as an enhanced technique for
English handwriting recognition. Before this approach, previous work included implementing the Hidden Markov
Model (HMM) statistical model and other machine learning models like Support Vector Machines (SVMs), as well
as a few attempts with Artificial Neural Networks (ANNs). It was nice to revisit CNNs, deep learning and
machine learning in general for the first time since the Machine Learning module. It was also noteworthy for
me that the paper was based on an empirical study, comprising university students and professors, which I have
not seen very frequently in similar computer science studies.
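The operation that makes CNNs suited to handwriting images is convolution: sliding a small kernel over the pixel grid so that low-level stroke features (edges, curves) are detected wherever they appear. This is not the network from the paper, just a dependency-free sketch of the operation itself:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the
    image and sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(cols)]
            for i in range(rows)]

# A hand-written vertical-edge kernel. A trained CNN learns many such
# kernels automatically as its first-layer stroke detectors.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks exactly where pixel intensity changes left-to-right, i.e. a vertical stroke edge.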
Reference List
Aqab, S. and Tariq, M.U. (2020) ‘Handwriting Recognition using Artificial Intelligence Neural Network and Image
Processing’, International Journal of Advanced Computer Science and Applications, 11(7). Available at:
https://doi.org/10.14569/IJACSA.2020.0110719.
Hearst, M. (1992) ‘Automatic Acquisition of Hyponyms from Large Text Corpora’, in COLING. The 14th
International Conference on Computational Linguistics. Available at: https://aclanthology.org/C92-2082.pdf
(Accessed: 13 March 2026).
Mikolov, T. et al. (2013) ‘Distributed Representations of Words and Phrases and Their Compositionality’,
arXiv [Preprint]. Available at: https://doi.org/10.48550/arxiv.1310.4546.
Artefacts
Collaborative Discussion 2
In this unit, we were required to conclude the second discussion of the module with a 300-word summary post,
summarising our initial post and the peer responses we received. Noting that I received only one response,
below is my summary post:
Summary Post
In my initial post, I compared Agent Communication Languages (ACLs) like KQML to method invocation techniques
in programming languages such as Python or Java. I used the analogy of making a request within the same
organisation, where the requester knows exactly who should carry it out and how, versus sending a request to
an independent organisation, stating the intent and leaving execution to them. The former illustrates method
invocation, while the latter reflects how ACLs like KQML operate.
With this analogy, I wanted to show that KQML focuses on the pragmatics rather than the syntax or semantics of
code, separating the message from its content. This enables collaboration between diverse agents in distributed
systems (Finin et al., 1994). Conversely, method invocation in programming demands identical code structure,
syntax and semantics (Mayfield, Labrou and Finin, 2005).
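The contrast can be made concrete. A KQML ask-one message of the kind discussed by Finin et al. (1994) states only the performative, the content, the content language and the ontology, while a method call binds the caller to a concrete object and signature. A small sketch (the agent names and the price value are illustrative):

```python
# Method invocation: the caller must know the object, the method name
# and its signature, and must share the same code base and language.
class StockService:
    def price(self, ticker):
        return 123.45  # illustrative value

print(StockService().price("IBM"))

# KQML-style message: the sender states its intent via a performative
# (ask-one) and leaves interpretation and execution to the receiver,
# which may be implemented in any language, provided both agents share
# the ontology named in the :ontology field.
kqml_message = """(ask-one
  :sender   trader-agent
  :receiver stock-agent
  :language KIF
  :ontology NYSE-prices
  :content  (price IBM ?p))"""
print(kqml_message)
```

The :ontology field is exactly where the alignment problem discussed below arises: the message is only meaningful if both agents interpret "NYSE-prices" the same way.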
However, I highlighted a challenge with ACLs and KQML: KQML-based agents must share the same or interoperable
ontologies. If not, performatives, which are the foundation of KQML messages, can be interpreted differently
by each agent.
Ross Bulat’s peer response addressed this issue, suggesting several solutions for ontological alignment among
KQML-based agents. These include explicitly documenting and validating agents’ ontological commitments before
deployment, adopting common ontology standards like Web Ontology Language (OWL) or Resource Description
Framework (RDF) to ensure consistent message interpretation, and introducing middleware agents to facilitate
communication and ontology translation, acting as semantic translators. Overall, the response stressed the
importance of good ontology governance by developers.
This discussion was valuable for understanding ACLs, KQML, and their differences from method invocation. I
appreciated how KQML addresses the heterogeneity of coding structures by focusing on simple syntax to transmit
any message content. It was also insightful to consider solutions when heterogeneity occurs at the ontological
level itself.
Reference List
Finin, T. et al. (1994) ‘KQML as an Agent Communication Language’, in Proceedings of the Third International
Conference on Information and Knowledge Management. Conference on Information and Knowledge Management,
New York, NY, USA: Association for Computing Machinery, pp. 456–463. Available at:
https://doi.org/10.1145/191246.191322.
Mayfield, J., Labrou, Y. and Finin, T. (2005) ‘Evaluation of KQML as an Agent Communication Language’, in
Intelligent Agents II: Agent Theories, Architectures, and Languages. IJCAI’95-ATAL Workshop, Berlin,
Heidelberg: Springer.
Additional Reading
To ease a bit of my frustration with how dated the unit's content is, I decided to learn (or rather revisit)
a bit about LLMs and transformers. The following well-known video was one way to do so: