This paper is available on arxiv under CC 4.0 license.
Authors:
(1) Jingjing Wang, School of Computing;
(2) Joshua Luo, The Westminster Schools;
(3) Grace Yang, South Windsor High School;
(4) Allen Hong, D.W. Daniel High School;
(5) Feng Luo, School of Computing.
Table of Links
V. DISCUSSION
In this section, we discuss the limitations of GPT model, the conclusion of our study and future work.
A. Limitations
From the experiment results, we can learn that the current GPT-3.5 model still has some limitations.
Lack of Deep Contextual Understanding: GPT model, despite its advanced capabilities, is still lacking a deep understanding of human nuances, societal norms, and cultural contexts, which are often critical to accurately interpreting and responding to subjective tasks. This limitation becomes particularly pronounced when the GPT model is faced with local idioms, colloquialisms, or culturally specific references.
Difficulty in Understanding Implicit Meaning: LLMs typically struggle with interpreting and generating content that contains implicit or hidden meanings, especially those requiring an understanding of human emotions, intentions, or sarcasm. This limitation becomes particularly apparent in tasks involving the interpretation of sarcasm or offensiveness in memes, as these tasks often require a nuanced understanding of cultural or social contexts.
Biases in Training Data: LLMs are trained on vast amounts of data from the internet, which may contain various forms of biases. The presence of biases inside the model might accidentally impact its responses in subjective tasks, leading to much less accurate outcomes.
B. Conclusion and Future Work
Our efforts to assess how well GPT could analyze sentiments in memes led us to some interesting and insightful findings.
On one hand, the model performed impressively when it came to classifying non-hateful memes and understanding the positive mood of a meme. The significant accuracy achieved in non-hateful meme classification, positive sentiment determination, and humor recognition, demonstrate the GPT’s ability to comprehend and interpret the content in a majority of memes correctly
However, the accuracy rate dropped a lot when encountered with detection of hateful, sarcasm and offensive content in memes. The lower accuracy rates seen in these categories shows the intricate nature of identifying concealed hateful or offensive content inside memes.
As GPT-4 released in recent days, it will be beneficial if we can integrate the latest model in our framework; moreover, as fine-tuning GPT-3.5-Turbo is available, we can fine tune our exist models to make some improvements in its ability to detect hateful and offensive content. We leave these as future work.