The role of cognitive computing in NLP

Orynbay, Laura; Bekmanova, Gulmira; Yergesh, Banu; Omarbekova, Assel; Sairanbekova, Ayaulym; Sharipbay, Altynbek

doi:10.3389/fcomp.2024.1486581

REVIEW article

Front. Comput. Sci., 10 January 2025

Sec. Networks and Communications

Volume 6 - 2024 | https://doi.org/10.3389/fcomp.2024.1486581

This article is part of the Research Topic Advancing Ethical and Explainable AI in Cognitive Computing: Future Directions

Laura Orynbay¹*

Gulmira Bekmanova²

Banu Yergesh³

Assel Omarbekova³

Ayaulym Sairanbekova¹

Altynbek Sharipbay¹

¹Department of Artificial Intelligence Technologies, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan
²L.N. Gumilyov Eurasian National University, Astana, Kazakhstan
³Department of Digital Development, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan

The integration of Cognitive Computing and Natural Language Processing (NLP) represents a revolutionary development of Artificial Intelligence, allowing the creation of systems capable of learning, reasoning, and communicating with people in a natural and meaningful way. This article explores the convergence of these technologies and highlights how they combine to form intelligent systems capable of understanding and interpreting human language. A comprehensive taxonomy of Cognitive Computing technologies in NLP is presented, which classifies key tools and techniques that improve machine understanding and language generation. The article also explores practical applications, in particular, to improve accessibility for people with visual impairments using advanced Artificial Intelligence-based tools, as well as to analyze political discourse on social networks, where these technologies provide insight into public sentiment and information dynamics. Despite significant achievements, several challenges persist. Ethical concerns, including biases in AI, data privacy and societal impact, are critical to address for responsible deployment. Language complexity poses interpretative challenges, while biases in multimodal data and real-world deployment difficulties impact model performance and scalability. Future directions are proposed to overcome these challenges through improved robustness, generalization, and explainability in models, as well as enhanced data privacy and scalable, resource-efficient deployment. This article thus provides a comprehensive view of current advancements and outlines a roadmap for a responsible and inclusive future of Cognitive Computing and NLP.

1 Introduction

Rapid advances in Artificial Intelligence (AI) have led to the development of Cognitive Computing systems that mimic human thought processes and perform tasks that previously required human intelligence. Prominent among these advances is the fusion of Cognitive Computing and Natural Language Processing (NLP). Cognitive Computing aims to create systems that can learn, reason and interact with humans in a natural and meaningful way, while NLP focuses on enabling machines to understand and generate human language. The merging of these two fields has the potential to revolutionize the way machines interact with humans, offering more intuitive and intelligent systems capable of understanding, interpreting and responding to human language with unprecedented accuracy and context-awareness.

This integration is not just a technical achievement, but a paradigm shift in human-computer interaction. Cognitive Computing, with its ability to process and analyse vast amounts of unstructured data, combined with the linguistic capabilities of NLP, allows machines to engage in complex communications, provide information and support real-time decision-making. The applications of Cognitive Computing and NLP are vast and diverse, ranging from virtual assistants performing everyday tasks to healthcare systems analyzing medical histories and assisting in diagnosis.

However, along with these advances come challenges. The complexity of human language, ethical considerations and the need to integrate multimodal data present major hurdles to overcome. In addition, ensuring that these systems operate transparently and ethically, especially in high-stakes environments, is key to their widespread adoption.

This article explores these topics by first providing an overview of how Cognitive Computing and NLP are integrated to create more intelligent systems. It then presents a comprehensive taxonomy of the technologies within these fields, grouping them into key areas of human language understanding, interpretation, and generation. The discussion also extends to real-world applications, particularly how these technologies improve accessibility for the visually impaired and their role in analyzing political discourse on social media.

Finally, the article examines the current limitations and ethical challenges associated with these technologies, discusses current research aimed at overcoming these challenges, and outlines possible future developments. By considering these diverse aspects, this article aims to provide a thorough understanding of the current state and future potential of Cognitive Computing and NLP, highlighting both the opportunities and obstacles that await this rapidly evolving field.

2 Integrating cognitive computing with natural language processing

Cognitive Computing refers to advanced systems that learn on a large scale, reason purposefully, and communicate organically with humans. Unlike traditional computing systems, cognitive systems can learn and change based on their interactions with humans and the environment. They are meant to manage complex, unstructured data and can generate hypotheses, rational arguments, and suggestions (Kelly, 2015).

Cognitive Computing uses a variety of strategies to imitate human cognition in the computer environment. Machine Learning (ML), Artificial Intelligence (AI), Natural Language Processing (NLP), Exploratory Data Analytics (EDA), and Predictive Analytics are all valuable techniques for emulating human cognitive functions such as comprehension, memory, learning, and perception. Additionally, technologies such as the Internet of Things (IoT), Human-Computer Interaction (HCI), and Big Data help to make cognitive systems reactive (Aghav-Palwe and Gunjal, 2021). More organic and interactive communication between people and machines is made possible by NLP, which enables Cognitive Computing systems to comprehend and interpret human language.

The convergence of Cognitive Computing and NLP is a significant development in the field of AI that aims to create smarter systems capable of understanding, interpreting and interacting with human language in a natural and meaningful way. This intersection leverages the strengths of Cognitive Computing, which focuses on the simulation of human thought processes in a computer-based model, and NLP, which deals with the interaction between computers and humans through natural language.

Cognitive Computing uses a combination of ML, reasoning, NLP, and HCI to emulate human cognitive functions. Hurwitz et al. (2015) describe how this technology has been developed to process and analyse vast amounts of data, learn from it and provide insights and solutions that are contextually relevant and similar to humans in their reasoning.

According to Jurafsky (2000), NLP tasks can be classified either by the type of linguistic information being processed (e.g., lexical, syntactic, semantic) or by a specific application or purpose (e.g., translation, summarization, Sentiment Analysis).

For example, in customer service, Cognitive Computing combined with NLP can form the basis of virtual assistants and chatbots that understand customer queries, provide accurate answers and learn from each interaction to improve the quality of work over time (Adamopoulou and Moussiades, 2020b). In healthcare, such systems can help diagnose diseases by understanding and analyzing patient symptoms described in natural language (Aung et al., 2021).

Combining these technologies not only improves the understanding and interaction of machines but also opens the way for advancements in various fields such as education, finance and social networks (Ghahramani, 2015). With machines better understanding and generating human language, we are getting closer to creating truly intelligent systems that can easily integrate into our daily lives, supporting and empowering humans in unprecedented ways.

3 A taxonomy of cognitive computing technologies in NLP

In Natural Language Processing (NLP), Cognitive Computing refers to various tools and techniques designed to improve the understanding, interpretation and generation of human language by a machine. In Figure 1, we present the overall classification of the technologies that were reviewed. Cognitive Computing technologies in NLP can be categorized into four main areas, each focusing on different aspects of understanding, interpreting, and generating human language.

Figure 1. A taxonomy of cognitive computing technologies in NLP.

The first category is knowledge representation and reasoning, which is about structuring and interpreting knowledge:

1. Semantic Networks and Ontologies are graphs and structures that specify the connections between concepts.

(a) According to Keet (2018), Ontologies are structures that specify the connections between ideas in a domain and promote the interoperability of systems and a common understanding. In Hao et al. (2021), the authors present a method for mapping medical data to standard Ontologies using a hybrid graph neural network approach. This involves creating and enriching an Ontology based on a medical database and aligning it with existing standard Ontologies, such as SNOMED CT.

For example, the MEDTO framework can extract medical concepts from databases and organize them into a structured Ontology, which represents various medical entities and their relationships. This Ontology can include information about diseases, symptoms, treatments, and medications, among other things. The hybrid graph neural network then matches the newly created Ontology to existing Ontologies, like SNOMED CT, to ensure consistency in how medical data is interpreted across different systems. This Ontology matching process improves the integration of medical data from different sources, leading to more accurate analysis, better decision-making, and improved patient outcomes. The MEDTO framework leverages graph representation learning to capture the hierarchical relationships between medical concepts, ensuring the integrity and accuracy of the Ontology.

(b) Semantic Networks are Graph-based conceptual and relational representations that facilitate the understanding of complicated linguistic connections (Sowa et al., 1992). In their study, Zheng et al. (2021) developed a method for representing sentences based on a multi-layered Semantic Network. This approach involves multiple attention mechanisms to extract semantic information from sentences at various levels. By using relative location masks, the researchers could incorporate word order information into the model and reduce uncertainty caused by variations in word order.

The method was tested on tasks such as text summarization and emotion classification. The results showed that it outperformed traditional models in terms of accuracy and completeness of sentence representation. In the multilevel Semantic Network used in the method, different layers represent different semantic features of sentences, such as word meanings and syntactic structures. The multitasking attention mechanism ensures that all possible semantic connections are explored in the sentence. The ability of this method to capture the complex interactions between words in a sentence highlights the usefulness of Semantic Networks for NLP tasks, where understanding nuanced meanings and relationships is essential.

For instance, consider the sentence “A person is training his horse for a competition.” A Semantic Network model, such as that proposed by Zheng et al. (2021), would capture relationships like “person-training,” “training-horse,” and “horse-competition,” representing them as a multilevel structure. This enables the system to more accurately understand the meaning of the sentence than simpler models that may not capture these nuances.
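The pairwise relations in this example can be written down as a small labeled graph. The sketch below only illustrates the underlying data structure; it is not the multi-layer attention model of Zheng et al. (2021), and the relation labels (`performs`, `targets`, `prepared_for`) and function names are assumptions introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical labeled edges for the sentence
# "A person is training his horse for a competition."
# The relation labels are illustrative and not taken from Zheng et al. (2021).
EDGES = [
    ("person", "performs", "training"),
    ("training", "targets", "horse"),
    ("horse", "prepared_for", "competition"),
]


def build_network(edges):
    """Return an adjacency map: node -> list of (relation, neighbor)."""
    network = defaultdict(list)
    for head, relation, tail in edges:
        network[head].append((relation, tail))
    return dict(network)


def related(network, node, relation):
    """List the neighbors of `node` reached through `relation`."""
    return [tail for rel, tail in network.get(node, []) if rel == relation]


def reachable(network, start):
    """List every concept that can be reached from `start` by following edges."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for _, tail in network.get(node, []):
            if tail not in seen:
                seen.append(tail)
                stack.append(tail)
    return seen


network = build_network(EDGES)
print(related(network, "training", "targets"))  # what is being trained?
print(reachable(network, "person"))             # concepts linked to the person
```

Following edges from "person" reaches "competition" even though the two words are not adjacent in the sentence, which is the kind of indirect connection a flat bag-of-words representation would miss.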

2. Probabilistic reasoning in Cognitive Computing, particularly in NLP, is essential for dealing with uncertain and incomplete information.

(a) Markov Logic Networks (MLNs) are a framework that handles uncertainty in language understanding by combining probabilistic graphical models with first-order logic, enabling flexible reasoning under uncertainty (Richardson and Domingos, 2006). Li et al. (2018) propose a method that leverages MLNs to predict judicial outcomes in divorce cases. The approach involves extracting legal factors from judicial documents and using these factors to build an MLN. The MLN then learns relationships between these factors and judicial outcomes, allowing for accurate predictions based on the facts of a case.

For instance, in a divorce case, the legal factors might include variables such as “mutual affection status,” “number of children,” and “marriage duration.” The MLN uses these variables to form a network of probabilistic relationships. The network is composed of first-order logic formulas, such as:

Formula: (lovestatus(x, 0) ⇒ judgement(x, 1))

Interpretation: If the mutual affection status (lovestatus) is broken (represented by 0), then there is a high probability that the judgement (judgement) will be a divorce (represented by 1).

The MLN is trained using historical data from judicial cases, which includes both the fact descriptions and the outcomes. This training allows the MLN to learn the weights associated with each logic formula, thus enabling it to predict the outcome of new cases based on the input legal factors.

The interpretability of MLNs is a significant advantage over other Machine Learning (ML) models. Each prediction made by the MLN can be traced back to specific logic rules, providing insights into how the decision was reached. This is particularly useful in legal contexts, where transparency and reasoning are crucial.
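To make the link between a formula weight and a predicted probability concrete, the sketch below evaluates a single grounded formula using the standard MLN log-linear form, in which a possible world is scored by exp(weight × number of satisfied groundings). It is a minimal illustration, not the model of Li et al. (2018): the weight of 2.0 is an arbitrary assumption rather than a learned value, and the function names are hypothetical.

```python
import math


def implies(antecedent, consequent):
    """Truth value of the logical implication antecedent => consequent."""
    return (not antecedent) or consequent


def prob_divorce(weight, affection_broken):
    """P(judgement = divorce | evidence) for one case and one weighted formula.

    The formula is lovestatus(x, 0) => judgement(x, 1). Each possible world
    (divorce / no divorce) is scored by exp(weight * n), where n is 1 if the
    formula holds in that world and 0 otherwise; scores are then normalized.
    """
    scores = {}
    for divorce in (True, False):
        satisfied = implies(affection_broken, divorce)
        scores[divorce] = math.exp(weight * (1 if satisfied else 0))
    return scores[True] / (scores[True] + scores[False])


# Assumed weight; in a real MLN this would be learned from past cases.
WEIGHT = 2.0
print(round(prob_divorce(WEIGHT, affection_broken=True), 3))
print(round(prob_divorce(WEIGHT, affection_broken=False), 3))
```

When the antecedent is false, the formula is satisfied in both worlds and therefore carries no information, so the probability stays at 0.5. When it is true, a larger weight pushes the probability of divorce closer to 1, which is why each prediction can be traced back to the rules that fired.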

(b) Bayesian networks provide a structured approach to representing and reasoning with uncertain knowledge based on the principles of probability theory. They are graphical models that represent a set of variables and their dependencies using a directed acyclic graph. Each node in the graph corresponds to a variable, and edges between nodes represent the conditional relationships between those variables. Their application in industrial processes, such as predicting the silicon content in hot metal, demonstrates their value in real-world scenarios where accurate predictions are essential for optimizing outcomes and maintaining quality (Cardoso and di Felice, 2021).
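As a minimal sketch of how such a network supports reasoning, the code below defines a three-node chain and computes a posterior by enumerating the joint distribution. The variable names and every probability are invented for illustration; they are not taken from Cardoso and di Felice (2021).

```python
from itertools import product

# Hypothetical chain: HighTemperature -> HighSilicon -> Alarm.
# All numbers below are made up for illustration only.
P_TEMP = {True: 0.3, False: 0.7}               # P(T)
P_SILICON_GIVEN_TEMP = {True: 0.8, False: 0.2}  # P(S=True | T)
P_ALARM_GIVEN_SILICON = {True: 0.9, False: 0.1}  # P(A=True | S)


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(temp, silicon, alarm):
    """Joint probability, factorized along the edges of the graph."""
    return (
        P_TEMP[temp]
        * bernoulli(P_SILICON_GIVEN_TEMP[temp], silicon)
        * bernoulli(P_ALARM_GIVEN_SILICON[silicon], alarm)
    )


def posterior_silicon(alarm):
    """P(HighSilicon = True | Alarm = alarm), by summing out the other variables."""
    numerator = sum(joint(t, True, alarm) for t in (True, False))
    evidence = sum(joint(t, s, alarm) for t, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(posterior_silicon(alarm=True), 3))
```

Enumeration is only practical for very small networks; it is used here because it makes the factorization implied by the directed acyclic graph explicit.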

The second category is advanced Machine Learning (ML) and Deep Learning (DL), which provides the ability to learn from vast amounts of data and improve over time:

1. DL techniques include advanced models such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Transformers, which are characterized by the understanding of context and the generation of human-like text.

(a) CNNs are widely used in Sentiment Analysis and other classification tasks. They work by identifying local characteristics and patterns in the text (Gandhi et al., 2021). In the context of Sentiment Analysis, CNNs are employed to classify the sentiment polarity of textual data, such as tweets or product reviews. Dang et al. (2020) outline a CNN architecture designed for sentiment polarity classification, which includes multiple convolutional layers, pooling layers, and fully connected layers.

(b) Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are two examples of RNNs that are well suited to sequential data processing tasks such as machine translation and language modeling (Huang et al., 2015). Siddique et al. (2021) focus on building a machine translation system from English to Bangla using an RNN-based model. The RNN architecture incorporates both LSTM and GRU layers to process sequential data, which is crucial for handling the dependencies and contextual relationships within language translation tasks.

The study found that while both LSTM and GRU layers performed well, GRU had a lower mean error rate than LSTM in the translation task. Specifically, GRU achieved a mean error of 0.508, compared with 0.602 for LSTM.

The choice between LSTM and GRU layers was determined based on the trade-off between computational efficiency and the ability to capture long-term dependencies. GRU was chosen for the final model due to its faster processing time and adequate performance.

(c) Transformers refer to a specific type of neural network architecture introduced by Vaswani (2017). The transformer model is designed to handle sequential data, such as text, more efficiently than previous models like RNNs or LSTM networks. The key innovation of transformers is the self-attention mechanism, which allows the model to weigh the importance of different words in a sentence relative to each other, regardless of their position in the sequence.
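The self-attention computation can be sketched in a few lines. The version below is a deliberate simplification: it uses the input vectors directly as queries, keys, and values, whereas the transformer applies learned projection matrices and multiple heads, and the two-dimensional "embeddings" are made-up numbers.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs.

    Returns (outputs, weights): weights[i][j] is how much token i attends to
    token j, and outputs[i] is the weighted average of all token vectors.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        context = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(context)
        all_weights.append(weights)
    return outputs, all_weights


# Toy two-dimensional embeddings for three tokens (illustrative values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Note that the weight between two tokens depends only on their vectors, not on how far apart they are in the sequence; the first token attends to the third as strongly as to itself because both have the same dot product with it.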

2. Reinforcement learning enables systems to learn optimal actions through trial and error, guided by rewards. Reinforcement learning is useful in NLP for tasks like dialogue management, where the system needs to learn how to produce relevant and suitable responses in a given environment (Den Hengst et al., 2019).
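The trial-and-error loop can be illustrated with a deliberately reduced setting: a single dialogue state in which the system chooses among three response types and receives a fixed reward. This is a bandit-style sketch under stated assumptions, not a full dialogue manager; the action names and reward values are hypothetical, and a real system would track dialogue state and learn from noisy user feedback.

```python
import random

# Hypothetical response types and simulated user feedback (assumed values).
REWARDS = {
    "answer_question": 1.0,
    "ask_clarification": 0.4,
    "change_topic": -0.5,
}


def train(episodes=200, epsilon=0.1, seed=0):
    """Estimate the value of each response with epsilon-greedy exploration."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {action: 0.0 for action in actions}
    count = {action: 0 for action in actions}
    for step in range(episodes):
        if step < len(actions):
            action = actions[step]                 # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)           # explore
        else:
            action = max(actions, key=value.get)   # exploit the best estimate
        reward = REWARDS[action]
        count[action] += 1
        # Incremental running mean of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


value = train()
best = max(value, key=value.get)
print(best, value)
```

After training, the greedy choice is the response type with the highest simulated reward, which mirrors how a dialogue policy comes to prefer responses that users rate as relevant.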

3. Large Language Models (LLMs) are models built on top of the transformer architecture. LLMs are pre-trained on vast amounts of text data and have billions of parameters, enabling them to perform a wide range of language tasks with high accuracy. Recent work by Zhao et al. (2023) provides an extensive review of LLMs, examining their technical evolution, emerging abilities, and scaling laws that improve their performance on increasingly complex NLP tasks. The study emphasizes how LLMs have become a fundamental part of NLP, with abilities ranging from language generation to context-aware adaptation and multimodal integration.

(a) Natural-sounding speech synthesis depends on the Text-to-Speech (TTS) system's ability to comprehend and replicate human speech nuances, like changing tones and pauses, made possible by BERT's contextualized embeddings (Granero-Moya et al., 2023).

(b) By leveraging GPT's text generation capabilities, the TTS system can produce speech that is not only fluent but also dynamically modulated to reflect the intended emotions and emphasis of the generated text. This integration helps in creating more engaging and human-like interactions in applications such as virtual assistants and conversational agents (Granero-Moya et al., 2023).

The third category is Natural Language Understanding (NLU) and Generation (NLG), with a focus on understanding and generating human language:

1. NLU involves the interpretation and comprehension of human language by machines, enabling systems to extract meaning from text or speech in a way that is both contextually accurate and semantically rich.

(a) According to Zhang (2023), Syntactic Parsing is the process of examining a sentence's grammatical structure to determine its syntactic functions. An example of Syntactic Parsing is provided by Mitterer et al. (2021), where segmental information, specifically the presence of an epenthetic glottal stop, influences syntactic decisions. The glottal stop serves a dual function in Maltese as both a phoneme for lexical contrast and a non-contrastive marker for prosodic junctures. The study demonstrated that participants used the presence of a glottal stop before the conjunction “u” (“and”) as a cue for Syntactic Parsing, indicating a larger prosodic boundary and leading to specific syntactic interpretations.

(b) Understanding the meaning of words and sentences is the main goal of Semantic Analysis, which is important for tasks like information retrieval and Sentiment Analysis, as stated by Tehseen (2018) and Khotimah and Sarno (2019). According to Maulud et al. (2021), Sentiment Analysis involves the extraction of opinions from a large number of sources, which can include social media posts, product reviews, or any other form of user-generated content. For instance, companies may use Sentiment Analysis to monitor how customers feel about their products or services by analyzing posts on platforms like Twitter or Facebook.
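A very simple way to see how opinions can be extracted from posts is a lexicon-based polarity score. The sketch below is a toy baseline, not the method of any study cited here: the word lists are tiny hand-picked assumptions, and it ignores negation, sarcasm, and context, which practical systems must handle.

```python
import re

# Tiny illustrative lexicons (assumed); real systems use far larger resources
# or learned models.
POSITIVE = {"love", "great", "excellent", "good", "fast"}
NEGATIVE = {"hate", "terrible", "poor", "bad", "slow"}


def polarity(text):
    """Label a post as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love this phone, the camera is great",
    "Terrible battery and slow delivery",
    "The package arrived on Tuesday",
]
for post in posts:
    print(polarity(post), "|", post)
```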

(c) To improve comprehension of important information, Named Entity Recognition (NER) identifies and categorizes entities, such as names of individuals, groups, and places, within the text (Lample et al., 2016). Jehangir et al. (2023) highlight the use of NER on social media as one particular application. Every day, social media networks produce enormous amounts of data, frequently in the form of loosely structured microblogs. From this noisy and unstructured data, NER helps extract relevant entities that can be used for further analysis, such as Sentiment Analysis or topic identification. For example, NER may recognize when a person, business, or place is mentioned in a tweet or social media post, enabling more focused and pertinent data processing.

2. NLG technologies enable machines to produce coherent and contextually relevant text.

(a) Text Summarization is the process of condensing lengthy texts into manageable summaries that support content creation and information extraction (El-Kassas et al., 2021). Priya and Umamaheswari (2023) discuss how extractive summarization can be applied by generating summaries that focus on the most important parts of a text. This approach is particularly useful when dealing with large documents, as it allows for the extraction of key information while preserving the original meaning.
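One common baseline for extractive summarization scores each sentence by the average frequency of its content words and keeps the top-scoring sentences in their original order. The sketch below follows that baseline under stated assumptions; it is not the method of Priya and Umamaheswari (2023), and the stopword list and example text are invented for illustration.

```python
import re
from collections import Counter

# Small assumed stopword list; real systems use fuller lists.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "was", "are",
    "for", "on", "it", "this", "that", "with",
}


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = (
    "Ontologies describe medical concepts. "
    "Ontologies link medical concepts across systems. "
    "The weather was pleasant."
)
print(summarize(text, max_sentences=1))
```

Because the summary reuses whole sentences from the source, it preserves the original wording, which is the defining property of the extractive approach.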

(b) According to Pituxcoosuvarn and Ishida (2018), Machine Translation (MT) greatly enhances multilingual communication and information access by translating text from one language to another. A comprehensive study conducted by Sitender et al. (2023) provides an extensive overview of various approaches to MT, including direct machine translation (DMT), rule-based machine translation (RBMT), corpus-based machine translation (CBMT), knowledge-based machine translation (KBMT), and neural machine translation (NMT). The paper also examines development tools, data warehouses, assessment methods and the latest research trends in the field of MT, with a special focus on English, Hindi and Sanskrit, highlighting the challenges and achievements in the field of multilingual MT systems. Alarifi and Alwadain (2020) provide a detailed explanation of how the OCSMT-SMT model is used to optimize the translation of phrases between languages. The model incorporates supervised ML techniques to improve the accuracy of translations, particularly when dealing with complex sentence structures and linguistic nuances.

(c) As stated by Adamopoulou and Moussiades (2020a), Chatbots and Dialogue Systems can generate conversational responses and offer interactive and user-friendly interfaces for a range of applications. For instance, Maria et al. (2022) discuss the development of intelligent chatbots that can be integrated into educational systems to assist students by providing personalized feedback, recommending learning activities based on their academic performance, and even supporting teachers in administrative tasks. These chatbots are capable of continuous learning, which allows them to improve their interactions over time.

The fourth category is Multimodal Data Integration:

1. Multimodal learning: Cognitive Computing frequently entails processing and integrating various types of data, including text, images, and speech. By combining these data types, multimodal learning approaches enable more thorough analysis and richer interactions. A system might, for instance, use a combination of image recognition, NLP, and speech recognition to analyse a video (to identify objects, transcribe spoken words, and interpret context) (Baltrušaitis et al., 2018).

(a) Vision and Language involve integrating visual data, such as images or videos, with textual data, such as written text or spoken language. The goal is to enable systems to understand and interpret visual content, and to generate natural language descriptions or responses. This can be done in both directions: from visual content to text, and from text to visual content.

A relevant example of multimodal learning in the context of “Vision and Language” is the MuKEA framework, which stands for Multimodal Knowledge Extraction and Accumulation. This framework is designed for Knowledge-Based Visual Question Answering (KB-VQA), a task that involves answering open-ended questions about images by integrating external knowledge that is not visible in the image or provided in the question itself (Ding et al., 2022).

(b) For applications like speech recognition, spoken language understanding, and dialogue systems, Speech and Text combines spoken language (audio data) with written text. The integration allows for more robust and context-aware processing of spoken language, which can be transcribed into text or used to generate appropriate textual or spoken responses.

Xu et al. (2020) discuss an advanced example of Automatic Speech Recognition (ASR) for noisy environments, where traditional ASR systems struggle: the Audio-Enhanced Multi-Modality Speech Recognition (AE-MSR) model.

(c) Understanding and interpreting multimedia content is enhanced by the integration of visual and auditory information, such as speech or sound, in a process known as “Vision and Sound.” Video analysis utilizes both visual and audio signals to comprehend the scenario, whereas audio-visual speech recognition employs lip movements to improve speech recognition accuracy.

A relevant example of multimodal learning for Audio-Visual Speech Recognition (AVSR) is the Multimodal Sparse Transformer Network (MMST). For example, when background noise makes speech hard to understand, the model can use visual cues from the speaker's lips to help disambiguate sounds. The system can distinguish between similar-sounding consonants, like “m” and “n,” by analyzing the differences in lip movement, which can significantly improve the word error rate compared to audio-only systems (Song et al., 2022).

(d) Sensor fusion is the process of combining data from multiple sensors to create a more accurate and complete view of the environment. This technique is used in multimodal learning to integrate different types of sensory information, such as vision, sound, and lidar, to improve performance in tasks like autonomous driving and robotics. By combining these sources of information, sensor fusion enables more reliable decision-making and more accurate perception in complex environments. In autonomous driving, for instance, this is crucial for accurately perceiving the surroundings, making informed decisions, and safely navigating the vehicle. Huang et al. (2020) test a multimodal sensor fusion model in simulated urban driving scenarios. The fusion improves the vehicle's performance by enabling it to accurately perceive and react to complex driving conditions, such as navigating through intersections or avoiding obstacles. Integrating depth information with visual data allows the model to better understand spatial relationships and distances in the environment, leading to more reliable and safer driving behaviors.

4 Empowering visually impaired individuals through cognitive computing

Understanding the classification of Cognitive Computing technologies in Natural Language Processing (NLP) allows us to explore their real-world applications. One particularly impactful application is improving accessibility for visually impaired individuals. Assistive technologies built on NLP and Cognitive Computing offer real-time language processing that describes surroundings, reads text, and interprets social cues, enhancing users' accessibility and independence.

One prominent example is the use of AI-powered screen readers. These screen readers, such as JAWS (Job Access With Speech) (Mulyati et al., 2023) and NVDA (NonVisual Desktop Access) (Moller-Neilsen et al., 2018), leverage Cognitive Computing to provide more accurate text-to-speech conversions. They can recognize complex layouts, images, and context, allowing visually impaired users to navigate websites and digital documents more effectively (Amin et al., 2024).

Orynbay et al. (2024) reviewed advanced tools designed for visually impaired individuals, highlighting the transformative impact of Artificial Intelligence (AI) in converting visual data into meaningful auditory descriptions. For instance, Seeing AI by Microsoft (Microsoft Corporation, n.d.) is a multifaceted app that audibly describes surroundings, instantly recognizing text, documents, barcodes, faces, currency, and scenes, among other features. Envision AI (Envision Technologies B.V., n.d.) uses Optical Character Recognition (OCR) to interpret the visual world in 60 languages, supporting tasks like reading text and documents, interpreting handwritten notes, and identifying colors and barcodes.

TapTapSee (CloudSight, Inc., n.d.), which helps users identify objects from photos and announces the result through VoiceOver, was also covered by Orynbay et al. (2024). Navigation and location applications such as BlindSquare (MIPsoft, n.d.) integrate FourSquare, GPS, and compass data to provide users with comprehensive navigation support, while Aira (Aira Tech Corp, n.d.) uses GPS and video streaming to connect users instantly with human visual interpreters.

Furthermore, Orynbay et al. (2024) emphasized the critical importance of wearable gadgets. For voice-activated AI applications like face recognition and text reading, NoorCam MyEye (NoorCam, n.d.) provides a solution. Google's Lookout uses machine vision to increase productivity for product recognition and text viewing tasks. CyberEyez (Cyber Timez, Inc., n.d.), Eyesynth's NIIRA smart glasses (Eyesynth, n.d.), eSight glasses (Gentex Corporation, 2023), and Sight Plus (GiveVision, n.d.) are a few examples of wearable magnification devices that provide a range of features, such as shape recognition, depth measuring, and improved high-definition vision.

These innovations demonstrate the transformative potential of NLP and Cognitive Computing, removing obstacles to accessibility and opening the door to a more inclusive future. Figure 2 illustrates various AI-powered tools that enhance accessibility for visually impaired individuals.

Figure 2. AI-based tools to increase accessibility for visually impaired people.

5 Semantic analysis of political discourse on social media

Beyond accessibility, Cognitive Computing and Natural Language Processing (NLP) also play a critical role in social media analysis, especially in the domain of political discourse. Political conversation on social media platforms is being analyzed more and more through the use of semantic processing. Sentiment Analysis systems and other NLP and Cognitive Computing tools are vital for deciphering the sentiment underlying political posts, tweets, and statements. This skill is essential for monitoring sentiment trends and gauging public attitudes toward particular political issues, parties, or individuals. Sentiment Analysis, for instance, can evaluate voter sentiment and forecast election results based on social media activity during election seasons (Tumasjan et al., 2010).

Cognitive Computing systems are capable of processing vast amounts of social media data to discover recurring themes in political debate by using topic modeling approaches like Latent Dirichlet Allocation (LDA). This makes it easier to monitor how political issues change over time and how interest in certain subjects rises and falls (Jelodar et al., 2019). For instance, topic modeling can be used by scholars to comprehend the dynamics of discussions about immigration and healthcare policies.

Semantic processing is essential for identifying bias and false information in political debate, in addition to monitoring sentiment and themes. Programs like Factmata use Cognitive Computing to detect biased reporting, propaganda, and fake news (Botnevik et al., 2020). By analyzing language, context, and information sources, these tools flag potentially false content and thereby help uphold the integrity of online political discourse.

The study of political information's influence and spread on social media is further improved by the combination of network analysis and semantic processing. Echo chambers, important influencers, and the information flow can all be found by mapping user connections and interactions. According to Conover et al. (2012), this approach aids in comprehending the dissemination of political messages and their effects on various populations.
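The two quantities mentioned above, influential accounts and echo chambers, can be approximated with very simple graph statistics. The following is a minimal sketch, not the method of Conover et al. (2012): it assumes a hypothetical list of retweet pairs and a hypothetical mapping from users to political leaning, ranks users by how often they are retweeted, and measures the share of retweets that stay within one leaning (a high share suggests an echo chamber).

```python
from collections import Counter


def influence_ranking(retweets):
    """Rank users by how often other users retweet them (in-degree)."""
    counts = Counter(target for _source, target in retweets)
    return counts.most_common()


def within_group_share(retweets, leaning):
    """Fraction of retweets whose source and target share the same leaning."""
    known = [(s, t) for s, t in retweets if s in leaning and t in leaning]
    if not known:
        return 0.0
    same = sum(1 for s, t in known if leaning[s] == leaning[t])
    return same / len(known)


# Hypothetical toy data: (retweeting user, retweeted user).
retweets = [("a", "b"), ("c", "b"), ("d", "e"), ("f", "e"), ("a", "e"), ("c", "a")]
# Hypothetical leaning labels; in practice these would have to be inferred.
leaning = {"a": "left", "b": "left", "c": "left",
           "d": "right", "e": "right", "f": "right"}

ranking = influence_ranking(retweets)
share = within_group_share(retweets, leaning)
print(ranking[:2], round(share, 2))
```

Real studies typically use richer measures (for example, community detection or centrality on weighted graphs); the in-degree count here is only the simplest proxy for influence.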

Recent studies, such as the one by Yergesh et al. (2023), focus on developing methods for Sentiment Analysis of images and user posts on social networks. The goal is to identify tools that extract subjective information, such as opinions and moods, to inform decision-making, understand individual needs, and discern attitudes toward specific events.

Semantic processing in social networks also necessitates the analysis and interpretation of large amounts of textual data, especially in political conversation. To formalize and organize subjects like elections and referendums, researchers create ontological models (Bekmanova et al., 2023). These models pinpoint important phrases, connections, and emotions in the discourse. The comprehension of user sentiments and source positions is facilitated by methods like Sentiment Analysis and resources like SPARQL and the OWLAPI module. These investigations shed light on public opinion, identify biases, and explain the dynamics of political discourse. The creation of these ontological models aids in scholarly study and gives analysts and policymakers useful data to assess public opinion and successfully interact with the community (Sairanbekova et al., 2024).

6 Discussion

Although the integration of Natural Language Processing (NLP) and Cognitive Computing has demonstrated great promise for improving Human-Computer Interactions (HCI), several obstacles still need to be overcome to fully reap the rewards of these technologies.

6.1 Ethical considerations in cognitive computing and NLP

The integration of cognitive computing and natural language processing (NLP) raises serious ethical issues that need to be addressed to ensure responsible and fair use. These issues include AI biases, data privacy concerns, and the social consequences of adopting these technologies. Strong safeguards are required to preserve user data while enabling systems to grow and learn from user interactions. Strict rules and ethical guidelines are also needed to address ethical concerns regarding the possible exploitation of these technologies for manipulation, surveillance, or the dissemination of false information (Whittaker et al., 2018).

6.1.1 AI biases

NLP models, such as BERT and GPT, are trained on large datasets often containing human biases. This can lead to models that inadvertently reinforce gender, racial, and cultural biases, resulting in unfair or discriminatory outcomes. Research has shown that transformer-based models may exhibit gender bias, associating male pronouns with leadership and technical roles and female pronouns with supportive or caring positions. This can reinforce existing societal stereotypes and deepen inequalities (Nemani et al., 2023). Additionally, training data that is distorted due to the overrepresentation of certain demographics or perspectives can cause distortions in the models' behavior and decision-making processes (Ferrara, 2023). Research by Yan et al. (2024) has also found that large models like GPT-3, when trained on public data sets, can inadvertently memorize and replicate sensitive information, causing ethical and privacy concerns. These models have been shown to produce biased results based on the training data they are fed, which often contains social biases present in publicly available datasets. This can lead to patterns that unintentionally reinforce harmful stereotypes related to gender, race, and culture. For example, GPT-3 has been found to exhibit gender-based language patterns, assigning stereotypical traits such as “assertive” and “leader” to male characters and “caring” and “supportive” to female characters. These results can influence users' perceptions and decision-making, perpetuating prejudice and inequality on a larger scale (Brown et al., 2020; Kaplan et al., 2024).

In addition, experimental studies have shown that GPT-3 and similar models can also exhibit racial bias, with certain racial groups receiving systematically different sentiment scores or associations. For instance, terms associated with specific ethnic groups may elicit more negative or lower sentiment scores than others, reflecting the underlying prejudices in the training data (Ferrara, 2023). Such racial biases can affect the accuracy of applications based on these models, including customer service chatbots and automated content generators. A study by Buolamwini and Gebru (2018) emphasizes the importance of employing diverse training datasets to mitigate such biases. Gender bias in generated text can likewise subtly influence professional perceptions and opportunities, perpetuating existing stereotypes and inequalities in society.

Addressing biases in AI requires concerted efforts in various areas.

1. Fairness-aware algorithms: research has shown that fairness-aware algorithms, such as those discussed in Dasi et al. (2024), can effectively identify and reduce biases during the training phase using methods such as disparate impact analysis and fairness metrics:

(a) Bias detection and auditing: it is important to implement algorithms designed to detect and audit bias during the training phase. Methods such as disparate impact analysis and fairness metrics (e.g., demographic parity, equalized odds) allow model biases to be quantified. Continuous audits ensure effective identification and management of biases (Ferrara, 2023; Mehrabi et al., 2021).

(b) Adversarial debiasing: this technique trains the predictive model together with an adversarial network so that its predictions rely less on biased features. The adversarial network helps the main model focus on unbiased aspects, leading to fairer results (Dasi et al., 2024).

2. Diverse and inclusive datasets: studies, such as those conducted by Ferrara (2023), highlight the importance of creating balanced datasets that represent various gender identities, ethnicities, and cultural backgrounds. This helps to ensure fairness in machine learning models (Buolamwini and Gebru, 2018).

(a) Curating balanced datasets: to combat biases, it is essential to create training datasets that are diverse and include a variety of gender identities, ethnicities, and cultures. Ferrara (2023) emphasizes the significance of this approach in enhancing the fairness of machine learning models.

(b) Synthetic data generation: when real-world datasets lack diversity, synthetic data can be used to fill the gaps and represent underrepresented groups. This ensures that machine learning algorithms learn from a wide range of examples, regardless of their background (Ferrara, 2023).

3. Algorithmic fairness techniques: algorithmic methods based on principles of fairness can help to eliminate biases in the training process, leading to fairer results. These methods include mechanisms that help models learn from a diverse range of data, including data from underrepresented groups (Nemani et al., 2023).

(a) Fair representation learning: representation learning techniques help to create models that are not biased toward certain attributes, such as race or gender. This helps the models make predictions that are fair and accurate for all people, regardless of their background (Nemani et al., 2023).

(b) Debiasing frameworks: using systems based on principles of fairness and incorporating mechanisms to eliminate bias during training can help increase the objectivity of the results (Zhao et al., 2017). For instance, models with adversarial approaches that target specific biases can be used (Nemani et al., 2023).
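The fairness metrics named in item 1(a) can be made concrete with a short sketch. The code below is an illustrative, dependency-free implementation of the demographic parity difference, the disparate impact ratio, and an equalized odds difference for binary predictions and two groups; the function names and the toy labels are assumptions made for this example, and libraries such as FairLearn provide maintained equivalents.

```python
def selection_rate(y_pred, groups, group):
    """Share of positive predictions within one group."""
    picks = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(picks) / len(picks)


def demographic_parity_difference(y_pred, groups, a, b):
    """Absolute gap in selection rates between groups a and b."""
    return abs(selection_rate(y_pred, groups, a) - selection_rate(y_pred, groups, b))


def disparate_impact_ratio(y_pred, groups, unprivileged, privileged):
    """Selection rate of the unprivileged group divided by the privileged one.

    Assumes the privileged group has a non-zero selection rate.
    """
    return (selection_rate(y_pred, groups, unprivileged)
            / selection_rate(y_pred, groups, privileged))


def conditional_rate(y_true, y_pred, groups, group, label):
    """Positive-prediction rate among members of `group` whose true label is `label`.

    label=1 gives the true positive rate, label=0 the false positive rate.
    """
    preds = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == label]
    return sum(preds) / len(preds)


def equalized_odds_difference(y_true, y_pred, groups, a, b):
    """Larger of the TPR gap and the FPR gap between groups a and b."""
    tpr_gap = abs(conditional_rate(y_true, y_pred, groups, a, 1)
                  - conditional_rate(y_true, y_pred, groups, b, 1))
    fpr_gap = abs(conditional_rate(y_true, y_pred, groups, a, 0)
                  - conditional_rate(y_true, y_pred, groups, b, 0))
    return max(tpr_gap, fpr_gap)


# Hypothetical toy audit: four samples per group.
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

dp_gap = demographic_parity_difference(y_pred, groups, "m", "f")
di_ratio = disparate_impact_ratio(y_pred, groups, "f", "m")
eo_gap = equalized_odds_difference(y_true, y_pred, groups, "m", "f")
print(dp_gap, round(di_ratio, 3), eo_gap)
```

A value of 0 for the two difference metrics, or 1 for the ratio, would indicate parity on this sample; which metric is appropriate depends on the application.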

Despite these efforts, biases may still exist. Therefore, it is important to continuously evaluate and collaborate across disciplines to create fair and transparent AI systems.

6.1.2 Data privacy

The use of large datasets in NLP creates serious data privacy issues, especially when these datasets contain sensitive personal information. Ensuring the confidentiality and security of such data is crucial to maintaining trust and ethical standards. According to recent research, privacy risks include the potential for models to memorize sensitive data, which can lead to unintended disclosure during inference. For example, Dasi et al. (2024) emphasize the importance of using privacy practices such as federated learning, differential privacy, and encryption to mitigate these risks. In addition, platforms such as PrivacyAsst (Zhang et al., 2024) have been shown to integrate data protection methods effectively, supporting compliance with rules such as the General Data Protection Regulation (GDPR). Methods such as differential privacy and federated learning can help reduce these risks by allowing models to be trained without disclosing individual data points. The frameworks reviewed by Kibriya et al. (2024) also illustrate how compliance with regulations such as the GDPR can be supported through advanced privacy-preserving measures that maintain user trust and meet legal standards. These approaches are necessary in applications involving medical or financial data, where a breach of confidentiality can have serious consequences. A study conducted by Yan et al. (2024) shows how large language models such as GPT-3 can memorize and inadvertently disclose confidential information, highlighting the importance of reliable privacy practices.
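To illustrate what differential privacy means in practice, the following minimal sketch applies the Laplace mechanism to a counting query. It is an assumption-laden toy rather than a production mechanism: the record format, the predicate, and the function names are hypothetical, and real deployments also need privacy budget accounting across queries.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count that satisfies epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical patient records; only the noisy answer would be released.
records = [{"age": 30}, {"age": 70}, {"age": 65}, {"age": 41}, {"age": 80}]
noisy = private_count(records, lambda r: r["age"] >= 65, epsilon=1.0)
print(noisy)
```

A smaller `epsilon` gives stronger privacy but noisier answers; the true count here is 3, and individual releases will scatter around it.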

6.1.3 Societal impact

Careful consideration should be given to the impact of artificial intelligence systems integrating cognitive computing and NLP on society, especially in important areas such as law enforcement, recruitment and healthcare. These areas are given special attention because of their significant impact on people's lives and the potential consequences of biased or incorrect AI results.

Concerning law enforcement, recent reports from the Congressional Research Service (Finklea, 2023) indicate that predictive policing models based on biased crime data may disproportionately target certain communities, leading to over-policing and exacerbating systemic inequalities. Such perpetuation of bias can contribute to social injustice and undermine public confidence in the justice system. The Congressional Research Service (2023) further emphasizes the importance of oversight and regulatory standards to prevent algorithmic discrimination and protect civil rights.

Similarly, it has been found that when hiring, artificial intelligence-based recruitment tools reproduce and reinforce discriminatory practices embedded in historical data, which affects the opportunities of marginalized groups. A study conducted by Bogen and Rieke (2018) shows that biased training data can lead to unfair hiring outcomes and limit the diversity of the workforce, which requires efforts to ensure fair and unbiased hiring practices.

In healthcare, biased models can contribute to unequal treatment and potential misdiagnoses, especially affecting underrepresented populations. Analyses such as that of Obermeyer et al. (2019) revealed cases where machine learning models proved ineffective for certain demographic groups due to an imbalance in the training data. Ueda et al. (2024) highlight the urgent need for fairness and rigorous auditing in artificial intelligence applications to ensure fair treatment, emphasizing that inclusive data processing practices are essential for accuracy and fairness in healthcare systems.

1. Explainable AI (XAI): the research of Rane et al. (2023) emphasizes the importance of XAI models that provide understandable information about decision-making processes. These models help to effectively identify and eliminate biases.

(a) Interpretability: developing models that clearly explain decision-making helps stakeholders understand and address biases more effectively (Rane et al., 2023).

(b) User-centered explanations: adapting explanations for different audiences ensures that model behavior can be evaluated for fairness and transparency (Rane et al., 2023).

2. Regular audits and ongoing evaluation:

(a) Continuous monitoring: it is essential to conduct regular audits of the performance and behavior of models to detect any emerging biases. This helps maintain fairness over time and adapt to changes in data distribution (Dasi et al., 2024).

(b) Collaborative oversight: working with ethicists, social scientists, and domain experts helps to gain a comprehensive understanding of the impact of biases on different communities, leading to more effective mitigation strategies (Nemani et al., 2023).

3. Ethical guidelines and policies for AI:

(a) Following global standards: adopting ethical frameworks, such as the AI Ethics Guidelines established by international organizations, promotes fairness, accountability, and responsible use of AI (Ferrara, 2023). These guidelines help ensure that AI systems are developed and used in a way that respects human rights and values.

(b) Transparency and accountability: establishing clear policies for the use of AI and ensuring model accountability fosters trust in AI systems and encourages adherence to ethical practices. This includes providing information about how AI systems make decisions, as well as mechanisms for reviewing and monitoring their performance (Ferrara, 2023).

4. Training and education:

(a) Seminars and training programs: conducting training on bias awareness and ethics for developers and data scientists increases awareness of bias and promotes responsible model development (Kibriya et al., 2024).

(b) Community engagement: involving the community in analyzing models and providing feedback on them can reveal hidden biases and promote inclusive practices (Ferrara, 2023).

By integrating these strategies, researchers and developers can work to create NLP and cognitive computing systems that are more equitable and meet ethical standards. These solutions aim to build trust and transparency by ensuring that AI models serve a diverse and inclusive user base.

6.2 Challenges posed by language complexity in NLP

To address the challenges of human language complexity in NLP, we need to look at specific examples like sarcasm detection, ambiguity, and cultural nuances. These factors can make interpreting and generating natural language difficult due to their inherent context dependence.

6.2.1 Sarcasm detection in low-resource languages

Sarcasm is a challenge in NLP because it requires understanding implied meanings that are different from literal interpretations of words. Research on sarcasm detection in under-resourced languages like Arabic and Indian languages highlights these difficulties (Rahma et al., 2023; Kumar A. et al., 2023). Sarcasm in these languages often includes informal expressions, slang, and emojis used on social media, which can be hard for models to understand without considering the context.

A study using a hybrid deep learning model that combined word and emoji embeddings showed improvements in sarcasm detection by better capturing context (Kumar A. et al., 2023). This model highlights the role of emojis in expressing sarcasm, allowing the NLP system to understand the underlying sentiment of social media interactions. These findings emphasize the need for NLP models to go beyond traditional text analysis and incorporate multimodal and symbolic cues to accurately detect sarcasm in text.

6.2.2 Cultural nuances and contextual ambiguity

Another significant challenge in Natural Language Processing (NLP) is managing contextual ambiguity and cultural nuances. Words and phrases can have different meanings depending on the context, leading to potential misunderstandings by NLP systems. Sarcasm, for example, often includes culturally specific references that might be understood differently by people from different language groups. A recent study on Arabic sarcasm detection by Rahma et al. (2023) found that NLP models need to be trained to recognize these changes in meaning and cultural subtleties. Context is also complicated by expressions that have dual meanings. The word “sick,” for instance, can be positive or negative depending on the situation (“that's a sick trick” vs. “I feel sick”). These nuances require models with contextual understanding to accurately interpret intention and sentiment.

6.2.3 Addressing these challenges

To address these challenges, NLP models should incorporate advanced techniques such as:

1. Multimodal learning: utilizing models that combine text, emoji, and other forms of symbolic data to more accurately capture context.

2. Context-aware architectures: utilizing frameworks with attention mechanisms to weigh the importance of words relative to their surrounding context.

3. Training on culturally diverse data: ensuring that models are trained on a variety of cultural backgrounds to better understand linguistic nuances.

These approaches illustrate that addressing the complexities of human language requires a combination of contextual understanding, multimodality, and cultural sensitivity (Rahma et al., 2023; Kumar A. et al., 2023). Integrating these techniques can improve the effectiveness of natural language processing models in applications like social media monitoring, customer service, and sentiment analysis.
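The attention mechanisms mentioned in item 2 above weigh each word against its context. As an illustration, the sketch below implements scaled dot-product attention for a single query in plain Python; the two-dimensional vectors are hypothetical stand-ins for learned embeddings and are not taken from any cited model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights over the context positions and the
    weighted sum of the value vectors.
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    out_dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(out_dim)]
    return weights, context


# Hypothetical embeddings: the query is the ambiguous word, the keys are
# three context words that are aligned, neutral, and opposed to it.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = keys

weights, context = attend(query, keys, values)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Context words whose key vectors align with the query receive larger weights, which is how such a layer lets surrounding words shape the representation of an ambiguous term.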

6.3 Biases in multimodal data

The integration of multimodal data such as text, images and sound has great prospects for improving the reliability and flexibility of cognitive computing systems within NLP. However, combining multiple modalities can amplify the biases present in individual modalities, creating compounded layers of bias that affect model performance and reliability in all applications. This problem becomes especially noticeable when models trained on biased datasets, such as visual datasets with demographic imbalances, propagate these biases to NLP tasks. The result is a model that not only inherits biases from each modality but can also create additional biases as a result of cross-modal interactions, which can hinder the generalizability of the model and its ethical compliance.

6.3.1 Sources and types of biases in multimodal data

1. Bias in modality-specific datasets: each type of data has its own unique biases. Visual datasets, for example, may disproportionately represent certain demographics, which can affect tasks that rely on visual information, such as emotion recognition and sentiment analysis. Similarly, text data reflects linguistic and cultural biases that are present in the corpus. When combined with biased visual data, these biases can lead to incorrect assumptions about social groups (Emerson et al., 2024; Lee et al., 2023).

2. Multimodal shortcuts: multimodal models, especially in applications like visual question answering (VQA), can form spurious associations between different modalities. For instance, a VQA model may rely heavily on text patterns without fully considering visual context, resulting in answers that are influenced by biases present in the dataset. This can lead to misleading results when the model is applied to real-world situations (Vosoughi et al., 2024).

3. Imbalanced representation across modalities: in some fields, such as medical or educational applications, comprehensive multimodal data is lacking, and certain modalities, like genetic or socioeconomic information, are often unavailable. This can lead to biased decisions that favor well-represented data types, such as visual and audio, over text. The imbalance can skew model predictions, especially when applied to underrepresented populations (Lin et al., 2024; Xu et al., 2024).

6.3.2 Solutions for mitigating multimodal biases

To address these issues, several solutions have been proposed to mitigate multimodal biases:

1. Balanced and diverse dataset curation: to address modality-specific biases, it is essential to ensure that each dataset used in the training process is representative and balanced across demographic factors such as race, age, gender, and cultural background. This is especially important for multimodal learning, where biases from one modality can influence another, leading to unfair outcomes in applications. To prevent this, rigorous dataset curation is necessary. Data augmentation and synthetic data generation can be used to supplement under-represented categories and enhance model robustness. Pouthier (2024) suggests this approach.

2. Counterfactual reasoning and causal inference for multimodal tasks: in tasks involving interaction between different modalities, counterfactual reasoning and causal inference can help distinguish between true multimodal relationships and spurious correlations. For example, in VQA (Visual Question Answering) models, using counterfactual-based methods can help avoid shortcutting by explicitly modeling the dependencies between visual and linguistic information, leading to more balanced and contextually accurate responses across diverse datasets (Vosoughi et al., 2024).

3. Systematic and inclusive annotation practices for multimodal data: biases can often be introduced during the annotation process when annotators' backgrounds and assumptions influence labeling outcomes. To reduce subjectivity in labels, it is important to use inclusive annotation practices with diverse annotator pools. This can help ensure that labels are more objective, especially for subjective tasks such as emotion recognition or speaker confidence prediction (Emerson et al., 2024; Lee et al., 2023). Automated tools like FairLearn (Bird et al., 2020) can be integrated into the annotation pipeline to monitor and correct biases across subpopulations in multimodal datasets, ensuring that all data is treated fairly.

4. Cross-modal consistency checks and validation: consistency checks between modalities, such as checking that a model's textual output matches the visual or auditory data, can help detect and correct biased outputs. By applying matching and validation techniques in multimodal interactions, we can reduce the spread of biased associations and ensure that each modality is accurately represented in model predictions (Lin et al., 2024).
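One simple way to operationalize the consistency check in item 4 is to compare the labels predicted from two modalities and examine how often they disagree for each demographic group. The sketch below is illustrative only: the sample format, the group names, and the tolerance threshold are assumptions, and it is not the procedure of Lin et al. (2024).

```python
from collections import defaultdict


def disagreement_by_group(samples):
    """Share of samples per group where text and image predictions differ.

    Each sample is a dict with keys 'group', 'text_label', 'image_label'.
    """
    totals = defaultdict(int)
    conflicts = defaultdict(int)
    for sample in samples:
        totals[sample["group"]] += 1
        if sample["text_label"] != sample["image_label"]:
            conflicts[sample["group"]] += 1
    return {group: conflicts[group] / totals[group] for group in totals}


def flag_groups(rates, tolerance=0.2):
    """Groups whose disagreement rate exceeds the best group by more than `tolerance`."""
    baseline = min(rates.values())
    return sorted(g for g, rate in rates.items() if rate - baseline > tolerance)


# Hypothetical predictions from a text model and an image model.
samples = [
    {"group": "A", "text_label": "pos", "image_label": "pos"},
    {"group": "A", "text_label": "neg", "image_label": "neg"},
    {"group": "A", "text_label": "pos", "image_label": "neg"},
    {"group": "A", "text_label": "neg", "image_label": "neg"},
    {"group": "B", "text_label": "pos", "image_label": "neg"},
    {"group": "B", "text_label": "pos", "image_label": "neg"},
    {"group": "B", "text_label": "neg", "image_label": "neg"},
    {"group": "B", "text_label": "neg", "image_label": "pos"},
]

rates = disagreement_by_group(samples)
flagged = flag_groups(rates)
print(rates, flagged)
```

A group flagged in this way is a candidate for closer manual review; disagreement alone does not show which modality is at fault.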

As multimodal learning advances within cognitive computing and NLP, eliminating biases becomes increasingly important. Effective solutions require both proactive data collection and the development of models that account for cross-modal dependencies, in order to support fairness and ethical integrity in real-world applications. Implementing these strategies contributes to multimodal systems that work equally well across population groups, which ultimately steers cognitive computing in a more socially responsible direction.

6.4 Real-world deployment challenges

The deployment of cognitive computing and NLP technologies in real-world applications presents a variety of challenges beyond data privacy and ethical concerns. Addressing these issues is essential for ensuring sustainable and effective use of these advanced systems. Below, we discuss key technical and operational challenges and propose solutions to mitigate them, drawing on insights from recent literature.

6.4.1 High computational requirements

One of the significant hurdles in implementing NLP systems is their extensive computational demands. Deep learning models, especially transformer-based architectures, require substantial processing power and memory, which can limit their use in resource-constrained environments like edge devices or remote locations. This constraint poses difficulties for widespread adoption and real-time application in sectors such as healthcare and IoT (Gill et al., 2024).

Solution: Advances in Edge AI and model optimization techniques can help overcome these limitations. Edge AI enables data to be processed locally on devices, reducing latency and minimizing the need for constant connectivity to cloud servers. Techniques like model compression, including quantization and pruning, can decrease the model size and energy consumption, making NLP systems more viable on devices with limited resources (Gill et al., 2024).
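The two compression techniques named above can be sketched on a flat list of weights. This is a simplified illustration under stated assumptions (unstructured magnitude pruning and symmetric per-tensor 8-bit quantization, with hypothetical weight values); frameworks used in practice apply these ideas per layer and often fine-tune afterwards.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric quantization to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


# Hypothetical weights of one small layer.
weights = [0.02, -0.5, 0.9, -0.01, 0.3, -1.27]

pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
print(pruned, codes, scale)
```

Storing `codes` as 8-bit integers instead of 32-bit floats reduces memory roughly fourfold, and the zeros introduced by pruning can be exploited by sparse storage formats; the price is a bounded rounding error of at most half the scale per weight.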

6.4.2 Explainability and interpretability issues

The complexity of cognitive computing and NLP models often results in a lack of transparency in their decision-making processes. This opacity, particularly prevalent in deep learning models, creates challenges in fields that require traceability, such as healthcare, law, and finance. The inability to interpret how a model reaches its conclusions can undermine trust and hinder adoption (Muñoz and Iglesias, 2024).

Solution: Employing Explainable AI (XAI) techniques is essential for enhancing the interpretability of NLP systems. Techniques like feature attribution, which highlights data inputs that influenced the model's predictions, and local surrogate models, which approximate complex models with simpler, interpretable ones for specific instances, are effective in building transparency. These approaches help stakeholders understand and trust the decisions made by AI systems, facilitating their integration into critical applications (Muñoz and Iglesias, 2024).
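Feature attribution can be illustrated with its simplest variant, leave-one-out (occlusion) attribution: each token is removed in turn and the change in the model's score is recorded. In the sketch below, the scoring function is a hypothetical lexicon-based stand-in for a trained model, and the approach is a basic alternative to, not an implementation of, methods such as LIME or SHAP.

```python
def toy_sentiment_score(tokens):
    """Hypothetical stand-in for a trained sentiment model.

    Sums lexicon values; the word 'not' flips the sign of the next token.
    """
    lexicon = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}
    score = 0.0
    negate = False
    for token in tokens:
        if token == "not":
            negate = True
            continue
        value = lexicon.get(token, 0.0)
        score += -value if negate else value
        negate = False
    return score


def occlusion_attribution(model, tokens):
    """Attribute the score to each token as (full score) - (score without it)."""
    base = model(tokens)
    attributions = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        attributions.append((token, base - model(reduced)))
    return attributions


tokens = ["the", "service", "was", "not", "good"]
attributions = occlusion_attribution(toy_sentiment_score, tokens)
for token, contribution in attributions:
    print(f"{token:>8}: {contribution:+.1f}")
```

In this toy case the negation word receives the largest (negative) attribution, which matches the intuition that it drives the negative prediction; with real models the same loop only requires a callable that maps tokens to a score.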

6.4.3 Scalability and resource management

Scalability is another challenge that affects the deployment of NLP systems. Scaling these systems across various platforms and environments requires careful management of computational resources, which can be complex due to the diverse and distributed nature of modern infrastructures. In real-time applications, this challenge is amplified by the need for consistent performance (Khaleel et al., 2024).

Solution: Utilizing a hybrid cloud-edge architecture can improve scalability by distributing processing loads between cloud servers and edge devices. This setup enables efficient data processing, reduces latency, and optimizes resource usage. Leveraging microservices-based frameworks and advanced resource allocation algorithms further ensures smooth operation, even as system demands grow (Khaleel et al., 2024).

6.4.4 Maintenance and update challenges

Maintaining and updating NLP models in real-world settings pose additional difficulties, particularly in environments with diverse hardware and software ecosystems. The heterogeneity of devices can lead to inconsistencies and increase vulnerabilities, complicating efforts to ensure that all components stay synchronized and up-to-date (Gill et al., 2024).

Solution: Containerization and orchestration tools, such as Docker and Kubernetes, can manage these challenges by isolating services and automating updates across environments. This approach reduces the risk of incompatibility and simplifies the deployment of patches and enhancements. Regular monitoring and adaptive maintenance schedules further ensure that NLP systems remain resilient and effective (Gill et al., 2024; Sai et al., 2024).

6.4.5 Challenges of overfitting and generalization in transformer models

The introduction of transformer-based models in natural language processing (NLP) and cognitive computing has demonstrated significant success in solving problems that involve complex data dependencies and diverse sets of features. However, as noted in the literature, these models are not without limitations, especially concerning overfitting when applied to specific or limited datasets. Overfitting is especially noticeable in high-capacity models such as transformers, where the model may fit the training data too closely, resulting in reduced performance on unseen data. This problem is compounded in tasks that are based on small datasets or specific applications where a lack of data limits the model's ability to generalize effectively.

One critical approach to mitigating overfitting is to incorporate regularization techniques, such as dropout, L2 regularization, and batch normalization, as shown in electricity theft detection frameworks (Shi et al., 2023). These techniques have proven effective by reducing model complexity and enhancing generalization, particularly in imbalanced datasets where Transformers tend to amplify noise. Furthermore, grid search optimization for hyperparameters can balance model performance, as demonstrated in various applications within NLP and time series analysis (Shen et al., 2023; Eldele et al., 2024).
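Two of the regularizers named above, dropout and L2 regularization, reduce to a few lines each. The following sketch is a generic illustration on plain Python lists, not the configuration used by Shi et al. (2023); the function names and hyperparameter values are assumptions.

```python
import random


def inverted_dropout(activations, rate, rng, training=True):
    """Randomly zero activations during training and rescale the survivors.

    Dividing kept values by (1 - rate) keeps the expected activation
    unchanged, so no rescaling is needed at inference time.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def sgd_step_with_l2(weights, grads, lr, weight_decay):
    """One gradient step on loss + (weight_decay / 2) * ||w||^2.

    The penalty adds weight_decay * w to each gradient, shrinking weights
    toward zero.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


rng = random.Random(0)
hidden = inverted_dropout([1.0] * 8, rate=0.5, rng=rng)
updated = sgd_step_with_l2([1.0, -2.0], grads=[0.0, 0.0], lr=0.1, weight_decay=0.5)
print(hidden, updated)
```

With zero data gradient, as in the last call, the update shows the pure effect of the penalty: each weight moves toward zero by `lr * weight_decay` times its value.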

In the context of time series forecasting, Transformer models face challenges with non-stationary data, which often leads to model degradation due to overfitting. Recent advancements propose the use of multi-stage architectures, such as the Good Beginning Transformer, which leverages a two-stage process to improve the initialization of unknown decoder inputs (Shen et al., 2023). Additionally, adaptive convolutional and spectral blocks have shown promise in managing both long- and short-term dependencies within temporal data, offering a way to handle noise and stabilize learning in smaller datasets (Eldele et al., 2024; Ahmad et al., 2024).

A complementary method for ensuring generalization is data augmentation, which can synthetically expand the dataset, providing varied samples for the model to learn broader representations. In eLearning sentiment analysis tasks, leveraging diverse text transformations, such as combining different language scripts, improves model robustness by exposing it to a wider linguistic context (Rahman et al., 2024). Transfer learning, another effective strategy, allows models trained on large, general datasets to be fine-tuned on specific domains, as seen in multilingual insensitive language detection using pre-trained multilingual Transformers (Kumar R. P. et al., 2023).
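As a minimal illustration of text data augmentation, the sketch below generates variants of a sentence by randomly deleting and swapping words. These two operations are a deliberately simple substitute chosen for this example; they are not the script-mixing transformations described by Rahman et al. (2024), and the deletion probability is an arbitrary assumption.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, rng):
    """Swap two randomly chosen positions (no-op for fewer than two tokens)."""
    if len(tokens) < 2:
        return list(tokens)
    swapped = list(tokens)
    i, j = rng.sample(range(len(swapped)), 2)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


def augment(sentence, n_variants, rng, p_delete=0.1):
    """Create n_variants noisy copies of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    variants = []
    for _ in range(n_variants):
        noisy = random_swap(random_deletion(tokens, p_delete, rng), rng)
        variants.append(" ".join(noisy))
    return variants


rng = random.Random(0)
variants = augment("the lecture was clear and helpful", n_variants=3, rng=rng)
for variant in variants:
    print(variant)
```

Each variant keeps the original label during training. Because word order and deletions can change meaning (for example, around negations), such augmentations are usually applied with low probabilities and checked on a validation set.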

Lastly, attention-based interpretability features within Transformers offer further insight into how the model distinguishes essential patterns from noise. For instance, fine-tuned vision Transformers for medical imaging highlight how attention layers can be optimized to focus on significant features in sparse data (Reddy et al., 2024). This focus on interpretability not only improves the model's reliability but also reduces overfitting by refining which parts of the data the model prioritizes during training (Jamal et al., 2024; Sun et al., 2023).

While Transformers bring immense potential to cognitive computing applications, their susceptibility to overfitting necessitates careful attention to model design, data augmentation, regularization, and interpretability techniques. Future work should explore these strategies across diverse NLP applications, further enhancing the model's capacity for generalization and minimizing overfitting risks in domain-specific tasks.

6.5 Future directions

The integration of Cognitive Computing and NLP holds transformative potential for Human-Computer Interaction, yet achieving widespread and ethical deployment requires addressing both technical and societal challenges. Here, we outline future directions, incorporating a detailed roadmap and potential advancements in model sophistication and multimodal data integration, to foster responsible innovation and enhance model performance, interpretability, and societal impact.

Figure 3 shows challenges and corresponding solutions (future directions) that are critical for advancing the integration of Cognitive Computing and NLP.

Figure 3. Challenges and solutions for integrating cognitive computing and NLP.

6.5.1 Advancements in model robustness and generalization

Improving model robustness and addressing overfitting, particularly on specialized or small datasets, remains a top priority for future research. Advanced regularization techniques such as adaptive dropout, spectral normalization, and curriculum learning can help address these challenges. Data augmentation methods such as synthetic data generation, together with transfer learning, can also enhance generalization across different applications.
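Of the techniques listed above, curriculum learning is the simplest to sketch: training examples are ordered from easy to hard and released to the model in growing pools. The example below is a minimal sketch under the assumption that sentence length is an acceptable proxy for difficulty; the function name, the proxy, and the number of stages are illustrative choices, not a prescribed method.

```python
def curriculum_stages(examples, difficulty, n_stages=3):
    """Yield growing training pools, easiest examples first.

    `difficulty` is any function mapping an example to a sortable score.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        # Integer ceiling so the final stage always covers every example.
        cutoff = (len(ordered) * stage + n_stages - 1) // n_stages
        yield ordered[:cutoff]


if __name__ == "__main__":
    data = ["a b c d e", "a", "a b c", "a b", "a b c d"]
    # Assumption: token count stands in for example difficulty.
    for pool in curriculum_stages(data, difficulty=lambda s: len(s.split())):
        print(len(pool), pool)
```

A training loop would run one or more epochs on each pool before moving to the next, so the model meets the hardest examples only after fitting the easier ones.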

Future research should focus on developing contextual and emotion-aware models that can capture the nuances of human emotions and interactions. Such models would interpret user intent more accurately, leading to more reliable and generalizable results in real-world scenarios.

The complexity of NLP and cognitive computing models demands a focus on explainability, particularly in sectors like healthcare, finance, and law. Future work should enhance interpretability through advanced Explainable AI (XAI) approaches, such as counterfactual explanations, attention visualization, and user-centered explanations that adapt to different audiences. These approaches improve transparency, enabling stakeholders to better understand decision-making processes. Interdisciplinary collaborations with ethicists and social scientists are essential to align these tools with ethical standards and make them accessible to end-users. A roadmap to achieving transparency should focus on developing modular explainability techniques that can be applied at multiple stages, from model training to user interaction.

6.5.2 Enhanced mitigation of AI biases

Bias remains a significant challenge that impacts fairness in various applications, including recruitment, law enforcement, and healthcare. To address it, future research should focus on developing fairness-aware algorithms and debiasing techniques, such as adversarial debiasing and counterfactual reasoning.

Future research should also explore balanced and representative datasets, particularly for multimodal data integration, as cross-modal biases can be amplified. A roadmap for addressing bias includes establishing standardized fairness metrics, implementing continuous auditing mechanisms, and incorporating more advanced reasoning abilities in models to effectively detect and mitigate bias.
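One widely used candidate for a standardized fairness metric is the demographic parity difference, the gap in positive-prediction rates between groups. The sketch below shows how such a metric could feed a simple audit check; the group labels, the toy predictions, and the 0.2 alert threshold are assumptions made for illustration.

```python
def selection_rates(y_pred, groups):
    """Positive-prediction rate per group for binary predictions (0 or 1)."""
    by_group = {}
    for pred, group in zip(y_pred, groups):
        by_group.setdefault(group, []).append(pred)
    return {g: sum(p) / len(p) for g, p in by_group.items()}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def audit(y_pred, groups, threshold=0.2):
    """Return True if the gap stays within the (assumed) tolerance."""
    return demographic_parity_difference(y_pred, groups) <= threshold


if __name__ == "__main__":
    # Toy data: hypothetical screening decisions for two groups.
    y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(y_pred, groups))
    print(demographic_parity_difference(y_pred, groups))
    print(audit(y_pred, groups))
```

Running a check of this kind on every model release is one way to realize continuous auditing. Demographic parity is only one of several fairness definitions, and the appropriate metric depends on the application.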

Additionally, it is essential to prioritize data privacy and security protocols to ensure the integrity and confidentiality of sensitive information. This includes implementing robust encryption methods, access controls, and regular audits to protect against potential breaches.

The use of large-scale datasets in natural language processing (NLP) introduces privacy risks, especially in sensitive domains like healthcare and finance. To protect user data and enable robust model training, it is essential to develop privacy-preserving techniques such as federated learning, differential privacy, and homomorphic encryption. A comprehensive roadmap should focus on creating frameworks that comply with regulations like the General Data Protection Regulation (GDPR) and exploring privacy-enhancing technologies (PETs), such as privacy-as-a-service platforms, for handling sensitive data. Integrating regular privacy audits ensures compliance, transparency, and sustained user trust in these systems.
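Of the privacy-preserving techniques named above, differential privacy is the easiest to illustrate in a few lines. The sketch below releases a count with Laplace noise calibrated to a privacy budget epsilon; the record layout, the function names, and the chosen epsilon are assumptions. It is an illustration of the mechanism only, and production systems generally rely on audited libraries rather than hand-written noise sampling.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count of records matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical records: one boolean attribute per user.
    records = [{"flag": i % 4 == 0} for i in range(100)]
    print(private_count(records, lambda r: r["flag"], epsilon=1.0, rng=rng))
```

Smaller values of epsilon give stronger privacy but noisier answers, which is the trade-off any privacy-preserving training or analytics pipeline has to manage.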

Additionally, scalability and sustainable deployment solutions are crucial for ensuring the long-term success of these technologies. These solutions should address issues such as data storage, processing, and communication costs, as well as energy consumption. By implementing efficient and scalable solutions, we can make NLP systems more accessible and sustainable for all users.

Scalability and resource efficiency are essential for NLP systems, particularly in resource-constrained environments. To improve scalability, researchers should focus on model optimization techniques, such as pruning and quantization, and on hybrid cloud-edge deployments. A sustainable roadmap for deployment could explore energy-efficient architectures and green AI principles to minimize the environmental impact of large-scale AI applications. This will ensure that these technologies are viable and adaptable for real-world use, supporting broader adoption.
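The two model optimization techniques mentioned above can be sketched on a flat list of weights. The example shows magnitude pruning (zeroing the smallest weights) and symmetric 8-bit quantization; the weight values, the 50% sparsity level, and the per-tensor scaling scheme are illustrative assumptions, and real frameworks apply these steps per layer with additional calibration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -0.02, 0.9, 0.01, -1.27, 0.3]  # toy values
    print(magnitude_prune(weights, sparsity=0.5))
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Pruning reduces the number of nonzero parameters, while quantization shrinks each stored weight from 32 bits to 8 at the cost of a rounding error of at most half the scale per weight.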

A key direction for advancing cognitive computing lies in developing models that can understand and respond to human emotions and context, a capability fundamental to enhancing human-computer interactions. This roadmap should include research on specialized models for emotion detection, empathy generation, and context-awareness to improve interactions in domains such as customer support, mental health, and education. Approaches can leverage multimodal data, incorporating text, tone, facial expressions, and contextual cues, to create more sophisticated emotional intelligence in NLP applications.

6.5.3 Future work on multimodal data integration and advanced architectures

Future research on multimodal data integration could benefit from exploring advanced architectures. The adoption of graph neural networks (GNNs) and multimodal transformers could significantly improve the handling of interconnected data. These architectures can represent complex relationships in data and enhance the understanding of contextual dependencies, which is essential for applications such as emotion recognition and social media analysis.
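To show how a graph neural network represents relationships between connected items, the sketch below implements one message-passing step: each node averages its own features with those of its neighbors, applies a linear map, and passes the result through a ReLU. The graph, the feature vectors, and the weight matrix are hypothetical; in practice the weights are learned and nodes might stand for posts, users, or modality-specific embeddings.

```python
def mean_aggregate(features, edges):
    """Average each node's feature vector with those of its neighbors."""
    n = len(features)
    neighbors = {i: {i} for i in range(n)}  # include a self-loop per node
    for u, v in edges:  # edges are treated as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)
    dim = len(features[0])
    aggregated = []
    for i in range(n):
        nbrs = neighbors[i]
        aggregated.append(
            [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        )
    return aggregated


def gnn_layer(features, edges, weight):
    """One message-passing step: aggregate, linear map, then ReLU."""
    out = []
    for h in mean_aggregate(features, edges):
        row = [
            sum(h[d] * weight[d][k] for d in range(len(h)))
            for k in range(len(weight[0]))
        ]
        out.append([max(0.0, x) for x in row])
    return out


if __name__ == "__main__":
    # Toy graph: three nodes in a chain, two features each.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    edges = [(0, 1), (1, 2)]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights
    print(gnn_layer(features, edges, identity))
```

Stacking several such layers lets information travel across multiple hops, which is how these architectures capture the contextual dependencies discussed above.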

To develop responsible AI, it is crucial to explore architectures that can mitigate modality-specific biases and improve interpretability. Additionally, scaling these architectures to real-world settings is essential. Interdisciplinary collaboration between experts from different fields, such as computer science, psychology, and sociology, can help to ensure responsible AI development.

Achieving responsible innovation in cognitive computing and NLP requires collaborative efforts across disciplines. Future work should actively involve ethicists, social scientists, and legal experts to address ethical, social, and legal implications, particularly regarding transparency, bias, and societal impact. Engaging diverse stakeholders, including underrepresented communities, in the AI development lifecycle will ensure that AI technologies are not only technically advanced but also socially beneficial and ethically sound.

These future directions contribute to a roadmap for advancing cognitive computing and NLP technologies that are ethical, transparent, and adaptable, ensuring they align with societal values and serve a broad, inclusive user base.

Author contributions

LO: Conceptualization, Methodology, Project administration, Visualization, Writing – original draft, Writing – review & editing. GB: Conceptualization, Funding acquisition, Methodology, Writing – review & editing. BY: Conceptualization, Methodology, Writing – review & editing. AO: Conceptualization, Methodology, Writing – review & editing. ASa: Conceptualization, Methodology, Writing – review & editing. ASh: Conceptualization, Methodology, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Scientific Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. AP19679847).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Adamopoulou, E., and Moussiades, L. (2020a). Chatbots: history, technology, and applications. Mach. Learn. Applic. 2:100006. doi: 10.1016/j.mlwa.2020.100006
