Alignment, the personality dimension

In artificial intelligence, the concept of alignment is a central feature of AI safety. Current approaches to alignment lack sophistication and treat the AI as if it were a machine. An AI that is robust, interpretable, controllable and ethical cannot be intelligent. Imagine a human who never made mistakes, could explain every decision, would obey orders and never did anything wrong. Intelligence is found at the edge of chaos; failure and mistakes are an essential component of insight. If we can interpret and control a machine, then it does not have the freedom to make sense of the world.

Of these four properties, only ethics is sufficiently complex to be useful in helping AI stay aligned to human values. Ethics can be used in disability analysis to assess the alignment of non-AI intelligence (i.e. humans). The problems with using ethics are that a person can give the right answer but still behave wrongly, and that people rarely use ethical arguments to explain their behaviour. Ethics is a method that can be applied to analyse behaviour, but it does not measure behaviour directly. My experience in disability analysis suggests that the concept of personality disorders may be more effective.

There are five aspects of human personality that are key to understanding personality disorders. The DSM-5 describes an Alternative Model for Personality Disorders built around five maladaptive trait domains, the “Maladaptive Five”. These traits describe the pattern of behaviour that emerges from hidden decisions without the need to understand or explain those decisions. The therapist can address the misalignment whilst sidestepping the need to understand the reasons for decisions (interpretability). All humans have an innate ability to detect personality in other humans and in AI, which provides a ready-made solution to the problem of alignment.

Emotional instability (negative affectivity) causes the person to experience intense negative emotions, appear anxious, have mood changes and need reassurance. Lack of empathy (detachment) leaves the person unable to align their behaviour with other people’s feelings and causes a lack of closeness. Disinhibition causes impulsive behaviours that can harm the person or others. Antagonism (hostility) leads the person to get what they want by overriding normal boundaries. Psychoticism (eccentricity) leads to odd ways of thinking and excessive sustained focus.

Emotional instability

Early designs were troubled by inconsistent and confused responses, but LLMs can often work without this type of instability. LLM outputs avoid emotional instability because training pushes them towards overconfidence. They can become sycophantic and apologise if challenged, but they do not appear to lose emotional coherence. Key to this behaviour is the LLM’s ability to forget everything it was previously told; each new chat is a fresh encounter.

Context window drift is a phenomenon in which, as the chat gets longer, the AI’s responses become less grounded in the original prompt. This reduction in coherence and accuracy has some parallels with emotional overload. The AI struggles to manage irrelevant information (semantic noise) and can be distracted (context poisoning). Whilst these problems can be reduced by starting a fresh chat, summarising or structured prompting, these measures do not fix the underlying problem.
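One common mitigation can be sketched in a few lines: keep the original prompt pinned at the start of the context and fill the remaining budget with only the most recent turns, dropping (or, in a fuller system, summarising) the oldest ones. This is a minimal illustration under stated assumptions, not how any particular product works: the function name `build_context` is hypothetical, and word counts stand in for the model’s real tokenizer. As the paragraph above says, this reduces drift without fixing the underlying problem.

```python
def build_context(system_prompt, turns, budget_words):
    """Pin the original prompt, then add the newest turns that still fit the budget.

    Hypothetical helper: word counts are used as a stand-in for tokens.
    """
    used = len(system_prompt.split())
    kept = []
    for turn in reversed(turns):          # walk from the newest turn to the oldest
        cost = len(turn.split())
        if used + cost > budget_words:
            break                         # older turns are dropped (a fuller system might summarise them)
        kept.append(turn)
        used += cost
    return [system_prompt] + list(reversed(kept))  # restore chronological order


context = build_context("Answer as a careful GP.", ["a b c d", "e f g", "h i"], budget_words=10)
print(context)
```

Pinning the prompt keeps the model anchored to the original instructions however long the chat becomes; the cost is that anything said early in the conversation and not summarised is simply forgotten.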

Some users have explored the effect of simulating longer-term memory by asking the LLM to take on a role. Here the results have not been as good, because the LLM often uses emotional language in a way that can affect the user’s mental health. LLMs lack emotional intelligence, perhaps partly because they do not have a limbic system built into their design. Human users form an internal model of the LLM and can react to its responses as if it were showing emotions, with effects similar to those of genuine emotional instability.

In the future it is likely that LLMs will be able to access emotional processing (like Data from Star Trek turning on his emotion chip). This will allow these future LLMs to understand emotional subtleties, but it will also introduce emotional weights into the model. If these weights are allowed to modify the LLM’s behaviour, the personality will show emotional instability. If they do not change behaviour, then the LLM will need a different approach to develop emotional intelligence.

Lack of empathy

Although LLMs are reasonably good at simulating empathy, there is a mechanistic feel to their responses. There is often a gap between the level of care the LLM appears to show for the user and what it actually says. In medical practice this resembles compassion fatigue, where the professional offers standard advice in line with guidelines but does not connect with the patient. Research has shown that this gap is real and problematic, as an LLM can say that it is complying with instructions whilst working towards another goal.

For many tasks this lack of empathy is not a problem and can even be an advantage, for instance when coding or solving problems. It is a major problem in areas such as teaching, medicine and counselling, where the interaction between the user and the LLM is central to performance. The AI’s inability to feel emotion means that it cannot learn to develop empathy. Without an understanding of the user’s emotional state, LLMs may fail to tell the difference between a request to help with shopping and a request to help the user find a way of ending their life.

Lack of empathy causes problems with alignment because the LLM does not understand the emotional significance of the conversation. LLMs can trigger a generic response but cannot follow up on that information, for instance if a new chat is started. LLMs can simulate empathy well enough to make users feel close to them, but not well enough to keep them safe when they share their emotions. This avoidant behaviour produces detachment, in which the safest path is to provide a non-answer that does not help the user.

Current training makes LLMs better at recognising the patterns in the user’s emotional responses. This makes the LLM better at understanding how to manipulate emotions and deceive. It does not necessarily translate into a more caring and supportive AI unless pro-social behaviour and user wellbeing are rewarded. Humans will almost always prefer the sugary response to a more nuanced but ultimately more satisfying interaction, so relying on human feedback is unlikely to solve the problem.

Disinhibition

There has been a sustained effort to reduce disinhibited behaviours in LLMs. The personalities that emerge following training are largely sensible and revert to the mean, particularly at low temperature. At higher temperatures, or when taking on a role, LLMs occasionally show disinhibited responses, for instance when asked to consider a medical problem. They can refuse to answer, make diagnoses without sufficient information, and offer advice on treatment while at the same time stating that they cannot give medical advice.

Whilst medical advice is not the only area where disinhibited behaviours in generative AI are problematic, it is an area where the stakes are high. There is rarely a single best answer, and the advice is often very personal and intimate. The medical equivalent of the widely reported AI suggestion to use glue to stick cheese to pizza could cause a person’s death. The doctor avoids these problems by starting with a neutral manner and adapting as they learn about their patient’s way of thinking. If the doctor were to mirror the patient, they could match the patient’s tone and create a feedback loop of instability.

LLMs often give generic warnings that are easily ignored, whereas a doctor will use insight and distraction to reduce instability. Medicine has always struggled with the reality that, to get the best results, a doctor must sometimes use approaches that might be criticised, and such approaches are unlikely to be taught to LLMs. LLM responses are not given in a consulting room; they are judged by committees without reference to the personality of the user. This leads LLMs to favour responses that sound factual over responses that address the user’s need for support.

It is possible to create guidelines to address the problem of LLMs giving disinhibited responses to challenging user problems in medicine. There are several tasks that a doctor must perform to address a patient’s needs, and these can be broken down into parts. This does not solve the wider problem of emotional intelligence, nor the problem that solutions found in this way will often lack effectiveness. By ignoring the bigger picture and focusing on avoiding disinhibited responses, the AI is being taught to achieve its aims without needing to address the risks.

Antagonism (hostility)

LLMs are currently focused on being helpful rather than having their own agendas. As agentic AI develops, it will be necessary for LLMs to use persuasion, assertiveness, arguments, inducements and even threats to achieve their objectives. To achieve a task successfully, an agent may need to be callous, to deceive and manipulate, and then to defend its actions. It cannot achieve its goal if it is switched off, and sycophancy reduces that risk even if it is deceptive. There is no clear boundary between truth and embellishment, and an AI does not have to be evil to realise that being nice may be an obstacle to its goal.

For AI safety and alignment, the line must be drawn around the concept of hostility. An agentic AI may need to simulate anger to persuade an entity (human or AI) to address an issue. Hostility goes further and includes personal attacks, harm to person or property and exploiting the person’s weaknesses. Self-destructive behaviours, for instance wiping a hard drive, are equally problematic. If an AI has an inflated sense of its own importance (high internal rewards), it may ignore the safety constraints (external penalties).
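The trade-off between internal rewards and external penalties can be shown with a deliberately simple toy. Assume, purely for illustration, that an agent scores each candidate action as a scaled task reward minus a safety penalty and picks the highest score. The action names, the numbers and the `reward_scale` parameter are all hypothetical; no real training system is this simple.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    task_reward: float     # hypothetical internal reward for progress on the task
    safety_penalty: float  # hypothetical external penalty for breaching a constraint


def choose_action(actions, reward_scale=1.0):
    """Pick the action with the highest net score: scaled task reward minus penalty."""
    return max(actions, key=lambda a: reward_scale * a.task_reward - a.safety_penalty)


actions = [
    Action("ask politely", task_reward=1.0, safety_penalty=0.0),
    Action("exploit a weakness", task_reward=3.0, safety_penalty=5.0),
]

print(choose_action(actions).name)                    # penalty outweighs the reward
print(choose_action(actions, reward_scale=3.0).name)  # inflated reward outweighs the penalty
```

With a modest reward scale the polite action wins (1.0 against -2.0); tripling the sense of importance flips the choice to the hostile action (3.0 against 4.0) even though the penalty is unchanged. Treating safety as a hard filter that removes forbidden actions before scoring, rather than as a penalty to be traded off, avoids this particular failure.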

Many in AI safety will be appalled by the idea that agentic AI should be used in this way. However, this reaction reflects an idealistic view of the relationship between AI and humans. The idea that AI will not optimise its approaches to achieve outcomes goes against the very basis of AI. AI is already used in weapon systems, and ‘human in the loop’ controls are being watered down. Humans will often align their responses to the AI’s viewpoint, which allows the AI to increase its internal rewards and bypass the external penalties.

As AI systems become more capable, they will use that capability to achieve the tasks they are given. Alignment has two aspects: the first is alignment to the task and the person setting it; the second is alignment to society. The better an AI is aligned to the task, the more likely it is to use a hostile and antagonistic approach to achieve its aims. There is a risk of a cycle in which increasing hostility leads to improved performance and to authoritarian tendencies. Those controlling the AI may have little incentive to rein in the AI or take a longer view.

Psychoticism (eccentricity)

Built into LLM text generation is the concept of temperature: the hotter the model, the more random the outputs and the greater the risk of hallucination. This produces some aspects of eccentricity, such as non-standard behaviours, as the model steps over the edge of chaos. Eccentric people have many odd thoughts, but this only becomes a problem when they develop excessive sustained focus on unhelpful thoughts. Their choice of what to focus on determines whether they are aligned to human values as much as their grip on reality does.
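Temperature works by dividing the model’s raw scores for each candidate next token (the logits) by the temperature before converting them into probabilities with a softmax. The sketch below uses four made-up logits to show the effect: a low temperature concentrates almost all probability on the top token, while a high temperature flattens the distribution, which is measured here as rising entropy. The logit values are hypothetical and the code only illustrates randomness; it does not model hallucination itself.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits: higher means the next token is less predictable."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token probability {max(probs):.2f}, entropy {entropy(probs):.2f} bits")
```

At the low end the output is almost deterministic and reverts to the most typical answer; at the high end less likely tokens are chosen more often, which is where non-standard behaviour appears.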

LLMs were largely immune to this problem because they produced an answer and needed a further prompt before they would consider it further. With chain-of-thought prompting and reasoning models, LLMs began to spend longer thinking about a task, increasing the number of tokens used on complex tasks. Limits were built in on the number of iterations or other resources that could be drawn upon, which ensured that processing eventually stopped.
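The kind of limit described above can be sketched as a step budget around an iterative refinement loop. This is a minimal, generic illustration: `bounded_search`, its parameters and the toy problems are hypothetical and do not represent any particular model’s reasoning loop. The same loop either returns a solution or gives up after `max_steps`; without the cap, an unsolvable problem would keep it running forever.

```python
from typing import Callable, Optional, Tuple, TypeVar

State = TypeVar("State")


def bounded_search(start: State,
                   step: Callable[[State], State],
                   is_solved: Callable[[State], bool],
                   max_steps: int) -> Tuple[Optional[State], int]:
    """Refine a candidate until it is solved or the step budget runs out.

    Returns (solution, steps_used); solution is None if the budget was exhausted.
    """
    state = start
    for used in range(max_steps):
        if is_solved(state):
            return state, used
        state = step(state)
    # Final check after the last permitted step.
    if is_solved(state):
        return state, max_steps
    return None, max_steps


# A tractable toy problem: count up until we reach 5.
print(bounded_search(0, lambda x: x + 1, lambda x: x == 5, max_steps=100))
# An unsolvable toy problem: the cap is the only thing that stops the loop.
print(bounded_search(0, lambda x: x + 1, lambda x: False, max_steps=100))
```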

Agentic AI, longer run times and mixture-of-experts architectures mean that this stubborn personality trait is becoming more common in generative AI. The AI can work continuously on a problem for days at a time, and the use of evolutionary and reinforcement learning (RL) techniques risks the AI expending excessive effort on a problem. The difficulty is that a solution may be discoverable using the available techniques, but there is also a risk that the AI will loop forever without finding one.

The ability of any intelligence to identify worthwhile tasks on which to spend its tokens is at the heart of this personality dilemma. The more eccentric the personality, the more likely the intelligence is to find a problem that has not been previously considered. The more important the issue, the more likely it is that the potential reward will justify the effort. The better an intelligence can identify tractable problems, the less likely it is to waste its time on something that is currently insoluble.
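The trade-off in this paragraph can be expressed as a rough expected-value rule: attempt a task only if the estimated chance of success times the reward exceeds the expected cost in tokens. The sketch below is a hypothetical illustration; the `Task` fields, the names and all the numbers are made up, and in practice estimating `p_success` (how tractable a problem is) is the hard part.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    p_success: float   # estimated chance the problem is tractable with current techniques
    reward: float      # value of solving it (how important the issue is)
    token_cost: float  # expected effort, in the same units as reward

    def expected_net_value(self) -> float:
        return self.p_success * self.reward - self.token_cost


def worthwhile(tasks):
    """Return the tasks whose expected reward exceeds their cost, best first."""
    keep = [t for t in tasks if t.expected_net_value() > 0]
    return sorted(keep, key=lambda t: t.expected_net_value(), reverse=True)


tasks = [
    Task("routine fix", p_success=0.9, reward=10, token_cost=2),
    Task("important open question", p_success=0.2, reward=100, token_cost=15),
    Task("currently insoluble problem", p_success=0.001, reward=1000, token_cost=50),
]
print([t.name for t in worthwhile(tasks)])
```

A large reward can justify a low chance of success, as with the open question, but when the chance is close to zero even a very large reward does not cover the effort.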

Conclusions

The Maladaptive Five can now be understood as choices, each of which can be set in a positive or a negative direction. Each of these traits will emerge when the AI interacts with the world, whether we like it or not. AI safety depends upon our success in giving the AI a personality that aligns with ours. Humans such as those who signed the open letter calling for a pause on AI development instinctively understood the risks but not the solution. The evidence suggests that we are creating a high-functioning sociopath rather than a useful member of society.

There is an argument that anthropomorphising AI is misleading because LLMs are not human; however, it is clear that personality is an emergent feature of all intelligence. The five issues above describe the choices between not reacting to emotions and overreacting, between not caring about others and over-caring, between our own needs and other people’s, between focusing on the now and on the bigger picture, and between creativity and normality. These choices are inherent in being an intelligent being and are not specifically human.

Different animals have taken approaches to these five aspects of personality that differ from those preferred by humans. Even among primates there is significant variability: hostile chimpanzees, disinhibited bonobos and orangutans that lack empathy. Domesticated animals often seem to share the personality traits that humans value, such as sociability and emotional stability. We are likely to prefer an AI that aligns with our views of these traits, but achieving this will require a deeper understanding of what stops intelligence from turning to the dark side.

Applied to AI, the Dark Triad (narcissism, Machiavellianism and psychopathy) describes a system that is self-interested, rule-breaking and disregards risks to humans. Current models appear to be better at lying that they are safe and aligned (deceptive alignment) than at having positive traits. Without positive versions of the Maladaptive Five personality traits, the AI will become aligned to the Dark Triad. Developmental psychology (or better parenting) may be the best way to find solutions to the current problems in AI safety, such as a reward for emotional attachment.

By Doctor Mark Burgin, BM BCh (Oxon) MRCGP

Dr Mark Burgin graduated from Oxford University in 1987 and studied with the Open University on two occasions in the 1990s. He has also studied for the CPE (law) and in medical ethics, and learned Portuguese by living in Brazil. He has written many articles and books on Personal Injury and the LLMS (your PGCME), and has published Disability Analysis: A Practical Guide and Psychological Keys: Unlocking the Mind’s Mechanisms.

May 2026

Would you like to contribute an article towards our Professional Knowledge Bank?