Why We Measure AI and Humans Differently

Management is very difficult without measurement. It was business management godfather Peter Drucker, after all, who wrote: “What gets measured gets managed.”

Why We Measure AI

Today, executives in the mortgage industry are tracking all kinds of metrics in an effort to reduce historically high costs, streamline a very complex mortgage origination process and generally improve their borrowers’ experience.

But you can’t measure everything the same way.

Take, for instance, the new AI-powered document management capabilities built into next-generation loan origination platforms like the Mortgage Cadence MCP.

When we talk about the accuracy of the AI models we’re deploying today, those measurements don’t line up exactly with the way we would measure accuracy for our human workers. Here’s why.

Critical differences between humans and AI

While both AI and humans can perform tasks, they have distinct characteristics that affect how we measure their performance. Measuring the performance of AI models and comparing them to human workers involves a different set of techniques and considerations. 

Objective vs. Subjective Tasks:

  • AI models excel in tasks that have well-defined objectives and can be precisely measured. For instance, in tasks like image recognition or language translation, AI models can provide objective and consistent results.
  • Humans, on the other hand, are often better at subjective tasks that require judgment, creativity, empathy, or cultural understanding. Measuring the performance of humans in such tasks can be more nuanced and context-dependent.

Consistency and Reproducibility:

  • AI models are consistent and reproducible: given the same model and the same input, you can generally expect the same output. This predictability is valuable in many applications.
  • Human performance can vary based on individual skill, experience, mood, and external factors. Human workers can provide unique insights, but their performance is less consistent.

Metrics and KPIs:

  • AI models are typically evaluated using quantitative metrics, such as accuracy, F1 score, or mean squared error, depending on the task. These metrics provide a clear way to measure and compare performance (a short sketch follows after this list).
  • Measuring human workers often involves both quantitative metrics and qualitative evaluations. In customer support, for example, you might measure response times and issue resolution rates but also evaluate the quality of communication and customer satisfaction, which are subjective.
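
To make those quantitative metrics concrete, here is a minimal Python sketch using scikit-learn; the labels and values are made up purely for illustration:

```python
# Minimal sketch of the quantitative metrics mentioned above,
# computed with scikit-learn on made-up labels.
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

# Classification example: was each prediction correct?
y_true = [1, 0, 1, 1, 0, 1]   # ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))  # share of correct predictions
print("F1 score:", f1_score(y_true, y_pred))        # balances precision and recall

# Regression example: predicted vs. actual values
actual    = [250_000, 310_000, 185_000]
predicted = [245_000, 320_000, 190_000]
print("MSE:", mean_squared_error(actual, predicted))  # mean squared error
```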

Bias and Fairness:

  • AI models can inherit biases present in their training data, which can lead to biased or unfair outcomes. Measuring and mitigating bias in AI models is a critical consideration.
  • Humans can also exhibit biases, but these biases may be more subtle and context-dependent. Measuring and addressing bias in human decisions often involve different techniques, such as diversity and inclusion training.

Scalability and Cost-Efficiency:

  • AI models are scalable and can process large volumes of data quickly. This scalability can make them more cost-effective for certain tasks.
  • Humans have limitations in terms of scalability, and their costs increase as you hire more workers. Some tasks, like data labeling, can be expensive when performed by humans at scale.

Continuous Learning:

  • AI models can be continuously trained and improved with additional data and feedback, allowing them to adapt to changing circumstances (see the sketch after this list).
  • Humans can also learn and improve over time, but this often requires additional training, education, or experience.
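
As a rough illustration of that kind of incremental improvement, here is a minimal Python sketch using scikit-learn’s SGDClassifier; the batches of synthetic data stand in for the additional data and feedback described above:

```python
# Minimal sketch of continuous learning: a model updated incrementally
# as new feedback arrives, using partial_fit on synthetic data.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()
classes = np.array([0, 1])

for batch in range(3):
    # Synthetic "feedback" batch: features plus corrected labels.
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    model.partial_fit(X, y, classes=classes)  # update the model in place

    # Check how the updated model does on fresh synthetic data.
    X_val = rng.normal(size=(50, 4))
    y_val = (X_val[:, 0] + X_val[:, 1] > 0).astype(int)
    print(f"After batch {batch + 1}: accuracy {model.score(X_val, y_val):.2f}")
```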

In summary, while there are similarities in measuring AI and human performance, the techniques and considerations differ because of their distinct characteristics. It's crucial to choose appropriate measurement methods for each based on the task, objectives and context. Furthermore, in many real-world applications, AI and humans can complement each other's strengths, and it may be necessary to manage their performance as a combined workforce to achieve the best results.

A lender may set a threshold standard for human staff: an employee must place 75% or more of the data elements in the correct fields to be deemed effective. This is reasonable because the automated checks built into our technology platforms will catch most of those errors, and QC/QA personnel will catch the rest.
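
As an illustration, here is a minimal Python sketch of that kind of field-placement check; the field names, values and the 75% bar are assumptions for the example, not any platform’s actual logic:

```python
# Minimal sketch of a field-placement accuracy check against a threshold.
# Field names and values are hypothetical, purely for illustration.
THRESHOLD = 0.75  # the lender's 75% effectiveness bar

expected = {  # ground truth: where each data element belongs
    "borrower_name": "Jane Doe",
    "loan_amount": "250000",
    "property_address": "123 Main St",
    "interest_rate": "6.25",
}
entered = {  # what the worker (human or AI) actually placed
    "borrower_name": "Jane Doe",
    "loan_amount": "250000",
    "property_address": "123 Main Street",  # mismatch
    "interest_rate": "6.25",
}

correct = sum(entered.get(field) == value for field, value in expected.items())
accuracy = correct / len(expected)

print(f"Field accuracy: {accuracy:.0%}")
print("Effective" if accuracy >= THRESHOLD else "Needs review")
```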

When the same lender judges AI, they think of it as a human with all the power of artificial intelligence built in and assume it should operate flawlessly out of the box. That doesn’t happen. When they don’t see 100% accuracy, the lender wonders if they have made a mistake.

Many assume AI should perform at or above human levels from day one across all situations. That’s not how it works.

Human staff have a lifetime of accumulated experiential knowledge that AI cannot match overnight. They understand nuanced context and exceptions that AI must learn through rigorous training over time.

Consider mortgage underwriting, for example. Human underwriters draw on years of judging risk across economic cycles and products. Intuition honed through different market conditions gives them flexible judgment.

In contrast, AI underwriting models start with a blank slate. The algorithms require extensive structured data on past loans of all types to recognize patterns, so some degradation in precision is expected early on.

But consistent re-training will gradually take AI models to levels of accuracy that humans simply cannot achieve. And that’s when AI’s superpower kicks in: once it learns, it never forgets.

Nor does it have bad days, get distracted or fail to act in a consistent manner.

With the right expectations, AI delivers incredible benefits in speed, efficiency and scalability over time. But setting the implementation bar unrealistically high out of the gate slows progress.
