AI in Spend Analysis
Spend analysis is a critical first step toward establishing an effective procurement organization, and it presents a major challenge to organizations with limited time and resources. However, artificial intelligence, specifically machine learning, can shorten the time to spend analysis results and improve their accuracy while reducing the amount of manual input.
What is Machine Learning? Machine learning algorithms make one or more decisions or predictions (outputs) based on one or more (typically many) inputs, as shown in the block diagram below.
This blog post will focus on two types of machine learning that can improve your spend analyses. Let’s explore each type.
Supervised Learning is the most popular type of machine learning. It uses training data to construct a model that most accurately estimates the known outputs.
Definition: Training Data – A set of inputs and correctly mapped outputs that is manually confirmed for accuracy
To build a supervised machine learning model, the algorithm architect feeds the algorithm a training data set to “train” the model. For each input, the model predicts an output, compares that prediction with the known correct output, and adjusts itself to reduce the error. The size of each adjustment is governed by the “learning rate” set by the human user, and the process repeats across the training data over many iterations.
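To make the predict, compare, and correct loop concrete, here is a minimal sketch in plain Python. It assumes a very small logistic model and a made-up training set with two yes/no inputs; the feature meanings, the labels, and the function names are hypothetical and only illustrate how the learning rate scales each correction.

```python
import math

def predict(weights, bias, features):
    """Return the model's estimated probability that the label is 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, learning_rate=0.5, epochs=200):
    """Repeat predict -> compare -> correct over the training data."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            predicted = predict(weights, bias, features)
            error = label - predicted  # compare with the known output
            # Correct the model; the learning rate sets the step size.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training data:
# [description mentions "laptop", description mentions "consulting"]
# label 1 = IT hardware, label 0 = anything else
training_data = [
    ([1, 0], 1),
    ([1, 0], 1),
    ([0, 1], 0),
    ([0, 1], 0),
    ([0, 0], 0),
]
weights, bias = train(training_data)
```

A larger learning rate makes each correction bigger, which speeds up training but can overshoot; a smaller one is steadier but needs more passes over the data.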
Application: The most useful use case for supervised learning is spend categorization. Supervised learning models can absorb information (inputs) from PO and invoice line descriptions, supplier names, GL codes, and other data to predict a spend category (output). Many supervised learning algorithms can also attach a confidence score to each prediction, so that low-confidence predictions are sent to a human user to verify.
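As an illustration of categorization with human review, the sketch below trains a small naive Bayes text classifier on line descriptions and flags low-confidence predictions. Everything specific here is an assumption for the example: the six training lines, the two category names, the 0.80 threshold, and the choice of naive Bayes itself. The "confidence" is simply the model's estimated probability for its top category.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_lines):
    """Count words per category from (description, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for description, category in labeled_lines:
        tokens = description.lower().split()
        word_counts[category].update(tokens)
        category_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, category_counts, vocabulary

def categorize(description, model):
    """Return (best_category, probability) for one line description."""
    word_counts, category_counts, vocabulary = model
    total_lines = sum(category_counts.values())
    log_scores = {}
    for category, n_lines in category_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_lines / total_lines)
        for token in description.lower().split():
            # Laplace smoothing keeps unseen words from zeroing a category.
            seen = word_counts[category][token]
            score += math.log((seen + 1) / (total_words + len(vocabulary)))
        log_scores[category] = score
    # Turn log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    scaled = {c: math.exp(s - peak) for c, s in log_scores.items()}
    best = max(scaled, key=scaled.get)
    return best, scaled[best] / sum(scaled.values())

# Hypothetical, manually confirmed training data.
training = [
    ("dell latitude laptop", "IT Hardware"),
    ("laptop docking station", "IT Hardware"),
    ("monitor 27 inch", "IT Hardware"),
    ("strategy consulting services", "Professional Services"),
    ("legal advisory services", "Professional Services"),
    ("audit consulting fees", "Professional Services"),
]
model = train_naive_bayes(training)
REVIEW_THRESHOLD = 0.80  # assumed cutoff; tune to your own data

def route(description):
    """Categorize a line and decide whether a human should verify it."""
    category, confidence = categorize(description, model)
    status = "auto" if confidence >= REVIEW_THRESHOLD else "needs review"
    return category, confidence, status
```

With this toy data, a description made of familiar words is categorized automatically, while one made only of unseen words (say, "office chairs") splits evenly between the categories and is routed to a person.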
Algorithm Examples, what’s inside the black box:
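- Decision trees: A simple decision tree is shown below. Each branch asks a question about an input (for example, whether a line description contains a certain word), and each leaf holds a predicted category. During training, the algorithm chooses the questions that best separate the known outputs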
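- Neural networks: use layers of interconnected nodes, each applying weights to its inputs, to predict an outcome, and update those weights after each prediction based on the learning rate. (Models that combine multiple decision trees, such as random forests, are a separate family)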
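For a peek inside the decision tree box, here is a minimal sketch of one common way a tree picks its first question: try each yes/no input and keep the one that leaves the purest groups, measured by Gini impurity. The feature names, rows, and labels are hypothetical, and real tree learners repeat this step recursively on each resulting group.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group agrees."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels, feature_names):
    """Pick the yes/no feature whose split leaves the purest child groups."""
    best_name, best_impurity = None, float("inf")
    for index, name in enumerate(feature_names):
        yes = [lab for row, lab in zip(rows, labels) if row[index]]
        no = [lab for row, lab in zip(rows, labels) if not row[index]]
        # Weight each child's impurity by its share of the rows.
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if weighted < best_impurity:
            best_name, best_impurity = name, weighted
    return best_name, best_impurity

# Hypothetical inputs: one column per yes/no question about a spend line.
feature_names = ["gl_code_is_6100", "description_mentions_laptop"]
rows = [(1, 1), (1, 0), (0, 1), (0, 0)]
labels = ["IT Hardware", "Professional Services",
          "IT Hardware", "Professional Services"]

split_name, split_impurity = best_split(rows, labels, feature_names)
```

In this toy data the GL code question leaves both groups mixed, while the laptop question separates the two categories cleanly, so it is chosen as the first branch.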
Unsupervised learning, most commonly applied in the form of cluster analysis, uses an algorithm to group data points based only on the input data itself. Unlike supervised learning, it does not require output labels in the training dataset to establish the model and does not correct itself against known output labels.
The typical clustering algorithm initializes a cluster assignment, then refines the assignment and the cluster center points based on a distance measure between data points within and around the clusters. The process repeats until the clusters no longer change. The resulting assignment reveals information about the distributions and similarities within the dataset.
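The initialize, refine, and repeat loop can be sketched with k-means in a few lines of plain Python. This is a minimal version under stated assumptions: the starting centers are passed in by hand rather than chosen randomly, distance is ordinary Euclidean distance, and the six data points are made up.

```python
import math

def k_means(points, initial_centers, max_iterations=100):
    """Alternate assignment and center updates until nothing changes."""
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # clusters no longer change
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return assignment, centers

# Hypothetical two-dimensional data points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignment, centers = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

No labels are supplied anywhere: the grouping comes entirely from the distances between the points.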
Vendor name normalization: The primary application for unsupervised learning in spend analysis is vendor name normalization, in which the variants of a vendor's name are clustered together. Large suppliers that account for a significant portion of your spend often appear under different names across your data systems. Aggregating these variants under a single name shows how much spend is going to each supplier so that you can identify your key suppliers.
[DELL FINANCIAL SERVICES, DELL MARKETING LP, DELL NV, DELLEMC, DMI DELL CORP BUS] becomes DELL
[ORACLE, ORACLE AMERICA INC, ORACLE CORPORATION, ORACLE FINANCIAL SERVICES, ORACLE USA INC] becomes ORACLE
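Algorithm Examples:
- K-means
- Jaro-Winkler distance (a string similarity measure that can serve as the distance between vendor names)
- Gaussian mixture models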
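To show how a string distance can drive vendor name clustering, here is a minimal sketch that implements Jaro-Winkler similarity and then groups names greedily. The grouping rule, the 0.80 threshold, and the use of the shortest name as each group's representative are assumptions for illustration, not a standard recipe.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Boost the Jaro score for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)

def group_vendor_names(names, threshold=0.80):
    """Greedy grouping: join the first group whose representative is close."""
    groups = {}  # representative name -> member names
    for name in sorted(names, key=len):  # shortest names become representatives
        for representative in groups:
            if jaro_winkler(name, representative) >= threshold:
                groups[representative].append(name)
                break
        else:
            groups[name] = [name]
    return groups

names = ["ORACLE AMERICA INC", "DELL NV", "ORACLE",
         "ORACLE USA INC", "DELLEMC", "ORACLE CORPORATION"]
groups = group_vendor_names(names)
```

On these names the sketch produces one Oracle group and one Dell group. Because Jaro-Winkler rewards a shared prefix, a variant such as DMI DELL CORP BUS may not land in the Dell group without extra token-level rules or human review, and choosing the final display name (DELL rather than DELL NV) is a separate step.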
Machine learning algorithms, if implemented properly, can increase the speed and accuracy of your spend analyses, while reducing the amount of human interaction necessary to produce consistent results.