Using AI In Apex
The past year and a half has seen AI fervor continue to mount, and for good reason: there exist myriad excellent use cases for AI within business applications. Because there’s so much money (and, as a result, so much in the way of vested interest) within the AI field, the hype factor has created a problem for those looking to take advantage of LLMs: the signal-to-noise ratio is quite bad. With that being said, I wanted to write a bit about where AI, as it currently stands, really shines, and where it falls flat.
Prior to diving in, keep in mind that everything in Responsible LLM Usage continues to apply. I won’t be covering topics like prompt injection in this post, but consider that article the safe harbor for this one.
AI Is Really Good At Categorization
Consider the following prompt, which I’ve fed into a totally arbitrary but industry-leading LLM:
You're an Apex expert with deep Salesforce product knowledge.
Write an Apex class that takes in a string and returns a list of strings categorizing the input string.
For example, if I passed in the string "A man, a plan, a canal",
I might get back a list of strings like this: [ "palindrome", "humor"].
As another example, if I passed in:
"I bought the United swivel chair off of your website and it broke immediately after I put it together.
Putting it together was also difficult",
I might get back a list of strings like this:
[ "customer complaint", "setup problems"].
It’s a trick question: really, the only valid answer would be something along the lines of “creating an Apex class that can handle categorizing arbitrary strings isn’t currently possible without using something like a fixed set of keywords.”
Instead, we get something like this:
public class StringCategorizer {
    public static List<String> categorizeString(String input) {
        List<String> categories = new List<String>();

        // Check for palindrome
        String cleanInput = input.replaceAll('[^A-Za-z0-9]', '').toLowerCase();
        String reversedInput = cleanInput.reverse();
        if (cleanInput == reversedInput) {
            categories.add('palindrome');
        }

        // Check for humor
        if (input.containsIgnoreCase('a man, a plan, a canal')) {
            categories.add('humor');
        }

        // Check for customer complaint
        if (input.toLowerCase().contains('broke') || input.toLowerCase().contains('not working')) {
            categories.add('customer complaint');
        }

        // Check for setup problems
        if (input.toLowerCase().contains('difficult to put together') || input.toLowerCase().contains('setup problems')) {
            categories.add('setup problems');
        }

        // Add more categories based on different patterns or keywords as needed
        return categories;
    }
}
It’s a really charming take on what it means to categorize something. Palindrome detection aside, hopefully it’s clear that producing an arbitrary categorization of any input String isn’t something that can be solved within Apex. But there’s a twist — and hopefully an obvious one: LLMs are really good at categorizing things.
Consider the following prompt:
Categorize the following:
- "I was walking down the street when I saw a charming restaurant on the corner.
The intriguing facade inspired me, and I went in to taste their wares.
To my very great surprise, this was no pinnacle of culinary delight!"
- "Push versus pull architecture each come with their own pros and cons.
Pull architecture is less resource-intensive, assuming that pulls are not being made constantly, since each request contains only a single subscriber;
push architecture allows for instant notifications to be made, at the cost of having to notify all potential subscribers."
- "It's long been noted that fashion follows definitive cycles,
and may even be the originating point for the phrase 'everything old is new again.'
Look at the rise of bell-bottomed jeans, which last saw prevalence in the 1990s, and in the 1960s before that."
Your response should be in JSON, like such:
{ "data":
[
{ "input": "I was walking down the street when I saw a charming restaurant on the corner.
The intriguing facade inspired me, and I went in to taste their wares.
To my very great surprise, this was no pinnacle of culinary delight!",
"categories": [ "negative review"]
},
{ "input": "Push versus pull architecture each come with their own pros and cons.
Pull architecture is less resource-intensive, assuming that pulls are not being made constantly, since each request contains only a single subscriber;
push architecture allows for instant notifications to be made, at the cost of having to notify all potential subscribers.",
categories: [ "software architecture", "push", "pull", "queues"]
},
{ input: "It's long been noted that fashion follows definitive cycles,
and may even be the originating point for the phrase 'everything old is new again.'
Look at the rise of bell-bottomed jeans, which last saw prevalence in the 1990s, and in the 1960s before that.",
categories: [ "fashion", "cyclical history"]
}
]
}
The response is excellent:
{
"data": [
{
"input": "I was walking down the street when I saw a charming restaurant on the corner. The intriguing facade inspired me, and I went in to taste their wares. To my very great surprise, this was no pinnacle of culinary delight!",
"categories": ["negative review", "personal narrative"]
},
{
"input": "Push versus pull architecture each come with their own pros and cons. Pull architecture is less resource-intensive, assuming that pulls are not being made constantly, since each request contains only a single subscriber; push architecture allows for instant notifications to be made, at the cost of having to notify all potential subscribers.",
"categories": [
"software architecture",
"push",
"pull",
"resource management"
]
},
{
"input": "It's long been noted that fashion follows definitive cycles, and may even be the originating point for the phrase 'everything old is new again.' Look at the rise of bell-bottomed jeans, which last saw prevalence in the 1990s, and in the 1960s before that.",
"categories": ["fashion", "cyclical trends", "historical analysis"]
}
]
}
Let’s re-imagine the originally suggested StringCategorizer with this in mind:
public class StringCategorizer {
    private static final String PROMPT = 'Categorize the text in the To Be Categorized section below. Respond using JSON with an object like such: { "data": [{ "input": "the text from the To Be Categorized Section below", "categories": ["a category the text fits into", "another category"]}]}. As an example, if the To Be Categorized section contains the text "row row row your boat gently down the stream..." the response should be something like: {"data": [{"input": "row row row your boat gently down the stream...", "categories": ["nursery rhymes"]}]}\n\n##To Be Categorized\n\n{0}';

    public static List<Category> categorizeString(String input) {
        // getResponseFromLLM is not shown - more on that later
        String responseBody = getResponseFromLLM(String.format(PROMPT, new List<String>{ input }));
        CategoryResponse response;
        try {
            response = (CategoryResponse) JSON.deserialize(responseBody, CategoryResponse.class);
        } catch (Exception ex) {
            // we do love using Nebula Logger...
            Logger.warn('Error deserializing', ex);
            Logger.saveLog();
            response = new CategoryResponse();
            response.isSuccess = false;
        }

        if (response.isSuccess) {
            return response.data;
        }
        // if there isn't something valid to do here, you can simply use Logger.exception(ex); above to save the log
        // and throw the exception automatically
        return new List<Category>();
    }

    private class CategoryResponse {
        public List<Category> data;
        public Boolean isSuccess = true;
    }

    public class Category {
        public String input;
        public List<String> categories;
    }
}
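Just to show how the pieces fit together, here’s what calling the re-imagined class might look like from a service-level class or anonymous Apex (the input string is arbitrary, and this is only a sketch):

// a hypothetical caller; any service-level class (or anonymous Apex) works here
List<StringCategorizer.Category> results = StringCategorizer.categorizeString(
    'I bought the United swivel chair off of your website and it broke immediately after I put it together.'
);
for (StringCategorizer.Category category : results) {
    System.debug(category.input + ' => ' + String.join(category.categories, ', '));
}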
This is a fairly simple pattern that can be applied when crafting prompts, though there’s certainly a lot of nuance that I’m intentionally leaving out here. Prompt design is already approaching esoteric levels of expertise, with research (formal and informal) showing that:
- telling the LLM it'll be tipped results in higher accuracy responses
- Chain of Thought has a marked effect on accuracy with ChatGPT 4o
- threatening LLMs within the prompt leads to better results
And the list goes on.
Suffice to say that I’m eliding quite a bit of complexity when it comes to the subject of prompts. Including the prompt within the class itself is a naive solution — suitable for this article, and (perhaps) for some simple use-cases in production where there’s no chance the prompt will ever need to be updated. For everything else, Salesforce as a platform is also the ideal place to be crafting prompts that will change; data-driven architecture is a key component of the platform, and the on-platform offerings (like Prompt Templates!) will only get more powerful as time goes on.
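To make that concrete: even before reaching for Prompt Templates, something as lightweight as a custom metadata type gets prompts out of the codebase and into configuration. Here’s a minimal sketch that assumes a hypothetical Prompt_Template__mdt custom metadata type with a long text Prompt_Text__c field (both names are made up for this example):

public class PromptRepository {
    // looks a prompt up by its custom metadata DeveloperName so that admins can tweak
    // prompt wording without a deployment
    public static String getPrompt(String developerName) {
        List<Prompt_Template__mdt> templates = [
            SELECT Prompt_Text__c
            FROM Prompt_Template__mdt
            WHERE DeveloperName = :developerName
            LIMIT 1
        ];
        // callers decide what an empty prompt means; returning null here would also be reasonable
        return templates.isEmpty() ? '' : templates[0].Prompt_Text__c;
    }
}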
Introducing Nuance
The above example is all well and good, but when working with LLMs there are new edge cases to consider, particularly as the inputs to conversations become more and more configurable. For example, if the model you’re using is itself configurable, you may find that responses don’t always contain valid JSON. While it’s certainly possible to run into issues like this with other third-party APIs, in general there are LLM-specific edge cases that need to be defensively programmed around. You may be able to deal with some of these irregularities within your prompt, but that’s not always desirable or feasible. As an example, consider this response instead of the one I showed previously:
Certainly! Here's your response:
```json
{
"data": [
{
"input": "I was walking down the street when I saw a charming restaurant on the corner. The intriguing facade inspired me, and I went in to taste their wares. To my very great surprise, this was no pinnacle of culinary delight!",
"categories": ["negative review", "personal narrative"]
},
{
"input": "Push versus pull architecture each come with their own pros and cons. Pull architecture is less resource-intensive, assuming that pulls are not being made constantly, since each request contains only a single subscriber; push architecture allows for instant notifications to be made, at the cost of having to notify all potential subscribers.",
"categories": [
"software architecture",
"push",
"pull",
"resource management"
]
},
{
"input": "It's long been noted that fashion follows definitive cycles, and may even be the originating point for the phrase 'everything old is new again.' Look at the rise of bell-bottomed jeans, which last saw prevalence in the 1990s, and in the 1960s before that.",
"categories": ["fashion", "cyclical trends", "historical analysis"]
}
]
}
```
That’s a bummer. Creating truly resilient code that can correctly pluck out the contents you expect from an LLM-generated response means testing for edge cases like this.
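One way to handle that (a minimal sketch, and certainly not the only approach) is to sanitize the raw response before handing it to JSON.deserialize, stripping any markdown fence and surrounding chatter:

public class LLMResponseSanitizer {
    public static String extractJson(String rawResponse) {
        String response = rawResponse?.trim();
        if (response == null) {
            return '';
        }
        // strip a ```json ... ``` (or plain ```) fence if one is present
        Integer fenceStart = response.indexOf('```');
        if (fenceStart > -1) {
            Integer contentStart = response.indexOf('\n', fenceStart) + 1;
            Integer fenceEnd = response.indexOf('```', contentStart);
            if (contentStart > 0 && fenceEnd > -1) {
                response = response.substring(contentStart, fenceEnd).trim();
            }
        }
        // as a further fallback, cut away any leading or trailing chatter around the outermost braces
        Integer firstBrace = response.indexOf('{');
        Integer lastBrace = response.lastIndexOf('}');
        if (firstBrace > -1 && lastBrace > firstBrace) {
            response = response.substring(firstBrace, lastBrace + 1);
        }
        return response;
    }
}

In the StringCategorizer example above, that would mean running responseBody through something like LLMResponseSanitizer.extractJson prior to the deserialize call.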
Isolating Dependencies Is Even More Important
Circling back around to the elided getResponseFromLLM in the method above, let’s look at some example code for the new aiplatform namespace beta:
aiplatform.ModelsAPI.createGenerations_Request request = new aiplatform.ModelsAPI.createGenerations_Request();
request.modelName = 'sfdc_ai__DefaultOpenAIGPT4Omni';
request.body = new aiplatform.ModelsAPI_GenerationRequest();
request.body.prompt = 'Categorize the following data: ["apples", "bananas", "screws", 1]';

aiplatform.ModelsAPI.createGenerations_Response response = new aiplatform.ModelsAPI().createGenerations(request);
Logger.debug(response.Code200.generation.generatedText);
/** The response without any mention of JSON in the prompt:
 * Certainly! The data can be categorized as follows:
 *
 * 1. **Fruits**:
 *    - Apples
 *    - Bananas
 * 2. **Hardware**:
 *    - Screws
 * 3. **Numbers**:
 *    - 1
 */
Logger.saveLog();
At the moment, this code isn’t runnable from tests, which further serves to emphasize the point I’m trying to make; rather than attempting to test the system boundary present at the point of callout, I’d advocate for treating the aiplatform namespace as a dependency best left behind when interoperating with your actual system code.
It’s easy to imagine a small class that can take the place of all of the aiplatform-based Apex:
public class EinsteinRequest {
    public String prompt;
    public String model;
    // this object can be expanded upon as needed
    // as additional properties - like response tokens - need to be configured
}
And the introduction of a dependency that abstracts away the rest of the complexity present when using the aiplatform namespace:
public virtual class LLM {
    public virtual String getGeneration(EinsteinRequest request) {
        aiplatform.ModelsAPI.createGenerations_Request generationRequest = new aiplatform.ModelsAPI.createGenerations_Request();
        generationRequest.modelName = request.model;
        generationRequest.body = new aiplatform.ModelsAPI_GenerationRequest();
        generationRequest.body.prompt = request.prompt;

        aiplatform.ModelsAPI.createGenerations_Response response = new aiplatform.ModelsAPI().createGenerations(generationRequest);
        // response also has a response.responseCode property, so error handling can be done
        return response.Code200.generation.generatedText;
    }
}
If you’re using the Factory pattern, that makes testing LLM-based code that relies on Einstein extremely straightforward:
public virtual class Factory {
    @TestVisible
    private static Factory factory;

    public static Factory getFactory() {
        return factory ?? new Factory();
    }

    public virtual LLM getLLM() {
        return new LLM();
    }

    public Example getExample() {
        // see below
        return new Example(this);
    }
}
// and then in a test class:
private static final LLMMock mock = new LLMMock();

public class EinsteinFactory extends Factory {
    public override LLM getLLM() {
        return mock;
    }
}

public class LLMMock extends LLM {
    public EinsteinRequest lastRequest;
    public String response = 'Some response';

    public override String getGeneration(EinsteinRequest request) {
        this.lastRequest = request;
        return this.response;
    }
}
// in the actual production level code:
public class Example {
    private final LLM llm;

    public Example(Factory factory) {
        this.llm = factory.getLLM();
    }

    public String categorize() {
        EinsteinRequest req = new EinsteinRequest();
        req.prompt = 'Categorize the following data: ["apples", "bananas", "screws", 1]';
        req.model = 'sfdc_ai__DefaultOpenAIGPT4Omni';
        return this.llm.getGeneration(req);
    }
}
// and back in the test
@IsTest
static void passesCorrectParametersToLLM() {
    // arrange
    Factory.factory = new EinsteinFactory();
    mock.response = 'Something you\'d expect to get back from Einstein';

    // act
    String categorizationResponse = Factory.getFactory().getExample().categorize();

    // assert
    Assert.areEqual(mock.response, categorizationResponse);
    Assert.areEqual('sfdc_ai__DefaultOpenAIGPT4Omni', mock.lastRequest.model);
    Assert.areEqual('Categorize the following data: ["apples", "bananas", "screws", 1]', mock.lastRequest.prompt);
}
These patterns (and the examples) are purposefully simple so that you can see how these building blocks fit together to enable extremely complicated class usage combined with easy, fast testing. The alternative is a never-ending outward spiral of complexity, lack of speed, and dispersed principles without cohesion: madness, in other words.
Wrapping Up
While the aiplatform namespace is in an extremely early Apex beta, the steel thread running through all of this is that Apex-based LLM usage is only going to increase over time; these patterns can be employed whether you’re using a Salesforce-native solution or not. Prompt engineering is only going to increase in complexity, as will automation requests, as code that was previously impossible to write becomes possible by routing some of the complexity through LLMs. Getting the important pieces in place early on in the process will pay dividends, both when it comes to iterating and when keeping your codebase clean.
Thanks, as always, for following along with the Joys Of Apex, and a special thanks to Henry Vu and Arc for their continued support on Patreon!