October 21, 2014

Dealing with Big O in Apex

By Scott Tips 0 Comments

Big O notation is probably the bane of many Computer Science undergrads’ existence. It most certainly was one of mine back in my college days. When you are young and learning programming for the first time, why would you want to waste your time on boring lectures learning about algorithm theories and optimization?

When you are a student, you wanted to get in front of a computer and start coding! You felt that it did not really matter how you went about coding up a project. As long as it compiled and provided the expected output, you marked it as a success.

I did not understand that Big O notation and those advanced algorithms were actually going to be needed until I entered the job market. They became very important to me during my technical interviews with potential employers. In a world of emerging technologies and multiple programming languages, those lectures and core concepts from your Computer Science classes are a common language we all can speak.

I remember having a technical interview at Amazon about a month before I got into Salesforce. I was in their office in Seattle and working out a programming problem on a white board in one of their conference rooms. I forget what the question was, but I was struggling to find a solution to it that did not involve using a nested for loop.

“Can you do any processing of the data before entering the for loop?” the interviewer suggested.

I thought about this some more and came up with a solution that involved creating another for loop before entering my original loop. But this did not seem right to me. The whole point was that I was trying to avoid looping through all the data a second time, right?

When I expressed my doubts to the interviewer, he asked, “What’s the Big O notation of your solution?”

Well, one loop is going to be n, the second loop is going to be n, so together they will be 2n…

“And…”

… and you drop the leading 2 from n, so it’s O(n). It’s linear!

“Right,” the interviewer said with a smile. “And linear is a perfectly acceptable solution.”

I obviously did not land that job, otherwise I would have never gotten into Salesforce in the first place. But that realization that multiple for loops still yield an acceptable solution became o core philosophy of mine when it comes to Salesforce development in Apex.

Big O represents a summary of how well your code runs when scaled up to large amount of data, possibility going to infinity. For example, when your code has a Big O of O(n) – linear – it means that your code will basically run proportionally to the size of your data. So twice as much data going into your code will basically double your run time.

So let’s look at this code:

Integer count = 0;

// first loop
for (integer x = 0; x < n; x++)
	count++;

// second loop
for (integer y = 0; y < n; y++)
	count++;

If we run this code where the size of n is 5, then count would equal 10 because it looped 10 times. Five times for the first loop and five times for the second loop. If we double the size of n to 10, it will loop 20 times. Doubling the input meant doubling the loops. Tripling the input will triple the run time.

If the code is O(n²), then we consider that a quadratic runtime. This is best shown with the nested for loop:

Integer count = 0;

// first loop
for (integer x = 0; x < n; x++){

	// second loop
	for (integer y = 0; y < n; y++)
		count++;

}

In this example, when n is 5, it will loop around 25 times. If you set it to 10, then it loops around 100 times! Twice as much input data will quadruple your run time, and three times the data will yield nine times the run time!

Because of its unwieldy nature, O(n²) is a big no-no in Computer Science and should be avoided at all costs. This is especially true in Salesforce because you could very well be handling large amounts of data in your org and you will go over your rate limits.

But having a series of non-nested for loops? Go nuts!

In fact, I believe that having a series of non-nested for loops is one of the core strategies for handling most limits in Salesforce. Whenever I have to make a SOQL query, I often find myself making a for loop to gather IDs and values, then making the query, then making another for loop to process the results of the query.

In this example trigger on Contact, I gather all the Account IDs for the Contacts, query for Account names, then assigned the reverse of the Account name to a custom field on those Contacts.

trigger reverseAccountNames on Contact (before insert) {
	// first loop, gather Account Ids
	List<String> accountIds = new List<String>();
	for (Contact c : Trigger.new){
		accountIds.add(c.AccountID);
	}
	
	// query account names
	Map<Id, Account> accountMap = new Map<Id,Account>([SELECT Id, Name FROM Account WHERE Id =: accountIds]);
	
	// second loop, reverse and assign name
	for (Contact c: Trigger.new){
		Account a =  accountMap.get(c.AccountId);
		String reverseName = a.Name.reverse();
		c.Reverse_Account__c = reverseName;
	}
	
}

It might seem a little silly to repeat the exact same for loop criteria one after the other like this, but if you were to merge them together with the SOQL query inside of the loop, you’ll run into much bigger problems with the SOQL query limits.

And when you evaluate the Big O value of this trigger, it will still be O(n) no matter how many loops you have.

And anything with O(n) is a perfectly acceptable solution!

About Author

Scott

Scott is a Certified Advanced Developer with 10 years of Salesforce experience. Scott has helped companies of all sizes run their business better by designing customized business applications on the Salesforce platform, creating and maintaining apps sold on the Salesforce AppExchange, and coordinating data flow between Salesforce and external systems by using API’s and third-party integration tools.