My Favorite Data Structure – The Map of Lists
Last week, I wrote about Big O Notation in Apex and how having a O(n) was a perfectly acceptable solution for any Apex problem. However, trying to avoid having to go through nested for loops is often a tricky proposition when you have to process so much data in a single SOQL query.
Probably much to the displeasure of my co-workers, I like to utilize those series of for loops to build up complicated data structures. These structures organize the data in a way that is easy to access later on in my code. And the one data structure that I use the most often is the map of lists.
The map of lists comes up whenever I need to have a list of records “grouped by” a particular value. This is very common in Salesforce as it matches up the parent-child data model created by lookup relationships. By establishing a map of lists, you are able to organize a list of child records by their parent records.
I will throw out the hypothetical example of rolling up the last names of all Contacts for an Account into a custom field on Account. We want to trigger to fire off on Contact insert, update, or delete. How would we be able to access the Name field on the Account object that corresponds to the AccountID on Contact?
First, we will explore the incorrect route: the nested for loop.
trigger RollUpLastNames on Contact (after insert, after update, after delete) { // Gather Account IDs Set<ID> accntIds = new Set<ID>(); for (Contact c : Trigger.new) accntIds.add(c.AccountId); // Query for all Contacts in those Accounts List<Contact> allContacts = [SELECT LastName, AccountID FROM Contact WHERE AccountID =: accntIds]; // initialize list of Accounts with blank values in custom field List<Account> allAccounts = new List<Account>(); for (ID accntId : accntIds){ Account a = new Account(Id = accntId); a.LastName__c = ''; allAccounts.add(a); } // search and append to custom field if it is Contact's Account for (Contact c: allContacts){ for (Account a: allAccounts){ if (a.Id == c.AccountID){ if (a.LastName__c != '') a.LastName__c += ', '; a.LastName__c += c.LastName; } } } // update Accounts update allAccounts; }
Note that each Contact can only belong to one Account, however, we have to search through every Account just to find the correct one. If we assume that the sample size of Accounts is n and the sample size of Contacts is also n, then this evaluates to O(n²).
Now we will look at how we use the map of lists to avoid the quadratic runtime. The basic procedure for creating the map of lists goes like this:
- Initialize a map of lists variable.
- Loop through your data set
- Retrieve the list from your map with the key on the record.
- If list does not exist, initialize a new list and put it into map by key.
- Put record into list.
So now we will rewrite our code to define lists of Contacts grouped by their AccountIDs and then use this map to roll up the names.
trigger RollUpLastNames on Contact (after insert, after update, after delete) { // Gather Account IDs Set<ID> accntIds = new Set<ID>(); for (Contact c : Trigger.new) accntIds.add(c.AccountId); // Query for all Contacts in those Accounts List<Contact> allContacts = [SELECT LastName, AccountID FROM Contact WHERE AccountID =: accntIds]; // group each Contact by AccountID Map<ID, List<Contact>> contactsByAccount = new Map<ID, List<Contact>>(); for (Contact c : allContacts){ // get Contact List inside of map List<Contact> contactList = contactsByAccount.get(c.AccountId); // initialize List if not created yet if (contactList == null){ contactList = new List<Contact>(); contactsByAccount.put(c.AccountId, contactList); } // add Contact to List contactList.add(c); } // initialize list of Accounts to update List<Account> allAccounts = new List<Account>(); // loop through map for (ID key : contactsByAccount.keySet()){ // init Account Account a = new Account(Id = key); a.LastName__c = ''; allAccounts.add(a); // go through Contact list for this Account for (Contact c : contactsByAccount.get(key)){ if (a.LastName__c != '') a.LastName__c += ', '; a.LastName__c += c.LastName; } } // update Accounts update allAccounts; }
Yes, there is a nested for loop in this example as well, but does this code also going to evaluate to O(n²)?
The answer is no. In the first for loop, we divided our Contacts into groups based on AccountID, so each Contact will only belong to one group. If we split 5 Contacts into one group of 3 Contacts and another group of 2, we still only have 5 Contacts in total.
To evaluate, the outer loop consists of just the Accounts, so its runtime is O(a). The inner loop is Contacts divided into Accounts, so it is O(c/a). When you multiply (a * (c/a)), you get just O(c), which is linear.
This is just a basic example, but it illustrates how the map of lists can be used to avoid the O(n²) runtime. So when you run into a complicated problem, try to play around with complicated data structures to help you along.