Large Data Volume

Best Practices for Deployments with Large Data Volumes


Introduction

A “large data volume” is an imprecise, elastic term. If your deployment has tens of thousands of users, tens of millions of records, or hundreds of gigabytes of total record storage, you have a large data volume.

Salesforce enables customers to easily scale up their applications from small to large amounts of data. This scaling usually happens automatically, but as data sets get larger, the time required for certain operations grows. The ways in which architects design and configure data structures and operations can increase or decrease those operation times by several orders of magnitude.

What techniques do we need to consider?

  • Techniques for improving the performance of applications with large data volumes
  • Salesforce mechanisms and implementations that affect performance in less-than-obvious ways
  • Salesforce mechanisms designed to support the performance of systems with large data volumes

Salesforce Big Objects

A big object stores and manages massive amounts of data on the Salesforce platform. You can archive data from other objects or bring massive datasets from outside systems into a big object to get a full view of your customers. A big object provides consistent performance, whether you have 1 million records, 100 million, or even 1 billion. This scale gives a big object its power and defines its features.

This post focuses on optimizing large data volumes stored in standard and custom objects, not big objects. For optimal performance and a sustainable long-term storage solution for even larger data sets, use Bulk API or Batch Apex to move your data into big objects.

These big object behaviors ensure a consistent and scalable experience.

  • Big objects support only object and field permissions, not regular or standard sharing rules.
  • Features like triggers, flows, processes, and the Salesforce mobile app are not supported on big objects.
  • When you insert an identical big object record with the same representation multiple times, only a single record is created, so writes can be idempotent. This behavior differs from a standard sObject, where each insert request creates a new record (see the sketch below).
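
As a minimal sketch of that idempotent behavior, assuming a hypothetical custom big object Customer_Interaction__b whose index is defined on Account__c and Interaction_Date__c (all names here are illustrations, not real schema):

Id accountId = [SELECT Id FROM Account LIMIT 1].Id;

Customer_Interaction__b interaction = new Customer_Interaction__b();
interaction.Account__c = accountId;               // assumed index field
interaction.Interaction_Date__c = Datetime.now(); // assumed index field

// insertImmediate writes big object records outside the current transaction;
// inserting the same index representation again does not create a duplicate
Database.insertImmediate(interaction);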

Infrastructure for Systems with Large Data Volumes

  1. Lightning Platform Query Optimizer

Creating Efficient Queries

  • Determine the Selectivity of Your Filter Condition
  • Use SOQL to Determine the Selectivity of a Filter Condition
  • Determine the Selectivity of More Complex Filter Conditions
  • Understand the Impact of Deleted Records on Selectivity

https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_infrastructure_salesforce_query_optimizer.htm
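
As a minimal sketch of the "Use SOQL to Determine the Selectivity of a Filter Condition" technique (the Status__c field and its 'Active' value are assumptions for illustration), compare how many records a candidate filter targets against the total:

Integer total = [SELECT COUNT() FROM Account];
Integer matching = [SELECT COUNT() FROM Account WHERE Status__c = 'Active'];

// The higher the ratio of matching to total, the less likely the filter
// is selective enough for the query optimizer to use an index
System.debug(matching + ' of ' + total + ' records match the filter');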

2. Database Statistics

Modern databases gather statistics on the amount and types of data stored inside of them, and they use this information to execute queries efficiently. Because of Salesforce’s multitenant approach to software architecture, the platform must keep its own set of statistical information to help the database understand the best way to access the data. As a result, when large amounts of data are created, updated, or deleted using the API, the database must gather statistics before the application can efficiently access data. Currently, this statistics-gathering process runs on a nightly basis.

3. Skinny Tables

https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_infrastructure_skinny_tables.htm

4. Indexes

https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_infrastructure_indexes.htm

5. Divisions

Divisions are a means of partitioning the data of large deployments to reduce the number of records returned by queries and reports.
For example, a deployment with many customer records might create divisions called US, EMEA, and APAC to separate the customers into smaller groups that are likely to have few interrelationships.

Salesforce provides special support for partitioning data by divisions, which you can enable by contacting Salesforce Customer Support.

Techniques for Optimizing Performance

  1. Using Mashups

One approach to reducing the amount of data in Salesforce is to maintain large data sets in a different application, and then make that application available to Salesforce as needed.
Salesforce refers to such an arrangement as a mashup because it provides a quick, loosely coupled integration of the two applications. Mashups use the Salesforce presentation layer to display both Salesforce-hosted and externally hosted data. Salesforce supports the following mashup designs.

External Website
The Salesforce UI displays an external website and passes information and requests to it. With this design, you can make the website look like part of the Salesforce UI.

Callouts
Apex code allows Salesforce to use Web services to exchange information with external systems in real time, as in the sketch below.
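
A minimal callout sketch, assuming a named credential called Order_System and a hypothetical /orders endpoint (neither is part of any real API):

public with sharing class OrderHistoryService {
    // Fetch order history for an account from an external system in real time
    public static String getOrderHistory(String accountNumber) {
        HttpRequest req = new HttpRequest();
        // 'callout:' resolves the named credential's URL and authentication
        req.setEndpoint('callout:Order_System/orders?account='
            + EncodingUtil.urlEncode(accountNumber, 'UTF-8'));
        req.setMethod('GET');
        HttpResponse res = new Http().send(req);
        return res.getBody();
    }
}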

2. Defer Sharing Calculation

Salesforce provides a feature called defer sharing calculation, which lets an administrator defer the processing of sharing rules until after new users, rules, and other content have been loaded.

An organization’s administrator can use the defer sharing calculation permission to suspend and resume sharing calculations, and to manage two processes: group membership calculation and sharing rule calculation. The administrator can suspend these calculations when performing a large number of configuration changes, which might otherwise lead to very long sharing rule evaluations or timeouts, and resume calculations during an organization’s maintenance period. This deferral lets users make a large number of sharing-related configuration changes quickly during working hours, and then lets the recalculation process run overnight between business days or over a weekend.

3. Using SOQL and SOSL

A query to retrieve an account by its foreign key account number can look like this:

// Query by the external ID field, then check whether a row came back
List<Account> accts = [SELECT Name FROM Account WHERE Account_ID__c = :acctid];

if (accts.isEmpty()) {
    return 'Not Found';
}

If acctid is null, the entire Account table is scanned row by row until all data is examined.

It’s better to rewrite the code as:

if (acctid != null) {
    List<Account> accts = [SELECT Name FROM Account WHERE Account_ID__c = :acctid];
    // work with accts here
} else {
    return 'Not Found';
}

4. Deleting Data

Salesforce’s data deletion mechanism can have a profound effect on the performance of large data volumes. Salesforce uses a Recycle Bin metaphor for data that users delete. Instead of removing the data, Salesforce flags the data as deleted and makes it visible through the Recycle Bin. This process is called soft deletion. While the data is soft deleted, it still affects database performance because it is still resident, and deleted records have to be excluded from any queries.

Bulk API and Bulk API 2.0 support a hard delete option, which allows records to bypass the Recycle Bin and become immediately eligible for deletion. We recommend that you use the Bulk API 2.0 hard delete function to delete large data volumes, as sketched below.
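
As a sketch, a Bulk API 2.0 hard-delete job is created with a REST request like the one below (the API version is an example, and the caller needs the Bulk API Hard Delete user permission). You then upload a CSV of record IDs to the job and close it to start processing:

POST /services/data/v59.0/jobs/ingest
Content-Type: application/json
Authorization: Bearer <access token>

{
  "object" : "Account",
  "operation" : "hardDelete",
  "contentType" : "CSV",
  "lineEnding" : "LF"
}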

Best Practices

  1. Reporting

Improve report performance by:

  • Reducing the number of joins
  • Reducing the amount of data returned
  • Reducing the number of records to query

2. Loading Data from the API

Goals and their best practices:

  • Reducing data to transfer and process: when updating, send only fields that have changed (delta-only loads).
  • Avoiding unnecessary overhead: for custom integrations, authenticate once per load, not on each record.
  • Avoiding computations: use Public Read/Write security during the initial load to avoid sharing calculation overhead.
  • Reducing computations: if possible for initial loads, populate roles before populating sharing rules.
    1. Load users into roles.
    2. Load record data with owners, triggering calculations in the role hierarchy.
    3. Configure public groups and queues, and let those computations propagate.
    4. Add sharing rules one at a time, letting computations for each rule finish before adding the next one.
    If possible, add people and data before creating and assigning groups and queues.
    1. Load the new users and new record data.
    2. Optionally, load new public groups and queues.
    3. Add sharing rules one at a time, letting computations for each rule finish before adding the next one.
  • Deferring computations and speeding up load throughput: disable Apex triggers, workflow rules, and validations during loads; investigate using Batch Apex to process records after the load is complete (see the sketch after this list).
  • Minimizing parent record-locking conflicts: when changing child records, group records that share the same ParentId in the same batch to minimize locking conflicts.
  • Deferring sharing calculations: use the defer sharing calculation permission to defer sharing calculations until after all data has been loaded.
  • Avoiding loading data into Salesforce: use mashups to create loosely coupled integrations of applications.
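
As a minimal sketch of deferring computations to Batch Apex after a load (the Region__c field and the backfill rule are assumptions for illustration):

public class PostLoadRegionBackfill implements Database.Batchable<SObject> {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Select only the records the post-load processing still needs
        return Database.getQueryLocator(
            'SELECT Id, BillingCountry FROM Account WHERE Region__c = null');
    }

    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account a : scope) {
            a.Region__c = (a.BillingCountry == 'US') ? 'AMER' : 'Other';
        }
        update scope;
    }

    public void finish(Database.BatchableContext bc) {}
}

// Kick off the batch in chunks of 200 records:
// Database.executeBatch(new PostLoadRegionBackfill(), 200);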

3. SOQL and SOSL

  • Avoid querying on formula fields, which are computed in real time
  • Use the most appropriate language, SOQL or SOSL, for a given search
  • Avoid null values in WHERE filters for picklists or foreign key fields, which force a full scan instead of an index lookup
  • Build efficient SOQL and SOSL queries (see the sketch below)
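
A minimal sketch of choosing the right language for the search at hand (the objects are standard; the filter value and search term are illustrative). SOQL suits structured queries against known objects and filters, while SOSL suits text searches across many objects and fields:

// SOQL: structured query against one object with a selective filter
List<Account> bankingAccounts =
    [SELECT Id, Name FROM Account WHERE Industry = 'Banking'];

// SOSL: text search across multiple objects at once
List<List<SObject>> results =
    [FIND 'Acme*' IN NAME FIELDS RETURNING Account(Id, Name), Contact(Id, LastName)];
List<Account> foundAccounts = (List<Account>) results[0];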
