IBM i Architecture · Modern Integration

Managing Lock Contention on IBM i in Modern Architectures

Last Updated: March 2026

IBM i has always excelled at data integrity. Its record-level locking and commitment control mechanisms are battle-tested and reliable. But as IBM i environments evolve to support REST APIs, cloud integrations, and AI-driven analytics, the assumptions those locking mechanisms were built on no longer hold.

This article explains why modern architectures create new locking challenges on IBM i — and how to solve them with practical, proven patterns.

🔍 How IBM i Locking Was Designed to Work

To understand the problem, it helps to look at the original design assumptions.

Classic IBM i environments were built for a small number of interactive users — typically 20 to 100 — with predictable, human-paced transactions. A user would initiate a task, the system would process it in seconds, and locks would be held for milliseconds. Transaction boundaries were clear, network latency was minimal, and commitment control was managed explicitly by the application.

In that world, a 50ms lock duration was never a problem.


⚡ What Has Changed

Modern IBM i systems operate under very different conditions.

Where you once had 50 concurrent users, you now have hundreds or thousands of concurrent API requests. Where transactions once took seconds, machine-driven processes now complete in milliseconds — or time out trying. And where the path from application to database was once direct, today it runs through API gateways, load balancers, service meshes, and caching layers, each adding latency.

The core problem: A 50ms lock that was perfectly acceptable in a traditional environment becomes a serious bottleneck when extended to 200ms by network latency, multiplied across thousands of simultaneous requests.
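The arithmetic behind this is worth making explicit. By Little's law, the average number of locks in flight equals the request arrival rate multiplied by the average lock hold time. The sketch below uses illustrative numbers, not measurements from any particular system.

```javascript
// Little's law: average number of locks held concurrently
// = arrival rate (requests/second) x average lock hold time (seconds).
function concurrentLocks(requestsPerSecond, lockHoldMs) {
    return requestsPerSecond * (lockHoldMs / 1000);
}

// Traditional environment: modest request rate, 50ms lock holds.
console.log(concurrentLocks(50, 50));    // 2.5 locks in flight on average

// Modern environment: 2,000 req/s, latency stretches holds to 200ms.
console.log(concurrentLocks(2000, 200)); // 400 locks in flight on average
```

The same table that once saw a handful of simultaneous locks can suddenly see hundreds, which is why contention appears even though no individual transaction got slower.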

πŸ›‘οΈ The Service Layer: Your First Line of Defense

Many IBM i modernization projects start by exposing database tables directly through APIs. This seems simple, but it creates real problems β€” no centralized transaction control, inconsistent lock patterns across endpoints, and business logic scattered across multiple layers.

A better approach introduces a service layer between your API and your tables:

API Layer
↓
Service Program / SQL Stored Procedure
↓
DB2 for i Tables

📌 Key principle: The lock is acquired, used, and released inside the procedure. The API layer never holds a lock while doing other work.

Example: SQL Stored Procedure

The following procedure handles an inventory lookup. It acquires and releases its lock entirely within procedure execution.

SQL — GET_INVENTORY Stored Procedure
CREATE PROCEDURE GET_INVENTORY(
    IN P_ITEM_ID CHAR(10),
    OUT P_QTY_ON_HAND INT,
    OUT P_LAST_UPDATE TIMESTAMP,
    OUT P_STATUS_CODE INT
)
LANGUAGE SQL
SPECIFIC GET_INV_001
BEGIN
    DECLARE SQLCODE INT DEFAULT 0;

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

    SELECT QTY_ON_HAND, LAST_UPDATE
    INTO P_QTY_ON_HAND, P_LAST_UPDATE
    FROM INVENTORY
    WHERE ITEM_ID = P_ITEM_ID;

    IF SQLCODE = 0 THEN
        SET P_STATUS_CODE = 0;
    ELSE
        SET P_STATUS_CODE = SQLCODE;
    END IF;

    COMMIT;
END;

GRANT EXECUTE ON PROCEDURE GET_INVENTORY TO APIUSER;

Example: RPG Service Program

The same principle applies in RPG. Note the immediate COMMIT after the read — this is what keeps lock duration short.

RPGLE — GetInventory Service Program
**FREE
DCL-PROC GetInventory EXPORT;
    DCL-PI *N INT(10);
        itemId CHAR(10) CONST;
        qtyOnHand INT(10);
        lastUpdate TIMESTAMP;
    END-PI;

    // INVENTORY is assumed to be declared at module level, e.g.:
    //   DCL-F INVENTORY USAGE(*UPDATE) KEYED COMMIT;

    DCL-S returnCode INT(10) INZ(0);

    IF NOT %OPEN(INVENTORY);
        OPEN INVENTORY;
    ENDIF;

    CHAIN (itemId) INVENTORY;

    IF %FOUND(INVENTORY);
        qtyOnHand = INV_QTY;
        lastUpdate = INV_LUPDT;
        COMMIT;              // Release lock immediately after read
        returnCode = 0;
    ELSE;
        returnCode = -1;
    ENDIF;

    RETURN returnCode;
END-PROC;

Example: Calling the Service from Node.js

JavaScript — Node.js API calling the service procedure
app.get('/inventory/:itemId', async (req, res) => {
    let connection;
    try {
        connection = await ibmiJdbcPool.reserve();

        const result = await connection.conn.prepareCall(
            '{CALL GET_INVENTORY(?, ?, ?, ?)}'
        );

        result.setString(1, req.params.itemId);
        result.registerOutParameter(2, java.sql.Types.INTEGER);
        result.registerOutParameter(3, java.sql.Types.TIMESTAMP);
        result.registerOutParameter(4, java.sql.Types.INTEGER);

        await result.execute();

        const statusCode = result.getInt(4);

        if (statusCode === 0) {
            res.json({
                itemId: req.params.itemId,
                qtyOnHand: result.getInt(2),
                lastUpdate: result.getTimestamp(3)
            });
        } else {
            res.status(404).json({ error: 'Item not found' });
        }
    } catch (error) {
        res.status(500).json({ error: error.message });
    } finally {
        if (connection) await ibmiJdbcPool.release(connection);
    }
});

🚨 Challenge 1: API-Driven Lock Amplification

Exposing IBM i data through REST APIs extends lock duration in ways that are easy to overlook. Three factors are usually responsible:

  1. Network latency — A traditional on-premise app might reach IBM i in 5ms. A cloud-hosted API server may take 150ms or more. Every millisecond of that round trip is time the lock is held.
  2. Connection pooling — When connections are held open longer than necessary, the locks associated with them are also held longer. Pool exhaustion causes cascading delays as requests queue up.
  3. Retry logic — Automatic retries on failure seem helpful, but during periods of contention they amplify the problem. Every retry is another lock acquisition attempt.
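If retries are unavoidable, they should at least back off and spread out rather than arriving in lockstep. The following is a minimal sketch of capped exponential backoff with full jitter; `withBackoff` and its default values are illustrative, not taken from any particular library.

```javascript
// Sketch: capped exponential backoff with full jitter, so retries
// spread out over time instead of hammering a contended record in
// lockstep. `operation` is any async function, e.g. a database call.
async function withBackoff(operation, { maxRetries = 3, baseMs = 50, capMs = 1000 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await operation();
        } catch (err) {
            if (attempt >= maxRetries) throw err; // budget exhausted; surface the error
            const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
            const delayMs = Math.random() * ceiling; // full jitter
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
}
```

Pair this with a check that the error is genuinely transient (a lock timeout, not a validation failure), so permanent errors surface immediately instead of consuming the retry budget.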

🔴 A Real-World Example

An e-commerce company exposed inventory data through a REST API. The implementation looked reasonable on the surface — but had one critical flaw: the database connection and the lock were held open for the entire request, including an external pricing API call.

❌ Problematic — lock held for entire request duration
app.get('/inventory/:itemId', async (req, res) => {
    const connection = await ibmiJdbcPool.getConnection();

    const result = await connection.query(
        'SELECT * FROM INVENTORY WHERE ITEM_ID = ?',
        [req.params.itemId]
    );

    // Lock still held during data enrichment
    const enrichedData = await enrichInventoryData(result);

    // Lock still held during this external API call
    const pricing = await externalPricingService.getPrice(result.itemId);

    connection.release();  // Finally released — too late

    res.json({ ...enrichedData, pricing });
});
Result: At 500 concurrent users, 30% of requests were timing out. The lock was held for the full 200ms+ round trip to an external pricing service.

🟢 The Fix

Release the connection as soon as you have the data. Do all subsequent processing without holding the lock.

✅ Optimized — connection released immediately after data retrieval
app.get('/inventory/:itemId', async (req, res) => {
    let inventoryData;

    // Scope 1: acquire, query, commit, release
    {
        const connection = await ibmiJdbcPool.getConnection();
        try {
            inventoryData = await connection.query(
                'SELECT ITEM_ID, QTY_ON_HAND, LAST_UPDATE FROM INVENTORY WHERE ITEM_ID = ?',
                [req.params.itemId]
            );
            await connection.commit();
        } finally {
            connection.release(); // Back to pool after ~50ms
        }
    }

    // Scope 2: all processing happens without any lock held
    const enrichedData = await enrichInventoryData(inventoryData);
    const pricing = await externalPricingService.getPrice(inventoryData.itemId);

    res.json({ ...enrichedData, pricing });
});
After this single change, the connection returns to the pool in roughly 50ms instead of being held for the full 200ms+ external round trip, and the enrichment and pricing calls run without any lock held.

Additional Best Practices

Use separate connection pools for reads and writes, with isolation levels appropriate to each workload:

JavaScript — Separate read/write connection pools
// Read pool — no record locks acquired
const readPool = createPool({
    url: 'jdbc:as400://ibmi.company.com',
    drivername: 'com.ibm.as400.access.AS400JDBCDriver',
    properties: {
        'transaction isolation': 'read uncommitted',
        'libraries': 'PRODLIB'
    }
});

// Write pool — standard record locking
const writePool = createPool({
    url: 'jdbc:as400://ibmi.company.com',
    drivername: 'com.ibm.as400.access.AS400JDBCDriver',
    properties: {
        'transaction isolation': 'read committed',
        'libraries': 'PRODLIB'
    }
});

For frequently accessed, slowly changing data, add a caching layer to avoid hitting the database at all:

JavaScript — Redis caching layer
app.get('/inventory/:itemId', async (req, res) => {
    // Check cache first — no database hit at all
    const cached = await cache.get(`inventory:${req.params.itemId}`);
    if (cached) return res.json(JSON.parse(cached));

    // Cache miss — query database
    const data = await getInventoryFromDB(req.params.itemId);

    // Store for 60 seconds
    await cache.setex(`inventory:${req.params.itemId}`, 60, JSON.stringify(data));

    res.json(data);
});
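One caveat with the 60-second TTL above: a write can leave readers serving stale quantities until the entry expires. A common complement to cache-aside reads is explicit invalidation on the write path. The sketch below uses minimal stand-in `db` and `cache` objects rather than real clients, with the same key format as the read-side example.

```javascript
// Sketch: explicit cache invalidation on the write path. `db` and
// `cache` are minimal stand-ins for the real database and Redis
// clients; the key format matches the read-side example.
async function updateInventory(db, cache, itemId, changes) {
    await db.update(itemId, changes);

    // Delete rather than overwrite: the next reader repopulates the
    // cache from the authoritative row, avoiding update races between
    // concurrent writers.
    await cache.del(`inventory:${itemId}`);
}
```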

☁️ Challenge 2: Cloud Integration Latency

When your application moves to the cloud, the distance between your app server and IBM i grows significantly.

Deployment                  Latency to IBM i    Lock Duration Impact
On-premise                  5ms                 Minimal
Cloud (same region)         50ms                Moderate
Cloud (different region)    200ms               Severe at scale

✅ The Solution: Event-Driven Architecture

For write operations, replace synchronous calls with a queue-based approach. The API publishes an event and returns immediately. A separate worker handles all IBM i database work independently.

❌ Synchronous — lock held for full cloud round trip
app.post('/orders', async (req, res) => {
    const connection = await ibmiJdbcPool.getConnection();
    await connection.query('INSERT INTO ORDERS ...');
    await connection.query('UPDATE INVENTORY ...');
    await connection.commit(); // Lock held for 200ms
    res.json({ success: true });
});
✅ Event-driven — no lock held during API response
// API returns immediately — no lock held
app.post('/orders', async (req, res) => {
    await messageQueue.publish('order.created', req.body);
    res.json({ orderId: generateId(), status: 'processing' });
});

// Worker handles IBM i writes independently
messageQueue.subscribe('order.created', async (order) => {
    const connection = await ibmiJdbcPool.getConnection();
    try {
        await connection.query('INSERT INTO ORDERS ...');
        await connection.query('UPDATE INVENTORY ...');
        await connection.commit(); // Lock held only during actual DB work
    } finally {
        connection.release();
    }
});
Benefits: the API responds in milliseconds regardless of the distance to IBM i, locks are held only during the worker's actual database work, and bursts of traffic queue up in the message broker instead of piling onto the database.
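One detail worth calling out: most message brokers deliver at least once, so the worker may receive the same order.created event twice. The handler should therefore be idempotent. A sketch, using the unique order ID as a deduplication key and a stand-in `db` object in place of the real IBM i connection:

```javascript
// Sketch: an idempotent worker handler. Brokers typically deliver
// at least once, so the same event may arrive twice; the unique
// order ID acts as the deduplication key. `db` is a stand-in for
// the real IBM i connection.
async function handleOrderCreated(db, order) {
    if (await db.orderExists(order.orderId)) {
        return 'duplicate-skipped'; // already processed; acknowledge and move on
    }
    await db.insertOrder(order);
    await db.adjustInventory(order.itemId, -order.qty);
    return 'processed';
}
```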

🤖 Challenge 3: AI and Analytics Lock Contention

AI training jobs and analytical queries often scan millions of rows. On IBM i, those scans hold shared locks — and while those locks are held, OLTP updates have to wait. A single long-running query can disrupt production operations for minutes.

❌ Problematic — scans millions of rows, holds shared locks for minutes
SELECT
    o.order_date,
    o.order_amount,
    c.customer_segment,
    p.product_category
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - 2 YEARS;

✅ Solution 1: Use READ UNCOMMITTED for Analytics

For queries that do not require perfect precision, READ UNCOMMITTED eliminates shared locks entirely. The query reads what is on disk without waiting for or acquiring locks — with zero impact on concurrent OLTP operations.

✅ SQL — READ UNCOMMITTED eliminates lock contention
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT
    order_date,
    SUM(order_amount) AS total_sales
FROM orders
WHERE order_date >= CURRENT_DATE - 90 DAYS
GROUP BY order_date;

-- No record locks acquired. No impact on OLTP.

✅ Solution 2: Replicate Data for Analytics

For workloads that need accurate data, replicate your production tables to a separate analytical library using IBM i journaling. AI and analytics queries run against the replica — production is completely untouched.

✅ CL + SQL — Journaling and analytical replica setup
-- Start journaling on the production table (STRJRNPF is a CL command, not SQL):
--   STRJRNPF FILE(PRODUCTION/ORDERS) JRN(PRODUCTION/ORDJRN)

-- All AI and analytics queries run against the replica;
-- production locks are never affected.
SELECT * FROM ANALYTICS.ORDERS WHERE ...;

πŸ—ΊοΈ Implementation Roadmap

Phase 1 Β· Weeks 1–2
Assessment

Identify lock hotspots using QSYS2.OBJECT_LOCK_INFO. Document every integration point: APIs, cloud connections, analytics queries, and microservice dependencies.

Phase 2 · Weeks 3–4
Quick Wins

Add caching to frequently accessed APIs. Apply READ UNCOMMITTED to analytical queries. Tighten connection pooling. Add circuit breakers to external service calls.
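The circuit breakers mentioned above can be minimal. The sketch below (names and thresholds are illustrative, not from a specific library) fails fast after a run of consecutive failures, instead of letting requests, and the connections and locks behind them, stack up behind a failing dependency:

```javascript
// Minimal circuit breaker sketch: after `threshold` consecutive
// failures, fail fast for `cooldownMs` instead of letting requests
// pile up behind a failing dependency.
function createBreaker(operation, { threshold = 5, cooldownMs = 30000 } = {}) {
    let failures = 0;
    let openedAt = 0;
    return async function call(...args) {
        if (failures >= threshold && Date.now() - openedAt < cooldownMs) {
            throw new Error('circuit open: failing fast');
        }
        try {
            const result = await operation(...args);
            failures = 0; // any success closes the circuit again
            return result;
        } catch (err) {
            failures++;
            if (failures >= threshold) openedAt = Date.now();
            throw err;
        }
    };
}
```

Wrapping the external pricing call this way keeps a pricing outage from consuming every pooled database connection.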

Phase 3 · Months 2–3
Architectural Changes

Introduce service programs or stored procedures as your API service layer. Migrate high-latency write paths to event-driven patterns. Set up replicated libraries for analytical workloads.

Phase 4 · Ongoing
Continuous Optimization

Schedule regular lock contention reviews. Add performance testing to your integration pipeline. Keep your team trained on modern IBM i patterns as your architecture evolves.

Identifying Lock Hotspots

SQL — Lock hotspot identification query
SELECT
    OBJECT_NAME,
    OBJECT_LIBRARY,
    COUNT(*) AS lock_count,
    AVG(LOCK_DURATION) AS avg_duration_ms,
    MAX(LOCK_DURATION) AS max_duration_ms
FROM QSYS2.OBJECT_LOCK_INFO
WHERE LOCK_STATE = 'HELD'
GROUP BY OBJECT_NAME, OBJECT_LIBRARY
ORDER BY lock_count DESC;

📊 Measuring Success

Track these metrics before and after each change. The numbers below represent what a well-executed modernization effort typically achieves.

Metric                      Before       After
Lock contention rate        30%          <5%
Average API response time   250ms        100ms
Timeout rate                5%           <1%
Concurrent capacity         500 req/s    2,000 req/s

SQL — Daily lock contention monitoring report
SELECT
    DATE(CURRENT_TIMESTAMP) AS report_date,
    COUNT(*) AS total_locks,
    SUM(CASE WHEN LOCK_STATE = 'WAIT' THEN 1 ELSE 0 END) AS waiting_locks,
    AVG(CASE WHEN LOCK_STATE = 'HELD'
        THEN TIMESTAMPDIFF(2, CHAR(CURRENT_TIMESTAMP - LOCK_REQUEST_TIME))
        ELSE 0 END) AS avg_hold_time_seconds
FROM QSYS2.OBJECT_LOCK_INFO
GROUP BY DATE(CURRENT_TIMESTAMP);

💡 Conclusion

Modernizing IBM i does not mean working around its locking mechanisms. It means designing your architecture to work with them.

The principles are consistent across every pattern in this article: acquire locks as late as possible and release them as early as possible, never hold a lock across network calls or enrichment work, decouple high-latency writes with queues, and move long-running analytics off the production tables.

IBM i's strengths — reliability, security, and data integrity — remain fully intact. What changes is how you connect the rest of your architecture to take advantage of them.