Quantcast
Channel: Jared Still – Official Pythian Blog
Viewing all articles
Browse latest Browse all 34

Enqueue Bytes – Is that a Pun?

$
0
0

Sometimes it is necessary to put on your uber-geek hat and start using cryptic bits of code to retrieve information from an Oracle database. Troubleshooting enqueue locking events in Oracle databases is one of the times some advanced SQL may be necessary.

Likely you have used SQL similar to the following when troubleshooting Oracle enqueue’s, probably in connection with row lock contention.

SQL# l
  1  SELECT
  2     s.username username,
  3     s.sid,
  4     e.event event,
  5     e.p1text,
  6     e.p1,
  7     e.state
  8  FROM v$session s, v$session_wait e
  9  WHERE s.username IS NOT NULL
 10     AND s.sid = e.sid
 11     AND e.event LIKE '%enq:%'
 12* ORDER BY s.username, UPPER(e.event)
 
USERNAME    SID EVENT                          P1TEXT                P1 STATE
---------- ---- ------------------------------ ---------- ------------- ----------
JKSTILL      68 enq: TX - ROW LOCK contention  name|mode     1415053318 WAITING
 
1 ROW selected.

The value for P1 is not very useful as is; Oracle has encoded the type of enqueue and the requested mode into the column. When working with current events such as when selecting from v$session, it is simple to determine the type of lock and the mode requested by querying v$lock, such as in the following example:

  1* SELECT sid, TYPE, request, block FROM v$lock WHERE sid=68 AND request > 0
SQL# /
 
 SID TY    REQUEST      BLOCK
---- -- ---------- ----------
  68 TX          6          0
 
1 ROW selected.

Session 68 is waiting on a TX enqueue with requested lock mode of 6. Seasoned Oracle DBA’s will recognize this as classic row lock contention.

Why bother to find out just which type of enqueue this is? There are many types of locks in Oracle, and they occur for differing reasons. The TX lock is interesting as it can occur not only in Mode 6 but Mode 4 as well; Mode 4 refers to locks that involve unique keys, such as when 2 or more sessions try to insert the same value for a primary key. The following example shows just that:

SQL# @s
 
USERNAME    SID EVENT                          P1TEXT                P1 STATE
---------- ---- ------------------------------ ---------- ------------- ----------
JKSTILL      68 enq: TX - ROW LOCK contention  name|mode     1415053316 WAITING
 
 
1 ROW selected.
 
SQL# @l
 
 SID TY    REQUEST      BLOCK
---- -- ---------- ----------
  68 TX          4          0
 
1 ROW selected.

Knowing just which lock mode is requested is vital, as the troubleshooting for TX Mode 4 locks will be different from what is used to troubleshoot Mode 6.

Though we can find the lock name and mode information in v$lock, there is still value in being able to decipher that cryptic P1 column.

The ASH and AWR facilities do not include any historical information for the lock name and mode; the P1 column found in v$active_session_history and dba_hist_active_sess_history does not have a corresponding dba_hist_lock view. Any research done after an event has occurred does require decoding this information.

Deciphering v$session.p1

Oracle Support document 34566.1 is the enqueue reference note that provides information needed to get the lock name and mode from the p1 column. As you will see this information is a bit puzzling.

The rest of this article will focus on TX Mode 6 locks. The value shown for this lock and mode in the P1 column is always 1415053318. Following is the SQL recommended by Oracle:

 SELECT chr(to_char(bitand(p1,-16777216))/16777215)||
         chr(to_char(bitand(p1, 16711680))/65535) "Lock",
         to_char( bitand(p1, 65535) )    "Mode"
    FROM v$session_wait
   WHERE event = 'enqueue'

As I currently have some planned row lock contention in a test database we can run this query:

  1   SELECT chr(to_char(bitand(p1,-16777216))/16777215)||
  2           chr(to_char(bitand(p1, 16711680))/65535) "Lock",
  3           to_char( bitand(p1, 65535) )    "Mode"
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
Lo Mode
-- ----------------------------------------
TX 4

Probably it is not very clear why this SQL works. Let’s try and understand it.
(Note that a small change had to be made to the WHERE clause.)

Converting the P1 value to hex may be helpful

1415053318 = 0x54580006

The two lower order bytes represent the lock mode that has been requested. This can be seen here to be 0x06, which is simple translate to decimal 6 (I can do this one in my head)

The next two bytes are also in hex and represent the two letters of the lock name.

0x54 = 84 = ‘T’
0x58 = 88 = ‘X’

Using string functions it is simple to extract the values from the hex string, convert them to numbers and retrieve the lock name and mode.

SQL# define p1 = 1415053318
 
SQL# l
  1  WITH hex AS (
  2     SELECT TRIM(to_char(&p1,'XXXXXXXXXXXXXXXX')) hexnum FROM dual
  3  ),
  4  hexbreak AS (
  5     SELECT hexnum
  6        , to_number(substr(hexnum,1,2),'XXXXXXXX') enq_name_byte_1
  7        , to_number(substr(hexnum,3,2),'XXXXXXXX') enq_name_byte_2
  8        , to_number(substr(hexnum,5),'XXXXXXXX') enq_mode
  9  FROM hex
 10  )
 11  SELECT
 12     hexnum
 13     , chr(enq_name_byte_1)
 14     || chr(enq_name_byte_2) enqueue_type
 15     , enq_mode
 16* FROM hexbreak
SQL# /
 
HEXNUM            EN   ENQ_MODE
----------------- -- ----------
54580006          TX          6

While that does work, my inner geek wants to investigate those bitmasks and find out why they work. Next are the bitmasks in decimal along with the hex equivalent.

-16777216 = 0xFFFFFFFFFF000000
 16777215 = 0xFFFFFF
 16711680 = 0xFF0000
    65535 = 0xFFFF

The bitand function is used to mask all unwanted bits to 0. The number is then divided by value needed to remove all of the now-zeroed-out lower order bytes.

The values being used as bitmasks are -16777216 and 16711680. The use of -16777216 does not seem to make sense. As the intent is to mask all but one byte, I would expect to find an FF surrounded by a number of zeroes. The bit mask of 16711680, however, looks fine.

Now let’s run the Oracle support query again, but modified to show just the integer values rather than converting them to ASCII.

 
  1  SELECT bitand(p1,-16777216)/16777215,
  2           bitand(p1, 16711680)/65535,
  3           bitand(p1, 65535)
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
BITAND(P1,-16777216)/16777215 BITAND(P1,16711680)/65535 BITAND(P1,65535)
----------------------------- ------------------------- ----------------
                    84.000005                88.0013428                6

Well, that is interesting. An implicit conversion is taking place with to_char() that is removing the decimal portion of these numbers. Is that being done with trunc(), round(), or something else? I don’t know the answer to that. What seems more important is just doing the math correctly.

There are a couple of things here that can be changed to make this work as expected.

A New BitMask

Let’s modify the first bitmask to something that seems more reasonable than -16777216. Let’s use this instead, as it masks only the single byte we need:

4278190080 = 0xFF000000

Lets’ try it out:

SQL# l
  1  SELECT bitand(p1,4278190080)/16777215,
  2           bitand(p1, 16711680)/65535,
  3           bitand(p1, 65535)
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
BITAND(P1,4278190080)/16777215 BITAND(P1,16711680)/65535 BITAND(P1,65535)
------------------------------ ------------------------- ----------------
                     84.000005                88.0013428                6

While the new bitmask didn’t break anything, it does not appear to have helped either.

Off By One Error

The solution is to consider the divisors used to remove the lower order zeroes; each of them is off by one. That is easy enough to verify:

SQL# l
  1  SELECT bitand(p1,4278190080)/16777216,
  2           bitand(p1, 16711680)/65536,
  3           bitand(p1, 65535)
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
BITAND(P1,4278190080)/16777216 BITAND(P1,16711680)/65536 BITAND(P1,65535)
------------------------------ ------------------------- ----------------
                           84                        88                6

Ah, that did it! But what was the problem previously?

Old Divisor Values

The original divisors are off by 1, which does not completely remove the lower order values.

 16777215 = 0xFFFFFF
    65535 = 0xFFFF

Increasing each by one has the desired effect.

New Divisor Values

 16777216 = 0x1000000
    65536 = 0x10000

Conclusion

Those odd bitmasks have been in the back of my mind for some time, and today it seemed like a good idea to dig in and find out more about them. It isn’t too hard to imagine that in some cases the wrong values might be returned, leading to some long and unproductive troubleshooting sessions.

There is a demo script enqueue-bitand.sql containing much of the SQL found in this article. There is also a script awr-top-sqlid-events.sql that incorporates the enqueue lock decoding. This script could be made better than it is, so please issue a pull request if you have some useful modifications.


Viewing all articles
Browse latest Browse all 34

Trending Articles