Our project had some additions that needed to have minimal interference to the existing system. The underlying Oracle database would have had to grab out changes from one existing data storage "P" table to a new "S" interface table. The simple solution would have been to implement a database trigger on the INSERT/UPDATE/DELETE statements on table P, inserting new rows into table S.
However, our DBA had concerns from recent developments, that suggested we avoid this traditional approach. They had little to offer in terms of alternatives. And this is where our story begins.
Resulting from the assessment, I ventured forth to find alternatives.
We could implement changes to the code in our existing module, that could support our efforts. That is obviously not feasible knowing that we'd much prefer to not rock the boat that much.
I saw suggestions online discussing the use of interceptors. Obviously it'd involve code changes as well, but with the add negative of not being best practice since we'd need to then share the classloader for both the new and old modules.
Next, I wondered if it's possible to tap into JMX. One option explored the possibility of exposing Hibernate Statistics that way. Again, we'd need to make changes, however minor, to the existing codes. Moreover, the statistics do not seem to offer the behaviours we need for picking up table changes.
In the brainstorming session with our DBA, we'd also discussed about using the flashback archive, and even the Change Data Capture that is with our OGG setup. Both of which were dismissed for complexity.
We'd need a way out that our application development team can support on their own, with minimal intervention/support from our DBA team.
I thought I'd hit the jackpot with the DBMS_CHANGE_NOTIFICATION (synonym as DBMS_CQ_NOTIFICATION) package. I thought that I could implement the trigger asynchronously without leaving the comfort of the database tier. The online documentation and examples were sparse, but still made sense. But it couldn't work in our setup for some reason. I'd written up the stored procedure and everything, but it was next to impossible to troubleshoot. Returning to basics, I then prepared a basic EMPLOYEES table to follow the example to the letter. No dice.
Following the Object Change Registration Example, I'd went on to GRANT the privileges, ALTER the job_queue_processes to non-zero, and even resorted to a simple UPDATE upon any of the events that were meant to trigger the procedure. None of it worked.
Giving up, our DBA suggested I seek help from Oracle. The ticket sat for 7 days after their first acknowledgement, as we waited for a response. I provided the test case using their example. And naturally, their response was that they could not reproduce my issue.
Meanwhile, I'd essentially moved on. The same CQN had a variation that involved writing Java code. My colleague shared me the basic example he concocted from online references. While the original sample code used ojdbc6, discussions pointed him to using ojdbc10. Oh dear. Our application servers only run on Java 8, meaning that we could only use ojdbc8. I tested using the "default" v12.2.0.1 of ojdbc8 (more on this later) by referencing the POM to online sources, and it broke. Oh no. Did this mean that this CQN approach was no go either? Well... I decided to put on my tinfoil hat and started searching desperately for answers.
First off, we require DCN_CLIENT_INIT_CONNECTION to be enabled. This was to avoid involving traffic being initiated from the database to app. But the v12.2.0.1 of the ojdbc8 library did not come with this constant. Even passing the property in as a hardcoded String literal did not help. As I scoured the web, Google returned nary a Page 2 of results. And then I found it. The javadoc for ojdbc8 on this site showed that the constant existed. I checked through the jar files I found from mvnrepo.com and still could not see the value. Next, I dug into our dev servers to try my luck. In the ojdbc8.jar retrieved from our own database instance, was this one constant string. My conspiracy theory meter blew up. "This had to be an enterprise version thing", I thought. Satisfied, I proceeded with finishing up my work. A barebones servlet was prepared, without using Spring, without using Hibernate. The servlet served, the listener listened.
I even had a sidequest for orphaned registrations. The PL/SQL block for deregistering does not work for notifications registered from an application. We needed to use unregisterDatabaseChangeNotification, that will also indicate the callback, in which we'd nuke all registrations queried from "SELECT * FROM USER_CHANGE_NOTIFICATION_REGS" in the database.
The dust settled, and I needed to work up our pipeline for my conspiracy. It was a hassle that I'd rather avoid, if we needed our team to manually install the special ojdbc8 jar for this. Delving into the "good" jar again, I noticed that the library indicated v19.24.0.0 in its manifest. Oh. That's odd. I did not realise that there had been many versions of ojdbc8 since v12.2.0.1 that were all available online. There was no conspiracy. The jar file I'd picked up internally was simply named ojdbc8.jar because Oracle would see no need to indicate version in their naming conventions in a product they supplied. I would not need to especially install the ojdbc8 jar afterall. All I needed, was to update my POM for the dependency to point to v19.24.0.0 instead of v12.2.0.1.
Along the way, there were some other notes that I'd picked up.
In order to acquire an OracleConnection, it's not enough to simply cast the default ds.getConnection() object. Instead, it is a must to .unwrap for acquiring an OracleConnection, as well as to .unwrap to get the OracleStatement. Without the unwrap for the Staement object, our WSJdbcStatement would have failed a ClassCastException since it could not be translated into an OracleStatement.
My conditions involve all of INSERT/UPDATE/DELETE events. It is trivial to retrieve records based on the supplied rowid, which involves setting Oracle.Connection.DCN_NOTIFY_ROWIDS property to true. The challenge for this comes in the form recovering rows that have been deleted. Fortunately, the Oracle database comes with a feature known as Flashback. It's basically an undo button for the database. The more advanced capabilities include creating tables or entire databases, and also a DBMS_FLASHBACK package. Each of those involve more nuanced considerations for issues surrounding configuration such as clusters. What I was interested in, was just the use of the Flashback Query. It basically allows a regular SELECT statement to include an AS OF clause, with which, I could do SELECT * FROM MYTABLE AS OF TIMESTAMP SYSDATE - INTERVAL '2' MINUTE WHERE ROWID = ? in order to travel back in time and recover the data that was just deleted during the DELETE operation. Just don't forget to GRANT FLASHBACK ANY TABLE TO YOURACCOUNT before that. The caveat is that you probably should not flashback longer than necessary.
Had the CQN Listener not been successful, I was already exploring the use of ORA_ROWSCN while blindly querying the tables. I'd have had to snapshot the largest PK ID somewhere, to check for INSERTs in one table, and DELETEs in the secondary archival table.The UPDATEs would have become the blocker in this case. The System Change Number "SCN" would provide a pseudo PK ID that I can then snapshot similarly. The rabbit hole would likely lead me to the Flashback Query as well to use in conjunction with the ORA_ROWSCN pseudocolumn. Fortunately, the story ends here, as that sounds like a whole other can of worms to deal with.
UPDATE (21 Oct 2024): I did not expect a sequel but here we are. We had two environments to test in. A colleague suggested that I test in the other environment, just in case. And it worked. But we didn't understand why there was this discrepancy, so I highlighted this new observation to Oracle Support team. And thanks to them, we found out the cause of the discrepancy with regards to the DBMS_CHANGE_NOTIFICATION package. I was routed to the AQ team, who then requested for our health check report for that component.
The example included a line for
ALTER SYSTEM SET "JOB_QUEUE_PROCESSES"=4;
The above is done, where the value simply needs to not be a zero. In order for the Change Notification to work "server side" within the database, Oracle employs a couple of modules internally. Namely Advanced Queuing, and the Job Scheduler, even though they aren't spelt out as dependencies in the documentation found online.
Querying (as SYS no less)
SELECT * FROM ALL_QUEUES WHERE NAME like '%CQN%';
results in 2 rows returned for the CQN_EVENT_TABLE, whereas
SELECT * FROM ALL_QUEUE_TABLES WHERE QUEUE_TABLE like '%CQN%';
returns one row for the same. Naturally, I'd next query for
SELECT * FROM CQN_EVENT_TABLE;
This results in every single test message I'd ran successfully from the beginning of this exercise.
This meant that all the messages were being captured from the AQ but failed to follow up with the stored procedure. That was where the job scheduler comes in. I had initially followed the steps, where some documentation instructed to set =2, then others noted to set =4, I'd even set =10 when I noticed that we had a lot more processes going on from querying
SELECT * FROM ALL_SCHEDULER_JOBS;
This was why, returning to the health check report, the Oracle team having observed that job_queue_processes=0, which perplexed me. Unbeknownst to me, there was some history to the database that led to this setting, which effectively turned off the job scheduler.
As well as, when I was updating the parameter to whatever value would not have made a difference, as I was merely making the change in the Pluggable DB "PDB" and not the Container DB "CDB" that was required for this to work. I had my DBA colleague help set the value to 80 (to align the value with the dev environment that did work).
I went further to revise my original PL/SQL script, just in case we still needed it. In which, I realised that I'd have needed the Flashback Query anyway.
And now we have both server and client side solutions available.
No comments:
Post a Comment