Today I worked on a Transactional log full issue on one of
the critical production database server, the database involved in this issue is
configured both Mirroring and Log shipping.
From the mirrored database a scheduled job generating snapshot for every
30 mins. The production database is
involved high volume of ecommerce OLTP, everything were working fine till
midnight and after some changes happened in the front end database started
receiving huge volume of data and the Tlog backup size turned to 20GB+ from
500MB normal value, this continued for 3 hours and the logs have been reduced
and back to normal but the mirroring didn’t caught up so fast it was crawling
till the morning 7.30AM and went into disconnected state.
The Tlog full issue started at 1.00AM and my fellow DBA
assumed that issue was causing by a trim job which runs every 8 hours and the
1AM schedule usually runs more than 6hrs, but that was not the true cause of
the issue, it was mirroring, the mirroring was absolutely slow, since the
transactions were not transferred to the mirrored database, the unsent logs
staying in the principal and waiting to be sent, of course this is causing not
to flush the transactional logs though the log shipping Tlog backup job running
every 5 minutes.
Finally figured out the issue, since the mirroring went into
disconnected state I ended up to break the mirroring completely and changed the
database recovery model from FULL to SIMPLE, which brought the Tlog usage 99%
to less than 1% drastically. Then I shrunk the log to required size.
Once all back to normal I setup the mirroring and log shipping once again... oh yeah a day work for a big database... :)
When I did the investigation, used the sp_dbmmonitorresults stored procedure and dbm_monitor_data system
table on msdb database on the principal database instance.
As per the below table, the mirroring was fine till 7.32AM
and it broken 7.33AM, if you see the unsent_log column value in the table it started
piling up.
sp_dbmmonitorresults dbname, 4, 0;
Mirroring the following table also has almost same but
little more, you can see the log_flush_rate and send_queue_size columns started
increasing from 7.33AM.
SELECT * FROM dbm_monitor_data
where database_id=5