• Delicious
  • Sales: 0800 634 6513
    General: 0845 053 0371

news and info

MS SQL Server Replication failure

The Problem

I’ve just spent two days trying to resolve a problem where our Microsoft SQL Server 2008 database had stopped replicating. We noticed that recent records were in the master database, but not in the replica database. Everything seemed to be OK, with no errors being reported, but the replication monitor kept saying “No replicated transactions are available”, even though it was not up-to-date. The annoying thing was that this had been working for years, and we could not remember making any changes – especially just before the problem started.

The Investigation

After trying many things, including reinitialising the subscription, we realised that the Log Reader Agent wasn’t running. As I understand it, this agent reads the transaction logs and packages changes up to be distributed. So, that’s why there weren’t any replicated transactions available to the distribution agent.

Attempts to start the Log Reader Agent resulting in it appearing to start, but stopping again immediately. Looking in the Windows Event logs showed no errors for the Log Reader and logread.exe wasn’t in the processes list in Task Manager, so it was a bit of a mystery.

We used the instructions here (https://support.microsoft.com/en-us/kb/312292) to enable logging in the Log Reader Agent, but that appeared to not make a log file at all, which suggested that the Log Reader wasn’t starting at all.

We noticed some errors in the Windows Event Log, which looked related, but not directly relevant.

SQL Server Scheduled Job 'Replication agents checkup' (0x777712503CE7E94DBFBB3E1859262BEC) - Status: Failed - Invoked on: 2016-01-29 09:20:00 - Message: The job failed.  The owner () of job Replication agents checkup does not have server access.

This wasn’t about the Log Reader itself, but appeared to be about something which checks on the agents, and it didn’t have server access.

Looking back through the Event Log, we found that this started on 26th January at 3.45am. That looked suspiciously like just after a Windows Update. Now, it might be that we changed something earlier, but the problem only happened at that time because the server restarted then – but we can’t remember what we may have changed…

At this point, it looked like a security permissions issue. So, we tried several things:

  • Changing the Windows service account which the SQL Server Agent was running as
  • In the Properties for the replication Publication, we went to Agent Security, and tried changing the Security Settings for the Log Reader Agent (we tried all sorts of things there, including setting it to administrator permissions)

None of these changes made any difference.

Then, we tried running the Log Reader from an administrator command prompt:

C:\Program Files\Microsoft SQL Server\100\COM>logread -Publisher [SERVER] -PublisherDB [MyDatabase] -Distributor [SERVER] -DistributorSecurityMode 1  -Continuous

That worked fine, and the database started replicating. So, the replication was, in fact, OK, and the Log Reader Agent can work, but it isn’t being run for some reason. Again, it looks like a security permissions problem, where it can’t start the Log Reader Agent.

So, now we had a last-resort workaround – we could just start the Log Reader Agent manually (or from Task Scheduler), but that’s not nice.

The Solution

Looking for the ‘Replication agents checkup’ which was being reported in the earlier error message, we eventually stumbled upon it. It’s in the SQL Server Management Studio, find the SQL Server Agent in the left-pane. Expand Jobs, and there it is. Going into the properties for the Replication agents checkup job, we found that on the General tab, there’s a box called Owner which was blank.  By putting a SQL Server user in there (eg, you can try ‘sa’, but it’s not recommended to leave it at that), we found that the Replication agents checkup could now run without error.

After a bit of investigation we found that there were some more jobs called <SERVER>_<Database>_<stuff> which contained jobs for the replication process. All those had no Owner defined either. Again, when we put a valid user (with appropriate permissions) in there, and restarted the server, it all started working again!

I have no idea why those boxes were all blank. I don’t know if we removed an SQL Server User which used to be in there (we don’t remember doing that) or if the Windows Update broke things, but hopefully the information above will help someone else with a similar problem.

 

 

 

Post a Comment