The most common mistakes and misconceptions when configuring DFSR

Translation

[Translator's note: The material in this article refers to Windows Server 2003 / 2003 R2 / 2008 / 2008 R2, but most of it also holds for later versions of the OS.]

Hello! Warren here again. This blog post is a compilation of the most common DFSR problems I have encountered over the past few years. Its purpose is to list the common DFSR configuration mistakes that cause these problems so that you can avoid making them yourself. Knowing what not to do is just as important as knowing what to do. Many of the points below touch on other topics, so links are provided for more in-depth study.

The staging folder quota is too small

Are you seeing a lot of events with IDs 4202 and 4204 in the event log? If so, the staging folder is not sized correctly. The downside of an incorrectly sized staging folder is reduced replication performance, because the service spends its time cleaning up the staging folder instead of replicating files.
DFSR servers with a sufficiently large staging folder work more efficiently for at least two reasons:

It is much more efficient to stage a file once and then send it to all receiving replication partners than to stage it, replicate it, and clean it up again for each receiving partner.
If at least one member runs an Enterprise edition of the operating system, the servers can use cross-file RDC. [Translator's note: starting with Windows Server 2012, this technology is also available in the Standard edition.]

An incorrectly sized staging folder can also cause a replication "loop". This happens when a replicated file has already arrived in the staging folder on the receiving server, but staging cleanup deletes it before it can be moved into the replicated folder. Because the file was never installed, it is replicated to the server again and may be purged again. The cycle repeats until the file is finally installed in the replicated folder.

Do not ignore staging folder events in the event log.

Check out this post, which describes how to determine the minimum size of a staging folder.

You can also read the section “Increasing the Staging Quota” here.

For more on cross-file RDC, read the article "About Remote Differential Compression," posted here.
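
To make the sizing advice concrete, here is a minimal sketch of the usual rule of thumb: the staging quota should be at least the combined size of the N largest files in the replicated folder. The value N = 32 is the figure commonly cited for Windows Server 2008 / 2008 R2 (9 for Windows Server 2003 R2); treat it as an assumption and confirm it against the sizing post. The function name and the decision to skip the DfsrPrivate folder are choices made for this example, not part of any DFSR tooling.

```python
import heapq
import os


def min_staging_bytes(root, largest_n=32):
    """Return the combined size in bytes of the largest_n biggest files under root.

    largest_n=32 is an assumed rule-of-thumb value; adjust it for your OS version.
    """
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip DFSR's own private folder so staged copies are not counted.
        dirnames[:] = [d for d in dirnames if d.lower() != "dfsrprivate"]
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                continue  # file disappeared or is not readable; ignore it
    return sum(heapq.nlargest(largest_n, sizes))


# Example call (the path is hypothetical):
# print(min_staging_bytes(r"D:\Shares\Projects") / 2**30, "GiB minimum")
```

The result is a floor, not a target: a quota larger than this minimum leaves room for files to stay staged while several partners fetch them.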

Incorrect or untested preseeding procedure

Preseeding means copying the data that will be replicated to a new replication member before that server is added to the replicated folder, in order to shorten initial replication. Most of the preseeding failures I have seen came down to three causes.

ACL mismatch at source and destination.
After copying to the new replication member, the files were changed.
No testing was done beforehand to verify that the preseeding procedure works as expected.

In short, the files must be copied in a specific way, they must not be changed after they have been copied, and you must test the whole process beforehand.

Click here to read Mr. Pyle's blog post on how to properly preseed your DFSR servers.
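
As one possible pre-test, the sketch below compares a source tree with its preseeded copy and reports files that are missing, unexpected, or different in content. It is an illustration only: all function names are hypothetical, and it compares file content with SHA-256, which is not the hash DFSR itself computes. In particular it does not check ACLs, the first failure cause listed above, so a clean result here is necessary but not sufficient.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's content, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path relative to root to its content digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = file_digest(full)
    return result


def compare_trees(source_root, dest_root):
    """Return (missing, extra, changed) lists of relative paths.

    missing: present in the source but not in the destination
    extra:   present in the destination but not in the source
    changed: present in both, but with different content
    """
    src = tree_digests(source_root)
    dst = tree_digests(dest_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    changed = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, changed
```

Running this against a small test folder before the real copy is a cheap way to confirm that the copy tool and its options behave as expected; ACLs still have to be verified separately.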

Large backlogs that persist for a long time

A large backlog (copy queue) that persists for a long time means your data is out of sync, and it can also lead to undesirable conflict resolution, where a file with stale content wins the conflict. The scenario in which I have seen this most often is adding many new replicated folders at once. Instead of rolling out in phases, some administrators added 20 new replicated folders from 20 different branch offices at the same time, overloading the hub server. Deploy in stages so that initial replication completes within a reasonable amount of time.

DFSR is used as a backup solution

Believe it or not, some administrators run DFSR without offline backups of the replicated data. DFSR was not designed as a backup solution. One of its design goals is to be part of an enterprise backup strategy: it lets you bring geographically distributed data to a central site for backup, restore, and archiving. Having several replication members protects you against server failure, but it does not protect you against accidental deletions, because a deletion is replicated to every member. To be fully protected, you must back up your data.

One-way replication: using it and fixing it the wrong way

To prevent unwanted updates on servers where the data will never change (or where they do not want changes to be made), many customers set up one-way replication by removing the outbound connections of replication members. One-way replication is not supported in any version of DFSR before Windows Server 2008 R2. Windows Server 2008 R2 supports one-way replication in the form of read-only replicated folders.

Read-only replication members achieve the goal of one-way replication: they prevent unwanted changes to the replicated data. If you want one-way replication with DFSR, use Windows Server 2008 R2 and mark the members on which no changes should be made as read-only.

Click here and here to learn about the read-only DFSR replication mode.

Another common problem occurs when an administrator learns that one-way replication is not supported and tries to correct the situation, but does it the wrong way. Simply re-enabling two-way replication can have undesirable results.

Click here to learn how to fix one-way replication.

Hub server as a single point of failure and overloaded hub servers

I have seen many deployments with a single hub server. If that hub server fails, the entire deployment is at risk. If you are using Windows Server 2003 or 2008, you should have at least two hub servers, so that if one fails, the other can carry the load while the first is being recovered, with minimal impact on end users. Starting with Windows Server 2008 R2, you can deploy DFSR on a Windows failover cluster, which provides high availability with half the storage requirement.

Sooner or later, administrators end up with too many branch office servers configured to replicate with a single hub server. This can lead to replication delays. You can work out how many branch office servers a single hub server can handle by monitoring the backlog. There is no magic formula, since every environment is unique and there are many dependencies.

Read the “Set Up Topology” section here to learn about deploying hub servers.
Click here to learn how to configure DFSR on a Windows Server 2008 R2 failover cluster.

Too many replicated folders in a single Jet database

DFSR uses one Jet database per volume. As a result, placing all replicated folders on one volume puts them all in one Jet database. If that database develops a problem that requires a repair or a rebuild, every replicated folder on that volume is affected. [Translator's note: the original says "drive" here, but a volume is clearly meant.] It is better to have as many volumes as possible and to spread the replicated folders across them, which maximizes data uptime.

Deployments based on budget iSCSI solutions

I have often seen DFSR deployments built on the cheapest iSCSI hardware available. If you use DFSR, you usually do so for business-critical purposes such as data redundancy, backup consolidation, and scheduled distribution of applications and OS updates. Making yourself dependent on low-quality hardware without proper vendor support is not a good idea. If the data is important to your business, then the hardware that runs the OS and the replication engine is important too.

Current hotfixes are not installed for the DFSR service

DFSR is actively supported by Microsoft, and updates are released for it as needed. If a new DFSR update is available when your next patch cycle comes around, install it. Make sure your servers are updated according to the knowledge base articles listed below.

DFSR fixes for Windows 2003 R2
DFSR fixes for Windows 2008 and Windows 2008 R2

Please note that, besides DFSR.EXE / DFSRS.EXE, these updates also cover NTFS.SYS and other files. For replication to work correctly, always make sure that at least the latest DFSR and NTFS hotfixes are installed. The other fixes on the list mostly address user interface problems, so install them at least on the systems where DFSR configuration tasks are performed.

Installing hotfixes on DFSR servers proactively is recommended even when everything is working fine, because it helps you avoid problems that are already known.

Network adapter drivers are not kept up to date

DFSR can only work as well as the network you give it. Running five-year-old drivers is not a sensible choice. I have worked with several customers whose DFSR replication problems were resolved by updating an outdated NIC driver.

No DFSR monitoring

Although DFSR is used to move critical data, many admins have no idea what DFSR is doing until a problem appears. The more experienced ones write their own scripts to monitor the backlog on their servers, but most simply hope for the best. The management pack for DFSR was released almost a year ago (and other versions appeared even earlier). Install it and use it, and you will be able to detect problems and react to them before they turn into a nightmare. If you cannot use the Operations Manager Management Pack for DFSR, then at least write a script that tracks the backlog daily so that you know whether DFSR is replicating files or not.

Click here to get information about the Operations Manager Management Pack for DFSR.
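
For those writing their own script, the following is a minimal sketch of a daily backlog check built around the `dfsrdiag backlog` command. The function names are hypothetical, and the parser assumes that the command prints either a line containing "Backlog File Count: N" or a "No Backlog" message; verify that against the output on your own servers before relying on it.

```python
import re
import subprocess

# Assumed output format: a line such as "Member X Backlog File Count: 42".
_COUNT_RE = re.compile(r"Backlog File Count:\s*(\d+)", re.IGNORECASE)


def parse_backlog(output):
    """Extract the backlog file count from dfsrdiag output.

    Returns an int, 0 for a "No Backlog" message, or None if the
    output could not be interpreted (treat None as an alert).
    """
    match = _COUNT_RE.search(output)
    if match:
        return int(match.group(1))
    if "no backlog" in output.lower():
        return 0
    return None


def query_backlog(group, folder, sending, receiving):
    """Run dfsrdiag for one sending/receiving pair (Windows only)."""
    cmd = [
        "dfsrdiag", "backlog",
        f"/rgname:{group}",
        f"/rfname:{folder}",
        f"/smem:{sending}",
        f"/rmem:{receiving}",
    ]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return parse_backlog(completed.stdout)
```

Scheduling `query_backlog` once a day for each hub/branch pair and recording the result gives a simple trend line: a count that keeps growing, or a `None`, is the signal to investigate.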

Updated January 19, 2011:

Making changes to disk storage without backing up first

If disks on a DFSR server need to be replaced or added to increase storage capacity, it is critical to have an up-to-date backup of the data in case something goes wrong. Any number of things can go wrong. The most common are conflict events caused by unintended changes to the parent folder, or accidental deletion of the parent folder that then replicates to all partners. Back up your data before you start making changes and keep the backup until the project is complete.

Stopping the DFSR service to temporarily pause replication

Sometimes replication has to be paused temporarily. The correct way to do this is to disable replication for the relevant replication group using its schedule. The DFSR service must be running in order to read updates from the USN journal. Do not stop the DFSR service for long periods (days or weeks), because this can cause a journal wrap if many files were changed, added, or deleted in the meantime. DFSR will recover from a journal wrap, but in large deployments recovery takes a long time, and replication will be stopped or very slow while it runs. You will also likely see very large backlogs until recovery completes.

I hope this list helps you. Happy replicating!

Warren “wide net” Williams

[Translator's note: If readers are interested, I will later try to translate the articles linked in the text, as well as other articles by the original author.]
