Simple monitoring of DFS Replication in Zabbix

    Introduction


    In a sufficiently large, distributed infrastructure that uses DFS as a single data access point and DFSR to replicate data between data centers and branch servers, the question of how to monitor the state of that replication inevitably arises.
    It so happened that almost immediately after we started using DFSR, we began rolling out Zabbix to replace the existing zoo of assorted tools and to make infrastructure monitoring more informative, complete and consistent. This article describes how to use Zabbix to monitor DFS replication.

    First of all, we need to decide which DFS replication data we need in order to monitor its status. The most relevant indicator is the backlog: the queue of files that have not yet been synchronized with the other members of the replication group. Its size can be viewed with the dfsrdiag utility, which is installed along with the DFSR role. When replication is healthy, the backlog size should tend to zero; accordingly, a large number of files in the backlog indicates replication problems.

    Now about the practical side of the issue.

    To monitor the backlog size through the Zabbix agent, we need:

    • A script that parses the dfsrdiag output and hands the final backlog size to Zabbix,
    • A script that determines how many replication groups exist on the server, which folders they replicate and which other servers belong to them (we don't want to enter all of this into Zabbix by hand for every server, right?),
    • Both scripts registered as UserParameter entries in the Zabbix agent configuration, so that the monitoring server can call them,
    • The Zabbix agent service running as a user who has the right to read the backlog,
    • A Zabbix template that configures the discovery of groups, the processing of the received data and the alerts raised on it.


    Parser script


    To write the parser, I chose VBS as the most universal language, available in all versions of Windows Server. The logic of the script is simple: it receives the name of the replication group, the replicated folder, and the names of the sending and receiving servers on the command line. These parameters are then passed to dfsrdiag, and depending on its output the script returns:
    The number of files - if the output reports files in the backlog,
    0 - if the output reports that the backlog is empty ("No Backlog"),
    -1 - if dfsrdiag reports an error while executing the request ("[ERROR]").

    get-backlog.vbs
    ' Arguments: replication group, replicated folder, sending member, receiving member
    strReplicationGroup = WScript.Arguments.Item(0)
    strReplicatedFolder = WScript.Arguments.Item(1)
    strSending = WScript.Arguments.Item(2)
    strReceiving = WScript.Arguments.Item(3)
    Set WshShell = CreateObject("WScript.Shell")
    Set objExec = WshShell.Exec("dfsrdiag.exe Backlog /RGName:""" & strReplicationGroup & """ /RFName:""" & strReplicatedFolder & """ /SendingMember:" & strSending & " /ReceivingMember:" & strReceiving)
    ' Collect the whole output, joining the lines with "\\" as a separator
    strResult = ""
    Do While Not objExec.StdOut.AtEndOfStream
        strResult = strResult & objExec.StdOut.ReadLine() & "\\"
    Loop
    If InStr(strResult, "No Backlog") > 0 Then
        intBackLog = 0
    ElseIf InStr(strResult, "[ERROR]") > 0 Then
        intBackLog = -1
    Else
        ' The second output line is expected to look like "<text>: <file count>"
        arrLines = Split(strResult, "\\")
        arrResult = Split(arrLines(1), ":")
        intBackLog = Trim(arrResult(1))
    End If
    WScript.Echo intBackLog
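
    The three rules above can also be checked away from a file server. The following is only a sketch in Python (not the language of the original script) that applies the same decision logic to captured text. The function name parse_backlog and the sample output are assumptions made for illustration; the exact dfsrdiag output may differ between Windows versions.

```python
def parse_backlog(output: str) -> int:
    """Map dfsrdiag Backlog output to a number, mirroring get-backlog.vbs."""
    if "No Backlog" in output:
        return 0
    if "[ERROR]" in output:
        return -1
    lines = output.splitlines()
    # Like the VBS script: take the second line and read what follows the first colon.
    return int(lines[1].split(":")[1].strip())


if __name__ == "__main__":
    # Hypothetical captured output, shaped the way the script expects it.
    sample = "\nMember <SERVER2> Backlog File Count: 12\nBacklog File Names (first 12 files)\n"
    print(parse_backlog(sample))
```

    As in the VBS version, output that matches none of the expected shapes raises an error instead of returning a value, so a production script might want an explicit fallback.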


    Discovery script


    For Zabbix to find all the replication groups present on the server and learn all the parameters required for the request (folder name, names of the partner servers), we first need to collect this information and then present it in a format that Zabbix understands. The format expected by the discovery mechanism is:

    {
            "data":[
                    {
                            "{#GROUP}":"Share1",
                            "{#FOLDER}":"Folder1",
                            "{#SENDING}":"Server1",
                            "{#RECEIVING}":"Server2"},
    ...
                    {
                            "{#GROUP}":"ShareN",
                            "{#FOLDER}":"FolderN",
                            "{#SENDING}":"Server1",
                            "{#RECEIVING}":"ServerN"}]}
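
    To see what a well-formed payload of this shape looks like, it can be generated with a JSON library, which takes care of commas, braces and escaping. The sketch below is in Python and is not part of the original scripts; the function name build_discovery and the sample rows are hypothetical.

```python
import json


def build_discovery(rows):
    """Build discovery JSON from (group, folder, sending, receiving) tuples."""
    data = [
        {"{#GROUP}": g, "{#FOLDER}": f, "{#SENDING}": s, "{#RECEIVING}": r}
        for g, f, s, r in rows
    ]
    # One object with a single "data" array, as in the format shown above.
    return json.dumps({"data": data}, indent=2)


if __name__ == "__main__":
    # Hypothetical replication groups, used only to show the output shape.
    print(build_discovery([
        ("Share1", "Folder1", "Server1", "Server2"),
        ("ShareN", "FolderN", "Server1", "ServerN"),
    ]))
```

    An empty list of rows still yields a valid document with an empty "data" array, which is the case that is easy to get wrong when the JSON is assembled by hand.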


    The information we need is most easily obtained through WMI, from the DfsrReplicationGroupConfig class and the related DfsrReplicatedFolderConfig and DfsrConnectionConfig classes. The result is a script that queries WMI and prints the list of groups, their folders and servers in the required format.

    DFSRDiscovery.vbs
    
    Dim strComputer, strJson, strSep
    Set wshNetwork = WScript.CreateObject("WScript.Network")
    strComputer = wshNetwork.ComputerName
    Set oWMIService = GetObject("winmgmts:\\" & strComputer & "\root\MicrosoftDFS")
    Set colRGroups = oWMIService.ExecQuery("SELECT * FROM DfsrReplicationGroupConfig")
    ' Build the whole document in a string; strSep puts a comma before every entry except the first,
    ' so the JSON stays valid even when the last connection is inbound or disabled
    strJson = "{""data"":["
    strSep = ""
    For Each oGroup In colRGroups
      Set colRGFolders = oWMIService.ExecQuery("SELECT * FROM DfsrReplicatedFolderConfig WHERE ReplicationGroupGUID='" & oGroup.ReplicationGroupGUID & "'")
      For Each oFolder In colRGFolders
        Set colRGConnections = oWMIService.ExecQuery("SELECT * FROM DfsrConnectionConfig WHERE ReplicationGroupGUID='" & oGroup.ReplicationGroupGUID & "'")
        For Each oConnection In colRGConnections
          ' Report only enabled outbound connections: this server sends, the partner receives
          If oConnection.Enabled = True And oConnection.Inbound = False Then
            strJson = strJson & strSep & "{" & _
              """{#GROUP}"":""" & oGroup.ReplicationGroupName & """," & _
              """{#FOLDER}"":""" & oFolder.ReplicatedFolderName & """," & _
              """{#SENDING}"":""" & strComputer & """," & _
              """{#RECEIVING}"":""" & oConnection.PartnerName & """}"
            strSep = ","
          End If
        Next
      Next
    Next
    WScript.Echo strJson & "]}"
    


    Admittedly, the script is not a model of elegant code and could certainly be simplified further, but it does its main job: it provides the parameters of the replication groups in a format that Zabbix understands.


    Adding the scripts to the Zabbix agent configuration


    Everything is extremely simple here. At the end of the agent configuration file, add the lines:

    UserParameter=check_dfsr[*],cscript /nologo "C:\Program Files\Zabbix Agent\get-Backlog.vbs" "$1" "$2" "$3" "$4"
    UserParameter=discovery_dfsr[*],cscript /nologo "C:\Program Files\Zabbix Agent\DFSRDiscovery.vbs"
    

    Adjust the paths, of course, to wherever your scripts are located. I put them in the same folder where the agent is installed.

    After making the changes, restart the Zabbix agent service.

    Changing the account that runs the Zabbix agent service


    To obtain information through dfsrdiag, the utility must be run under an account that has administrative rights on both the sending and the receiving member of the replication group. The Zabbix agent service, which by default runs under the system account, cannot execute such a request. I created a separate domain account, gave it administrative rights on the necessary servers and configured the service on those servers to run under that account.

    You can also go another way: since dfsrdiag in fact works through the same WMI, you can follow the description of how to give a domain account the right to use it without granting administrative rights. If there are many replication groups, however, granting rights for each group will be laborious. Still, if we want to monitor SYSVOL replication on domain controllers, this may be the only acceptable option, since giving domain administrator privileges to the monitoring service account is not the best idea.

    Monitoring template


    Based on this data, I created a template that:
    • Runs automatic discovery of replication groups once an hour,
    • Checks the backlog size for each group every 5 minutes,
    • Contains a trigger that raises an alert when the backlog size of any group stays above 100 for 30 minutes. The trigger is defined as a prototype that is added automatically to each discovered group,
    • Builds backlog size graphs for each replication group.
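
    As an illustration of the item and trigger prototypes described above, the fragment below shows roughly how they could be written in Zabbix 2.2 expression syntax. The template name "Template DFSR" is hypothetical and the actual template may be named and structured differently; only the check_dfsr key and the macro names come from the configuration shown earlier.

```text
Item prototype key:
  check_dfsr["{#GROUP}","{#FOLDER}","{#SENDING}","{#RECEIVING}"]

Trigger prototype expression (the minimum over 30 minutes is above 100):
  {Template DFSR:check_dfsr["{#GROUP}","{#FOLDER}","{#SENDING}","{#RECEIVING}"].min(30m)}>100
```

    Using min() over the period means the trigger fires only if every value collected in the last 30 minutes exceeded 100, so short bursts of backlog do not raise an alert.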

    Download the template for Zabbix 2.2 here.

    Summary


    After importing the template into Zabbix and creating an account with the necessary rights, all that remains is to copy the scripts to the file servers whose DFSR we want to monitor, add two lines to the agent configuration on each of them, and restart the Zabbix agent service after configuring it to run under the desired account. No other manual configuration is required to monitor DFSR.
