How to determine the volume of your logs?

Published on June 09, 2018


    Good day!

    Today we will look at a question that everyone who handles logs, or is just starting to evaluate processing and storage solutions, eventually faces: what volume of logs per day / week / month will we receive from different systems, and what storage resources will they require?
    It is hard to say for sure, but we will try to help you estimate the volumes based on our experience.

    Our estimation method relies on statistics about the number of logs produced by various sources; all the values presented below are averages across a number of our logging projects.

    For example, take a few common sources:

    • Windows Event Logs
    • Windows domain
    • Cisco ASA
    • Cisco ESA
    • Cisco IPS
    • Cisco IOS
    • Palo Alto
    • *nix syslog
    • MSExchange-mail

    Collecting logs


    First, we measured the average number of bytes in a single event for each source. We then calculated the approximate number of events per day produced by one device of that source, and from those two numbers, how many GB of logs one device will generate: GB/day = bytes per event * events per day / 1024^3.

    Source            ~ bytes per event    Avg. events/day (approx.)    GB/day (approx.)
    WinEventLog       1150                 25 000                       0.03
    Windows Domain    1150                 250 000                      0.3
    Cisco ASA         240                  1 600 000                    0.35
    Cisco ESA         100                  200 000                      0.02
    Cisco IPS         1200                 500 000                      0.6
    Cisco IOS         150                  20 000                       0.003
    Palo Alto         400                  500 000                      0.2
    *nix syslog       100                  50 000                       0.005
    MSExchange-mail   300                  100 000                      0.03

    For example, for WinEventLog: 1150 * 25 000 / 1024^3 ≈ 0.03 GB/day.
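
    If you want to automate this napkin arithmetic, the table above translates into a few lines of Python. This is a minimal sketch: the figures in it are simply the averages from our table, not constants of any particular tool.

    # Per-source averages from the table above:
    # (bytes per event, events per day)
    SOURCES = {
        "WinEventLog":     (1150, 25_000),
        "Windows Domain":  (1150, 250_000),
        "Cisco ASA":       (240, 1_600_000),
        "Cisco ESA":       (100, 200_000),
        "Cisco IPS":       (1200, 500_000),
        "Cisco IOS":       (150, 20_000),
        "Palo Alto":       (400, 500_000),
        "*nix syslog":     (100, 50_000),
        "MSExchange-mail": (300, 100_000),
    }

    def gb_per_day(bytes_per_event, events_per_day):
        """Daily log volume of a single device, in GB."""
        return bytes_per_event * events_per_day / 1024 ** 3

    for name, (size, count) in SOURCES.items():
        print(f"{name:16} ≈ {gb_per_day(size, count):.3f} GB/day per device")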

    Next, to determine the total volume of logs, you need to decide how many devices you want to collect and store information from. For example, consider a case with 30 devices generating WinEventLog and one device each for Windows Domain, Cisco ESA, Cisco IPS, and Palo Alto.

    1150 * 25 000 * 30 + 1150 * 250 000 + 100 * 200 000 + 1200 * 500 000 + 400 * 500 000 = 1 970 000 000 bytes/day ≈ 1.8347 GB/day ≈ 12.8 GB/week ≈ 55 GB/month
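
    The same aggregation in code, written so it runs on its own; the device counts here are just the assumptions of this example:

    # The example infrastructure: 30 WinEventLog devices plus one device each
    # of the other sources; values are (bytes per event, events per day, devices).
    example = {
        "WinEventLog":    (1150, 25_000, 30),
        "Windows Domain": (1150, 250_000, 1),
        "Cisco ESA":      (100, 200_000, 1),
        "Cisco IPS":      (1200, 500_000, 1),
        "Palo Alto":      (400, 500_000, 1),
    }

    daily_bytes = sum(b * e * n for b, e, n in example.values())
    daily_gb = daily_bytes / 1024 ** 3
    print(f"{daily_bytes:,} bytes/day")       # 1,970,000,000
    print(f"≈ {daily_gb:.4f} GB/day")         # ≈ 1.8347
    print(f"≈ {daily_gb * 7:.1f} GB/week")    # ≈ 12.8
    print(f"≈ {daily_gb * 30:.0f} GB/month")  # ≈ 55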

    Of course, this calculation method can introduce a significant error, since the number of logs per day depends on many factors, for example:

    • Number of users and their roles
    • Enabled audit services
    • Required severity level
    • And much more

    The big plus of this method is that, given the statistics, the approximate volume of logs can be estimated even on a napkin. The minus is the potentially large error. If significant deviations are unacceptable, you can set up data ingestion from all sources into a test system; for example, Splunk provides a trial license with enough capacity to test a large number of sources. This approach gives an accurate result, but deploying any test system requires time, labor, and technical resources.

    Data storage


    Let us briefly touch on another log-related question: how many resources will be needed to store them?

    To answer it, you first need to understand in what form your log processing tool stores data. For example, ELK stores information about extracted fields along with the logs, which can inflate a single event up to 3 times, while Splunk stores the raw data with compression and keeps metadata separately from the events.

    Then you need to decide how long a period of historical data to keep, the "temperature" of the data, the RAID level, and so on. A convenient calculator can be found at this link.
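
    As a rough illustration, a retention estimate might look like the sketch below. The up-to-3x expansion factor for ELK comes from the observation above; the Splunk compression ratio and the RAID overhead are purely illustrative assumptions, not vendor-published numbers:

    # Rough storage estimate: daily volume * retention * tool-specific size
    # factor * RAID overhead. The factors are illustrative assumptions only.
    def storage_gb(daily_gb, retention_days, size_factor, raid_overhead=1.0):
        """Disk needed for the retention window, in GB."""
        return daily_gb * retention_days * size_factor * raid_overhead

    daily_gb = 1.8347  # from the example above

    # ELK with indexed fields may expand an event up to ~3x:
    print(f"ELK, 90 days:    ≈ {storage_gb(daily_gb, 90, 3.0):.0f} GB")

    # Splunk compresses raw data, say to ~0.5x (assumption); RAID 1 doubles disk:
    print(f"Splunk, 90 days: ≈ {storage_gb(daily_gb, 90, 0.5, raid_overhead=2.0):.0f} GB")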

    Conclusion


    One of the pressing issues that prompted us to look at log volumes is that the Splunk license depends on the amount of data indexed per day. If you want to use Splunk to process your logs, then once you have calculated the approximate volume, you can estimate the cost of the required license. The license calculator can be found here.

    How do you estimate the volume of your logs? Share your experience, tools, and interesting cases in the comments.