DNS traffic
===========

The number of DNS queries observed for a name over a time period can be
retrieved. This is especially useful to see if a domain is popular, and to
spot anomalies in its traffic.

Getting the number of queries observed for a name
-------------------------------------------------

The ``daily_traffic_by_name`` method returns a vector with the number of
queries observed for each day, within a time period.

By default, the time period starts 7 days before the current day and ends at
the current day. Days start at 00:00 UTC.

.. code-block:: ruby

    db.daily_traffic_by_name('www.github.com')

The output is a ``Result::TimeSeries`` object:

::

    [
        [0] 6152525,
        [1] 4756714,
        [2] 4670300,
        [3] 5954983,
        [4] 6140915,
        [5] 6040669,
        [6] 5529869
    ]

This method accepts several options:

- ``start``: a ``Date`` object representing the lower bound of the time interval
- ``end``: a ``Date`` object representing the upper bound of the time interval
- ``days_back``: if ``start`` is not provided, this represents the number of
  days to go back in time.

Here are some examples featuring these options:

.. code-block:: ruby

    db.daily_traffic_by_name('www.github.com', end: Date.today - 2, days_back: 10)
    db.daily_traffic_by_name('www.github.com', start: Date.today - 10)

The traffic for multiple domains can be looked up, provided that a vector is
given instead of a single name. In that case, the output is a
``Result::HashByName`` object.

.. code-block:: ruby

    db.daily_traffic_by_name(['www.github.com', 'www.github.io'])

For example, the following snippet compares the median number of queries for
a set of domains:

.. code-block:: ruby

    ts = db.daily_traffic_by_name(['www.github.com', 'www.github.io'])
    # Replace each domain's time series with its median daily query count.
    ts.merge(ts) { |_name, series| series.median.to_i }

::

    {
        "www.github.com" => 5954983,
        "www.github.io" => 528002
    }

Anomaly detection in traffic
----------------------------

A benign web site tends to have comparable traffic every day.
Sudden spikes or drops in traffic usually indicate a major event (incident,
unusual volume of sent email), or some suspicious activity.

Domain names used as C&C typically receive very little traffic, and suddenly
get a spike of traffic for a short period of time. The same can be observed
with compromised hosts acting as intermediaries.

After having retrieved the traffic for a name, computing the relative
standard deviation is a simple and efficient way to detect anomalies.

To do so, the library includes the ``descriptive_statistics`` module and
implements a ``relative_standard_deviation`` method. This method can work on
the time series of a single domain, as well as on a set of multiple time
series.

.. code-block:: ruby

    ts = d.daily_traffic_by_name(['skyrock.com', 'github.com', 'ooctmxmgwigqt.info'])
    ap d.relative_standard_deviation(ts)

This outputs either a ``Response::TimeSeries`` or a ``Response::HashByName``
object:

::

    {
        "skyrock.com" => 2.4300100908269657,
        "github.com" => 10.628632305278618,
        "ooctmxmgwigqt.info" => 244.18566965045403
    }

In this example, we can clearly spot a domain name whose traffic doesn't
follow what we usually observe for a benign domain.

High-pass filter
----------------

Domains receiving little traffic frequently receive more noise (bots,
internal traffic) than queries sent by actual users.

A simple high-pass filter sets to 0 all entries of a time series that are
below a cutoff value. This is provided by the ``high_pass_filter`` method:

.. code-block:: ruby

    ts = d.high_pass_filter(ts, cutoff: 5.0)

This method works on the time series of a single domain, as well as on a set
of multiple time series. The result is either a ``Response::TimeSeries`` or a
``Response::HashByName`` object.
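To illustrate what the filter and the relative standard deviation compute, here is a minimal standalone sketch in plain Ruby that works on ordinary arrays and does not call the library. The helpers ``high_pass`` and ``rsd`` are hypothetical names, not part of the library's API. The sketch assumes that the relative standard deviation is the population standard deviation expressed as a percentage of the mean, and that only values strictly below the cutoff are zeroed. The library's exact behavior may differ.

```ruby
# Hypothetical helpers for illustration only; not the library's implementation.

# Set to 0 every entry strictly below the cutoff (boundary handling is an assumption).
def high_pass(series, cutoff:)
  series.map { |v| v < cutoff ? 0 : v }
end

# Relative standard deviation, as a percentage of the mean.
# Uses the population standard deviation (an assumption).
def rsd(series)
  return 0.0 if series.empty?
  mean = series.sum.to_f / series.size
  return 0.0 if mean.zero?
  variance = series.sum { |v| (v - mean)**2 } / series.size
  Math.sqrt(variance) / mean * 100
end

# Made-up sample data: one steady domain, one with a single-day spike.
steady = [100, 110, 90, 105, 95, 100, 100]
spiky  = [2, 1, 3, 0, 900, 2, 1]

filtered = high_pass(spiky, cutoff: 5.0)

puts "steady: #{rsd(steady).round(2)}%"
puts "spiky:  #{rsd(filtered).round(2)}%"
```

With this made-up data, the steady series stays well under 10%, while the series with a one-day spike lands near 245%, the same order of magnitude as the suspicious domain in the example output above.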