How I tested QoE (Quality of Experience)
Over the past six months I have kept hearing, at conferences and from acquaintances, about products built around the concept of Quality of Experience (QoE), that is, the quality of a service as the user perceives it. The term is becoming increasingly popular. A lot of research is going into new methods for measuring how users perceive a particular service, but I don't want to delve into the theory here; anyone interested can google it on their own.
The advertising messages say that this wonderful product can:
- Identify communication quality problems all the way down to the CPE (the subscriber's equipment),
- Boost LTV (lifetime value),
- Provide clickstream analytics, that is, track the sites subscribers visit, including competitors' sites,
- Find out how many virtual IP addresses are behind a real IP address,
- Determine what services / equipment subscribers use (SIP, OTT, smart home, smart TV, network equipment),
- Prioritize traffic and restrict certain protocols per user, based on L7,
- Integrate easily with OSS / BSS.
My main job is to find subscribers who are dissatisfied with the quality of their Internet and, more broadly, to fight subscriber churn in every way I can (I work in the service quality department of a telecom operator, and not the smallest one). According to the sweet words of sellers and marketers, this product (I will call it QoE for convenience) solves exactly that problem. But that is all theory, and you never really know until you check in practice. That is why I wanted to share with colleagues the practical side of this solution, stripped of its pretty marketing packaging.
Let me say up front that I will not name the vendor: otherwise this would be taken for advertising, and nobody is paying me for it. I will only say that it is a Russian manufacturer whose product line includes hardware-software solutions for URL filtering and DPI, plus a QoE product built on top of that DPI.
So I will tell and show you what functionality I was able to test, what problems came up during testing, and then sum up my subjective impressions.
I will not dwell on how I got the equipment for testing, since nothing special happened at that stage. In short, the vendor's partners from the NAG company contacted me quickly.
They asked for details about the tasks we want to solve with QoE, our network topology, traffic volume and a contact person at our company. After that they sent a link to a personal account where I could see the product live.
After logging in, you see a dashboard showing:
- Active subscribers,
- Subscribers with bad RTT / terrible RTT (Round Trip Time),
- Packets from subscribers / to subscribers,
- Average RTT.
To be honest, the notion of "terrible RTT" is rather vague: for some, 10 ms is terrible, for others 100 ms. After contacting the vendor's technical support, I found out that each provider defines the "terrible" threshold itself and sets it in the QoE configuration.
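To make the idea of operator-defined thresholds concrete, here is a minimal sketch. It is not the vendor's configuration format, which I do not know; the names `RttThresholds` and `classify_rtt` and the numbers in the usage example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RttThresholds:
    """Operator-chosen limits (hypothetical names, not the product's config keys)."""
    bad_ms: float
    terrible_ms: float


def classify_rtt(rtt_ms: float, limits: RttThresholds) -> str:
    """Label one RTT measurement according to the operator's own limits."""
    if rtt_ms >= limits.terrible_ms:
        return "terrible"
    if rtt_ms >= limits.bad_ms:
        return "bad"
    return "ok"


# Two operators can label the same 60 ms measurement differently.
strict = RttThresholds(bad_ms=10, terrible_ms=50)
relaxed = RttThresholds(bad_ms=100, terrible_ms=300)
print(classify_rtt(60, strict), classify_rtt(60, relaxed))
```

The point is only that "bad" and "terrible" are labels relative to a configuration, not properties of the measurement itself.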
Problems with specific subscribers
I immediately found problem subscribers with serious delays, for example more than 4.5 ms.
Here you can see details of the client equipment, in this case a TP-LINK device. The subscriber's cable length and CRC errors are also visible.
A quick reference, just in case: a cyclic redundancy check (CRC) is a way to detect accidental changes in blocks of data. This kind of error detection is especially useful when sending packet data over a network, such as SynqNet. While the packet error counter tracks missing or invalid packets, the CRC error counter tracks whether the data inside the packets arrived intact.
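The CRC idea can be illustrated with Python's standard `zlib.crc32`. This is only a sketch of the principle (append a checksum, recompute it on receipt); it does not reproduce how a switch port computes its frame check sequence, and the helper names `with_crc` and `crc_ok` are my own.

```python
import zlib


def with_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload, as a sender would."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC on the receiving side and compare with the trailer."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


frame = with_crc(b"hello subscriber")

# Flip a single bit in the first byte, as noise on a damaged cable might.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print(crc_ok(frame), crc_ok(corrupted))
```

Every frame that fails such a check increments the CRC error counter on the port, which is why a high count points at the physical line.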
The conclusion is that either the cable is damaged or the problem is inside the apartment.
Trunk-level problems can be derived too: for example, you can segment subscribers into groups by area, time of day and RTT. The data can be filtered and grouped by various criteria:
- Number of CRC errors per week,
- Access switch,
- Trunk switch,
- Subscriber device vendor,
- Cable length.
Let's display the list of trunk switches, filtered by area. The result shows the number of subscribers on each trunk switch: 99 on one, 64 on another, and so on. You can also see the average delay on each trunk switch.
I was most interested in finding the slowest switches. As you can see from the screenshot above, this is the very first switch in the list, with 99 subscribers. We can open its data and see what is wrong with it.
We click on the switch, filter by the "Main switch" criterion and group by the "Access switch" criterion. This shows which access switch is the "worst" on the given trunk switch:
As a result, the worst access switch, the one with the largest RTT, is highlighted in red.
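The same filter-then-group step can be sketched over exported session records. The record layout (`trunk_switch`, `access_switch`, `rtt_ms`) and the sample values are assumptions for illustration; the product does this in its UI and I have not seen its data model.

```python
from collections import defaultdict
from statistics import mean


def mean_rtt_by(records, key):
    """Group records by the given field and average their RTT."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["rtt_ms"])
    return {name: mean(values) for name, values in groups.items()}


def worst_access_switch(records, trunk):
    """Filter by trunk switch, group by access switch, return the slowest one.

    Raises ValueError if the trunk switch has no records.
    """
    on_trunk = [r for r in records if r["trunk_switch"] == trunk]
    by_access = mean_rtt_by(on_trunk, "access_switch")
    return max(by_access, key=by_access.get)


# Made-up sample data.
sessions = [
    {"trunk_switch": "trunk-1", "access_switch": "acc-1", "rtt_ms": 3.0},
    {"trunk_switch": "trunk-1", "access_switch": "acc-1", "rtt_ms": 5.0},
    {"trunk_switch": "trunk-1", "access_switch": "acc-2", "rtt_ms": 12.0},
    {"trunk_switch": "trunk-2", "access_switch": "acc-3", "rtt_ms": 40.0},
]
print(worst_access_switch(sessions, "trunk-1"))
```

Note that `acc-3` has the highest RTT overall but is ignored here, because the filter restricts the comparison to one trunk switch.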
Now we open this worst access switch and see this picture:
We see a lot of sessions with long delays. Looking at the subscribers on this access switch, you can spot those with errors on the port, which points to cable problems. In the screenshot below, subscriber number 12 is visible, with 898 errors.
You can also immediately notice subscribers with a large RTT, for example 10.5.
We go to the subscriber and see the following picture:
In every five-minute interval, the retransmissions show about 2% loss for this subscriber. Most likely he needs to replace his Wi-Fi router, and that is exactly what this client should do.
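As a rough illustration of how such a figure could be derived, the sketch below turns per-interval counters into a retransmission percentage. The counter pairs and function names are assumptions; I do not know how the product actually computes its loss estimate.

```python
def retransmit_pct(windows):
    """Retransmission share per interval, in percent.

    `windows` is a list of (packets_sent, packets_retransmitted) pairs,
    one pair per five-minute interval. Empty intervals count as 0%.
    """
    return [100.0 * retx / sent if sent else 0.0 for sent, retx in windows]


def steadily_lossy(windows, threshold_pct=2.0):
    """True if every interval with traffic reaches the threshold."""
    busy = [(sent, retx) for sent, retx in windows if sent]
    return bool(busy) and all(p >= threshold_pct for p in retransmit_pct(busy))


# Made-up counters for three consecutive five-minute intervals.
subscriber = [(1000, 20), (500, 10), (0, 0)]
print(retransmit_pct(subscriber), steadily_lossy(subscriber))
```

Requiring the threshold in every busy interval separates a persistently bad link, such as a failing router, from a one-off burst.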
Subscribers with steadily bad Internet
This is one of the main reasons I decided to try QoE. You can list all subscribers with consistently poor RTT and work with each one individually. For example, let's open the statistics for subscriber #3 in the list.
This subscriber has no errors on the port, and the cable is 37 meters long. Most likely the trouble lies inside the subscriber's apartment.
As I understand it, it works like this: the MAC addresses of subscriber devices are taken from the DHCP server, and from them all the Wi-Fi router vendors are derived:
Zyxel turned out to be the most popular vendor, with 9307 subscribers and a normal RTT.
The worst ones are further down, with an RTT of 15.2 and lower.
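A vendor can be derived from a MAC address because the first three bytes (the OUI) are assigned to the manufacturer. The sketch below assumes that mechanism; the OUI prefixes and vendor names in the table are deliberately fake placeholders, and a real lookup would use the IEEE registry.

```python
from collections import defaultdict
from statistics import mean

# Fake placeholder prefixes, NOT real vendor assignments.
OUI_TO_VENDOR = {"AABBCC": "VendorA", "112233": "VendorB"}


def vendor_of(mac: str) -> str:
    """Map a MAC address to a vendor via its first three bytes."""
    prefix = mac.replace(":", "").replace("-", "").upper()[:6]
    return OUI_TO_VENDOR.get(prefix, "unknown")


def rtt_by_vendor(leases):
    """Return (vendor, subscriber_count, mean_rtt) rows, worst RTT first.

    `leases` is an iterable of (mac, rtt_ms) pairs, one per subscriber.
    """
    groups = defaultdict(list)
    for mac, rtt_ms in leases:
        groups[vendor_of(mac)].append(rtt_ms)
    rows = [(vendor, len(rtts), mean(rtts)) for vendor, rtts in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up leases.
leases = [
    ("aa:bb:cc:00:00:01", 5.0),
    ("AA-BB-CC-00-00-02", 7.0),
    ("11:22:33:00:00:01", 20.0),
]
for row in rtt_by_vendor(leases):
    print(row)
```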
I found another feature: a list of subscribers with their number of sessions.
Subscribers with a huge number of sessions stand out immediately.
Let's open subscriber number 1. In the "Clickstream Logs" section you can see how many devices the subscriber currently has:
As we can see, the subscriber has 100 devices. Such a subscriber is almost certainly reselling the Internet. What to do with him? We, for example, plan to move such subscribers to a legal-entity (business) contract.
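One way such a device count could be approximated from clickstream data is by counting distinct user-agent strings per subscriber. That is my assumption, not a description of the product's method, and the 20-device cut-off is an arbitrary example value.

```python
from collections import defaultdict


def devices_per_subscriber(clickstream):
    """Count distinct user agents per subscriber.

    `clickstream` is an iterable of (subscriber_id, user_agent) pairs.
    Distinct user agents are only a rough proxy for distinct devices.
    """
    seen = defaultdict(set)
    for subscriber_id, user_agent in clickstream:
        seen[subscriber_id].add(user_agent)
    return {sub: len(agents) for sub, agents in seen.items()}


def suspected_resellers(clickstream, max_devices=20):
    """Subscribers whose device count exceeds an operator-chosen limit."""
    counts = devices_per_subscriber(clickstream)
    return sorted(sub for sub, n in counts.items() if n > max_devices)


# Made-up log: subscriber 1 shows 100 different agents, subscriber 2 shows 2.
log = [(1, f"agent-{i}") for i in range(100)] + [(2, "phone"), (2, "laptop"), (2, "phone")]
print(devices_per_subscriber(log), suspected_resellers(log))
```

Identical devices share a user agent and one device can send several, so a result like this is a reason to look closer rather than proof of reselling.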
Everything here seems simple: the clickstream shows which devices subscribers use, which sites they visit and which browsers they use. This information is not that interesting to me, but our marketers turned out to need it. For example, they are interested in the following scenarios:
1) Selling our TV service to subscribers who have a Smart TV. To do this, you can filter by "user-agent: SmartTV" and list the Smart TV owners. The rest is routine: call the customers or send them a letter offering a tariff with TV.
2) Finding potentially dissatisfied customers who are looking at competitors' sites. In the same "Clickstream logs" section, we enter the competitor's URL in the "domain" field and get the following list:
Additionally, you can go back to the very beginning of my test and check these subscribers' RTT quality (maybe the subscriber has a problem with his Wi-Fi router).
This information can be passed on to marketing and the call center; they know what to do with it. At a minimum, they will talk to subscribers about how satisfied they are with the quality of our services.
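Both marketing scenarios above boil down to simple filters over clickstream rows. The sketch below assumes rows with `subscriber_id`, `user_agent` and `domain` fields, which is my guess at a plausible export format; `competitor.example` is a placeholder domain.

```python
def smart_tv_owners(rows):
    """Subscribers with at least one request whose user agent mentions SmartTV."""
    return sorted({r["subscriber_id"] for r in rows
                   if "smarttv" in r["user_agent"].lower()})


def visitors_of(rows, domain):
    """Subscribers who visited the domain or any of its subdomains."""
    domain = domain.lower()
    hits = set()
    for r in rows:
        host = r["domain"].lower()
        if host == domain or host.endswith("." + domain):
            hits.add(r["subscriber_id"])
    return sorted(hits)


# Made-up rows.
rows = [
    {"subscriber_id": 7, "user_agent": "Mozilla/5.0 (SmartTV; Linux)", "domain": "video.example"},
    {"subscriber_id": 8, "user_agent": "Mozilla/5.0 (Windows NT 10.0)", "domain": "www.competitor.example"},
    {"subscriber_id": 9, "user_agent": "Mozilla/5.0 (Windows NT 10.0)", "domain": "notcompetitor.example"},
]
print(smart_tv_owners(rows), visitors_of(rows, "competitor.example"))
```

Matching on the exact domain or a dot-separated suffix avoids false positives such as `notcompetitor.example`.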
There is a connection log function that lets you determine how many virtual addresses sit behind one real address.
In effect, this graph shows how densely the provider's NAT is packed. It shows that the NAT can still be compacted further.
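NAT density in this sense is just the number of distinct private addresses seen behind each public address. Here is a minimal sketch over a connection log, assuming each entry is a (private IP, public IP) pair; the addresses are documentation and private ranges, not real ones.

```python
from collections import defaultdict


def nat_density(connections):
    """Count distinct private addresses behind each public address.

    `connections` is an iterable of (private_ip, public_ip) pairs.
    """
    inside = defaultdict(set)
    for private_ip, public_ip in connections:
        inside[public_ip].add(private_ip)
    return {public: len(private) for public, private in inside.items()}


# Made-up log; repeated connections from one host are counted once.
connection_log = [
    ("10.0.0.1", "203.0.113.1"),
    ("10.0.0.2", "203.0.113.1"),
    ("10.0.0.1", "203.0.113.1"),
    ("10.0.0.3", "203.0.113.2"),
]
print(nat_density(connection_log))
```

If most public addresses carry only a few private ones, more subscribers can share each address, which is what "compacting" the NAT means here.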
Here you can see the top autonomous systems and applications.
You can look at the communication quality of a specific application and see which autonomous system it is served from, for example World of Tanks:
Nothing unusual here: the traffic comes from GCORE, and there are no particular slowdowns.
You can go to the GCORE autonomous system and see what other traffic flows between it and us.
You can also create an interesting filter: for example, show Russian AS for which the delay is more than 16 ms.
In other words, you can see where peering goes through the West.
As a result, we get the list of AS:
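The filter described above (Russian AS with a delay over 16 ms) can be sketched as follows. The row layout is assumed, and the AS numbers are taken from the range reserved for documentation, so they do not refer to real networks.

```python
def slow_as(as_stats, country="RU", min_rtt_ms=16.0):
    """Autonomous systems in one country whose RTT exceeds the limit, slowest first."""
    hits = [row for row in as_stats
            if row["country"] == country and row["rtt_ms"] > min_rtt_ms]
    return sorted(hits, key=lambda row: row["rtt_ms"], reverse=True)


# Made-up statistics.
as_stats = [
    {"asn": 64496, "country": "RU", "rtt_ms": 4.0},
    {"asn": 64497, "country": "RU", "rtt_ms": 38.0},
    {"asn": 64498, "country": "RU", "rtt_ms": 21.0},
    {"asn": 64499, "country": "DE", "rtt_ms": 45.0},
]
print([row["asn"] for row in slow_as(as_stats)])
```

A domestic AS with an unexpectedly high RTT is a hint that the path to it leaves the country and comes back.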
Overall, the product suits my tasks: I could easily find all problem subscribers with an RTT above 4-5 s, along with the likely cause (broken cable, viruses, etc.) and the "problem" areas, down to the street and the subscribers' IP addresses. I also want to highlight a useful feature: finding subscribers who are already considering an escape to competitors.
What I would like to see in future versions of the product is automation. For example, when the system finds subscribers who have started visiting competitors' websites, it would be convenient to receive email notifications about such events.
Also on the automation side, it would be convenient to integrate it with our VoIP, so that when a subscriber hits a "terrible" RTT, our call center automatically calls that customer with a predefined script.
But for now we, as a provider, will have to tackle customer churn jointly with customer support and the call center, in manual or semi-manual mode. In the near future we want to move from testing to implementation.
PS: If there is interest, I can tell you how we worked with subscribers who wanted to run away to competitors, and also how we will integrate this product into our network.
In general, write in the comments which of these topics you would like to see as the next article, and I will send the material to the site's editors - maybe they will agree to publish my creations.