QOS Implementation details with RouterOS
About 2 months ago, I began experimenting with an approach to QOS that mimics much of the functionality of the NetEqualizer (http://www.netequalizer.com) product line. As I experimented with various techniques for limiting bandwidth utilization, I realized that the scope of the project I had undertaken was WAY more than I had initially bargained for. I dedicated more and more time to this project, however, because I was seeing real results from my tests. While most of my articles here have been tutorial in nature, this one is a little different. I have a lot of time invested in my approach to handling QOS on a network and have made this a commercial offering. I will attempt to describe some of the functionality in this short article.
My goals for the system I set out to create were as follows:
- I wanted a system that handled traffic “automatically” based on “behaviors” rather than simple protocol matching
- I wanted a system that would allow an administrator to determine what traffic was most important to HIS/HER network and provide for easy manipulation of the way the traffic priorities were handled
- I did NOT want to set up artificial speed limits on any traffic
- I wanted to ensure that the priority system did not disturb the operation of any specific type of traffic. Rather, I wanted to structure the prioritization in such a way that the most interactive traffic has priority over non-interactive traffic.
- I wanted a system that worked on the MikroTik platform, but could be ported to other Linux-based platforms (iptables). This goal required that the entire application rely on the firewall (mangle) rules and the queues alone.
I did end up meeting each of these goals. Before I continue, I must say that the approach I took is not necessarily a “new” approach. It is, as far as I know, unique in the MikroTik world, however. If anyone else has created a system like this, I’d be happy to exchange notes with you.
There are several issues to consider when building a QOS application. This is especially true when building for the MikroTik (http://www.mikrotik.com) platform, because many MikroTik users are running the RouterBoard (http://www.routerboard.com) products. I kept the limitations of the RouterBoard products in mind as I set out to build an implementation that uses as little processor as possible, while still offering the functionality I wanted. In the end, I had to compromise a little on the CPU question. It is certainly possible to support a very small number of users behind even the smaller RouterBoards, but the truth is that the circumstances in which such a small number of end users would need such an advanced approach to QOS are pretty limited.
The primary limiting factor with my approach to QOS on the RouterBoard platform is the packet rate being handled. It is not too hard to manage an aggregate rate of 1000 packets per second, but most of the folks I consult for are moving a significantly higher packet rate. My initial testbed routers were an RB433AH and an x86 box with a 1GHz CPU and 1GB of RAM. Once I had the final system in place on these 2 routers, I could generate about 4000 packets per second before the RB433AH would begin to steadily increase its CPU load. I got to about 9000 pps on the x86 platform before there was a significant increase in CPU utilization. These 2 numbers gave me a sort of “base” for further testing.
Another factor I had to consider is the question of which traffic “deserves” priority. My main goal was to create a system that required as little hands-on work as possible from the administrator who purchased it. At the same time, I wanted to offer enough flexibility for those who like to “tweak” a lot. I will describe my handling shortly, but it is important to understand that my solution is not intended to be a speed-limiting solution. I believe that a good QOS implementation should permit full access to the available bandwidth, while ensuring that the types of traffic that can negatively impact user experience are contained during times of peak network usage.
One final factor that must be considered is how we handle the actual traffic, that is, which type of queue discipline should be used. MikroTik offers FIFO, RED, SFQ and PCQ. Of course, the system was to be an HTB structure. In my mind, there was no choice other than PCQ. PCQ (per connection queueing) is a queue discipline that is VERY fair in how it distributes available bandwidth among its individual “sub-queues”. PCQ allows you to define the speed that any individual “classifier” (source address, destination address, etc.) is permitted, as well as how much bandwidth is guaranteed AND the maximum aggregate speed for the whole queue. SFQ and RED queues got some initial attention, but I decided that PCQ would give us the best results, since the target was to ensure that all comers in any given queue have equal access to the bandwidth.
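To make the PCQ idea concrete, here is a minimal sketch of the kind of queue types described above. The names, classifiers and limits are illustrative assumptions for this sketch, not values taken from my script:

```
# Hypothetical PCQ types: each active sub-queue (one per address) gets
# an equal share of whatever the parent HTB class allows.
/queue type
add name=pcq-down kind=pcq pcq-classifier=dst-address pcq-rate=0
add name=pcq-up kind=pcq pcq-classifier=src-address pcq-rate=0
```

With pcq-rate=0, no individual user is given an artificial speed limit; the classifier simply divides the parent class's bandwidth equally among whichever sub-queues are active at the moment.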
Because of the way the HTB is structured, it is important to understand that this type of application is intended to sit at various aggregation points on a network. In other words, it will fit at a POP location with just a hundred or so users, or it will sit at the head end of a network with 2000+ users. The primary difference in these 2 applications for my QOS system is, of course, hardware requirements.
Having laid down my goals and taken everything into account, here is the basic operation of what I have created. First, the queues:
- There is a defined amount of available bandwidth for upload and downloads
- The HTB structure is 3 levels deep with 2 (or more) interior classes each having 8 leaf classes. Each of these interior classes is rooted at the parent queue assigned to manage all interface traffic.
- This structure gives us the ability to define a guaranteed bandwidth for 2 “types” of classified traffic.
- We have an “interactive” class, which holds the traffic we consider interactive. This includes such traffic as email, dns, voip, http (bursty, not long downloads), etc.
- Additionally, we have a “non-interactive” class. This traffic will include things like P2P traffic, large http streams, connections resulting from users creating high connection counts, etc.
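The structure in the list above can be sketched as a queue tree. This is only an illustrative outline under assumed interface names, rates and packet marks (the real script's values and leaf count are configured per network):

```
# Hypothetical HTB layout for the download direction on ether1.
# limit-at guarantees a minimum to each interior class; max-limit lets
# either class borrow up to the full pipe when the other is idle.
/queue tree
add name=download parent=ether1 max-limit=20M
add name=interactive parent=download limit-at=14M max-limit=20M
add name=non-interactive parent=download limit-at=6M max-limit=20M
# each interior class carries 8 leaf classes, one per HTB priority;
# only the first two leaves of each are shown, the rest follow suit
add name=int-p1 parent=interactive priority=1 packet-mark=int-p1 max-limit=20M
add name=int-p2 parent=interactive priority=2 packet-mark=int-p2 max-limit=20M
add name=non-p1 parent=non-interactive priority=1 packet-mark=non-p1 max-limit=20M
add name=non-p2 parent=non-interactive priority=2 packet-mark=non-p2 max-limit=20M
```

In practice each leaf would also be assigned a PCQ queue type, so that bandwidth within any single leaf is shared fairly among the users classified into it.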
With this structure in place, we now just need to classify traffic in mangle. The mangle rules will “watch” traffic, classify certain protocols and pass these directly to the queues. This includes traffic such as voip, pop3, smtp and others. This is traffic that can be readily identified and is rarely a root cause of network issues. (NOTE: I am aware of “stealth” p2p traffic that will try to hide among these ports, but until it grows to be a significant issue, I have chosen not to add load on the system trying to find these programs.)

Other types of traffic, such as http downloads, must be examined more closely in order to accurately identify them as interactive vs. non-interactive. One problem we have here is that a streaming video from NetFlix (for example) is not easily distinguished from, say, an ISO image of your favorite Linux distro. What these 2 have in common is that both involve a large volume of data flowing inside a single stream. They are, however, different with respect to their level of “interactivity”. Because of this, we classify this type of traffic as high priority, but give it a smaller guaranteed minimum bandwidth than the known interactive traffic. Another type of traffic we look for is streaming video from sources such as Hulu. Again, this traffic is given high priority, but in the non-interactive interior class.

Finally, we have traffic that is typically a “problem” type of traffic. This classification includes such things as peer-to-peer, bit torrents, viruses and the like. While I do not go to great lengths to specifically identify viruses or torrents, what I DO try to identify is the traffic pattern that is common to this type of traffic. This traffic is put at low priority in the non-interactive interior class.
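As a rough illustration of the classification side, here are a few mangle rules of the general kind described above. The ports, byte threshold and mark names are assumptions for this sketch only, not the rules my script actually uses:

```
# Hypothetical mangle rules. Known interactive protocols are marked
# directly; long-running http transfers are re-marked as non-interactive
# once the connection has moved a large volume of data.
/ip firewall mangle
# sip/voip goes straight to a high-priority interactive mark
add chain=prerouting protocol=udp dst-port=5060 action=mark-connection \
    new-connection-mark=voip-conn passthrough=yes
add chain=prerouting connection-mark=voip-conn action=mark-packet \
    new-packet-mark=int-p1 passthrough=no
# http starts out as interactive (bursty page loads)
add chain=prerouting protocol=tcp dst-port=80 action=mark-connection \
    new-connection-mark=http-conn passthrough=yes
# a connection that has carried more than ~2MB is treated as a long
# download and moved to the non-interactive class
add chain=prerouting protocol=tcp connection-mark=http-conn \
    connection-bytes=2000000-0 action=mark-connection \
    new-connection-mark=bulk-conn passthrough=yes
add chain=prerouting connection-mark=http-conn action=mark-packet \
    new-packet-mark=int-p3 passthrough=no
add chain=prerouting connection-mark=bulk-conn action=mark-packet \
    new-packet-mark=non-p2 passthrough=no
```

Note the ordering: the connection-bytes rule re-marks the connection before the packet-mark rules run, so once a transfer crosses the threshold its subsequent packets fall through to the non-interactive mark automatically.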
Well, I have tried to describe plainly what my system attempts to do. It should be noted that we are NOT picking any particular content provider’s stream and reducing priority. We ARE allowing the firewall to make intelligent decisions based on the behavior of the network traffic it sees. I did mention the names of some sites above, but those are just examples of the TYPE of traffic that we are identifying. This script, complete with installation, is $200 (You can purchase it HERE). Finally, if you have any questions at all, feel free to EMAIL or give me a call at 573-276-2879.
As always, please DIGG this article if you find it of interest.