Monday, September 8, 2008

Congestion Avoidance and Control

Following the congestion collapses that the early ARPANET/Internet suffered, several new algorithms were developed to avoid congestion rather than merely react to it. In particular, the paper discusses RTT estimation, exponential retransmit-timer backoff, slow-start, more aggressive receiver acking, and dynamic window sizing on congestion. The paper points out that the main cause of the collapse was not errors in the design of the protocol, but missteps in its implementation.

The paper first introduces slow start, which focuses on getting the connection to equilibrium. The idea is to grow the packet window based on the acks that come back, so each returning ack clocks out new data and the window opens quickly up to the path's capacity. Once that upper limit is reached, the system tries to maintain this level of transfer, partly by using better RTT estimation. In order to stay at equilibrium, the new implementation adds congestion avoidance so that the window adapts to the capacity of the network.

This scheme uses the same principles as the previous paper, namely additive increase and multiplicative decrease: cwnd is increased by 1/cwnd on every ack (additive increase) and the window is halved on any loss (multiplicative decrease), in addition to exponential backoff of the retransmit timer. However, it does not use the suggested additional "congestion bit"; instead it relies on existing signals, chiefly packet loss, to detect congestion.
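
As a very rough sketch of the window dynamics described above (my own simplification in Python, not Jacobson's 4.3BSD code; the ssthresh name is borrowed from later TCP terminology), the sender keeps a congestion window that grows exponentially during slow start, grows linearly during congestion avoidance, and is cut back when a loss is detected:

    # Rough sketch, not the actual 4.3BSD implementation: slow start below
    # ssthresh, additive increase above it, multiplicative decrease on loss.

    class CongestionWindow:
        def __init__(self):
            self.cwnd = 1.0        # congestion window, in segments
            self.ssthresh = 64.0   # slow-start threshold (arbitrary initial value)

        def on_ack(self):
            if self.cwnd < self.ssthresh:
                # Slow start: one extra segment per ack, so the window
                # roughly doubles every round trip.
                self.cwnd += 1.0
            else:
                # Congestion avoidance: cwnd += 1/cwnd per ack, i.e. additive
                # increase of about one segment per round trip.
                self.cwnd += 1.0 / self.cwnd

        def on_loss(self):
            # Multiplicative decrease: remember half the current window as the
            # new threshold, then re-enter slow start from one segment.
            self.ssthresh = max(self.cwnd / 2.0, 2.0)
            self.cwnd = 1.0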

This paper was much more well-rounded than the previous one, and discussed more of the implementation details and their direct application. The two papers share many of the same goals; this one simply looks at every aspect of a TCP transaction and determines what needs to be done at each stage to achieve those goals.

Relegating the RTT calculation to an appendix seemed like the wrong choice - I would have preferred that the implementation details were placed in the body of the paper at the appropriate stage. Even though it may not have flowed as elegantly, it would have given a more complete view of the improvements.

This is really some of the early work on congestion avoidance as opposed to congestion control, and it was clearly only meant to be a starting point. Nevertheless, many of the techniques used today remain the same as presented in this paper, as they have stood the test of time. Perhaps implementing further congestion avoidance would require additions to the protocol instead of additions to the implementation.

Analysis of Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

This paper presents a largely mathematical analysis of different algorithms for congestion avoidance. The authors show that congestion control by itself lets the network reach the point where throughput drops steeply and response time rises steeply, so they instead focus on congestion avoidance, i.e. using preemptive measures to keep congestion from occurring in the first place.

In particular, the paper looks at behavior driven by a single bit of information coming from the network - a "congestion experience" bit. It looks at how different policies behave as the system moves from no congestion to congestion and back, and what kind of state each policy leads to. The main goals were as follows:

efficiency - maximize throughput by pushing the state of the system as close to the "knee" as possible
fairness - equal bandwidth to all users
distributedness - the control must work without knowledge of the full state of the system
convergence - the state of the system settles around the desired operating point, oscillating around it as the congestion bit toggles rather than diverging

The methods analyzed were additive increase/additive decrease, multiplicative increase/additive decrease, multiplicative increase/multiplicative decrease, and additive increase/multiplicative decrease. Through thorough mathematical analysis, the paper shows that the only method that satisfactorily meets all of the goals is additive increase with multiplicative decrease.
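
To make the comparison concrete, here is a toy two-user simulation of the additive increase/multiplicative decrease policy (my own illustration, not code from the paper); the binary feedback is simply whether total demand exceeds capacity, and the unfair starting allocations converge toward an equal split while their sum oscillates around the capacity:

    # Toy two-user AIMD simulation (my own illustration, not from the paper).
    # Each step the network returns one bit: 1 if total demand exceeds capacity.

    CAPACITY = 100.0
    A_INC, B_DEC = 1.0, 0.5      # additive increase step, multiplicative decrease factor

    def aimd_step(x, congested):
        return x * B_DEC if congested else x + A_INC

    x1, x2 = 85.0, 10.0          # deliberately unfair starting allocations
    for t in range(200):
        congested = (x1 + x2) > CAPACITY
        x1 = aimd_step(x1, congested)
        x2 = aimd_step(x2, congested)

    print(round(x1, 1), round(x2, 1))   # the two allocations end up nearly equal,
                                        # oscillating just around CAPACITY / 2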

The scope of this paper was fairly limited, focusing only on this one particular piece of congestion avoidance. The depth of the mathematical proofs made it a bit difficult to read at times, but it does show quite convincingly that additive increase with multiplicative decrease is the way to go. I would have liked to see the paper talk a bit more about feedback from the system other than the single "congestion" bit. Dropped packets are left out (although presumably a drop could simply be interpreted as a congestion signal), and no other congestion avoidance methods are mentioned.

Clearly this is only one of the ways that congestion avoidance can be implemented, and I believe this paper was intended to be one piece of a larger picture. I thought it wove in the idea of convergence well, and the binary-feedback example does allow the convergence to be emphasized. The diagrams were also a welcome supplement to the mathematical proofs.

Thursday, September 4, 2008

On Inferring Autonomous System Relationships in the Internet

ASes connect to each other in several different kinds of relationships - customer-provider, peering, or sibling relationships. However, companies are quite secretive about what their actual relationships with other ASes are. This paper uses several heuristics to try to infer the relationships between ASes. The connections between ASes can be thought of as a graph, with each vertex being an AS and each edge being a relationship to another AS. The degree of a vertex gives an idea of how big an AS is. Furthermore, customer-provider edges are directed, while peering and sibling edges are undirected.

Based on the nature of these relationships, ASes have selective export policies whose effects can be observed in routing tables and classified into patterns that help determine the relationship between ASes. After some initial tweaking of the heuristic, the authors were able to confirm most of their inferences against internal AT&T information and against WHOIS databases. A portion of the incorrect inferences are posited to be the result of misconfigured routers that do not follow selective export policies.
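
A minimal sketch of the core degree-based idea (my own reduction of the paper's heuristic - it leaves out the sibling and peering refinements, and the data structures are hypothetical): for each observed AS path, the highest-degree AS is taken to be the top provider, ASes on the uphill side are inferred to be customers of their successors, and ASes on the downhill side customers of their predecessors.

    # Sketch of the basic inference idea only (my own simplification of the
    # paper's heuristic). as_paths is a list of AS paths observed in BGP
    # routing tables; degree maps each AS to its number of distinct neighbors.

    from collections import defaultdict

    def infer_transit_pairs(as_paths, degree):
        # transit[(u, v)] is True if u appears to be a customer of v.
        transit = defaultdict(bool)
        for path in as_paths:
            # The highest-degree AS on the path is assumed to be the top provider.
            top = max(range(len(path)), key=lambda i: degree[path[i]])
            for i in range(top):                 # uphill: customer -> provider edges
                transit[(path[i], path[i + 1])] = True
            for i in range(top, len(path) - 1):  # downhill: provider -> customer edges
                transit[(path[i + 1], path[i])] = True
        return transit

    # If transit holds in both directions for a pair of ASes, the paper would
    # classify them as siblings; if it holds in only one, as customer-provider.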

Reading the Interdomain Internet Routing paper helps in understanding the different types of relationships between ASes and the motivation behind selective export policies. These relationships, and thus routing in the wide area, are constantly changing as a result of the corporate nature of the system: ISPs can buy each other out, ASes can change providers at any point in time, and so on.

The paper notes that ASes themselves should probably check their routing policies against their AS relationships, as there were instances of misconfigured routers. It is interesting that the heuristic detected this, and it is somewhat surprising that anomalies like this have persisted. Furthermore, an AS does not necessarily know the relationships of another AS, so by using techniques similar to these, ASes could potentially negotiate better contractual agreements with each other.

Wednesday, September 3, 2008

Interdomain Internet Routing

Interdomain routing is concerned with connecting ISPs, or Autonomous Systems (ASes), of different sizes to allow maximum connectivity on the internet. The protocol that lets these ASes exchange routes is BGP. Connections between ASes come in two types: transit, where one AS pays another for connectivity, and peering, where two ASes agree to forward traffic destined for each other because there is roughly equal traffic in both directions. ASes want to prioritize routes through their own transit customers, who pay them, and want to avoid routes they would have to pay for themselves. BGP was designed with scalability in mind and with the assumption that multiple competitors would be running the internet, so it allows each of them to enforce its own routing policies while still cooperating enough to ensure connectivity.

An AS is composed of edge routers that connect to other ASes and internal routers that connect only within the same AS. Edge routers speak to other edge routers through eBGP sessions, and all routers within an AS communicate with each other using iBGP sessions. When eBGP-learned routes are distributed to all of the routers, the two main goals are to avoid loops and to make sure every router has a complete view of all routes. BGP then uses route attributes to decide which route to take. The first priority is LOCAL PREF, which ensures that routes through transit customers are given priority. Next comes ASPATH length, the number of ASes traversed on the route. Then MED is compared if there is a financial agreement; essentially this allows a paying AS to rank certain routes so as to avoid extra cost. If these attributes alone cannot decide on a path, externally learned (eBGP) routes are preferred over internal (iBGP) ones, then the route with the shortest internal path to the exit router is picked, and finally the tie is broken based on the IP addresses of the routers.
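
As a hedged sketch of that ranking (the attribute and field names here are my own simplification; a real BGP implementation has more steps, and MED is only compared between routes learned from the same neighboring AS), the decision process amounts to sorting candidate routes by a tuple of attributes in priority order:

    # Rough sketch of the route-selection order described above (my own
    # simplification; real implementations differ in details).

    from dataclasses import dataclass

    @dataclass
    class Route:
        local_pref: int        # higher is better (customer routes get high values)
        as_path_len: int       # fewer AS hops is better
        med: int               # lower is better
        learned_via_ebgp: bool # prefer eBGP-learned over iBGP-learned routes
        igp_cost: int          # lower cost to the exit point ("hot potato")
        router_id: int         # final tie-break

    def preference_key(r: Route):
        # Tuples compare element by element, so listing the attributes in
        # priority order (negated where higher is better) reproduces the ranking.
        return (-r.local_pref, r.as_path_len, r.med,
                0 if r.learned_via_ebgp else 1, r.igp_cost, r.router_id)

    def best_route(candidates):
        return min(candidates, key=preference_key)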

This paper packed in a lot of information, but it was quite interesting, particularly the fact that the system was designed with competing corporations in mind. The 'hot potato' principle is a bit bothersome, because it essentially means that in these peering situations, the route from A to B and the route from B to A are likely not to be the same. Are there situations where having these asymmetric routes could cause problems, with different latencies and so on?

Multi-homing was another interesting topic of the paper. Essentially the paper says that multi-homing is infeasible for smaller customers, since routers typically don't keep routes for anything smaller than a /19 prefix. It is also hard to designate a route as a 'backup', since each AS does its own route computation and might ignore MEDs. The main hack to get around this is to pad the ASPATH with repetitions of the same AS so that the main route is always preferred unless it goes down.
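
For illustration (hypothetical AS numbers, and ignoring every attribute other than path length), the padding trick looks like this: the customer advertises its prefix with a longer ASPATH over the backup provider, so remote ASes only choose that route when the primary one disappears.

    # Hypothetical illustration of ASPATH padding for a backup link.
    primary_path = [65001]                  # ASPATH advertised via the primary provider
    backup_path = [65001, 65001, 65001]     # same prefix, padded, via the backup provider

    # With other attributes equal, a remote AS prefers the shorter ASPATH, so the
    # backup route is only used once the primary advertisement is withdrawn.
    available = [primary_path, backup_path]
    best = min(available, key=len)
    print(best)                             # -> [65001]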

Monday, September 1, 2008

The Design Philosophy Of The DARPA Internet Protocols

This paper gives the reader a perspective not always seen when studying an architecture - not just the final result, but the reasons the design ended up that way. The initial goals for the creation of the packet-switched network were responsible for many of the design decisions that shaped the internet. However, since the DARPA project was primarily a military effort, those goals were quite different from what a corporation's might have been, placing survivability first and accountability last.

The need to keep communication going even when a network node was lost led DARPA to remove state from such nodes, instead moving that state into the end systems themselves. The project also evolved to support multiple types of service after it became clear early on that certain applications would prefer to forgo reliability in exchange for speedy delivery and the occasional dropped packet, leading to the separation of TCP and IP and the creation of UDP. From the very beginning, the purpose of the project was to join many different types of networks together, so only a minimal standard was required of a network for it to carry internet traffic. There were other goals, but they simply did not get the same attention as the primary goals, and many were only approached as an afterthought.

Since the original intent of this project was largely military, it is quite interesting to see the consequences of a military-focused protocol being used by the public. Accountability was really an afterthought of the internet, most likely because the military could assume that all parties would cooperate. In the consumer world, however, there is much less trust, and potentially many more malicious actors.

The author also talks about the 'next' protocol and what considerations should go into it, but at this point, can there ever even be a next protocol? IPv6 has been around for 10+ years and has barely gained any ground in that time. And IPv6 changes only one part of the protocol suite - if IP alone cannot successfully be updated, then any bigger changes are surely going to be far more difficult to introduce. We have already used these protocols for a lot longer than the original designers intended, and the future does not look any brighter.

This is a great paper because it gives some context to the protocols that are standard today. You can never quite escape your past, so you might as well find out why. After all, I just read that the width of train tracks is directly tied to the width of two horses!

End-To-End Arguments In System Design

The "End-To-End" paper talks about a subject which is largely taken for granted today: separating complexity from lower levels. Using the main example of a reliable file transfer program, the paper shows some of the pitfalls of putting too much complexity into the lower levels. It opens with the definition of the "end-to-end argument" as an argument against over-implementing the lower levels of an architecture, offering instead that the implementation should be put in the layer where it is going to be used, which in this example is the application layer instead of the (lower level) communication layer.

The file transfer example illustrates how, even if reliability were built into the communication layer, the application layer would still have to verify the transmitted data, since other points of failure along the way could have dropped or corrupted it. Therefore, as long as the failure rate of the network is not too high, reliability should not be built into the communication layer; instead it belongs in the application layer, since the application already has to deal with data losses elsewhere. The paper then offers several other examples of mechanisms that could be put in the communication layer, and uses the end-to-end argument to show the problems that could arise if they were.
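
A toy sketch of the kind of end-to-end check the paper has in mind (my own construction, using a hash over the file contents): whatever the transport guarantees, the application compares a checksum computed at the source with one computed over the bytes that actually ended up on disk at the destination.

    # Toy end-to-end verification sketch (my own construction, not from the
    # paper): the application checks the whole transfer itself, because data
    # can be corrupted outside the communication system as well (on disk,
    # in buffers, during the copy, etc.).

    import hashlib

    def file_digest(path, chunk_size=1 << 16):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def transfer_ok(source_path, received_path):
        # The sender ships file_digest(source_path) along with the file; the
        # receiver recomputes the digest over what it actually stored.
        return file_digest(source_path) == file_digest(received_path)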

The end-to-end principle has come a long way since this paper, and it is easy to point out numerous areas where it applies in today's world of hardware and software. Our networks today follow this principle: instead of just a communication layer and an application layer, we have many layers in between, each offering varying levels of function. Each layer offers tools for the layer above to use, hoping to reduce redundancy by offering building blocks for higher layers. But the basics of our current system all trace back to ideals set out in this paper.

One of the most important points of the paper was identifying the ends. The paper gave the example of a strong end-to-end argument in digital voice communication, where retransmission-based reliability would introduce delays unacceptable for a conversation, and offered a counter-example of a speech message system, where you would want reliability built in at the lower levels. This is an interesting precursor to the choices we make today at the transport layer: the live voice conversation would use UDP, while the voice message system would use TCP.

This paper opened the door on end-to-end arguments, and it left it open for future research into the right balance of simplicity and complexity against cost. Today, that has evolved into our multi-layered communication stacks, which offer many different layers, each with a well-defined set of responsibilities.