Monday, 3rd of November, 2014
Plenary session at 4 p.m.:
BRIAN NISBET: Hello. And we are reasonably sure this session should be Dalek‑free. So, as we start climbing up the roller coaster of RIPE plenary sessions into the second session of the day, we have two presentations and three lightning talks for you this afternoon to keep your brains awake after all the travelling you have been doing today and over the weekend. I am Brian and I will be chairing this session, and obviously the mandatory plug on behalf of the PC is to rate the talks. I think if the RIPE NCC weren't planning on giving prizes to the people who rated the talks already, Filiz shamed them into it earlier, so we are reasonably sure there will be some sort of reward there, and you can bring it up as a member at the GM as some sort of flagrant use of money on Wednesday afternoon.
First off this afternoon, we have Alan Mauldin on the flat earth theory: convergence of prices around the world.
ALAN MAULDIN: Good afternoon. So today's talk is going to be called the flat earth theory: convergence of prices around the world. So what is it? In the old days, people all thought the world was actually flat, until the ancient Greeks proved it is a sphere, but some folks thought, no, maybe it's actually flat, so they made these maps showing that the earth could indeed be flat. This trend has continued to the modern day with authors like Thomas Friedman, who say the world is flat. Obviously, he does not mean the world actually is flat; he is using it as a metaphor to describe the fact that, with technology and trade, the world is becoming flatter and closer together, a level playing field in many ways. But is it really that way? So I am going to try and look at the flat earth theory as it relates to global telecom prices.
So one obvious point is prices are always going down both for transport and for IP transit.
And have prices become flat meaning more similar across the major regions and routes, and if so, why is this happening? Or can we figure it out? And if it is happening, what are the prospects for even more flattening to occur in the future?
So let's get started with this analysis here. First, some words of caution: TeleGeography is a market research company. We get our data from operators around the world on an anonymous basis, but the prices that I am showing here are median prices for a single unit and a one year term. Those are important distinctions here because there is a broad range of prices. Prices can be much lower than what I am going to be showing, and higher. There are discounts for volume and term that can lead to a much lower price, and finally, discounts also exist depending on who you are and the value you bring to the relationship.
So first let's look at transport prices; I am looking at subsea transport, actually, to be clear. So here is a chart looking at the median monthly 10 gigabit wavelength price for the past six years; clearly they are all going down. So before we look at whether they are converging in any way, let's look at why they are different from the outset, what makes prices different.
So, here are a few factors, four that I have identified. The first one is length. In theory, longer cables require more fibre and use more power, and they have a higher maintenance cost as well. So, given that the cables have different lengths, that should lead to a difference in the price. Competition is a factor. On every route in the world there is a different number of cables and different types of players, and that has an impact on the level of price. So more cables should in theory lead to more competition, and thus, lower prices.
Demand. So the demand levels that exist around the world clearly are different, and cables that have higher demand and more capacity sold have lower unit costs and can sell at a lower unit price.
And finally, a little bit harder to get into, is network configuration. So looking at a route like New York to London or LA to Tokyo, these are point‑to‑point links and no intermediate hops are required for those routes, but from Miami to São Paulo there is not currently a direct link; you have to go through St. Croix or another intermediate landing to make that connection, so you are going to have more equipment, and it's going to be more expensive because of those extra hops you have to take to reach those locations.
So, overall, all cables are not the same; they have different landings and different numbers of fibre pairs, and the configuration can be different as well.
So, prices do appear to be narrowing. This chart here, if you can see it, is showing the 10 gigabit wavelength lease price multiples over London to New York, comparing other prices to that route. The blue bar is 2008 and the red is 2013, so you can see that on these three routes, including LA‑Tokyo and Hong Kong‑Tokyo, there is a substantial narrowing of the difference from the London to New York route. So prices are getting closer to each other.
What about other routes? Let's see, so here is an even more extreme example: London to Johannesburg really has come down a lot. As Michele mentioned in her talk, lots of new cables and new competition have led to a sharp reduction in the transport prices to South Africa. London to Mumbai has also seen a similar type of drop in the wavelength prices as compared to London to New York.
So, let's try to look at why prices are converging. So could it be the length that is somehow changing? Not really. There are not really changes in the length of cables happening. There is not a substantial enough difference for that to be a reason why we are seeing prices converge. But moreover, pricing on many routes is not based on length or distance at all; it's based on the incremental cost of upgrades, which does not factor in the distance at all.
Let's look at distance versus price in the last year, so this is showing length versus price multiple over London to New York. You can see that LA‑Tokyo is about 30% longer than London to New York, but the price is over two times higher. Miami to São Paulo is the same length as London to New York but the price is seven times higher still. And finally, Hong Kong‑Tokyo is shorter than London to New York but the price is two times higher. There is no clear trend here in how distance is related to price on these routes.
So, competition. That is a reason, I think, why prices are converging. We are seeing increased competition, a large number of new cables in Africa and Asia, both within the Asia area itself and across the Pacific. There have been no new US‑Brazil cables recently but prices have moved lower, largely because cables are coming there, and the market is re‑architecting in advance of these new cables, so there will be three new cables over the next ‑‑ just over two years.
So the impact of new cables really varies by route. What this chart here is showing is the source of new capacity by route over the past five years. Looking from the far right going to the left, you can see where there have been new cables on the route: in sub‑Saharan Africa and Europe to Asia via Egypt, the blue bar is showing a larger contribution of new cable capacity to those routes; intra‑Asia and transpacific also have a bit of blue there. There have been no new cables in Latin America or the transatlantic in quite some time, but that is going to change.
So demand, the third possible reason why prices could be converging. So yes, there is a difference in demand. The demand growth has varied substantially across different routes. So whilst it's somewhat low in the Atlantic, 20 percent, it's much faster between the US and Latin America. So what this leads to is increasingly similar volumes of lit capacity on cables, which then leads to a similar cost basis and, in theory, should lead to a similar price being charged in the market for that capacity.
So here is a chart I did showing the average capacity per cable segment, so it's looking at US to UK, US to Japan and US to Brazil, seeing how it's varied between 2008 and 2013. It's been very similar between US to UK and US to Japan, but US to Brazil is where it's really changed, so it was about a third, had a third less ‑‑ lit in 2008, and now the cables there almost have a similar amount of lit capacity as they do across the Atlantic and the Pacific.
The final point I want to look at is network design changes and how this is having an impact on prices converging. So, the addition of express paths is reducing equipment requirements, so that in theory is going to lead to lower prices. Also, cables are having better integration of the backhaul, which helps to reduce the city to city prices as well. So this is a map to see some examples. This is our ‑‑ world cables. This is showing the APCN‑2 cable built in 2001: to go from Hong Kong to Tokyo here, there is not a straight path, you have to hop through some place to get there, which leads to extra cost for that capacity. Let's compare that to a modern cable, a newer kind of cable; let's look at the SJC. This went into service last year. This is a modern trunk and branch type design, so you see there are express fibres that link ‑‑ the locations, and the branching units are optical, which allows for an efficient transmission of the waves between the locations.
So what is the outlook for more convergence to happen among prices? Well, in terms of the demand growth, you know, when we are seeing more ‑‑ the more rapid demand growth does help to lead to those routes having a greater pace of price erosion, so those routes will continue to narrow compared to the major routes. We are still seeing new cables coming into service. There are lots of cables coming in across the Pacific, two new cables between Europe and Asia, three to Latin America and one in the Atlantic coming next year, so there are cables everywhere. And that should help to fuel more price erosion.
And the way the cables are being built is going to help with a reduction in the cost, so you are seeing more use of express fibre pairs as well as more landing into co‑location sites, which does help to reduce the equipment costs, and should lead to price erosion as well.
Ultimately I think the variations in the unit costs are going to be different on all routes, so I am not sure we are going to see price parity but I think the difference as we have seen it, has been narrowing and probably will continue to do so for some time. But I think we are pretty far from a singularity happening with global transport prices at least.
Let's turn now to IP transit, a bit closer to you guys' hearts and minds, probably. What is unique here is that there already is flatness to some degree. This chart here is showing the median 10 gig IP transit price 2011 to 2014 for a variety of cities in Europe and in North America and they are roughly the same. This is a median, so you can't see differences depending on who is servicing which markets but they are roughly tracking in the same direction.
Let's look at London transit compared to other cities around the world. So same time frame, 2010, 2014, you can see here the price on top is Sydney, Ray I can't, Istanbul, finally London, so there is a level ‑‑ there is a range of prices here. But the key question is not, are they going down, we all know that; it is, are they converging at all?
Not exactly, it seems, which is kind of interesting. Comparing the prices to London, the difference has gone down for Istanbul and for Sydney, Singapore has gone down a little bit, and Buenos Aires is more expensive now compared to London than it used to be.
So, I tried to identify why this could be happening, what is going on and why we are not seeing convergence. It's really that transport plays a big role here, so outside of the major US and European cities, IPT is often a function of transport to one of these hubs and then IP transit in one of those hub cities. So the higher cost ‑‑ the higher transport prices in some cities lead to higher IPT prices as well.
There is not the same level of competition in every market, so while the US and Europe have very competitive and very low cost providers, which is the key, they are not everywhere, and so that does prevent you from seeing some of these bottom of the barrel prices that it's possible you have been seeing in the major hubs.
Finally, a key one is unfavourable regulatory environments, and tied into that, expensive backhaul and cross‑connects further help to increase the price of IP transit in some of these cities compared to Europe and the US.
So to conclude here: transport prices do seem to be narrowing across many of the routes, but price parity may not occur. There are differences in the cost base and in the lengths, which in theory shouldn't matter, and these prevent a single transport price from emerging around the world. For IP transit, yes, prices are going down as well, but outside of the major European and North American cities, parity of prices is not really near at this time.
And I want to leave you with a public service announcement for RIPE attendees. This is pretty important, you might want to look at this. They are taking new members right now. You have to get in now, because you will have T‑shirts, they have T‑shirts, that is the front, and here is the back. If you can't see in the back it says "ever been to space? No? Then STFU." Even if you don't believe that the earth is flat, maybe you want a shirt. That ends the talk so I am happy to try and answer questions if I can.
SPEAKER: Thank you, that was awesome. Any questions? I see four empty mics. And three empty mics.
GEOFF HUSTON: Thanks for the talk. I actually don't own a submarine cable, I would love to, but if I did, I suspect I'd have two kinds of cost components: the investment bank that is so confident in my abilities has loaned me the capital money to actually build the thing, and then at the same time I have got this other operating cost. Now, the operating cost is largely independent of length, isn't it? Because it's kind of power and not much else, really, is it? Because it's under the ocean.
ALAN MAULDIN: The cost of the O&M is tied to the length of the cable that has to be maintained. There is some length that matters for the cost of the maintenance.
GEOFF HUSTON: The billion dollars has gone into building...
ALAN MAULDIN: Sure.
GEOFF HUSTON: Wouldn't the retail price reflect how long I have to pay back the banker? So if I have only got five years to pay it all back, I am expensive; if I have got 25 years, I am cheap. So, why do we think longer cables mean more expensive prices? It shouldn't, should it?
ALAN MAULDIN: Well, just in terms of the cost of building it, I mean there is a difference, a short cable, long cable. There is a difference.
GEOFF HUSTON: If I cost it over 25 years, then realistically, the price each year is actually incredibly low and therefore, I should be able to price it lower. What I'm arguing, I suppose, is, Australia is really, really bad, because quite frankly, if we had the right financial environment, I could build a cable from Australia to the US that could be cheaper than a transatlantic one; I just need the right bank, true?
ALAN MAULDIN: If you think so, yeah. Go for it. Build your own cable.
CHAIR: Anyone here have a bank? Any other questions, anything from the Twitter account? We have a lot of empty mics and some time. Cool. Is this data going to be available anywhere for people who want to start pairing it with other data sets?
ALAN MAULDIN: The slides are available, yes. For sure. That is a lot.
SPEAKER: So you talked about sea cables. What about land cables. For example, I understand there is a cable through Siberia?
ALAN MAULDIN: Yes, there are several cables that go through Siberia linking China to Europe, yeah, but I didn't cover that. I haven't looked into trying to figure out what would determine the variances of price on terrestrial and what could be causing that so maybe next time I can do terrestrial.
SPEAKER: I guess one big difference in the ocean you don't have to go to the municipality and ask for the rights to dig through the streets, right?
ALAN MAULDIN: Well, you have to get permits as well to lay in the waters of various countries which takes a lot of time and can be very costly as well.
CHAIR: I see another question.
AUDIENCE SPEAKER: Really interesting presentation. I am from Limelight Networks. I will start by answering the previous question. Look at building a cable from Australia to Singapore and see why it takes so long: Indonesia, with the politics. It's something you can ask me in the bar. Coming back to Alan, have you looked at pricing in secondary markets? You know, in the slides you have pricing for Hong Kong to Tokyo, Singapore‑Tokyo; those have the demand and there is competition, but what about going from Singapore to, say, Myanmar or Thailand or, say, India? Because this is where the demand is growing, but in my own experience, the pricing is completely up in the air; it can be pretty low or it can be pretty high. And I can't seem to find any reasoning behind that whole thing. And probably the same goes for secondary market pricing in other parts of the world.
ALAN MAULDIN: Yes, I think it's going to depend on the country, because you have got different numbers of cables serving different routes, the environments are totally different, and the backhaul prices are crazy in places, so I haven't looked at that to see whether there is convergence happening in those spots as well. I think some might have some convergence happening, but in the countries that you deal with there is probably not much happening, so there is room for prices to fall if things got straightened out with the regulatory environment, perhaps.
AUDIENCE SPEAKER: I think we would like to see more on secondary markets all over the place rather than the trunks.
CHAIR: Yes. Anything else before we give a big round of applause to Alan?
(Applause)
So now ‑‑ has anyone out there ever wondered how asymmetric the Internet is because we may have your answer? I would like to welcome Wouter de Vries. And Jose Jair Santanna to the stage to talk a little more about that. Thanks.
WOUTER DE VRIES: I am here from the University of Twente, from the Design and Analysis of Communication Systems group. And Jose Jair Santanna is there in the audience.
And I will talk about the asymmetry of the Internet. To introduce the subject, I started a couple of months ago to try to find a way to mitigate DDoS attacks in a bit more efficient way. Is there anyone here who knows what a DDoS attack is? With a show of hands. I will assume that everyone knows. And we tried to look at a way to mitigate the DDoS attack closer to the source of the attack instead of at the target. And the reason for this was made more apparent at the beginning of this year when large DDoS attacks hit Spamhaus and CloudFlare at 300 gigabits and 400 gigabits of bandwidth, which was basically so large that no one can mitigate that and the entire Internet was kind of disturbed by it.
And this is the idea. The attack intensity becomes higher as it approaches the target, so you see that here. We have the attacker on the left, and in this case the illustration shows the attacker using DNS reflection to amplify the bandwidth. And that means that IP spoofing is not an issue in this case. However, when we tried this and performed a few traceroutes to sources of a DRDoS attack and sent this to our friends at the University of Rome, they asked us: have you considered the asymmetry of the Internet? Which means this is not quite as easy as it seems. The reason why we do this on the AS level is that we want to cooperate with different networks: when we know an attack can be blocked, we will contact the network that is responsible for it and have them block it there. The AS is quite convenient for this because we can link it to an administrative entity trivially, and the other advantage is that we can link routers that belong together very easily. The disadvantage is that we lose a lot of accuracy, for troubleshooting purposes for example. If a certain router is causing trouble, this analysis is kind of useless, actually.
So what is the problem? The Internet is asymmetric; that has been determined in previous studies, the ones that I have cited here, but what they do is give us an average number of ‑‑ an average percentage for the asymmetry, and it varies between 20% asymmetry and 90%, something like that. But for our use case, we wanted to see exactly how asymmetric it is, and maybe there are a few places we can mitigate that are still quite symmetric. Maybe close to the source it's a bit more symmetric, as is shown in this figure here.
Our expectation is that the Internet is more symmetric close to the source and to the target, and because we want to mitigate the DDoS attack close to the source and we expect it will be more symmetric there, that will be good for us. So we wanted to investigate whether this was actually the case.
We did this using the RIPE Atlas network that is provided by the RIPE NCC, I think, and they have somewhere around 7,000 probes located worldwide. When we started these measurements it was 6,700, now I think it's closer to 7,150, so it increases quite rapidly, and we selected 2,000 pairs of probes from the probes that are available and we set up traceroute measurements between them. Now, to do these 2,000 pairs we had to ask the RIPE NCC to increase our limit on the number of user defined measurements from 100 to 4,000, and they agreed to that, so thanks for that because it made our research possible.
So, these are the probes that we have selected, and the red lines indicate what the pairs are, basically. And as you can probably see, and what is more apparent in this table, is that it is heavily skewed towards Europe and this is because it's a RIPE NCC project and they distributed the RIPE Atlas probes in Europe, mostly in Europe. We tried to mitigate that effect a little bit by selecting long paths, so that the pairs had a long distance between them, and this kind of caused us to select a little bit more probes in North America and Asia than we would have otherwise. Heavily skewed towards Europe.
Our experiment set‑up: we did these measurements using traceroute, of course, because that is the simplest way, and we ran them for ten days, eight times per day, so every three hours. The reason for this is to filter out any effects that the time of day might have on this. We didn't know if there would be any, but to be safe we did it this way, and that resulted in a theoretical maximum of 160,000 measurements, which is gigabytes of traceroute data. The measurements by the probes are conducted on the IP level, so we needed to convert those to AS numbers. Now, there is a convenient table for this, a CSV file, and we decided not to use that; because I like to do things myself, I downloaded BGP routing tables, which are very difficult to analyse, and I wrote a small tool to process them. These tools are available on‑line and they are open source, so I encourage you to look at them and criticise me and my work, because they are probably not quite flawless.
Anyhow, this tool accepts an IP address and it gives me back the AS number. And this is what we use to process the results in the first place.
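As an editorial illustration of the lookup just described (it is not the speaker's tool), the following Python sketch maps an IP address to an origin AS by longest-prefix match. The prefix table is hypothetical: it uses documentation address space and documentation AS numbers, and stands in for whatever would be extracted from a real BGP routing table dump. The names `PREFIX_TO_ASN`, `build_table` and `ip_to_asn` are assumptions made for this example.

```python
import ipaddress

# Hypothetical prefix-to-origin-AS table (documentation prefixes and AS numbers).
# In practice this would be derived from a BGP routing table dump.
PREFIX_TO_ASN = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,  # more specific prefix inside the /24 above
}


def build_table(prefix_to_asn):
    """Parse the prefix strings once so lookups do not re-parse them."""
    return [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_asn.items()]


def ip_to_asn(ip, table):
    """Return the AS of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


table = build_table(PREFIX_TO_ASN)
print(ip_to_asn("198.51.100.200", table))  # falls in both prefixes; the /25 wins
```

A linear scan is enough for a sketch; a tool handling a full routing table would more likely use a radix tree or a sorted structure.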
The paths on the IP level are about 70% ‑‑ no, I should say, converting to the AS level compresses the result set by about 70%, so usually there is more than one router from one AS involved in a path. And on average, these paths contain about five hops, and in this presentation when I talk about hops I always mean hops on the AS level, so not on the IP level. And five hops is about average.
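The compression from IP hops to AS hops can be sketched as follows. This is an editorial example under stated assumptions, not the speaker's code: `as_path` is a hypothetical helper, the lookup is any function that returns an AS number or `None`, and hops that cannot be mapped are simply skipped, which is one possible policy among several.

```python
def as_path(ip_hops, lookup):
    """Collapse a traceroute's IP hops into a path of distinct consecutive ASs."""
    path = []
    for ip in ip_hops:
        asn = lookup(ip)
        if asn is None:
            continue  # assumption: drop unresponsive or unmapped hops
        if not path or path[-1] != asn:
            path.append(asn)  # a new AS starts here
    return path


# Hypothetical hop-to-AS mapping; hop "d" has no known origin AS.
mapping = {"a": 64500, "b": 64500, "c": 64501, "e": 64502}
print(as_path(["a", "b", "c", "d", "e"], mapping.get))
```

Several routers in the same AS collapse into one AS-level hop, which is why the AS-level path is much shorter than the IP-level one.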
In the end, we ended up with 120,000 measured network paths, so a lot fewer than the 160,000 that we predicted, which was caused by, well, malfunctioning probes or probes that didn't do anything, or maybe our experiment set‑up wasn't quite as good as I made it out to be. Anyhow, this is the number that we actually have, and these results are also available publicly.
In the result set there are 2,275 unique ASs, of which 1,717 contain one or more probes. In the results we see that only 12.6% of the paths are symmetric, so that means that route asymmetry is extremely ‑‑ occurs extremely often, contrary to what I assumed when we started these measurements.
So we will delve a little deeper into these results using some graphs. First off, the stability of the paths over time, which we wanted to know because sometimes we want to analyse the data a little bit after the deed, so to say, and these graphs will show that the network path is stable, so when we do this analysis after the actual attack occurred, it will still be valid for quite some time. We did this using the Levenshtein distance, and this is an algorithm that is actually used to compare strings to each other and see if some words are similar or not. What it actually does is count the insert, change and delete operations between two strings, or in our case, two paths, and that is the number ‑‑ that is the distance between them. And that gives us this graph. We normalise the Levenshtein distance with the path length, so this number is between zero and one, and we see that at the top it's around 0.14. So this is quite low. And we can say that network paths barely change over time; they are quite stable. And we expected this, so this is quite good. For us at least.
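For readers who want to reproduce the stability metric, here is a minimal editorial sketch of a Levenshtein distance applied to AS paths instead of strings. The talk says only that the distance is normalised "with the path length"; dividing by the length of the longer path is an assumption made here so that the value stays between zero and one. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Number of insert, change and delete operations turning sequence a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # change x into y (free if equal)
            ))
        prev = cur
    return prev[-1]


def normalised_distance(a, b):
    """Distance divided by the longer path length (assumed normalisation)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Two hypothetical AS paths measured between the same probes at different times.
earlier = [64500, 64501, 64502, 64503]
later = [64500, 64501, 64510, 64503]
print(normalised_distance(earlier, later))
```

A value near zero means the path measured later is almost the same as the earlier one.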
Then, on to asymmetry. We wanted to count the number of equal consecutive hops, counted from each side of the path. And this is interesting because, if you remember, we wanted to block the DDoS attacks near the source, so if this average number is something high, then we can actually do that, and if it's very low, then we can't, and we don't have to continue this research.
As you can see here in the lower figure, that is the way that we counted it. And in the graph, we did something a little bit different because I also subtracted the AS number ‑‑ the ASs that contain the source and target, because we know these are always the same, so including them would show better results than there actually are. Here, you can see that only for paths of length six and higher is there one more hop on each side that is equal. Yeah, that is one extra chance to mitigate the DDoS attack, but only in those cases, and if you remember, path length five is the most common one, which only shows an ECH of 0.79, which is quite low, and the standard deviation is also very high, which means the results vary quite wildly.
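One way to count equal consecutive hops (ECH) is sketched below. This is an editorial interpretation of the description above, not the speaker's implementation: the reverse trace is flipped so both paths run from source to target, the source and target ASs are dropped, and matching hops are counted inwards from each end. The function name and the rule that a hop is never counted from both sides are assumptions.

```python
def equal_consecutive_hops(forward, reverse):
    """Return (hops equal from the source side, hops equal from the target side).

    forward: AS path from source to target.
    reverse: AS path measured from target back to source.
    """
    back = list(reversed(reverse))         # align both paths source -> target
    fwd, bck = forward[1:-1], back[1:-1]   # drop the source and target ASs
    limit = min(len(fwd), len(bck))

    near_source = 0
    while near_source < limit and fwd[near_source] == bck[near_source]:
        near_source += 1

    near_target = 0
    # Stop before hops already counted from the source side.
    while (near_target < limit - near_source
           and fwd[-1 - near_target] == bck[-1 - near_target]):
        near_target += 1

    return near_source, near_target


# Hypothetical six-hop path that diverges in the middle.
print(equal_consecutive_hops([1, 2, 3, 4, 5, 6], [6, 5, 9, 8, 2, 1]))
```

The hops counted from the source side are the ones that matter for blocking an attack close to where it originates.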
Now that we know how many are equal consecutively, we also wanted to look at how equal they are considering their position. So we did that in this way: for each hop we looked at the same hop in the reverse trace and we considered them equal or not. And the average of this we plotted in a graph, which can be seen here. As you can see this shows a nice curve, which is what we expected, but it also shows the results a little bit better than they are, because if you subtract the two hops that are at the beginning and the end, then you will see that actually the curve is not that great and the results are actually almost the same. In any case, for each hop, on average the equality is 0. ‑‑ 60%, actually, so a little bit better than 50/50, but to mitigate a large scale DDoS attack we were hoping for a little bit more.
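The per-position comparison can be sketched like this. It is an editorial example under assumptions the talk does not settle: hops are compared by index after flipping the reverse trace, and a forward hop with no counterpart in a shorter reverse path counts as unequal. Both function names are hypothetical.

```python
def positional_equality(forward, reverse):
    """For each forward hop, is the hop at the same position of the flipped reverse path equal?"""
    back = list(reversed(reverse))
    return [i < len(back) and hop == back[i] for i, hop in enumerate(forward)]


def mean_equality_by_position(pairs):
    """Average the per-position equality over many (forward, reverse) path pairs."""
    totals, counts = [], []
    for forward, reverse in pairs:
        for i, equal in enumerate(positional_equality(forward, reverse)):
            if i == len(totals):   # first time this position is seen
                totals.append(0)
                counts.append(0)
            totals[i] += equal     # True counts as 1, False as 0
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two hypothetical pairs: one fully symmetric, one that differs in the middle.
pairs = [([1, 2, 3], [3, 2, 1]), ([1, 2, 3], [3, 9, 1])]
print(mean_equality_by_position(pairs))
```

In a real analysis the pairs would probably be grouped by path length first, since position three means something different in a path of four hops than in a path of eight.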
In conclusion: the Internet is mostly asymmetric. This means that our proposed solution for mitigating DDoS attacks probably needs quite some work on actually determining the paths, which is going to be difficult. The paths barely change over time, so stability is certainly not an issue; we can do the measurements or analyse the results a week or two weeks later and it will be more or less the same as if we had done it in real time. We have at least one more chance to mitigate DDoS attacks when the path length is higher than six or seven. As for the data, we performed these measurements using the RIPE Atlas network and we made them available publicly. For that you need the measurement IDs to access them, and the measurement IDs are available at this link, which I have converted into a QR code, because everybody uses it. Also, using the new interface from RIPE Atlas you can search for symmetry measurements and they will show up and you can start playing with them right away.
So, are there any questions or comments?
(Applause)
CHAIR: Before we get to questions, I apologise for interrupting. I got a really urgent message here so I want to communicate that. I hear we have a birthday. Whose birthday is it? Happy birthday, Wouter, a great way to celebrate.
(Applause)
BRIAN NISBET: And now on to questions.
RANDY BUSH: I am a little slow this afternoon. Could you explain why the asymmetry makes it easier to mitigate DDoS?
WOUTER DE VRIES: Actually, it makes it a lot more difficult.
RANDY BUSH: Or more difficult?
WOUTER DE VRIES: Well, because we ‑‑
RANDY BUSH: Makes it negatively easier.
WOUTER DE VRIES: Because if we want ‑‑ when you are the victim and you are possibly the one who wants to stop the attack, then from that point you can measure the incoming attack and you want to determine who is attacking you, so you start the traceroutes from the victim, actually, and you send traceroutes to all the attackers to determine the paths that they use to attack you. But if the Internet is asymmetric, that means that the paths that you measure using traceroute are completely different from the paths that are used to send packets to you. If you try to mitigate the attack using the information that you got from traceroute, then that will be completely useless.
AUDIENCE SPEAKER: Jen. When you calculate your AS path, do you take into account that some hops might belong to Internet Exchanges? So basically some ASs are not in the data path?
WOUTER DE VRIES: Actually, no, we don't and we had a comment about this earlier, and for future work we will try to exclude the Internet Exchange points.
JEN LINKOVA: I assume your measurements were about legacy IPv4 ‑‑ what about IPv6?
WOUTER DE VRIES: That is the same story; for future work we want to work on IPv6, but as of now, most attacks ‑‑ or actually all attacks are occurring on IPv4, so that is what we focused on.
JEN LINKOVA: Thank you.
DAVID FREEDMAN: All attacks are occurring on v4, so the two that I had last night on IPv6 didn't happen then? The point I was going to make was really that, from my experience, I am unable to trust the source addresses of my attackers, which means that the work, whilst very interesting, may have quite little validity or interest for me, because with anybody right now that I look ‑‑ I try and pinpoint and say, right, that is an attacking network, I need to speak to them, in a number of cases I have found that not to be the case.
WOUTER DE VRIES: That is a good point. However, in our group we are focusing on reflective DDoS attacks and in that case, the source of the attack is not spoofed, actually. So, because ‑‑ the reflector is not spoofing its own source address, and you can block it after the reflector, actually.
AUDIENCE SPEAKER: Right. So you are only focusing on where the reflector is and not the true, true source.
WOUTER DE VRIES: It is the true source but the attacker might be spoofing his address but we are really looking at the source of the bandwidth, actually, the data.
DAVID FREEDMAN: Thank you.
RANDY BUSH: IIJ again. How do we mitigate these attacks? There are generally two schemes; one is what is actually practised today, which is some DDoS mitigation service sinking my address, cleaning the traffic and passing it on to me, for which this is no help.
WOUTER DE VRIES: No this is completely different mitigation, would be a completely different mitigation solution.
RANDY BUSH: Correct. The other is what has been researched and proposed and some stuff ten years old, and there is one that is ‑‑ a son of that or daughter that has been proposed today, which is essentially I propagate to the next‑hop I see it coming from.
WOUTER DE VRIES: Yes.
RANDY BUSH: And recursing forward, so it's hop by hop as to where this next‑hop source is, so indeed this is, again, not highly relevant.
WOUTER DE VRIES: Then I would have to say this is more ‑‑ from a practical perspective, and the methods that you talk about are, well, they work in theory but in practice they can't be used currently.
DANIEL KARRENBERG: I am the chief scientist at the RIPE NCC and I have more a comment. Can you go back one slide. I think it's worth noting, and I commend you doing that, that something significant is happening here, which is that there is research based on data that is really publicly available and that will be publicly available for some time to come, due to the funding that this community provides through the RIPE NCC. You can actually get the data that Wouter based his research on and verify his research, or do your own research based on exactly the same data, and in fact, what this URL gets you to is just a number of integers that lets you actually get the source data and I think that is worth noting and I would like to see, personally, as a scientist, more people doing this kind of stuff, rather than having data under NDA and whatever.
WOUTER DE VRIES: Thank you.
BRIAN NISBET: Meredith is very happy about that notion, dancing at the front here.
AUDIENCE SPEAKER: I am from RIPE NCC, I have a remote comment from a remote participant, Alison Wheeler from the creative organisation. The comment actually reads, sorry ‑‑ I am getting more comments: of course the Internet works better because it is asymmetric, physically more stable, but not for discovering and mitigating DDoS attack issues. Thank you.
WOUTER DE VRIES: Was that a question?
BRIAN NISBET: A comment I think. Any other questions or comments or thoughts? I think I would definitely agree with Daniel on the fact that it's great that the data is out there and so people can look and people can see themselves rather than having to ‑‑ can check it. So thank you very much, enjoy your cake.
(Applause)
So, the last third of the plenary sessions this afternoon is three lightning talks. As a reminder to our speakers, you get ten minutes and you may allocate that between talking and answering questions as you so wish, but you don't get a ten minute talk and time to take questions. It is ten minutes only. We have Filiz, there is still the potential to submit lightning talks for Friday.
FILIZ YILMAZ: We still have space.
BRIAN NISBET: On Friday, and obviously of course, rate all the talks, rate them, please. So, Geoff, first up, and it is the resolvers we use.
GEOFF HUSTON: This is a talk about the DNS. The DNS truly is a miracle. Basically it's a miracle that it works at all, and certainly, after looking at the data I am going to present here, I hope you share the view that this is just truly weird.
It's a measurement kind of talk. At APNIC for some time we have been perverting on‑line ads into a measurement vehicle. We ask the users who actually see the ad, as the ad gets impressed in their browser, there is a little bit of code written up in Flash: "go and resolve this DNS name and go and fetch the URL." The name is unique, never been used before and never will be again. So there is no caching, so that resolution query comes back to our authoritative name servers, so we see the question. And then of course we give the answer because we are good people and the user, the browser, then fetches that one by one pixel, that plot. So now I have two pieces of information: which resolver asked me the question, and what the end user IP address was that actually got the web object. We can put them together. So, the other thing about ads, which is really cool, is that as an advertiser, you get to pay the ad folk bucket loads of money if folk click on the ad.
Now as long as the ad is suitably boring and ours is really, really, really boring, nobody clicks. Which means that the ad agency works really hard to get money out of us, so they give more and more people this ad to hope that somebody, some naive person, will click on it, so from January to October we served 55 million of these ads, which is indeed sampling 55 million end points, all of you and all of everyone else. Out of that, around half a million unique resolvers. So each resolver serves about 100 end users, yes? No. When you start ranking the resolvers, you get this kind of pattern. This is the top 25, and that looks like a liner error. Because that is just weird shit. Why is 74.125.189 so popular? Well if you add the origin AS you start to see that is all Google, slots one to 25 are just Google and they handled about a million each. They are all part of 8.8.8.8. So this isn't getting me anywhere, time to change the methodology and look at resolvers by AS. One‑third of all the queries that I gather come from Google. That is market power.
The next one is China, there are a lot of Chinese in this world, and then there is Taiwan, Level 3, etc., etc. You can read as well as I can. But that is kind of interesting. One‑third of the resolvers in the world, one‑third of the questions come from Google. That is not quite right. Google's server farm tends to amplify things. So if I correct that a little bit I get back to a more realistic value: Google doesn't own the DNS, they only own 11% of the DNS. We all feel much better about that.
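As a rough illustration of the measurement described above, the following Python sketch generates a one-off DNS name, joins the authoritative server's query log with the web server's fetch log on that name, and then ranks resolvers by origin AS. It is not APNIC's code: the zone name, the log layouts and the `origin_as` lookup table are all assumptions made for the example.

```python
import uuid
from collections import Counter

# Hypothetical zone, assumed to be served by our own authoritative name servers.
ZONE = "measurement.example"


def unique_name() -> str:
    """Return a never-before-used DNS name so no resolver can answer from cache."""
    return f"u{uuid.uuid4().hex}.{ZONE}"


def pair_resolvers_with_clients(dns_log, web_log):
    """Join the two logs on the unique name.

    dns_log: iterable of (name, resolver_ip) seen at the authoritative server.
    web_log: iterable of (name, client_ip) seen at the web server.
    Returns a list of (client_ip, resolver_ip) pairs.
    """
    client_by_name = {name: client for name, client in web_log}
    return [
        (client_by_name[name], resolver)
        for name, resolver in dns_log
        if name in client_by_name
    ]


def rank_by_origin_as(pairs, origin_as):
    """Count experiments per resolver origin AS, most common first.

    origin_as: dict mapping resolver IP to AS number (an assumed lookup table).
    """
    counts = Counter(origin_as.get(resolver, "unknown") for _, resolver in pairs)
    return counts.most_common()
```

Ranking by origin AS rather than by resolver address is what collapses a large resolver farm, which shows up as many separate addresses, into a single entry.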
The real thing about this is trying to figure out how close you are to your resolver, because the content folk don't know where you are, but they assume you are located where your resolver is. So if there is differential content they work from where your DNS happens. Who watches Netflix? All of you. How does it work? Via the DNS. How do they know which country you are in? By the country that does your DNS resolution. Let's ask the question in general: How close are you to your resolver?
So, 41% of the world use resolvers that aren't in the same AS that they are. They are non‑local. One‑third of those non‑local use Google. The next one is Level 3, OpenDNS, again you can read this as well as I can, but one‑third of users use resolvers that are somewhere else. Fair enough. Let's go a little bit further.
How many cross boundaries? Who is foreign? These are the resolvers that get used when the user is in another country. Google, 42% of the market, Level 3, 7%, OpenDNS 7% and so on and so forth. One‑third of folk use resolvers in another country. This is weird shit. So, who keeps their DNS at home? This is the list of folk who basically have the customer and the DNS in the same country. The Republic of Korea has the lowest foreign resolution count, 94% of all queries come locally, the other 6% are foreign. And most of you are on that list. That is the good list. What about the bad list? If you live in Algeria, your resolver is nowhere close to you. 99% of folk in Algeria do their resolution elsewhere. In Vietnam, 70% of all the queries that are made from users in Vietnam come from else ‑‑ or they are resolved from resolvers living elsewhere. So you think, hang on a second, maybe we can map this.
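The per-country figures quoted above come down to a simple ratio. A minimal sketch, assuming each sample has already been reduced to a hypothetical (user country, resolver country) pair by some geolocation step that is not shown:

```python
from collections import defaultdict


def foreign_share_by_country(samples):
    """Return {user_country: fraction of samples resolved in another country}.

    samples: iterable of (user_country, resolver_country) pairs (assumed format).
    """
    total = defaultdict(int)
    foreign = defaultdict(int)
    for user_cc, resolver_cc in samples:
        total[user_cc] += 1
        if resolver_cc != user_cc:
            foreign[user_cc] += 1
    return {cc: foreign[cc] / total[cc] for cc in total}
```

The same function gives the "non-local" figure if the pairs hold the user's AS and the resolver's AS instead of country codes.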
Let's actually have a look at the distribution of resolvers used by users in each country. So here is the US. If you live in the US, out of the three million‑odd samples we got, 300,000 are foreign. One in ten. Most likely, 12%. Those folk are using resolvers in Japan, because it's fast, right, it's just the Pacific, yeah right. 7% use resolvers in China. These are US nationals, have addresses located in the US, using resolvers in China. And 6% use resolvers located in Pakistan, that well‑known factory of DNS. I have no idea why.
I also have no idea why average eneven in is more popular than Brazil for DNS resolution if you live in the US. This is truly weird. If we are looking at the US using China, what does China use? Philippines and Thailand are really, really popular. Not Russia. So, that is if you are China. But we are here in the United Kingdom. Where do folk in the United Kingdom do their DNS? Half do it in the US, four percent use it in India and of course the French, despite what the ‑‑ are still very popular, with 3% of the foreign resolution happening en France. What about if you are in France? Much the same, it is a love‑love relationship, around 3% of French users use Great Britain resolvers.
But if you really, really want insanity, you have to go north. You have to go to the Scandinavians because they are actually insane. Half use US, 12% use UK, but 3% use resolvers in Kazakhstan. Truly insane.
What about if you are in Australia? There is a data point there that I can't get over, that there are 300 folk who use resolvers in Peru. I have no idea what their DNS must be like. But shit slow just comes to mind because getting the queries all the way to Peru and back again must be truly crap. Why is this happening? Google is popular. About one in six or seven folk have their queries go through Google. There is an awful lot of folk trying to watch Netflix, and you muck around with the DNS. There is a certain amount of DNS party querying and monitoring, and don't forget viruses, because all they do is hijack where you do your DNS so they can control it. Or some other insane theory. Because the DNS is truly magical and full of mysteries, and it's always a source of amusement. You, too, can play, there is a URL. Thank you.
(Applause)
CHAIR: One minute for one question. 45 seconds. All right we are done, thank you, Geoff. And now we have Job Snijders telling us about blazing fast partial outage detection.
JOB SNIJDERS: Hi. I do stuff for NTT, we will not talk about that, I want to talk about the ring. And I will first introduce the ring and we will talk about fast partial outage detection. The NLNOG RING is this cooperation project between all kinds of organisations, let me skip through some stuff.
And we have nodes all across the world that are essentially Linux shell boxes and you can log into them and perform all kinds of tests, like trace routes, testing on v4 and v6, DNS resolving, all kinds of cool debugging stuff. So with all these nodes around the world, you can imagine that there is a vast diversity in paths between all these nodes. And that is where we get to ring SQA:
"Ring SQA", the letters mean nothing so don't search for a deeper meaning there. It's quite easy to detect that your whole network is down, you will get some calls, your mother calls, I can't reach my websites any more; it's the tiny outages that affect maybe 10 or 12% of your destinations that are hard to detect and a lot of people here probably have monitoring boxes in just one location and you will only notice outages if the broken path happens to be along the way towards the monitoring box. So what we built is a system where every ring node all across the planet sends a UDP packet to each other one and they chat a little bit about what is up and what is down.
When an alarm is raised, and we will get to how that alarm is raised in a second, immediately trace routes are launched towards broken destinations, to collect data of this short or possibly longer outage. And now we get to the real meat of it, this is one of the alert messages. What happens here, I hope it's kind of readable, yeah, it should be ‑‑ the system was monitoring all ring nodes and at any given point in time 10% are off‑line for whatever reason and it will use this as a sort of baseline to derive, this is the current state of me chatting with other ring nodes, and it sends packets every 30 seconds and, if for three consecutive minutes the number of unreachable nodes is higher than the median of the 27 minutes beforehand, it says this must be a partial outage. And it will send you an e‑mail with the exact time stamp of when this outage was detected, a list of nodes that were previously reachable but are no longer reachable, it groups them by country so you might have ‑‑ get a feeling of in which region of the world stuff is broken, and these are the trace routes at the moment stuff was broken. And this is very nice because I have experienced that customers will complain that stuff is broken and then you will ask them maybe you can tell me the source and destination of the stuff that is not working and they will tell you an IP and they will copy paste the screen shot in a Word document and e‑mail that to you without any time stamping whatsoever. This system does that automatically for you.
This is the baseline I was talking about, a snippet of it, per minute it has a circular buffer and if the median rises above what happened in the previous 27 minutes, there has to be an outage of sorts. And this works for v4 and v6.
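The alarm rule described here can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the talk (a per-minute circular buffer, a 27-minute median baseline, three consecutive bad minutes); it is not the ring SQA source code, and the class and constant names are made up for the example.

```python
from collections import deque
from statistics import median

WINDOW = 30  # minutes kept in the circular buffer (27 baseline + 3 recent)
RECENT = 3   # consecutive minutes that must exceed the baseline


class OutageDetector:
    """Hypothetical detector; one instance per address family (v4 or v6)."""

    def __init__(self):
        # A deque with maxlen behaves as a circular buffer: old minutes fall off.
        self.buffer = deque(maxlen=WINDOW)

    def add_minute(self, unreachable_count: int) -> bool:
        """Record one minute's unreachable-node count; return True on alarm."""
        self.buffer.append(unreachable_count)
        if len(self.buffer) < WINDOW:
            return False  # still warming up, no baseline yet
        samples = list(self.buffer)
        baseline = median(samples[:-RECENT])
        return all(count > baseline for count in samples[-RECENT:])
```

Using the median rather than the mean means that a handful of nodes that are always off-line, or a single bad minute in the past half hour, do not move the baseline much.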
How can you use this mechanism? Who here ‑‑ can everybody that is a ring participant raise their hand. I love you guys. So you guys, you can just log into your box, go to the /etc/ring/sqa directory and fill in the e‑mail address that you want ring SQA alerts to be sent to. Then you restart the daemon, it will warm up for half an hour and you are good to go. If you are not a ring participant, and I noticed there were still a few people that did not raise their hands, we can beat them up outside, join the ring. Joining the ring is simple; you just provide a virtual machine, you fill in the application form on the website and you get this partial outage detection for free.
How do you use ring SQA? What I would do is fill in your NOC e‑mail address or whoever is the unfortunate individual on call at that point in time and have your operations team use the ring alerts, because what we have noticed so far, this system has been running for two or three months, there are no false positives, zero, and I think that is quite amazing that we could not find cases where it wrongly assumed outages. Now, it could be that it detects an outage that is only local to the network, for example, a top‑of‑rack switch that is broken or whatever, but if it alerts there is something wrong with your network and if we look at the example here, at hop 5 and hop 7, those IP addresses belong to the same transit provider and apparently they had one of their routers blow up.
So, I think we covered most of the slides in random order. Are there questions about ring SQA so far? I mean, we have five minutes to fill, and nervous as I was, I speeded through it way ahead of time.
BRIAN NISBET: No, all makes perfect sense to everybody?
JOB SNIJDERS: There has to be one question. I will buy that guy a beer.
BRIAN NISBET: There are two questions.
AUDIENCE SPEAKER: Anders from the Danish research network. As much as I have an outage alert, to have an inn alert, unreachable in places you were unreachable before?
JOB SNIJDERS: The current system will send a recovery message when the median of the last three minutes drops below the median of the previous 27 minutes, but it's extremely hard to assess ‑‑ it's very easy for me to detect when it breaks but hard to detect when the network actually recovered. We used the three minute interval because most modern routers can converge a table in roughly three minutes, so if after three minutes it's still broken, the alert triggers. I thought that would be a safe threshold to not have constant alerts on super‑transient issues, but a recovery message is sent, it just means the alarm is cleared because the baseline changed and it is now the state, but only humans can assess whether their network is broken or there was ‑‑ it was routed around.
DANIEL KARRENBERG: Can you give us an impression on how many alerts you get per day, per week, per hour, per second?
JOB SNIJDERS: Excellent question. The alerts are sent on a per node basis, so if you happen to have a ring node and you made a poor choice in using your upstream provider, you might receive a lot of alerts, but it highly depends on the node's perspective on the Internet, so if you have a global network it makes a lot of sense to put ring nodes in your major hops because in some hops you might have a partial outage and in some you might not. From NTT's perspective, I have run this for the last three weeks on a European node and I had one outage and it was a line card that crashed that I was connected to. But it depends on the quality of your network. Or your partners. So you will notice that some peering partners will black hole traffic more often than others.
DANIEL KARRENBERG: So you only get alerted when your ring node is involved?
JOB SNIJDERS: Yes.
DANIEL KARRENBERG: Then I misunderstood, thank you.
JOB SNIJDERS: I would not be interested in other people's problems. I only want to see partial outages that affect myself.
AUDIENCE SPEAKER: First and foremost, I would like to just claim my free beer for asking the question. And secondly, I am wondering about the trace routes that are run when you detect an outage. Have you given any thought to somehow getting the nodes that are unreachable to do the trace route towards you as well, because as was discussed earlier, the Internet is largely asymmetrical and especially if packet loss starts occurring at an AS boundary you have no idea if it's the return path that is broken or if it's the outbound path, and since unreachable presumably ‑‑
JOB SNIJDERS: Partially unreachable.
AUDIENCE SPEAKER: You would need to relay through another node to hopefully get the message across to trigger that trace route or something like that. Have you given any thoughts to that?
JOB SNIJDERS: Absolutely. And all I am thinking right now, I am waiting for your pull request that will implement this. It is open source and you can run it internally. I think this would be good for, say, version 2. What I want to build with the ring is an overlay network that would be able to survive partial outages so messages between nodes can still be sent, so then we can add something that instructs nodes, like, "trace route towards me", but it has to happen very quickly, because what I like about this system is three minutes into your outage it will send an alert and I feel that such short timing is important, so you don't want to wait too long to collect information and then ‑‑ this is why the trace routes don't do DNS resolving, for instance, it has to be as fast as possible and ship that alert out of the door. But we can chat about this. I know you love coding and Ruby. I think my time is up.
BRIAN NISBET: It is, indeed, thank you very much.
(Applause)
But the last speaker this afternoon is Andrei Robachevsky, and how can we work together to improve the security and resilience of the global routing system. Clearly what we need to do is let Job build a new Internet on top of the current one.
ANDREI ROBACHEVSKY: Thank you. My proposal is more modest.
I work for the Internet Society and the Internet Society was coordinating this effort that started last year, end of last year, by a small group of network operators who were thinking about this question.
OK. So, well, I think vulnerabilities of the global routing system are well‑known, as well as the kind of wide array of solutions. But still we know what the challenges are. The challenge is that you cannot solve these problems alone, you cannot secure your network alone; security of your network in this kind of global routing system depends on efforts of others. And if you look from an individual ISP perspective there are problems as well: although the problem looks big, I mean, a typical ISP doesn't feel a lot of pain coming from the routing system or particularly those vulnerabilities. Also, there is a plethora of BCPs, operational practices, vendor documentation; which one to pick to implement? And finally, how to convince your management that you are doing a good thing.
So also there comes this kind of social aspect, right, the community, the collaboration thing. So this effort, which we called the routing manifesto but which now has a slightly different name, is aimed at bringing those two aspects together, people and technology.
So, this effort is the MANRS, the Mutually Agreed Norms for Routing Security. It defines a minimum package of actions that we ask ISPs to implement and deploy at wide scale, which should lead to significant improvements of routing security. And it also emphasises collective focus, which is amplified by a growing group of participants, adopters of this manifesto.
So, looking at the document in more detail, and you saw the URL so you can click on that and go in and explore this document in more detail. It consists of two main parts: high level principles, which underline the commitment to best practices and encouragement of peers and customers to follow the same model. And the package, which is the minimum package, I am stressing this, this is not aspiration, it's not how good it looks like, it's the minimum we ask ISPs to implement. And it focuses on three main things, which are BGP filtering, anti‑spoofing and collaboration.
It's a technical document, it's not high level fluffy stuff, if you read it, but still, it focuses on what, and how is something to be done in external documentation, BCPs, vendor documentation, etc., etc.
So what are good MANRS? Well, there are three things and it looks like motherhood and apple pie in a way, and the thing is minimal, I wanted to reiterate this stuff, because it limits the scope of those requests. We are not trying to boil the ocean, we are just cleaning our side of the street, assuming others do the same. Prevent propagation of incorrect routing information. We are focusing on the ISP's own infrastructure and own customers. The same applies for anti‑spoofing, not just anti‑spoofing globally but single homed customers and own infrastructure. Things that do not completely solve the problem, right, but are pretty easy to implement, there is little reason not to do that.
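To make the first two actions concrete, here is a small Python sketch of the two checks an ISP would apply to its own customers: accept a route announcement only if it falls inside a prefix registered for that customer, and accept a packet from a single-homed customer only if its source address is theirs. In practice these are router filters built from vendor configuration and the BCPs the document points to; the function names and the prefix list below are invented for illustration and do not come from the MANRS document.

```python
import ipaddress


def announcement_allowed(prefix: str, customer_prefixes) -> bool:
    """Accept a customer announcement only if it lies inside a registered prefix."""
    announced = ipaddress.ip_network(prefix)
    return any(
        # Compare address families first: subnet_of() raises on a v4/v6 mix.
        announced.version == allowed.version and announced.subnet_of(allowed)
        for allowed in map(ipaddress.ip_network, customer_prefixes)
    )


def source_allowed(source_ip: str, customer_prefixes) -> bool:
    """Accept a packet from a single-homed customer only if its source is theirs."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(p) for p in customer_prefixes)
```

Both checks only look at the ISP's own customers, which is the limited scope the talk stresses: each network cleans its own side of the street.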
Well, for the third one, for instance, while it kind of sets high aspirations maybe, what it means is you make your contact information up‑to‑date and globally available.
Simple things.
Now, MANRS is not only a document, I would say it's not a document actually, it's a commitment, so that is an important component, it's not just a document that you read and say yeah that is a good set of principles, I think I support that. Participants are required to demonstrate it by implementing those actions, and the thing is this group of supporters sends a clear and tangible message: we expect you to do the same.
So, here is what is required to participate in this effort. So, an ISP has to implement at least one of the actions, and another thing is that we want the participants to take ownership of this document and grow this document because we think this is a live document, so it should grow in scope as well as in the size of the group of supporters.
So, think about this: Are you interested in participating? And if you are doing this stuff, I mean, there is no reason for you not to participate, right, and publicly commit that you are doing this stuff and invite others to follow yourself.
Where are we now? So we are redesigning the site for going public on 6th of November, with the final version of this document and with a list of early participants, initial participants which will indicate which actions they support to implement. And then working, expanding group of participants and, as I said, expanding the scope of the same document, the MANRS.
Well, thank you very much, if there are any questions, please?
CHAIR: Any questions? Anyone want to know how they can join MANRS? No.
ANDREI ROBACHEVSKY: Well, I am here, find me in the hallway and we can discuss. If we implement the stuff, no reason not to join. Please join, thank you.
(Applause)
CHAIR: Thank you, everyone. And I think this is the end of the plenary but we have some events this evening which I will let Brian tell you about.
BRIAN NISBET: I shall read from my badge. Yes, no, just a quick reminder, there are social events this evening. You can meet the expanded and exciting NCC Executive Board at 18:30, and then ‑‑ which is downstairs where we had lunch. And then there is a welcome reception, which is, handily enough, downstairs where we had lunch, at 18:30. Yes, that's right ‑‑ no. They are all happening at the same time, apparently. This is confusing, my apologies. One thing says the welcome drinks start at 19:00, the other says 18:30.
AUDIENCE SPEAKER: 18:30 is meet the board.
BRIAN NISBET: Yes. The information is still ambiguous in the thing. But meet the board at 18:30, welcome drinks at 19:00. Filiz wants to tell me something else. Yes, and there are BoFs also this evening which I will quickly ‑‑
CHAIR: All right. Just opening up the programme to see what BoFs are happening this evening. There will be drinks tonight, don't worry. So "Who defines our technology future", it's at 18:00 or 6 o'clock for the people from the US, and Christian Koffman will be leading that BoF which will happen in the side room or a Sid room. So you could probably find it. Then drinks and meeting the board, some overlap between those, but at least there will be drinks. Enjoy.
(Applause)
BRIAN NISBET: And I am sure Benno and George and Jan will get very upset if we don't mention the BCOP task force is in here this evening at 6:00 and we will be back in here for the plenary tomorrow at 9:00. We are done. Thank you all very much.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.
WWW.DCR.IE