Routing Working Group session
6 November, 2014, at 2 p.m.:
JOAO DAMAS: Welcome. This is the Routing Working Group, I am Joao Damas, Rob Evans is my Co‑Chair sitting here in the front row. He did most of the work of putting together this agenda, so thank him.
A few administrative issues. This time around, instead of the usual one slot that we have had for the last few years, we have had two, so be aware after the coffee break we will be resuming with the Working Group.
So, first of all, I'd like to thank the RIPE NCC for providing a minute taker, a scribe, Alex Band, Jabber interface with ‑‑ and thanks again for live scribing, Aoife, you do the best job possible.
Minutes. Between us and the RIPE NCC, we managed to drop the ball on this one this time around, so we haven't actually published the minutes from the last meeting yet. We have been able to read them and I think they are OK; we will just publish them to the mailing list, call for a two‑week period for comments, and then consider them approved if that is OK with everyone.
We have a few items, so this is from the web page. It pretty much covers all the time we have. Two items have been requested for AOB, so hopefully the Working Group business can take less time. Routing hijacks and prefixes are the two topics people have asked to talk about.
The first talk is Enno Ray, if you are in the room.
ENNO RAY: Good afternoon, everybody. I am involved in many IPv6 projects. I worked in the carrier space in the '90s and right now I am more on the enterprise consultancy side. I work for a company called ERNW, and I would like to talk about some phenomenon that we observe happening in the conflict space between enterprises and traditional transit ISPs. The material I am going to present is based on a research project that two colleagues of mine and I have conducted. The detailed results were published yesterday on our website, so I will skip through some of it rather quickly; it can be found in more detail in the white paper.
Let's start with the problem statement. This is a quotation from RIPE‑532, which goes like: "there are occasional requirements for the advertisement of more specific routes from within an allocation. With a few ISPs currently filtering" and so on, "this can cause significant difficulties for some networks wishing to deploy IPv6." And the actual question then is: of what nature might these occasional requirements be? Is it just a few ISPs, or is it, say, common practice that such filtering happens? Does it happen at all? Are there any trends or developments that can be observed? To understand the nature of the problem, it helps to quickly illustrate what we think is the current reality in LIR space, namely in the RIPE region. There are what we, the guys who wrote this and I, call the traditional ISPs; let's call them the transit LIRs. Probably the majority in this room right now represents that section. And there is a growing percentage of another group of LIRs that we call enterprise LIRs: huge organisations. I can tell you first‑hand that pretty much every large organisation within Germany has chosen the path to become an LIR within the last, say, six, 12, 24 months, for reasons that I will discuss in a second. Those organisations obviously have different requirements and different agendas than what we call traditional ISPs. In quite some cases they don't even run an IT‑centric business; they produce tyres or cars or whatever, but for certain reasons, to be discussed, they have applied for LIR membership and status within RIPE.
These are statistics from RIPE itself, which show that, apparently, there is a huge increase in the overall number of LIRs, so the sum of those two graphs, and probably this increase cannot be attributed to just, say, an increase in traditional ISP business within Europe.
One might ask why those organisations choose the path to become an LIR instead of applying for PI space. I won't go through this in detail. I suggest you read this. You might use a translator to translate it. There is some element of ridiculousness in this answer, which was given by Deutsche Telekom, sorry for that, Ruediger, to a customer actually asking to apply for PI space through the sponsoring LIR.
That said, there is another thing that should be kept in mind, and that is the practice of strict filtering as initially described by Gert. I am not sure he is happy today that his name is still attributed to this practice. The point is, probably everybody in the room is more or less familiar with this approach. If there are route announcements from PA space, they should be the covering aggregate and not more specifics. More specifics are filtered either by checking, say, the RIPE database or by, say, a prefix length of a certain number.
So, why does strict filtering exist? To save TCAM memory and keep ‑‑ the whole aggregation discussion.
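A minimal sketch of the strict filtering idea just described, assuming the filter only knows the registered allocations. The allocation list and the function names are hypothetical, and the prefix is the IPv6 documentation range rather than anyone's real address space.

```python
import ipaddress

# Hypothetical list of registered PA allocations (documentation prefix only).
ALLOCATIONS = [ipaddress.ip_network("2001:db8::/32")]


def strict_filter_accepts(prefix: str) -> bool:
    """Accept an announcement only if it exactly matches an allocation."""
    net = ipaddress.ip_network(prefix)
    return any(net == alloc for alloc in ALLOCATIONS)


def is_more_specific(prefix: str) -> bool:
    """True if the prefix lies inside an allocation but is longer than it."""
    net = ipaddress.ip_network(prefix)
    return any(net != alloc and net.subnet_of(alloc) for alloc in ALLOCATIONS)
```

With this policy the covering /32 passes, while a /34 or /48 carved out of it is dropped even though it belongs to the same holder.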
So the question is: does strict filtering actually happen, and does it hurt some players in the overall ecosystem? Unfortunately, yes, that is the case. I will give two quick case studies. I have been involved in both cases, and both organisations have authorised me to discuss them here. I won't give the names of the carriers involved, but I will provide some details on the organisations. The first one is Burda media. The numbers on the slide are there for the simple reason of illustrating that these are huge organisations; it's not just running into some routing problems, these are huge organisations having real problems with real routing in the real world. Burda have, throughout Germany, three main POPs, network hubs, data centres, whatever. Each of those is multihomed and has its own AS number, and there are stateful firewalls at each of them. For a media organisation there is a certain need to start IPv6 deployment soon, which they did in 2011, and they were asking: should we become an LIR or not? They decided to do so. They were allocated a /32, which they split into three pieces of a /34 each. As I already mentioned, they have stateful firewalls, so they have an interest in symmetric flows: if traffic leaves one of their three main POPs, each allocated a /34, traffic should come back the same way, on the same path. What happened then?
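The address plan described here can be sketched with Python's standard `ipaddress` module. This is only an illustration: the /32 is the documentation prefix, not the real allocation, and the POP names are made up.

```python
import ipaddress

# Documentation prefix standing in for the real /32 allocation.
allocation = ipaddress.ip_network("2001:db8::/32")

# A /32 contains four /34s; three of them are handed to the three POPs.
site_blocks = list(allocation.subnets(new_prefix=34))
pops = ["pop-a", "pop-b", "pop-c"]  # hypothetical site names
plan = dict(zip(pops, site_blocks))

for pop, block in plan.items():
    print(pop, block)
```

Each POP then announces its own /34 so that return traffic arrives at the site whose stateful firewall saw the outbound flow.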
When they initially ‑‑ actually, it was the first thing they did ‑‑ the first route they ever announced from their IPv6 space was filtered by one of the uplinks. Again, I won't give the name here, but that organisation apparently was filtering by comparing the prefix length of the route with the inet6num object, which carried a /32. A /34 was advertised and they filtered it. The situation could be resolved by a phone call: why did you do this? We are a customer, and all this. After some days of monitoring it stabilised, and right now it apparently seems to work. Still, it gave them quite a bad feeling: wait, the first time we announce a route it gets filtered, so what is going to happen in the future? The second case study is an organisation called EVONIK; they have been a title sponsor for quite some time. It's a huge organisation. They have been an LIR since 2005. They have two main data centres in Germany, and when they did their address planning for IPv6 they came up with: let's use globals only within our own network, which from my perspective is a good idea. They are in the chemical industry, so they have some industrial control systems, stuff that shouldn't be reachable from the Internet, which is why they came up with the idea: to avoid this stuff being reachable, let's do two things: filtering at the boundaries, like firewalls and the usual stuff, and selective propagation of address space.
So they just announced a /48 at each of the two data centres, which didn't work, as both carriers ‑‑ one is one of the five largest in the EU and the other is a larger German one ‑‑ told them: oh, we are not going to accept your /48 without a covering /32. That, then again, might mean traffic leaves one site and comes back through another, with stateful firewalls. What they did in their case, as the sites are connected by an MPLS network, was some rerouting, which was a bit tedious given that one of the elements involved didn't support OSPF. So it was cumbersome to find a solution to handle this. And again a question came up: why does this happen? We are an LIR ourselves; what is going on out there? So the question for us was: are these just two anecdotal cases? These were actually the first two projects I was involved in which had reached this state of IPv6 deployment, so my personal sample is two organisations and two fails, which then again might be anecdotal, but we asked ourselves: is this common practice? What is the distribution of routes out there, at least in the RIPE region? That is why we started a little research project. We looked at data from the Routing Information Service from the years 2010 to 2014 and analysed the prefix lengths to be observed. We had a particular look at five RRCs, remote route collectors, to be seen here, to have not just one location like Germany but a more representative view across Europe. What we did is fetch one bview file per quarter, for the 1st of January, the 1st of April and so on. We filtered for IPv6 routes only, from PA space only. So whatever is shown on the next slides and in the results is only PA space. There is no PI space. There is no 2001:678::/29. It's just PA space assigned by RIPE throughout the years, and that is mostly ‑‑ something /12, the main one, and then there are some others.
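A rough sketch of the counting step in this methodology: keep only IPv6 prefixes inside PA space and build a histogram of prefix lengths. The PA block `2a00::/12` is an assumption filled in for illustration (the talk only says "something /12"), the sample prefixes are arbitrary, and reading the bview dumps themselves is left out.

```python
import ipaddress
from collections import Counter

# Assumption: treat 2a00::/12 as the PA block of interest.
PA_SPACE = [ipaddress.ip_network("2a00::/12")]


def prefix_length_histogram(prefixes, pa_space=PA_SPACE):
    """Count IPv6 prefixes per prefix length, restricted to PA space."""
    counts = Counter()
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if net.version != 6:
            continue  # IPv4 routes are ignored
        if any(net.subnet_of(block) for block in pa_space):
            counts[net.prefixlen] += 1
    return counts


# Arbitrary sample values, not taken from any real routing table dump.
sample = ["2a00:1::/32", "2a00:1:2::/48", "2a00:8::/29",
          "2001:678:4::/48", "192.0.2.0/24"]
histogram = prefix_length_histogram(sample)
```

In the sample, the prefix under 2001:678::/29 and the IPv4 route are dropped, leaving one entry each for /29, /32 and /48.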
In any case: PA space only, and a look at the distribution of routes and prefixes. I will go through the results rather quickly; more detailed results can be found in the white paper. What might not be too surprising is that, overall, the number of IPv6 routes to be observed has increased throughout those years. What is interesting is that there are apparently two main columns or pillars, /32 and /48, plus some /29s, some /40s, binary stuff, /36s, /40s. What might be more interesting is the percentage of those two main sections, /32 as opposed to /48, within the whole of prefixes to be observed. Over the years, /48 has gained much ground; again, this is about weight, so it's not about absolute numbers. Within the space of more specifics, just looking at more specifics, it turns out that /48s are the vast majority of those announced. What is further interesting, I will skip this for a second, is this slide. Look at the trend, which changes in late 2013. Up until that point, of the more specifics which had been advertised, more than 50% had been advertised together with the covering aggregate. This has changed over time, so right now we see more /48s without a covering aggregate than with a covering aggregate, which leads to some interesting conclusions. I might state that all this was done just to observe what is happening out there, to find out what the developments are. It turns out that the share of more specifics grows, so apparently there are more advertisements of more specifics, in particular /48s, and the amount of those without covering aggregates is growing. Roland, one of the co‑authors, is a chemist and takes a very scientific approach to the world; he stated that apparently the second law ‑‑ equal distribution ‑‑ somewhat kicks in. Is there anybody out there doing strict filtering? We observed some.
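The "with or without covering aggregate" split can be sketched as follows. This is an illustrative reconstruction, not the authors' tooling: it assumes all input prefixes are IPv6, uses a simple quadratic comparison, and the sample prefixes are documentation ranges.

```python
import ipaddress


def split_by_covering_aggregate(prefixes, length=48):
    """Split prefixes of the given length into those that have a shorter
    covering prefix in the same table and those that do not."""
    nets = {ipaddress.ip_network(p) for p in prefixes}
    covered, uncovered = [], []
    for net in sorted(n for n in nets if n.prefixlen == length):
        has_cover = any(other != net and net.subnet_of(other) for other in nets)
        (covered if has_cover else uncovered).append(net)
    return covered, uncovered


table = ["2001:db8::/32", "2001:db8:1::/48", "3fff:0:1::/48"]
covered, uncovered = split_by_covering_aggregate(table)
```

Here the first /48 is announced alongside its /32, while the second /48 stands alone, which is the case the slide shows growing after late 2013.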
This is a particularly interesting example, as this organisation apparently changed their policy between the 1st of April and the 1st of July: after the 1st of July the number of more specifics kind of exploded compared to the earlier days. Maybe before that they had just some selected customers within their routing propagation, whereas after that change of policy they apparently turned off strict filtering. This is what we read from this one.
So what does all this mean? It means that while in reality there are more and more more specifics, there is a certain amount of uncertainty within enterprise organisations: can I be sure that when I announce something longer than a /32 it will be routed? This leads to certain decisions which are, from my perspective, unfortunate for everybody, as we have a tragedy of the commons here. The overall goal could be to have a tidy, clean Internet routing space, whereas specific organisations looking at this might be tempted, and this is what we actually see, to reserve as much as possible, like going for a /29 instead of a /32, which is default policy within RIPE anyway: the bigger the block we get, the higher the chance it gets routed. They do so at most RIRs in the world; I know a number of organisations which have applied for address space at APNIC and ARIN as well, even one at AfriNIC. Get whatever you can get, as it increases our chances of being routed, which is certainly not in the interests of the community. That leads to the question: what can actually be done? Discuss the problem, which is exactly why I am here. Solve it by policy; this hasn't worked too well in the past, and I am not sure it would work here. Solve it commercially, like having those who want to advertise more specifics pay an extra fee to the individual carriers or to a fund to be shared amongst everybody. Wait and hope the problem goes away; apparently this could be done, given that the trend points towards more specifics being seen anyway. Or others, like what they did with BCOP and aggregation.
Actually, I would like this talk to take the first route, which is to discuss the problem and make everybody aware that there is a conflict of interest and an issue that arises from it. I would like to suggest: why not try for a consensus on /40, which keeps the deaggregation small and still allows organisations to do certain things they are required to do with different data centres and so on? This could be a starting point for a discussion. For the moment, we think that if this conflict of interest and the developments which happen out there are not discussed, at the end of the day this is bad for IPv6 deployment as a whole.
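To make the /40 suggestion concrete: for a /32 allocation, /40 means eight extra bits, so at most 2^8 = 256 more specifics per allocation. A minimal sketch of such a bounded check, assuming a /32 allocation and a hypothetical function name:

```python
import ipaddress


def within_deaggregation_limit(prefix, allocation, max_extra_bits=8):
    """True if prefix lies inside allocation and is at most
    max_extra_bits longer than it (a /40 for a /32 with the default)."""
    net = ipaddress.ip_network(prefix)
    alloc = ipaddress.ip_network(allocation)
    extra_bits = net.prefixlen - alloc.prefixlen
    return net.subnet_of(alloc) and extra_bits <= max_extra_bits


# Upper bound on distinct /40s inside one /32.
max_more_specifics = 2 ** (40 - 32)
```

Note that a bit budget and a fixed /40 cut‑off only coincide for /32 allocations; for a /29 they would differ.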
Thanks for your attention.
Obviously, we have a strict schedule so I am not sure how much we can discuss right now.
GERT DORING: Author of the quoted filtering recommendation. I am actually not sorry to have done this 12 years ago, because it was a good starting point for discussions that needed to happen, and I hope that some BCOP group will pick it up and care for it. But anyway. People who follow that document might have noticed that it now points to the routing registries and says: accept what is in there and don't do strict length filtering. But that is not why I got up. As far as permitting eight additional bits for deaggregation goes, there is a document that the Routing Working Group produced a couple of years ago with aggregation and deaggregation recommendations, and maybe it's time to rewrite that one, because back then there was, I think I remember, some six‑bits recommendation which never made it into formal text because the Working Group couldn't agree on what we want and whatnot. Back at that time we were lacking the experience to understand what is good and what is bad. Maybe we still lack that.
And then there is actually a third point that needs to be considered in these discussions. Deaggregation is not necessarily bad if ‑‑ let me phrase it differently. If a site has an independent enough routing policy to require BGP‑based multihoming, for some definition of require, it will eat one slot in the global routing table, so whether they do it with PI or by deaggregating, it is one slot. So I think the underlying question should not only be whether we permit deaggregation, but also whether we are sure we understand under which conditions a node needs to have an entry in the global routing table, because with v6, smaller sites that are basically just an office and a few workstations might get along without PI, just with upstream addresses, just fine without a global table slot. But that of course depends very much on the network you are talking about and how open the mindset of the people implementing it is. If they say, we have done it this way in IPv4 and we insist on doing it that way in v6, it will be done that way. If they are open and say, OK, IPv6 is not quite equal to IPv4 in the end, we might do it differently, they might come to a different result. I think this is where the Working Group actually could help in formulating recommendations, without pointing at other people, of course, to do the work.
JOAO DAMAS: I see.
GEOFF HUSTON: I actually don't understand a lot of the thrust of this, insofar as I don't understand what problem you think you are solving, because quite frankly the size of the routing table is not an issue. It is tiny. When you look inside enterprise networks and large‑scale carriers, the internal routes run into the millions, so 23, 30, 40,000 routes, what the hell. In some ways this is going nowhere, insofar as v6 is looking a whole lot like v4. Currently, 30% of the routes are more specifics of aggregates; in v4 it's been 50%. My guess is that as v6 sort of comes to maturity it will become 50% too, because we do traffic engineering in routing, and the way we do it is with more specifics, and there is no other way of doing it. So rule‑based solutions really didn't work. Routing is actually a series of conventions and negotiations, that is all it is. It doesn't need a whole lot of rules; it just needs conventions and negotiation. In some ways, obsessing over the size of the routing table doesn't matter. Even obsessing over updates doesn't matter. They are not critical right now. Most of the carrier equipment we deploy has heaps of headroom. I don't understand what the evil is that you think you are saving us from, because I just don't grok it. Thanks.
ROB EVANS: As one of the co‑authors of RIPE‑532: one of the sentences I think I put in was that the market will decide, operators will decide; it's an attempt to provide some guidelines of what people were thinking at the time. I am fully open to ‑‑ bearing in mind what you were saying on Monday evening, but also bearing in mind that whatever we put in text now may change. And what Geoff said is true: these are guidelines about best behaviour, which means that maybe we don't need to obsess about it too much, but it's worth noting and observing, and I think it is interesting to see what you have had here. Thanks.
RUEDIGER VOLK: First of all, I would like to have Enno spend some time on answering Geoff's question, which I think wasn't completely rhetorical.
GEOFF HUSTON: You are going to tell us about evil, cool. You have saved me from evil, Ruediger.
ENNO RAY: The funny thing is, I consider my job to be initiating this discussion. I didn't pose a question. Personally, I fully agree and hope that, as Geoff said, the market will decide. This was meant more as a wake‑up call: whoever here in the room does strict filtering should reconsider this policy. I am not a fan of strict filtering and I hope the market will decide anyway. We have this whole deaggregation debate and I wanted to contribute to that one.
RUEDIGER VOLK: OK. So I am a fan of strict filtering. Maybe the question is, what kind of filtering? Where do you pull the information for the filters? Pointing to your example of Burda, the relevant information for my filter policy didn't show up. You said, well, OK, we assume that someone is looking at the inet6nums; I would be looking for exactly matching route6 objects.
ENNO RAY: Which accessed that?
RUEDIGER VOLK: At least I would have accepted those, and yes, I tend ‑‑ I tend to say, well, OK, prefix length policy for filtering is kind of not the thing that I really appreciate. On the other hand, yes, I am doing prefix length filtering, nothing longer than 48 will be accepted. And kind of ‑‑ and kind of I think that is fairly reasonable, well, OK, it matches the 24 we usually do in v4 and yes, I am not aware of any document where consensus that all of the community agrees this is the right length to cut it off, has been defined. And when somebody asks me about it, well OK, I tell people, I cannot really predict what all the weird guys in the business will be doing over the next two decades. My prediction is, well, OK, the /32, that is in the assignment, is very likely to be accepted by everybody, so well, OK, if you want to actually announce if there is any reason for you to announce more specifics, it would be a good idea to organise so that you have a covering announcement and organise for it, well, OK, if you decide, if you decide to actually just work with long prefixes, maybe actually the decision to go there and have the large aggregate and pull out stuff there, may have been not really that well‑advised making some decisions in that space, of course is kind of a challenge for all the people who haven't been looking into this as their main job. I have to admit. But well OK, some business for consultants should be there as well. But please give them good advice.
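The policy outlined here, exact‑match route6 objects plus a /48 length cap, might look roughly like the sketch below. The registry contents, the AS number and the choice to also match on the origin AS are assumptions for illustration; a real filter would be generated from routing registry data.

```python
import ipaddress

# Hypothetical route6 objects as (prefix, origin AS) pairs.
ROUTE6_OBJECTS = {
    (ipaddress.ip_network("2001:db8::/32"), 64500),
    (ipaddress.ip_network("2001:db8:4000::/34"), 64500),
}
MAX_PREFIX_LENGTH = 48  # nothing longer than a /48 is accepted


def accept(prefix: str, origin_as: int) -> bool:
    """Accept only announcements with an exactly matching route6 object
    that are not longer than MAX_PREFIX_LENGTH."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > MAX_PREFIX_LENGTH:
        return False
    return (net, origin_as) in ROUTE6_OBJECTS
```

Under this policy the /34 from the Burda example would pass as long as a matching route6 object had been registered, regardless of the /32 in the inet6num.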
ENNO RAY: I hear you, Ruediger.
AUDIENCE SPEAKER: Consultant for the German government and the Swiss government, so I am one of these lousy consultants here. I think this is an issue, and my customers actually have these issues. What I see is that with IPv4, most LIRs were organisations which provided the addresses and the infrastructure and the network. So for them it was easy to manage aggregated space, because they had all of that in their own hands. With IPv6, what we see is organisations founding LIRs to organise the address space in their organisation, but they don't have the infrastructure, so they lease the infrastructure from several parties and they have to deaggregate to be able to act. From my perspective we are now, with IPv6, in a changing environment, and this was, as Enno said, one approach to take this into account. I support this; for what I see in the Swiss government and the German government, the 8 bits would fit perfectly. Thank you.
ENNO RAY: Thanks again everybody for the lively discussion and to the Chairs for inviting me.
JOAO DAMAS: One thing I would like to ask everyone: you have noticed how in the plenary, next to the names of the talks, you have these buttons to rate them. I asked the RIPE NCC half an hour ago if we could have the same for the Routing Working Group and they actually managed to get it up and running. If you could click on those as we go by, it would provide us with very helpful information about which topics you find more interesting and less interesting for future talks. So if you could.
BENNO OVEREINDER: Welcome. Actually, my student prepared a beautiful set of slides and I completely ruined it, so all the credits to my student and all the blame on me.
This is the next slide. So, we want to talk about route leaks. What about route leaks? Route leaks are not route hijacks, where we announce prefixes which ‑‑ a route leak is actually forwarding an announcement of a prefix which I shouldn't forward according to my policies. It's more subtle in this respect, and therefore route leaks are not easy to detect.
You can observe them, you can monitor them as things change, but it's not that apparent; it's a bit of a silent event that you don't notice very easily. And there is of course a potential security risk if you do this routing very specifically against some data and organise a man‑in‑the‑middle attack. Examples, and I don't say they were necessarily an attack, were the Belarusian and Icelandic traffic misdirections. They were ongoing for a couple of weeks, on and off, and in the analysis they thought, well, somebody was actually playing with something, actually snooping, diverting transit or data which should stay in the US but was diverted via England to Iceland, for example. So there is a potential security risk here.
I think we can skip this; we all know how we route our routes and what kind of agreements we have. We have business agreements, customers and providers, and traffic should go from a customer to a provider, from a provider to a peer maybe, and downwards to its customer. If that is violated, we call that a route leak. Beautiful slide. This is actually how we define a valley‑free route: customer to provider, one or more steps; one or zero peer‑to‑peer relations; and then it goes down again to the customer. Some examples? Well, the typical correct valley‑free route, and on the bottom we see we have a valley here: a peer‑to‑peer link and then going up again, so giving transit for its peer through the rest of its network. We want to detect these events. And how to do that?
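The valley‑free rule described here can be written as a small state machine over the AS path. This is a sketch under stated assumptions, not the tool from the talk: relationships are given explicitly, AS numbers are from the documentation range, and the path is in BGP order (receiving side first, origin last).

```python
def make_relations(c2p_links, p2p_links):
    """Build a lookup of (sender, receiver) -> 'c2p' | 'p2c' | 'p2p'."""
    table = {}
    for customer, provider in c2p_links:
        table[(customer, provider)] = "c2p"
        table[(provider, customer)] = "p2c"
    for a, b in p2p_links:
        table[(a, b)] = table[(b, a)] = "p2p"
    return table


def is_valley_free(as_path, relations):
    """Check uphill (c2p) steps, at most one p2p step, then downhill only."""
    hops = list(reversed(as_path))  # follow the announcement from the origin
    descending = False  # set once a p2p or p2c step has been taken
    for sender, receiver in zip(hops, hops[1:]):
        if sender == receiver:
            continue  # AS path prepending
        rel = relations.get((sender, receiver))
        if rel is None:
            raise ValueError(f"unknown relation {sender}->{receiver}")
        if descending and rel in ("c2p", "p2p"):
            return False  # going up or sideways after the peak: a valley
        if rel in ("p2p", "p2c"):
            descending = True
    return True


relations = make_relations(
    c2p_links=[(64501, 64511), (64502, 64512)],
    p2p_links=[(64511, 64512), (64511, 64503), (64503, 64512)],
)
good_path = [64502, 64512, 64511, 64501]  # up, across one peering, down
leaked_path = [64512, 64503, 64511]       # a peer transiting between two peers
```

The second path is the peer‑to‑peer, peer‑to‑peer case: 64503 learned the route from one peer and passed it on to another.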
Well, these are the origins: misconfiguration ‑‑ the intention is always difficult to tell. You cannot tell. Most of them will probably be misconfigurations anyway. Sometimes, and we will come to that later on, there are a lot of intentional non‑valley‑free routes, and we can discuss that later and get some input from you, from the room. And sometimes there definitely will be malicious intent. We want to find these relations, these valleys, and maybe we can learn something from that. Maybe we can even categorise them: oh, this is an intentional one, these organisations are just close to each other, so that is fine. But actually, I have to say, we thought the problem was easier to tackle, naive maybe, so we still have a lot of ground to cover. You might think this is a kind of research presentation. We started it as a research presentation, but I think the main thing is that we present a tool to get these routes into a database so we can do some statistics, and maybe the analysis and the real research still have to be done.
Why analyse them? Well, there are not many statistics available. There have been some, and we will come to that later. No counter measures are planned ‑‑ well, there are counter measures planned. It's an ongoing discussion in the SIDR Working Group, in the context of BGPsec. It's far ahead. Let's see what the problem is, how often it appears and what the scale of the problem is.
So, previous work. Jared Mauch: what he has done is go through BGP dumps from Route Views, applying his operational knowledge, things like: tier ones do peer, but three tier 1s in the path, this probably shouldn't happen. That is one of the rules. He is in contact with his colleagues at other networks and has a number of these rules for things which shouldn't happen: I know these two large ISPs don't have this kind of relation. So he collects the BGP dumps, runs them through his own rules and spits out all the route leaks every five minutes. It's too small to see. It's just a long list of all instances, and there are a lot, in the order of thousands every time.
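One of the rules mentioned, three tier 1s in a single path, could be expressed as a simple check like the one below. The tier‑1 set is hypothetical (documentation AS numbers), and this is only an illustration of the kind of rule, not the actual rule set in use.

```python
# Hypothetical tier-1 set; real lists are maintained by hand.
TIER1 = {64496, 64497, 64498, 64499}


def too_many_tier1s(as_path, tier1=TIER1, limit=3):
    """Flag a path that contains at least `limit` distinct tier-1 networks."""
    return len(set(as_path) & tier1) >= limit
```

Using a set means prepended copies of the same AS are only counted once.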
So we want to do something similar but different. Our idea was: how can we infer these relations? We don't have this knowledge and we don't know what the business relations between the different networks are. Can we find a data source to infer these relations? We have, of course, Route Views and the RIPE route collectors. This is the data source. In these dumps we have the AS path, so if we have the business relations and the data from the route collectors and Route Views, maybe we can do something with that and analyse it. There are two potential sources of inferred business relations. One is by Ricardo; this was written up at UCLA, and he is working elsewhere now. They said: the tier 1s, that is a clique. From that they infer the relations, so all their other relations should be customers, etc., and their assumption is that all the routes should be valley‑free. That is not a good starting point for us. Matthew Luckie ‑‑ they do it differently. They don't start from a clique per se; they say, our assumption is that you need customer‑to‑provider relations to reach global reachability. From these they infer the clique of the tier ones and the other relations. There are also no cycles in their customer‑to‑provider links, but they don't assume there are no non‑valley‑free routes, so in their inference there are non‑valley‑free routes in the topology. So that is a good starting point for us.
I will skip this for the time. This is how we developed our software. We have a database and collect all kinds of stuff, we put it in the database, and then from the database we can generate all the kinds of statistics we like to see.
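For readers who want to experiment with inferred relationships, here is a small parser sketch. It assumes the pipe‑separated layout used by CAIDA's published AS relationship files (`<provider>|<customer>|-1` and `<peer>|<peer>|0`, with `#` comment lines); that layout is an assumption here, as the talk does not describe the file format, and the sample lines are made up.

```python
def parse_as_rel(lines):
    """Parse 'a|b|rel' lines into a dict of (a, b) -> 'p2c' | 'c2p' | 'p2p'.

    rel -1 means a is a provider of b; rel 0 means a and b are peers.
    Any extra fields after the third are ignored.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        a, b, rel = int(fields[0]), int(fields[1]), int(fields[2])
        if rel == -1:
            table[(a, b)] = "p2c"
            table[(b, a)] = "c2p"
        elif rel == 0:
            table[(a, b)] = table[(b, a)] = "p2p"
        else:
            raise ValueError(f"unexpected relationship code: {rel}")
    return table


sample_lines = ["# made-up sample", "64511|64501|-1", "64511|64512|0"]
relations = parse_as_rel(sample_lines)
```

The resulting table can drive a valley‑free check; sibling information would have to come from a separate source.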
So, these are just the gross numbers. Of all the data we collect every time, every five minutes we read out the dumps from Route Views and the route collectors from RIPE, about 4% contain valleys. That is 63 million valleys announced in a month. It's a lot. So what does it mean? Well, we made a pie chart of what kinds we see most often. Most often we see the peer‑to‑peer, peer‑to‑peer valley, so there are two peers providing transit for each other, or a peer in the middle providing transit between its peers. We come to that later. These other two are the other valleys. It's important to see that about 16%, which we put here in the pie chart, are not really route leaks; these are actually siblings, two networks which are considered to be siblings, related to each other, and which then have a relation with another network. We obtained these sibling relations later on from CAIDA; they are complementary to the AS relations topology file they provide, so we combine both of the data sources. What is interesting to see now is that of all announcements containing a route leak, more than half, in the order of 60%, last less than 30 seconds, so apparently something is going on during, well, convergence. I talked with Alexander about this; maybe he is in the room and can comment on that later on. He looked at it more qualitatively; we only have numbers here. So those are very brief, very short, 30 seconds; nothing going on there. But for some of the others, and this is not to scale, we have route leaks being announced for quite a long time. Next we look at the top ten: which ones appear most often in our statistics? 12% of all valleys are in this top ten. We don't consider route leaks of less than one minute here. And the next question is: can we classify these top ten, do we understand them?
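The two statistics mentioned here, the share of very short‑lived valleys and a top‑N list that ignores anything under a minute, can be sketched as below. The event format (a valley identifier plus a duration in seconds) and the sample data are assumptions for illustration.

```python
from collections import Counter


def short_lived_share(durations, threshold=30.0):
    """Fraction of valley announcements lasting less than threshold seconds."""
    if not durations:
        return 0.0
    return sum(1 for d in durations if d < threshold) / len(durations)


def top_valleys(events, n=10, min_duration=60.0):
    """Most frequent valleys, ignoring events shorter than min_duration."""
    counts = Counter(key for key, duration in events if duration >= min_duration)
    return counts.most_common(n)


# Made-up events: (valley identifier, duration in seconds).
events = [("A", 120), ("A", 300), ("B", 90), ("A", 5), ("C", 10)]
share = short_lived_share([duration for _, duration in events])
ranking = top_valleys(events, n=2)
```

In practice the identifier would probably be the AS triple forming the valley, so that repeated announcements of the same leak are grouped together.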
And I go ‑‑ so one thing and maybe APNIC people can comment on this, give us insight, so one of the frequently appearing route leak here in our dump is that why project is leaking route for APNIC to APEN. And we know all this organisations has some relation with each other, and RIPE is the collector here. So we see that through Asia Pacific peering with RIPE route collectors here. Probably we can see this is intentional behaviour, WIDE giving transit to APNIC to the rest of the world, at least academic, as I can't Pacific. Here this is very difficult to tell. There are two networks in Iran, information technology company, this is what it says, very generic name and tell commune company infrastructure company. Whatever that means. They have a little bit different names but they are in the same office building. So, we cannot say for certain but from our data we can say probably they are siblings or the same company anyway, and this is not a route valley at all. But that is the difficulty with getting into these route leaks and they occur very often these kind of route leaks, it's serious enough. This one is Hurricane Electric in the middle, with peer to peer, peer to peer between PAC nick and warrant Internet, isn't actually a customer of Hurricane Electric but for IPv4 they are a peer. Giving the AS hurricane. So these are the AS hurricane AS set also includes PA C net, we don't see that in relation, so again this is probably in all intentional we cannot find it ‑‑
MARTIN LEVY: I once was at Hurricane. They are a customer.
BENNO OVEREINDER: They are a customer, yes.
MARTIN LEVY: Both
BENNO OVEREINDER: Yes.
MARTIN LEVY: I mean you are believing this database.
BENNO OVEREINDER: Exact ‑‑ it's a good question.
MARTIN LEVY: Next slide, please.
BENNO OVEREINDER: Some of the details I probably removed from the slide. The database, the topology file, we obtained from CAIDA; they tested one‑third of the routes they inferred, and according to their data, in that test it's about 99% correct. We will come to that later, about how to trust data.
OK. So, actually, well, we can say that ‑‑ this is the summary of the conclusions that we found. 65% of the valleys are very short‑lived, so nothing going on there. 6% are very persistent and should probably be looked into more carefully. And from the top ten we found that 12% AS ‑‑ actually we worked through all of them, and most of them are special relations, so they are intentional or have other relations we can infer from RPSL. That is also the challenge: to include more knowledge in this inference and database, and to make use of RPSL data, which is very inconsistent if not incomplete. But these are all the challenges. Please, step ahead, give comments; we want to improve the data set and these tools, to make them more useful and make them available to you. Look whether your network or prefixes are on a path and just check manually if it's correct or not, so we can put more knowledge into this inference machine, make it more useful and remove the false positives we have. And coming back to Martin's comment about whether we trust the AS relations topology file: we don't know. Of course, because ‑‑ I haven't shown that on the slide ‑‑ every month you get a new topology file, and then we go from a couple of 100,000 current route leaks, and it drops from 600,000 to 400,000. So apparently every time this topology file is regenerated and updated, a lot of false positives disappear and probably a number of them reappear. I see someone is shaking his head violently. Step ahead. Shoot at it, give comments. Thank you.
GEOFF HUSTON: Thanks for the talk, and certainly this is well‑travelled ground. You might recall Lixin Gao's paper; at the time, I seem to recall, it identified that about 7% of the routes they looked at didn't obey the rule set they were formulating. And it's certainly true that in the detail of routing, for some prefixes you might be my customer, and if I have really got an inventive relationship with you, for other prefixes I might be yours. You are kind of assuming that relationships between ASs are strictly homogenous all of the time, and they are kind of not. And then you go and say there are these 600,000 route leaks. Or there are 600,000 instances where my model of the world doesn't correspond to reality. I am serious. And the issue really comes ‑‑
BENNO OVEREINDER: We have this also ‑‑
GEOFF HUSTON: Are you looking at the accuracy of your model or at true route leaks? In some ways that is a metric of how good the model is in the fine instance, and your 600,000 isn't 600,000 leaks; I suspect it's a much lower number appearing many times in all kinds of guises. There aren't that many prefixes on the planet, right?
BENNO OVEREINDER: No. Indeed. They are appearing through the month, so they appear, disappear, appear. So they are double counted through the month, that's correct.
GEOFF HUSTON: The bit I find curious about the routing system is slightly aside from this. If you take a one week long view of the total routing table, in any week someone spits out between one and 6,000 routes for a couple of minutes. Every single week. That is a leak. What you are looking at is something a little bit different, which is about my conceptual model of AS relationships and the degree to which reality doesn't quite match it, and whether there is a leak or an issue of what your model is about is really the discussion, isn't it?
BENNO OVEREINDER: Yes. We try to treat it like that.
AUDIENCE SPEAKER: Alex, for the record. I have a question regarding your description of the CAIDA data. Your slide says there is a validation of relationships; can you give me additional information concerning this? I am very surprised there is any validation made by CAIDA.
BENNO OVEREINDER: So, yeah, I can point you to the paper. I haven't ‑‑ they wrote a series of papers on their, well, validation, and how they inferred ‑‑ so they described their model of how to infer; I won't go into the details.
AUDIENCE SPEAKER: I had a look at the CAIDA data, and as far as I am concerned the biggest problem with CAIDA is whether it is a free peering or not a free peering, a paid peering. In this case your model marks it as a route leak, and I think most of your cases of route leaks where there are three peerings in the path are related to these problems.
BENNO OVEREINDER: Paid peerings?
AUDIENCE SPEAKER: Yes.
BENNO OVEREINDER: Thank you.
JOAO DAMAS: Thank you very much, Benno.
AUDIENCE SPEAKER: First, thank you for having saved me from something like what Martin did, because you went through the 286 slide very briefly. In my past life I was at KPN ‑‑ I started recognising the modules in that slide, so I am very grateful you skimmed through it. I am Paolo, Cisco. At the IETF, in the GROW group, we have a draft with Francois and Camilo. We are taking a different approach. So instead of, you know, building a database, we try to say what to do, from the operator perspective, at the moment in which you detect a leak. We advocate what to do, and we advocate to detect and not to block, for example, and then we also suggest some tooling, how to detect this kind of stuff. So maybe we can speak later; I think there is maybe some larger cooperation possible between these two views on the same issue. Thanks.
JOAO DAMAS: Next speaker is Daniele Iamartino.
DANIELE IAMARTINO: Hello. I am from the Polytechnic University of Milan, which is the largest in Italy. And I have been working with Randy Bush at IIJ and I have been studying the BGP route origin validation and registration in the RPKI infrastructure.
So, as you know, RPKI was developed in 2012 in order to secure Internet routing. And what is going on now is route origin validation, which checks whether the origin AS of a BGP announcement is correct using the ‑‑ this is not clearly crypto checked, so it can eventually be violated, but it should prevent the vast majority of accidental hijackings on the Internet.
So, as you know, an ISP gets a certificate signed by the CA of the RIR. Then the ISP signs a ROA, a route origin authorisation file, and puts it in the RPKI repository of the RIR. It contains a prefix with the length and the autonomous system number which is authorised to announce that prefix. So in this example, 10 /16 can be announced by AS 42. And then when we receive a BGP announcement for that prefix, we check that the origin AS of the announcement is correct.
So, there is a detail about the ROA which is very important. If you create a ROA for 10 /16, only that prefix can be announced. So if we announce a longer prefix like 10.0.1.0 /24, even if it's coming from the correct AS number, the announcement will be invalid. And you can solve this in two ways: create another ROA with the longer prefix, or just keep the other ROA and set a maximum length, so you say, okay, I can announce 10 /16 but up to 24 in length. In this case the longer announcement can also be made.
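The max length rule just described can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the validator used in the study: ROAs are modelled as hypothetical `(prefix, max_length, asn)` tuples, the function name is invented, and the three outcomes follow the usual valid / invalid / not found split of route origin validation.

```python
import ipaddress


def validate_origin(prefix, origin_as, roas):
    """Classify one announcement against a list of (prefix, max_length, asn) ROAs.

    Returns "valid", "invalid" or "not_found".
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_as in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA is relevant only if its prefix contains the announced prefix.
        if roa_net.version != net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs both the right origin AS and a length within max_length.
        if roa_as == origin_as and net.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not_found"


# The example from the talk: a ROA for 10/16 issued to AS 42.
strict = [("10.0.0.0/16", 16, 42)]   # max length equal to the prefix length
relaxed = [("10.0.0.0/16", 24, 42)]  # same ROA, but max length set to 24

print(validate_origin("10.0.0.0/16", 42, strict))   # valid
print(validate_origin("10.0.1.0/24", 42, strict))   # invalid
print(validate_origin("10.0.1.0/24", 42, relaxed))  # valid
```

An announcement that no ROA covers at all comes back as `not_found` rather than `invalid`, which is the distinction the rest of the talk relies on.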
So, my goal was to understand what the deployment of RPKI is today, and its history. Are today's BGP routes valid against RPKI based origin validation, and what happens if we filter invalids today? Is it safe? So this research was divided into two steps. The first one was just looking at the RPKI repositories without looking at BGP traffic, so it is about the ROA publication. The second part, instead, looks at the real BGP traffic to see if it is valid or not.
So in the first part, the first thing I did was to look at how many IPv4 host addresses are allocated by each RIR, and to see for each RIR how many host addresses are covered by a ROA. As you can see in this table, RIPE NCC is the leader in ROA registration because 15% are covered by some ROA. And even if ARIN has allocated most of the address space of the Internet, it lags far behind most of the other RIRs. And here you can see in the total that the overall coverage of the IPv4 space by ROAs is about 5%.
So, we validate files in the RPKI repositories using the rcynic tool. We have RPKI repository data since 2012; we download every two hours from all repositories. So we validated all the history and plotted the valid ROA files, and this is what we got. Here we have a log scale graph where you can see the number of valid accepted ROA files in the repository of each RIR. And here you can see pretty much the same percentages you saw before. You can note that at some point the blue line, which is LACNIC, just dropped down to zero, and this happened between December 2012 and August 2013. We believe that this was due to expiration of their main trust anchor, and the fact that this was undetected is disturbing. There is a hole in the middle of the graph, but that was our fault because we just lost one month of data collection. And you can see that the ARIN data is just starting from August of this year because of ARIN's legal barriers on data collection: if you want to get data from the ARIN repository, you have to sign a document. Just that.
And so we want to validate real BGP announcements. We have BGP announcement history for the same period as the RPKI repository data we just saw. So, how do you validate that? We take a BGP RIB dump every 30 days since 2012. For each dump, and for each announcement in the RIB dump, we check if there is a valid covering ROA. And this is what we got.
So on the ‑‑ here you can see our plot starting from 2012 up to August of this year. This graph shows on the Y axis the number of unique pairs of prefix and origin AS, and as you can see we divide them into valid announcements and invalid announcements. Here you can see that there is a huge drop in the middle, and in fact this was due to LACNIC, because as we saw before the main trust anchor went away, and so all the prefixes ‑‑ all the announcements which were covered by that repository just became ROA not found. So they didn't become invalid, but they were not secure any more. And you can see that about 10% of announcements are invalid. This was interesting, but it's more meaningful to look at validation of prefixes, because as I said, I measured the number of valid and invalid announcements, and you can have multiple announcements for the same prefix, so it is better to just look at the unique prefixes. And this is what we got. It's pretty much the same thing, with slightly different percentages. As you can see, we divide it into three groups: when a prefix is valid in all announcements we see from the monitor, when a prefix is invalid in all announcements, and when a prefix is valid in some announcements and invalid in other announcements. This happens, for example, if you announce the same prefix from two different origin ASs and one is covered by a ROA and the other is not, so you fall in the third group. By the way, just very few prefixes are in that last group.
So, even looking at this data, we can see that about 5% of global prefixes are RPKI covered. And still we can see about 10% of invalid prefixes, this time, not announcements. So we ask, why do we have invalid prefixes? We break down the reasons for invalidity, and this is what we got. We divide into three groups: a prefix which is invalid due to max length, a prefix which is invalid due to ASN, so the origin AS is wrong, or ‑‑ in the first case the maximum length is wrong, so I am announcing a longer prefix and have a ROA for a shorter one ‑‑ or I have both problems at the same time. And you can see here that about 50% is the max length group, which is quite a lot. As I said, in the max length case the origin AS is correct and the ROA exists, but the prefix is longer, so I think this can only happen because people register a ROA, then forget what they registered and announce different stuff on BGP, and so they get this error. And yes, we thought, yes, this is interesting, but what if we take into account the coverage? Let's say we drop all invalid prefixes we receive: do we lose connectivity or not? You should think that if we drop ‑‑ we may drop an invalid prefix, but that may be covered by another valid prefix. For example, I drop a /16 but I have two /17s which are valid, and so even if the shorter prefix is invalid, my traffic will get there anyway.
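The three-way breakdown of invalid prefixes (max length, ASN, or both) can be sketched as follows. The grouping rule is an assumption made for illustration, since the talk does not spell out the exact criteria used in the study: an announcement counts as a max length problem when some covering ROA names the right AS, as an ASN problem when some covering ROA would allow the length but names another AS, and as both otherwise. ROAs are again hypothetical `(prefix, max_length, asn)` tuples.

```python
import ipaddress


def invalid_reason(prefix, origin_as, roas):
    """Return "max_length", "asn" or "both" for an invalid announcement.

    Returns None when the announcement is valid or not covered by any ROA.
    """
    net = ipaddress.ip_network(prefix)
    covering = []
    for roa_prefix, max_length, roa_as in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if roa_net.version == net.version and net.subnet_of(roa_net):
            covering.append((max_length, roa_as))
    if not covering:
        return None  # ROA not found: nothing to classify
    if any(asn == origin_as and net.prefixlen <= ml for ml, asn in covering):
        return None  # valid announcement
    if any(asn == origin_as for ml, asn in covering):
        return "max_length"  # right origin AS, prefix too long
    if any(net.prefixlen <= ml for ml, asn in covering):
        return "asn"         # length allowed, wrong origin AS
    return "both"            # too long and wrong origin AS


roas = [("10.0.0.0/16", 16, 42)]
print(invalid_reason("10.0.1.0/24", 42, roas))  # max_length
print(invalid_reason("10.0.0.0/16", 66, roas))  # asn
print(invalid_reason("10.0.2.0/24", 66, roas))  # both
```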
And this is what we got. So, you can see again a bar plot where the left bar is the same as we saw before, so valid and invalid prefixes, and the right bar shows reachable and unreachable prefixes. In this case, when I say that a prefix is reachable, it's because there is a valid announcement for it, or there is an invalid one but it's covered by something else. And so here you can see that a lot of invalid prefixes are actually reachable anyway.
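One way to express the "reachable anyway" test is sketched below. It is a simplified reading of the idea, not the analysis code behind the plot: a dropped prefix is treated as still reachable if a surviving route contains it, or if surviving more specific routes together cover all of it, as in the /16 and two /17s example. The function name is hypothetical, and which routes count as surviving (valid only, or valid plus not found) is left to the caller.

```python
import ipaddress


def still_reachable(dropped_prefix, surviving_prefixes):
    """True if the dropped prefix's address space is still fully covered."""
    dropped = ipaddress.ip_network(dropped_prefix)
    nets = [ipaddress.ip_network(p) for p in surviving_prefixes]
    nets = [n for n in nets if n.version == dropped.version]
    # Case 1: a surviving route of equal or shorter length contains it.
    if any(dropped.subnet_of(n) for n in nets):
        return True
    # Case 2: surviving more specific routes add up to the whole prefix.
    inside = [n for n in nets if n.subnet_of(dropped)]
    return list(ipaddress.collapse_addresses(inside)) == [dropped]


print(still_reachable("10.0.0.0/16", ["10.0.0.0/17", "10.0.128.0/17"]))  # True
print(still_reachable("10.0.0.0/16", ["10.0.0.0/17"]))                   # False
print(still_reachable("10.0.1.0/24", ["10.0.0.0/16"]))                   # True
```

Partial coverage is reported as unreachable here, which is the conservative choice; a finer analysis could count the fraction of addresses that remain covered.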
And also, when we see an announcement coming from the wrong origin AS, in 72% of the cases we can find the correct AS in one of the AS paths of that prefix. This looks strange, but we understood that, for example, if you are an ISP with AS 42 and you register a ROA for your 10 /16, and then you have a customer with a different AS number but a subset of your address space, who, for example, announces 10.0.2 /24 from AS 66, in this case your customer will end up looking like a max length and ASN error, but he is not aware of that. So, yes, we can detect these cases just by looking at the AS path. And if we consider this as well and split the reasons for invalidity again, we have again the max length problem alone, and then when we have the wrong AS number, in most of the cases, as we can see, it's because of an ISP or someone else registering a shorter prefix. So, RPKI deployment is about 5%, but we thought, yeah, maybe it's interesting to see what is going on on a real network, because this is just BGP traffic. So we went to look at a big American research network, which is validating traffic going through, and here we saw that just 0.3% of the bytes going through the network are RPKI covered. But we are not sure about this measurement, because this is a research network where several universities and other corporations are exchanging literally terabytes of data that is not covered, so this might affect our analysis. So we still have to work on this.
So, the deployment is good, it's slightly increasing, but very slowly, and so everyone should help to secure Internet routing. You can do that just by registering a ROA and making sure that you announce the same stuff on BGP. And then you can start to deploy validation and filtering afterwards. Also, the top ISP coverage problem, as I explained before, is very common, so when you register a ROA for your ISP you must be very careful that your customers also have a ROA, or you register one for them. And most of the max length is about 5%, and that is only because people didn't understand how max length ‑‑ how the max length works, so you should be careful about that.
And that is all. Thank you.
Are there any questions?
JEN LINKOVA: In the case of an invalid origin AS, have you checked in how many of those cases the origin is a private AS number?
DANIELE IAMARTINO: It's not very many actually, about one or two percent, I think. I don't have them here.
AUDIENCE SPEAKER: ‑‑ who has been considering invalids, I would like to know about those prefixes that would be truly unreachable and not rescued or anything like that. Do you have any insight into what they are? Are they, for instance, bunched up in the LACNIC region or spread evenly? I think RIPE NCC will send you an e‑mail if they see that your prefix is invalid, maybe you can confirm that. I am wondering if I drop the invalids will I lose Latin America or Russia or will I lose something like that, or evenly distributed?
DANIELE IAMARTINO: We haven't done that analysis yet, because we only discovered the coverage issue later in the research, so we still don't have the data about that. But we found in several cases which are not reachable that there is a ROA registered, and if you look at the AS name of the AS number in the ROA and the AS name of the AS number in the announcement, they are very similar. So I think these are just people who register the ROA for one of their ASs but are actually announcing from another one, I don't know why.
RUEDIGER VOLK: So thanks. Let me kind of thank you for the suggestion that everybody goes out and registers ROAs. Small fix: register the right ROA.
AUDIENCE SPEAKER: One short question. Did you actually try to contact some of these and ask them why or what went wrong?
DANIELE IAMARTINO: Not yet, sadly.
JOAO DAMAS: Come back again with more results. Thank you very much.
And now for an item that hopefully should be short. Actually, two items. We have to clear up this thing with the Routing Working Group charter that has been going on for far too long. So, we have a charter, that is the one at the top there, which kind of doesn't describe what the Working Group is about any more. We proposed the new charter; there were some comments leading up to the last RIPE meeting, I think those were addressed on the list by Rob's answers, and there were no further comments. My suggestion is we adopt this as the new Working Group charter going forward, unless someone wants to scream about it. If you are caught by surprise by this thing because you haven't been reading the mailing list, perhaps we can give you one more week and that will be it.
Now, this is a pattern that you may have been observing throughout the week. What is this all about? Well, up until now, the whole "who is chair or not" or "when does one leave," "how do I put myself forward as a potential RIPE Chair," hasn't been documented; everything has been very informal, and the need to formalise this a little bit more has been specified, so we are trying to address this. This is what we proposed on the list. There were a few comments that perhaps the dates should be pushed forward a little bit more, to allow a fair playing ground for everyone, which I think is reasonable. And some of the Working Groups are opting to stay away from voting for the cases where there is more than one candidate and not a clear decision from the Working Group, so to stay away from elections and go towards some random selection process as the last step, because it is felt by some that an election in the absence of a consensus is just as random and causes more trouble than it solves. So we would like to move this forward. If you have any comment, let us know. We will repost this to the list; I think we will give about two weeks to gather any remaining input and then document it as how things are done for the Routing Working Group. To me personally it's kind of important because I would like to step down. As part of this process I looked at how long I have been doing this, and it turns out I have been doing this since RIPE 43, the one in Rhodes, which was just over 12 years ago. It is really time for someone with a different mindset to come up and take over and help Rob move these things forward. So think about it. I will stay until the next meeting so there is no vacuum, and everyone who wants to consider putting themselves forward has time to consider what they really want and do it properly. And that is it. Thank you very much.
Unless anyone has comments. We have five minutes. I think that is a bit short to try to fit anyone.
GERT DORING: Just thanks for chairing this for so long.
JOAO DAMAS: OK. Thank you. We said at the beginning I would try to fit an AOB, it's 3:25, it will be a bit of an ask to ask everyone to squeeze the thing in. What do you think, Rob? He has gone. That solves it. Come back at 4 p.m., please.
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.