DNS Working Group session

6 November, 2014, at 9 a.m.:

PETER KOCH: Good morning, I am without the microphone. First, we will probably start a few minutes late because some of the local people seem to be having difficulty reaching the venue. Second, those speakers who haven't talked to us Chairs this morning, could you please identify yourselves and come forward to receive the honourable medals or whatever. The rest of you may watch or go read your e‑mail again. Thanks.

Good morning, everyone, and welcome to the DNS Working Group session at RIPE 69. I work for DENIC and I am one of the Co‑Chairs; the two others are sitting in the front, Jim Reid and Jaap Akkerhuis. We have a number of presentations this morning. For the first time in a couple of years we have only one slot, so if you come back after the break, don't be surprised to find a completely different topic. Until the break, we will see the usual RIPE NCC report with some news, presented by Anand. We will look at DNS attacks and at dynamic DNS abuse, so the darker side of the domain name system. Then there is a talk from George Michaelson about crypto readiness or non‑readiness. Finally, as you have probably experienced a couple of times, we are going to have a discussion, debate, mud fight, whatever you want to call it, about the Working Group Chair election, designation or whatever procedure. You have probably seen the invitation to that on the mailing list and made up your mind, so let's have a crisp and short discussion there.

We already have one submission for AOB. Before we come to the first presentation, I need to make some introductions. Emile from the RIPE NCC is going to be the scribe, and Andreas is the Jabber scribe. For the benefit of the people in Jabberland and remote participants, if you have a question or comment, please come up to the microphone, state your name and, if you so desire, your affiliation, and be nice and look into the camera for the purpose of security and everything.

Are there any comments or suggestions to change the agenda? Going, gone. Thank you. Anything arising from the previous minutes? Has anybody actually read the RIPE 68 minutes? OK. I can't see the people behind the pillars ‑‑

AUDIENCE SPEAKER: They all had their hands up.

PETER KOCH: I believe so. Behind the pillar is exactly the position to have your hand up. As for the review of action items, I guess we will come to that in the RIPE NCC report in a couple of minutes. So without further ado, I would like to invite Anand to the stage to give the RIPE NCC update.

ANAND BUDDHDEV: Good morning. I work for the RIPE NCC, and I am going to give a quick DNS update this morning to tell you what we have been up to.

I will start with K‑root operations. Mostly it's business as usual: we still have the same number of instances, 17 across the globe, serving about 25,000 queries per second. This year we have been busy with lots of hardware renewals. We have five global sites, and we are replacing all the servers and the router hardware in some of these sites, because some of it is a little bit old and it's time to replace it. Some of our local instances are also at the end of their lives, so we are replacing that hardware as well.

As part of this renewal process, when we install the OS afresh, we are taking the opportunity to upgrade to NSD 4, because we are currently on NSD 3, and also to introduce some diversity into K‑root's ecosystem with BIND and Knot DNS. Some of the smaller local instances will go from being full‑blown, multi‑server instances with routers to being just single boxes, because a single box can handle most of the load at these places.

Recently, we published an article on RIPE Labs about a proposal for an experiment to lower the latency and hop counts for clients of K‑root. The experiment uses RIPE Atlas data to show the current latency and hop count to K‑root from various locations and regions around the world, and how we could lower these by adding instances in strategic places. So please have a look at this article on RIPE Labs and feel free to comment on it. If you like, bring up discussion on the DNS Working Group mailing list; as we proceed, we will provide more updates there.

We also have our authoritative DNS service: this is where we provide DNS for ‑‑ and other forward zones, the RIPE NCC's reverse DNS zones, and the secondary service, including for ccTLDs. We currently have three sites: London, Amsterdam and Stockholm. Stockholm is our newest site; it has been active since June 2014, just after the last RIPE meeting. So we have nine servers across the three sites. We have some diversity in the network, using both Juniper and Cisco routers, and in this cluster we already have some name server diversity: four instances run BIND, three run Knot DNS and two run NSD, all of them the latest versions. We are handling about 100,000 queries per second on this cluster across the three sites.

Something that we have been working on this year, and are continuing to work on, is provisioning resiliency. We have a provisioning server in Amsterdam where all the zones are provisioned, including the reverse DNS zones that get provisioned when the LIRs ‑‑. Currently we have just this one set‑up in Amsterdam, and we would like to improve this by adding a second server in Stockholm. The servers are in place, and we are now busy with the design and testing phase. The second provisioning server presents some challenges. Slave zones are not a problem, because they can be provisioned independently on each provisioning server: they just pull the zones from the masters and don't need to be kept in sync with each other. But then there are the manually maintained zones, such as ‑‑, that we maintain ourselves. We keep all of these zones in a repository, so we need to find a way to distribute them, whether with an export and rsync or something else. There are various ways of doing this.

And then of course we have the dynamically updated reverse zones. I think these present the biggest challenge, because we want to be able to update both servers at the same time, but they need to be kept in sync with each other. So we have to consider things such as update forwarding, or perhaps figure out some way of doing master replication, or look at newer technologies such as RabbitMQ with multiple consumers, but then we have the issue of keeping the serial numbers in sync. These are some of the things we are working on and trying to find ideal solutions for.

This is an interesting area for us, and if any of you have experience here or any clever ideas, I'd love for you to come and talk to me after the session, or contact me off‑line by e‑mail.

Something else that we have been thinking about this year is a DNSSEC algorithm roll‑over. The RIPE NCC has been signing its zones since 2005. Back then, the algorithm available to us was SHA‑1 based, so that is what we used for signing our zones, and we have been doing that ever since. In 2009, SHA‑2 algorithms were defined by RFC, and the root zone was signed with SHA‑2 in 2010. As a result, any resolver that wants to validate from the top down using the root zone's key must also support SHA‑2. So we feel the RIPE NCC should also be moving its zones to SHA‑2, because this is current best practice, SHA‑1 has some known collision attacks, and there is a possibility that it will be deprecated soon.

One of the challenges with a DNSSEC algorithm roll‑over is that it's not as trivial as it might look; it's not just a regular key roll‑over. I'd like to refer you to a report from CZ.NIC, presented at an OARC workshop, about their experiences with an algorithm roll‑over. They were bitten a little bit by RFC 4035, Section 2.2, which says that if the apex DNSKEY set contains keys of different algorithms, there must be signatures made with each of these algorithms. If you introduce a new key into the zone but haven't signed with it yet, a resolver might think that somebody is attempting a downgrade attack of some sort by stripping out signatures, so zones may fail to validate.
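To make the RFC 4035 Section 2.2 rule concrete, here is a minimal Python sketch, not any validator's or signer's actual code. It assumes a simplified, hypothetical data model in which each RRset only records which DNSSEC algorithm numbers its RRSIGs use; 5 (RSASHA1) and 8 (RSASHA256) are the IANA-assigned algorithm numbers. The check reports every RRset that lacks a signature for some algorithm present in the apex DNSKEY RRset, which is the condition a strict validator may treat as a possible downgrade.

```python
from dataclasses import dataclass

# IANA DNSSEC algorithm numbers
RSASHA1 = 5
RSASHA256 = 8


@dataclass(frozen=True)
class RRset:
    """Hypothetical, simplified view of a signed RRset."""
    name: str
    rtype: str
    sig_algorithms: frozenset  # algorithm numbers of the RRSIGs covering this RRset


def unsigned_algorithms(dnskey_algorithms, rrsets):
    """Map (name, type) -> sorted apex DNSKEY algorithms that have no RRSIG on that RRset.

    RFC 4035, Section 2.2: each RRset needs an RRSIG made with at least one
    DNSKEY of each algorithm in the apex DNSKEY RRset.
    """
    required = set(dnskey_algorithms)
    problems = {}
    for rrset in rrsets:
        missing = required - set(rrset.sig_algorithms)
        if missing:
            problems[(rrset.name, rrset.rtype)] = sorted(missing)
    return problems


if __name__ == "__main__":
    # A new RSASHA256 key is published at the apex, but the zone is still signed only with RSASHA1.
    zone = [
        RRset("example.net.", "SOA", frozenset({RSASHA1})),
        RRset("www.example.net.", "A", frozenset({RSASHA1})),
    ]
    print(unsigned_algorithms({RSASHA1, RSASHA256}, zone))
```

Signing every RRset with both algorithms before the new key is relied upon (the double-signature approach that the current signer lacks) is what makes such a check come back empty.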

So what you really need to do is sign simultaneously with SHA‑1 and SHA‑2, but our current signer does not support this mode. We think we would have to go through a temporary insecure phase: remove the DS records of the zones from the parents, perform a key roll‑over to the new algorithm, and reintroduce the new DS records. This is something I would like to open up to the community for opinions and feedback. Do you think it's OK for us to go insecure for a brief period in order to perform this roll‑over? Should we perform it at all? If so, why, and if not, why not? I would love to hear opinions about this at the end of the presentation, or come and talk to us later.

Finally, we have some open issues from past discussions in the Working Group. The RIPE NCC provides secondary DNS service for a number of ccTLDs. We do this on a best‑effort basis and we don't currently have agreements with any of these operators, but we would like to formalise this a little so that the roles and responsibilities are clear for both sides. There has been some discussion about this, but it would be great if we could get consensus so that we can move forward.

Another service that the RIPE NCC provides is DNSMON. We have been doing this for years, and originally it was based on TTM. As you may be aware, we have switched to RIPE Atlas for the DNSMON visualisations, and one of the outstanding issues is the visualisation delay: the old DNSMON had a delay, so if you were not logged in, you couldn't see the latest results. The current DNSMON doesn't have this delay. Again, there was discussion about this in the Working Group, and it would be nice to reach consensus so that we can move forward and close this issue.

The other issue with DNSMON is that, at the moment, we are monitoring a small number of zones, mainly ccTLDs and some others like the root zone. Before we add new zones, we need to make sure that we have the capacity, of course, but also criteria: who gets to add new zones to DNSMON, and which should be added and which not. Again, there was discussion about this on the mailing list, and it would be nice if we could reach some consensus.

So, that was my presentation. Thank you for listening; I welcome any questions now.

AUDIENCE SPEAKER: About the algorithm roll‑over, there was indeed a bit of a problem back then, because one of the validating resolvers was a bit overly strict, which was probably my fault. That has been fixed by now, so it should be at least a little bit easier, depending on whether you do a double‑signature roll‑over or a pre‑publish roll‑over. So I am not sure whether this is still a problem.

ANAND BUDDHDEV: So I think you are referring to unbound being overly strict at that time?


ANAND BUDDHDEV: I am aware it's been fixed in the meantime; it has become a little less strict. But there are other validators we don't know about, so I think it would be prudent to be a little bit conservative.

AUDIENCE SPEAKER: Yes, of course, but I would not like to see you going insecure just for an algorithm roll‑over, because it should be possible to do it without that.

AUDIENCE SPEAKER: We have a comment from Dean. He says: I guess we have to know exactly what a brief period is. Eventually you will want to switch to SHA‑2, with or without going through an insecure phase.

ANAND BUDDHDEV: Thanks for that comment. In order to go insecure, we would have to remove the DS records and wait for twice the TTL to expire. That TTL is not within our control; it is set by the parent, and in some cases it is two days long, so we would have to be insecure for something like four, maybe five days.
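The estimate in this answer is simple arithmetic. A rough sketch, assuming the two-day DS TTL mentioned above and a hypothetical extra day for performing the roll itself; real timings also depend on the parent's negative-caching TTL and publication delays, which are not modelled here.

```python
from datetime import timedelta


def insecure_window(ds_ttl: timedelta, roll_time: timedelta = timedelta(0)) -> timedelta:
    """Approximate time a zone stays unvalidated when going insecure for an algorithm roll.

    Following the reasoning above: after removing the DS at the parent, wait twice
    the DS TTL so cached copies expire, then add the time needed to do the roll.
    """
    return 2 * ds_ttl + roll_time


if __name__ == "__main__":
    ds_ttl = timedelta(days=2)  # parent-controlled, as described in the discussion
    print(insecure_window(ds_ttl).days)                                # waiting alone
    print(insecure_window(ds_ttl, roll_time=timedelta(days=1)).days)   # hypothetical one-day roll
```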

AUDIENCE SPEAKER: And then he is adding ‑‑ still sign with ‑‑ validators should accept only one valid.


JIM REID: Just speaking for myself again. I think you are in a bit of a bind here, if you will excuse the pun. What you need to do is do the roll‑over and give people plenty of notice. Explain that there might be problems for some people running broken validating code. I don't think there will be very many of them, and not many people are doing validation either, so I think the impact is going to be very low. If we give plenty of notice, people can realise they should upgrade their code and fix their installations; tell them there is a slight possibility of validation failures and that they should be prepared to accept that. If there is plenty of notice, there shouldn't be any real problem.

ANAND BUDDHDEV: I have one question in response to your comment. As I indicated earlier, our signer vendor doesn't support signing with keys of two different algorithms, so before we do this roll‑over we would have to either request support for this feature and then test it, or migrate from our current vendor to a different signing platform. Something along these lines. But this would take a little bit more time. We normally do our roll‑overs around mid‑November, which is coming up soon. If the community feels it is OK for us to delay the current roll‑over and keep using our current keys for the time being, that would be great. If not, we will just proceed with the regular roll‑over to new SHA‑1 keys for now and think about the algorithm roll‑over next year. What does the community feel?

JIM REID: That is a good point. I think this should go out to the list, and I would suggest you put forward the approach you recommend, as a default: if you don't get a satisfactory response from the Working Group to form a consensus, this is the course of action you propose to take. That way we don't have a situation where you are blocked trying to figure out what to do next while the Working Group assumes you have taken responsibility for it, and then nothing really takes place. My own personal feeling is: go ahead and do the roll‑over now as you normally would, fix whatever needs to be fixed in the key signing and key management software so you can deal with this situation in future, and perhaps do another roll‑over sometime in the middle of next year, when all of this has been sorted out and people have had plenty of advance notice, and then move over to the brave new world of SHA‑2.


PETER KOCH: I need to close the microphone lines.

AUDIENCE SPEAKER: Well, first of all, in a non‑emergency situation it seems odd to me to deal with a security problem by disabling security, so don't go insecure; there was a reason people asked you to sign the zones. Second, if the vendor doesn't support what you need, you give them a little bit of time after you first ask for the feature, and if they don't respond, you move to a new vendor. That is how things roll, right? So do these things together: take it easy, because the SHA‑1 attacks are not that dire at this time, but move firmly in the sense of forcing your vendors to provide a feature that you know you are going to need. This sort of roll‑over scenario has been known for a while. You need this ‑‑. And move on.

GEOFF HUSTON: I just want to point out a piece of information, because what Jim Reid said is basically not right. He made the comment that not many people use DNSSEC. 20% of the world's users use resolvers that validate. Now, if 20% of users were using v6, we would be swimming in champagne out there, yet for some reason 20% of users using validating resolvers is considered not a lot? It's amazing. So that is the first thing: a huge amount of the Net already validates, so be wary of that. The second thing: if you head into areas that create out‑of‑date keys out there, be wary, because once you start serving signed information that cannot be validated, the query rate goes up by a factor of at least about 35 to 40. All of a sudden you get query thrashing, because nothing gets cached; everything goes to the authoritative name servers and they are placed under enormous stress. It looks like a DDoS attack all over again; you already had that about three years ago, and you are going to get it again. Joel is right: if you are using a vendor that can't handle an algorithm roll, you are using the wrong vendor, and if someone from Secure64 is here or listening, they should be listening very carefully, because algorithm roll is part of what we do.


PETER KOCH: And can we make an action item out of this for you or for the RIPE NCC to elaborate a bit on the presentation and send something to the list?


PETER KOCH: I know it will probably come out in an article on RIPE Labs, but if you could really send text to the list so we can discuss it and elaborate on this, that would be great. It appears to me there is a kind of setting‑an‑example question here, and the RIPE NCC doing it right in public might help others to solve the issue.

ANAND BUDDHDEV: OK, we will do.

PETER KOCH: Thank you, Anand.

So, that brings us to the next presentation, from Nicolas Cartron of EfficientIP, on DNS attacks and mitigations.

NICOLAS CARTRON: Good morning everybody. Today I will present a topic around DNS attacks. The title is a bit provocative. Basically, we work with users of DNS, and what we see is that a lot of DNS attacks make the news every day, yet DNS security still gets little attention compared with other, non‑DNS topics. The point of today's presentation is to show that there are existing techniques which allow you to secure DNS and to address part of the problem we have today.

The other point is that it's not necessarily complex to implement; a lot of the techniques I will show you today have existed for years, if not decades.

So those are basically the four points for today. The first one is stealth DNS. The idea of stealth DNS is to hide your DNS master, so as not to disclose your architecture. You have the green part on the left, which contains all of your data, including the master, and you only publish your slaves on the Internet, the blue part on the right, with one of your servers acting as a pseudo‑master.

It's slightly more complex to maintain than a normal master‑slave set‑up. You need to watch out not to make any mistakes: if you make a mistake, you do exactly what you were seeking not to do, meaning you disclose the information.
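One common way to make such a mistake is letting the hidden master's name leak into the published data. A minimal sketch of a pre-publication check, assuming you already have the zone's published NS names and its SOA MNAME as strings (fetching them is out of scope); all host names are hypothetical. The SOA MNAME field names the zone's primary server, so in a stealth set-up it is worth checking alongside the NS set.

```python
def _fqdn(name: str) -> str:
    """Normalise a host name to lowercase with a single trailing dot."""
    return name.strip().lower().rstrip(".") + "."


def leaked_hidden_masters(published_ns, soa_mname, hidden_masters):
    """Return the hidden master names that appear in the published NS set or the SOA MNAME."""
    exposed = {_fqdn(n) for n in published_ns} | {_fqdn(soa_mname)}
    return sorted({_fqdn(h) for h in hidden_masters} & exposed)


if __name__ == "__main__":
    ns = ["ns1.example.net.", "ns2.example.net."]
    # The NS set is clean, but the SOA MNAME still names the hidden master.
    print(leaked_hidden_masters(ns, "hidden-master.example.net.", ["hidden-master.example.net"]))
```

Some operators deliberately put a public slave's name in the MNAME field for exactly this reason; the sketch simply flags any overlap and leaves that policy decision to you.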

But you also have advantages, one of them being a read‑only architecture, which is all about improving security, and it conforms to DNS best practices.

The second topic is around DNS vulnerabilities, including zero‑day ones. Before I kick off, this is not about BIND bashing, but the fact is that BIND is the most used DNS server in the world. It was one of the first DNS engines to be developed, and it is very flexible and lets you implement everything you can dream of, but it still comes with some issues; one of them is that BIND does not implement a clear separation between authoritative and recursive servers. A little bit like there are a lot of attacks targeting Windows and a few targeting Macintosh, the same is going on with BIND: because it's the most used DNS engine, it's the most popular target for hackers.

The point here is that when you have a zero‑day and no patch ready, you have two options. The first is to monitor your DNS servers closely to see if something is going on. The second is more of a crossing‑your‑fingers type of thing. We believe there is a third way to manage this kind of issue, which is being able to deploy DNS engines from different vendors, with different technologies, which is exactly what the previous speaker was explaining. We see growing interest in this from users. People are starting, not necessarily to move away from BIND, because BIND is still a very good DNS engine, but at least to use different engines. I have just listed a few of them: there are engines from NLnet Labs and from CZ.NIC. The idea is not to be dependent on a single one. The configuration is very different between engines; if you are used to dealing with BIND, you will basically have to learn a new way of dealing with configuration, so it's more work to maintain. And in terms of maintenance and patching, you have to do more patching because you have more DNS engines.

So those are the cons.

In terms of pros, you are not forced to wait until a patch is available, because you can switch from, let's say, BIND to NSD or the other way around, so you can have half of your DNS servers running on BIND and the other half running on NSD. This is something that is really missing when it comes to DNS. When you talk with anyone about security at the firewall level, you find that everyone tells you: I have two technologies for my firewalls, the first from a vendor like Juniper and the second from Check Point; it's very important to have different vendors, because if something bad happens on the first layer, I have got the second one, which is safe. Almost no one is doing this at the DNS level. People just keep using one engine, which is a shame. The idea here is really to fix this and allow you to have two different engines, or more if you like, as the RIPE NCC is doing.
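The half-and-half argument can be put into a toy calculation: if a zero-day hits one engine, what share of your servers keeps answering? A minimal sketch with hypothetical server names. It assumes a vulnerability affects exactly one engine, which, as pointed out in the questions afterwards, does not hold for protocol-level flaws.

```python
def surviving_fraction(server_engines: dict, vulnerable_engine: str) -> float:
    """Share of servers not running the engine affected by an unpatched vulnerability."""
    if not server_engines:
        return 0.0
    healthy = sum(1 for engine in server_engines.values() if engine != vulnerable_engine)
    return healthy / len(server_engines)


if __name__ == "__main__":
    mono = {"ns1": "BIND", "ns2": "BIND", "ns3": "BIND", "ns4": "BIND"}
    mixed = {"ns1": "BIND", "ns2": "BIND", "ns3": "NSD", "ns4": "NSD"}
    print(surviving_fraction(mono, "BIND"))   # single engine: nothing left
    print(surviving_fraction(mixed, "BIND"))  # half of the servers keep answering
```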

The next point I wanted to discuss today is DDoS attacks. There are basically two techniques for facing DDoS attacks today: filtering and absorbing. The first one, filtering, aims at identifying which DNS packets should be answered and not answering the ones which are not good. That sounds like a good idea when you think about it, but it's not easy to tell good packets from the ones which shouldn't be answered. If you see a huge increase in DNS requests coming from a single IP, in most cases it may look like an attack, so just block it, right? That may work for, let's say, user A. But say user B is a company that has just been featured on a TV show, and a lot of people are interested in finding out what this company is about. They will use their ISPs' caching servers and hit user B's DNS servers, causing a huge increase in DNS requests, and you could block something which is perfectly normal. So it is tricky to do, and it doesn't solve the whole issue: the traffic still enters your network before you can filter it.
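The filtering pitfall described here can be shown with a naive per-source threshold. This is an illustrative sketch, not a recommended filter: the threshold and the addresses (taken from documentation ranges) are hypothetical. An ISP caching resolver relaying a legitimate flash crowd looks exactly like a single aggressive source.

```python
from collections import Counter


def sources_to_block(query_sources, threshold):
    """Naive filter: flag every source address sending more than `threshold` queries in the window."""
    counts = Counter(query_sources)
    return {src for src, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # One window of traffic: an ISP resolver relaying a TV-driven flash crowd, plus two ordinary clients.
    window = ["198.51.100.53"] * 5000 + ["203.0.113.7"] * 3 + ["203.0.113.8"] * 2
    print(sources_to_block(window, threshold=1000))  # the legitimate ISP resolver gets blocked
```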

So that is the first technique.

The second technique is absorbing, which is the other way around: not filtering anything, because we want to answer every DNS request. The reasoning is that trying to identify bad packets is not safe, because you may open yourself up to cache poisoning. In this case, of course, you need a much more robust DNS architecture, and you also have to implement RRL, response rate limiting, because you don't want your DNS servers to be used as reflectors.
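Response rate limiting is what keeps an absorbing set-up from becoming a reflector. The sketch below is a deliberately simplified model of the idea, not BIND's implementation: it keeps a token bucket per (client /24, query name) pair, answers while tokens remain, and once limited either drops the response or, every `slip`-th time, sends a truncated reply so that a genuine client can retry over TCP. The rate, the slip value and the IPv4-only /24 grouping are assumptions chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class SimpleRRL:
    """Toy response-rate limiter keyed on (client /24, query name). IPv4 only."""
    responses_per_second: float = 5.0
    slip: int = 2  # every slip-th limited response is sent truncated (TC=1); 0 means always drop
    _buckets: dict = field(default_factory=dict)
    _limited: dict = field(default_factory=dict)

    def decide(self, client_ip: str, qname: str, now: float) -> str:
        """Return 'answer', 'truncate' or 'drop' for a response to client_ip at time `now` (seconds)."""
        prefix = ipaddress.ip_network(f"{client_ip}/24", strict=False)
        key = (prefix, qname.lower().rstrip("."))
        tokens, last = self._buckets.get(key, (self.responses_per_second, now))
        # Refill in proportion to elapsed time, capped at one second's worth of responses.
        tokens = min(self.responses_per_second, tokens + (now - last) * self.responses_per_second)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return "answer"
        self._buckets[key] = (tokens, now)
        count = self._limited.get(key, 0) + 1
        self._limited[key] = count
        if self.slip and count % self.slip == 0:
            return "truncate"
        return "drop"


if __name__ == "__main__":
    rrl = SimpleRRL(responses_per_second=2, slip=2)
    # Spoofed queries "from" a victim in 192.0.2.0/24, all arriving at the same instant.
    print([rrl.decide("192.0.2.10", "example.net.", now=0.0) for _ in range(5)])
    print(rrl.decide("192.0.2.99", "example.net.", now=1.0))  # same /24; the bucket has refilled
```

Grouping by prefix rather than by exact address is the design choice that matters here: a spoofed flood aimed at one victim network shares a single budget, so the server cannot be used to amplify traffic towards it.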

For this one there are two big techniques. The first is piling up load balancers and servers: if you have to absorb 1 gig of traffic with architecture A and you want to double this capacity, you just double the number of load balancers and servers. That is the first one.

The second one is using more efficient DNS servers. As you can see on the diagram, the idea is that there is not only BIND; as I explained before, you have other DNS engines, and we named a few ‑‑. Depending on the size of the request, the performance differs by a factor of three to five when you compare DNS engines. So it's a good idea to look at other technologies, again I named a few, when you need more efficiency and performance.
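The two absorbing approaches trade off as simple capacity arithmetic: the number of servers scales with the load you want to absorb, and a faster engine divides it. A minimal sketch; all the query rates below are hypothetical numbers chosen only to illustrate the three-to-five-times range mentioned in the talk, and they ignore load balancer limits and per-site distribution.

```python
import math


def servers_needed(target_qps: float, per_server_qps: float) -> int:
    """Smallest number of identical servers whose combined capacity covers target_qps."""
    return math.ceil(target_qps / per_server_qps)


if __name__ == "__main__":
    attack_qps = 1_000_000   # hypothetical load to absorb
    baseline = 200_000       # hypothetical per-server capacity with the current engine
    print(servers_needed(attack_qps, baseline))        # piling up servers
    print(servers_needed(2 * attack_qps, baseline))    # doubling capacity doubles the count
    print(servers_needed(attack_qps, 3 * baseline))    # an engine three times faster
```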

The last topic for today is RPZ, Response Policy Zones. It's slightly newer than the other techniques I showed: RPZ was introduced about three years ago. The idea is to use DNS itself to filter. It's really based on DNS: it's a normal DNS zone which you configure in your recursive server, BIND for instance, and the idea is to use your DNS either not to answer, or to rewrite the answer to, requests which are not meant to be answered. You have an example here with some malware trying to use DNS to find its command‑and‑control server and send information out. The other case is when you simply don't want specific domain names to be answered. You can maintain your own RPZ zones and/or subscribe to external ones.

What is nice is that you have different policies. Numbers 2 and 3 aim at not answering, and the last one is pass‑through. The first one is interesting when you have a chain of recursive servers where you do not see the client's IP address: instead of only having the recursive server's IP address, you redirect the DNS client to a portal where you can find all of the IP addresses which are potentially infected by malware. The slide just behind that I borrowed from Steffann; the idea is that it's still based on DNS, so in terms of the zone file, it's ‑‑

So, the challenges. Well, it's lying DNS: using DNS for something it wasn't meant to do. It breaks DNSSEC, and it increases the load on your DNS server, because every time you do a resolution, instead of just doing recursion, you have to evaluate whether the name matches a policy. On the other hand, it's more granular than what we did before: when you wanted to block a specific domain, you just created the zone with a wildcard pointing to 127.0.0.1. This one is more interesting because it allows exclusions, for instance blocking everything but a ‑‑ because HR is using it to do recruitment. And you have different policies available in terms of what you want to do once you have matched a record from your RPZ ‑‑
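The policy choices described above (not answering, passing through for exceptions, redirecting to a portal) can be modelled as a lookup over QNAME triggers. This is a simplified Python model for illustration, not how BIND evaluates RPZ: it only covers QNAME triggers, prefers an exact match over a wildcard and a longer wildcard over a shorter one, and the domain names are hypothetical. In an actual RPZ zone these actions are expressed as records, for example a CNAME to `.` for NXDOMAIN, to `*.` for NODATA and to `rpz-passthru.` for pass-through.

```python
from enum import Enum


class Action(Enum):
    NXDOMAIN = "nxdomain"   # pretend the name does not exist
    NODATA = "nodata"       # name exists but has no data of the requested type
    PASSTHRU = "passthru"   # exception: resolve normally
    REDIRECT = "redirect"   # local data, e.g. pointing clients to a walled-garden portal


def rpz_action(qname: str, policy: dict):
    """Return the Action for qname, or None when no trigger matches (normal resolution)."""
    name = qname.lower().rstrip(".")
    if name in policy:                       # an exact QNAME trigger wins
        return policy[name]
    labels = name.split(".")
    for i in range(1, len(labels)):          # most specific wildcard first
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in policy:
            return policy[wildcard]
    return None


if __name__ == "__main__":
    policy = {
        "*.social.example": Action.NXDOMAIN,     # block the whole site...
        "jobs.social.example": Action.PASSTHRU,  # ...except the part HR needs
        "c2.malware.example": Action.REDIRECT,   # send infected clients to a portal
    }
    for q in ("chat.social.example.", "jobs.social.example.", "c2.malware.example.", "www.example.org."):
        print(q, rpz_action(q, policy))
```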

As a conclusion, the idea today was really to show that, while most of the time we find a basic master‑slave BIND architecture, there are different techniques, already existing today, already validated and working well, which allow you to react when you are facing attacks or DNS issues without having to change everything in your architecture and without investing a lot of money or time in changing your DNS architecture.

That is it. Any questions?

PETER KOCH: Thank you. We should have time for like one or two questions.

AUDIENCE SPEAKER: Thank you for this very wide‑ranging presentation. I have two comments. First, when it comes to DDoS or DoS on DNS, it's rather the number of packets per second than the saturation of the upstream link, so the challenge, whether you are answering the attack or absorbing it, is really how many megapackets you can absorb or answer. Second, on mitigating zero‑day attacks: the genetic diversity of engines is very useful if the problem is only related to implementations, but if it is related to the protocol, you can have as many diverse engines as you like, unless some of them don't implement that part of the protocol, in which case chances are they might be safe. So, as a community, we should also think about what to do if we have a zero‑day attack on the protocol. Imagine as a safe solution ‑‑ I don't have the answer.

NICOLAS CARTRON: Of course, it was more about specific engines.

PETER KOCH: Any other questions? If that is not the case, I think we can say thank you to Nicolas.


And next in line is Chris Baker from Dyn, and we will hear about experiences with DNS abuse from the observing side, I suppose. By the way, I didn't mention this before, and without any link to a particular presentation: in the plenary, you can rate the presentations on‑line. You can't do that for the Working Group, but you can always come back to the Chairs, or send a proxy if you are shy, to let us know what you thought about the presentations or what else you would like to see. With that, Chris, it's yours.

CHRIS BAKER: Feel free to shame me on Twitter. I'm lucky to have an opening talk like that, especially with it closing on RPZ. What we are going to talk about now are some of the issues we face in operating a dynamic DNS platform. Earlier there was a series of incidents which led to some large actions, and part of the goal of this talk is to highlight some of the complexity of managing the back end and of preventing abuse in the first place. We are all DNS people, so we can skip the first two slides; they are in the deck in case someone wants the high‑level explanation. We will skip right through.

So, why is abuse a problem here? Well, I think this is kind of the challenge between the ingress on our end of the DNS and the people who are actually doing the registration. One of the challenges we face is the notion of a disposable host name: we have dynamic updates, APIs which facilitate automation, and a large number of providers across which you can hedge your infrastructure.

If you think about the operational security of an attacker, one of the places they are going to leak detail or create a pattern for identification is the Whois registration. In a few recent large‑scale attacks, you can find they used a provider which did the domain registration for them, or a fingerprint they reused across a collection of domains. In the case of dynamic DNS, the domains are owned by the provider, so you lose this whole level of control. Then there is domain reputation: these aren't new domains that have just popped up; they have existed for a while and have pretty heavy usage, because people have used them to access their computers at home and their DVRs. And the last is economies of scale. With domain registrations, attackers trying to scale their operations have to register each individual domain and incur a cost each time; here they can pay a single upfront fee and cycle through many host names. In our case, that is 260 host names. When you look across the providers, if you think 25 dollars for ours and 25 for somebody else's, you now have well over 5,000 domains you can cycle through at minimal cost. This is where I walk through some rhetorical questions about problems people might have and ask why they aren't doing certain things. When you think about what you are doing in your organisation and how you are going to operate your RPZ, what is it that you are logging and tracking, and how is this information later going to help you? Specifically, it's a question of how much logging you are doing and how in‑depth it is. If you are only looking at counts by TLD or domain, you might be missing some of these things, because essentially the person is renting that third‑level domain. This is extremely expensive to track, because the number of possible variations is a lot higher.

But everybody wants to get to specific use cases. These are all actual incidents that we have mitigated and patterns we have tried to detect. Phishing is everybody's classic case. In a lot of cases, we see someone come along, pick a host name and register it. The next step is for them to put a wildcard in, so that any permutation can be used, and put the link into an e‑mail. The top one was targeting Italian citizens, and the bottom one is more like an example we see frequently, where they put Wells Fargo or some other big company they are trying to impersonate into the URL itself. Behind that is a compromised CMS, usually WordPress: somebody will log in, their details are collected in a plain text file, and, conveniently for the attacker, somebody else owns that system. So in the event they are identified, it doesn't really matter to the attacker, because they have just taken the text file off that box and left the innocent person who had a compromised CMS out there.

So how are we detecting this? One of the things we do is, when customers have a wildcard, consider all the permutations that start to exist in the world. We track things like: is it an iTunes domain name, is it Wells Fargo? That is one of the simpler things. Luckily, as a provider, we can see in the back end the patterns people are using, and these could be useful if you start looking internally at your recursive name servers. One of the things we can see that most people can't is the number of names a person owns: did somebody create an account and, that same day, create 30 host names? Think of an average human. Last time I did the numbers, two was the highest: they will create a host name for their computer, and in some cases a second one, even though they have purchased up to 30. But if you see somebody buy an account and create 30 host names, flags go up and the risk score rises relative to that rate.
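
The two account-level signals described above (many host names on the sign-up day, brand names inside host names) can be sketched as simple heuristics. This is an illustration only: the brand list, the `Account` structure and the threshold of 10 are hypothetical choices, not the provider's actual scoring.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical watch list of impersonated brands (normalised: lowercase, no hyphens).
BRAND_KEYWORDS = ("itunes", "wellsfargo", "paypal")

@dataclass
class Account:
    created: date
    # (host name, date the host name was created)
    hostnames: list = field(default_factory=list)

def risk_flags(acct: Account, same_day_threshold: int = 10) -> list:
    """Return human-readable reasons why an account looks suspicious.

    The threshold is an illustrative assumption: typical users create one or
    two host names, so dozens on the sign-up day stand out.
    """
    flags = []
    same_day = sum(1 for _, created in acct.hostnames if created == acct.created)
    if same_day >= same_day_threshold:
        flags.append(f"{same_day} host names created on sign-up day")
    for name, _ in acct.hostnames:
        compact = name.lower().replace("-", "")
        for brand in BRAND_KEYWORDS:
            if brand in compact:
                flags.append(f"brand keyword {brand!r} in {name}")
    return flags

signup = date(2014, 11, 6)
normal = Account(signup, [("my-pc.dyn.example", signup)])
bulk = Account(signup, [(f"h{i}.dyn.example", signup) for i in range(30)]
               + [("wells-fargo-login.dyn.example", signup)])
print(risk_flags(normal))
print(risk_flags(bulk))
```

In practice such flags would feed a risk score rather than trigger action on their own.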

How many IP addresses are they using? In a lot of cases criminals are trying to limit the exposure of their infrastructure, so they want to limit the number of IP addresses or hosts related to these things and reuse them over time. Once they have abused a host name they will let it cool down for a while by pointing it at 127.0.0.1, or, in more clever cases, at 1.1.1.1. So a lot of it is looking for host names pointing to RFC-reserved address space, or other places you know their A records shouldn't be pointing to.
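
Checking whether an A record points into special-use space is straightforward with Python's standard `ipaddress` module. The sketch below flags loopback, private, reserved and similar ranges, plus a hand-maintained watch list; putting 1.1.1.1 on that list follows the example in the talk and is an assumption (today that address is a public resolver, so a real list needs care).

```python
import ipaddress

# Hypothetical "parking" addresses an operator might add by hand; 1.1.1.1 is the
# example mentioned in the talk.
PARKING_WATCHLIST = frozenset({"1.1.1.1"})

def suspicious_target(addr: str, watchlist=PARKING_WATCHLIST) -> bool:
    """True if an A/AAAA target is special-use space or on the manual watch list."""
    ip = ipaddress.ip_address(addr)
    if str(ip) in watchlist:
        return True
    return (ip.is_loopback or ip.is_private or ip.is_reserved
            or ip.is_unspecified or ip.is_multicast or ip.is_link_local)

for target in ("127.0.0.1", "10.20.30.40", "240.0.0.1", "1.1.1.1", "8.8.8.8"):
    print(target, suspicious_target(target))
```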

So that is a very simple case, phishing. The more popular case is command and control. As we all know, DNS is extremely resilient, so you don't want to hard-code the IP address of your home router; you want to use the DNS. It has got to the point where anybody can go on YouTube and watch a video of how to configure a handful of common remote access Trojans.

So what are we doing in this case? Luckily, people do large-scale collection of malware samples; we put them into a sandbox and see what network activity jumps out. This specific one is Xtreme RAT (they always have to have a clever name, because it's marketing), and it made a call out to this specific host name. Using community resources, people contributing known viruses and binaries, we look at the network activity and compare it with what we see on our back end.

A larger case is this wonderful Perl script we found. This is a bit more clever: it uses some obfuscation to create various names for the script to dial home to and pull down details of who to attack next, and it uses the API to follow through on that domain generation algorithm. So there are two sets of patterns: we can look for strings in a binary and in a script, but these are also people who better understand the patterns we are starting to look for. In this case the person has two separate kinds of input: a number of host names included in the algorithm, as well as a collection of n-grams to cycle through. Why is this better? Because it's harder for us to detect. There is a collection of tools, shown at the bottom, that use various kinds of machine learning: they take a lexicon, treat it as legitimate, then cycle through large lists like the one on the left and say this combination of letters doesn't occur very frequently, so you might want to look at it. Names like those on the right-hand side wouldn't produce a similar warning, because they are real words, so the tools trip up a bit more on those.
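
The lexical tools mentioned above work roughly like the following sketch: learn character-bigram frequencies from a lexicon of legitimate words, then score each label by its average smoothed log-probability. The tiny lexicon and the smoothing constants are assumptions for illustration. Note that a name glued together from real words ("supportcenter") scores as word-like, which is exactly why n-gram-based generation trips these detectors up.

```python
import math
from collections import Counter

def bigrams(label: str) -> list:
    """Character bigrams with start (^) and end ($) markers."""
    s = f"^{label.lower()}$"
    return [s[i:i + 2] for i in range(len(s) - 1)]

def train(lexicon):
    """Count bigrams over a list of legitimate words."""
    counts = Counter(bg for word in lexicon for bg in bigrams(word))
    return counts, sum(counts.values())

def score(label: str, model, alpha: float = 1.0, vocab: int = 38 * 38) -> float:
    """Average log-probability per bigram with add-alpha smoothing.

    vocab roughly approximates the number of possible bigrams (about 38
    symbols: letters, digits, hyphen, marker). Lower scores look less word-like.
    """
    counts, total = model
    grams = bigrams(label)
    return sum(math.log((counts[g] + alpha) / (total + alpha * vocab))
               for g in grams) / len(grams)

# Hypothetical mini-lexicon; real tools train on large dictionaries or domain lists.
model = train(["secure", "login", "account", "update", "service", "online",
               "support", "center", "mail", "home", "server", "store"])
for name in ("securelogin", "supportcenter", "xqzkvjw"):
    print(name, round(score(name, model), 2))
```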

You want to ask yourself: is this another economies-of-scale problem? It is. You see these cleverer, better constructed host names involved in traffic direction services. How do we detect those, when they are so much harder to find? In this case we rely on the way they are used. At the bottom here you will notice there are three different calls to the DNS, and two of those are host names. So now we have to look for different recursive patterns occurring at similar frequency. When this happens, the person is going to one site, then the next, then a third, so we see triplets of these queries. If you see a new host name, you realise something is outside the norms of the pattern: all of a sudden these names are moving in concert, and you start to look for other ones that arise. This is where we think about the standard distribution of recursive queries coming in for specific domains. The new domains have an anomalous value, and we look at the NXDOMAIN rates, because the person can't control those: as they shift between patterns to hide where they are, you see high NXDOMAIN rates as they move to the new one. This helps you say, OK, maybe that is associated with the same behaviour.
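
Both signals in the paragraph above, names that are queried together ("moving in concert") and per-domain NXDOMAIN share, can be computed from resolver logs. A minimal sketch, assuming a hypothetical log of `(timestamp, client, qname)` tuples and `(domain, rcode)` pairs, with an arbitrary 5-second co-occurrence window:

```python
from collections import Counter, defaultdict

def co_occurring_pairs(events, window: float = 5.0) -> Counter:
    """Count name pairs queried by the same client within `window` seconds.

    events: iterable of (timestamp_seconds, client_ip, qname). Pairs that
    recur across many clients are names 'moving in concert'.
    """
    by_client = defaultdict(list)
    for ts, client, qname in events:
        by_client[client].append((ts, qname))
    pairs = Counter()
    for queries in by_client.values():
        queries.sort()
        for i, (t_i, name_i) in enumerate(queries):
            partners = set()
            for t_j, name_j in queries[i + 1:]:
                if t_j - t_i > window:
                    break  # sorted by time, so later queries are further away
                if name_j != name_i:
                    partners.add(name_j)
            for name_j in partners:
                pairs[tuple(sorted((name_i, name_j)))] += 1
    return pairs

def nxdomain_rates(responses) -> dict:
    """responses: iterable of (domain, rcode). Returns the NXDOMAIN share per domain."""
    total, nx = Counter(), Counter()
    for domain, rcode in responses:
        total[domain] += 1
        if rcode == "NXDOMAIN":
            nx[domain] += 1
    return {d: nx[d] / total[d] for d in total}

# Synthetic data: three clients each hit the same three names within two seconds.
events = []
for n, client in enumerate(("192.0.2.1", "192.0.2.2", "198.51.100.3")):
    t0 = 1000.0 * n
    events += [(t0, client, "a.dyn.example"), (t0 + 1, client, "b.dyn.example"),
               (t0 + 2, client, "c.dyn.example"), (t0 + 60, client, "news.example")]
pairs = co_occurring_pairs(events)
rates = nxdomain_rates([("old.dyn.example", "NOERROR")] * 9
                       + [("old.dyn.example", "NXDOMAIN")]
                       + [("new.dyn.example", "NXDOMAIN")] * 3
                       + [("new.dyn.example", "NOERROR")])
print(pairs.most_common(3))
print(rates)
```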

Then we look into it and reach out to people who might be impacted, because essentially these are people abusing our services and harming our names. So one question is: how many unique ASes are involved in this population? Is it a targeted attack on a language or a certain type of bank? How many IP blocks, which countries and, in more dangerous cases, what types of companies are being targeted?
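
The outreach questions above (unique ASes, IP blocks, countries) amount to a simple aggregation. The sketch assumes a hypothetical lookup table from IP address to (ASN, country), such as one built from a routing table dump and a geolocation feed, handles IPv4 only, and groups addresses into /24 blocks.

```python
import ipaddress
from collections import Counter

def impact_summary(victim_ips, ip_info) -> dict:
    """Summarise who is affected by a campaign.

    ip_info is a hypothetical lookup: ip -> (asn, country_code).
    IPv4 only; addresses are grouped into /24 blocks.
    """
    asns, prefixes, countries = set(), set(), Counter()
    for ip in victim_ips:
        prefixes.add(ipaddress.ip_network(f"{ip}/24", strict=False))
        asn, country = ip_info.get(ip, (None, None))
        if asn is not None:
            asns.add(asn)
        if country is not None:
            countries[country] += 1
    return {"unique_asns": len(asns),
            "unique_24s": len(prefixes),
            "countries": dict(countries)}

# Documentation addresses and ASNs stand in for real data.
victims = ["192.0.2.10", "192.0.2.77", "198.51.100.9"]
info = {"192.0.2.10": (64496, "IT"), "192.0.2.77": (64496, "IT"),
        "198.51.100.9": (64497, "DE")}
print(impact_summary(victims, info))
```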

So this is where we move into the questions period. But before that, I would like to mention, for anybody looking for reports of this activity in their network, that Shadowserver produces these. Any time we find bad data we hand it over to them and they do mass outreach. They are a nonprofit, everything is free, and it helps with scaling outreach, which otherwise doesn't work too well. Sign up; they are free and very helpful. Back to the questions.

PETER KOCH: Thank you, Chris. Questions? Comments?


AUDIENCE SPEAKER: Ralph. Great presentation. Thank you. The outreach stuff is only for the infected ISPs, not for the domains that are infected?

CHRIS BAKER: That's correct. So it's the owners of the IP space which are seen as impacted.


PETER KOCH: Any other questions? OK. Thanks, Chris.


PETER KOCH: And next up is, I think, George or Geoff. George. We have already had a short discussion about DNSSEC and crypto algorithms. This is looking forward into the future, I guess. George, the stage is yours.

GEORGE MICHAELSON: If you were at the recent meeting in North America, you have already seen this pack, but hang around for the conversation; and if you weren't, hang around for the pack and the conversation.

We are aware of the fact that using RSA is not cheap: it's not cheap in bits and it's not cheap in compute power. As time goes on, the RSA algorithm becomes more and more prone to concerns about factorisation reaching the point where it could be broken, so there is pressure to increase the key length to get back the protected bits you need for an effective space that can't be broken. It's somewhere around a ten-to-one cost: to get effectively two to the power 200 of protection against the number of things that can be done, you need an RSA key about ten times that length.

That is a function of that form of public/private key pair and the factoring attack. But there is this other class of crypto, and when I read about it I see two things. First, for the same effective protected space, say two to the power 200, elliptic curve discrete logarithm crypto needs a key only about twice that length. So it's somewhere around 400 bits of effective key size compared to 2,000 bits. That is the first thing: the keys are physically a lot shorter.

The second thing is that the maths behind computing the discrete logarithm function is approximately 8 to 9 times faster than doing the RSA algorithm at broadly comparable strength: it is less computational load. You can do 9,000 of these per second against only a fraction of that with the RSA algorithm. Were it possible to use ECC, there are two huge gains. First, smaller packets on the wire. As the slide says, there are concerns in the DNS about how big the packets are getting. Think about this: if you are performing a key rollover, you have to have two of every signature, two of every DS and two of every DNSKEY. In a signed zone with quite a lot of records you get a lot of RRSIGs over the signed components, and with NSEC a lot of signed statements about the preceding and succeeding records. If you have two of each, your packets get extremely big, and if you have to increase the key length they get bigger still. So dropping size here could be extremely useful. Second, dropping the CPU cost of computation could be extremely useful if the volume of people performing validation is rising. So the question "could we use ECC?" is a very real one.
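
The rollover arithmetic can be made concrete for raw signature bytes. The sketch assumes the common parameter choices RSA/SHA-256 with a 2,048-bit modulus (DNSSEC algorithm 8, signature as long as the modulus) and ECDSA P-256/SHA-256 (algorithm 13, 64-byte r || s signature per RFC 6605). It counts only the signature fields, ignoring RRSIG headers, names and DNSKEY records, so it illustrates the doubling effect rather than real packet sizes.

```python
def rsa_sig_bytes(modulus_bits: int) -> int:
    """RSA/SHA-256 (DNSSEC algorithm 8): the signature is as long as the modulus."""
    return (modulus_bits + 7) // 8

def ecdsa_sig_bytes(field_bits: int = 256) -> int:
    """ECDSA P-256/SHA-256 (algorithm 13): signature is r || s, each field-size."""
    return 2 * ((field_bits + 7) // 8)

def signature_bytes(rrsets: int, sig_bytes: int, active_keys: int = 2) -> int:
    """Raw signature bytes only, when every RRset carries one RRSIG per active key.

    Ignores RRSIG header fields, owner and signer names, and DNSKEY records.
    """
    return rrsets * active_keys * sig_bytes

# Hypothetical response carrying 3 signed RRsets during a rollover (two keys).
for label, sig in (("RSA-2048", rsa_sig_bytes(2048)), ("ECDSA P-256", ecdsa_sig_bytes())):
    print(label, sig, "bytes per signature,", signature_bytes(3, sig), "bytes of signatures")
```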

So, OK, let's use this. Well, what if it turned out not to be so good? Do we know? No, we don't: we don't have a good handle on just how widespread support for this technology is. So we were asked: could we use our measurement technique to investigate capability in this space? You know that we use Google ads; we get a large worldwide pool of samples, 350 to 7,000 per day, and we put four tests into the system: a completely unsigned fetch, a well-formed RSA-signed fetch, an RSA-signed fetch that was deliberately broken, and an ECDSA instance. We use an experimental name form with a long dotted string element, but if you look at the annotation at the bottom, most of what you see here sits under a wildcard inside the zone and is simply there to preserve uniqueness and carry collating information that we use to work out which experiment is which. The critical thing is that this part here is a unique zone fragment. We are running a space of about 750,000 unique strings, and within the retention time of the record, the TTL in the SOA, we don't reuse the name, so there is a very high chance that any resolver seeing it has no cached state for that zone. That zone part is really important for what we are testing here.
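
A cache-busting naming scheme of the kind described can be sketched as follows. The label layout (`u<id>.t<timestamp>.<test>.<zone>`) and the test names are hypothetical; in the actual experiment the unique part is its own zone fragment, whereas here it is simplified to plain labels. The parser shows how logged query names can be collated back to an experiment.

```python
import secrets
import time

# The four tests described in the talk; the label strings are hypothetical.
TESTS = ("unsigned", "rsa-good", "rsa-bad", "ecdsa")

def experiment_names(base_zone: str, now=None) -> dict:
    """One fresh query name per test, sharing a random experiment id.

    The random id (never reused within the TTL) makes it very unlikely that a
    resolver already holds cached state for the name.
    """
    ts = int(time.time() if now is None else now)
    uid = secrets.token_hex(8)  # 16 hex characters
    return {test: f"u{uid}.t{ts}.{test}.{base_zone}" for test in TESTS}

def parse_name(qname: str, base_zone: str):
    """Recover (uid, timestamp, test) from a logged query name for collation."""
    qname = qname.rstrip(".").lower()
    suffix = "." + base_zone.lower()
    if not qname.endswith(suffix):
        raise ValueError(f"{qname} is not under {base_zone}")
    uid_label, ts_label, test = qname[: -len(suffix)].split(".")
    return uid_label[1:], int(ts_label[1:]), test

names = experiment_names("example.net", now=1415260800)
for test, qname in names.items():
    print(test, qname, parse_name(qname, "example.net"))
```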

So in the naive view, in the validating case you ask a question and, because you set the DNSSEC OK flag in your request, the other end says "here is a SIG", and you get that block of extra data, which grows the packet. So you buy the big answer pretty much immediately if people have that bit set, and by now 90% of the world is setting it with EDNS0. If you are validating and want to test that SIG, you have to come back and get the DS, and then come back and get the DNSKEY: two separate queries in order to perform a validation. The theory is that you go to the parent and get the DS record first, because you are doing top-down validation. It doesn't make sense to do anything else, because you have to trust the base state and work your way down: the parent says "I have a SIG over my DS, so you can trust what I am saying", you ask the child what DNSKEY it is using, and you use the DS and DNSKEY to do the cross-comparison. The problem is that is not how DNS works any more. It is not your mother's DNS; it's a lot more complicated, because you go to the resolver farms and they say "I am busy, I am going to farm out your query", with each member running its own timer. You thought you were going to see this followed by this; no, they do all three at once in random order, their timers go off, they ask one twice and another three times, and you can't control any of that or infer coherent behaviour from it. So things have got a lot more complicated in how it looks on the receiving side if you are trying to infer what is actually going on.
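
The DS/DNSKEY cross-comparison mentioned above can be sketched with the standard library. The key tag follows RFC 4034 Appendix B, and the SHA-256 DS digest (digest type 2) is computed over the canonical owner name plus the DNSKEY RDATA. The public key bytes in the usage example are placeholders, not a real ECDSA key, and signature verification itself is out of scope.

```python
import hashlib
import struct

def name_to_wire(name: str) -> bytes:
    """Canonical (lower-case) DNS wire format of a fully qualified name."""
    out = b""
    for label in name.rstrip(".").lower().split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def dnskey_rdata(flags: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (16 bits), protocol (always 3), algorithm, key."""
    return struct.pack("!HBB", flags, 3, algorithm) + public_key

def key_tag(rdata: bytes) -> int:
    """Key tag algorithm from RFC 4034 Appendix B (not used for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

def ds_sha256(owner: str, rdata: bytes) -> bytes:
    """DS digest type 2: SHA-256 over owner name wire form + DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).digest()

def ds_matches(owner, rdata, ds_key_tag, ds_algorithm, ds_digest) -> bool:
    """Does the child's DNSKEY match the DS published (and signed) in the parent?"""
    return (key_tag(rdata) == ds_key_tag
            and rdata[3] == ds_algorithm
            and ds_sha256(owner, rdata) == ds_digest)

fake_key = bytes(range(64))  # placeholder, NOT a real ECDSA public key
rdata = dnskey_rdata(257, 13, fake_key)
tag, digest = key_tag(rdata), ds_sha256("example.net.", rdata)
print(tag, digest.hex())
```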

So as a first approximation: if the fundamental signature of a DNSSEC resolution is that you fetch both the DS and the DNSKEY, let's look at the relative volumes of DNSKEY and DS queries. If we do that, in RSA they are broadly speaking level: there is very strong agreement between the number of people who fetch both record types. But in ECDSA an asymmetry emerges. Out of the pool of experiments seen in September (about three-and-a-half), we are seeing almost a million doing the DNSKEY fetch in RSA, but there is a drop-off: only about 600,000 of that million actually go the distance and make the ECC request. So a difference emerges immediately from the aggregate numbers, and that is a phenomenal number, a third. We are not talking about one or two percent not doing the right thing here; we are talking about a large part of the deployed worldwide resolver service not behaving the same way in a different algorithm space. That was a bit unpleasant.
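
The first-approximation comparison can be sketched as counting distinct experiments per (test, query type) and taking the DNSKEY-to-DS ratio. The log format is hypothetical; collapsing duplicates matters because, as described, resolver farms may ask for the same record two or three times. The synthetic numbers merely mimic the shape of the RSA/ECDSA gap, not the real data.

```python
from collections import Counter

def fetch_counts(query_log) -> Counter:
    """query_log: iterable of (experiment_uid, test, qtype).

    Duplicate queries for the same experiment are collapsed so each
    experiment counts once per qtype.
    """
    distinct = {(uid, test, qtype) for uid, test, qtype in query_log}
    return Counter((test, qtype) for _, test, qtype in distinct)

def dnskey_to_ds_ratio(counts: Counter, test: str) -> float:
    """Experiments that fetched the DNSKEY, relative to those that fetched the DS."""
    ds = counts[(test, "DS")]
    return counts[(test, "DNSKEY")] / ds if ds else float("nan")

# Synthetic log: 10 experiments per test; only 6 ECDSA ones go on to fetch the DNSKEY.
log = []
for uid in range(10):
    log += [(uid, "rsa", "DS"), (uid, "rsa", "DNSKEY"), (uid, "rsa", "DNSKEY")]
    log += [(uid, "ecdsa", "DS")] + ([(uid, "ecdsa", "DNSKEY")] if uid < 6 else [])
counts = fetch_counts(log)
print(dnskey_to_ds_ratio(counts, "rsa"), dnskey_to_ds_ratio(counts, "ecdsa"))
```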

So what of the end-user effect? Geoff has done a more in-depth analysis that correlates web log behaviour with DNS query behaviour. We thought we understood what happens here. If you see a badly signed zone or an unknown algorithm, we think, fine, you are going to be told to stop. Or are you? You are going to get the DNSKEY RR and be told it's in an algorithm you don't understand. Are you going to ServFail, or fetch both of them and say "I can't decode this, what do I do?", or just stop validating entirely? So we go through the web logs and get the numbers of people actually performing the validation. It looks like 25% of people are failing in the DNS, but the numbers don't work out for the clients: on the client side, a much lower number of people were effectively precluded from reaching the end point, yet we see that higher number, 22 to 25% of people, fetching. So how come we don't see those clients failing to reach the end point? We thought there would be a ServFail signal. We thought there would be a result coming back that was a clear statement, "you can't trust what you are seeing here, because I have been given an algorithm I can't understand": a positive indication to the end user saying "I don't know what is going on here, you need to think about this." That is what we thought we were going to see. If we look at this failure rate in the country breakdown Geoff has compiled, you can see there is no overarching signal of a particular kind of economy or deployment space having this behaviour. It's widespread across the economies of the world, of different sizes and locations.
If we do the AS analysis you see the same thing: there are some very large providers, including in developed Western economies, failing to complete ECC validation. This is the top set, and we are at 20 to 25 instances of 96% rejection.
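
The web-log correlation can be sketched as set arithmetic over experiment ids. The three input sets are hypothetical: experiments whose resolver fetched the DS, those that also fetched the DNSKEY, and those whose client then loaded the web object. The key quantity is the share that never completed validation yet still reached the web server, i.e. a silent fallback rather than a visible failure. The synthetic numbers only echo the shape of the gap described in the talk.

```python
def compare_dns_and_web(ds_uids, dnskey_uids, web_uids) -> dict:
    """Correlate DNS-side and web-side evidence per experiment id.

    ds_uids:     experiments whose resolver fetched the DS (it is validating)
    dnskey_uids: experiments whose resolver also fetched the DNSKEY
    web_uids:    experiments whose client then fetched the web object
    """
    validating = set(ds_uids)
    if not validating:
        raise ValueError("no experiments with a DS fetch")
    incomplete = validating - set(dnskey_uids)
    blocked = validating - set(web_uids)
    n = len(validating)
    return {
        "dns_incomplete": len(incomplete) / n,
        "client_blocked": len(blocked) / n,
        # Validation never completed, yet the client still reached the web
        # server: a silent fallback to insecure rather than a visible failure.
        "silently_insecure": len(incomplete - blocked) / n,
    }

# Synthetic numbers: 100 validating experiments, 25 never fetch the DNSKEY,
# and only 2 clients fail to load the web object.
result = compare_dns_and_web(range(100), range(25, 100),
                             [u for u in range(100) if u not in (0, 1)])
print(result)
```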

So here is some of the background. There was an IPR claim over the algorithm, and this caused some of the open source software communities to decide to exclude the algorithm from their OpenSSL builds. In particular Fedora, Red Hat and CentOS made the call that they wouldn't include crypto that had an IPR claim, and they have somewhere around 50% market share. Even though the distribution has since been updated to include the algorithm, it doesn't appear to us that the update has reached the resolver farms using this code base; it has not yet turned up in the deployed software. So that is one part of it: people chose not to include the algorithm.

The second thing is slightly more worrying. There is definitional language in the RFC which says that if you come across an algorithm you don't know, you just treat the zone as if it were unsigned. Don't infer anything about it; return to the unsigned state. It's in the spec: when you can't understand the crypto, stop behaving as if there is any crypto.
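
The rule being described is RFC 4035 section 5.2: if none of the algorithms in an authenticated DS RRset is supported, the validator treats the child as unsigned. The simplified decision function below shows why a deliberately broken signature in an unsupported algorithm still gets answered, just without the AD bit. `signature_valid` stands in for the real cryptographic check, the DS RRset is assumed to be already authenticated, and `strict_unknown` models the hypothetical "ServFail instead" alternative debated in the session.

```python
from enum import Enum

class Outcome(Enum):
    SECURE = "answer returned with AD bit set"
    INSECURE = "answer returned without AD bit"
    BOGUS = "SERVFAIL"

# DNSSEC algorithm numbers (IANA registry): 8 = RSASHA256, 13 = ECDSAP256SHA256.
RSASHA256, ECDSAP256SHA256 = 8, 13

def classify(ds_algorithms, supported, signature_valid: bool,
             strict_unknown: bool = False) -> Outcome:
    """Simplified validator decision for one delegation.

    Assumes the DS RRset itself is already authenticated; signature_valid
    stands in for the actual cryptographic check, which is not performed here.
    """
    usable = set(ds_algorithms) & set(supported)
    if not usable:
        # RFC 4035 section 5.2: no supported algorithm in the DS RRset means
        # no authentication path, so the child is treated as unsigned.
        return Outcome.BOGUS if strict_unknown else Outcome.INSECURE
    return Outcome.SECURE if signature_valid else Outcome.BOGUS

rsa_only = {RSASHA256}
# A deliberately broken ECDSA signature is still answered, just without AD:
print(classify({ECDSAP256SHA256}, rsa_only, signature_valid=False))
```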

Google, in particular, appear to be fetching both parts of this, but then they return the answer to the end user with the flag missing: they don't set the AD bit, so you don't get any authentication signal coming back. Google are performing the fetch, and we see both halves being fetched, but they are not transmitting that information back to the user. If you take Google out of the equation, it turns out 75% of end-user resolver access can't handle the crypto: not 25%, 75%. And because the failover mechanism for an unknown algorithm is to turn off crypto, there is no signal to the end user, which is why only one to one-and-a-half appear to be affected by this. So, on the original question of whether it is viable to deploy: if you care about validation, the answer has to be no; right now this is not viable for deployment. That seems a really sad outcome, because this was a technology that could potentially be eight times faster to compute and about half the size in effective outcome once you take the SIG lengths into account, so we have missed an opportunity here. But there is a bigger problem that Geoff has been discussing with me and with others, which is not in the pack: the standards outcome that says "if you don't understand the algorithm, turn off the crypto behaviour" feels to us like a potential vector for a downgrade attack. The end user is not given any information that would allow them to infer they are potentially no longer protected by DNSSEC. We are promoting all these behaviours, but we are not signalling when you stop doing them. That doesn't feel good.

That is me.


AUDIENCE SPEAKER: ALT at that. So are you suggesting that we should change that into ServFailing for unknown algorithms? Because that would mean that, where now 75% would be treated as unsigned, 75% of the world would not be able to resolve your zone ‑‑

GEORGE MICHAELSON: No, it's 75% of the people performing validation, which is somewhere around 20 to 30 depending on the numbers. It's the subset of people who live in a validating world, not the population of the global Internet. But yes, you are absolutely right: the number who would say "I can't grok this key" would go through the roof. True. On the other hand, the end user knows.

AUDIENCE SPEAKER: Yes, I think that the ‑‑ the validation should be done at the end user point anyway. But that is a different discussion.


AUDIENCE SPEAKER: Right now nobody does it. You could have this viable ‑‑ you could have a viable way to do this. But otherwise I think there wouldn't be an upgrade path to new algorithms at all if you ServFail on it.

GEORGE MICHAELSON: You are saying that, because of the potential to introduce an unknown algorithm, you want a mechanism to do it smoothly; but during the transition period you could silently be subject to an attack and not know. I could deliberately break the chain and do bad stuff in the new algorithm space. In RSA you would be seeing "bad, bad, bad", but once that space has transitioned into ECDSA, you stop being told it's bad.

AUDIENCE SPEAKER: That is not entirely true. Maybe we should take this discussion off‑line. I think we could argue about this for a long time.

GEORGE MICHAELSON: I am often wrong, but that is my fear: the potential for a break in the chain to go undetected.

AUDIENCE SPEAKER: So two things. First of all, the DS record is kind of signed in the parent zone.


AUDIENCE SPEAKER: So the zone owner has decided to use that algorithm.

GEORGE MICHAELSON: The zone owner has said: I am using ‑‑ here is a DS; it is an ECC key I am signing over. So the DS is well signed and it has the algorithm in it.

AUDIENCE SPEAKER: Yes, and in this relationship, if I sign my zone with that algorithm, I usually give the key to the parent zone.


AUDIENCE SPEAKER: So it is kind of on purpose. The other thing is the discrepancy you had between people fetching the DS and the DNSKEY.

GEORGE MICHAELSON: Yes. In RSA we observed that, broadly speaking at the aggregate level, pretty much equal volumes of both were seen. This is deliberately uncacheable space; otherwise it's likely you would have a significant amount of cached state.

AUDIENCE SPEAKER: If you see the DS in the parent zone and you know you don't have the algorithm to do the validation, you don't need to fetch the DNSKEY at all. That is a valid approach.

GEORGE MICHAELSON: But Google do. They fetch it, then perform internal deliberations to decide they don't know what to do with it, then send the user no effective signal, and have done a downgrade of the crypto.

AUDIENCE SPEAKER: I am trying to explain why you see the difference.

GEORGE MICHAELSON: I wouldn't fetch a piece of gobbledygook I couldn't interpret; in general you wouldn't do that. But the strength of the signal is a really strong indicator of the number of people who can't grok ECDSA. It gave us an enormously strong signal. You can't assume this is a fringe issue; it's core. The algorithm is not viable for widespread use right now.

MARTIN LEVY: Martin Levy. You just said the words I want to drill down on. You said this algorithm is not viable for widespread use.


MARTIN LEVY: Now. So, what I came to say is that I want to invert that statement. What if, all guns blazing, we go out en masse and use this algorithm, and does that in any way ‑‑ and this is a statement to the room ‑‑

GEORGE MICHAELSON: I see my esteemed colleague from the other house.

MARTIN LEVY: I know what I triggered when I said this. What if, all guns blazing, we go out with this algorithm and force the issue, and I ‑‑


AUDIENCE SPEAKER: If you go back to the list of ISPs that don't support ECDSA, I'd have a healthy bet that none of them know they don't support it, because they keep on serving the domain. That is the other problem with being so nice and not sending back ServFail: it just papers over this whole fractured gap in ECDSA, where the free software folk and their lawyers got so snotty about some ghost of IPR that they broke an otherwise really useful algorithm. So how do we make those folk change? Send back ServFail, because at that point it's in your face that this is a problem. These guys don't even know they are not validating: they have this resolver, names come out, who cares?

MARTIN LEVY: You have the stats, but that statement you just made still comes down to a finite number of physical bodies, people, that we have to meet and beat up. I have beat ‑‑

AUDIENCE SPEAKER: I have beaten up four of them. But the problem is, you are saying you should get OpenSSL 1.something, look at your build modules and make sure ECDSA hasn't been expunged, and most of these people see this as a cost, not a benefit. People don't pay for queries, so they sit there and let this crap rot, and we are now paying the cost of that. If you see your name here, and some of you will, go fix it: put up a decent version of OpenSSL 1.something and fix your resolver. If you are in that list, there is another list on‑line of an even larger set of ISPs. We know who you are; go fix it.

MARTIN LEVY: I had a hard time deciding whether I wanted to listen to the v6 talks I have heard before.

GEORGE MICHAELSON: When we have considered the question "do you grok 5011?", the lack of signalling, in the response from servers or from clients, of their competencies is a real problem here. There have been drafts for improved signalling from the client up and from the server back, and it could be really useful to have that, because then we could have said "I have turned off DNSSEC". We could actually signal intent, client and server, if we did a bit of work.
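
One existing client-to-server mechanism of this kind is the EDNS DAU option from RFC 6975 (option code 5), in which a validating client lists the DNSSEC algorithm numbers it understands. The sketch below only encodes and decodes that option's wire format; it covers the client-up direction and does not provide the server-back "I have turned off DNSSEC" signal discussed here.

```python
import struct

# EDNS option codes from RFC 6975: DAU (DNSSEC Algorithm Understood) = 5,
# DHU (DS Hash Understood) = 6, N3U (NSEC3 Hash Understood) = 7.
DAU = 5

def encode_edns_option(code: int, data: bytes) -> bytes:
    """EDNS option wire format: OPTION-CODE (16 bits), OPTION-LENGTH (16 bits), data."""
    return struct.pack("!HH", code, len(data)) + data

def encode_dau(algorithms) -> bytes:
    """List the DNSSEC algorithm numbers a client understands, one octet each."""
    return encode_edns_option(DAU, bytes(algorithms))

def decode_option(wire: bytes):
    """Return (option code, list of octets) from a single encoded option."""
    code, length = struct.unpack("!HH", wire[:4])
    return code, list(wire[4:4 + length])

# A client that understands RSASHA256 (8) and ECDSAP256SHA256 (13):
wire = encode_dau([8, 13])
print(wire.hex(), decode_option(wire))
```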

MARTIN LEVY: I am going to sit down. If this is an Achilles heel that is going to stop us, I am going to go back and say: all guns blazing, lock and load.

AUDIENCE SPEAKER: Peter, ISC. So, inside that table, do you actually do any sort of breakdown by version.bind or ‑‑

GEORGE MICHAELSON: We don't do any query into the resolver to do that analysis, nothing. It would be lovely if there was a signal from the resolver to say what type of software it is. We do that in HTTP.

AUDIENCE SPEAKER: And inside DNS software, but every security to-do list says block it. So you end up ‑‑

GEORGE MICHAELSON: What if they are wrong and we are right?

AUDIENCE SPEAKER: In any case, this is basically the long tail of the dog here. As Geoff pointed out, folks set up their DNS servers and never look at them again for another ten years. These are also the same people who are still running BIND 9.5 and so on.

GEORGE MICHAELSON: I think some of them are running BIND 4.

AUDIENCE SPEAKER: I try not to think that darkly.

GEORGE MICHAELSON: How many Os are in behoove.

AUDIENCE SPEAKER: Some of them are running up‑to‑date BIND; it is actually the OpenSSL library and what it's been built with. A couple of these folk came back to me and they had been on FreeBSD 10 with a crippled OpenSSL build. So the problem lies in this crypto nonsense where, somehow, a problem that was resolved five years ago still taints things today, and an otherwise useful algorithm is branded as "oh, no, we can't use that".

PETER KOCH: Thanks, George. With that I guess we are done with the technical topics for the day. Except for one submission for any other business.

JIM REID: Thanks very much. I'll run through this very quickly. As you are all aware, there has been a lot of discussion going on in the background, for too long, about documenting a process, or coming up with processes, for the appointment, selection and removal of Working Group Chairs. This is what we have come up with as a suggestion. We don't have a consensus amongst the three Co‑Chairs about this, but we think what is being suggested here should form the basis of what we do in the Working Group.

General points: please keep this simple. Let's not try to build a huge temple or look for corner cases. Trust ourselves to get this right. We don't need complicated procedures, even though we are engineers and we like complexity. Let's take the basic engineering principle: is this good enough? If the answer is yes, go with it. If this procedure doesn't work, we can always come back, revisit it and try something else later.

Really, what we have to do here is simply trust ourselves and apply common sense. As my mother says, common sense isn't always that common.

There are a few points. We are proposing that we have N Co‑Chairs, which the Working Group will decide. Currently the value of N is 3, and the recommended maximum for any Working Group at the moment, from one of our documents (I think it's 5.9.2), is no more than 3, and we have three at the moment.

What we are suggesting is that every year one of those Co‑Chairs will stand down or make themselves available for reappointment, and we stagger this process, one per year, so a three-year term if there are three. If there are two Co‑Chairs, each has a two-year term and they are staggered. I need to go back there. One of the points of contention is the suggestion that nobody can serve more than two consecutive terms. I think this is a good idea, because it will encourage some rotation and you are not going to be stuck with the same faces all the time. Some people think that is a bad thing, so give us your feelings on whether you think it's a good or bad idea. We will give you plenty of notice in advance that this is coming, and the decision will be made primarily on the mailing list; it's not going to be decided at a Working Group session at a RIPE meeting. Anybody is welcome to volunteer; we want to keep the barrier to entry as low as possible.

We will decide by consensus who should be the new Co‑Chair. If we end up without a clear consensus, we will put the names of all the candidates we think are equally deserving in a hat and draw one out, and that will be the new Working Group Co‑Chair. Let's not try to get into complicated procedures with elimination rounds where the bottom candidate drops out, blah‑blah‑blah. Let's keep this simple. And at any point, if you are fed up with us, you can say "we are fed up with Co‑Chair Jim Reid, please go", and that will be the end of it.

The consensus decision will be made on the mailing list, and the other Co‑Chairs of the Working Group will make that consensus judgement. Say, for example, my seat is up for reconsideration: it would then be down to Peter and Jaap to decide what the consensus of the Working Group is. I would have no part in it. Likewise, that changes accordingly if somebody else's seat is involved.

And finally, if we are unhappy, we kick it upstairs to the RIPE Chair or Co‑Chair, who makes the final decision.

So this is what we are proposing to do. I welcome any comments from the floor: throw rotten tomatoes, bricks, money, preferably money. Let us know what you think about this. I would like a clear steer from the Working Group on two points: do we think this procedure is satisfactory or not, and if not, what should replace it or how should it be changed? And what do you think about the idea of having term limits on Co‑Chairs? I would like a clear statement from the mailing list on whether we should adopt this process or not. Silence will imply consent of sorts, but I hope we get to the final text of this process with statements of support for it on the mailing list, so we have a clearly defined procedure in place. Over to you now for comments, or do you care about this at all? Have you heard it too much already?

AUDIENCE SPEAKER: Brett. I support the proposal completely and I think we should rotate the Co‑Chairs periodically. Anyone else?

AUDIENCE SPEAKER: Jim dot L U. One question to the Chairs: what number of Chairs are you proposing? Is three too many, or is two OK?

JIM REID: That is a very good question, and it touches on something Peter was going to bring up under AOB. As an experiment this time, we tried to have a single Working Group session slot, including the discussion of the Working Group Chairs. Because new Working Groups are being created, there is more of a problem managing the meeting overall, and we thought we could manage with a single slot. As it happens, that looked as if it was going to be OK until about two weeks before the RIPE meeting, when there was a flurry of last-minute submissions; we could have filled two slots, but we had already committed to one and it was too late to change things. If we continue with the two-slot model and there is enough content, three Co‑Chairs could be about right; if we drop to a single slot, maybe only two. There is a fair degree of uncertainty at the moment, but I think two or three should be just about right most of the time, maybe, perhaps, possibly. Any other comments? Nothing. What I will do then is repost the text to the mailing list and ask for statements of support. Assuming we get consensus, we will adopt it, document it, put it on the website and maybe publish it as a RIPE document, who knows. We will have that process in place, and the first selection will take place around the time of RIPE 70. OK. Thank you.


PETER KOCH: So, we have two submissions for AOB. The first one has slides, and I think that is Ondrej.

ONDREJ SURY: Ondrej Sury from CZ.NIC. This is just a short announcement that we are working on a resolver from the Knot DNS family. It should be scalable, standards compliant and all the stuff you would expect. It is currently a work in progress: it compiles and, well, sort of resolves. You should have DNSSEC validation early next year. If you are really, really interested in getting the source code, you can ask us. That is basically it; this is where we are. OK, thank you.


And finally, I think Kaveh wanted to make an announcement.

KAVEH RANJBAR: A quick update from the last RIPE meeting, where I said we would produce results on the history of DNSMON data, because we had historical pages. We looked at the logs: we have a few thousand hits on that page every month. We think most of it is scripts that get an empty page, but it costs cycles. Analysing that log, we decided that for the moment we will spend those minimal cycles and keep the history of DNSMON alive. The status is not going to change; the visualisations will remain available. If we want to change that, we will give the Working Group plenty of notice.

PETER KOCH: Thank you. And that brings us to the end of the session. I would like to thank all the presenters for submitting their presentations on time and contributing to the Working Group. Thanks to Emile for scribing, to Andreas for being a Jabber relay, thanks to the remote participants, thanks to our stenographer and to everyone and we will probably meet each other again next time in Amsterdam.