MAT Working Group session
5 November 2014
CHAIR: Welcome to the MAT Working Group session. Please find a seat, fasten your seat belts, put the tray tables up.
As usual, we have quite a diverse round of topics around measurement analysis and tools for the next 90 minutes. We have, as usual, a scribe, Jabber and stenographer. So if you have somewhere in the middle a question and you want to participate in the chat room, then let us know, ask, and the questions will be transferred.
There are some power outages around the chat room, so if you are actually getting kicked out of the chat room, you can use Twitter, which should be up and running, and if you use the hash tag RIPE 69, then you can relay your question via Twitter.
Something quickly about the microphone etiquette. If you have a question or comment, which you hopefully have after the presentations, the microphones are in the middle, please state your name, and affiliation.
Assuming you have read the minutes from the last RIPE meeting, are there any questions and comments to the published ones on the mailing list? Doesn't look like...
Does anyone object that we approve them? Then, I declare the minutes as approved. And we move onto the, at least for my little presentation, most interesting part.
So, two things happened. One part is Richard, who is the other co‑chair, can't make it today, and as you have seen on the mailing list, he actually has to step down because his day job doesn't give him enough time to participate any more. There is a second component, which is, and you might have seen that in other Working Groups, the Working Group Chair collective has worked on processes to elect and, if necessary, remove Working Group co‑chairs. So, this comes in handy because now we need that process, so what happens is Richard and I worked on a very small ‑‑ process is even a big word for seven or eight bullet points on a paper. These bullet points we will, in the next one or two weeks, circulate on the mailing list and send to you so that we, as the Working Group, can hopefully find consensus and approve them. So, this, in comparison to the past, where Working Group Chairs kind of fell from heaven and just appeared on the stage, gives us a more transparent process to elect them or, if we or you don't like them any more, get rid of them. So please have a look at the mailing list in the future and when we send out the proposal, please comment. Because this is what we then want to approve. And I guess so far, if we agree, what we want to do is call for interested people who, well, want to stand next to me, or actually stand here so that I can sit comfortably on a chair, and chair the next session. The idea, if everything works fine, is that we get enough people who are interested in the nominations, and then we find via the mailing list a candidate, a second one. And that one will then hopefully help me at the next meeting to run that in Amsterdam.
Any questions or comments actually to that part? No?
Then, let's start with the agenda. And the first speaker is Randy. So Randy will talk on the suitability of two large scale Internet measurement platforms which is a joint venture out of multiple people, but I guess Randy will tell you about it.
RANDY BUSH: Hi, Randy Bush, IIJ, I am really standing in for Thomas Holterbach, who is a grad student who spent the ‑‑ well he is still at IIJ research, and so, he's smarter than I am, as is Cristel, and so I'll try to make it through this presentation, but we'll see.
So, we know what RIPE Atlas probes are, I won't go into it. They are cute, but they have gotten fatter, that's a shame. Everybody knows what ring nodes are. So, there's about 300 of them out there, most are VMs, and so they are much more powerful and flexible, plus they have got some tools.
So the question is, how good are they in comparison? If I want to conduct RTT measurements, which one do I want to use? Can I use them together? We wanted to measure how accurate they were, specifically precision under certain conditions. So, there happened to be 12 LAN segments in the world that have an NL ring node and an Atlas probe on them, so they should, from an external point of view, be on the same network and act the same way. So we're going to measure from each of these to the others, so, it's, you know, 11 factorial, and just for control, 2 BSD servers and, for those of us who remember our Tokyo ping paper, we are going to use different flow IDs.
So, they are on the same local network, and we're going to ping towards the others, and we therefore should be able to compare the behaviour of the Atlas probes with the NL ring nodes and the BSD servers. By the way, Daniel, anybody else, feel free to interject questions, etc., this can be as interactive as you're willing to make it.
So, for example, we have R, which stands for NL ring node, and A, an Atlas probe, on the same network here, and we are going to use the NL ring node to ping both the Atlas probe and the NL ring node and all the others, and we're also going to ping, as I said, a server. And similarly, here this is the probe and it's going to the others, and similarly, here, okay. So we have 12 nodes, so there are 288 coupled source destination pairs effectively.
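The count of 288 pairs can be reproduced with a short Python sketch. It assumes, as one reading of the description, that the 12 ring nodes are the sources and that each one targets the ring node and the Atlas probe on all 12 LANs, its own included. The labels R1..R12 and A1..A12 are illustrative only, and the two control servers are left out.

```python
from itertools import product

NUM_LANS = 12  # LAN segments that host both an NL ring node and an Atlas probe

# Sources: the ring node on each LAN (illustrative labels R1..R12).
sources = [f"R{i}" for i in range(1, NUM_LANS + 1)]

# Destinations: the ring node and the Atlas probe on every LAN.
destinations = [
    f"{kind}{i}" for i in range(1, NUM_LANS + 1) for kind in ("R", "A")
]

# Every source is coupled with every destination.
pairs = list(product(sources, destinations))

print(len(sources), len(destinations), len(pairs))
```

Under that assumption, 12 sources times 24 destinations gives the 288 source destination pairs mentioned in the talk.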
So, scamper flow ID, Atlas probe, ring node server and whatever else you want, but those are what we used. And we have a collector to collect the results. So, the NL ring nodes allow us to ping more ‑‑ as a source, allow us to ping more frequently than Atlas probes do because Atlas probes are on a diet ‑‑ I should learn from them. So, there are 12 pairs, plus the 2 servers. Every 2 seconds, 284 pings effectively, how many got back, the statistics. Lots of them.
So, here is an example where we have the ring node and the Atlas probe, they're both on the same segment. We're pinging from far away, it looks like we're pinging from about 160 milliseconds away, well actually 80 milliseconds away 180 round trip, and these are different flow IDs. So, both of them are acting the same way. It looks like there were events that really blew things up but it blew them up in the same way. So we think these are probably network events. Somewhere between the source and the LAN on which these two targets sit. In this case, here is an event that affected the Atlas node, but not the ring node. So, we can hypothesize, we don't know, we can hypothesize that maybe somebody else was running experiments on that Atlas probe and it was loaded.
So, here is one that an event occurred and another event occurred and it was only with the ring node. So, was another VM competing with the ring node? Was the ring node itself doing something? We will never know.
So, here, same destination, same time, a different source, so this is the same as this one going ‑‑ the target is the same, different source. Why did we see something different? We can't tell you. Don't get your hopes up. So, it's probably ‑‑ that just tells us that this is probably something that's very local, but probably network local, not device local.
So, what's it look like over a week? Violin plot, I'll try to explain it, at this point Thomas is much smarter than I am and I'm not going to do well. So here we have the 12 LANs, each with a ring node and an Atlas node on them. Here is the LAN number 1. Destination 1. These are the destinations. The Atlas probe and the ring node. Here is the source and it tells you ‑‑ you'll notice that the source has this little circle, so you can go to these little circles and see that for this pair of destinations, here is the mean ping time from this source. Here the ping times for this source, etc. Now, these are different this way, of course, because the sources are different network distances from the destinations. But notice how well they pair.
So, here we have the global roundtrip average. By global we mean from all sources to this destination, what's the average RTT. So, notice how close they always are.
So, between these two platforms, the average global RTT difference is half a millisecond. The worst case is 1.5 milliseconds. One of these, I don't know which it is, probably this one looks ugliest. That one is a little bad too.
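To make the comparison concrete, here is a minimal Python sketch of a global RTT average and the gap between the two platforms on one LAN. It assumes the global figure is the mean of the per-source mean RTTs, which the talk does not spell out; the function names and the sample values are made up for illustration and are not the study's data.

```python
from statistics import mean


def global_rtt(rtts_by_source):
    """Mean RTT (ms) to one destination, averaged over all sources.

    rtts_by_source maps a source label to its list of RTT samples.
    """
    return mean(mean(samples) for samples in rtts_by_source.values())


def platform_gap(ring_rtts, atlas_rtts):
    """Absolute difference (ms) between the two global RTT averages."""
    return abs(global_rtt(ring_rtts) - global_rtt(atlas_rtts))


# Hypothetical samples towards the ring node and the Atlas probe on one LAN.
ring = {"R2": [80.0, 81.0], "R3": [160.0, 162.0]}
atlas = {"R2": [80.5, 81.5], "R3": [160.5, 162.5]}

gap = platform_gap(ring, atlas)
print(f"global RTT difference: {gap:.2f} ms")
```

Repeating this per LAN and averaging the gaps would give a figure comparable to the half a millisecond quoted above, with the largest gap as the worst case.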
So, I guess he is trying to tell me which one it was. Probably that one, or maybe that one. So, we want to look at the flow ID effect. So, we take and we define the average RTT as the standard deviation over the 14 flow IDs, excuse the math, you know what it says, it's the standard deviation between the source S and the destination for the particular flow ID and it's the mean of the standard deviations.
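One way to read the spoken definition is: for a given source and destination, compute the RTT standard deviation for each flow ID, then take the mean of those standard deviations. The Python sketch below follows that reading. The use of the population standard deviation, the function names and the sample values are assumptions made for illustration, not taken from the slides.

```python
from statistics import mean, pstdev


def flow_id_spread(rtts_by_flow):
    """Mean over flow IDs of the per-flow RTT standard deviation (ms).

    rtts_by_flow maps a flow ID to the RTT samples seen for one
    source/destination pair with that flow ID.
    """
    return mean(pstdev(samples) for samples in rtts_by_flow.values())


def destination_spread(flows_by_source):
    """Average the per-pair spread over every source pinging one destination."""
    return mean(flow_id_spread(flows) for flows in flows_by_source.values())


# Hypothetical samples: two sources, two flow IDs each.
example = {
    "R2": {0: [100.0, 102.0], 1: [110.0, 110.0]},
    "R3": {0: [50.0, 54.0], 1: [60.0, 62.0]},
}

for source, flows in example.items():
    print(source, flow_id_spread(flows))
print("destination:", destination_spread(example))
```

A small value means the destination answers consistently whatever the flow ID; comparing the value for the ring node and the Atlas probe on the same LAN is what the slides do.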
So, here, for this ring and Atlas pair, here are the flow IDs, and here is the spread of those flow IDs. Why is the spread different for the ring node and the Atlas probe? I am not going to answer that on the next slide. Sorry, Jen. This is a measurement study. It's not an answer study.
So, because there are several sources, the 11 ring nodes per destination, we have the mean over the sources for the destination, and this says half of the destinations ‑‑ we're here going to compare, remember we had two servers for control, here are the two servers, okay, here is the sigma, and notice that of the Atlas and ring pairs (sigma), half of them are better than the dedicated hardware servers. The bottom line from this whole presentation, I hope you're getting it by now, is these measurement platforms ain't bad. They are surprisingly good.
So, here is a scary one, but the Atlas probes don't lose more packets than the ring nodes and the two servers. This is packet loss percentage, something happened to this one, who knows, R1 A1.
So, what have we increase ‑‑ so we said, okay, we're getting these results for 12 of them, the 12 pairs. How do we know these aren't special? How do we know that these are representative of the network? So, we did what I jokingly call spray and pray. Take 250 ring nodes and see how they behave in comparison.
So, we have ring node X perform towards the 12 selected ring nodes and compare it to 250 random ring nodes. So, the same thing for the Atlas probes, the 12 selected ones and 250 random ones. The duration is a week for 16 flow IDs, here are the ping times etc. We tried to ping once every 20 seconds. It ends up 21.5. I think it's city tax or something.
Now, this is a little difficult. Here is the curve of the random ones. Here is the 12, what you have to do is imagine this spread out horizontally and that is very much the same curve is the point.
So what we're seeing is the 250 act very similar to the 12. So those ‑‑ the whole point is those 12 are kind of representative.
Here is packet loss. Pretty much the same. This shape is pretty much that shape aside from horizontal scale. Both for ring nodes and for Atlas probes.
So, what else? Some events, here we see again the flow IDs, some events only affect some flow IDs. Not others. Now, this one is more believable, at least to me, because since it affected both the Atlas probe and the ring node, it's a network event, and the network maybe had multipath or something, so the flow IDs discovered a change in some of the paths and not in the others, so maybe we could make up a story about this one.
The Atlas probes: finally the folk at the NCC upgraded the Atlas probes with release 4660, which fixed the bug in Paris traceroute, so we can use Atlas probes as sources for multi‑flow ID, and we are currently measuring on that.
So the conclusion is the Atlas probes were able to provide comparable results to the NL ring nodes and to the BSD servers. These things aren't bad. I mean, it's the size of my thumb. I'm surprised.
So, it runs Linux instead of Berkeley. So, what we want to do is study the Atlas probes as sources, now that the Atlas probes will do flow ID. We want to see what maximum frequency the Atlas probes can stand up to. And I think that's it.
Questions? I got lots of questions, come on...
AUDIENCE SPEAKER: Philip Homburg, RIPE NCC. I think, I mean, it's a great talk. There's one curious thing that I found, which is that in some cases where you plot the different flow IDs, say on the ring nodes it goes down and on the Atlas probe it goes up and vice versa. Is that something that happens a lot or did you just select them because they are interesting or...?
RANDY BUSH: Those were somewhat typical. Yes, I mean of course, we didn't show you the 50 boring ones, we showed you the 50 interesting ones. Vesna.
VESNA MANOJLOVIC: RIPE NCC community builder. If you could repeat this measurement with RIPE Atlas probes as destinations and make a distinction between the RIPE Atlas anchors, which are the actual large servers that can take a lot of measurements while being a target, and the small probes, which in my opinion are not actually supposed to be used as destinations but rather as sources.
RANDY BUSH: They work well.
VESNA MANOJLOVIC: You got lucky.
RANDY BUSH: Yes, there are a number of things we could do. Also the problem is the student is going onto the wide world to more interesting things. But, if somebody wants to work on this by the way, more than happy. We should study anchors versus sales, we should study versions of the probes. Daniel?
DANIEL KARRENBERG: One question: The raw data of this, is that ‑‑
RANDY BUSH: Yes, we're putting it up. And the code.
DANIEL KARRENBERG: If you could send that to the MAT Working Group mailing list.
RANDY BUSH: Send me an e‑mail message to remind me.
DANIEL KARRENBERG: And a second thing is, if you, in your random sample of the 250, spray and pray part, did you actually look how many of those were behind NATs where actually the NAT is answering your ping, or may be answering your ping?
RANDY BUSH: Actually he did something on that dimension. He did something on that dimension, I don't remember what.
DANIEL KARRENBERG: Fine. That would be really interesting, because ‑‑ and that's may be also interesting for the general public. If you use ‑‑
RANDY BUSH: A lot of Atlas is behind NAT.
DANIEL KARRENBERG: If you use Atlas probes as destinations, a lot of those are behind NATs and I believe there is a tag even
RANDY BUSH: Yes, there is a tag.
DANIEL KARRENBERG: There is a tag that says I'm behind the NAT, so you have to be aware that if the probe's behind the NAT, there is a high probability that it's not the probe answering but the NAT, which may be just okay, but you have to know it.
AUDIENCE SPEAKER: Jen Linkova, Google. You mentioned that before.
RANDY BUSH: Yes, of course. We are measuring the Internet. It would be nice to do v6, but the problem is if we try to do v6, we don't know that we'll have, you know, only 12 pairs was a little small. So... but point taken. Point taken. Ring nodes are supposed to be v4 and v6, so therefore those pairs should be v4s and v6s.
AUDIENCE SPEAKER: RIPE Atlas anchors are also v4‑v6.
CHAIR: Do we have any questions on the chat? No... well, I am very happy to see, as I used RIPE Atlas a lot myself for my master's thesis, that the results in themselves are not just comparable between the various Atlas probes but that you can actually, if not calibrate, at least compare them with others. So you know that what you are actually measuring is kind of the reality, not just, you know, you have a lot of samples and you compare the samples, which might be good enough, but that's actually kind of showing you the truth. Is that what you said or not?
RANDY BUSH: Well, you don't know it's the truth. You just know everybody is telling the same lie. But, I forgot to say just like the other study with Tokyo ping, this was not the purpose of our study. What happened was, again, we had a study we wanted to do, to do large scale measurements of RTT and we said oops, we better calibrate our instrument and this is just the calibration.
CHAIR: Now let's come to something completely different and to the next speaker, which is Brook.
BROOK SCHOFIELD: Hi everyone. I'm Brook Schofield from an organisation that's recently changed its name, so the e‑mail address still works. This was a badge I made at an OGF meeting that I didn't quite understand, so I spent a bit of time designing my own badge. So, this works as my e‑mail address, so you can contact me if you find any of this interesting. The flags represent not the languages that I speak, as if I was a flight attendant, just the fact that I am an Australian who lives in the Netherlands. Unfortunately, I do not speak Dutch. I started off working on the Eduroam programme group and I slowly migrated to work at TERENA and work on Eduroam programmes there. There is a video if you don't know what Eduroam is about, but I won't eat three or four minutes of my time playing that.
So, Eduroam, I don't know if the audience knows what Eduroam is. Presumably if you are from the academic world you'll know, but if you're not, it's a wireless roaming service for research and education, it's based on WPA and 802.1X. We have a hierarchy of infrastructure which transfers your credentials back to your home institution. We have a peering of that RADIUS infrastructure and the ‑‑ and policies, in order to build this trust fabric, and we don't have web splash screens or, you know, shared passwords for getting onto the wireless network. This is not what we want in our community, or at least this is what our community has voted for. And it started in TERENA's Task Force on mobility. It sort of stands for education roaming. So the infrastructure looks like this, this is an old diagram, you can see a wonderful Windows CE device and things like that. But to connect to Eduroam, your device attempts to associate to an access point, and the information is passed on to an institutional RADIUS server. If it's not fulfilled by that institution, it goes to a roaming operator's RADIUS server, from where it will eventually get to the institution that you belong to, which authenticates you against its user database, and with the message that gets returned, hopefully, if you got your user name and password correct, you'll connect to a network. And this is sort of the infrastructure that you'll often see for other wireless services that implement this kind of technology. They don't always support federated access.
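The hop-by-hop path described above can be sketched as realm-based routing. This is a simplified Python illustration, not how any particular RADIUS server is configured: the function name, the hop labels and the example realms are hypothetical, and real deployments involve more levels and proxy configuration than shown here.

```python
def route_request(username, visited_realm):
    """Return the chain of hops for an authentication request.

    The realm is the part of the identity after the last "@". A request
    for the visited institution's own realm is answered locally; any
    other realm is handed to the roaming operator and on to the home
    institution.
    """
    _, sep, realm = username.rpartition("@")
    if not sep or not realm:
        raise ValueError("expected an identity of the form user@realm")
    realm = realm.lower()

    hops = ["access point", "visited institution RADIUS"]
    if realm == visited_realm.lower():
        return hops  # local user: authenticated on campus
    hops.append("roaming operator RADIUS")
    hops.append(f"home institution RADIUS ({realm})")
    return hops


# Hypothetical realms for illustration.
print(route_request("alice@home.example", "visited.example"))
print(route_request("bob@visited.example", "visited.example"))
```

The realm in the identity is what lets a visited campus hand off a request it cannot answer itself, which is the federated part that the flat user names of the commercial hot spot services lack.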
So, if you are from the Netherlands you might be familiar with UPC's wifi spots, where they have turned your home cable router into a wifi access point, but the user names are flat. There's the Cloud service available in the UK and other countries. They have an SSID called the Cloud X, which, if you install their fast connect app, you can use to get access to their network where it's available. And Comcast also have something where they have, you know, taken over your cable modem and offer a wifi service on top of that. So these are other groups that have got sort of an Eduroam‑like network, but they have a flat name space.
So, the difference here though is Eduroam is an overlay network. We don't run our own hot spot infrastructure. We rely on the infrastructure that's provisioned by campuses, and so that creates some challenges for us, because with the infrastructure run by campuses, you know, campuses want to make that reliable, but maybe they don't always have Eduroam as their primary concern, maybe they don't go through a full suite of testing when they deploy that service, so it can vary between campuses, unfortunately. And we hear that because students use Twitter.
So, we're sort of the worst franchise in the world, we allow campuses to deploy this and we don't have the means to actually verify that people are doing it correctly. If you run a pizza franchise, if you don't make the pizzas according to your franchise specification, it costs more in your compliance, but getting Eduroam wrong, we can't verify that, except via sort of user complaints and we work with people and it's unfortunate that Ashley thinks that Eduroam ruins her life, but you know, we don't want that to be the case. We want to make Ashley happy.
In fact, we do want to imagine a world where Eduroam actually works consistently. This is the goal and this is why I am here today, and some of the comments get really, really quite weird with regards to how damaging Eduroam unreliability can be in a personal sense. I'm not too sure, maybe we should go and look through birth records and see if there is any naming of children Eduroam. I fear not.
So, we have actually tried to solve this problem in our community. We have had a few stabs at it. The first was the Eduroam Pulse made by AARNet in Australia; SheevaPlugs in 2009 were very important. It didn't scale as a service and a community didn't evolve from it. So while there was some limited deployment in Australian universities, it didn't go much further than that.
SURFnet have had a project around the same time. They wanted to do it the SheevaPlug way, but they ended up finding these ALIX system boards instead. They actually put the cover on it so it doesn't look quite this ugly when it's deployed at a customer site. It's deployed at approximately 14 sites. So it's not setting the world on fire, but they are continuing to do deployment and try and get uptake.
We have what I call the Franken probe within the GÉANT project ‑‑ I call it the Franken probe because it looks like this in its sort of development stage. It's a Raspberry Pi based system, built by Srce in Croatia. It's not really a solution, at least not at this point in time, and it's primarily looking at scanning of SSIDs and signal quality, and signal use, so it's not there yet. Maybe it will be in the future.
And then Janet, the NREN within the UK, working with the University of Loughborough, have a probe system that's actually progressed quite well. It uses wpa_supplicant and scripts to orchestrate itself. Initially 20 deployments around the country and soon to grow to 200. It's based on the TP‑Link MR3020, which you'll be familiar with, so it's in effect functionally equivalent to the Atlas v3 probes. It looks like this, except it ends up with an Eduroam sticker on it rather than an Atlas sticker.
So, why would we want to do wireless measurements on Atlas? So the Janet/Loughborough work, as a proof of concept, has proven that you can actually do it, but their solution doesn't scale well. They don't have the infrastructure, they don't have the backing in the infrastructure. They don't have the mechanisms to do firmware updates. They don't have the longitudinal data that Atlas records, so there are lots of gaps in their knowledge. Eduroam is pretty much equivalent to the Hotspot 2.0, next generation hot spot work, so there might be other operators of Hotspot 2.0 services that might want to utilise Atlas to verify, because I believe in the UK, the Cloud has partnered with Janet to offer their service as an overlay within campus networks, so they might want to make sure that the institutions are getting the Cloud deployment correct.
And the best thing is Atlas has the footprint and infrastructure with more than 7,000 probes deployed and I presume, I guess about 4,000 of those are the v3 series. Atlas also has data sets that are publicly available and the longitudinal data that we wanted.
So if we were going to partner with RIPE Atlas, what's actually in it for RIPE Atlas? And there is a huge benefit for the Eduroam community, how do we do something to give back to the RIPE community?
So, firstly, a footprint, hopefully we can improve the footprint of Atlas where it's not always covering ‑‑ I forgot to look up how many countries the GÉANT network covers, it's 401 in Europe. It's 70 more outside. We have a big relationship in building networks into East Africa, into Latin America, the Caribbean, the ORIENT link to China and Asia. I'm not exactly sure of the country number. Maybe someone else can help me on that. But we have a big influence in the R&E network development around the globe and so hopefully we can get Atlas into those locations as that network build‑out happens. Because if you look at the density map for RIPE Atlas, there are still some spots of no coverage.
We have actually decided to put some money into it, we have now started to sponsor RIPE Atlas, as everyone who's happy with RIPE Atlas should do, I think.
And as we improve Eduroam, you know, coverage and quality, that will also have some benefit to Atlas.
This is where Eduroam is at the moment. It's in 70 countries on the planet, or at least 70 territories. So, if you quickly count that up, you can see whether my map is accurate. We have pilots in another 22 locations, which you might not be able to see the light blue on this map. And then there are places where there isn't Eduroam but there are other networks so we are trying to work with people to ensure that we get Eduroam in those locations.
I am an ambassador for Atlas, so this is where I have bothered to deploy the probes that RIPE have given me. And actually ‑‑ and then all of the grey bits are the people that I have to continually remind them to register their probes and plug them in. But you know, we do a lot of work around the globe and this is where I have managed to deploy, I think I have only been an Atlas ambassador for this year, so I have just levelled up to gold.
The other thing is what if RIPE says no to doing wireless measurements? Well, it probably creates an anti‑pattern for deployment. I guess we look at how to build our own system and that's not preferred, I don't think. In fact SURFnet are willing to get out of the probe business, they would like a solution that their members liked, used, that scaled well, and was looked after by someone else. Also, these tools don't benefit the RIPE community or Atlas community if we build this parallel infrastructure.
And we're going to be spending you know, public money on a resource, because Atlas is more than probes, it's all of the back‑end infrastructure and actually the presentations that you'll see later, show you know how Atlas keeps developing and improving and it's worth us getting on that bandwagon.
We have proof of concept code, so thanks to Philip, who works in the Atlas team, we have proof of concept code that seems to show that our work and the work of the University of Loughborough can actually happen. We sort of need an API to be able to, you know, create our measurements, associate with various SSIDs, authenticate, and optionally perform measurements. The measurements are of less interest to us than the actual association to the wireless networks, because actually seeing whether the authentication works is important to us. And then the Eduroam ops community has to work on the reporting, the orchestration, and the visualisation of our results, because we need to correlate some information from other locations.
Also, we have got to work out how we turn on wireless LAN measurements. Is that going to be an opt‑in process? And how do we manage that? How would you prefer to do that, you know, as hosts yourself or part of the community? Are you willing to have the potential for wireless measurements operating on your probes? You might be fearful of that. And that's why we don't have the scanning option. We took scanning sort of off the table early. We want to associate with known SSIDs, but we don't want to scan for any wireless networks that are available. So, how best should we do that, and what is the timeline, because there are lots of feature requests for Atlas that people want, and we would like to obviously push our requirements further up the priority list, but we know that there are competing demands and limited resource within the Atlas team, so we want to know if there's value for the wider community and whether you would also be willing to support this. I mean, once upon a time I presume that the Atlas, or the RIPE membership was very much aligned with the academic community, but over time the commercial providers have increased, so the needle has swung away from, you know, majority academic to ‑‑ well, we're not 50% any more, universities aren't growing at the same rate as commercial enterprises. So, I'd appreciate your comments on this and whether we should do. I see people wanting to stand up, either to leave or ask me a question.
AUDIENCE SPEAKER: Andrei: I like this idea of measuring, especially Eduroam, very much, because I just ran into an issue where there was some poorly monitored Eduroam at some university where only local accounts worked and no roaming actually, so, yes, it was quite sad, and this is what could be measured by some means like Atlas probes. But I have this problem that the probes are usually deployed in some data centres or in some racks that are quite insulated from the wireless environment, and there are only a few access points within reach of the probe, so I think for a proper measurement you would have to deploy the same number of Atlas probes as access points in the network, which I think is not the way that the Atlas system should be scaled. So this is ‑‑ if you have something, some opinion about this.
BROOK SCHOFIELD: We think that scaling probably to every campus is at least our goal, but obviously with the density of campuses, not all campuses run their own network and they might be just connected to an NREN, and that would be a bias, having lots of probes just on one network, and obviously it wouldn't necessarily have value to the wider Atlas community. But we think only one probe per controller is needed. So on any campus that has a suitably large setup, you wouldn't need one for every access point. You should be able to verify the operation via the controller. Whether an access point is visible from, you know, from the data centre, that's something to be explored. So, yeah, ideally we'd love to be able to turn on wireless LAN measurements to see if we can spot Eduroam access points that are around, and you know...
AUDIENCE SPEAKER: And second question: About the measurements themselves, after the association to an Eduroam network, I'm quite unsure how to do this safely, because now there are some kind of test accounts that are dedicated to testing the functionality of this hierarchy of RADIUS servers, and these test accounts are deliberately set up in such a way that after association you are connected to some black hole VLAN where you don't get any Internet access. So, in case you would like to do some IP measurements on these networks, you would have to fill in some real accounts, and in that case it would, in my opinion, somehow compromise the security of Eduroam as a whole, because you would have some accounts by using which you could get access to the networks.
BROOK SCHOFIELD: So, we debated this, so we brought together all of the people that had had a probe initiative, and we talked it over for a couple of days before we then invited the RIPE database guys together to see if this was feasible. So we did discuss whether we should have the ability to have a password or not, deploy a password and do a test at all. Because actually, you can get quite a long way just doing association tests and seeing if you get a certificate back from a TTLS site. We decided, yes, you know, that there was some value in doing full connectivity, seeing whether the network that gets provisioned supports v4 or v6 or both, and whether, you know, v4 is NATted or native. And within Europe, the Eduroam confederation has port requirements. So whether we could test for the functionality of the service.
I do not think that many people have test accounts that do black hole you. That is not universal and certainly a test account, because it will work in other locations, won't always be the case. Maybe ‑‑
AUDIENCE SPEAKER: It may be a local policy of CESNET.
BROOK SCHOFIELD: The downside is people could use their own personal account, which could also be bad but how can we universally stop people doing the wrong thing and we think that there is at least some value in you know deploying a password for association. So, it's a decision on whether we want to do that or not and we think that there is wholly good reasons for doing it but we know that people can get it wrong.
DAVE WILSON: I work for an organisation that's both a member of the RIPE NCC and a partner in GÉANT. And I just want to say I really support this work, not least because hopefully it will save us paying for it twice, but that's not the only reason. I think there is a lot of work we have been doing in both our communities for a long time where I thought each community could benefit from the other. And this is a great example of that. In particular, I think this is exactly the sort of thing that Atlas should be doing. And I am delighted to see that we're able to support it financially as well as in other ways, so I really hope there is going to be a lot more cooperation like this. Thank you.
AUDIENCE SPEAKER: Daniel Karrenberg. Christian asked me whether the NCC has an opinion on this. Well, we certainly, I'm looking at the experts here, we certainly can do it; we did some studies already on whether we can use the hardware that's in the v3 probes to do this. It's all a question of priorities, and I think that's something where this Working Group is actually the place. Vesna and others have put a lot of work into the road map, which outlines what the things are that are requested, and what I understand, and what my own experience is, is that to engage on the road map and get opinions on what should be done first and what should be done at a later stage, or prioritisation of things and so on, is quite difficult. You know, when ‑‑ and so, it can be done, but it needs to be ‑‑ we need to know what priority it has in relation to the other things. And if we don't hear from you, obviously we'll do what we think is best. Let me say that as clearly as I can. But please do look at the road map. We are going to put it in there, and it's already in there, as far as I know. And let us know what you think is more important. Personally I think it's a very good example of how we can create synergies. The underlying idea of Atlas was really to have a common measurement platform rather than everybody rolling their own, and I really like the idea of a community coming to us and saying, hey, can we do something together. Because, as Brook has already said, it will give us more coverage for RIPE Atlas. So there is synergy here.
AUDIENCE SPEAKER: Kaveh, RIPE NCC. Just to follow up on what Daniel said, to have a more concrete plan, my suggestion is that we'll work closely on this and we'll prepare a proposal for the community; we'll send to the Working Group how we can do it, plus the possible questions. After that the ball is in the Working Group's court, and we'll ask the chairs to let us know if there is consensus or not, and after that we'll provide a road map on all this stuff. So basically I think in a few weeks' time we will send an implementation proposal to the Working Group.
AUDIENCE SPEAKER: Thank you for this inspiring idea, and as a first step I would like to point out that maybe we should remember to give the possibility to opt in or opt out, as you presented it on one of the slides before. Not to limit the scope of the study, but there are some particular cases: we may have the Atlas probe in a part of our data centre that is more lab‑like, saturated with SSIDs and so on that we don't like to disclose to the world, so then let us opt out.
BROOK SCHOFIELD: It was not the case that we would scan your SSID, so you would have to do a test to determine whether the SSID was present. Because Atlas is an active, as we were reminded by the Atlas team, it's an active measurement network so we're not going to scan for things on your network in the same way that the Atlas probes don't disclose information about your local network set up.
DANIEL KARRENBERG: Let me say very clearly, we will not turn on the wifi hardware without the consent of the probe host, period. So the probe host will have to say it's okay to turn on the wifi hardware, and even when we turn on the wifi hardware I'm quite sure that the plan will be like Brook ‑‑ I'm quite sure that we're not going to scan or try to sniff around.
BROOK SCHOFIELD: I just have one more comment before we finish. Because the wifi measurements popped up on the road map before I asked for them. So, potentially, someone might have had a different view of what they wanted to get out of wireless measurements. So if you had a different idea, remember to contribute to the discussion on the Working Group mailing list because otherwise, as Daniel said, we're going down a path that maybe you don't want us to, or there might be a parallel effort that we're required. So thank you.
CHAIR: Thanks a lot, Brook.
Next one is Emile and he will talk about open IP map and an update.
EMILE ABEN: Thanks, and this is going to be a short update on open IP map which was presented at earlier RIPE meetings already.
A brief recap of what it is: an open and crowdsourced mapping of IPs and host names to geographic locations. But it's different from other geoIP databases because there is an emphasis on infrastructure and not on eyeballs. The eyeballs are covered by MaxMind or others; they do pretty well on that. They don't do so well on infrastructure, and that's what we're trying to improve. And it is open and crowdsourced, so think of this as the OpenStreetMap or the Wikipedia for IP addresses: you just add extra information to an IP address.
Example: This IP address, if you see the IP address you probably don't know where it is. If you run some of the free geolocation services on this, they say it's in the US. Well, fine. With open IP map there was actually somebody who took the time to look at this in the context of a traceroute, and of course the host name also gives it away. It probably was in a trace, in a context where you could actually see multiple things around London, so it was pretty clear to somebody that this thing was in London. So this is a data entry in open IP map.
And why this is useful is because it allows you to put traceroutes on a map. We have lots of traceroutes in RIPE Atlas, so being able to shrink down on that and just show the interesting ones, the odd ones out, is really interesting, and what's shown below here is actually an example of that. That's two dozen probes in Ireland to a destination in Ireland, and the green dots are the crowdsourced geolocation information in these traceroutes. You can actually see some of the odd traceroutes going out to places where you maybe don't expect them to go.
You can actually click on these in the interactive version and get the actual traceroute out, so you can see if the inferred information is correct, and you can correct it at the same time. And this is also useful if you do lots of traceroutes, for instance, from things in a country to other things in the same country, to see what percentage of paths go outside the country. We have been experimenting a little with that. But I don't have time to go into that.
So, data status. We have crowdsourced data now. We have a total of over 19,000 host name to location mappings that were contributed by an army of 18 dedicated people; those were our volunteers who helped test the service, and many thanks for that.
And this already allows you to put lots of traceroutes on a map and show key locations in these traceroutes. This is the URL for the prototype. I am a chemist by training, and that may show in the prototype, because it might blow up in your face; it's a prototype.
It uses RIPE NCC Access for user authentication. If you use this and you put in an Atlas measurement ID, it will just visualise your traceroutes. I advise you to look at the help, because it might not be as self‑explanatory as I want it to be. On the data part, we also have a collaboration with the IXmaps project on data exchange.
On the code: Open data, open code. The prototype code is on GitHub; that URL is sort of the collection of two sub‑projects. It's the actual open IP map code, in Python, and a provisioning part where you can actually create a development instance of this using Vagrant and VirtualBox. So if you want to play around with it, tinker with it, it imports all the data and you don't have to deal with all the complexities of PostgreSQL and PostGIS and all the extensions and stuff like this.
If you want to contribute: as I said, the current version can blow up in your face. I'm not a UI developer. So, that's where help is really needed, because that will lower the barrier to entry for ‑‑ lower the barrier for entry of information in this system.
So, open data. You can bulk download all this data. What it currently contains is this: a user just makes an attestation, saying, at this time I saw this host name or IP address in a traceroute and I think it's at this location. So there could be multiple entries per user or per host name, so you are on your own to make sense of that data if you do the bulk download. There is also a query API if you just want to have the information for a single IP address that is in open IP map. It has the crowdsourced information listed by time stamp. It also has suggestions, which can be pretty accurate if you are using full city names or CLLI codes in your host names, but they can also be wildly inaccurate. There is also a little programmatic query interface, and that's also on GitHub.
The community status of this is that initially we had this small army of volunteers, and now we have a beta of the RIPE Atlas measurements interface. There was an old measurements interface; there was an e‑mail sent out to the MAT Working Group mailing list, which everybody here has subscribed to, I think. So, there you have the URL for the new beta of the measurements interface. If you do a publicly available traceroute there is actually a tab, as you can see here, that has a link to the open IP map prototype, so you don't have to first ‑‑ this is maybe a more convenient way to get to that data.
There is a mailing list, if you are really interested in this subject and want to discuss the features, project updates. So for instance if you use query interfaces like I showed on the last slide, if these changed that's probably something that's announced here.
Some things we want to focus on, which are not there yet, are RTT‑based constraints and validation. So if you have a source in London and you have an IP address that is 1 millisecond away, you are pretty sure that's in London, right? So, capturing that type of information, maybe doing some slightly more difficult things in triangulation, that's definitely something we want to add. And the challenges: there are incorrect probe locations, and if you have a probe that has a high last mile, or first mile, whichever way you look at it, latency, this won't work.
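As a rough illustration of the RTT‑based constraint described above (not the open IP map implementation), the sketch below turns a round‑trip time into an upper bound on distance and checks a candidate location against it. The propagation speed of two thirds of the speed of light is an assumption for fibre, the function names are hypothetical, and the coordinates are approximate.

```python
import math

SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # in vacuum, kilometres per millisecond
FIBRE_FACTOR = 2.0 / 3.0               # assumption: signals travel at ~2/3 c in fibre
EARTH_RADIUS_KM = 6371.0


def max_distance_km(rtt_ms):
    """Upper bound on the one-way distance implied by a round-trip time."""
    one_way_ms = rtt_ms / 2.0
    return one_way_ms * SPEED_OF_LIGHT_KM_PER_MS * FIBRE_FACTOR


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def location_is_plausible(probe, candidate, rtt_ms):
    """True if the candidate (lat, lon) is within reach of the probe (lat, lon)."""
    return great_circle_km(*probe, *candidate) <= max_distance_km(rtt_ms)


london = (51.5074, -0.1278)     # approximate coordinates
amsterdam = (52.3676, 4.9041)   # approximate coordinates
```

With these assumptions a 1 ms RTT allows at most about 100 km, so `location_is_plausible(london, amsterdam, 1.0)` rejects Amsterdam for a hop 1 ms from a London probe. The bound only rules locations out; a wrong probe location or a slow access link weakens it, as noted above.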
Last thing is the bulk data entry and capturing naming schemes. There are some tools for bulk upload. But we haven't gone down the path yet of capturing naming schemes. For one, there is a project called CAIDA DDec, the URL is there, these people are also on the mailing list, and it is an attempt to create a structure for capturing naming schemes. One challenge here is to actually keep the data quality high. Right now what's in the database is individual users doing stuff on individual IP addresses. So if a single thing is wrong, that's no problem, but here, if you do a matching and stuff, if something is wrong that can sort of ruin your data quality, and we don't want that, so...
And the problem there is that, actually, for the people that do automated provisioning of reverse DNS, the geolocation information in reverse DNS is usually pretty spot on. For the people who do it manually, we see a lot of errors, so that's why this is a bit of a challenge.
So, that's it. I invite people to use the data, help crowd source, subscribe. So, if people have any questions, please let me know.
AUDIENCE SPEAKER: I would like to actually add a question to this, because right now it's basically a project done in part of Emile's free time and, as you saw, with a lot of contributions from the community. The design of the project is that it is based on contributions for the location data, but for development and all that stuff we have basically only used part of Emile's time, because it wasn't a full‑on project. We introduced it in the last two RIPE meetings, and there was some support. But I would like to ask the community, either here or please indicate on the list: if you think this project is useful and you want to see the RIPE NCC putting more development resources into it, please let us know and that's what we can do. Thank you.
CHAIR: Thanks. Apparently someone from the chat, very good.
AUDIENCE SPEAKER: Philip from the RIPE NCC. I have a question from the chat, from Alison Wheeler from the Creative Organisation: It's possible to use the LOC type in the DNS records for a domain, which is user created. Has thought been given to getting users themselves to crowdsource where they are, where a dynamic DNS would quickly identify itself?
EMILE ABEN: For DNS LOC there is stuff in the database that does look ups for DNS LOC records. The graphical interface exposes that. But for like self identification, that's sort of like more on the edge side of things and that's not where this project is focused on. So, I hope that answers the question.
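For readers unfamiliar with DNS LOC records, the sketch below shows how the latitude and longitude in the textual form of a LOC record (RFC 1876) can be turned into decimal degrees. It is an illustration only, not the look‑up code used by open IP map; the function name and the sample record are made up, and the altitude and precision fields are ignored.

```python
def parse_loc(text):
    """Return (latitude, longitude) in decimal degrees from LOC record text.

    Expects the RFC 1876 presentation format, for example
    "51 30 0.000 N 0 7 30.000 W 10.00m"; minutes and seconds are optional.
    """
    tokens = text.split()

    def take_coordinate(hemispheres):
        # Collect degrees [minutes [seconds]] up to the hemisphere letter.
        parts = []
        while tokens and tokens[0].upper() not in hemispheres:
            parts.append(float(tokens.pop(0)))
        if not tokens or not 1 <= len(parts) <= 3:
            raise ValueError("malformed LOC coordinate in %r" % text)
        hemisphere = tokens.pop(0).upper()
        parts += [0.0] * (3 - len(parts))
        degrees = parts[0] + parts[1] / 60.0 + parts[2] / 3600.0
        # South and West are negative in decimal-degree notation.
        return -degrees if hemisphere in ("S", "W") else degrees

    latitude = take_coordinate(("N", "S"))
    longitude = take_coordinate(("E", "W"))
    return latitude, longitude
```

For the hypothetical record above, `parse_loc` yields a latitude of 51.5 and a longitude of about -0.125.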
CHAIR: Thanks a lot. Vesna.
AUDIENCE SPEAKER: I don't know if I missed it, but tomorrow morning there will be a breakfast for the community who wants to join in this and have some more information and more chat. So here in this hotel, eight o'clock, breakfast meeting.
CHAIR: I guess we get another e‑mail to the list which states the local time. Thanks a lot Emile.
What would be an MAT session without Vesna, the community builder? So apparently Vesna has uploaded 40 slides and she has ten minutes.
VESNA MANOJLOVIC: Well, since a picture says more than 1,000 words, it should be enough, since these slides are only pictures this time. I love data visualisations, and I'm a collector and a curator of data visualisations. Unfortunately, I'm not a programmer any more, nor a designer, but I'm also a community builder for measurement tools, so that's why I'm talking to you, to the community, today, on behalf of the programmers and designers in the RIPE Atlas team, to show you, first of all, what the new data visualisations are, to remind you of some old ones, and to invite you to join us, and you'll see in the end how you can do that.
So, we have three types of data visualisations, as I tried to illustrate here: the geographical or map‑based ones, the time‑based visualisations, and finally statistics or infographics.
They will be marked with a little sign on the top, and this is the most famous one. So, just a one‑sentence summary and reminder of what RIPE Atlas is: the largest open active measurements network. This is where the hardware probes are located. The name Atlas was supposed to show that this is going to be a collection of maps. So, on the main results page, you get the collection of maps. And this is one of them. This is the B‑root name server, which is not anycast. So from the colour scheme you can actually see where it is geographically located. And this is also an illustration of how the RTT ping map is going to look for your ping measurement. It's just not going to have all these dots. It's going to be up to 500 dots for now, but since they are so colour‑coordinated, you can immediately see the results of your measurement.
Another map shows the location of RIPE Atlas anchors. We have 80 as of today. Emile just covered this, yet another map powered by RIPE Atlas, not directly RIPE Atlas hosted, and you have all the links in the slides that you can download.
And another one created by another community: a lot of hacker spaces actually host a RIPE Atlas probe and they have made their own map, so if you have a probe in a hacker space, which I know some of you that are here actually do, you can mark it and it's going to appear on this map.
So, that was a sub selection. There is a little bit more in the additional slides.
Now, the coolest new feature is the data streaming. So, since I'm an experienced presenter, I will not go into the live data and the live demos here. So this is just a screen‑shot, but it's actually a beautiful moving picture that shows the DNS measurements as they currently come in. And so, all these little dots, with their different colours and different sizes, each represent the result of one specific measurement, and they are arranged in these rows and illustrate the small multiples principle of data visualisation, popularised by Edward Tufte, and it's a great way to show the streaming data. So this is one of the newest features. Please read more in the article, and we are working on enabling this for other data sets.
A good oldie: the so‑called seismograph, the stacked ping graph. It has been used by other people to show the events in their network and to diagnose problems in their networks, and, again, it's a very powerful visualisation that allows you to show a lot of details and choose from a lot of options.
Another old one, the traceroute, used for IPv6 testing. I have to speed up. This is a prototype. It's something that Emile is also working on. It actually shows, for the peers on an Internet Exchange, whether the paths between them go through the IXP or not, so you can actually optimise your peering: if you find yourself in one of these yellow spots, you know who you have to have a beer with afterwards, and talk about it.
So, some people tweet about their RIPE Atlas visualisations; this one is stolen from Twitter, from a person who is in this room currently, waving. He might tell you more tomorrow in the lightning talks. So it's CloudFlare, and they also love RIPE Atlas.
Researchers are also using it, and this is another work, by people who explored the meddling of the Turkish Government with some DNS services and BGP and so on. You can see that using RIPE Atlas.
So, finally, some statistics and pretty pictures. So this is the growth ‑‑ graph of the RIPE Atlas anchors, and this is somebody else who made their own visualisation of RIPE Atlas anchors, because they were really happy to take part in the project, but I guess they didn't like our logo so they made their own. It's really nice. Thanks.
The poster that you can see hanging here, we call it infographics, and it technically shows the boring statistics in a very pretty way.
And statistics from GitHub. So there is a vibrant community of programmers that are contributing their code to GitHub, so please join them and at the next RIPE meeting your name will be here.
So this, translated from the infographics language, means: please help to add my favourite visualisations, which are based on longer time periods, as in calendar days, and moving pictures. Okay. Here are some of them, and you can find the links to all of them later. For example, my favourite one, with the effect of the earthquake on the sleeping patterns of people in California depending on how far away from the epicentre they were. And then there is the plane finder, the economics data, how awake or asleep the Internet is. So, all of these could actually be examples for the visualisations of RIPE Atlas and I hope you can help in making this happen. So, the exclusive RIPE meeting pre‑announcement is that next year, 2015, early January or March, we are going to organise a data viz hackathon on this, and you can take part in that if you are a programmer, designer, or data scientist: create a little team of people and register with me or through the mailing list, let us know. Or you can join the sponsors; we already have one sponsor who promised that they will provide the monetary prizes for this hackathon. And if you are an operator and don't have time to actually pay somebody to code on the applications of RIPE Atlas data for you, you can help by joining the hackathon and doing it with the international community that is going to share the expertise and their data, of course. All the software is going to be given back to the community, so it's going to be published with free software licences, and data sets are also going to be open. So the main data set will be RIPE Atlas but we will combine it with other open data sets.
CHAIR: Thanks a lot. I think we should start publishing a book or so. Question? Apparently Kaveh has a lot of questions today.
AUDIENCE SPEAKER: It's not a question, actually; it's about one of the visualisations Vesna showed, about the streaming. It's cool. Here, this one that you see: we have nodes in AMS‑IX, they had a planned outage, or port numbering, so the K‑root nodes in AMS‑IX went out of service and then they came back. When they came back we recorded a movie from this new streaming feature, and if we can play it you can see the nodes. You see on top you have the AMS‑IX nodes, it's very hard to read ‑‑ so, small lines start up here, so you see the clients from other nodes are being routed back to AMS‑IX, which is coming up slowly, and this trend continues until the K‑root node in AMS‑IX is operational again. This is just one of the examples of the new Atlas streaming feature, which is available through an API, and we are adding new visualisations on that API.
AUDIENCE SPEAKER: Hi. George Michaelson from APNIC. I recall at the launch discussion of Atlas there being a quite nice image of the view of the world from space, highlighting the way that you can detect continents and economies because of the amount of lights, and a kind of general suggestion that a goal would be to get penetration to a level where the Atlas probes had the same behaviour. And the zoom on the world map that you put up at the start, it wasn't quite there, because of the size of the icon for the UI. But I actually think you have achieved it in Europe. I think you have done it.
VESNA MANOJLOVIC: Thank you. Daniel?
DANIEL KARRENBERG: I think at the next RIPE meeting, we'll do that. I have done actually the same map that Vesna showed with smaller yellow dots, and it almost ‑‑ it almost looks like the original ones and if we wait for the next RIPE meeting, I think it will be even better and I promise to show them then. I just didn't, for other reasons I couldn't complete them.
CHAIR: Thanks. The gentleman in the middle.
AUDIENCE SPEAKER: Thomas King, DE‑CIX. I want to say thank you for the great work. You showed the IXP tool already, I think Emile showed it, and I just want to stress that we are highly interested in this, and it would be really cool if you could open source it so that we can work on that. Thanks.
AUDIENCE SPEAKER: Martin Levy, CloudFlare. I love this stuff. I want to not really address Vesna, I want to address the rest of the audience, because you have just seen a bunch of cool graphs and maps and artistically interesting stuff. If you have got something that looks like you could use it to visualise Atlas data, give that suggestion to the Atlas team, to Vesna and the like. I get my inspiration from seeing really cool stuff out there that really has nothing to do with the Internet, but in some way gives you that wonderful artistic buzz where you say, hey, this is cool. I think that the example of the view from space that George talked about is just one of them. But my comment is just, guys, bring some stuff in, let's see what we can do for the next meeting that looks just visually mind‑blowing from this data. It's good stuff.
CHAIR: Thanks Martin. And thanks Vesna.
So, the next presenter. We decided to do something specific and try ‑‑ that's always fun, because there are so many things which could potentially go wrong. One more point: As we are the MAT Working Group, I had a word with Nick and inspired him to put rating buttons behind the Working Group presentations. So we are the first one, and as we believe in numbers and measurements, here we go. Ideally you would fill out these little survey buttons, which has two benefits. One is for the presenters: they get feedback on what you thought about them, not just from the ones who catch them in the hallway. The other one is actually for the Working Group Chairs. In comparison to the PC or other people, we get some feedback on what you like and what you want to see. Ideally we want to bring presenters and content which you like. So, by you rating it and actually making comments, we have a better idea of what you want to see, and therefore can address that. So please use the little buttons; it's the first time. It's a bit of an experiment, and then we try to address that.
VAIBHAV BAJPAI: Hello. I am a masters student at Jacobs University Bremen, and I'm here to present our work on measurements with Python, which is joint work done by me and my colleagues under the supervision of our professor, Dr. Schönwälder.
Unlike a conventional presentation, we are actually giving a live demo, for which we use the IPython notebook, which is available online on the Working Group page. We have an instruction file which can take you to the IPython notebook, which you can actually run along with me. So, I will basically be talking about some of the Python tools that we found useful to carry out our measurement research. I first tell you how we use the Python tools to hook into the RIPE APIs, get the multi‑dimensional data from them and store the data, and then later use the data for analysis, and then I also show some visualisation techniques that we used.
First, talking about fetching the multidimensional data, we first introduce the Python requests module, which is a tool you can use in the Python environment to get data from various APIs. So we first show you a very simple example. This function takes the URI of a JSON resource along with the query parameters, and in a few lines of code you actually get the JSON data. Here below we show you a simple example, which is completely unrelated to our research. This shows you how you can use the requests module to get the whole JSON. And now I show you how we have used the requests module to hook into the Atlas API. For example, we use a function of a few lines of code that can give you a count of the total Atlas probes. If you see the example, you can see that currently RIPE Atlas has around 11K registered probes. And now, moving on, we extend the previous function to actually show how many connected probes RIPE Atlas has at this moment, and we can see that in the example. It is around 7,000 right now. So these are just a few lines of code that you can actually use. This is just the metadata from the RIPE Atlas API. So now I show you how we used the RIPEstat API using Python; RIPEstat was introduced by Vesna in the tutorial on Monday. We can show how you can use Python programmatically to get the data from the API. The first example is a simple one, in which you just provide an ASN to the function and then it gives you the organisation name, or the holder name. And next we show you another function where you actually give an end point and it gives you the details of the ASN which announces the matching prefix to which the end point belongs. And it takes care of both IPv6 and IPv4 as well.
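The notebook code itself is not reproduced in this transcript. As a minimal sketch of the kind of helper described above, the block below fetches JSON from a URI with query parameters and derives a probe count. It swaps the requests module for the standard library's urllib so that it has no dependencies. The endpoint URI, the `count` field, the `page_size` parameter and the meaning of `status=1` are assumptions about the current RIPE Atlas API, not details taken from the talk, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the probe listing endpoint of the current RIPE Atlas REST API.
ATLAS_PROBES_URI = "https://atlas.ripe.net/api/v2/probes/"


def fetch_json(uri, params=None, opener=urlopen):
    """Fetch a URI with optional query parameters and decode the JSON body.

    `opener` is injectable so the function can be exercised without network access.
    """
    if params:
        uri = uri + "?" + urlencode(params)
    with opener(uri) as response:
        return json.loads(response.read().decode("utf-8"))


def probe_count(connected_only=False, opener=urlopen):
    """Return the number of probes reported by the API."""
    params = {"page_size": 1}    # assumption: only the total is needed, so keep the page small
    if connected_only:
        params["status"] = 1     # assumption: status 1 means "connected"
    return fetch_json(ATLAS_PROBES_URI, params, opener)["count"]

# Example (performs real HTTP requests when uncommented):
# print(probe_count(), probe_count(connected_only=True))
```

Keeping the fetch and the count in separate small functions mirrors the building‑block style described in the talk, where the later helpers extend the earlier ones.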
And this is a small function we have used in our research where we needed to have a consistency in our naming representation of ASNs and their names.
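A consistent ASN label of this kind could be sketched as follows. This is not the speaker's function: the helper names are hypothetical, and the RIPEstat `as-overview` URL and the `data.holder` field are assumptions about the RIPEstat data API. The response is passed in as a dictionary so that the example runs without network access.

```python
import re

# Assumption: RIPEstat data call that returns the holder name for an ASN.
RIPESTAT_AS_OVERVIEW = "https://stat.ripe.net/data/as-overview/data.json"


def normalise_asn(value):
    """Turn 3333, "3333", "as3333" or "AS03333" into the canonical "AS3333"."""
    match = re.fullmatch(r"\s*(?:AS)?(\d+)\s*", str(value), flags=re.IGNORECASE)
    if not match:
        raise ValueError("not an AS number: %r" % (value,))
    return "AS" + str(int(match.group(1)))


def as_overview_url(asn):
    """Build the query URL for the holder look-up of one ASN."""
    return RIPESTAT_AS_OVERVIEW + "?resource=" + normalise_asn(asn)


def asn_label(asn, overview_payload):
    """Combine the canonical ASN and the holder name from a decoded response."""
    holder = overview_payload["data"]["holder"]
    return "{} ({})".format(normalise_asn(asn), holder)


# Made-up response fragment, shaped like the assumed API output.
sample_payload = {"data": {"holder": "EXAMPLE-NET", "resource": "64500"}}
```

With the sample payload, `asn_label("as64500", sample_payload)` gives `AS64500 (EXAMPLE-NET)`, so the same AS always appears under the same label in tables and plots.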
Now, regarding the data storage, we use the Python module pandas, which lets us tabulate the API data. So this is a simple function: we are using the previously mentioned building‑block functions to actually show how we get the API data and tabulate them into a data frame.
So now we show how we can use this data frame to get the data into a database table using a simple function, to_sql. So these are just a few lines of code that get you the RIPE Atlas API data into a database table. Now, to show that the database actually works, we show you the read function, which is also a pandas function, which gives you the data back from the table using a simple query; for example, this is a simple query that we have used.
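The round trip described above can be sketched with pandas and an in‑memory SQLite database. The records, column names, table name and the `store_and_reload` helper are made up for illustration, and the talk does not say which database was used.

```python
import sqlite3

import pandas as pd

# Made-up probe metadata, shaped like rows one might tabulate from an API response.
records = [
    {"probe_id": 1, "asn_v4": 3333, "country": "NL"},
    {"probe_id": 2, "asn_v4": 3333, "country": "NL"},
    {"probe_id": 3, "asn_v4": 64500, "country": "DE"},
]


def store_and_reload(rows, connection, table="probes"):
    """Write rows to a database table and read an aggregate back with SQL."""
    frame = pd.DataFrame(rows)                       # tabulate the records
    frame.to_sql(table, connection, index=False, if_exists="replace")
    query = (
        "SELECT asn_v4, COUNT(*) AS probes FROM {} "
        "GROUP BY asn_v4 ORDER BY probes DESC".format(table)
    )
    return pd.read_sql(query, connection)            # comes back as a DataFrame


connection = sqlite3.connect(":memory:")
per_asn = store_and_reload(records, connection)
```

Here `per_asn` holds one row per ASN with its probe count, which is the shape needed for the ranking step later in the talk.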
Now, I know I'm running out of time, so I have to quickly go through the few modules that we have used. Since we are researching on the network, we also use the ipaddress module, which gives you an IP address object out of the strings that you give, whether IPv6 or IPv4. You get an IP address object which has attributes that tell you, for example, whether it's an RFC 1918 address; and if you give it a string, you get an IP object and it also tells you the version of the IP.
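A short sketch of what the standard library `ipaddress` module offers for this. The helper names are hypothetical. Note that the module's `is_private` attribute covers more than RFC 1918 (it includes, for example, loopback and documentation ranges), so the strict RFC 1918 check is written out explicitly.

```python
import ipaddress

# The three IPv4 blocks reserved for private use by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def describe(address):
    """Parse a string into an address object and report a few attributes."""
    ip = ipaddress.ip_address(address)   # raises ValueError on malformed input
    return {"version": ip.version, "is_private": ip.is_private}


def is_rfc1918(address):
    """True only for IPv4 addresses inside the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in network for network in RFC1918_NETWORKS)
```

For example, `describe("192.168.1.10")` reports version 4 and a private address, while `is_rfc1918("2001:db8::1")` is false even though `is_private` is true for that documentation address.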
Now, we would like to show you some of the visualisation techniques that we have used with the Python matplotlib module. Just for example, we have used all the functions and the pandas data frames and combined all of them together to plot such a graph. What we have done is, we have grouped the RIPE Atlas probes by their respective AS numbers and ranked the ASNs according to the number of probes registered in each, and now we are actually showing the distribution of the probes over the ASNs. On the X axis we actually haven't shown the AS number; it's just the AS rank. We can see that the top AS has more than 350 RIPE Atlas probes connected to it, and we see that it's a long tail distribution: the X axis is in log scale, and we can see the long tail distribution of the RIPE Atlas probes along the ASNs which host the probes.
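The ranking behind such a plot can be sketched with the standard library alone. The function name and the sample data are made up; the actual figure would then plot the probe count against the rank, with a logarithmic X axis as described above.

```python
from collections import Counter


def probes_per_as_rank(probe_asns):
    """Rank ASNs by how many probes they host.

    `probe_asns` holds one ASN per probe; probes without a known ASN (None)
    are skipped. Returns (rank, asn, probe_count) tuples, largest AS first.
    """
    counts = Counter(asn for asn in probe_asns if asn is not None)
    return [
        (rank, asn, count)
        for rank, (asn, count) in enumerate(counts.most_common(), start=1)
    ]


# Made-up example: one entry per probe.
sample = [3333, 64500, 3333, None, 3333, 64500, 64501]
ranking = probes_per_as_rank(sample)
```

Plotting only the rank rather than the AS number keeps the X axis readable when thousands of ASNs host a single probe each, which is what produces the long tail.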
And this work is supported by the Leone project, and I would like to thank the project, my colleagues and my professor for giving me the opportunity to present at such a nice meeting. Thanks a lot.
CHAIR: I hope the time hasn't stressed you too much. We have the good behaviour that we always overrun. So don't worry about the timing part. Any questions and comments on the project and the Python? I found it quite interesting to see that you can do very much with very few lines, assuming you can programme, which I'm not very good at. At least I have some snippets from you which I can use.
SPEAKER: And we would also be happy if you have any suggestions on any Python modules which we can use in our research.
CHAIR: Good. No questions? Well then, thanks a lot.
That brings us to the last agenda point, which is AOB. Any other business, comments, besides people want a coffee and running out? Vesna?
VESNA MANOJLOVIC: The workshop tomorrow: there is a workshop on the advanced topics, with programming and data creation and data analysis with RIPE Atlas. The workshop was originally scheduled for Tuesday but it has been moved to Thursday, from 6 to 7.
CHAIR: Thanks a lot.
One reminder again. Please, if you see the proposal for the Working Group Chair election, dismissal, please comment on it. And for the ones who are interested, well either to stand here next to me, or stand here alone and I can sit, please make yourself known so that we are not ending up with just one Working Group Chair. That's very boring. Thanks a lot and see you next time.
LIVE CAPTIONING BY MARY McKEON RMR, CRR, CBC
DOYLE COURT REPORTERS LTD, DUBLIN, IRELAND.