Plenary session

Tuesday, 4th of November, 2014, at 9 a.m.:

SHANE KERR: If we could take our seats. It's 9 o'clock, we can get started here. Great, thank you. So I am Shane, and this is Benno; we are going to be chairing the session this morning. It should be really interesting. We have got a couple of talks about SDN, which is exciting for me because I could never figure out what it's supposed to be, so I am looking forward to those. But first we have a talk from Grzegorz about data acquisition, and this comes from CERN.

GRZEGORZ JERECZEK: Good morning, thank you for coming here so early. I would like to talk to you about data acquisition networks for large experiments, and by large experiments I mean the Large Hadron Collider at CERN in Geneva. I would like to give you a feeling for what the problems typical for these networks are and how you can deal with them.

So me personally, I am a PhD student working in the ICE-DIP project, which is a cooperation between Intel and CERN, but many other companies are also involved, as you can see from all the others down there. The main idea behind this project is to have industry, research and academia cooperate on next-generation data acquisition systems. And my part of the job is to look for solutions for data acquisition networks.

So I would like to start my presentation with a number, and this is 600 terabits per second. This is the data rate that will be produced by one of the detectors in the LHC, called Atlas. So please forgive me, for these 30 minutes, please forget about the RIPE Atlas; that is a totally different thing. I am talking here about the Atlas detector, which consists of millions of sensors and produces lots of data. Particles collide and they create really a lot of data, which needs to be processed by the data acquisition system.

So, in general, when you talk about data acquisition, all the systems are really simple; the main idea is very simple. You have a kind of sensor, for example when you are reading out a thermometer; you do some kind of analog to digital conversion, some processing, and you save your experimental data somewhere; it can be a sheet of paper or a disk. What is different in the case of the LHC detectors is the size and the number of channels. The detector has literally millions of channels inside, and you need to read these out with a frequency of 40 megahertz. Obviously, you cannot save all this data to storage, so you need to do some tricks: you need to do some filtering and choose only the interesting parts of the data, and only this part of the data will then be saved to storage. So the whole chain becomes very, very complex. There is a general desire to make as many components of this chain as possible out of commodity hardware, because it makes the life of physicists easier, the configuration easier, and the costs go down. So there is a general question: what can be done with commodity hardware in this kind of system? For me, this question is: what can be done with commodity networking equipment in this kind of system? So I am really trying to make TCP/IP and Ethernet friendly to data acquisition systems.

So, when you look now at the data flow of a typical LHC experiment, this is the example of the Atlas detector. We start from the top of the slide with the detector, which is our data source. The first filtering stage, where you throw out some of the data that is not really interesting for you, is done with custom hardware; this is the so-called level 1 filtering stage, done with hardware that is actually made by CERN. The first part which is actually done with commodity hardware is the read-out system. This is a set of commodity servers which are used basically as buffers: you temporarily store all the data that is coming from the custom hardware, and then you push it through the data acquisition network to a huge filtering farm, which consists of a number of servers, where you do second-stage filtering and get rid of the next portion of your data. What is left is then saved to permanent storage.

When you look at the numbers now: in the read-out system you have about 100 servers for buffering the data. And you move this data over the TCP/IP network to the filtering farm, which consists of about 50 racks, which is about 2,000 commodity servers.

Looking at the rates: we start at the detector rate of 40 megahertz, which goes down to about 100 kilohertz after filtering by the custom hardware. What goes to permanent storage is only one kilohertz. A single snapshot, which is produced with this frequency of 40 megahertz, is about two megabytes. The algorithms inside the filtering farm need on average about 50% of this data to make a decision whether this event, this snapshot, is interesting or not, so whether to push it to storage or drop it. When you take the numbers together, you can see that the bandwidth of the data acquisition network is about one terabit per second.
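The numbers quoted above can be sanity-checked with a quick back-of-the-envelope calculation; this is only an illustration of the speaker's figures, not official ATLAS numbers:

```python
# Back-of-the-envelope check of the data acquisition bandwidth quoted in
# the talk: 100 kHz event rate after the level-1 filter, ~2 MB snapshots,
# and the filtering farm reading on average ~50% of each event.
event_rate_hz = 100_000          # events/s surviving the level-1 hardware filter
event_size_bytes = 2 * 1024**2   # ~2 MB per snapshot
fraction_read = 0.5              # farm reads ~50% of each event on average

bandwidth_bps = event_rate_hz * event_size_bytes * fraction_read * 8
print(f"{bandwidth_bps / 1e12:.2f} Tbit/s")  # 0.84 Tbit/s, i.e. "about one terabit per second"
```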

So what is the critical point about networking in this kind of system? The main factor you care about is the latency, and it's not really the absolute value of it; of course, it should be kept in a reasonable range and it should be low, but more critical is the jitter. You want very predictable latency, so that you know how long you need to wait for the data to arrive from the read-out system at your filtering units, and so that you don't waste CPU resources.

So when you now look at the latency as a function of the number of data fragments that you request from these read-out buffers, you can see that in the range from zero to about 500 the latency is really low and really predictable, so you have practically no jitter. But then something strange happens: after this 500 limit the latency starts to increase very rapidly and you have a lot of jitter, and this is what you really don't like. The reason is that the buffers in the switches start to overflow and you start to drop packets; on average your network should work, but it is not able to accommodate any bursts of data.

But in theory, TCP should actually work: TCP has recovery mechanisms to deal with this kind of situation, so when you start losing packets it should adapt the sending rate to work in this particular network. But this is not the case in data acquisition systems, because there is something very specific: the many-to-one communication pattern. When you ask for data from one of the filtering units in your filtering farm, you need to get the response from all of the, let's say here, 100 ROS servers, and what happens when you lose some packets of these flows is that TCP is not able to recover any more, because there is no more data flowing in the network, so TCP cannot trigger any fast retransmissions; it just needs to wait for the 200 millisecond TCP timeout. And your CPU is actually doing nothing in that time; it's just waiting for this 200 millisecond timeout to occur. The throughput of your network is dramatically decreased.

So, in this plot you can now see the fraction of the snapshots, the events, that suffer from at least one TCP timeout during the data acquisition. You can see that after this 500 limit you quickly get to the situation where every single snapshot acquired in your system suffers from at least one TCP timeout, and this makes your system effectively not work at all.

This problem already has its definition and its name: this is the TCP incast problem, and it's already known in the data centre, for example when you have distributed storage systems and you ask for files which are distributed across many servers. There is actually a very simple equation which can give you a feeling for when you will start having problems with your network, when the incast problem will start to occur. You take into account how many bytes you inject into your network, which you can describe as the sum of all the TCP windows of the flows that you have in your network; this is the term on the right side of the equation. If this number is higher than the BDP of the network plus the buffer size that you have, then the problems will probably start to occur. The BDP is the bandwidth-delay product of the network: it gives you the minimum number of bytes that you need to inject into your network in order to keep its links busy all the time, and the excess above the BDP needs to be buffered somewhere. So the general precondition for incast to occur is a very large number of flows which are synchronised and relatively small.
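The precondition described above can be sketched as a one-line check; this is a hedged illustration of the stated inequality, with made-up example numbers, not a model of any real switch:

```python
# Sketch of the incast precondition: incast becomes likely once the bytes
# in flight (the sum of all TCP windows) exceed what the network can
# absorb (bandwidth-delay product plus switch buffering).
def incast_likely(window_bytes, n_flows, bdp_bytes, buffer_bytes):
    """True if the synchronised flows inject more than BDP + buffer can hold."""
    in_flight = n_flows * window_bytes
    return in_flight > bdp_bytes + buffer_bytes

# 100 synchronised flows of one 1460-byte segment against a ~25 KB BDP
# and 50 KB of switch buffer already overflow:
print(incast_likely(1460, 100, 25_000, 50_000))  # True
```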

Now take, as an example, what is really happening in the Atlas data acquisition network: the round-trip time is only 200 microseconds, and the resulting BDP is then about 17 TCP segments. It means that if you consider only one guy in the filtering farm asking for data from all of the 100 read-out buffers, you have a total of 100 TCP flows, and even if each of the flows had a window of only one TCP segment, you would exceed the BDP of your network by a factor of five. All of this excess must be buffered somewhere. And this is only one filtering unit; consider that you have 2,000 of these units in the filtering farm.
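The BDP arithmetic above can be reproduced as follows. The 1 GbE link speed and the 1460-byte MSS are my assumptions (the talk only quotes the round-trip time and the resulting segment count), chosen because they reproduce the quoted ~17 segments:

```python
MSS = 1460                    # bytes per TCP segment (assumed Ethernet MSS)
link_bps = 1e9                # assumed 1 GbE link at the filtering unit
rtt = 200e-6                  # ~200 microsecond round-trip time, as quoted

# Bytes needed in flight to fill the link, expressed in segments:
bdp_segments = link_bps * rtt / 8 / MSS

n_flows = 100                 # one request fans out to ~100 ROS servers
excess = n_flows * 1 / bdp_segments   # one-segment window per flow
print(round(bdp_segments), round(excess, 1))  # 17 segments, ~5.8x the BDP
```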

When you look at the distribution of the latency (I am working with the same data all the time), you would expect a latency of around 16 milliseconds for this amount of data. What you would like to have is a very sharp peak around 16 milliseconds, with the cumulative distribution going straight up to one, so that you don't have any jitter. But what is happening is that you actually have no events that were recorded with this latency: you have the first events at at least 200 milliseconds, then more at 400 milliseconds, and most of the events were acquired with at least 600 milliseconds of latency. This is all caused by the TCP timeouts, and it shows that the default congestion control of TCP is not able to work optimally in this kind of environment.

So, what can you do about this? You need to look again at the simple equation. From it you can see that you have generally three ways to deal with this. First, you can increase the BDP of your network by simply increasing the link speeds, but that is really expensive, and you could instead use that money to increase the average rate of the whole system rather than dealing just with incast.

The second is to increase the buffering capabilities of your network. It's again expensive, but it's one of the ways that is actually being used at the experiments now: normally they buy networking hardware with very large buffers.

And the third one is the term on the right side of the equation: you can keep this global window, the sum of all of the TCP windows in your system, under control. You can do that at the link layer with data centre bridging, at the transport layer with TCP variants designed for incast, or you can design your own solution at the application layer.

So what is done in the ATLAS experiment is so-called traffic shaping at the application layer. It's a very simple algorithm which assigns a pre-defined quota of traffic shaping credits to each of the filtering units in your farm, so each of these guys is not allowed to ask for more data than the number of credits assigned to it. You can then look for the optimum operating range of the number of credits that you should assign to each of the units in the filtering farm.
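The credit mechanism described above can be sketched roughly as follows. This is a minimal toy illustration of the idea (a unit may only have as many outstanding requests as it holds credits), not the actual ATLAS implementation:

```python
# Toy sketch of application-layer traffic shaping with a fixed credit quota
# per filtering unit: requests consume credits, responses return them, and
# a unit that runs out of credits must wait before asking for more data.
class CreditShaper:
    def __init__(self, quota):
        self.credits = quota            # pre-defined quota of shaping credits

    def try_request(self, n_fragments):
        """Issue a request only if enough credits are free."""
        if n_fragments <= self.credits:
            self.credits -= n_fragments
            return True
        return False                    # caller must wait for responses

    def on_response(self, n_fragments):
        self.credits += n_fragments     # credits are freed as data arrives

shaper = CreditShaper(quota=100)
print(shaper.try_request(60))   # True: credits available
print(shaper.try_request(60))   # False: would exceed the quota, blocked
shaper.on_response(60)          # data arrived, credits freed
print(shaper.try_request(60))   # True again
```

The quota plays the role of the per-unit window: the sum of all quotas bounds the bytes in flight, which is exactly the term on the right side of the incast equation.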

On this diagram you can see that from about 100 to 500 credits is the optimum operating range in this network. Above that we are again at the same limit of 500, which is about 500 kilobytes, the packet buffer memory of the switches we have in the data acquisition network. Above this limit we start to have very bad latency with very high jitter.

So now you can take a look at the distribution. With traffic shaping you have the expected distribution: all the events are acquired with a latency of around 16 milliseconds, and the curve goes straight up to one. This is what you really want your system to have.

So you get rid of all the retransmission timeouts in your system.

Another option is to do something at the transport layer. There are many proposals for modified TCP congestion control algorithms, but in the case of data acquisition it can actually be done even more simply. Basically, what I tried to do is turn off the TCP congestion control algorithm altogether: I switched off the slow start phase and so on, and I assigned a static TCP congestion window on the sending side. It means that each of the flows in the network was not allowed to inject more than CWND bytes into the network before receiving the corresponding acknowledgements.

With this kind of approach you can actually make the whole system work optimally. You need to make sure that the sum of all of your windows (in this case a static number: the number of flows in your system multiplied by the static congestion window) is higher than the BDP, so that all the links in your network are busy and it is fully utilised all the time. On the other side, you want to prevent incast and buffer overflow, so you need to keep it lower than the sum of the BDP and the buffer size of your network.
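The operating range stated above (keep the links saturated, but never overflow the switch buffers) amounts to the double inequality BDP <= n_flows * cwnd <= BDP + buffer, which can be written down directly. The numbers in the usage lines are illustrative only:

```python
# The operating window for the static congestion-window approach:
#   BDP  <=  n_flows * cwnd  <=  BDP + buffer
# Below the lower bound the links are underutilised; above the upper bound
# the switch buffers overflow and incast timeouts return.
def cwnd_in_range(cwnd_bytes, n_flows, bdp_bytes, buffer_bytes):
    total = n_flows * cwnd_bytes
    return bdp_bytes <= total <= bdp_bytes + buffer_bytes

# Illustrative numbers: 100 flows, ~25 KB BDP, 500 KB of switch buffer.
print(cwnd_in_range(1460, 100, 25_000, 500_000))    # True: 146 KB in flight
print(cwnd_in_range(14_600, 100, 25_000, 500_000))  # False: 1.46 MB, overflow
```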

And you actually don't need to change any application to do that; it's a very simple loadable kernel module in Linux: you define your value and it works like that.

If you look now at the distribution, this performs pretty much the same as the solution at the application layer, traffic shaping. Both of them perform very well, and in both cases you get rid of the TCP retransmission timeouts.

Until now this all works very well, but I was analysing this from the point of view of a single filtering unit in the filtering farm, so effectively we had only one machine working in the system, because the bottleneck was the top-of-rack switch in the farm. But when you now consider that your farm actually consists of 2,000 machines, you need to make sure that you have enough buffering in the core of your network. In this plot you can see what happens to the latency as a function of the buffer size that you have in the network: you can see that it drops, and eventually, after about 100 kilobytes, you no longer have any retransmissions and your latency is predictable.

What I mean here is that in the case of data acquisition systems, it will always be about buffering, because even with the perfect solution for congestion control you would just move the buffer pressure from the network to the read-out system; you would need more buffering there. So in the end you will always be talking about buffering.

The simplest solution is to buy network hardware with a lot of buffers; if it is able to sustain the burstiness of the traffic, it will work. But it is the least flexible and the most expensive, so the question is whether you can do something about that. Can you change the way you work with these buffers? You want to stop buying the very expensive hardware with a lot of buffers inside, because it's really, really expensive. The path I want to follow in my PhD is answering the question: can you replace the expensive core routers with commodity servers? You can equip commodity servers with multiple network ports, and you can put in as much memory as you wish. This is now becoming possible because, with the SDN and NFV trends, software-based packet processing is being boosted. A very good example here is the Intel DPDK, which allows very fast packet processing on commodity processors in user space. With this kind of approach you could actually design a solution that is really tailored to the needs of data acquisition, so it can provide as much buffering as you want. You can think of many different things that could work in this kind of system.

The initial answer to this question is on this slide. I created a prototype: I have a commodity server with a dual-socket Xeon processor and 12 10-gigabit Ethernet ports, and with a very simple DPDK-based IP forwarding application I was able to reach 120 gigabits per second, the maximum I can reach with this number of ports, and I actually did not have to do any kind of tuning, so this worked pretty well. Especially when you compare that to the performance of the typical kernel bridge in Linux, you can see that with DPDK you get, without any issues, the maximum rate that can be achieved.

So, this was actually my last slide. So thank you very much for your attention and do you have any comments or questions?

SHANE KERR: So I have a quick question. So, you are pushing your data processing into user space, right? Do you think that is eventually going to be a general solution to networking or do you think it's going to stay in the areas of specialised needs like data acquisition?

GRZEGORZ JERECZEK: I am not very good at predicting the future, but from the trends that I observe right now (I do a lot of reading) I can see that there are actually many solutions going in this direction. There are companies doing traffic generation with DPDK, so they are doing networking in user space. There are many projects trying to make software switches, and first of all the SDN trend is boosting this thing, so you can see a lot of network function virtualisation going on with DPDK in user space. I think this trend is really getting popular, and in some years it may become, maybe not the standard, but there is a good chance that it will get real attention.

BENNO OVEREINDER: Hi. Benno Overeinder, NLnet Labs. In the past there has also been some research on elephant flows in networks. How does your work differ from, or relate to, this work on elephant flows?

GRZEGORZ JERECZEK: It's actually different. The thing about data acquisition is that your system is very steady, and it's steady for months, so you deal all the time with the same kind of flows, and because of that you can actually design the system statically, with the static TCP window. My solution would not be good for dealing with incast in a data centre, for example, where you have a mix of flows, elephant flows, mice flows and so on, because with huge buffering capabilities you end up with very high latencies for your small flows, for example. So this is really a different kind of system.

BENNO OVEREINDER: Thank you. Thank you again.

GRZEGORZ JERECZEK: Thank you very much.

BENNO OVEREINDER: So, the next speaker is Giacomo Bernardi, CTO at NGI, an innovative Italian ISP, and he will tell us something about SDN for real: they will use it in their network.

GIACOMO BERNARDI: Hello everyone. Thank you for hosting me. I am from NGI, which you may not know, but it's a fixed wireless access carrier. We are covering at the moment north and central Italy with our fixed wireless solution, offering services up to a gigabit point-to-point and 10 megabits multipoint to 130,000 customers, with quite a dense population in the centre and north of Italy. We are doing quite well: we are growing by about 4% of the customer base per month, and about 5 to 10% in the number of radio towers we install each month, and we have been following this pace for quite a while. When we drew the topology of our network, it looked something like this, so it's quite a mess. It's a very dense, highly meshed topology, and I hope you are all sitting down, because the scary news here is that our network, composed of 1,500 switches, is a pure L2 domain at this stage. And just to make it clear: indeed it is a single broadcast and PVST+ domain. We cannot really blame anyone or anything for this choice, because indeed it wasn't even a choice; it is due to the growth rate we have had so far. We kept installing new towers and adding thousands of customers each month, and we ended up in a situation like this, and now it's really complicated from a network engineering point of view to evolve this network without disrupting services. But it is also very clear to me and my team that if we don't take action soon, we will be in a disaster situation that we won't be able to get out of.

So our network has a couple of very peculiar features. The first is that virtually all of the backhaul you saw is wireless. We have very few, very long-distance fibre links, but all the rest is high-capacity point-to-point wireless links. And because of the high volume of hardware we buy from wireless vendors, and because of the discounts we get on government frequency licensing, the incremental cost for us of introducing a new wireless link is quite low. So we can do what we call opportunistic wireless planning: basically, in our network, many times when two radio towers are in line of sight and reasonably close to each other, we just connect them together, creating a new redundant link. By following this strategy for years we ended up with a mesh factor, a link-to-radio-tower ratio, of around 2.1, meaning each tower has on average 2.1 backhaul links in the network. And with paths of up to 27 hops, we are talking about a Layer 2 switched network. Because of the architecture we currently have, we are still very inefficiently exploiting the redundancy we have in the network; it's very hard, or even impossible, to do load balancing.

So a couple of years ago we summoned the vendors (our legal office asked me not to quote them) and asked them: what are you selling us, what do you want us to buy, how do you want us to evolve the network? Their proposal was OSPF with some sort of network automation to provide redundancy in case of a backhaul link failure. This was the main plan, with a few changes from one vendor to the other. We felt it was really suboptimal in our scenario, because we couldn't exploit multipath load balancing, and indeed in our network, as I showed you before, we have a lot of multipath. Also, it doesn't really scale that much: you could either do it as a single area, but you cannot do that with thousands of routers in OSPF area 0, or split it up, but then we wouldn't be able to provide point-to-point Ethernet services between different regions. So it was a very disappointing moment a year ago, and we got a bit lost, to be honest. My colleagues and I started wondering about networking, about how to engineer things, about the sense of life in the universe. Then we came back to earth and realised that someone else did something similar to what we had in mind: the Open Compute Project, brought forward by Facebook and others in order to do two things. The first is to acknowledge that computing is becoming a commodity; the second is re-engineering computing for specific needs in order to increase efficiency, to design it explicitly for efficiency and stability. And when you read around, you see even smaller start-ups than us that designed their own solutions in their own domains, starting from the roots: from the hardware, going up through the software and the implementation. We started wondering: is that really the way we want to go?
We all love networking in this room, I hope, but we need to acknowledge that many applications of networking are becoming commodity; as the talk before me was also pointing out, commodity is the way forward, at least for many applications.

So after a few unsuccessful attempts we finally managed to partner with an OEM and have this, our own tower router, built for us in a few thousand units. The first thing you notice is that it's blue, but when you open the box it is basically a solution built around our needs. We decided to go with a very exotic CPU architecture, Tile-Gx, whose maker has recently been bought by a company called EzChip. It has very particular features: first of all, 72 cores. It has four DDR memory controllers, so it doesn't create a bottleneck on RAM access, and it has a large amount of L1 and L2 cache on chip. We could wire up some storage, which you will see later how it is used, and quite a large amount of RAM, as I said, using different memory controllers and different tiers of caching on the CPU chip in order to reduce latency. We added a bunch of hardware monitoring on the PCB, and from our early paper analysis we saw that 150 watts was roughly the energy we would need to operate this piece of hardware in the worst scenario; but when we actually performed the testing in our lab, we saw a typical consumption of around 90 watts. This is going to be installed on radio towers, which are often in very remote locations where every watt costs several euros: it costs a lot to provide redundant energy in these locations because of the batteries you have to provide and the energy cost itself.

So the whole box, besides the power specification, is designed around our needs. It is only 40 centimetres deep, so it can fit in any telco shelter. All the cabling is done at the front. It has dual power supplies for redundancy, which can be DC or AC. Even the tiniest things we engineered from scratch: the console cable is USB and not serial, because laptops usually don't have a serial interface any more, so it's a small tweak we could do. For connectivity we have 16 gigabit ports in copper and SFP+. After a couple of iterations, when we finally got our second prototype and it was working well, we brought it to a testing facility (we rented, I believe, one of the best testing facilities in Europe) to do some testing for environmental safety, EMC and radio-frequency immunity. What they did was fire different patterns of electromagnetic energy, at different frequencies and with different characteristics, towards our router, to see whether it would perform well and error-free when installed in radio towers and radio tower shelters, and in the end it performed quite well. We went on for days testing, trying to reproduce every possible condition we find on radio towers in the field, and we managed to ensure error-free operation under EMI up to 180 volts per metre, which is a much higher value than what is required by law, which is around 20 volts per metre. We also put the device into a climate chamber to make sure it worked at temperatures between minus 15 and plus 75 degrees Celsius, and it did. We even had the device and packaging tested against some standards for vibration and packaging safety, so apparently at this stage the hardware is fine. There are a few glitches we have to fix right now, but we have a platform we can work on and develop on.
Regarding the software architecture instead: this box is obviously based on Linux, and we gathered together a number of open software components which create the main blocks of our architecture.

The first is Open vSwitch, which we tested with up to a million flow rules; we will see later what they are used for. Basically, being an SDN solution, we decide the routing on this Layer 2 architecture from a central controller, so each of these boxes will be told how to forward packets to its neighbours. A million rules is much higher than what we will need in the field, but we did it to test how it would break from a scalability point of view. We run standard Quagga for BGP and OSPF, and we provide IP addresses to the CPE at the customer location. Because, as I said, most of our backhaul is wireless-based, we had to implement BFD in order to ensure all the links are bidirectional, because one of the problems we see very often on microwave links is that links become unidirectional due to the failure of one of the two ends, so we had to ensure this was not the case.

And finally, we also ported Node.js to this platform, which we use for management and CLI operations.

So all these boxes are connected to our central SDN controller, and I am sure you are all aware of the real buzz around SDN right now. It's very hard to actually understand what is going on and who to follow, basically, and to be honest we are not finding it easy. We have been experimenting with the two main SDN controller projects so far, which are OpenDaylight and Floodlight, but to be honest, and this is only our personal feeling so far and may change as these projects evolve, we are still questioning their adoption, because in our opinion the real value of an SDN controller is almost inevitably the business logic behind it. The value of the SDN controller is not in the housekeeping, in the protocol serialisation, in the topology management; it is in the application layer, in what you decide the controller should do, how you instruct the controller.

So, we found at this stage that, according to our development workflow, neither of these two controllers would fit in easily and well. The current strategy is to develop a lightweight, very simple and straightforward OpenFlow 1.3 controller in Node.js, which we already use a lot in our company: even the most basic applications, which are taken for granted most of the time in ISPs, like RADIUS, DHCP, syslog, we are actually running as our own implementations, fully integrated with our management system. It took several years to build all this intellectual property and framework, which we believe works very well. So this SDN controller would be a part of that architecture.

I want to give you one example of an application which we believe would create value in this architecture we are designing right now. As I said, we operate an almost completely wireless network, where each link has a very low latency, in the order of hundreds of microseconds per unidirectional crossing. But their capacity varies over time, because on a wireless link, during adverse weather conditions, the link capacity may decrease. Also take another fact: roughly three quarters, about 75%, of the traffic we are carrying to the Internet is delay-unaware, meaning the user experience doesn't depend much on the latency; for example bulk transfers or HTTP transfers, where a small increase in latency doesn't really affect the customer's experience. I am sure this is similar to what you are seeing on your networks as well; it's a global trend.

So, what we can do is modify our controller so that, in case of an unexpected situation because of adverse weather in a specific region, we can route the part of the traffic which is delay-unaware over a slightly longer path, and we have many of these paths because of the network topology and the mesh factor I showed you before. The classification doesn't need to be perfect: during this emergency situation we can take part of the bulk transfers and route them over a slightly longer path, maybe giving 10 or 20 milliseconds more latency for those flows, and in this way avoid saturation using this tweaking of the controller. We did some simulations using our topology and traffic, and we saw that, if we imagine a situation where we don't do any more link upgrades at all, in this hypothetical situation we could live six months without any saturation by the traffic, while keeping within a threshold of acceptable additional latency. So there is value there.
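The controller policy described above can be sketched as a simple decision rule; this is a toy illustration of the idea only (the path names, utilisation threshold and numbers are all invented for the example), not NGI's implementation:

```python
# Toy sketch of delay-aware rerouting: once the primary (shortest) path
# saturates, e.g. because rain reduced a wireless link's capacity, bulk
# (delay-unaware) flows are shifted to a longer detour while
# latency-sensitive flows stay on the primary path.
def pick_path(flow_is_bulk, primary_util, primary, detour, threshold=0.9):
    """Route bulk flows over the detour once the primary path saturates."""
    if flow_is_bulk and primary_util > threshold:
        return detour
    return primary

primary = ["A", "B", "C"]        # low-latency path, degraded by bad weather
detour = ["A", "D", "E", "C"]    # 10-20 ms longer, but with free capacity

print(pick_path(True, 0.95, primary, detour))   # bulk flow: rerouted
print(pick_path(False, 0.95, primary, detour))  # interactive flow: stays
print(pick_path(True, 0.50, primary, detour))   # no congestion: stays
```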

Our core is not at this stage SDN‑aware and I am not even sure it will be any time soon; it's not really a priority right now. So we have a core network between three different data centres in Italy and we have a bunch of thousands of towers all over the place. So the first idea is to basically segment, regionalise this network into OSPF areas, run plain OSPF within each area and connect back to the core. This will be our safety net, because you have to keep in mind this is a wireless network, so there is no side channel, there is no out‑of‑band. We can connect only using the same wireless links we use for carrying traffic, and, our implementation being by definition extremely experimental, we don't want to screw things up; we want to be able to recover easily. So management will always go through standard OSPF to each of these nodes. But on top, each node is basically reporting back to the central controller who its neighbours are, so that the controller can have quasi‑realtime knowledge of the network topology: how routers are connected, what the links between them are and their capacity. Once the controller has all of this information, we can provision OpenFlow rules to each of the switches to match an MPLS label and decide which output port to use, creating a very simple L2 forwarding mechanism based on the MPLS label. The advantage of using this architecture is that we still have a safety net underneath of traditional L3 OSPF in order to reach management and repair what is wrong. There is a problem with interfacing this with the legacy core. As I said, the core doesn't know anything about the SDN or the controllers, and we don't want to change that at this stage, we don't want to add too much. 
So what we do is create custom label discovery protocol ‑‑ sorry, LDP ‑‑ messages on the ABR routers, with the controller telling the ABR router which prefixes to announce and which MPLS label to announce back to the core. It's a bit of a hack: basically we have a new SDN network integrating with the central core by using fake or forged messages created by a local process running on the ABR routers, which is instructed by the central controller. An advantage of this topology or architecture is that when we have a link like the one on the bottom, which goes across two different regions, with traditional L3 routing we wouldn't, by definition, be able to send traffic directly between routers in different areas; but because user traffic is following the MPLS OpenFlow rules, we can reroute traffic across links between different regions which are not following the underlying L3 topology.
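The label-based L2 forwarding described above can be sketched roughly as follows. This is a toy model with invented rule and port numbers, not the actual controller or an OpenFlow library: each node holds rules matching an incoming MPLS label and mapping it to an output port (optionally swapping the label), and anything unmatched falls through to the OSPF safety net.

```python
# Toy model of the controller-provisioned forwarding: match MPLS label,
# pick an output port, optionally swap the label. Labels and ports below
# are made up for illustration; a real deployment would install these
# rules on the switches via OpenFlow, not keep them in a Python dict.

class LabelSwitch:
    def __init__(self):
        self.rules = {}  # incoming mpls_label -> (out_port, outgoing_label)

    def install_rule(self, label, out_port, new_label=None):
        # The controller computes paths from its topology view and
        # installs one such rule per (node, label) hop.
        out_label = new_label if new_label is not None else label
        self.rules[label] = (out_port, out_label)

    def forward(self, label):
        """Return (out_port, label) for a labelled packet, or None,
        meaning: no SDN rule, fall back to plain L3/OSPF forwarding."""
        return self.rules.get(label)

sw = LabelSwitch()
sw.install_rule(label=100, out_port=3)                 # transit, keep label
sw.install_rule(label=200, out_port=1, new_label=201)  # swap along the path
```

The fallback return of `None` is the key design point from the talk: unmatched traffic (including management) always has the traditional OSPF path underneath.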

So the reason why I wanted to tell you about our experience over the last couple of years and about our architecture we are implementing is that it's for three reasons, basically:

The first is for my ego. The second is to urge you to think about commodity in networking. The way we did it could be followed also by different networks in different scenarios, maybe not by Internet service providers but by carriers or content delivery networks. We are not the only ones; we are backed up by other, larger projects, like Open Compute in a different domain area, trying to push people towards reengineering their own networks. So, I don't have any proposition in mind right now. I am not planning to sell these machines; I think it would be a disaster if we did so. But if you feel you are in a somewhat similar situation to us, please do get in touch. Maybe we can share experiences or tackle the problem together. So thank you.


BENNO OVEREINDER: Thank you. Any questions?

DANIEL KARRENBERG: Always obliging. RIPE NCC. I love this talk. This is the stuff that made RIPE interesting ten years ago. It's doing ‑‑ looking at stuff in a new way. I like it, thank you.


DANIEL KARRENBERG: I have a pretty simple question. Can you give a ballpark figure on what these boxes cost you? Ballpark, you know, no trade secrets.

GIACOMO BERNARDI: We can talk in detail outside. But let's say they are comparable to a metro ethernet switch.

BENNO OVEREINDER: I had a similar question, but just before that: when you decided to do this, did you do some risk calculation? You were thinking, well, we cannot buy something we would really like to have, so did you just start the journey, the endeavour, and boldly go where no man has gone before?

GIACOMO BERNARDI: It's a good question. I strongly believe that the problem many operators, even ourselves, are facing right now is a cultural problem, so you don't want to take risk yourself as a network ‑‑ sorry, head of network engineering. Why would you want to risk it? There are major vendors producing things which may just not be adapted to your very peculiar network. You know, no one wants to reinvent the wheel; we had a very peculiar bicycle so we wanted a peculiar wheel. I think it's a cultural obstacle. In our case, it's a big start‑up, it's like 190 people but it's mostly made of ‑‑ so this obstacle is something we can tackle, we can overcome. I think we should all rethink whether it is the right moment, or the moment is about to come, for this solution to be applicable generally.


AUDIENCE SPEAKER: Martin Winter, NetDEF/OpenSource Routing. I am wondering, did you do any performance measurement on the forwarding, and do you use anything like zero‑overhead Linux or the mPIPE for the line‑rate stuff?

GIACOMO BERNARDI: Yes. As you know, Tilera has a very cool feature which is mPIPE; it's a small piece of ASIC on chip which does inspection of the first, I think, 256 bytes of each packet, if I remember correctly. We do use this, and we do exploit the large amount of caching that is on chip in several applications, though currently not in Open vSwitch, so that is plain vanilla, just ported to Tilera; but we do use it for other applications such as current ‑‑

AUDIENCE SPEAKER: Did you do any performance measurements? I noticed you have a small interface for the 72‑core Tilera?

GIACOMO BERNARDI: Sorry, when I mentioned that we tested up to a million flows, what I meant is that with eight 10 gig interfaces we were doing line rate with a million flows.


SHANE KERR: So I had one final question ‑‑ well, a couple of final questions, maybe. So, first question is: How replicable do you think this work is? Do you think any organisation with 190 geeks could do it?

GIACOMO BERNARDI: Only the best ones. No, I am joking. You know, one killer feature in our network is the mesh factor, so it really creates value, I believe, for this application. If we were a traditional DSL or incumbent‑based carrier there probably wouldn't be such great value in reinventing the wheel. So, generally speaking, I don't think it is applicable everywhere, to be honest, but we should question whether what we are doing right now, adapting our networks to what is existing in the market, is the best way forward.

SHANE KERR: OK. That is fair. And my final thing was: You mentioned delay‑unaware network usage and I was wondering what kind of validation you did on that, because it occurred to me when you were talking about http flows that nowadays people are adopting all kinds of things and dropping them on top of http, so unless you apply some sort of heuristics, you may not actually know that an http transfer actually is delay‑tolerant, let's say.

GIACOMO BERNARDI: Yes, this is what I meant by very simple classification. At this stage we don't ‑‑ although I believe the computing capability is there on chip, what we want to do is very plain: treat as delay‑unaware, during an emergency situation, all the traffic that is coming from the local CDN cache, so a very high‑level classification. And if you misclassify, you should tweak it towards false negatives, so that you only classify as low priority what you are really, really sure is low priority.

SHANE KERR: Makes sense. And I guess the goal is sort of like having a little bit of delay is better than the packets just not getting through.


DANIEL KARRENBERG: My age is showing, I forgot one. You mentioned, when you described the hardware, oh, we put SSDs in it and I will tell you later what we use them for. Did I miss ‑‑

GIACOMO BERNARDI: I may not have mentioned it, but what I meant is, for all that software architecture, it's a standard Linux ‑‑ on that SSD there is plenty of space, it's 128 gigabytes, so it takes a quarter of that space, I think.

SHANE KERR: OK, thank you very much.



SHANE KERR: Our next talk is by Shawn Morris from NTT Communications.

SHAWN MORRIS: I manage the IP development group ‑‑ as you can see, Job enthusiastically waving at you. So I run the IP development group for the global IP network.

I want to talk to you this morning about some of the stuff that we are doing. I think it falls under the rubric of SDN, but it's a lot different than what was just talked about. What I am going to talk about today ‑‑ and this part is the most important ‑‑ is really programmatic control of the network. It's something that we are strong believers in and it has become the lifeblood of our network.

So, who are we? We are AS2914; we used to be Verio back in the day. We were acquired by NTT in 2000. So we are a wholesale IP transit network; that is basically all that we do. We don't offer retail services. We have about 150 IBGP nodes in our network, about half of which are core nodes running RSVP‑TE. We are running 14 metro DWDM systems in the US and Europe. We offer pseudowire Ethernet services. Most of what we do is 10 gig, or really at this point n‑by‑10 gig, and we are in a bunch of places.

So what is it? We call it GUMS; that is a terrible acronym but we came up with it a long time ago. It's fully automated network operation, so everything that happens on the network is driven through this system. People are not getting in and configuring things by hand. It's all home‑grown, so this was an organic, engineering‑driven effort. This isn't something where we went to Tokyo and sought several million dollars of funding or something to start up a project. It really just started as engineers looking for ways to automate their work and make things easier. It started all the way back in the late '90s; I started with Verio in July '99 and the initial version of this was already in use. We now have four full‑time developers plus other people within the development organisation contributing. And I say it's almost a full SDN; to me, when I say full SDN, I mean now putting an API layer on top of this. The best example is we have a separate OSS system that our customer engineers use. Today, they have to do a lot of double entry of data, both in that system and then moving it over manually into this system; the goal is to put an API layer on top of our controller so the OSS can talk directly into that. We have about 200 other devices ‑‑ this number is probably larger now ‑‑ switches, out‑of‑band routers, routers for SLA measurements and then DWDM devices as well.

So why do we do it? As the gentleman from TeleGeography talked about yesterday, IP transit prices, they just keep going down and we like to make money. So the way you can make money in this market is to keep your costs under control.

Then there is minimising peer review. I have talked to a lot of other operators; they spend a lot of time peer reviewing changes. We do it as well, but what we do is peer review when templates and other parts of the code are generated, and once that is done we can trust an installation engineer to turn up a new peering session or new links in the network and we know that they will be configured the way we expect them to be.

So that in turn leads to lower staffing requirements. It gives us a very high quality of service; I think if you talk to most of our customers they are pretty happy. We have literally changed out every edge‑facing port in the network at least twice in the last four years and there have been almost positively zero configuration errors in that, so you are talking about thousands and thousands of move/add/change activities. It's consistent. If you buy a 10 gig port from us in Hong Kong or Tokyo or London, it doesn't matter; you know that your port will be configured the same way.

And it leads to very fast move/add/change activity, so it's not uncommon for us to move 50‑plus ports in a three‑hour window with no outages. And it gives us extensive network visibility, and it's all in an SQL database. So if you want to know how many pseudowires you have configured with an MTU greater than 1,500, it's an SQL query. If you want to know how many 10 gig E customers are con ‑‑ all of that stuff is in the database.

And this just gives you a visualisation too. So this is our network in 2004. This is it today. And all that growth was managed with a very minimal increase in staffing. And that wouldn't have been possible without this system.

So how do we do it? It's database‑driven. The network is modelled in the database. You could blow up every device we have in the network today, we would ship new gear in and regenerate the configs and there it would be. Data from that database is taken and transformed through platform‑specific templates, and that puts out an entire device configuration, every single thing we want to configure on the box. The NOC may go into a device in the course of troubleshooting and change some settings, but at some point the configuration will be pushed back from the server, so everything that lives on the server is canonical. We are using brute force, because you have to remember this was started in the late '90s, really about the time JUNOS came out: we generate the entire config and we ship it to the device and tell the device to sort it out.
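The database-to-config pipeline described here can be illustrated in miniature. This is a hedged sketch only: the table layout, template text and SQLite back end are invented for the example (the real system uses Postgres, M4 macros and GNU make), but the shape is the same: the database is the source of truth, and the full device config is regenerated from it on demand.

```python
# Minimal sketch of database-driven config generation: model the network
# as rows, then regenerate an entire device config through a template.
# Schema and template syntax are illustrative assumptions only.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE iface (router TEXT, name TEXT, ip TEXT, mtu INT)")
db.executemany("INSERT INTO iface VALUES (?,?,?,?)", [
    ("r1", "ge-0/0/0", "192.0.2.1/31", 9192),
    ("r1", "ge-0/0/1", "192.0.2.3/31", 9192),
])

# Platform-specific template (here: one invented flat-CLI flavour).
IFACE_TMPL = "interface {name}\n ip address {ip}\n mtu {mtu}\n"

def build_config(router):
    """Regenerate the full (interface-only, in this toy) config for one
    router purely from the database -- nothing is read off the box."""
    rows = db.execute(
        "SELECT name, ip, mtu FROM iface WHERE router = ?", (router,))
    return "".join(IFACE_TMPL.format(name=n, ip=i, mtu=m) for n, i, m in rows)

cfg = build_config("r1")
```

Because every line of `cfg` is derived from rows, "blow up every device and regenerate" reduces to re-running `build_config` for each router.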

It's built on pretty much open technology: Postgres, GNU make; it uses M4 ‑‑ some of you may be familiar with M4, a horrible, terrible language that I don't recommend to anybody, but the reality is that it is a macro language, so it suits our purposes fairly well. We have a custom binary that includes a customised M4 processor; that is what takes the collection of templates and the data, puts them together and spits out a configuration. There is a plain text file for each router in CVS, so if there is something else we need to do that is not covered in the current code in the tools, we can get into the plain text file and manually put in configuration. We have custom scripts built on RANCID, and we still run RANCID for historical archive purposes. You can go all the way back to, I think, '96 or '97 and see every configuration that ever existed in the network.

So what is in the templates: just your standard device parameters, AAA config, SNMP, all of the interface parameters that you can configure; it's all specified in the templates. Routing policy, which is the vast, vast majority of the templates ‑‑ as a wholesale IP provider, you know, the biggest differentiator we offer is in our BGP policy ‑‑ and it can include version‑dependent options, so we can put branching in there: if a new feature didn't arrive until JUNOS 12, it can say, OK, only enable this option if the router is running this revision or later.

As far as what is in the database, pretty much everything else. All your SNMP communities; there is a table full of every ASN that we peer with, whether customer or peer. ACLs ‑‑ we have support for both standard and raw extended ACLs. All of our out‑of‑band ports are configured through this system, and then when you get into the interfaces, just all your IP addressing, v4 and v6, whether you want to turn MAC accounting on; SONET configuration is there, all the MPLS configuration is driven out of this system and of course the routing.

So what requirements do we have for the devices? The biggest one: you have to be able to SSH into it. The ability for the device to go out to our configuration server and retrieve the file via FTP. And then commit and rollback, which I think everybody is familiar with. Gerard coined this term "roll forward"; it's commit/replace: I am going to ship the configuration, you figure it out ‑‑ add what has been added, remove what has been removed and change what has been changed, in the least disruptive way possible.
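The roll-forward idea can be shown with a tiny delta computation. This is an illustrative sketch, not device firmware: real routers diff hierarchical configs and sequence the changes to minimise disruption, but the essence is "ship the whole target, let the box derive adds and removes".

```python
# "Roll forward" (commit/replace) in miniature: given the running config
# and the full target config shipped from the server, compute the minimal
# set of lines to remove and to add. Flat line-oriented configs are an
# assumption of this sketch; real platforms diff a hierarchy.

def roll_forward(running, target):
    """Return (to_remove, to_add) to turn `running` into `target`."""
    run, tgt = set(running.splitlines()), set(target.splitlines())
    to_remove = sorted(run - tgt)  # on the box but absent from the target
    to_add = sorted(tgt - run)     # in the target but missing on the box
    return to_remove, to_add

running = "snmp community old\nntp server 192.0.2.10\n"
target  = "snmp community new\nntp server 192.0.2.10\n"
removed, added = roll_forward(running, target)
```

Note that the unchanged `ntp server` line appears in neither list ‑‑ that is what makes commit/replace non-disruptive compared with wiping and reloading the whole config.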

If it's lacking the above ‑‑ when we have IOS devices, which at this point are really just some switches and out‑of‑band routers, we always replace the start‑up configs. We don't deploy devices like that any more because they have become too hard to manage: they require somebody to cut and paste from a configuration on the server, which is way too error‑prone.

So this is what the plain text file looks like. You can see it's really just a bunch of macro calls. So, you know, up here you are just setting up the device parameters, what the platform is ‑‑ we have been doing XR since it was still called HFR. The main reason I show you this is to show you what it used to look like. So this is what it used to look like way back in the day. The way this system evolved, when I first started working on it, it was really like typing an entire router config into a text file. What we did over time is we moved things into macros, so eventually this became a macro and the IP addressing became a macro, and things slowly moved over time into the database. And one of the reasons I like to highlight this is to show that you can take an incremental approach; it's not an either/or proposition, where one day you are doing nothing and the next day you have some fully automated management system.

So the workflow: a user enters a database change via the web UI. It could also be done via raw SQL, which we generally don't use unless it's some kind of special situation. The user initiates the config build via a make command on the configuration server. That generates a config file, and then the user runs a command called load config ‑‑ and then it commits the config. In the case of JUNOS it does a commit check, makes sure it's syntactically correct and reports back any errors to the operator.

So what can you do with this once you have it? Automatic BGP prefix and max‑prefix updates; a lot of people are doing that without systems as in‑depth as this, and it's obviously one of the big applications. And if you go in and make a manual configuration change, the life of that is probably at most about 24 hours, until our daily ACL changes happen. When we were using IOS we had a nice tool where, if you were configuring a new peer or customer, you could spit out an entire configuration file with all the relevant config for that customer and it would just be applied via copy mem or copy running‑config, which was nice; that eliminated a lot of the cutting and pasting problems that we had in the past. Mass update of RSVP‑TE LSPs: we don't use auto‑bandwidth because we have had this system, and also our network at this point is fairly stable from an LSP point of view, but if we need to make changes we can just do it in the database layer and quickly push it out to the devices.

Bulk moves of interfaces and sub‑interfaces: we can move an entire card from one router to another. It doesn't matter which platform is on either end; it will figure out all the relevant config bits, so we can move from IOS to IOS XR, we can move things from XR to JUNOS, it doesn't matter. The system takes care of all the configuration; you don't need to retrain everybody. You can seed your other systems with data ‑‑ so we are pretty good at billing, which is nice, and one of the reasons is that the information comes out of here automatically and into the billing system. We can seed our NMS with this data, so we don't have somebody adding a device to the network and forgetting to add it to the NMS; we never have these problems where things don't end up in our systems, because they are all being driven out of the database and they need to be in the database to be added to the network. The last one is the one we are starting to explore now, and I think it's where a lot of the power of the promise of SDN comes in. It's a simple interface for configuring complex services. The first thing that we did was, we have some places in the network where we are using small Ethernet switches for aggregating sub‑10 gig connections up to a single 10 gig trunk. We created an interface where you can go in and, in one step, configure the client port on the switch, the VLAN on the trunk and the sub‑interface on the router. We have tools now where you can configure a LAG of any size in one interface; you want to add 16 members, it will make sure all the settings are consistent, it will configure both ends so you can make sure your MTUs are consistent, your circuit IDs match up. That is really where we are starting to stretch this now, as kind of the service layer.
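The "simple interface for complex services" pattern can be sketched as one function that derives all the coupled pieces of config from a single request. Everything below is an illustrative assumption ‑‑ the device names, CLI flavour and function name are invented ‑‑ but it shows why the approach prevents drift: the switch port, trunk VLAN and router sub-interface can never disagree because they come from one source.

```python
# Sketch of one-step service provisioning: derive the switch client port,
# trunk VLAN and router sub-interface config from a single request, so
# VLAN IDs and MTUs stay consistent across all three devices by
# construction. CLI syntax and interface names are invented.

def provision_customer(vlan, switch_port, router_if, mtu=1500):
    """Return the per-device config stanzas for one aggregated customer."""
    return {
        "switch": (f"interface {switch_port}\n"
                   f" switchport access vlan {vlan}\n"
                   f" mtu {mtu}\n"),
        "trunk":  f"trunk allowed vlan add {vlan}\n",
        "router": (f"interface {router_if}.{vlan}\n"
                   f" encapsulation dot1q {vlan}\n"
                   f" mtu {mtu}\n"),
    }

cfg = provision_customer(vlan=120, switch_port="ge-1/0/5",
                         router_if="xe-0/1/0")
```

The same shape extends to the LAG example from the talk: one request fans out into N member-port stanzas on both ends, all guaranteed to share the same MTU and circuit IDs.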

So we are doing optical SDN as well. This is where we are provisioning 10 gig waves in a metro DWDM system. We are using Cyan's Planet Operate. It's more of a traditional SDN model where there is just an API that we are talking to, provisioning only at the service layer. Unfortunately, I cannot do a kind of A‑to‑Z configuration of the box programmatically. In our experience so far with the optical vendors, it's really a mess. A lot of them are focusing on getting a good service layer, but as far as bringing a device into the network for the first time, there is a lot of pain there, and, yeah, the less said, the better.

And right now we don't have a tie between the IP and the optical service layers and the tools; in the data model, there is no tie. But we have hooks there so we can do that down the road. We are waiting on some internal development by another group to really flesh that out.

So what are the challenges for this? We need better support for concurrent operations. When the system was designed, it was designed kind of the best way we knew how, but the problem is, today, if you build a configuration, it may not build the entire network but it locks all of the configurations while you are building. As you can imagine, that can be problematic when you start to operate a network of our size. Even down to the device level: now we are deploying ASR 9922s that have hundreds of 10 gig ports, so it's pretty easy to envision a situation where you have got two operators trying to configure the same box at the same time. The brute‑force configuration management has its limitations: you essentially have to touch the entire device config at once, and if you want to touch different services on the same box at the same time, you run into concurrency problems.
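The concurrency problem can be shown in miniature. This sketch is not the actual GUMS locking scheme ‑‑ it is a hypothetical illustration of the usual remedy: replace one global build lock, which serialises every operator, with per-device locks, so two people working on different boxes proceed in parallel and only work on the same box is serialised.

```python
# Illustrative fix for the "one big lock" problem: a lock per device
# instead of a lock over the whole configuration tree. Finer-grained
# (per-service) scoping would follow the same pattern.

import threading

device_locks = {}                 # device name -> its build lock
registry_lock = threading.Lock()  # protects the dict itself

def lock_for(device):
    with registry_lock:
        return device_locks.setdefault(device, threading.Lock())

def build_config(device, results):
    # Only builds for the SAME device contend with each other.
    with lock_for(device):
        results.append(device)

results = []
threads = [threading.Thread(target=build_config, args=(d, results))
           for d in ("r1", "r2", "r1")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off, as noted in the talk, is that brute-force whole-config generation fights against this: as long as every build touches the entire device config, per-service locking on one box is not possible.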

Most of the big vendors' programmatic solutions are not ready; in particular, Cisco seems to have had a number of start‑and‑stop efforts of different things, and most of them are very focused on service provisioning. So if you tell them you want to configure the console port programmatically, most of the vendors are not ready to do that, and we want to make sure the entire box is configured programmatically: it's very important to us that our services are configured consistently, but it's also very important to us that the device security and everything else is configured consistently.

So, that is great for me: I work for this gigantic company and probably have unlimited resources, and you don't. So what about you? The first question, of course, and this is the big one: does SDN have a place in carrier networks? I say yes, but I am not talking about OpenFlow or PCE or all this separation of the control plane and the data plane. Maybe that has a place, but for us the routing protocols work fine. You know, we are not in a hurry to get rid of MPLS or iBGP or BGP; they certainly have their shortcomings, but none of them would really be solved by trying to separate that out. So what we are operating is really kind of what you would call a hybrid SDN model. It's not full SDN yet, like I said ‑‑ what I consider full SDN ‑‑ but there is still a huge amount of value. I don't know how we would operate the network without it, to be honest. And I think one thing that was said that I agree with 100 percent: this thing is coming down the track whether you like it or not. And so, obviously, it's much easier to get on board now rather than wait until it hits you. The other thing, which I showed you with the slide of the two text files: you can do this incrementally, right? You can start biting off small chunks today and finding places in your network where you can automate. You don't have to go from zero to 100. I think for a lot of people it's very intimidating; first of all, you have to try to figure out, I think Shane said this, like, what is SDN? And the reality is that it can be whatever you want it to be. Every vendor has an SDN story because everything runs software. But you don't have to go from zero to 100. There are ways you can chunk it up and get moving today.

So what do you need to know? There are no magic bullets; there is nothing you are going to go out and buy that will give you SDN. Maybe if you are doing a greenfield deployment you can deploy something, but you are always going to have custom development work. Otherwise you get into the PeopleSoft problem. I think most of us have dealt with PeopleSoft at some point or another, and the problem with PeopleSoft is you spend a ton of money and you either conform your business practices to the way PeopleSoft works, or you spend a ton more money to customise PeopleSoft to work the way that you want it to. You don't want to end up doing that with an SDN solution: buying something and changing the way you have to run your network to fit it. And it also requires a cultural shift. You have to buy in. I like to say I want to hire lazy engineers; I don't want people working for me that enjoy doing the same rote tasks over and over again. I don't care to find the guy who is the hot CLI jockey. To be honest, I hope the days of the CCIE and everything die soon. I would much rather see people focusing on solutions, but that is a big cultural shift for a lot of places and I think it's going to be very interesting to see how some of the larger carriers adapt to that.

So should build or buy? I say probably both. There is no right answer, of course. And by buy, I also mean using OpenSource components and things of that nature.

So you can obviously take stuff off‑the‑shelf and integrate it with your custom code. Either way, you are going to have development costs, because you are going to have to integrate it with everything else in your network, so you are going to staff for that or outsource it, and the reality is it's very hard to find people who can work on this stuff. And if you go home‑grown, obviously that gives you the ultimate in flexibility: you are not locked into a vendor, you don't have any external dependencies ‑‑ oh, well, I need to deploy JUNOS 13.3, but the SDN vendor is only qualified up to 13.1 ‑‑ you don't run into those problems. But if you are going to do it home‑grown, you have to have people in house who can do it.

If you go off‑the‑shelf, there is obviously a bigger potential to run into this PeopleSoft problem. Then there are the impacts on your organisational and operational cultures. I gave a similar talk to this a few years ago at NANOG and a guy came up to me: this is great, that is what we believe in, but our manager said no, because if you do that you will automate us out of a job. Which is ridiculous, because I have yet to see that ‑‑ if you can get these stupid day‑to‑day rote tasks off your plate, you can move on to bigger and better things, hopefully. You do need a tight integration between your network and development staffs for the best results, maybe even a DevOps approach. You know, of the software developers who work for me, three of the four were network operators before they switched to kind of full‑time software development, and the fourth one was a system administrator. Obviously, you are going to become very dependent on your systems for operating your network, so having a good relationship with your system administration team is crucial. It's something, to be honest, that has been a bit of a pain point in the past for us; it's gotten a lot better but we are still dealing with some of that baggage. Things still need to be fixed when they break. The reality with our network is, I could take my wife, who is a very smart woman, a math teacher, but doesn't know anything about networking ‑‑ I could teach her how to turn up a router on the network and get iBGP up and running on it. But the problem is, once it breaks she would have no idea how to fix it, and you can get the same problem with your network operators. Some of the functions become so easy, and they break so rarely, to be honest, that when they do break, sometimes people have a hard time re‑engaging their brain and figuring out how things work. 
In our network there are multiple layers of abstraction: we have two MPLS layers, for pseudowires and traffic engineering, and then down to BGP and iBGP, so you get all these multiple layers. People might only have to really dig into it once every two years, and so when it breaks, it's very important that you make sure they can still remember how to dig in and fix that.

And it's also very important that the operators understand how the tools work. We have issues that come up in the network and it's important for the operator to be able to distinguish, is this a problem with the tool or with the device or is it some other network problem. If they don't understand how the tool works, they are just going to end up sending a bunch of needless escalations up to your tools team which is going to do nothing but lengthen your outages.

Impacts on your equipment selection: obviously, integration with your SDN toolset becomes the most important thing. Some vendors, willingly or unwillingly, will pull themselves out of the pool of gear you can deploy. If you go with off‑the‑shelf components, like I said, that can further narrow your choices, but the upside is, bringing new platforms in can be a lot easier. When we brought in IOS XR it was a minimal amount of training, because for the engineers, 95% of their work is driven by the tools, so they don't need to learn RPL or all of those nuances about the differences in the platforms ‑‑ show commands and those kinds of things, yes, but they didn't have to learn an entire new configuration language, so it makes bringing in new equipment a lot easier.

So there are some pitfalls. Supporting hacks ‑‑ I think hacks are bad overall, but they can really become land mines. There is one outage in particular I can think of in our network, probably the only one that was our fault in the last five years, and that was because of a hack that had been put in for a large internal customer; it sat there for roughly five years and basically everybody forgot about it. We deployed a policy change and it conflicted with the hack and it wasn't caught. If you start deploying these hacks around the network, they become very hard to integrate with your testing to make sure you are capturing them. Your costs can spiral out of control: you need that development expertise, and that can become very expensive, especially if you are outsourcing it. If you do put an API layer on top of your network, you give people very destructive access to your network. And when you are building this kind of system ‑‑ there have been points in the past where we have had to go back and do some fairly major changes to the data model to move forward and do the things we want to do. You have to be very careful, when you are designing things, to make sure you don't paint yourself into those corners.

Finally, in conclusion, like I said, this stuff, in one way or another, is coming. So better now than later. There are more and more tools every day that are coming out, especially in the OpenSource community, that allow you to do this kind of stuff. The last point is the most important: you can do this stuff. There are things you can get out and do today. So please try.

AUDIENCE SPEAKER: Leslie Carr. First off, it's super sweet that you started doing network automation before even a lot of systems folks were doing systems automation. But a question: Have you considered moving to one of the common OpenSource tools? Because I think you were saying you have a problem finding people with expertise, and I feel like a lot of your normal sysadmins who use these traditional OpenSource tools, like Puppet or Chef, would be 95% of the way there.

SHAWN MORRIS: I think if I was starting from scratch today that is definitely one of the directions I would look at, for exactly that reason. We now have our system that we have built, and this also goes back briefly to what was said earlier: everything ends up being specific, everybody has their own quirks about what they have done, and there are certain things we want to do that it seems we need to do the way we have done them. But if I was starting from scratch today, I think those are probably the places I would start.

JEN LINKOVA: Thank you very much, very interesting talk. Thank you for telling us about your approach for doing this. I have two questions. The first one is quite specific: How do you deal with the problem of ensuring that the changes you are pushing to the network are not actually destructive?

SHAWN MORRIS: To be honest, for the most part, the tools are structured so that you can't do anything really stupid, right? Well, you have to go out of your way. If you want to re‑import BGP into your iBGP you have got to really pull the pin and stuff the grenade in your pants. There are not a lot of safeguards, though. You get a diff that shows you what is going to change. For the most part we do try to hire smart operators, so hopefully they will actually look at the diffs before they push things. But there is not a tremendous amount of control over what gets pushed into the network, as long as it's coming out of the tools.
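The review‑the‑diff safeguard described here can be sketched with nothing but the standard library: generate the candidate config from the tools, diff it against the running config, and show that to the operator before pushing. The configs, addresses, and AS numbers below are invented for illustration.

```python
import difflib

def config_diff(running: str, candidate: str) -> str:
    """Return a unified diff between the running config and the
    candidate config generated by the tooling."""
    return "".join(
        difflib.unified_diff(
            running.splitlines(keepends=True),
            candidate.splitlines(keepends=True),
            fromfile="running-config",
            tofile="candidate-config",
        )
    )

# Invented example configs: the candidate adds one BGP neighbor.
running = "router bgp 65000\n neighbor 192.0.2.1 remote-as 65001\n"
candidate = (
    "router bgp 65000\n"
    " neighbor 192.0.2.1 remote-as 65001\n"
    " neighbor 192.0.2.2 remote-as 65002\n"
)

# The operator reviews this output before the push goes ahead.
print(config_diff(running, candidate))
```

The point is not the diff algorithm but the workflow: the human sees exactly what will change, even though the config itself was machine‑generated.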

JEN LINKOVA: Thank you. The second one: So you mentioned this important problem, but sorry if I missed your solution. How do you deal with the problem that people might lose track of what is configured on the routers? Because they are not touching the network directly, they might go in once a year to troubleshoot some problem and not understand what you have configured there.

SHAWN MORRIS: To be honest, we don't have a tremendous solution for it. It's really a matter of stressing to the engineers that they need to pay attention. We have discussed different things, like setting up labs and breaking things and making people try to find what is broken, that kind of thing. I don't have a great answer, but like I said, it's something you need to be cognisant of.

AUDIENCE SPEAKER: I was wondering, when you run this configuration, do you push at specific times, so, for example, when you know you are making changes, or is it a regular thing like some of the other configuration tools, like Puppet?

SHAWN MORRIS: We have both. If somebody is making a change themselves they push it when they are ready, and we have automated changes that push on their own, like all the ACL loads.

AUDIENCE SPEAKER: The reason for asking about the automated changes was: what happens if, for example, you have an issue, something breaks in the network, potentially unrelated, and you need to stop making changes so that you can debug, or apply a temporary fix that you later reintegrate back into the configuration?

SHAWN MORRIS: So in that case ‑‑ we are part of NTT, which is gigantic, but the entire business unit is relatively small, so it's pretty easy to put out a stop message to the relevant groups and say, hey, this is what we are working on. If we absolutely have to, we can lock the configuration tree to keep people from changing anything. I think there have been one or two cases where we have had problems with the build system, for example, where we didn't want people making changes, and so we just locked the build system.
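The configuration‑tree lock mentioned here could, at its simplest, be an advisory lock that the push tooling checks before doing anything. This is a hypothetical sketch, not NTT's actual mechanism (which isn't described in detail); the lock‑file path and function names are invented.

```python
import os
import tempfile

# Hypothetical lock-file location; a real system would use a shared path
# visible to all operators and tools.
LOCK_FILE = os.path.join(tempfile.gettempdir(), "config-tree.lock")

def lock_config_tree(reason: str) -> None:
    """Take the advisory lock; push tooling refuses to run while it exists."""
    # O_EXCL makes creation atomic, so two operators cannot both take
    # the lock: the second attempt raises FileExistsError.
    fd = os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        f.write(reason + "\n")

def config_tree_locked() -> bool:
    """True while an incident lock is in place."""
    return os.path.exists(LOCK_FILE)

def unlock_config_tree() -> None:
    """Release the lock once the incident is resolved."""
    os.remove(LOCK_FILE)
```

During an incident an operator would call `lock_config_tree("debugging outage X")`, and every push path would check `config_tree_locked()` first.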

AUDIENCE SPEAKER: OK. Thank you very much.

RANDY BUSH: IIJ. I am somewhat amused to hear that what we used to call configuration management has been relabelled SDN. And the piece that I'm missing is the network model, the high‑level understanding and description of the network that devolves down to the M4 ‑‑ and I know the idiot who did the M4 first. And what sets off the loudest alarm is when you said there was a hack, and you pushed down a change from your system and it broke the hack. If you had an architectural model, if you had stuff that was transforming that into configuration ‑‑ I think you know where I am going ‑‑ how did that happen structurally in the architecture of the system?

SHAWN MORRIS: Well, we allowed it to happen. I mean, that is the reality of it.

RANDY BUSH: No, but what I mean is: how did a hack get in there, if your architectural model did not accept hacks?

SHAWN MORRIS: Well, it does accept hacks, unfortunately, but you worked here so you know what it's like. The reality was that we had ‑‑

RANDY BUSH: We won't get into how much of it is my fault.

SHAWN MORRIS: Yes. So the reality was that we had a large internal customer that had, let's say, a demand for a certain change and ‑‑

RANDY BUSH: No, no I don't mean ‑‑

SHAWN MORRIS: This becomes a policy thing, right? What we didn't do back then is say, OK, if we are going to do this, then it should be fully integrated back into the templates.

RANDY BUSH: OK. So there was no complete flow down to that.

SHAWN MORRIS: Correct. So after that outage we have gone back and said we are not doing this any more, essentially. If something like this needs to be done, it needs to go back and be fully integrated.
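"Fully integrated back into the templates" means the customer‑specific exception lives in the data model, so every later policy change is rendered against it rather than colliding with a hand‑edit on the router. A minimal sketch, with invented names, addresses, and policy labels:

```python
from string import Template

# Hypothetical per-neighbor template; a real system would use a richer
# templating engine and data model.
NEIGHBOR_TEMPLATE = Template(
    "neighbor $ip remote-as $asn\n"
    "neighbor $ip route-map $policy in\n"
)

neighbors = [
    {"ip": "192.0.2.1", "asn": "65001", "policy": "STANDARD-IN"},
    # The former "hack": now an explicit, reviewable exception in the
    # data model instead of an out-of-band edit everybody forgets about.
    {"ip": "192.0.2.2", "asn": "65002", "policy": "BIGCUSTOMER-IN"},
]

config = "".join(NEIGHBOR_TEMPLATE.substitute(n) for n in neighbors)
print(config)
```

Because the exception is rendered from the same source as everything else, a later policy change either incorporates it or fails visibly at build time, instead of silently conflicting in the network.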

RANDY BUSH: Torn down and brought from the top?


RANDY BUSH: Can you say anything about the architecture of the network as seen by the top level of the tool? When I want to describe changing the network somehow, what kind of things do I talk about? I mean, I hope it's not ‑‑ you know, I hope it's high level. That is what I am hoping for from SDN and I keep having trouble finding it.

SHAWN MORRIS: To be honest, today this tool is largely still built around touching a lot of individual pieces, and we are trying to start to build our own abstraction layer on top of that, which I think is what you are asking me. We can pull the data out and put it into Cariden MATE, for instance, to visualise the network, but we are in the process of building our own tools for doing our own visualisation and abstractions, especially at the service layer. And by the way, I am liberally applying the SDN buzzword to configuration management.

AUDIENCE SPEAKER: Christian Petrasch, DENIC. You told us you have several changes to the ACLs during the day. Do you have an automatic testing environment for network configurations?

SHAWN MORRIS: So obviously any time we make changes in the code ‑‑ roughly, we push quarterly changes, we do kind of quarterly versioning, and we are working on doing more continuous integration as we commit things. Right now, we tend to do our testing around those quarterly builds. But for the most part, once the code is committed, we have had very few issues where something breaks within a release cycle.
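The pre‑release testing described here could, at its simplest, be a set of sanity checks run over the rendered configs before a quarterly build goes out. The rules below are invented stand‑ins for whatever a real pipeline would verify:

```python
import re

def check_config(config: str) -> list[str]:
    """Return a list of problems found in a rendered config.

    These two rules are illustrative only: every neighbor must have a
    remote-as, and a BGP stanza must be present at all.
    """
    problems = []
    neighbors = set(re.findall(r"neighbor (\S+)", config))
    with_asn = set(re.findall(r"neighbor (\S+) remote-as", config))
    for ip in sorted(neighbors - with_asn):
        problems.append(f"neighbor {ip} has no remote-as")
    if "router bgp" not in config:
        problems.append("no BGP stanza")
    return problems

# Invented examples: one config passes, one trips a rule.
good = "router bgp 65000\nneighbor 192.0.2.1 remote-as 65001\n"
bad = "router bgp 65000\nneighbor 192.0.2.1\n"
```

A CI job would run such checks on every commit; a failing build never reaches the network, which is where the "very few issues within a release cycle" comes from.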

SHANE KERR: OK. So I had to send people away from the mics, there were a lot of questions. So thank you very much, I think this was a really great talk.

And the reason I sent people away is because we are a bit over time. Thank you all for staying. One quick announcement: we have a winner from yesterday from rating the talks. Dana Slaffing has won an Amazon voucher. We will see you all back here in 25 minutes or so, thank you.