This post covers the planning behind the ThousandEyes lab used in this series. The posts in the series are listed below.
- Part 1 – The What and the Why
- Part 2 – Lab build
- Part 3 – Enterprise and Endpoint Agent Installs
- Part 4.1 – SNMP Monitoring
- Part 4.2 – Scenarios and Test Types
- Part 4.3.1 – Scenario 1 – Enterprise agent to agent test configuration
- Part 4.3.2 – Scenario 2 – Enterprise DNS test configuration
- Part 4.3.3 – Scenario 3 – Enterprise and Endpoint HTTP test configuration (Coming soon)
- Part 4.3.4 – Scenario 4 – Enterprise Page Load test configuration (Coming less soon)
- Part 4.3.5 – Scenario 5 – Enterprise Transaction test configuration and Endpoint Agent Browser Settings (Coming more less soon)
- Part – 4.4+ – Details TBD
There are some behind-the-scenes posts that go into more detail on how and why I took the approach that I did. Those can be found here:
- Behind the Scenes – The Lab Build <–You Are Here
- Ok, there’s only one so far, but I plan to add more where it makes sense.
If you are following the series, this post is strictly informational. It won’t contain any steps that need to be performed in the lab. The goal is to provide insight into why I made the design choices I did with the lab.
The details on the lab build can be found here: https://www.mytechgnome.com/2022/04/thousandeyes-walkthrough-part-2-lab.html
CML
- There are plenty of similar tools available (GNS3, EVE-NG, etc.), so why did I pick the paid tool? The simple answer is licensing. My understanding is that CML is the only way to run virtual Cisco instances without running afoul of the EULA. Yes, I could have used non-Cisco routers, but since Cisco is a major vendor it seemed reasonable to go with it.
- The Personal version of CML comes in two flavors: Personal, which allows 20 active nodes, and Personal Plus, which allows 40 active nodes. I built the lab using 20 nodes because Personal Plus costs an extra $150, and because the additional nodes would increase the resource requirements. I wanted the lab to be as accessible as possible. It could easily be extended to 40 nodes or more, but 20 is enough to get basic testing done.
- Even though the TE agents could be deployed to VMs, I wanted to use CML as a way to easily simulate scenarios where an engineer would need to do some troubleshooting. Within CML, links can be configured with bandwidth limits, latency, jitter, and loss. The theory is that ThousandEyes should be able to detect and even alert on those conditions.
- I am using version 2.2.3, even though version 2.3 is available. The simple reason is that Cisco is still recommending version 2.2.3. There are some known issues with 2.3, which is why I’m not running that.
IOSv Routers
- Even though CML can run CSR 1000v and IOS-XR instances, I decided to go with IOSv instances because of resource requirements. The CSR 1000v and IOS-XR instances each require 3GB RAM, and with 14 routers that would consume an additional 35GB RAM over what the IOSv routers use. For the purposes of the lab, IOSv can do everything needed without the overhead.
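The RAM figures above can be checked with a quick calculation. This is only a sketch: the 3GB figure and the 14 routers come from this post, while the 0.5GB per IOSv node is an assumption chosen because it makes the 35GB difference work out. Check the node definitions in your own CML install.

```python
# Rough RAM comparison for the lab routers.
ROUTER_COUNT = 14        # from the post
HEAVY_IMAGE_GB = 3.0     # CSR 1000v / IOS-XR, from the post
IOSV_GB = 0.5            # ASSUMPTION: not stated in the post


def extra_ram_gb(routers: int, heavy_gb: float, light_gb: float) -> float:
    """Additional RAM needed if every router used the heavier image."""
    return routers * (heavy_gb - light_gb)


if __name__ == "__main__":
    extra = extra_ram_gb(ROUTER_COUNT, HEAVY_IMAGE_GB, IOSV_GB)
    print(f"Extra RAM for {ROUTER_COUNT} routers: {extra} GB")
```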
Ubuntu
- I wanted to keep as much of the lab in CML as possible, and running Ubuntu in CML aligns with that goal. Of the Linux flavors that are available out of the box in CML, Ubuntu is the only one supported by ThousandEyes.
- With Ubuntu being used in the CML lab it seemed reasonable to use Ubuntu for the Docker host as well.
Topology
- I’ll admit I spent a lot of time working through different topology options. At one point I had switches and HSRP in the design, but I decided to back away from layer 2 technologies to focus on layer 3. The primary use case for ThousandEyes is looking at WAN links, and with the node limit in CML, it made sense to drop the L2 configurations to make room for more L3 devices.
- I wanted to maximize the number of BGP autonomous systems while maintaining multiple links, which is why there are 7 of them. By simply shutting down specific links, traffic could be made to cross 6 of the 7 AS networks. With some BGP reconfiguration that could be extended.
- The two “Client” networks are intended to be what a network engineer would have in their environment. Likely they’d have a lot more, but with the node limits having two networks is enough to test with. Each of the client networks has two Ubuntu nodes that are running the TE Enterprise agent. One of the Ubuntu nodes is also running Apache. (more on Apache shortly)
- In the “Public” network I wanted to add another BGP path outside the redundant ISP paths, and I wanted a service that was accessible. With this being treated as public I opted to not run a TE agent there.
- Access outside of the CML environment is done via the “External” network. ThousandEyes is a SaaS platform, which means the agents all need to be able to connect to the TE portal.
- Even though the entire network is built using RFC 1918 addresses, the design effectively treats them as public addresses throughout the lab. The “Client” addresses are propagated through the ISP and public networks, which isn’t typical in IPv4 deployments. This was mainly a choice of simplicity and efficiency. If the client networks were hidden behind NAT, then something like a VPN would be required to link the two client networks. Though that better aligns with the real world, for the functional purposes of the lab it makes no difference. Both ends need IP reachability, and adding more NAT and VPN configuration work doesn’t provide a significant improvement in how the lab operates.
External Routing and NAT
- On the external router, NAT is configured, which should allow internet access from the lab with no additional configuration needed. The 192.168.1.0/24 network is excluded from translation with the intent that devices on the LAN (Docker, Windows, and Raspberry Pi agents) would be able to connect directly to devices in the CML lab.
- For the LAN devices to reach the CML lab, routes need to be added either on the LAN router or as static routes on each of the devices. Using the LAN router requires the fewest changes and is the most extensible.
- Unfortunately not every environment is identical. I suspect that there may be some issues with getting the routing working properly. I spent a lot of time trying to decide if this routing solution was better than just using DHCP on the external router and doing full outbound NAT. I decided that having the external agents able to have full connectivity to the internal agents was worth the added complexity.
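To make the NAT exclusion and the static route option a little more concrete, here is a minimal Python sketch. It assumes the exclusion is matched on the destination address. The 192.168.1.0/24 LAN comes from this post, but the lab prefix and the external router's LAN address below are hypothetical placeholders, so substitute the values from your own environment.

```python
import ipaddress

# From the post: traffic to the LAN is excluded from translation.
LAN = ipaddress.ip_network("192.168.1.0/24")

# HYPOTHETICAL values for illustration only; they are not from the post.
LAB_PREFIXES = [ipaddress.ip_network("10.0.0.0/8")]
EXTERNAL_ROUTER_LAN_IP = ipaddress.ip_address("192.168.1.254")


def is_translated(destination: str) -> bool:
    """Return True if lab traffic to this destination would be NATed."""
    return ipaddress.ip_address(destination) not in LAN


def linux_static_routes(prefixes, next_hop):
    """Build `ip route add` commands a Linux LAN host could use to reach the lab."""
    return [f"ip route add {prefix} via {next_hop}" for prefix in prefixes]


if __name__ == "__main__":
    for dest in ("192.168.1.50", "8.8.8.8"):
        print(dest, "translated" if is_translated(dest) else "not translated")
    for command in linux_static_routes(LAB_PREFIXES, EXTERNAL_ROUTER_LAN_IP):
        print(command)
```

Adding one route on the LAN router instead of running these commands on every device is what makes that option the least work.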
Services
- The Apache instances were set up just to create a simple webserver to establish HTTP connections. For transaction tests, I will be using external websites.
- Bind is deployed primarily for easy name resolution of the lab devices, and to have another service running inside the lab. Since ThousandEyes can do DNS tests it made sense to include.
External Resources
- The Docker, Windows, and Raspberry Pi agents are primarily just to provide the ability to test with those platforms. The Docker and Pi agents are functionally similar to the Ubuntu agents running in the CML lab. The Windows agent is an Endpoint agent, which brings a different set of functionality.
- I do expect that there will be improvements in test performance with these agents versus the ones in CML because there are fewer layers of abstraction. I can’t imagine that an Ubuntu agent running on a minimum-spec VM inside KVM, which is itself running on the CML VM inside Workstation, is going to be the most efficient. Add in the software layers for the routers connecting those agents, and that only adds more potential performance impact.
- As mentioned previously, internet access is required for ThousandEyes agents to reach the SaaS platform. With that requirement in mind, it made sense to just use external websites for most of the testing instead of building elaborate web servers inside the lab.
Misc. Notes
- Everyone has their preferred numbering scheme. For this lab, I tried to come up with something that I could easily build on in a programmatic sense. Yes, for the router links I could have used /30 or /31, but in a lab, I’m not worried about address consumption. I built addresses based on the nodes being connected.
- I’m sure someone somewhere will be upset that I don’t have passwords on the routers. It’s a lab that I tear down frequently, and it’s inside a trusted network. The risk of an attack is minimal, and worth it to not need to log in to each device.
- The Ubuntu server version was the latest at the time of writing, and I went with Windows 10 to avoid some of the issues with getting Windows 11 deployed.
- With the complexity of the build in CML, I decided it was easiest to just publish the YAML code. Initially, I had intended to write up exactly how to build the lab, and provide configs for each device, but as I built it out it became clear that doing so would be quite cumbersome. Using the YAML file should give more consistent deployments, with less manual work to get the lab running.
- I’ve had several requests to incorporate AWS into this lab. Currently, that’s outside the scope of the roadmap I have for this series, primarily because of the cost associated with AWS. Once I get through the posts I have planned, I intend to investigate whether I can leverage the AWS free tier to get useful data.
- Despite most of the routers being in provider networks, each router has SNMP running. The reason I did this was to show how ThousandEyes can use SNMP to add additional context to data, and in some cases, it can be used to trigger alarms. In a real-world scenario you likely can’t get SNMP from provider networks, but you also likely have more than two network devices at a location. The decrease in realism is more than made up for by not having to build out a complete LAN environment.
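On the numbering note at the top of this list, the sketch below shows what building link addresses from node numbers might look like. The scheme itself (10.low.high.0/24, with each node taking its own number as the host address) is purely hypothetical and is not necessarily the one used in the lab YAML. It also shows how many addresses the /31, /30, and /24 options would consume.

```python
import ipaddress


def link_addresses(node_a: int, node_b: int):
    """HYPOTHETICAL scheme: the link between two nodes uses 10.<low>.<high>.0/24,
    and each node uses its own number as the host part."""
    low, high = sorted((node_a, node_b))
    if not 0 < low < high < 255:
        raise ValueError("node numbers must be distinct and between 1 and 254")
    network = ipaddress.ip_network(f"10.{low}.{high}.0/24")
    return network, {low: network[low], high: network[high]}


def addresses_in(prefix_length: int) -> int:
    """Total addresses in a subnet of the given prefix length."""
    return ipaddress.ip_network(f"10.0.0.0/{prefix_length}").num_addresses


if __name__ == "__main__":
    network, hosts = link_addresses(7, 3)
    print(network, hosts)
    for length in (31, 30, 24):
        print(f"/{length}: {addresses_in(length)} addresses")
```

A /24 per link wastes most of its addresses, but in a lab the payoff is that an address tells you at a glance which two nodes it sits between.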