
ThousandEyes Walkthrough Behind the Scenes – The Lab Build

This post covers the planning behind the ThousandEyes lab used in this series. Links to the earlier posts in the series are included below.

If you are following the series, this post is strictly informational.  It won’t contain any steps that need to be performed in the lab.  The goal is to provide insight into why I made the design choices I did with the lab.

The details on the lab build can be found here: https://www.mytechgnome.com/2022/04/thousandeyes-walkthrough-part-2-lab.html

And here’s an overview of the objective of this series: https://www.mytechgnome.com/2022/03/thousandeyes-walkthrough-part-1-what.html

CML

  • There are plenty of similar tools available (GNS3, EVE-NG, etc.), so why did I pick the paid tool? The simple answer is licensing. My understanding is that CML is the only way to run virtual Cisco instances without running afoul of the EULA. Yes, I could have used non-Cisco routers, but since Cisco is a major vendor it seemed reasonable to go with it.
  • CML Personal comes in two flavors: Personal, which allows 20 active nodes, and Personal Plus, which allows 40. I built the lab using 20 nodes because Personal Plus costs an extra $150 and the additional nodes would increase the resource requirements. I wanted the lab to be as accessible as possible. It could easily be extended to 40 nodes or more, but 20 is enough to get basic testing done.
  • Even though the TE agents could be deployed to VMs, I wanted to use CML as a way to easily simulate scenarios where an engineer would need to do some troubleshooting. Within CML, links can be configured with bandwidth limits, latency, jitter, and loss. The theory is that ThousandEyes should be able to detect, and even alert on, those conditions.
  • I am using version 2.2.3 even though version 2.3 is available, simply because Cisco still recommends 2.2.3 and there are some known issues with 2.3.

IOSv Routers

  • Even though CML can run CSR 1000v and IOS-XR instances, I decided to go with IOSv because of resource requirements. The CSR 1000v and IOS-XR images each require 3GB of RAM, and with 14 routers that would consume an additional 35GB of RAM over what the IOSv routers use. For the purposes of the lab, IOSv can do everything needed without the overhead.

Ubuntu

  • I wanted to keep as much of the lab in CML as possible, and running Ubuntu in CML aligns with that goal.  Of the Linux flavors that are available out of the box in CML, Ubuntu is the only one supported by ThousandEyes.
  • With Ubuntu being used in the CML lab it seemed reasonable to use Ubuntu for the Docker host as well.

Topology

  • I’ll admit I spent a lot of time working through different topology options.  At one point I had switches and HSRP in the design, but I decided to back away from layer 2 technologies to focus on layer 3.  The primary use case for ThousandEyes is looking at WAN links, and with the node limit in CML, it made sense to drop the L2 configurations to make room for more L3 devices.
  • I wanted to maximize the number of BGP autonomous systems while maintaining multiple links, which is why there are 7 BGP AS configurations. By simply shutting down specific links, traffic can be steered through 6 of the 7 AS networks, and with some BGP reconfiguration that could be extended (see the sketch after this list).
  • The two “Client” networks are intended to be what a network engineer would have in their environment.  Likely they’d have a lot more, but with the node limits having two networks is enough to test with.  Each of the client networks has two Ubuntu nodes that are running the TE Enterprise agent.  One of the Ubuntu nodes is also running Apache.  (more on Apache shortly)
  • In the “Public” network I wanted to add another BGP path outside the redundant ISP paths, and I wanted a service that was accessible.  With this being treated as public I opted to not run a TE agent there.
  • Access outside of the CML environment is done via the “External” network.  ThousandEyes is a SaaS service, which means the agents all need to be able to connect to the TE portal.
  • Even though the entire network is built using RFC 1918 addresses, the design effectively treats them as public addresses throughout the lab. The “Client” addresses are propagated through the ISP and public networks, which isn’t typical of IPv4 deployments. This was mainly a choice of simplicity and efficiency. If the client networks were masked, then something like a VPN would be required to link the two client networks. Though that better aligns with the real world, for the functional purposes of the lab it makes no difference: both ends need IP reachability, and adding more NAT and VPN configuration doesn’t provide a significant improvement in how the lab operates.
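
To illustrate the BGP pattern mentioned above, here is a minimal sketch of one provider router peering with two neighboring autonomous systems. The AS numbers, interface names, and addresses are made-up examples rather than the lab's actual values; the published YAML has the real configuration.

! Example provider router in AS 65002 with two eBGP peers
router bgp 65002
 bgp log-neighbor-changes
 neighbor 10.0.12.1 remote-as 65001
 neighbor 10.0.23.3 remote-as 65003
 network 10.0.12.0 mask 255.255.255.0
 network 10.0.23.0 mask 255.255.255.0
!
! Shutting one link forces traffic onto the remaining path, so it
! traverses a different set of AS networks:
interface GigabitEthernet0/2
 shutdown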

External Routing and NAT

  • On the external router, NAT is configured, which should allow internet access from the lab with no additional configuration needed. The 192.168.1.0/24 network is excluded from translation so that devices on the LAN (the Docker, Windows, and Raspberry Pi agents) can connect directly to devices in the CML lab. (A rough sketch of this configuration follows this list.)
  • For the LAN devices to reach the CML lab, routes need to be added either on the LAN router or as static routes on each device. Using the LAN router requires the fewest changes and is the most extensible.
  • Unfortunately, not every environment is identical, and I suspect there may be some issues with getting the routing working properly. I spent a lot of time deciding whether this routing solution was better than just using DHCP on the external router and doing full outbound NAT. I decided that giving the external agents full connectivity to the internal agents was worth the added complexity.
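
To make the NAT and routing described above more concrete, here is a rough sketch of what the external router could look like. The interface names, addresses, and the 10.0.0.0/8 lab summary prefix are assumptions for illustration only; the published YAML is the authoritative configuration.

! Don't translate lab traffic destined for the home LAN; translate
! everything else so the lab can reach the internet
ip access-list extended LAB-NAT
 deny   ip any 192.168.1.0 0.0.0.255
 permit ip any any
!
interface GigabitEthernet0/0
 description To the home LAN / internet
 ip nat outside
!
interface GigabitEthernet0/1
 description To the CML lab
 ip nat inside
!
ip nat inside source list LAB-NAT interface GigabitEthernet0/0 overload
!
! On the LAN router (or as a static route on each LAN device), point the
! lab prefix at the external router's LAN address, for example:
ip route 10.0.0.0 255.0.0.0 192.168.1.250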

Services

  • The Apache instances were set up just to create a simple web server for establishing HTTP connections. For transaction tests, I will be using external websites.
  • BIND is deployed primarily for easy name resolution of the lab devices, and to have another service running inside the lab. Since ThousandEyes can run DNS tests, it made sense to include one.

External Resources

  • The Docker, Windows, and Raspberry Pi agents are primarily just to provide the ability to test with those platforms.  The Docker and Pi agents are functionally similar to the Ubuntu agents running in the CML lab.  The Windows agent is an Endpoint agent, which brings a different set of functionality.  
  • I do expect there will be improvements in test performance with these agents versus the ones in CML, because there are fewer layers of abstraction. I can’t imagine that an Ubuntu agent running on a minimum-spec VM inside KVM, which itself runs on the CML VM inside Workstation, is going to be the most efficient. Add in the software layers of the routers connecting those agents, and there is even more potential for performance impact.
  • As mentioned previously, internet access is required for ThousandEyes agents to reach the SaaS platform.  With that requirement in mind, it made sense to just use external websites for most of the testing instead of building elaborate web servers inside the lab.

Misc. Notes

  • Everyone has their preferred numbering scheme.  For this lab, I tried to come up with something that I could easily build on in a programmatic sense.  Yes, for the router links I could have used /30 or /31, but in a lab, I’m not worried about address consumption.  I built addresses based on the nodes being connected.
  • I’m sure someone somewhere will be upset that I don’t have passwords on the routers. It’s a lab that I tear down frequently, and it sits inside a trusted network. The risk of an attack is minimal, and the convenience of not having to log in to each device is worth it.
  • The Ubuntu server version was the latest at the time of writing, and I went with Windows 10 to avoid some of the issues with getting Windows 11 deployed.
  • With the complexity of the build in CML, I decided it was easiest to just publish the YAML code.  Initially, I had intended to write up exactly how to build the lab, and provide configs for each device, but as I built it out it became clear that doing so would be quite cumbersome.  Using the YAML file should give more consistent deployments, with less manual work to get the lab running.
  • I’ve had several requests to incorporate AWS into this lab.  Currently, that’s outside the scope of the roadmap I have for this series.  The primary reason for that is because of the cost associated with AWS.  Once I get through the posts I have planned for this series I plan to investigate if I can leverage the AWS free tier to get useful data.
  • Despite most of the routers being in provider networks, each router has SNMP running (a minimal example follows this list). I did this to show how ThousandEyes can use SNMP to add additional context to data and, in some cases, to trigger alarms. In a real-world scenario you likely can’t get SNMP from provider networks, but you also likely have more than two network devices at a location. The decrease in realism is more than made up for by not having to build out a complete LAN environment.
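
For reference, here is roughly what the SNMP configuration on each router looks like. The community string and contact/location values below are placeholders, not necessarily the exact values in the published YAML.

! Read-only community for ThousandEyes device-layer polling
snmp-server community LAB-RO RO
snmp-server location CML lab
snmp-server contact lab-admin
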
I’m sure there are plenty of things that I forgot to include here, and likely some good ideas that I didn’t even think about.  If you have any questions on the lab design please leave a comment below, or you can reach me on Twitter – @Ipswitch

vSphere Lab Build Out – The Domain Controller Configuration

For the Domain Controller build, the entire process is much easier and quicker when working from PowerShell instead of the GUI. It also makes it more repeatable, which is awesome for labs.

The first steps are the basic configuration of the server. Below is each command needed; change the values (names, addresses, etc.) to fit your environment, then paste the commands into PowerShell.

Set the computer name: 

Rename-Computer LabDC

Enable Remote Desktop access (optional)

Enable-NetFirewallRule -DisplayGroup "Remote Desktop"

Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server' -Name "fDenyTSConnections" -Value 0

Disable DHCP, set the IP address, DNS, and default route:

Set-NetIPInterface -InterfaceAlias Ethernet0 -AddressFamily IPv4 -Dhcp Disabled 

New-NetIPAddress -InterfaceAlias Ethernet0 -AddressFamily IPv4 -IPAddress 192.168.1.210 -PrefixLength 24 

Set-DnsClientServerAddress -InterfaceAlias Ethernet0 -AddressFamily IPv4 -ServerAddresses 8.8.8.8 

New-NetRoute -AddressFamily IPv4 -InterfaceAlias Ethernet0 -DestinationPrefix 0.0.0.0/0 -NextHop 192.168.1.1

Install the AD DS, DNS, iSCSI Target Server, and Remote Server Administration Tools:

Install-WindowsFeature -Name AD-Domain-Services,DNS,FS-iSCSITarget-Server,RSAT-ADDS

Reboot to apply the name change:

shutdown -r -t 0

Log into the server again, and create the domain:

Install-ADDSForest -DomainName Lab.local -InstallDNS

When prompted for the AD Restore Mode password, enter the password and then confirm it. After that, accept the prompt by pressing the “A” key and hitting Enter. Wait while the new domain is configured. When the process completes, the server will automatically reboot.

The final task will be getting DNS configured with a reverse DNS zone, and records created for the various devices that will be deployed.

Add-DnsServerPrimaryZone -NetworkID "192.168.1.0/24" -ReplicationScope "Forest"

Add-DnsServerResourceRecordA -Name "ESX1" -ZoneName "Lab.local" -IPv4Address "192.168.1.211" -CreatePtr

Add-DnsServerResourceRecordA -Name "ESX2" -ZoneName "Lab.local" -IPv4Address "192.168.1.212" -CreatePtr

Add-DnsServerResourceRecordA -Name "vCenter" -ZoneName "Lab.local" -IPv4Address "192.168.1.213" -CreatePtr

Add-DnsServerResourceRecordA -Name "vRO" -ZoneName "Lab.local" -IPv4Address "192.168.1.214" -CreatePtr

Add-DnsServerResourceRecordA -Name "vLCM" -ZoneName "Lab.local" -IPv4Address "192.168.1.215" -CreatePtr

That concludes the initial DC config for the environment.

vSphere Lab Build Out – The Domain Controller Deployment

When building out a lab the first thing I do is build out a Domain Controller and DNS server. I can then use AD for credential management, and the DNS functionality is helpful as well.  I also use that server to create an iSCSI target for my hosts.

1. Virtual Environment

The first step is to have your virtualization environment ready to go.  It’s easy enough to next-next-finish your way through the VMware Workstation install, so I won’t detail out those steps.

2. Download Windows ISOs

You can download the Server 2019 ISO here: https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2019

Select ISO, fill out the info required, and then hit continue.  Select your language, and then start the download.

3. Create the Lab Domain Controller VM

  1. In VMware Workstation press CTRL+N to open the New Virtual Machine Wizard, and make sure Typical is selected, then click Next
  2. Select the option for Installer Disc Image File, and browse to the location you downloaded the Server 2019 ISO to then click Next
  3. Since this will be using the evaluation license leave the product key blank, enter a name and password, and then click Next.
  4. Accept the prompt about not having a product key
  5. Enter the name and location for the VM, and click Next again
  6. Use the default hard drive size of 60GB (another drive will be added later for the iSCSI target storage), and click Next
  7. Click Customize Hardware…
  8. Set the VM hardware
    1. Set the CPU and RAM to what you’d like.  I used 2 vCPUs and 8GB RAM on my VM.
    2. Change the Network Adapter to Bridged
    3. Click Close
  9. Uncheck the box for Power on this virtual machine after creation and click Finish.
  10. Now to add the hard drive for the iSCSI target and remove the floppy drive.  In the library view right-click on the VM and click Settings
    1. Find the Floppy drive and click Remove (NOTE: If you don’t remove the floppy drive the OS install will encounter an error and fail), then click Add
      1. Select Hard Drive and click Next
      2. Leave the default drive (mine happens to be NVMe) and click Next
      3. Leave the default option to create a new drive and click Next
      4. Enter the size for the drive (I used 750GB) and click Next
      5. Leave the default file name and click Finish
    2. Click OK to finish the hardware changes
  11. Power on the VM

4. Install the OS to the Lab DC

NOTE: While in the VM you will need to press Ctrl+Alt to release the cursor to get to your desktop
  1. While the VM is booting you might see a prompt to press a key to boot from CD.  If that happens click into the window and press a key.
  2. Select the language, and keyboard settings
  3. Click Install Now
  4. When prompted to select the OS choose Windows Server 2019 Datacenter Evaluation (Desktop Experience) because we like graphical interfaces, and click Next
  5. Read through all of the license terms, and if you accept the terms check the box to accept them and click Next
  6. Select the Custom install option
  7. Select Drive 0, this should be the 60GB drive, and click Next
  8. Wait for the install to complete.  This might take some time.
  9. When the install is complete it will prompt for a password.  Set that and click Finish.
  10. The last thing to do for the VM deployment is to install VMware Tools.
    1. Log into the VM using the password set previously
    2. Right-click on the VM in the Library and select Install VMware Tools
    3. Navigate to the D: drive and double click it.  That should kick off the Autorun for the installer.
    4. Follow the defaults for the install.  Next > Next > Install > Finish and then click Yes when prompted for a reboot.
The DC configuration will be detailed in another post in this series.

Cisco ISR Project – vWAAS deployment (14 of ?)

(I just noticed that I forgot to publish this, so anyone reading my posts on IWAN deployment… Sorry this one’s a few years late…)

To get the WAAS deployment done there are a few prerequisites:

  • Virtual Central Manager (vCM) deployed (at HQ)
  • vWAAS appliance deployed (at HQ)
  • vWAAS appliance deployed (at branch)
  • WAN connectivity between branch and HQ

A couple of things to be aware of right off the bat:

  • Default username is: admin
  • Default password is: default
  • Telnet is enabled by default, and SSH is disabled.
    • To enable SSH run these commands from a config prompt (make sure hostname and domain are set before running)
      • ssh-key-generate
      • sshd enable
    • Telnet can be disabled, however, it seems the management software 
  • When logging into the web interface, if there is a prompt to select an SSL certificate, click Cancel.  That should bring up the login page.

After the OVA has been deployed you should be able to log into the appliance, and it should automatically start the device configuration.  If not, simply enter the ‘setup’ command.

The setup between the vCM and vWAAS is pretty similar, so I’m just going to go over the vWAAS as there are more of those.  However, the vCM does need to be configured before the vWAAS, as the vWAAS needs to connect to the vCM.

WAAS setup

The setup is text-based and pretty straightforward.  One thing to be aware of: if the CMS service fails to start (I set vWAAS up without setting the correct vNIC settings), you can run the command ‘cms enable’ from a config prompt.  That should force the vCM to start, or force a vWAAS appliance to register with the vCM.

After completing the setup a window will pop up with a list of commands to configure WCCP on the router.

WCCP template

To make things easier, here’s a text version of the commands:

ip wccp version 2

ip wccp 61 (optional: redirect-list waas-wccp-redirect-list)

ip wccp vrf IWAN-PRIMARY/SECONDARY 62 (optional: redirect-list waas-wccp-redirect-list)

interface (Router LAN interface(s)) 

     ip wccp 61 redirect in 

interface (Router WAN interface(s)) 

     ip wccp vrf IWAN-PRIMARY/SECONDARY 62 redirect in

interface (Router NM-WAE interface) 

     ip wccp redirect exclude in

(optional: 

  ip access-list extended waas-wccp-redirect-list

       acl1 

       acl2 

       …. 

       aclN 

)

One thing that isn’t covered in this default config is that the ISR uses VRFs for the WAN interface(s).  For the WAN interface, enter the correct VRF and then the commands should work.
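
For example, assuming a WAN interface of GigabitEthernet0/0/0 in a VRF named IWAN-PRIMARY (both names are illustrative and will differ per deployment), the WAN-side redirect would look like this:

interface GigabitEthernet0/0/0
 ip wccp vrf IWAN-PRIMARY 62 redirect in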

Links:

WAAS: http://www.cisco.com/c/en/us/td/docs/app_ntwk_services/waas/waas/v611/configuration/guide/cnfg/traffic.html

Prime: http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/3-0/user/guide/pi_ug/WAAS.html

VMware Horizon View – Service not binding to 443 after SSL certificate renewal

Here’s a quick and easy one.  Since this has burned me twice (and caused more hours of troubleshooting than I care to admit), I’m putting it here in hopes that I remember it next time.

The short version: When importing the certificate make sure to check the box to make the SSL certificate exportable.


Cisco Prime Infrastructure VM error – INIT: Id “S0” respawning too fast: disabled for 5 minutes

If you run Cisco Prime there’s a chance you’ve seen the error “INIT: Id ‘S0’ respawning too fast: disabled for 5 minutes” come up on the console.  If not, this is what it looks like:

INIT: Id “S0” respawning too fast: disabled for 5 minutes

It doesn’t seem to cause any issues other than noise on the console screen, but that’s too much annoyance for me.  It seems there’s an easy fix.

  1. Shut down the Prime VM
  2. In VMware go into the VM settings
  3. Add a serial port (I set it to output to a .txt file in the VM’s folder)
  4. Restart the VM.

It seems that this is an issue with the serial port not being seen when expected.  You could also attempt to remove the serial interface from the OS, but I thought adding it to the VM was much easier.

Cisco ISR Project – Cisco Prime Infrastructure deployment (4 of ?)

After the OVA is deployed it’s time to set up Cisco Prime.  Prime will be the monitoring and management software for the IWAN deployment, so it’s a logical place to start.

Before starting the process, we will fix an issue discussed in my post here: https://www.mytechgnome.com/2016/02/cisco-prime-infrastructure-vm-error.html

Edit the settings of the VM and add a serial port.  (I just set it to output to a .txt file in the VM’s folder)

Time to power up the VM and connect to the console.

Prime Setup

To begin the setup, type “setup”

Prime config

Then you will get a series of prompts to configure the device.  Most are self-explanatory, except the timezone.  Cisco has a list of accepted timezone names, which can be found here: http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/3-0/user/guide/pi_ug/timezones.html  It’s also worth noting that the ‘admin’ account entered here is for CLI access; a web user is added later.

Prime config

The setup process will enable the network interface and run a few tests.  After that you will receive a notification that the install completed and it will ask if this is for an HA node.  I’m not setting up an HA node, so I entered “no” and continued on.

Prime Config

The default admin account on the web interface is ‘root’, so here you set its password.  After that you’ll get a prompt confirming that all is well.  After confirmation the server will finish the setup script and reboot.  This process takes a while, like 10-15 minutes.

While this is working, it would be a good opportunity to download the license key for Prime, as well as the patches and tech pack.  There is a bug in the licensing that will report the license is invalid (https://tools.cisco.com/bugsearch/bug/CSCuw89435).

The patches and tech pack can be found here: https://software.cisco.com/download/release.html?mdfid=286285348&flowid=76142&softwareid=284272933&release=3.0.2&relind=AVAILABLE&rellifecycle=&reltype=latest (note that they need to be installed in order – 3.0.2, tech pack, 3.0.2 update 2)

The device pack can be found here: https://software.cisco.com/download/release.html?mdfid=286285348&flowid=76142&softwareid=286208063&release=3.0.3&relind=AVAILABLE&rellifecycle=&reltype=latest

When the Prime startup is complete you should be able to access it from a web browser.

Prime web UI login

Remember that the login name is ‘root’ and the password is what you set for the root account, not the admin account.

After logging in you will see an icon in the top left to open a menu.  Click that, then Administration > Software Update.

Prime Menu

 In the Software Update window there is a link to upload files.  Click that, browse to the update you want, and then upload it.

Prime Software Update

After the file is uploaded you will have an Install button next to the file.  Click Install and it will confirm the install and inform you if a reboot is required.

If a reboot is required, go back to the console of the Prime VM and log in.  To restart Prime there are two commands needed:

ncs stop

ncs start 

And the link to the Cisco document on restarting Prime: http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/2-2/administrator/guide/PIAdminBook/maint_sys_health.html#pgfId-1088333

The restart will of course take about 10 minutes, and you will need to do it multiple times to get the patches installed.

When all the patches are installed, the license(s) can be installed.  However, before starting that, grab the UDI of the VM.  Click the menu button in the top left, then Administration > Appliance

Prime Appliance settings

The UDI is in the right column, in the middle.  You will want to make note of the product ID and serial number.  Those may be needed to license a PAK.

After downloading the licenses from the PAK registrations (they may be in .zip files, and if so, they need to be extracted) the keys can be installed.  Again, open the menu at the top left, then Administration > Licenses.

Prime Appliance settings

To load a license click Files > License Files on the left side.  Then click Add and browse to the license file (it should be a .lic file).  Repeat those steps to install all the Prime licenses.

That will get Prime installed, patched, and on the network.  Adding devices into Prime will be covered later.

Cisco ISR Project – Deploying the OVAs (3 of ?)

The next stage of the process is relatively straightforward: deploying the OVAs for the virtual appliances.

First and foremost are the resource requirements.  Each OVA will be unpacked into a VM in the environment, so we need to make sure there are sufficient resources.

Name | vCPU | vRAM (GB) | Disk thin (GB) | Disk thick (GB) | Notes
vCM 100 | 2 | 2 | 1.6 | 254 |
vWAAS 2500 | 4 | 8 | 1.5 | 754 |
vNAM | 2 | 4 | N/A | 100 | Thin not available
LiveAction | 4 | 16 | N/A | 230 | Thin not available
Prime Infrastructure – Express | 4 | 12 | TBD | 300 | Thin TBD
Prime Infrastructure – Express-Plus | 8 | 16 | TBD | 600 | Thin TBD
Prime Infrastructure – Standard | 16 | 16 | TBD | 900 | Thin TBD
Prime Infrastructure – Professional | 16 | 24 | TBD | 1200 | Thin TBD
CSR 1000V – Small | 1 | 4 | 0.6 | 8.3 |
CSR 1000V – Medium | 2 | 4 | 0.6 | 8.3 |
CSR 1000V – Large | 4 | 4 | 0.6 | 8.3 |
CSR 1000V – Large w/ DRAM upgrade | 4 | 8 | 0.6 | 8.3 | Requires DRAM SKU


For Cisco Prime, the Scaling information can be found in the Quickstart guide here: http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/3-0/quickstart/guide/cpi_qsg.html#pgfId-67786

For the OVA deployment, the process should be pretty self-explanatory.  From the vSphere client, click File > Deploy OVF Template.

From the VMware vSphere Web Client, select your cluster, then on the top bar click the Actions drop-down and select Deploy OVF Template.

Now it’s just a matter of following the prompts based on the environment.  Find the OVA file, name the VM, set the appropriate host, network, and datastore, and the rest of the things the wizard asks for.

The CSR will prompt for a bunch of information for the setup.

CSR OVA properties

The PNSC and Intercloud settings can be left blank.  The rest depends on your environment.  I would recommend enabling SSH, so you will need to have a domain name configured for the key generation.