
AI Noise Suppression is an Amateur Radio breakthrough

The Noise Problem

Background noise is usually present in DX (long-distance) communication over amateur radio; it makes copy challenging and creates listening fatigue. The noise is like listening to an AM radio station in a distant city fading in and out of the static. Amateur radio noise reduction has gone through multiple generations of technology, but eliminating noise has remained elusive. Each new generation of noise reduction technology creates significant demand from operators to upgrade their radios. Individuals who participate in amateur radio contests spend a lot on their stations to make as many DX contacts as possible during a contest, and contesters would have an easier time scoring points with noise elimination. Noise also pushes some operators toward local communication, which typically uses noise-free, high-fidelity FM modulation that is easy to hear.

The AI Claim

I was skeptical when I read “Pretty awesome #AI noise canceling” from TheOperator on X (Twitter) earlier this week. I decided to try it out. It only took 10 minutes to configure the “RM Noise” Windows program and virtually connect it to my HF FlexRadio “server” with SmartSDR DAX. The process was easy, including registering my call sign with RM Noise.

RM Noise working with the FlexRadio SmartSDR to eliminate noise

FlexRadio is a state-of-the-art Software Defined Radio (SDR) – the Cadillac of HF amateur radios. It cost more than the first used car I bought. Connecting RM Noise to other HF amateur radios isn't much more challenging, since the popular WSJT digital modes use the same audio interface to the radio.

FlexRadio receiving the Noon-Time Net

How Did it Work?

I was shocked listening to a net (an on-air amateur radio meeting) with RM Noise. Noise was eliminated without buying a new radio. I could hear a conversation from a weak signal barely above the noise floor, one that didn't even show on the panadapter. I now believe the commenter I read who said they can no longer tolerate listening to their radio without it.

Today was the first day I listened to my HF radio all day long. Typically, I would tire of the noise after an hour, but I didn't experience any noise fatigue. The sound quality is similar to listening to a local AM radio station, which is surprising since the single-sideband (SSB) modulation that dominates HF voice rarely sounds as good as AM.

I've already found RM Noise indispensable, though there are opportunities to improve it further. Weak signals are quiet: your radio's automatic level control (ALC) doesn't "work" after RM Noise filtering. My FlexRadio's ALC keeps a consistent audio level, which may include a lot of noise; with the loud noise component removed, only a quieter but understandable audio signal is left. Filtering adjacent radio signals must still be performed on the radio itself, since the software doesn't include this functionality.

Now I understand Jensen Huang's words from earlier this year: "AI is at an inflection point, setting up for broad adoption reaching into every industry." That includes amateur radio.

How does it work & the importance of AI model customization

I've previously experimented with NVIDIA Broadcast, which runs on my local PC with an NVIDIA RTX 3080, a GPU with dedicated AI hardware. Broadcast provided results similar to RM Noise for strong signals, but it offered limited value overall since weaker signals were dropped as noise. The difference between Broadcast and RM Noise is AI model customization: RM Noise was trained on a customized noise reduction model using the phone (voice) and CW (Morse code) traffic heard on amateur radio. If NVIDIA created an "Amateur Radio Noise Removal" effect, Broadcast could work the same as RM Noise.

NVIDIA Broadcast connected to the FlexRadio to remove noise
NVIDIA RTX 3080 performance while removing noise

AI Runtime

RM Noise is a cloud-based application that uses a Windows client program to send and receive audio. According to their website, the only hardware requirements are "A Windows computer with internet access and the audio line in connected to a radio."

The noisy audio from the radio is sent to a dedicated RM Noise inference server over the Internet for noise reduction. Besides returning the AI-cleaned audio to the end user in real time, the audio could be retained to improve the AI model. As more people use the service, the data scientist has more audio with which to improve the customized noise reduction model.
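
As a rough illustration of that round trip, the sketch below sends short chunks of PCM audio to an inference endpoint and reads back the cleaned result. The URL, chunk size, and audio format are my own assumptions for illustration; the actual RM Noise protocol is not published.

```python
# Hypothetical client-side round trip. The endpoint URL, chunk size, and audio
# format are assumptions for illustration, not RM Noise's real protocol.
import numpy as np
import requests

SERVER_URL = "http://localhost:8000/denoise"   # assumed endpoint
SAMPLE_RATE = 16000
CHUNK_SECONDS = 0.5

def clean_chunk(noisy: np.ndarray) -> np.ndarray:
    """Send one chunk of 16-bit PCM samples to the server and return the cleaned chunk."""
    resp = requests.post(
        SERVER_URL,
        data=noisy.astype(np.int16).tobytes(),
        headers={"Content-Type": "application/octet-stream"},
        timeout=2,
    )
    resp.raise_for_status()
    return np.frombuffer(resp.content, dtype=np.int16)

if __name__ == "__main__":
    # Stand-in for audio captured from the radio via a virtual audio device (e.g. DAX).
    noisy = (np.random.randn(int(SAMPLE_RATE * CHUNK_SECONDS)) * 1000).astype(np.int16)
    cleaned = clean_chunk(noisy)
    print(f"sent {noisy.size} samples, received {cleaned.size} cleaned samples")
```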

RM Noise Cloud Service Status

AI Model Demystification – 40,000-foot view

AI deployment in amateur radio is bleeding edge and not well understood by most operators. RM Noise uses an AI model that applies pattern recognition to decide which sounds are noise and need to be eliminated. The core of RM Noise is a predictive model built by "listening" to a lot of amateur radio traffic.

Picture of Mount Saint Helens from my seat on Alaska Airlines which was close to 40,000 feet

My description below is based on AI fundamentals I learned while working with data scientists at Microsoft during Windows 10 development. The 40,000-foot view of the RM Noise AI platform includes data collection, predictive model development, runtime deployment in the cloud, and continuous improvement.

Data Collection: A LOT of typical amateur radio traffic, both phone and CW, is collected in digital form. The data requires curation, which includes preprocessing, structuring, and cleaning; they manually inspect every recording before including it in the training dataset. Diverse data is needed: examples include different propagation conditions, spoken languages, accents, radios, QRM, QRN, CW paddles, CW straight keys, CW speeds, and bandwidths.

Predictive Model Development: Existing AI models and architectures are explored with the collected data. Experiments with statistical techniques measure how accurately the model's predictions match the desired noise reduction. If an experiment bombs, the data scientist tries a new model or architecture and validates that the data was curated correctly; potentially, more diverse data is required. Regardless, many experiments are conducted to improve the model. This is how the model is trained on actual amateur radio audio traffic. I didn't understand, when I took statistics and quantified business analysis classes, that they would be prerequisites for understanding AI.
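
To make the train-and-evaluate loop above more concrete, here is a minimal toy sketch in PyTorch. The model, data shapes, loss, and hyperparameters are all illustrative assumptions of mine and say nothing about how RM Noise is actually built.

```python
# Toy sketch of a denoising train-and-evaluate loop -- not the RM Noise architecture.
# Assumes matched pairs of (noisy, clean) audio clips are available as tensors.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """A toy 1-D convolutional denoiser standing in for the real model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=9, padding=4),
        )

    def forward(self, x):
        return self.net(x)

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Fake stand-in data: 8 clips, 1 channel, 16000 samples (1 second at 16 kHz).
noisy = torch.randn(8, 1, 16000)
clean = torch.randn(8, 1, 16000)

for epoch in range(5):                   # each pass is one small "experiment"
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)  # how closely the output matches the target
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```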

My textbook from Quantified Business Analysis Class at Marquette Business School

Cloud Runtime Deployment: Once the RM Noise data scientist is satisfied with the results, the AI model runtime, also known as an inference server, is deployed on the Internet. Amateur radio operators running the Python-based RM Noise Windows PC program send their radio audio to the inference server, which returns cleaned audio.
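
For a sense of what such an inference endpoint could look like, here is a minimal sketch using Flask and NumPy. The route name, audio format, and the pass-through "denoise" step are my assumptions; RM Noise has not published its interface.

```python
# Hypothetical inference endpoint sketch -- not RM Noise's actual API.
# Receives a chunk of 16-bit PCM audio in the request body and returns the
# "denoised" chunk. The denoise step here is a placeholder identity function.
import numpy as np
from flask import Flask, Response, request

app = Flask(__name__)

def denoise(samples: np.ndarray) -> np.ndarray:
    # Placeholder: a real deployment would run the trained model here.
    return samples

@app.route("/denoise", methods=["POST"])
def denoise_endpoint():
    samples = np.frombuffer(request.data, dtype=np.int16)
    cleaned = denoise(samples)
    return Response(cleaned.astype(np.int16).tobytes(),
                    mimetype="application/octet-stream")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```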

Continuous Improvement: RM Noise accepts audio recordings of tests and problem cases. This data feeds back into all of the previous steps so it can be included in the model, and the noise reduction service continually improves over time.

What's Next?

Amateur radio AI noise elimination is at the beginning of the adoption curve: innovators are starting to use it and enjoy the value it brings. There is also a need for local AI noise elimination for Amateur Radio Field Day and other situations where radio operators don't have Internet access. I expect these AI models to become embedded in local processing accessories and inside the radio itself. I'm glad I took the time to investigate RM Noise, the most significant amateur radio innovation I've experienced this year.

Why did I buy & build a VCF home lab

At last week’s VMware Explore customer conference, I was asked why I deployed VMware Cloud Foundation (VCF) at home. That question reminded me of the following:
“When people ask me what ham radio is all about, I usually respond with ‘The universal purpose of ham radio is to have fun messing around with radios.’” Witte, B. (2019). VHF, Summits, and More. Signal Blue LLC.

I'd say the same motivation behind my radio hobby extends to my computer hobby, which I became interested in as a teenager. At Washington High School of Information Technology, the first large minicomputers I was exposed to were the Digital Equipment Corporation PDP-11/70 and VAX-11/780.

My PDP 11/70 fully functioning replica kit. Looking to add an actual VT-100 terminal!

VCF is a modern "minicomputer" that manages a distributed fleet of servers in a cloud model. It's not unlike having a micro version of your own AWS data center, or VMware's version of AWS Outposts. I became excited when VCF was launched in 2016 because I understood it was the future of VMware and a relatively easy cloud to deploy and manage. When I read about the "affordable" VCF Lab Constructor (VLC) tool, part of VMware Holodeck, in 2020, I decided to build my own.

Large ex-public cloud server with 768GB RAM & 32 cores in my garage running nested VCF

VCF learning lab

I use VCF to get hands-on experience with the latest VMware software and to satisfy my curiosity. The VMware product line is broad, and my role requires understanding the entire portfolio. Since virtualization is a crucial building block, many concepts seem abstract when you first learn about them. Those abstract concepts became concrete once I got my hands dirty deploying and using the software after training.

Through virtualization, VCF with VLC reduces my personal investment in the physical computing infrastructure and hardware required to deploy and operate the software. Less hardware saves energy and reduces my electric bill. Because the automation lets you define your data center configuration as code (Infrastructure as Code, or IaC), all deployment information is captured in files: the VCF Deployment Parameter Workbook and multiple JSON files. The configuration covers nested hosts, compute virtualization, virtualized networking with BGP routing for connectivity to my home LAN, and virtualized storage. Anyone who has designed and configured virtualized networking will appreciate the time savings of reusing a virtual network configuration and an edge off-ramp to your physical network and the Internet.
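
To give a flavor of what "all deployment information captured in files" means, here is a small sketch that writes a nested-host definition as JSON. Every field name and value below is a hypothetical placeholder, not the actual VLC or VCF schema.

```python
# Hypothetical nested-host definition written out as JSON.
# Field names and values are illustrative only -- not the real VLC/VCF schema.
import json

nested_hosts = [
    {
        "name": f"esxi-{i}",
        "cpus": 8,
        "memoryGB": 96,
        "mgmtIp": f"10.0.0.{10 + i}",   # assumed management network
        "vlan": 10,
    }
    for i in range(1, 5)                # four hosts for a management domain
]

with open("nested-hosts.json", "w") as f:
    json.dump({"hosts": nested_hosts}, f, indent=2)

print(json.dumps(nested_hosts[0], indent=2))
```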

VCF deployment parameters workbook
additional IaC example to configure nested hosts in workload domain

It's easy to save my previous work or keep a fallback plan before making a major change to my private cloud, such as upgrading from VCF 4.5 to 5.0 or switching a workload domain from Kubernetes with VMware Tanzu to something entirely different, like advanced networking virtualization with NSX IDS/IPS. This is accomplished by shutting down the VCF environment and copying the files that make up each nested host; the nested hosts are just VMs. After the nested host files are safely stored, I can start fresh using the previous IaC configuration as a starting point. Virtualization, automation, and IaC free up a lot of time and resources and make my learning more efficient.
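
That save step can be as simple as copying each nested host's VM folder off the datastore, roughly as sketched below. Both paths are hypothetical placeholders, not my actual datastore layout.

```python
# Sketch of archiving nested-host VM folders before a major change.
# Both paths are hypothetical placeholders.
import shutil
from pathlib import Path

DATASTORE = Path("/vmfs/volumes/datastore1")      # assumed datastore mount
ARCHIVE = Path("/backups/vcf-4.5-snapshot")       # assumed archive location

for vm_dir in sorted(DATASTORE.glob("esxi-*")):   # each nested host is one VM folder
    dest = ARCHIVE / vm_dir.name
    print(f"copying {vm_dir} -> {dest}")
    shutil.copytree(vm_dir, dest, dirs_exist_ok=True)
```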

Holodeck allows far greater capacity in a lab environment with the capability of quickly saving & switching workload domains. This image shows 5 different examples of workload domains for hands-on learning

VCF provides an integrated private cloud software suite with custom workload domains and lifecycle management. I wonder how anyone could fully understand how VCF works in a production environment without hands-on experience. Without VMware Holodeck, the investment to deploy VCF for learning would be an order of magnitude more expensive and out of reach for me.

The file system on my large server – each nested host is represented by files for each VM

What about Production workloads?

I also have a traditional VMware private cloud home lab. With this lab, I understand how the VMware stack runs directly on bare metal with a physical switched network, vSAN, and NAS. This home lab has also taken on amateur radio workloads in production, 24×7. These production workloads give me the experience of managing and operating an environment I can't just turn off and wipe out. I've learned the discipline of managing vSAN, backups, and graceful UPS shutdown under production constraints.

Production workloads in my traditional VMware home lab
Traditional VMware Home Lab in my office closet

Robot DNS server died

My secondary DNS server died after four years of service. The zone holds 118 host records, and 45 of those are for my VMware Cloud Foundation server.

Atomic Pi DNS server – RIP

The Atomic Pi, an Intel-based PC, was designed for a home robot that never made it to market. In 2019 I put it into service running Ubuntu Linux, providing BIND and NTP for my home. It was an experiment with a $35 PC, a $4 plastic Walmart case, and a $9 power supply from Amazon, an unheard-of price point in 2019.

Dell/Wyse 3040 Thin Client PC Replacement

I replaced the DNS server with a used $39 Dell Wyse 3040 thin client PC from eBay; these are typically used for VDI with the VMware Horizon client or Citrix Virtual Desktops. I've been pleased with the first Wyse 3040 I use as an amateur radio repeater/hotspot. The hardware specs are similar to the Atomic Pi, and I splurged on the 16GB flash storage model, twice the capacity. It's smaller and comes in a real case.

$39 replacement DNS Server

UEFI Challenge

The thin client PC requires UEFI, which adds an extra step to installing the operating system. I used Ventoy to build the USB drive image with UEFI support.

Ventoy to build USB boot image for Ubuntu Linux

Ventoy uses a two-step process: first, Ventoy prepares the USB drive and creates an empty partition; second, you copy the Ubuntu image onto that partition. I was able to boot and deploy Ubuntu Linux from the USB drive I prepared with Ventoy. After installing and configuring BIND, my secondary DNS server is up and automatically replicating entries from my primary DNS.
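
As a quick sanity check that the secondary really can pull the zone from the primary, a small script can request a zone transfer (AXFR). The dnspython library is used here; the zone name and server address are hypothetical placeholders for my network.

```python
# Quick check that a zone transfer (AXFR) from the primary DNS server works.
# The zone name and server address are hypothetical placeholders.
import dns.query
import dns.zone

PRIMARY = "192.168.1.2"        # assumed primary DNS server address
ZONE = "home.example.com"      # assumed internal zone name

zone = dns.zone.from_xfr(dns.query.xfr(PRIMARY, ZONE))
print(f"{ZONE}: {len(zone.nodes)} names in zone")
for name in sorted(zone.nodes, key=str)[:10]:   # show the first few names
    print(name)
```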

VMware Cloud Foundation (VCF) 5.0 home lab upgrade

I started my VMware Cloud Foundation (VCF) home lab journey three years ago on version 4.01 and shared my experience with a blog post. I’ve upgraded my server and deployed nested VCF 5.0 on ESXi 8.0 Update 1 this month. Currently, I have a 4-host standard management domain and a 3-host workload domain deployed in VCF.

The latest VCF and ESXi, NSX Data Center, SDDC Manager, and vCenter versions deployed

Other upgrades included expanding my host RAM from 512GB to 768GB, replacing my primary storage with a 4TB Samsung 870 QVO SATA III SSD, and fixing the shielded Cat 6a wiring to my 10GBASE-T SFP+ transceiver.

Physical network issue resolved: SFP+ 10G transceiver and wiring to server in garage

During the memory upgrade I discovered that one of my 24 DIMM sockets was bad. Luckily, I found a replacement motherboard on eBay for $105 shipped, which resolved the problem.

BIOS error due to faulty DIMM slot on motherboard requiring replacement

Stability and Performance Improvements

These changes have improved the stability and performance of VCF deployed on a single host with the VMware VCF Holodeck Toolkit. The toolkit provides guidance and tools, including VCF Lab Constructor (VLC), to deploy nested VCF hands-on-lab environments on a standalone ESXi host. Since the upgrade I have not experienced the network connection dropping or the server hanging after losing local storage. The additional RAM and the latest versions of VMware software have made VCF feel as performant as my primary home lab, a traditional vSphere cluster of three physical servers.

One more change…

I also changed the sizing and architecture in the Deploy Parameters section of the VCF Deployment Parameter Workbook, which the VMware Cloud Builder appliance uses to create the management domain. Circled below is where I increased the vCenter Server Appliance size and NSX virtual appliance size back to the defaults and chose a standard rather than consolidated VCF architecture. With the RAM upgrade, I was comfortable spending the limited memory on the recommended defaults. With all of these changes, I no longer see resource exhaustion errors in NSX Manager, SDDC Manager, or vCenter.

Deploy Parameters in the VCF Deployment Parameter Workbook

New VCF Adventures

With the more reliable and better-performing VCF infrastructure, I'm ready to deploy the latest versions of the NSX Edge appliances, Tanzu, NSX Advanced Load Balancer, and the Aria Enterprise Suite on top of it this fall.

Physical Host running ESXi 8.0 Update 1 where nested VCF is deployed
Each nested host for VCF deployed as a VM on the physical host, plus a jump host for deployment, UPS shutdown agent, and VyOS router appliance for BGP routing into my home LAN.

How I passed my Amateur Radio test – easier than I imagined

How it started:

After starting down this path 40 years ago, I finally passed my FCC Amateur Radio Technician exam. In my early teens my brother was going for his ticket, and I borrowed his Morse code practice cassettes. I gave up on learning Morse code after getting distracted by my PC at home. Many years ago the FCC eliminated the Morse code requirement, removing that hurdle. My brother has been an amateur radio operator for at least 40 years.

Recently I was out of cellular coverage at Mount Rainier National Park, which provided the motivation to take the exam. The exam preparation was far easier than I imagined. In the computer industry, certification test questions are secret and highly guarded. In contrast, the entire amateur radio exam question pool, more than 400 questions, is published.

View from Ohanapecosh campground, Mount Rainier National Park

My Method for Passing the Exam

Following is the straightforward approach I took, which concentrated my effort on understanding the material and passing the exam. I plan to upgrade to a General license in the near future using the same approach. I found this content on my own and receive no compensation from any of these folks.

  1. Watched the YouTube recording of the Amateur Radio Technician material review class from the 2021 Trenton Computer Festival.
  2. Bought and read the following book twice: Technician Class 2018-2022: Pass Your Amateur Radio Technician Class Test – The Easy Way
  3. Took the sample exam through the following app approximately 15 times: HAM Test Prep: Technician
  4. Booked and took the exam online once I was ready. Cost: $15. Auburn University Amateur Radio Club provides a great public service. They use Zoom to monitor the test taker. I missed only 1 question and received my license from the FCC the next day.

What's Next?

In addition to using amateur radio to communicate from remote locations, I’ll be exploring all of the remote computer communication solutions available.

Modern Day Icarus Story – Part 2 NSX Manager Restore

My previous blog post described how I was lucky that the Linux file system check (fsck) command repaired the critical vCenter Server VM that manages my home lab. My VMware NSX-T Manager 3.1.2 VM also suffered a corrupt file system due to the physical switch failure, which halted the appliance. NSX-T is a critical networking infrastructure component in my home lab, supporting multiple virtual network segments, routers, and firewalls.

corrupt NSX-T Manager VM

If this had been a production deployment of NSX-T, recovery wouldn't have been necessary. VMware has made it crystal clear that NSX Manager requires three nodes, and it is recommended that they be placed on different hosts. These three nodes are separate instances of the NSX Manager VM, each participating in a distributed, connected Corfu database. Each node has the same view of the NSX-T configuration, and they are always synchronized, so NSX-T Manager continues to operate even if one node fails. However, I only had a single NSX-T Manager node deployed, since this is a home lab learning environment. The high-availability easy button provided by NSX-T didn't exist for me because I didn't follow VMware's guidance to deploy three nodes. Recovery was necessary for my NSX-T deployment.

NSX-T Manager fsck

I followed Tom Fojta’s “Recovering NSX-T Manager from File System Corruption” blog to recover the NSX-T Manager file system. This was more complicated than repairing the vCenter file system covered in Part 1 of this blog series since the Linux kernel was unable to start due to the root file system corruption.

Recovering the file system following the steps in Tom Fojta’s blog

This time, repairing the file system wasn't enough. Linux booted successfully and NSX-T Manager started, but when I checked the NSX-T Manager cluster status, it remained in the dreaded UNAVAILABLE state. I was hoping to see the output shown below, which is from a healthy NSX-T Manager. I reviewed the NSX-T logs, but the problem eluded me.

example of a healthy NSX-T cluster

I decided to stop troubleshooting and attempt restoring the NSX-T configuration from my backup.

Restoring the NSX-T Backup

Restoring the NSX-T backup is straightforward. My first step was to start all of the edge appliance VMs from the previous deployment. I didn't find this step documented, but after my second attempt I learned that it is the easiest way to restore the entire NSX-T environment. If the edge appliance VMs are gone or corrupted, they can be redeployed from NSX-T Manager after restoring the backup.

I keep a OneNote notebook filled with my entire NSX-T configuration, including the NSX-T backup configuration; the correct parameters and passphrase must be provided to restore a backup. I also keep a copy of the NSX-T Unified Appliance OVA deployed in my home lab, so the backup stays tied to the same version of the appliance.

NSX-T backup configuration from my OneNote notebook

The second step is to deploy the NSX-T Unified Appliance OVA and start the VM. After the NSX-T Manager UI is active, it is necessary to re-enter all of the backup configuration parameters used in the backup. Once the backup configuration is entered, the backups available to restore are shown below.

NSX-T Configuration backups available

Once the NSX-T backup is selected for restoration the following steps are displayed:

Explanation of the NSX-T configuration restore process from the NSX-T Manager UI

The following restore status is shown with a progress bar.

Restore process status

After the NSX-T Manager UI reboots, the following completion message is displayed. Total restore time was 42 minutes, during which I only had to watch the progress unfold.

Restore process completed

Success

This was the first time I attempted an NSX-T restore from my backup. I'm glad I went through the steps to configure an SFTP server to hold my backup on a separate storage device; it was a big time saver. The physical switch failure could also have corrupted my NSX-T configuration backup if I had placed it on the same NAS NFS server. With my VMware home lab restored, I can get back to work on my original goal of deploying HCX.

NSX-T up and running

Modern Day Icarus Story – Part 1 fsck

Configuring VMware HCX in my home lab to migrate VMs between two VMware vCenter clusters was my goal this week. HCX simplifies application mobility and migration between clouds. Last week I successfully paired both sites, and I was ready to extend the network.

On Monday morning I discovered that my target site was inaccessible. I was disappointed, since this had worked the previous week. The troubleshooting process pointed to the TP-Link T1700G-28TQ switch in my home lab as a possible culprit. After ping failures, I unplugged the Ethernet cable connected to my target site router, and to my surprise the link light stayed on instead of going out. I quickly discovered that the management plane of the switch had crashed, but the data plane was still switching some, though not all, traffic. I rebooted the switch and the networking problem was solved. I successfully logged into the HCX target site, but I started to feel the heat of the sun melting the wax in my wings.

TP-Link switch at top of rack

I didn't expect to run into new problems at the source site after I solved the target site networking problem, but the management UIs for both NSX-T and vCenter Server at the source site weren't accessible. I started to lose altitude, with a few feathers coming off my wings, once I saw the dreaded write failures on the Linux console of both VMs. My home lab uses both VMware vSAN and NFSv3 on a QNAP NAS for storage, and these critical VMs were stored on the QNAP NAS, which has a single network path through the failed switch. I wouldn't have had any issues if I had stored these VMs on vSAN, since those servers are connected to two switches for redundancy against a single failure. After rebooting both management VMs, I saw that the file systems were corrupted and the VMs were halted.

VMware vCenter file system errors on console

I knew I wouldn't crash and drown in the ocean below like Icarus once I was able to boot the VM and access the vCenter Server UI after cleaning the file system. I followed VMware knowledge base article 2149838, which describes the recommended approach using e2fsck.

Prior to taking an in-depth enterprise Linux class, I would have been anxious about editing the GRUB loader to change the boot target and clean the file system. However, these steps are now second nature to me, since I had to perform them from memory to pass the hands-on Linux certification associated with the class.

I haven't managed my home lab like an enterprise environment; I've taken shortcuts to save time and money. I was lucky that fsck worked, since I didn't have a vCenter or distributed virtual switch (DVS) backup. After this hard lesson, I configured a vCenter backup schedule and exported the DVS configuration. My next blog post will go over the steps I took to recover the NSX management console and VM.

vCenter Backup UI

ESXi on ARM: Deploying my smallest home lab server after my largest server deployment

Over the summer I deployed a large enterprise SuperMicro server with half a terabyte of RAM and 32 cores provided by two Intel Xeon E5-2683 v4 processors. I deployed nested VMware Cloud Foundation 4 with Tanzu Kubernetes Grid on this system, and I'm still learning. My last blog post links to a YouTube presentation on my experience.

The massive nested VCF server in my garage

I learned through Twitter yesterday morning that VMware had released the ESXi on Arm Fling, a free technology preview. I ordered a new Raspberry Pi 4B with 8GB of RAM from Amazon in the morning and had ESXi live on the system by the end of the day. The Raspberry Pi is close to the size of one of the Intel Xeon processors in the SuperMicro VMware Cloud Foundation server I deployed over the summer, and its electrical power requirements are insignificant compared to the SuperMicro enterprise server running VMware Cloud Foundation.

Raspberry Pi 4B 8GB RAM circled in red inside my half rack

Kit Colbert published a blog post last week describing use cases for this game-changing technology. In addition, he presented on it in the "Datacenter of the Future" [HCP3004] session last week during VMworld. VMworld session recordings are available through the vmworld.com site, and registration is free. I was excited to gain hands-on experience.

WOW – this technology is amazing. After deploying ESXi to a USB memory stick, I connected this host to my vCenter server. Next I created and connected an NFS datastore from my QNAP NAS to the host.

ESXi on ARM alive in my vCenter

I pulled out my iPad mini and saw the new host in the vSphere Client fling.

rpi1-esxi is the Raspberry PI on the vSphere Client Fling for iPad

I downloaded and deployed the ARM versions of Ubuntu 20.04 and RHEL 8.2 as VMs. I compiled VMware Tools on the Ubuntu VM and installed the GUI (graphical.target) on both Linux VMs, and I still have a little memory to spare on the Raspberry Pi for another VM. Both VMs were responsive even with the GUI. It is hard to tell that ESXi and Linux are running on ARM, since the operating systems are unchanged; the largest obstacle is finding software built for ARM. Through this experience I now understand why Apple is rumored to be releasing a MacBook with an ARM processor.

Ubuntu VM on left, Red Hat VM on right

Over the summer I learned about the reimagined VMware Cloud Foundation from the top down, with Tanzu Kubernetes Grid, NSX-T, vSAN, and SDDC Manager. Now, with ESXi on ARM, I am learning about the next chapter of VMware Cloud Foundation. If you've always wanted a VMware vSphere home lab, this is the most inexpensive path to get started.

VMware Cloud Foundation 4.01 VMUG presentation

I’ve been busy this summer deploying VMware Cloud Foundation 4.01 (VCF) at home.

Deployment on SuperMicro 6018U-TR4T SuperServer in my garage

Yesterday I presented my experience running VCF in my home lab, nested on a single physical host. After I read the VMware blog post in January, I couldn't wait to deploy it with the VLC software. Click for a recording of my presentation and demo at the Seattle VMware User Group (VMUG).

Following are the links from my presentation:

A future goal is to develop a blog series on the presentation and what I learned.

Don’t let the Bear Hibernate

Black bear increasing energy stores in the author's backyard for hibernation last winter

I've continued contributing computing resources non-stop to science researchers since my March post. A byproduct has been learning how my home lab operates at full throttle and what the energy implications are. My last blog post discussed some of my initial sustainability lessons.

Impact to host @ 95% throttle

VMware vRealize Operations Manager providing additional compute capacity

I drove CPU usage to approximately 95% when I started donating all of my excess compute capacity. Shortly after operating at full throttle, an alert popped up in the VMware vRealize Operations Manager 8.0 console. The alert provided a proactive performance improvement recommendation, and an idea for this blog post.

VMware vRealize Operations Manager 8.0 alert. VMware knowledge base article explaining how to fix the issue shown above.

I learned that the most energy-efficient setting for my home lab servers was to turn off all of the processor energy-saving features. This lesson was counterintuitive. Once my home lab was operating at full utilization, the servers wasted processing power and energy by attempting to engage power-saving features. The default server configuration assumed that the current task was a momentary spike in demand; once the sprint was over, the processor would start shutting down excess capacity. Because utilization stayed high, another spike in demand arrived quickly and the processor had to ramp back up to maximum capacity. This incorrect assumption reduced processing capacity and slowed the scientific research workload: energy consumption didn't decrease, but the amount of work completed did.

Who Should Sleep?

Sleep states and hibernation, for bears and computers alike, exist to conserve energy stores when nothing is happening. Both go through a "waking-up" phase that takes time and energy. Our Pacific Northwest bears benefit from powering off unnecessary functions in the winter, but a server processor running at full capacity does not; sleeping only slows down the workload while wasting energy, which isn't a sustainable solution.

Turning Off Power Saving Features

The three SuperMicro SuperServer E300-8Ds in my home lab have rudimentary power management features. The P-state and C-state features allow processors to shut down excess capacity, much like an energy-efficient pickup truck engine that deactivates cylinders that aren't needed at highway cruising speed. Following are the default AMI BIOS P-state and C-state settings for these servers; I have disabled both of the highlighted settings.

Sustainable Configuration

The alerts stopped once I configured the servers for compute-intensive workloads running non-stop. Enterprise servers are complex, and default settings reduce the time and understanding required to stand up infrastructure. VMware vRealize Operations Manager highlighted this misconfiguration, which I wouldn't have found otherwise; this is one example of many where the tool has pointed out hidden problems and taught me something new. I never expected that turning off all power management features would be the most sustainable option.