Tipalti’s payment-processing engine, a towering .NET Framework 4.7 monolith that once powered millions of transactions from bare Amazon EC2 instances, has undergone a quiet but radical transformation. The finance automation firm lifted the application into Windows Server containers, deployed it on Amazon EKS, and wired it to autoscale on RabbitMQ queue depth, cutting compute costs by 60% while gaining operational flexibility that its old setup never allowed.

For years, the monolith ran on a handful of large EC2 Windows VMs, manually scaled, patched, and nursed by a dedicated operations team. That model was expensive, brittle, and increasingly out of step with a world moving toward cloud-native patterns. Moving a .NET Framework 4.7 application, built on the Windows-only framework line that predates the modern, container-friendly .NET Core, to Kubernetes on Windows nodes required navigating a thicket of technical constraints. But Tipalti’s team pulled it off, offering a template for other enterprises still dragging monolithic Windows workloads into the cloud era.

The Monolith That Wouldn’t Die

Tipalti’s core payment engine is a classic enterprise workhorse: built in C# on .NET Framework 4.7, it handles invoice processing, cross-border payments, tax compliance, and reconciliation. It is stateful, memory-hungry, and tightly coupled to Windows APIs and IIS. For years, it lived on a set of m5.xlarge EC2 instances, each hosting the entire application. Scaling meant launching more instances behind an Application Load Balancer—a manual, slow process that could not react to sudden spikes in payment requests.

“We were burning money on idle instances just to stay safe for peak loads,” said a senior infrastructure engineer at Tipalti, speaking on background. “And when something broke, troubleshooting meant remoting into Windows boxes and hunting through scattered logs.”

The team considered rewriting the monolith in .NET Core, but that would have taken two years and carried unacceptable business risk. Instead, they decided to containerize the existing .NET Framework 4.7 codebase, host it on Windows containers, and orchestrate everything with Kubernetes. The goal: decouple deployments from infrastructure, enable granular autoscaling, and slash the cost of idle capacity.

Why Windows Containers on EKS?

Amazon EKS added production-grade Windows container support in 2019, but enterprise adoption has been slow. The learning curve is steep: Windows container images are enormous, often exceeding 10 GB, and require careful tuning to avoid long pull times and node startup delays. Under the process isolation that Windows nodes on EKS use, the Windows version of the container image must match that of the host: a Windows Server 2019 container won’t run on a 2022 host and vice versa.

Yet for Tipalti, EKS was the logical choice. They already ran Linux-based microservices on EKS, so extending that control plane to Windows nodes unified their cluster management. EKS automates the heavy lifting of the Kubernetes control plane, handles security patches, and integrates with AWS IAM, VPC, and CloudWatch. Crucially, EKS managed node groups with Windows support meant they could treat fleets of Windows EC2 instances as cattle rather than pets: disposable and replaceable on demand.

To containerize the .NET Framework application, they built a Windows Server 2019 base image with all required IIS features, ASP.NET 4.7 runtime, and custom dependencies baked in. The Dockerfile pulled the compiled application artifacts from an S3 bucket, configured IIS application pools, and set up health checks. The resulting image weighed in at 12 GB—a heavyweight by Linux standards but manageable once cached on EKS nodes.

Taming the Giant with Kubernetes

Once the container image was ready, the team defined a Kubernetes Deployment for the payment engine pod. The application was originally stateful: it cached payment rules and maintained in-memory sessions. Rather than rely on sticky sessions to pin each client to one pod, the team ran a single application instance per pod and moved session state to an external Redis cache, making pods stateless from the application’s perspective.

Windows pods on EKS come with resource constraints. The minimum CPU request for a Windows pod is 1 vCPU, and memory requests must align with the node’s available resources. Tipalti chose c5.2xlarge instances as worker nodes, each providing 8 vCPUs and 16 GiB of RAM. They overprovisioned slightly to allow Kubernetes to evict and reschedule pods without hitting resource starvation.
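The sizing arithmetic behind that choice can be sketched in a few lines of Python. This is a rough illustration, not Tipalti’s actual configuration: the node size and the 1 vCPU request come from the text above, while the system reserve and the per-pod memory request are hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def pods_that_fit(node: Resources, reserved: Resources, pod: Resources) -> int:
    """Count the pods a node can hold when CPU and memory requests are the only limits."""
    free_cpu = node.cpu_millicores - reserved.cpu_millicores
    free_mem = node.memory_mib - reserved.memory_mib
    if free_cpu <= 0 or free_mem <= 0:
        return 0
    # The scarcer of the two resources decides how many pods fit.
    return min(free_cpu // pod.cpu_millicores, free_mem // pod.memory_mib)


# c5.2xlarge: 8 vCPUs and 16 GiB, as stated in the article.
C5_2XLARGE = Resources(cpu_millicores=8000, memory_mib=16 * 1024)
# Hypothetical reserve for the Windows OS and node agents.
RESERVED = Resources(cpu_millicores=500, memory_mib=2048)
# 1 vCPU request from the article; the 2 GiB memory request is an assumption.
POD_REQUEST = Resources(cpu_millicores=1000, memory_mib=2048)

print(pods_that_fit(C5_2XLARGE, RESERVED, POD_REQUEST))
```

The result is only an upper bound from CPU and memory. Other per-node limits, such as networking caps on pod count, can lower the real figure.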

A critical early challenge was the time to pull the 12 GB image when a new node launched. To mitigate this, they used EKS-optimized Windows AMIs that included a pre-cached image layer, slashing pull times from 15 minutes to under 3 minutes. They also configured pod disruption budgets so that node drains and upgrades never took too many pods offline at once.

Autoscaling with RabbitMQ: From Hours to Seconds

The old system scaled manually and slowly. The new system scales on real demand, using RabbitMQ queue depth as the signal. Tipalti’s payment engine consumes a message from a RabbitMQ queue for each payment batch. When queue depth exceeds a threshold, the Kubernetes Horizontal Pod Autoscaler (HPA) kicks in, adding pods. When queues drain, pods are scaled back down.

This queue-driven autoscaling is implemented via a custom metrics pipeline. RabbitMQ exposes message counts through its HTTP API; Prometheus scrapes those metrics, and a Kubernetes custom metrics adapter serves them to the HPA. The HPA then targets an average queue depth per pod, with the aim that no payment sits in the queue for more than a few seconds before processing.
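To make the scaling rule concrete, here is a minimal Python sketch of the replica calculation the HPA performs when it targets an average metric value per pod: the total queue depth divided by the per-pod target, rounded up, then clamped to the configured bounds. The target of 50 messages per pod is a hypothetical value, the bounds of 2 and 8 are used purely as an illustration, and the sketch leaves out the tolerance band and stabilization windows that the real controller applies.

```python
import math


def desired_replicas(queue_depth: int, target_per_pod: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed so the average queue depth per pod stays at or below the target."""
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(queue_depth / target_per_pod)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical settings: 50 messages per pod, between 2 and 8 pods.
for depth in (0, 120, 1000):
    print(depth, desired_replicas(depth, target_per_pod=50, min_replicas=2, max_replicas=8))
```

An empty queue keeps the floor of 2 pods, 120 messages call for 3, and a backlog of 1,000 is capped at 8.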

“We saw our peak-to-trough instance count swing from 8 to 2 pods within minutes,” the engineer noted. “Previously, we would have kept 6 instances running all day just to handle a 2-hour spike.” This elasticity directly translates into cost savings: during off-peak hours, the cluster scales down to a footprint 70% smaller than the old static fleet.

Centralized Logging and Observability

Kubernetes gave Tipalti a uniform logging layer. Instead of hunting through individual Windows event logs, they deployed Fluentd as a DaemonSet on each Windows node, tailing IIS logs, application logs, and Windows event logs, and forwarding them to Amazon CloudWatch Logs. Structured logging was enforced in the application code, ensuring that every payment transaction produced a JSON log line with trace IDs.
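A structured log line of that kind might look like the following sketch. The payment engine itself is written in C#, so this Python version only illustrates the shape of the output; the field names, the event name, and the sample payment values are all hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def payment_log_line(event: str, trace_id: str, **fields) -> str:
    """Serialize one log event as a single-line JSON object carrying a trace ID."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "INFO",
        "event": event,
        "trace_id": trace_id,
        **fields,  # extra context, e.g. payment identifiers
    }
    # json.dumps escapes embedded newlines, so each event stays on one line.
    return json.dumps(record, sort_keys=True)


trace_id = uuid.uuid4().hex  # hypothetical: one ID per payment transaction
line = payment_log_line(
    "payment.processed",
    trace_id,
    payment_id="PAY-0001",
    amount_minor=12500,
    currency="EUR",
)
print(line)
```

Because every line is valid JSON with the same trace ID field, a log forwarder can ship it unchanged and a query can follow one transaction across pods.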

Prometheus and Grafana dashboards visualize RabbitMQ queue metrics, pod CPU/memory, and application-level latency percentiles. Alerts are routed to PagerDuty when queue depth exceeds safe limits or pod restart counts spike. This observability stack virtually eliminated the need for engineers to remote into individual containers, dramatically reducing mean time to resolution for incidents.

Crunching the Numbers: 60% Cost Reduction

How did they achieve a 60% reduction in compute costs? Three factors combined:

  • Right-sizing: The old EC2 instances were oversized to handle peak loads, with an average utilization of 30%. Kubernetes pods are right-sized to the application’s actual resource needs, and HPA ensures only the necessary pods run at any time.
  • Spot instances and savings plans: EKS managed node groups support a mix of On-Demand and Spot instances. Tipalti shifted 70% of Windows worker nodes to Spot, using a capacity-optimized allocation strategy and keeping On-Demand capacity as a fallback. Combined with Compute Savings Plans covering the remaining On-Demand usage, the effective instance cost dropped by over 40%.
  • Shared node overhead: Running multiple pods on a single node amortizes the Windows OS licensing cost. A single c5.2xlarge instance can host up to 6 pods (limited by ENI attachment rules), sharing the Windows license fee across workloads.
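The blended-rate arithmetic behind the Spot and savings-plan factor can be sketched as follows. Only the 70% Spot share comes from the article. The two discount rates are hypothetical placeholders, not Tipalti’s or AWS’s actual figures, and the sketch assumes the savings-plan discount applies only to the On-Demand share.

```python
def blended_cost_factor(spot_share: float, spot_discount: float,
                        on_demand_discount: float) -> float:
    """Fraction of the full On-Demand price paid across a mixed Spot/On-Demand fleet."""
    for value in (spot_share, spot_discount, on_demand_discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and discounts must be between 0 and 1")
    spot_part = spot_share * (1.0 - spot_discount)
    on_demand_part = (1.0 - spot_share) * (1.0 - on_demand_discount)
    return spot_part + on_demand_part


# 70% Spot share is from the article; both discounts below are assumptions.
factor = blended_cost_factor(spot_share=0.70, spot_discount=0.50, on_demand_discount=0.20)
print(f"effective cost: {factor:.2f} of On-Demand, reduction: {1 - factor:.0%}")
```

Under these assumed discounts the fleet pays about 59% of the full On-Demand price. Real Spot prices vary by instance type, region, and time, so the actual reduction depends on the pools in use.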

Infrastructure as code, via Terraform, ensures that the entire EKS cluster, node groups, and RabbitMQ deployment can be recreated in minutes. This reprovisioning agility allowed the team to experiment with instance types and pricing models that were too risky to try manually.

Operational Wins Beyond Cost

The migration also delivered less quantifiable but equally important benefits:

  • Immutable deployments: Every code change now flows through a CI/CD pipeline that builds a new container image, scans it for vulnerabilities, and rolls it out via Kubernetes rolling updates. Rollbacks are instant if smoke tests fail.
  • Security hardening: The Windows container host is patched automatically as part of EKS managed node group updates. Pods run with restricted security contexts, and the application no longer requires local admin rights—a long-standing thorn in the side of the security team.
  • Disaster recovery: Because the entire state is externalized to Redis, and the database is Amazon RDS Multi-AZ, restoring the payment engine after a zone failure is a matter of pointing a new EKS cluster at the same RabbitMQ broker and Redis endpoint. Recovery time was slashed from hours to minutes.

Lessons Learned for the .NET Framework Crowd

Tipalti’s journey surfaces several hard-won lessons for teams contemplating a similar move:

  1. Windows container images are not free. At 12 GB, the image size impacts CI/CD pipeline speed, network transfer costs, and node warm-up time. Invest in image-layer caching and a registry close to your EKS cluster.
  2. Persistent storage remains tricky. While the payment engine became stateless at the application level, IIS worker processes still need writable temp spaces. Tipalti used Amazon EBS volumes mounted via Kubernetes persistent volumes to handle temporary file needs, but they encountered Windows container file permission quirks that required custom PowerShell scripts in the entrypoint.
  3. Kubernetes networking on Windows is different. Windows pods use a different CNI plugin (Windows VPC CNI) and have strict limits on the number of pods per node based on ENI attachment rules. The team hit pod limits early and had to adjust node group scaling triggers to avoid scheduling failures.
  4. Don’t fight the monolith’s nature. The team considered breaking the monolith into microservices but quickly abandoned the idea after analyzing the tight coupling. Instead, they packaged the entire application as a single container image and ran it as one deployable unit, focusing on infrastructure agility rather than code refactoring. This pragmatic choice delivered 80% of the benefit with 20% of the effort.

What’s Next: Toward .NET 8 and Beyond

Tipalti is not stopping here. With the payment engine now running in containers, they plan to incrementally port components to .NET 8 when the business cycle allows. The containerization move has already forced the team to modularize configuration, externalize state, and adopt CI/CD practices that will make a future re-platform smoother.

For now, the .NET Framework monolith hums along inside Kubernetes, proving that legacy Windows applications don’t have to be anchors. With the right orchestration, they can become agile, cost-efficient, and even—dare we say it—cloud-native.