Rendering today’s hottest animated films requires skilled artists, massive compute time, and lots of power. We here at Arc Productions started to get the idea that powering our downtown Toronto studio might be a problem when we had to give up a portion of our parking lot for a new one-megawatt transformer. About a year later, our power bills exceeded $20,000 per month, and we knew we had to make measurement and reduction a core focus.
We never expected that installing a software solution in our datacenter and working with TSO Logic would not only cut our “render farm” energy costs but also give us more dynamic management of our datacenter operations.
Arc Productions is one of North America’s leading CG animation and visual effects studios. The Arc team has gained recognition in the industry for its animation in the popular feature films Gnomeo & Juliet and 9, and the visual effects work in The Amazing Spider-Man and Halo 4: Forward Unto Dawn. With more than 250 artists and the latest tools available, Arc Productions brings to life the creative visions of major Hollywood studios, including our own.
As you can imagine, the work we perform is computationally intensive. Each frame of a shot, be it animation or visual effects, requires the generation of images using sophisticated computer models. Rendering of some frames can take many hours or even days of compute time.
To support this work, we use 45,000 square feet of workspace and a 600-server datacenter containing 5,400 power-hungry cores that give us the rendering capacity we need. Our business is characterized by tight deadlines, and our server infrastructure must be available on demand with high uptime.
On the IT side, that high-uptime requirement leads to significant costs, so we wanted to learn more about how we were using power and to identify ways to run a lean, efficient, and cost-effective operation without sacrificing performance.
We were introduced to a software product from TSO Logic described as an “application-aware power management” solution.
We started out by running their product in Metrics mode and gathering detailed analytics about the incoming workload, applications, and jobs being processed in our datacenter. The product then detailed the power costs on a job-by-job basis, and showed us exactly how much of our power cost went to servers that were actively in use versus how much was wasted on idle servers sitting in reserve capacity. The nice thing about the software was that it was installed on a single server and didn’t involve using agents, which we thought might be disruptive and risky.
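As a rough illustration of this kind of job-by-job accounting (not TSO Logic's actual method), the sketch below splits energy cost between named jobs and idle time. The `ServerSample` record, the wattages, the job names, and the electricity price are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerSample:
    server: str
    hours: float          # length of the sampling interval
    watts: float          # average draw over the interval (hypothetical)
    job: Optional[str]    # job being rendered, or None if the server sat idle


def split_power_cost(samples, price_per_kwh):
    """Return (cost_by_job, idle_cost) in currency units."""
    cost_by_job = {}
    idle_cost = 0.0
    for s in samples:
        # watts * hours gives watt-hours; divide by 1000 for kWh
        cost = s.watts * s.hours / 1000.0 * price_per_kwh
        if s.job is None:
            idle_cost += cost
        else:
            cost_by_job[s.job] = cost_by_job.get(s.job, 0.0) + cost
    return cost_by_job, idle_cost


# Hypothetical samples for two render nodes
samples = [
    ServerSample("render01", 2.0, 400.0, "shot_A"),
    ServerSample("render01", 6.0, 250.0, None),
    ServerSample("render02", 4.0, 400.0, "shot_B"),
    ServerSample("render02", 4.0, 400.0, "shot_A"),
]
cost_by_job, idle_cost = split_power_cost(samples, price_per_kwh=0.10)
```

Even in this toy data set, the idle interval on one node costs nearly as much as an entire job, which is the kind of pattern the analytics made visible.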
The results we found right off the bat were pretty eye-opening. One of our first discoveries was that more than half of our servers were idle 69 percent of the time, yet they were running at full readiness, wasting a significant amount of energy and money. We realized that by power-controlling these servers when they were not required, we could reap significant annual savings on our 600-server farm.
Next, we began testing TSO’s ability to power control our servers. We started with a small segment in off hours, and before long, it was running 24/7 across our farm. Using the system’s settings, we applied our own service-level policy, and TSO enforced it. By doing this, we reduced our render-farm power cost by an impressive 56 percent while still maintaining the same performance levels. When our servers were needed, they could support the demand, and when they weren’t, we were saving money. It just made sense.
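The article does not describe the policy itself, so the following is only a minimal sketch of what a service-level rule of this kind might look like: keep enough servers powered on to cover the queued work plus a fixed reserve, and let the rest be powered down. The function name, its parameters, and the example numbers are assumptions for illustration, not TSO Logic settings.

```python
import math


def servers_to_keep_on(queued_jobs, jobs_per_server, reserve, total_servers):
    """Hypothetical policy: cover the queue, plus a reserve, capped at farm size."""
    needed = math.ceil(queued_jobs / jobs_per_server) + reserve
    return min(total_servers, needed)


# Hypothetical scenarios on a 600-server farm with a 10-server reserve
quiet = servers_to_keep_on(queued_jobs=0, jobs_per_server=4, reserve=10, total_servers=600)
busy = servers_to_keep_on(queued_jobs=1000, jobs_per_server=4, reserve=10, total_servers=600)
crunch = servers_to_keep_on(queued_jobs=5000, jobs_per_server=4, reserve=10, total_servers=600)
```

The reserve is what protects performance: capacity for new work is always warm, while servers beyond that margin only draw power when the queue calls for them.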
Once we had a clear picture of our energy consumption patterns, we also realized the impact our power usage had during daily peak periods -- which often resulted in peak demand charges from our utility. We were able to shift our workload by allocating low-priority business applications to be processed by our servers during off-peak hours to eliminate peak demand charges.
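A simple way to picture this kind of workload shifting is a rule that holds low-priority jobs until the utility's peak window has passed. The sketch below assumes a hypothetical peak window of 11:00 to 16:59 and two priority labels; the real window and priorities would come from the utility tariff and the studio's own scheduling rules.

```python
PEAK_HOURS = range(11, 17)  # hypothetical peak window: 11:00 through 16:59


def start_hour(submitted_hour, priority):
    """Return the hour (0-23) at which a job is allowed to start."""
    if priority == "high" or submitted_hour not in PEAK_HOURS:
        return submitted_hour          # run immediately
    return PEAK_HOURS.stop % 24        # defer to the first off-peak hour


deferred = start_hour(12, "low")       # submitted during the peak window
urgent = start_hour(12, "high")        # deadline work is never held back
evening = start_hour(20, "low")        # already off-peak
```

Deadline-driven rendering is untouched by the rule; only work that can wait is moved out of the window in which demand charges are set.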
Through the use of the analytics available with this software, we were able to discover 120 underutilized legacy servers in our datacenter. These servers represented 20 percent of our total server farm, but were processing less than 5 percent of the total workload. We then decided as a company to make a CAPEX spend to replace these servers. Because the replacements were newer, faster servers with better specifications, we were able to operate at 400 percent of the capacity of the older equipment, even if the old equipment had been running at 100 percent utilization. We would not have seen how low the utilization was without TSO Logic.
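The proportions quoted above can be checked with a few lines of arithmetic. The sketch uses only the figures from the text (600 servers, 120 legacy servers, under 5 percent of the workload); the variable names are illustrative, and 5 percent is treated as an upper bound.

```python
total_servers = 600
legacy_servers = 120
legacy_workload_share = 0.05   # upper bound: "less than 5 percent"

# Fraction of the farm made up of legacy machines
legacy_server_share = legacy_servers / total_servers

# Work carried by legacy machines relative to an even per-server split
load_vs_even_split = legacy_workload_share / legacy_server_share

print(f"Legacy share of the farm: {legacy_server_share:.0%}")
print(f"Load relative to an even split: at most {load_vs_even_split:.0%}")
```

In other words, the legacy machines made up a fifth of the farm but carried at most a quarter of the work that an even per-server split would have given them.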
On the business side, we now understand exactly how much energy is being used for individual projects and tasks. This level of visibility allowed us to further understand our business' operational costs at a micro-scale by breaking down our servers' variable energy costs and attaching them to individual business functions.
The most surprising thing about the whole experience was finding out just how much opportunity there is to lower costs in a standard datacenter setup. It is truly a target-rich environment when you have the right analytics and tools. We now receive insights and savings that we’ve never had before -- in a variety of different areas, not just power consumption. We achieved this without disrupting our infrastructure or day-to-day operations.
— Terry Dale is the Vice President of Infrastructure at Arc Productions, a CG animation and visual effects studio with a well-earned global reputation for quality, efficiency, and reliability.