
Wednesday, Nov 05, 2014

The Cloud Battle, A War to Sell Data Center Bits - Amazon, Google, Microsoft

This time of year is turning into a Cloud Battle, a war between Amazon, Google, and Microsoft. iPhone vs. Android is a battle of mobile bits. OS X vs. Windows 7/8/10 is a battle of desktop bits. The Cloud is a battle to deliver bits as a service from data centers.

Microsoft has had its cloud event, Google just finished theirs, and next week is AWS re:Invent. The media covers the battles.

Google's Newest Attack On Amazon

When I read many of the media articles, though, I think they are focused on how big the fleet is or on the latest technology. Huh? As this article argues, measuring naval power by the tonnage of the fleet misses the point.

Measuring Naval Power: Bigger Ain’t Always Better

...

Navies were largely symmetrical in those thrilling days of yesteryear. That simplified matters. Size was a decent proxy for fighting power when battle fleets made up largely of capital ships bearing big guns squared off. That was before the era — an era that persists to this day — when small craft could carry armament comparable to that of capital ships. A destroyer couldn’t tote big guns back then. A lowly missile boat or sub can fire munitions comparable to those of a capital ship today — and to the same deadly effect.

I have had the chance to see close at hand how executives at Google (Urs Hoelzle), Amazon (Werner Vogels), and Microsoft (Scott Guthrie) perform at Gigaom Structure, on stage and behind the scenes. It's kind of like seeing the generals and admirals of the military.

This is not a simple battle where more servers and more MW of data center capacity win the war. How well your team operates using the technology, which in the case of the bits (software) was created by other team members, is just as important.

I think I could write a whole book on the battles between Google, Amazon, and Microsoft. In fact, I am sure someone has already made a book proposal for this. Unfortunately or fortunately, I am too busy working on other things to document it all in an entertaining way to sell a book. What I can do is watch as an observer and see the strategies being played.

The Cloud Battle may be one of the most interesting technology wars, fought with billions of dollars of data centers and IT equipment and tens of thousands of development staff, reaching around the world.

Below is a map of Google's Points of Presence.

[Image: Google's Points of Presence]

Oh, one point I forgot that I do want to make. Just like point 18 of Sun Tzu's The Art of War, "All warfare is based on deception." The good know how to deceive the enemy, and they can use the media to spread the deception. Don't believe everything you read.

18. All warfare is based on deception.

Monday, Nov 03, 2014

Google's Focus on Performance improves Data Center PUE 8 - 25%, finding the hidden story in the data

Google announced its use of Machine Learning to improve its data center PUE in May 2014, and I posted on the release. At the 7x24 Exchange Fall 2014 event, 25 years of 7x24 Exchange were celebrated, and Google's Joe Kava, VP of Data Centers, presented "Google - beyond the PUE Plateau." The keynote is one of the more interesting and insightful presentations made, as Google shared information on its experience deploying Machine Learning across its data center fleet. One of the questions from the audience was "How was the first data center chosen to use Machine Learning?" A special guest in the presentation was the data center mechanical engineer who spearheaded the project, Jim Gao. His answer: the data center that had the most clean data to work with.

[Image: Jim Gao and Joe Kava, 7x24 Exchange Fall Conference]

So what can this 25-year-old mechanical engineer do with Machine Learning? Below is data showing PUE across a range of wet bulb and cooling temperatures. The blue areas are good, green not as good, and yellow and red are bad.

[Image: PUE vs. wet bulb and cooling temperature]

Some of you may be saying, big deal. I can figure out how to run the mechanical systems with a low PUE at a given wet bulb temperature to hit a given cooling temperature. Well, the above was a graph to illustrate what can be seen by looking at performance data. What is beyond our ability to see is working out the best way to run your mechanical systems with 19 input variables. Below are the 19 inputs the Predictive PUE Machine Learning system uses to figure out the lowest energy consumption.

[Image: The 19 input variables to the Predictive PUE model]

FYI, this predictive PUE system does not have autonomous control over the mechanical systems. It provides information to the data center facility engineering teams on how they can improve PUE performance. The predictive PUE model is 99.6% accurate. Jim and Joe discussed how Google looked for a high degree of confidence in order to trust the numbers, and the human operators are an important part of the process, like UPS drivers on their routes. UPS is famous for computing better routes for its drivers, but I bet they were not even close to the percentage savings Google achieved.
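Google has not published the model code, but the shape of the approach is easy to sketch. Below is a minimal, hypothetical Python sketch using scikit-learn: train a neural-net regressor on historical plant telemetry to predict PUE (a handful of invented stand-in columns here, not Google's actual 19 inputs), then sweep one controllable setpoint and surface the predicted-lowest-PUE setting as advice to operators, never as an autonomous action.

```python
# Minimal, hypothetical sketch of a predictive-PUE advisor -- not Google's code.
# The telemetry file and column names are invented stand-ins.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A few stand-ins for the 19 inputs: IT load, weather, and cooling setpoints.
FEATURES = ["it_load_kw", "wet_bulb_c", "chilled_water_setpoint_c",
            "cooling_tower_fans_on", "pump_speed_pct"]

df = pd.read_csv("plant_telemetry.csv")  # hypothetical historical plant data
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["pue"], test_size=0.2, random_state=0)

# A small neural-net regressor trained to predict PUE from plant conditions.
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000))
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))

# Advisory use only: hold the current load/weather fixed, sweep one
# controllable setpoint, and report the predicted-lowest-PUE setting
# to the facility engineers rather than actuating anything.
current = df[FEATURES].iloc[[-1]].copy()
best_sp, best_pue = min(
    ((sp, model.predict(current.assign(chilled_water_setpoint_c=sp))[0])
     for sp in range(5, 16)),
    key=lambda t: t[1])
print(f"Suggested chilled water setpoint: {best_sp} C (predicted PUE {best_pue:.3f})")
```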

So how good are the results? Google achieved from 8% to 25% reduction in the energy used to cool the data center, with an average of 15%. Who wouldn't be excited to save an average of 15% on their cooling energy costs just by getting new settings to run the mechanical plant? Below is an example of historical PUE (blue) and new PUE (green) for a site.

[Image: Historical PUE (blue) vs. new PUE (green) for a site]

One of the risks Google took in this presentation is that they let a 25-year-old mechanical engineer get on stage. Was the risk the kid presenting? No, Jim was as polished as many who have presented for years. The risk was that everyone at 7x24 Exchange now knew who Jim was, and they could try to see if he would consider leaving Google. :-)


The idea of using Machine Learning in data centers is new and has shown what can be discovered in the data. It's like there was a hidden story waiting to be told. Does your data center staff look for hidden stories in the data? Shouldn't they, if you can save between 8% and 25% of the energy in your systems?

Tuesday, Oct 28, 2014

15+ years of Google Data Center Executives

I wrote a popular post on 10 years of Microsoft data center executives. Writing about Google's data center executives is a good follow-up.

Google's current data center executive leadership is Urs Hoelzle, Ben Treynor, and Joe Kava. Urs has no LinkedIn profile, but he does have a Wikipedia page and has been with Google since the beginning as employee #8. The data center group is part of Ben Treynor's organization; Ben joined Google in 2003. The VP of Data Centers is Joe Kava, who joined Google in 2008.

Urs posted about the Google datacenter of 1999 (shared publicly on Google+, Feb 4, 2014):
15 years ago (on Feb 1st, 1999) I first set foot in a Google datacenter. Well, not really -- in the Google cage in the Exodus datacenter in Santa Clara.  Larry had led me there for a tour (I wasn't an employee yet) and it was my first time in any datacenter.  And you couldn't really "set foot" in the first Google cage because it was tiny (7'x4', 2.5 sqm) and filled with about 30 PCs on shelves.  a1 through a24 were the main servers to build and serve the index and c1 through c4 were the crawl machines.
 

It is not easy to find who the data center executives were from 1999 to 2003. Ben Treynor's arrival in 2003 was the start of Site Reliability Engineering at Google, and according to Ben's LinkedIn profile he picked up the data center group in 2010 and in 2014 became responsible for the Google Cloud.

Vice President, Engineering

Google

October 2003 – Present (11 years 1 month), Mountain View, CA

Responsibilities:
Site Reliability Engineering: 2003-present
Global Networking: backbone, egress, datacenter, and corporate: 2004-present
Global Datacenters: construction, engineering & operations: 2010-present
Global Servers: operations 2010-present
Google Cloud: 2014-present

Joe Kava has been the consistent presenter from Google on what is happening in the data center group, presenting at 7x24 Exchange, Uptime Symposium, Datacenter Dynamics, and many other industry events.

Vice President - Data Centers

Google

April 2008 – Present (6 years 7 months), Mountain View, California

Responsible for design, engineering, construction, operations and sustainability for Google's global data centers.

Wednesday, Oct 22, 2014

Comparing Microsoft's VP of Cloud Infrastructure to Google's VP of Data Centers via LinkedIn Profiles

Microsoft has put a new VP in charge of its Cloud Infrastructure group, retiring the role of VP of Global Foundation Services. GFS's logo looked like this.

[Image: Global Foundation Services logo]

Global Foundation Services (GFS) is the engine that powers Microsoft's cloud services.

When I do a Google search for "Microsoft Global Foundation Services," what shows up is Microsoft Cloud Platform, with little trace of Global Foundation Services; the words Global Foundation Services (GFS) are gone.

[Image: Google search results for "Microsoft Global Foundation Services"]

So the changes have started in Microsoft's data center group. What changes are ahead?

One way to look at what the future will be like is to compare the new Microsoft VP's public profile with a competitor's. I could pick Amazon as the competitor, but Google is bigger in terms of data center presence. So let's look at Microsoft's Suresh Kumar, VP of Cloud Infrastructure and Operations, vs. Google's Joe Kava, VP of Data Centers. The below is from their LinkedIn profiles as of Oct 21, 8:30p. I am referencing the date and time of this post as things may change as the profiles get modified. Two days ago Suresh's picture was this.

[Image: Suresh Kumar, via LinkedIn]

Now on LinkedIn Suresh’s photo is below.

[Image: Suresh Kumar's new LinkedIn photo]

Both Suresh and Joe have 500+ connections.

On Suresh's profile, his top skill, at 27 endorsements, is E-Commerce. Joe's top skill, at 117 endorsements, is Strategy.

Joe has 66 endorsements for Data Centers. Suresh has 0.

Here are Suresh's top 10 skills.

[Image: Suresh's top 10 skills, via LinkedIn]

Here are Joe's top 10 skills.

[Image: Joe's top 10 skills, via LinkedIn]

The one area where Suresh and Joe are close is Cloud Computing, at 11 and 14 endorsements respectively.

[Image: Suresh's Cloud Computing endorsements]

[Image: Joe's Cloud Computing endorsements]

When you look at the above numbers, who would you choose to build your cloud/data center infrastructure? This has been an interesting way to look at two different executives using LinkedIn profiles. With fresh eyes I went and looked at the skills listed on my own LinkedIn profile. You may want to as well, and think about how your skills are listed.

Oh, the other area where Suresh and Joe are equal: it looks like both of them now have photos that their corporate PR groups say are OK to have on a public-facing site.

[Image: Joe Kava, via LinkedIn]

Sunday, Oct 12, 2014

Two Ways to Save Server Power - Google (Tune to Latency) vs. Facebook (Efficient Load Balancing)

Saving energy in the data center is about more than a low PUE. Using 100% renewable power while wasting energy is not a good practice. I've been meaning to post on what Google and Facebook have done in this area, and have been staring at these open browser tabs for a while.

First, in June 2014 Google shared its method of turning the power consumption of a server down as low as it can go while still meeting performance latency targets. The Register covered this method.

Google has worked out how to save as much as 20 percent of its data-center electricity bill by reaching deep into the guts of its infrastructure and fiddling with the feverish silicon brains of its chips.

In a paper to be presented next week at the ISCA 2014 computer architecture conference entitled "Towards Energy Proportionality for Large-Scale Latency-Critical Workloads", researchers from Google and Stanford University discuss an experimental system named "PEGASUS" that may save Google vast sums of money by helping it cut its electricity consumption.


The Google paper is here.

We presented PEGASUS, a feedback-based controller that implements iso-latency power management policy for large-scale, latency-critical workloads: it adjusts the power-performance settings of servers in a fine-grain manner so that the overall workload barely meets its latency constraints for user queries at any load. We demonstrated PEGASUS on a Google search cluster. We showed that it preserves SLO latency guarantees and can achieve significant power savings during periods of low or medium utilization (20% to 40% savings). We also established that overall workload latency is a better control signal for power management compared to CPU utilization. Overall, iso-latency provides a significant step forward towards the goal of energy proportionality for one of the challenging classes of large-scale, low-latency workloads.
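PEGASUS itself is not public, but the quoted description is enough to sketch the control idea. Here is a toy Python sketch of an iso-latency feedback loop, with the latency measurement and power-cap writes stubbed out; a real controller might apply caps through something like Intel's RAPL interface, and the thresholds below are invented.

```python
# Toy iso-latency feedback controller in the spirit of PEGASUS -- not the
# actual implementation. Measurements and power-cap writes are stubbed.
import random
import time

SLO_LATENCY_MS = 50.0            # invented target tail latency for queries
MIN_CAP_W, MAX_CAP_W = 40.0, 120.0
STEP_W = 5.0

def measure_tail_latency_ms():
    """Stub: in production this would read the workload's tail latency
    from the serving system's monitoring."""
    return random.uniform(30.0, 60.0)

def set_power_cap(watts):
    """Stub: a real controller would push the cap to every server,
    e.g. via a power-capping interface such as RAPL."""
    print(f"power cap -> {watts:.0f} W")

def control_loop(cycles=10, period_s=1.0):
    cap = MAX_CAP_W
    for _ in range(cycles):
        latency = measure_tail_latency_ms()
        if latency > SLO_LATENCY_MS:
            # Missing the SLO: restore power headroom quickly.
            cap = min(MAX_CAP_W, cap + 2 * STEP_W)
        else:
            # Comfortably under the SLO: shave power a little so the
            # workload "barely meets" its latency constraint.
            cap = max(MIN_CAP_W, cap - STEP_W)
        set_power_cap(cap)
        time.sleep(period_s)

if __name__ == "__main__":
    control_loop()
```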

Facebook in Aug 2014 shared Autoscale, its method of using load balancing to reduce energy consumption. Gigaom covered this idea.

The social networking giant found that when its web servers are idle and not taking user requests, they don’t need that much compute to function, thus they only require a relatively low amount of power. As the servers handle more networking traffic, they need to use more CPU resources, which means they also need to consume more energy.

Interestingly, Facebook found that during relatively quiet periods like midnight, while the servers consumed more energy than they would when left idle, the amount of wattage needed to keep them running was pretty close to what they need when processing a medium amount of traffic during busier hours. This means that it’s actually more efficient for Facebook to have its servers either inactive or running like they would during busier times; the servers just need to have network traffic streamed to them in such a way so that some can be left idle while the others are running at medium capacity.
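To make the arithmetic concrete with made-up numbers (illustrative only, not Facebook's figures): suppose a web server draws 60 W idle, 130 W at low load, and 150 W at medium load. Spreading a quiet period's traffic across 100 servers at low load costs 100 × 130 W = 13.0 kW. Concentrating the same traffic on 50 servers at medium load and leaving 50 idle costs 50 × 150 W + 50 × 60 W = 10.5 kW, roughly a 19% saving from load placement alone.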

Facebook's post on Autoscale is here.

Overall architecture

In each frontend cluster, Facebook uses custom load balancers to distribute workload to a pool of web servers. Following the implementation of Autoscale, the load balancer now uses an active, or “virtual,” pool of servers, which is essentially a subset of the physical server pool. Autoscale is designed to dynamically adjust the active pool size such that each active server will get at least medium-level CPU utilization regardless of the overall workload level. The servers that aren’t in the active pool don’t receive traffic.

[Figure 1: Overall structure of Autoscale]

We formulate this as a feedback loop control problem, as shown in Figure 1. The control loop starts with collecting utilization information (CPU, request queue, etc.) from all active servers. Based on this data, the Autoscale controller makes a decision on the optimal active pool size and passes the decision to our load balancers. The load balancers then distribute the workload evenly among the active servers. It repeats this process for the next control cycle.
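Facebook has not released Autoscale, but the control loop described above is straightforward to sketch. Below is a minimal Python version with invented thresholds: collect utilization from the active servers, size the active pool so each one lands near medium CPU utilization, and send no traffic to the rest.

```python
# Minimal sketch of an Autoscale-style control cycle -- hypothetical
# thresholds; Facebook's actual controller and numbers are not public.
import math

TARGET_UTIL = 0.55   # aim each active server at "medium" CPU utilization
MIN_ACTIVE = 2       # always keep a safety floor of active servers

def choose_active_pool(servers, cpu_util_by_server):
    """One control cycle: decide which servers stay in the active pool.

    servers: all server names in the frontend cluster.
    cpu_util_by_server: current CPU utilization (0..1) of each active server.
    """
    # Total work currently being done, in "server-equivalents".
    total_work = sum(cpu_util_by_server.values())
    # Size the pool so each active server lands near the target utilization.
    pool_size = max(MIN_ACTIVE, math.ceil(total_work / TARGET_UTIL))
    pool_size = min(pool_size, len(servers))
    return servers[:pool_size]   # the rest receive no traffic this cycle

# Example cycle: 8 servers, the current pool running at ~30% CPU each.
servers = [f"web{i}" for i in range(1, 9)]
util = {s: 0.30 for s in servers}        # 8 * 0.30 = 2.4 server-equivalents
active = choose_active_pool(servers, util)
print(f"active pool: {active}")          # ceil(2.4 / 0.55) = 5 servers
```

In a real deployment the decision would feed the load balancers each cycle, which then spread traffic evenly across only the active pool, closing the feedback loop the Facebook post describes.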