Saturday, September 17, 2016

Managing Docker containers using Terraform on Windows 10

It started with evaluating Apcera, a container management platform, at work. While Apcera "can" run on a bare OS (i.e. one installs Ubuntu 14.04 and runs some scripts), it is primarily made to run on top of an IaaS provider like OpenStack or AWS. After successfully installing Apcera on a bare OS (see https://github.com/baboune/UnofficialApceraDoc/), my next goal was to set it up on OpenStack to troubleshoot a cloud-init issue that a colleague was experiencing.

When installing towards an IaaS provider, Apcera relies on Terraform. But it only supports Terraform 0.6.16, not the latest version 0.7.3, and 0.7 is a major revision with breaking compatibility changes. Needless to say, this was not in the Apcera documentation, and a fair amount of time was lost figuring out why the provided Terraform files were failing (parsing errors, and integer values not being supported, see issue 6254).

So, I decided to take a look at Terraform over the weekend, and learn a little about it.

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. It can manage existing and popular service providers as well as custom in-house solutions. It provides infrastructure as code, execution plans, a resource graph, and change automation. It stops where configuration management tools like Puppet, Ansible and Chef start.

One of the providers supported by Terraform is Docker. A priori, it looked simple enough to try out and pick up a few Terraform basics with, and it is quick and local to my Windows 10 setup. So, pressing Windows key -> Docker Quickstart Terminal launches the Docker default VM (VirtualBox provider) via docker-machine.
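
Before going further, it is worth checking that the VM is actually up; docker-machine can list it (output along these lines on my setup):

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
default   *        virtualbox   Running   tcp://192.168.99.100:2376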

The next step is to create a project, and to attempt to launch a simple Ubuntu container as per the tutorial (see https://www.terraform.io/docs/providers/docker/index.html).

In this example tutorial, the terraform source looks like this:

# Configure the Docker provider
provider "docker" {
    host = "tcp://127.0.0.1:1234/"
}

# Create a container
resource "docker_container" "foo" {
    image = "${docker_image.ubuntu.latest}"
    name = "foo"
}

resource "docker_image" "ubuntu" {
    name = "ubuntu:latest"
}

The above needs to be saved in a "*.tf" file, e.g. example.tf, in a local directory. Note that ".tf" is the Terraform file extension, and all files with this extension found within the directory are automatically included when running a terraform command.

There I faced a first problem: which IP, port, and protocol to use in the provider section? I knew the IP was that of the Docker VirtualBox VM (192.168.99.100), but not the port, and I assumed the protocol should be HTTP since the Docker Remote API is REST based.

Finding this information is easy. In the Quick Start console, simply type docker-machine config.

baboune MINGW64 ~
$ docker-machine config
--tlsverify
--tlscacert="C:\\Users\\baboune\\.docker\\machine\\certs\\ca.pem"
--tlscert="C:\\Users\\baboune\\.docker\\machine\\certs\\cert.pem"
--tlskey="C:\\Users\\baboune\\.docker\\machine\\certs\\key.pem"
-H=tcp://192.168.99.100:2376

Thus the host IP/protocol is tcp://192.168.99.100:2376.

But after updating the example.tf file with that information, and trying a terraform plan command, I still got a malformed HTTP response error (second problem).

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but
will not be persisted to local or remote state storage.

Error refreshing state: 1 error(s) occurred:
* Error pinging Docker server: Get http://192.168.99.100:2376/_ping: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

A quick Google search turns up no matching results for my Docker setup (Docker Toolbox, recent Docker version). There are a few similar issues, but those involve boot2docker, which was deprecated in favor of docker-machine many releases ago. Also, a quick docker info shows that I am running release 1.12.1.

A netstat on the VM shows that port 2376 is open and associated with the docker server:

$ netstat -anp | grep 2376
tcp           0       0         :::2376        :::*        LISTEN        2662/dockerd

And a curl to the remote Docker API returns no visible results:

$ curl -XGET http://192.168.99.100:2376/_ping

$

But it does not error out either.
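
(In hindsight, the empty reply was the TLS handshake being rejected; a curl that presents the client certificates does get a proper answer from the same endpoint. A sketch, using the certs directory from my setup; OK is the expected reply of the /_ping endpoint:)

$ curl --cacert /c/Users/baboune/.docker/machine/certs/ca.pem \
       --cert /c/Users/baboune/.docker/machine/certs/cert.pem \
       --key /c/Users/baboune/.docker/machine/certs/key.pem \
       https://192.168.99.100:2376/_ping
OK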

It is at this point that I made the connection between the certificates directory listed by the docker-machine config command and the Docker Remote API documentation. Looking back at the Terraform Docker provider documentation, it says:

The following arguments are supported:
  • host - (Required) This is the address to the Docker host. If this is blank, the DOCKER_HOST environment variable will also be read.
  • cert_path - (Optional) Path to a directory with certificate information for connecting to the Docker host via TLS. If this is blank, the DOCKER_CERT_PATH will also be checked.
While cert_path is indicated as optional, it seems that Terraform requires this attribute on most deployments, since docker-machine, as per the Docker Remote API documentation, starts a Docker daemon that listens on a TLS-encrypted TCP socket.
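
Note that docker-machine can also hand out these values as environment variables, which, per the documentation quoted above, the provider falls back to when the attributes are left blank. A sketch (the exact quoting differs slightly between shells):

$ docker-machine env default
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="C:\Users\baboune\.docker\machine\machines\default"
export DOCKER_MACHINE_NAME="default"
# Run this command to configure your shell:
# eval $(docker-machine env default)

DOCKER_CERT_PATH points at the machine-specific directory rather than the shared certs one; both contain the ca.pem/cert.pem/key.pem trio the provider needs. I went with explicit attributes instead.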

The working example.tf then becomes:

# Configure the Docker provider
provider "docker" {
    host = "tcp://127.0.0.1:1234/"
    cert_path = "c:\\Users\\Nicolas\\.docker\\machine\\certs"
}

# Create a container
resource "docker_container" "foo" {
    image = "${docker_image.ubuntu.latest}"
    name = "foo"
}

resource "docker_image" "ubuntu" {
    name = "ubuntu:latest"
}

HashiCorp allows users to update the documentation via GitHub, which led me to create a pull request (https://github.com/hashicorp/terraform/pull/8895) to update it with at least the default dockerd port information.

And terraform plan now works:


$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but
will not be persisted to local or remote state storage.

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ docker_container.foo
    bridge:           "<computed>"
    gateway:          "<computed>"
    image:            "${docker_image.ubuntu.latest}"
    ip_address:       "<computed>"
    ip_prefix_length: "<computed>"
    log_driver:       "json-file"
    must_run:         "true"
    name:             "foo"
    restart:          "no"

+ docker_image.ubuntu
    latest: "<computed>"
    name:   "ubuntu:latest"

Plan: 2 to add, 0 to change, 0 to destroy.
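
From here, the natural next steps, left as a sketch:

$ terraform apply      # pulls ubuntu:latest and starts the container "foo"
$ docker ps -a         # confirm from the Quickstart terminal that foo exists
$ terraform destroy    # tear both resources down again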

Friday, May 13, 2016

Ubuntu 16.04 and Docker 1.11 - Accessing secured private registry

After trying to access a private registry secured via SSL, and adding the certificate authority file (ca.pem) under

  /etc/docker/certs.d/<IP of registry>

where my registry IP is 10.68.230.7, the pull/push requests still failed.

$ docker pull 10.68.230.7/alpine
Using default tag: latest
Error response from daemon: Get https://10.68.230.7/v1/_ping: x509: certificate signed by unknown authority

It seems Docker only picks up CA certificates with the .crt extension in that directory, and ignores the ca.pem file name. To fix this, rename (or copy) the ca.pem file to ca.crt.

$ cp /etc/docker/certs.d/10.68.230.7/ca.pem /etc/docker/certs.d/10.68.230.7/ca.crt

Then it works.

$ docker pull 10.68.230.7/alpine
Using default tag: latest
latest: Pulling from alpine
d0ca440e8637: Already exists 
Digest: sha256:5c826f3f0f5c34aca4df43360ec0faef6326b18bd311309cc8ae3a83f799d1eb
Status: Downloaded newer image for 10.68.230.7/alpine:latest
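
For reference, Docker treats the files in that directory by extension: *.crt files are trusted as CA roots, while *.cert and *.key pairs are used as client certificates. The expected layout thus looks like this (the client pair is only needed if the registry demands client authentication):

/etc/docker/certs.d/
└── 10.68.230.7/
    ├── ca.crt       <- CA root; must end in .crt, hence the rename
    ├── client.cert  <- optional client certificate
    └── client.key   <- optional client key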

Ubuntu 16.04, systemd and Docker

Ubuntu 16.04 LTS is now available. Having made the switch from 14.04 without really looking at the changes except for the kernel version (4.x), I was pleasantly surprised to find that it now uses systemd.

While trying to set up Docker to pull/push from a secured private registry, I first attempted to change the logging level to debug by adding -D in /etc/default/docker, and after restarting Docker noticed that no "debug" logs were shown.

It took me a while to find out, but /etc/default/docker is not used anymore under systemd.

This fact is in the /etc/default/docker file but easy to miss:

# Docker Upstart and SysVinit configuration file
#
# THIS FILE DOES NOT APPLY TO SYSTEMD
#
#   Please see the documentation for "systemd drop-ins":
#   https://docs.docker.com/engine/articles/systemd/
#
# Customize location of Docker binary (especially for development testing).
#DOCKER="/usr/local/bin/docker"
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"
# If you need Docker to use an HTTP proxy, it can also be specified here.
#export http_proxy="http://127.0.0.1:3128/"
# This is also a handy place to tweak where Docker's temporary files go.
#export TMPDIR="/mnt/bigdrive/docker-tmp"

Also, one can find out where a service's unit file and configuration are located:

$ systemctl show --property=FragmentPath docker
FragmentPath=/usr/lib/systemd/system/docker.service
$ grep EnvironmentFile /usr/lib/systemd/system/docker.service
grep: /usr/lib/systemd/system/docker.service: No such file or directory

What the above tells us is that the grep failed because, on Ubuntu, the unit does not actually live under /usr/lib; the file is at /lib/systemd/system/docker.service (the one edited below), and it contains no EnvironmentFile entry, i.e. the docker service has no configuration file wired in at the moment.

There are different ways to configure services in systemd. The option described below works, but deviates from a standard systemd setup in the config location: since this is Ubuntu, the /etc/default path is used instead of /etc/sysconfig.

The first step to use a config file is to add the required information to the /lib/systemd/system/docker.service file, in the form of an EnvironmentFile attribute in the [Service] section:

$ vi /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket
[Service]
Type=notify
# see https://docs.docker.com/engine/admin/systemd
EnvironmentFile=-/etc/default/docker
ExecStart=/usr/bin/docker daemon -H fd:// $DOCKER_OPTS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
[Install]
WantedBy=multi-user.target


Now, after a systemctl daemon-reload, when the docker service is restarted it will use /etc/default/docker as its environment file and pull the environment variables from it.

As such, the final step is to define the matching variables (the names referenced in ExecStart) in /etc/default/docker:

# Docker Upstart and SysVinit configuration file

# Customize location of Docker binary (especially for development testing).
#DOCKER="/usr/local/bin/docker"

# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"

# If you need Docker to use an HTTP proxy, it can also be specified here.
#export http_proxy="http://127.0.0.1:3128/"

# This is also a handy place to tweak where Docker's temporary files go.
#export TMPDIR="/mnt/bigdrive/docker-tmp"
INSECURE_REGISTRY=""
DOCKER_STORAGE_OPTIONS=""
DOCKER_NETWORK_OPTIONS=""
BLOCK_REGISTRY=""

After systemctl daemon-reload and systemctl restart docker, the options (including the -D debug flag, if added to DOCKER_OPTS) should now take effect.
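
A quick way to verify that the options were picked up (the grep pattern assumes the docker daemon invocation shown in the unit file above):

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ ps -ef | grep "docker daemon"    # the --dns and any -D flag should show up here

Also note that editing /lib/systemd/system/docker.service directly means the change can be lost on the next package upgrade. The more standard systemd approach, which the header comment in /etc/default/docker hints at, is a drop-in; a minimal sketch (the drop-in file name is my own choice):

$ sudo mkdir -p /etc/systemd/system/docker.service.d
$ sudo tee /etc/systemd/system/docker.service.d/environment.conf <<'EOF'
[Service]
EnvironmentFile=-/etc/default/docker
# The empty ExecStart= clears the one shipped with the package before overriding it
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// $DOCKER_OPTS
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker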


Monday, August 24, 2015

Elasticsearch cluster balancing under heavy writes


We are using Elasticsearch-Logstash-Kibana (http://www.elasticsearch.org/overview/elkdownloads/) for an internal demo to receive an event stream, built using Scala Actors with Akka. The demo is used as a stand-alone application to provide strategic business insights, or to integrate with existing applications and interact with incoming data.

In our setup, probes in a managed core network in Holland stream events to a deployment zone in Sweden. This unbounded data set (or stream) of ~4-5K events per second (~1 KB each) is then sent to our lab in Kista (also in Sweden). Each event is enhanced with additional data (Akka), and transformed into a JSON object before being batch stored in Elasticsearch. We then visualize the results and do some simple analytics on the incoming stream.
We have indexed up to 800 million such events, for a total of ~1 TB (roughly 7 days of data). Storage limitations (not enough hardware) prevent us from keeping longer data sets.
Initially, we deployed 3 Elasticsearch servers to form a single cluster. Each node could be elected as master, and the voting requirement (discovery.zen.minimum_master_nodes) was set to 2. In other words, at least two nodes had to agree during the election of a new cluster master.
Challenges:
  • Garbage collection/split brain: Each Elasticsearch writer instance has 24 GB of RAM, of which 12 is used for the JVM heap. With the initial GC settings, long GC pauses could reach up to 2 minutes. If the paused node was the master, the other nodes would lose their connection to it, re-elect a new master, and start redistributing the data shards. If this coincided with another node's GC, or with a period when the number of file merges was high, the system would eventually end up in a split-brain situation.
    • As a result, our current setup uses the following strategy: 5 Elasticsearch nodes are deployed. Out of the 5, 3 are writer nodes (they store and index data; 24 GB RAM and 8 vCPUs each), 1 is a dedicated master node (no data, no indexing, nothing else running there, except that it is always the elected master; 2 GB RAM, 1 vCPU), and 1 is a client node (it receives indexing data and requests from Kibana, acts as a load balancer between the writers, and hosts Kibana; no storage, no indexing; 6 GB RAM and 4 vCPUs). Since the master is more or less idle, it is always available, thus no split brains. See the configuration sketch after this list.
  • Indexing data: We started by sending one event per request to Elasticsearch. This had the caveat of creating a very high number of file merges in the way Elasticsearch stores data (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/merge-process.html).
    • Bulk indexing performs better (see the bulk example after this list).
  • Garbage collector tuning: The GC defaults were good, but in our situation not aggressive enough. We had to tune the GC to trigger more frequent collections at a higher CPU cost, but that gave us stable performance on searches and writes.
  • Write-heavy configuration vs searches: As indicated above, some tuning is required to handle the type of load the cluster receives. We use daily indexes and a special mapping file (so that many of the fields are not fully analyzed, see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html), plus some tuning of parameters like 'indices.store.throttle', 'indices.fielddata.cache', etc.
  • IOPS: Fast disk access is quite important. Overall, using an OpenStack cluster with default replication at the virtual file system level is an issue.
    • We used ephemeral storage that, in our OpenStack environment, points to the hard drives (3 SSDs) of the physical compute host on which ES is running. This ensures data-write locality.
    • The plan is to move to containers soon; hypervisors (KVM) are just too heavy.
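
For concreteness, the node-role split described above boils down to two elasticsearch.yml settings per node; a sketch of the three flavors:

# writer nodes (x3): store and index data, never elected master
node.master: false
node.data: true

# dedicated master node: master-eligible, holds no data
node.master: true
node.data: false

# client node (hosts Kibana): no data, not master-eligible, routes requests
node.master: false
node.data: false

And the bulk-indexing switch amounts to posting newline-delimited action/document pairs to the _bulk endpoint instead of one document per request. A minimal sketch, with a hypothetical daily index and made-up fields:

$ cat > bulk.json <<'EOF'
{"index":{"_index":"events-2015.08.24","_type":"event"}}
{"source":"probe-nl-01","ts":1440403200,"bytes":1024}
{"index":{"_index":"events-2015.08.24","_type":"event"}}
{"source":"probe-nl-02","ts":1440403201,"bytes":2048}
EOF
$ curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json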

Sunday, October 26, 2014

Mirantis Fuel 5.1


I have been using Mirantis Fuel (https://github.com/stackforge/fuel-library/) for about a year now.  I started with version 4.x, and am now using version 5.1.  It is a great open source project!

Fuel is used to set up and manage clusters of machines, and allows for auto-deployment and installation of OpenStack via a nice and intuitive web UI. Note that Fuel also offers a CLI, even if I do not use it as much as the UI.

In our lab, Fuel is currently used to manage two different clusters. The first cluster is composed of 4 compute nodes and 1 controller node and is managed by Fuel 4.1. It runs OpenStack Havana on Ubuntu Precise and has now been in use for more than 7 months. The second cluster is composed of 15 compute nodes and 1 controller node. It is managed by Fuel 5.1, running IceHouse on CentOS, and is not used for production yet. Both clusters are set up using Neutron with VLAN tagging. As a comparison, we also have a manually set up OpenStack Havana on Ubuntu Precise, also with Neutron but using GRE tunnels for network virtualization. All in all, we have had to deal with tweaking and setting up OpenStack quite a lot (nova, glance, cinder, neutron, etc.).

Starting with the good things

Fuel has a sound architecture, with different layers that cover the different areas of provisioning, i.e. from finding the right image, to identifying the different machines to manage, to automating the deployment and configuration of OpenStack. Each layer is made of open source parts like Cobbler, Puppet, etc., and all of Fuel is available on Git. So it is a fully open source solution, which makes it easy to inspect the source code and to contribute back.

The most important node in a Fuel deployment is the Master node, aka the node where Fuel is running. This node acts as the central coordinator. The Master runs a PXE server (Cobbler, http://www.cobblerd.org/, at the moment) so that when a new machine is connected to the network, it can auto-discover via DHCP what to install. Since at this point Fuel does not know the node, a default bootstrap image is assigned to the newly discovered node. This special bootstrap image runs a script called the Nailgun agent. The agent collects the server's hardware information and submits it back to the Master (to the Nailgun service). This allows for rapid inspection and smart information collection about all the physical instances making up the cluster, and exposes each machine's interfaces and disks visually in the Fuel UI. This makes Nailgun a critical service: in fact, any command sent by the user, via UI or CLI, is received and executed by Nailgun. Nailgun stores the state of the system, ongoing deployments, roles and all discovered nodes in a PostgreSQL database. This is a critical part of the system in case of failures, and for recovering from errors; it makes it relatively safe to wipe out an environment or node and re-create it from scratch.

Once data has been collected from all the nodes, it is trivial in the Fuel UI to create an environment by assigning roles to nodes. The UI is flexible and allows for setting various networking options for the different communications between OpenStack, nodes, virtual machines, storage, and management. From there it is simple to click a button and watch the cluster being set up. Internally, the Nailgun service generates a template of the configuration and submits it as a task via RabbitMQ to the Astute service. According to the Fuel documentation, the Astute service is in fact a set of processes calling a defined list of orchestration actions. For example, one of the actions is to tell Cobbler how to generate different images based on the environment settings and the options set by the user, in order to distribute them to each of the nodes. As a result, the MAC address of each node can be set differently, or the storage options, etc. This is initially tricky to understand, and can sometimes lead to problems, especially when a node system is not removed from Cobbler.

As part of the deployment, Fuel installs an "mcollective" agent on each node. I am not 100% sure what those do, but according to the documentation, these agents become responsible for listening on RabbitMQ for further commands from Nailgun. The final step is to use Puppet to provision each node according to the specified recipe and user settings.

See the Fuel developer documentation (http://docs.mirantis.com/fuel-dev/develop/architecture.html) for more info, including sequence diagrams.

When we started with Fuel 4.x, we were amazed at how easy the provisioning was. We were, however, using physical hardware (HP G4s and G5s), which created some issues due to the P400 controller most of the machines used. Hats off to the Mirantis people on the #fuel IRC channel, as they are really friendly and helpful. Thanks to kalya, evg and MiroslavAnashkin, we were eventually able to fix some issues and contribute some code back to Fuel.

In short, it is a very smart, asynchronous and layered approach to provisioning a complex environment.


The not so good things


Version 5.1 further layered the components by introducing each as a separate Docker (https://www.docker.com/) container. However, in this case, maybe Mirantis jumped too far too fast. Version 5 also contained far more bugs than 4.x, and while some of the bugs are quite basic, some are quite a pain.

One of the first bugs is that an environment with more than 10 nodes simply fails. This was fixed in 5.0.1, then came back in 5.1. Then log rotation is not working, and since the master node collects all the logs from each of the remotely managed hosts, the disk fills up. I did not notice this until too late, and even though the bug can easily be worked around manually, the machine was already registering an inode failure by the time it was found. More on this later.
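Until the bug is fixed, a logrotate entry on the Master is a reasonable stopgap. A minimal sketch; the log path below is an assumption, so point it at wherever your Fuel master actually aggregates the remote node logs:

# /etc/logrotate.d/fuel-remote-logs  (hypothetical file name; adjust the path)
/var/log/docker-logs/remote/*/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}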

There are also some merges that were not done in the Mirantis OpenStack packaging. As a result, the nova CLI is missing the server-group capabilities. This is similar to the following problem with RDO (nova-server-group-apis-missing-in-rdo-icehouse-installation). Not a big problem, except that I wanted to use that feature. Of course, it is possible to download the nova git code, rebuild it locally, and apply the package, but Mirantis relabels the packages, so it is a bit difficult to track. See https://answers.launchpad.net/fuel/+question/255847.

Docker is great and I really like it. Containers are much more sensible than hypervisors in many situations, and putting each Fuel component into its own container makes a lot of sense; since the containers communicate via RabbitMQ, it is very logical. But coming from previous Fuel versions, at first I tried to re-do some of the tweaks that were done in 4.x and got confused. For example, setting the dnsmasq options had no effect on the server. As it turns out, dnsmasq has to be configured within the cobbler container, not in the files on the server node hosting the containers: the cobbler container hosts dnsmasq, so changes outside of the container have no effect. This is a bit confusing, as it is hard to guess at times which container does what. But Docker is still young and unstable. When the disk filled up due to the missing log rotation, the container running the PostgreSQL database (fuel/postgres_5.1:latest) started to flag an inode corruption.

EXT4-fs error (device dm-5): __ext4_ext_check_block: bad header/extent in inode #150545: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

And Docker has a known issue (https://github.com/docker/docker/issues/7229) with file system corruption when using the devicemapper storage driver.

I am not sure what caused what, but the result is that the container with all the data of my running OpenStack environment was now reporting an inode error, and my disk was full.

Mirantis introduced a new feature to backup/restore the whole Master. This is great. So, first let's delete some logs, restart rsyslog (dockerctl restart rsyslog) and launch it. Bad surprises. First the backup tars all containers, then it tries to compress them into an lrz archive. These two operations require twice the amount of disk space of the final compressed tarball, so at least 25 GB of free disk space is needed to make a valid backup, and the compression phase is extremely long (count about 1-2 hours for a backup). Personally, I do not understand why compression is used; a simple tarball would have been sufficient. Worse, if something fails, the compression runs anyway, and then all files except the compressed one are deleted. Finally, when doing the restore, decompression alone takes about 40 minutes. And I got an error during the compression:

backup failed: tar: Removing leading `/' from member names Compressing archives... tar: /var/backup/fuel/backup_2014-10-17_1025/fuel_backup_2014-10-17_1025.tar: Wrote only 6144 of 10240 bytes tar: Error is not recoverable: exiting now

And no files were left except a corrupt archive: at the end of the "backup" operation, all files except the compressed binary are erased. Anyhow, after adding an NFS mount with lots of space, I managed to finalize the backup. It was then possible to launch the restore on another machine with the same Fuel 5.1 release pre-installed.
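
For reference, the workaround was simply to give the backup directory more room before launching the backup again; something along these lines, where the NFS server and export path are placeholders:

$ mount -t nfs nfs-server:/export/fuel-backup /var/backup/fuel
$ df -h /var/backup/fuel      # confirm there is 25GB+ free before starting
$ dockerctl backup            # the backup command, as I recall it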

Many posts point to Docker's networking parts still being a work in progress, and it seems that is what bit me during the restore. Once it finished (count about 2 hours), nothing was working. All containers reported as active and running, while in fact all had network problems. I could not reach the web UI, fuel CLI commands returned either 500 or "bad gateway", etc. A Linux "route" command indicated that the network setup on the node was broken. I fixed that, but none of the containers recovered. So, no go. I am not sure this feature actually works... In any case it is neither resilient nor fast.

By the time I had finished the above, the Master (the original failing node) had reached a critical stage. The UI was non-responsive and could not be relaunched, trying to relaunch the nginx container generated a missing-container exception, and inode errors were more frequent. At that point, I tried to make a backup of the PostgreSQL database, and found out that this was not documented anywhere. It was possible to reconstruct the procedure from reading the source code though, and Miroslav on #fuel gave me instructions. But it was Friday, and I was tired, so I went home. When I came back on Monday, the Master was no longer responding to the PostgreSQL backup commands. Fortunately, the Master was running on an ESX machine and I had a snapshot, so I used that to restore the Master. There I had made a mistake: the snapshot preceded my current environment, and I ended up with a Fuel Master managing a set of nodes with different identifiers than the ones expected. Cobbler generates a unique id for each node on each deployment, and this id is incremented by one for each node per deployment: "node32" in the Fuel database was now "node48"... Sigh... Funnily enough, this generated errors in the network checks and prevented resetting the environment. I had to re-create the whole thing from scratch.

Long story short: Docker is great, but when it fails it hurts, and backup/restore in Fuel is not very resilient.

Since it is difficult to find, here is the procedure for backing up Nailgun PostgreSQL data:

 dockerctl shell postgres
 sudo -u postgres pg_dump nailgun > /var/www/nailgun/dump_file.sql
 exit

This path /var/www/nailgun/ is cross-mounted between all the containers, so the dump appears in the root filesystem at the same path.

To restore postgreSQL data from dump, place dump file to /var/www/nailgun/ and then:

 dockerctl shell postgres
 sudo -u postgres psql nailgun -S -c 'drop schema public cascade; create schema public;'
 sudo -u postgres psql nailgun < /var/www/nailgun/dump_file.sql
 exit
 dockerctl restart nailgun
 dockerctl restart nginx
 dockerctl shell cobbler
 cobbler sync
 exit

In conclusion, Fuel is great and Mirantis offers great free support via IRC, but it seems to me versions 5.x are not yet ready for production. Upgrades are also still an issue, as Mirantis relabels each OpenStack package, making them hard to track, and Fuel keeps tight control over which packages are available by owning the repositories. As a result, I wonder whether a simpler setup (maybe a simpler Puppet-based one) would not allow for more rapid upgrades; OpenStack Juno just came out and fixes thousands of IceHouse bugs... All in all I am grateful for having my environment managed by Fuel, yet sometimes I have doubts.

Thanks again to kalya, evg and MiroslavAnashkin for their patience and help.

Wednesday, January 22, 2014

Loved this look at the AWS infrastructure from James Hamilton, AWS VP and Distinguished Engineer. (Yes, even their VPs are engineers.) A fun, entertaining and mind-numbing talk with respect to the scale of their infrastructure.

http://www.youtube.com/watch?v=WBrNrI2ZsCo

"A behind the scenes look at key aspects of the AWS infrastructure deployments. Some of the true differences between a cloud infrastructure design and conventional enterprise infrastructure deployment and why the cloud fundamentally changes application deployment speed, economics, and provides more and better tools for delivering high reliability applications. Few companies can afford to have a datacenter in every region in which they serve customers or have employees. Even fewer can afford to have multiple datacenter in each region where they have a presence. Even fewer can afford to invest in custom optimized network, server, storage, monitoring, cooling, and power distribution systems and software. We'll look more closely at these systems, how they work, how they are scaled, and the advantages they bring to customers."

Thursday, January 16, 2014