Archive | Tips

Vagrant, Docker provider, (and Puppet)

30 Jul

While this is not exactly a Web or Data performance post, it is indeed about performance and the speedup of our DEV environments…

Vagrant + docker provider presentation at Docker meetup in Barcelona


I regularly use Vagrant to share DEV VMs among team developers and for open-source projects.  Vagrant lets you share, build, and provision the server environment in which code runs.  However, as you keep adding VMs to your system, they start consuming resources and slowing things down.  After reading about Docker, I decided to merge the two.

Docker is a replacement for virtualization in several scenarios, based on Linux containers (LXC).  Docker lets you pack applications along with their configuration and OS without the need for virtualization (on recent Linux kernels).  Containers thus provide isolation, automation, portability, and sharing: many of the features found in a VM (minus security, migration, etc.).  But containers run simply as processes in our system, so there is no need for reserved resources or competing VMs.  The Linux kernel scheduler is in charge of deciding when and how to run the distinct processes.  So, besides being more lightweight on our DEV machines, containers make it possible, for example, to better replicate a production environment on our laptops.  And many more features…

While Vagrant has an official Docker provider, building a Vagrant-compatible box from scratch turned out to be a challenge.  I couldn’t find clean, step-by-step instructions, so I decided to build my own, present it, and share it!

The source-code for the project can be found at:

Basically, to build a Docker image compatible with Vagrant defaults, the following 7 steps need to be performed:

  1. Import image FROM repo:tag
  2. Create vagrant user
    1. create password
    2. sudo permissions and password-less login
  3. Configure SSH
    1. Setup keys
  4. Install base packages
  5. Install Configuration Management system (optional)
    1. Puppet, etc…
  6. Expose SSH port
  7. Run SSH as a daemon
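
To make the seven steps more concrete, here is a minimal sketch that renders a Dockerfile following them.  It is not the Dockerfile from the project, just an illustration under a few assumptions: a Debian/Ubuntu base image (the `ubuntu:14.04` default is only an example), the conventional `vagrant`/`vagrant` credentials, and a public key saved next to the Dockerfile as `vagrant.pub` (a hypothetical file name).  The helper `render_dockerfile` is also a made-up name.  Step 4 is moved up because the later steps need `sudo` and `sshd` to be installed.

```python
def render_dockerfile(base_image="ubuntu:14.04", with_puppet=True):
    """Return Dockerfile text for a Vagrant-compatible, Debian-based image.

    base_image and the vagrant.pub file name are illustrative assumptions.
    """
    lines = [
        "# 1. Import image FROM repo:tag",
        f"FROM {base_image}",
        "",
        "# 4. Install base packages (moved up: steps 2 and 3 need sudo and sshd)",
        "RUN apt-get update && apt-get install -y openssh-server sudo \\",
        "    && mkdir -p /var/run/sshd",
        "",
        "# 2. Create the vagrant user, with a password and password-less sudo",
        "RUN useradd --create-home --shell /bin/bash vagrant \\",
        "    && echo 'vagrant:vagrant' | chpasswd \\",
        "    && echo 'vagrant ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/vagrant \\",
        "    && chmod 0440 /etc/sudoers.d/vagrant",
        "",
        "# 3. Configure SSH: authorize the public key Vagrant will log in with",
        "COPY vagrant.pub /home/vagrant/.ssh/authorized_keys",
        "RUN chown -R vagrant:vagrant /home/vagrant/.ssh \\",
        "    && chmod 700 /home/vagrant/.ssh \\",
        "    && chmod 600 /home/vagrant/.ssh/authorized_keys",
        "",
    ]
    if with_puppet:
        lines += [
            "# 5. Install a configuration management system (optional)",
            "RUN apt-get install -y puppet",
            "",
        ]
    lines += [
        "# 6. Expose the SSH port",
        "EXPOSE 22",
        "",
        "# 7. Run SSH as a daemon, in the foreground so the container stays up",
        'CMD ["/usr/sbin/sshd", "-D"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the Dockerfile so it can be redirected into a file.
    print(render_dockerfile(), end="")
```

Running the script and redirecting its output to a file named `Dockerfile` gives something `docker build` can consume, provided `vagrant.pub` sits in the same directory.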

Getting the steps right took me a good deal of trial and error.  So I hope this saves some time for other interested people.

So follow the Presentation:

Clone the repo, and run vagrant up



Improve your DNS lookups with namebench

30 Jan

Ilya Grigorik’s great course on web performance made me aware of the importance of DNS server performance and of how poorly DNS servers are maintained.  The Domain Name System was invented in ’83 and is one of the oldest core services of the Internet; however, DNS servers are often disregarded, as they are assumed to be fast and one usually connects to whatever is offered through DHCP.  DNS requires very few resources: it uses UDP, client and server caches, and highly optimized code.  DNS is also very reliable, as clients have a pool of servers to connect to and requests can be forwarded between servers.  However, in general, DNS servers are poorly maintained and not optimized regularly, as most of the time they “work”.

Ilya suggested trying namebench, an open-source tool that benchmarks DNS servers and helps you choose the most appropriate ones for your location.  What’s cool about the tool, besides being Python-based and having a multi-platform GUI, is that for its benchmark it can take domain names from your browser’s cache, and that it produces graphical reports.

namebench DNS latency results


A tribute to Zabbix, great network+ monitoring system

7 Apr

Just a short post to recommend a great monitoring system if you haven’t heard of it yet:  Zabbix.



Zabbix has been publicly available since around 2001, and as a stable version since 2004.  However, I see very few posts about it, and it is far less popular than Nagios, even though it is more feature-rich.  I have been using it since 2004 for various projects, and it is great.  It is very simple to install, it has had Windows and Unix agents since the beginning so you don’t need to set up SNMP on your network, and it scales very well.  I even use it to measure and keep track of the performance of my own dev machine.

However, the most important feature I find, besides server monitoring, performance and availability history, graphs and charts, is that you can extend it and import application data easily! (more…)

GROUP BY a SQL query by N minutes

2 Feb

Just a quick post with a tip for grouping a date-time field in SQL [tested on MySQL] by N minutes using FLOOR().

Functions such as DAY(), HOUR(), MINUTE() are very useful to group by dates, but what about 30, 15, or 10 minutes?
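
As a rough illustration of the idea (not necessarily the exact query from the full post), the trick is to turn the timestamp into seconds, divide by the bucket size in seconds, and floor the result.  The sketch below runs it against an in-memory SQLite database so it is self-contained; the table `hits` and the column `created_at` are made-up names, and SQLite's integer division stands in for MySQL's FLOOR().

```python
import sqlite3


def count_per_bucket(timestamps, minutes):
    """Count rows per N-minute bucket.

    timestamps: 'YYYY-MM-DD HH:MM:SS' strings, treated as UTC.
    Returns a list of (bucket_start, row_count) tuples, oldest first.
    """
    bucket_seconds = minutes * 60
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (created_at TEXT)")
    conn.executemany("INSERT INTO hits VALUES (?)", [(t,) for t in timestamps])
    # Integer division floors the epoch seconds to the start of the bucket.
    # A MySQL version of the same grouping would be something like:
    #   GROUP BY FLOOR(UNIX_TIMESTAMP(created_at) / (15 * 60))
    query = """
        SELECT datetime(
                   (CAST(strftime('%s', created_at) AS INTEGER) / :b) * :b,
                   'unixepoch'
               ) AS bucket_start,
               COUNT(*) AS row_count
        FROM hits
        GROUP BY bucket_start
        ORDER BY bucket_start
    """
    rows = conn.execute(query, {"b": bucket_seconds}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [
        "2024-01-01 10:02:00",
        "2024-01-01 10:14:59",
        "2024-01-01 10:15:00",
        "2024-01-01 10:44:00",
    ]
    for bucket_start, row_count in count_per_bucket(sample, 15):
        print(bucket_start, row_count)
```

Changing the second argument to 10 or 30 regroups the same rows without touching the query.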


Deleting all non-expiring keys in a Redis server from the command line

23 Dec

This past year, we have been reading about, installing, and testing several in-RAM key-value storage (NoSQL) engines.  We decided to give Redis a try mainly because it can save the data to disk, so if you have to restart a server or it crashes, you have your data back.  Our experience with Redis has been mixed, with good and bad things, but that will be for another post…

Yottaa, a great tool for monitoring the performance of your site

22 Nov

Recently I found this really cool and useful website: Yottaa.  Besides the curious domain name, it is a web 2.0 style monitoring and benchmarking site.  You input which pages to monitor or benchmark and it uses an automated Firefox browser to load and process the whole page, including the CSS and JavaScript.  Then it reports the page load time, time to interactivity, a full YSlow report, and its own custom score.


Calculating 95th, 99th, 50th (median) with a single MySQL query

16 Nov

As a first post, let’s start with a basic tip for SQL queries I use very often.  First, some basic info on percentiles:

Percentiles, unlike averages, are one of the best indicators of how well our web site is performing.  Averages tend to hide information on outlier values, and while they might be showing you adequate numbers, they might be hiding how a significant portion of users is experiencing your site.

For websites, a good percentile to base measurements on is the 95th, while for network equipment it is the 99th.  This is because you might have some pages that inherently take longer to process, such as an availability search for a specific product, while network equipment doesn’t have this constraint.  You should really use the number that best represents your workload; the 95th seems to be the standard, but you can also use the 85th depending on the type of site you are monitoring.  The median (not the average) corresponds to the 50th percentile.  (more…)
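
To see how an average can hide what part of your users experience, here is a small sketch in Python rather than SQL.  It uses the nearest-rank definition of a percentile, which is one of several common definitions and not necessarily the one the MySQL query in the full post implements.  The load times are invented sample data and `percentile` is a hypothetical helper.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted data
    return ordered[rank - 1]


if __name__ == "__main__":
    # Invented page load times in milliseconds: most are fast, 2 of 20 are very slow.
    load_times_ms = [200] * 18 + [5000] * 2
    print("average:", mean(load_times_ms))
    for p in (50, 95, 99):
        print(f"{p}th percentile:", percentile(load_times_ms, p))
```

With this data the average lands between the fast and the slow group and describes neither, while the median shows the typical page load and the 95th and 99th percentiles expose the slow ones.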