A good read on realities behind cloud computing

In this article on the venerable Next Platform site, Addison Snell makes a case against some of the presumed truths of cloud computing. One of his points is something we run into all the time with customers, and yet this particular untruth isn’t really being reported the way our customers experience it.

Sure, you are paying for the unused capacity. This is how utility models work. Tenancy is the most important measure to the business providing the systems. The more virtual machines they can cram on a single system, the better for them.

But … but …

This paying for vacancy/unused cycles isn’t really the expensive part.

The part that is expensive is getting your data out, or having significant volumes of data reside there for a long time. It’s designed to be expensive, and to capture data. This is a rent-seeking model, generally held to be a non-productive use of assets. It exists to generate time-extended monetization of assets, like license fees for software you require to run your business.

We’ve worked through analyses for a number of customers based upon their use cases, comparing a few different cloud vendors with accurate usage models taken from their existing day-to-day work. One thing we discovered rapidly: for a bursting big data analytics effort with sizeable on-site storage (a few hundred TB, pulling back 10% of the data per month), the cloud models, even using the most aggressive pricing available, were more expensive on a monthly basis, often significantly so, than the fully burdened cost (power/cooling, space/building, staff, network, …) of hosting an equivalent, and often far better/faster/more productive, system in house.
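As a back-of-the-envelope sketch of that kind of comparison (every rate below is an illustrative assumption for the sake of the arithmetic, not any vendor’s actual pricing or a real customer quote):

```python
# Rough monthly cost model: cloud (at-rest storage plus egress for the
# data you pull back) vs. fully burdened on-prem hosting.
# All rates are illustrative assumptions, not real quotes.

def cloud_monthly_cost(tb_stored, egress_fraction,
                       storage_per_tb=23.0, egress_per_tb=90.0):
    """Cloud: pay for data at rest, and again for every TB pulled out."""
    return tb_stored * storage_per_tb + tb_stored * egress_fraction * egress_per_tb

def onprem_monthly_cost(hw_capex, amortize_months=36, burden_per_month=4000.0):
    """On-prem: amortized hardware plus power/cooling/space/staff/network."""
    return hw_capex / amortize_months + burden_per_month

# The use case above: a few hundred TB on site, pulling back 10% per month.
print(f"cloud:  ${cloud_monthly_cost(300, 0.10):,.0f}/month")
print(f"onprem: ${onprem_monthly_cost(150_000):,.0f}/month")
```

With these assumed rates the egress charge alone is a quarter of the cloud bill, which is the point: the vacancy cost is not the term that dominates.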

The major difference is that one of these is a capital expense (capex) and one is an operational expense (opex), and they come from different areas of the budget.

For occasional bursts, without a great deal of onsite data storage or data return, clouds are great. This isn’t traditionally the HPC use case though. Nor is it the analytical services use case.

The article is an interesting read, and its other points are also quite good. But as noted, the vacancy cost is important, yet it is neither the only cost involved nor even the dominant one.

Viewed 1397 times by 264 viewers

Running conditioning on 4x Forte #HPC #NVMe #storage units

This is our conditioning pass to get the units to stable state for block allocations. We run a number of fill passes over the units. Each pass takes around 42 minutes for the denser units, 21 minutes for the less dense ones. After a few passes, we hit a nice equilibrium, and performance is more deterministic, and less likely to drop as block allocations gradually fill the unit.

We run the conditioning over the complete device, one conditioning process per storage device, with multiple iterations of the passes. After 2 hours or so, and 3 passes, they are pretty stable and deterministic.
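A minimal sketch of a single fill pass, for illustration only (this is not our actual conditioning tooling; the path, block size, and capacity argument are assumptions, and in practice this runs against the raw device, e.g. /dev/nvme0n1, one process per device):

```python
# One sequential fill pass: write fixed-size blocks until `capacity`
# bytes are written (or the device reports full), then report the
# effective bandwidth. Repeated a few times per device until pass
# times stabilize.
import os
import time

def fill_pass(path, capacity, block_size=4 * 1024 * 1024):
    """Return (bytes written, effective GB/s) for one fill pass."""
    block = b"\xa5" * block_size
    written = 0
    start = time.time()
    with open(path, "wb") as dev:
        try:
            while written < capacity:
                dev.write(block)
                written += block_size
        except OSError:          # ENOSPC: the device itself filled up
            pass
        dev.flush()
        os.fsync(dev.fileno())
    elapsed = max(time.time() - start, 1e-9)
    return written, written / elapsed / 1e9

# e.g. fill_pass("/dev/nvme0n1", capacity=2 * 10**12)  # one pass, 2TB unit
```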

It’s always fun to watch the system IO bandwidth during these passes. Each system is rocking 18-21 GB/s right now, with the CPUs about 90% idle. Interrupts/context switches are being banged hard, but the systems are responsive.

Actually, while this is going on, we usually do our OS installation if the unit has drives for this.

I like parallelism like this …

Viewed 3231 times by 406 viewers

New #HPC #storage configs for #bigdata , up to 16PB at 160GB/s

This is an update to Scalable Informatics’ “portable petabyte” offering. Basically: from 1 to 16PB of usable space, distributed and mirrored metadata, and a high performance (100Gb) network fabric. We’ve got a very dense, very fast system available now, at a very aggressive price point (starting configs around $0.20/GB).

Batteries included … long on features, functionality, performance. Short on cost.

We are leveraging the denser spinning rust drives (SRD), as well as a number of storage technologies that we’ve built or integrated into the systems. The systems provide parallel file systems, Amazon S3 objects, common block storage formats, simultaneously.

See the page (https://scalableinformatics.com/petabyte) for more details. Happy to answer questions or discuss this in depth. Reach out to me at the day job.

Viewed 9375 times by 816 viewers

Fully RAMdisk booted CentOS 7.2 based SIOS image for #HPC , #bigdata , #storage etc.

This is something we’ve been working on for a while … a completely clean, as-baseline-as-possible version of our SIOS RAMdisk image using CentOS (and by extension, Red Hat … you just need to point to those repositories). And it’s available to pull down and use as you wish from our download site.

Ok, so what does it do?


It boots an entire OS, into RAM.

No disks to manage and worry over.

No configuration drift.

You can run ansible, puppet, cloud-init, kvm, gluster, … etc. It already communicates over a serial console by default, though you have complete control over that. The default password is randomly generated, though you can override it at boot time with an option.

Currently fits in 1.8 GB RAM, though with work, we can trim this down a bit.

We boot VMs, physical machines, etc. with this.

By default it will try to bring up the first 4 networks, DHCPing the ones that show a carrier after the interface comes up. (If your switch is not configured for portfast, you should be ashamed, and fix that.) This way, the system doesn’t waste time waiting for switch ports to come up, and only DHCPs on active ports, eliminating delays from starting DHCP on ports that have no connections.
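The selection logic distills to a few lines. This is a hypothetical sketch, not the actual boot script; it assumes `carrier` holds the contents of /sys/class/net/&lt;iface&gt;/carrier read after each interface is brought up:

```python
def ifaces_to_dhcp(candidates, carrier):
    """Return the interfaces worth DHCPing: at most the first 4
    candidates, keeping only those whose carrier state is 1 (link up).
    `carrier` maps interface name -> carrier state as read from
    /sys/class/net/<iface>/carrier after the interface is up."""
    return [iface for iface in candidates[:4] if carrier.get(iface) == 1]

# Only eth0 and eth2 have a live link, so only they get a DHCP client:
# ifaces_to_dhcp(["eth0", "eth1", "eth2", "eth3"],
#                {"eth0": 1, "eth1": 0, "eth2": 1})
```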

We’ll have SIOS images with a number of other tools up soon as well.

Note: For copyright/trademark purposes, this is NOT CentOS or Red Hat. You should not confuse this image, built from CentOS binaries, as being CentOS or Red Hat. It is an instance of an installation, in such a way as to run entirely out of RAM.
Includes full RDMA stack, latest rev CentOS kernel.

Give it a whirl, let me know how it goes. More tools coming to this directory tree very soon. Stay tuned!

Viewed 9995 times by 873 viewers

An article on Python vs Julia for scripting

For those who don’t know, Julia is a very powerful new language, which aims to leverage a JIT compilation mechanism to generate very fast numerical/computational code from a well thought out language.

I’ve argued for a while that it feels like a better Python than Python. Python, for those who aren’t aware, is a scripting language which has risen in popularity in recent years. It is generally fairly easy to work in, with a few caveats.

Indentation is the killer for me. The language is tolerable though, IMO, not nearly as “simple” as people claim, with a number of lower level abstractions peeking through. I am fine with those. I am not fine with (and have never been fine with) structure by indentation. This isn’t its only issue: there is the global interpreter lock, and the incompatibility between Python 2.x and 3.x. Python does have a very nice interface to C/C++ libraries though, which makes extending it relatively easy.

Julia eschews this structure by indentation. It also tries hard to be convenient, and consistent. IMO it does a great job of it. We are experimenting with using it for more than basic analytics, and it is installed on every single machine we ship, in /opt/scalable/bin/julia, and has been for years. As are Python 3 and Perl 5.xx.

These tools are part of our analytics stack, which has a few different versions depending upon physical footprint requirements.

Julia has made interacting with the underlying system trivial, as it should be, with few abstractions peeking out from underneath the syntax. This article discusses the differences from a pragmatic viewpoint.

Overall I agree with the points made. Perl, my go-to scripting language, has some of the Python issues (abstraction leakage). Perl 6 is better. Much better. Really … I’ve been looking into it in depth … and it is pretty incredible. Julia is better still, and much better at the stuff you’d want to use Python for.

Viewed 28677 times by 1787 viewers

OpenLDAP + sssd … the simple guide

Ok. Here’s the problem. A small environment for customers who are not really sure what they want and need for authentication. Yes, they asked us to use local users for the machines. No, the number of users was not small. AD may or may not be in the picture.

Ok, I am combining two sets of users with common problems here. In one case, they wanted manual installation of many users onto machines without permanent config files. In the other case I have to worry about admins who don’t want the hassle of administering many machines.

Enter OpenLDAP. It’s basically a read-heavy directory service, using lots of old (outdated) concepts. But it works fairly well once you get it set up. Getting it set up, though, is annoying beyond belief. So much so that people look to Microsoft AD as an easier LDAP: a single unified authentication/authorization plane for their Windows/Linux environment.

For these cases, we don’t have buy in from the groups running the AD. So we can’t connect to it.

Which means locally hosted LDAP.

This part is doable in appliance form. It is still not user friendly by any measure. I don’t have a problem with configuring the services … but the configs are beyond ugly. Not something we should be using in 2016.

Then there is the client side. Originally on Linux, you used the PADL tools (ldap*). Like the whole LDAP system, they are … well … ugly, and non-trivial to use. You have to be very careful about how you invoke them, even for testing.

So Red Hat noticed this and wrote what is generally considered a saner version: SSSD. And it is generally better … sssd.conf is well documented, but there are few real working examples out there.

So here is one: an sssd.conf talking to a machine named ldap, which hosts an OpenLDAP database. Change ldap_search_base and ldap_uri to point to what you need. (Note that sssd insists the file be owned by root with mode 0600, or it will refuse to start.)

[sssd]
config_file_version = 2
services = nss, pam
domains = LDAP

[nss]
filter_users = root
filter_groups = root

[domain/LDAP]
enumerate = true
cache_credentials = true

id_provider = ldap
auth_provider = ldap
chpass_provider = ldap

ldap_uri = ldap://ldap
ldap_search_base = dc=unison,dc=local
# following is debian specific
ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
entry_cache_timeout = 600
ldap_network_timeout = 2

Then you need to modify parts of the PAM stack to make sure it makes use of this. First the password stack:


password        sufficient                      pam_sss.so
password        [success=1 default=ignore]      pam_unix.so obscure try_first_pass sha512
password        requisite                       pam_deny.so
password        required                        pam_permit.so

and then the session stack:

session [default=1]   pam_permit.so
session requisite     pam_deny.so
session required      pam_permit.so
session optional      pam_mkhomedir.so skel=/etc/skel umask=0077
session optional      pam_sss.so
session required      pam_unix.so 

Then, when you do this right (and restart sssd), your test user is visible:

getent passwd | grep testuser1

Viewed 27621 times by 1744 viewers

M&A time: HPE buys SGI, mostly for the big data analytics appliances

I do expect more consolidation in this space. There aren’t many players doing what SGI (and the day job) does.

The story is here.

The interesting thing about this is that it is in the high performance data analytics appliance space. As they write:

The explosion of data — in volume and variety, across all sectors and applications — is driving organizations to adopt high-end computing systems to run compute-intensive applications and big data workloads that traditional infrastructure solutions cannot handle. This includes investments in big data analytics to quickly and securely process massive data sets and enable real-time decision making. High-end systems are being used to advance research in weather, genomics and life sciences, and enhance cyber defenses at organizations around the world.

As a result of this demand, according to International Data Corporation (IDC), the $11 billion HPC segment is expected to grow at an estimated 6-8% CAGR over the next three years, with the data analytics segment growing at over twice that rate.

That is 12-16% CAGR for data analytics, which I think is low. And the point they make about the data explosion is exactly what we talk about as well.

I’ve written about this in the past: the cloud model (ultra cheap/deep/inefficient, scaling performance by throwing incredible amounts of hardware at a problem, at a fairly sizeable cost, even if it is OpEx) versus far more efficient, far faster, better designed systems that can provide unapologetic massive firepower efficiently, so you need far … far less hardware to accomplish the same thing, at a corresponding savings in CapEx/OpEx.

There aren’t many players in this space, so let’s see what else happens.

Viewed 36044 times by 2005 viewers

@scalableinfo 60 bay Unison with these: 3.6PB raw per 4U box

Color me impressed … Seagate and their 60TB 3.5 inch SAS drive. Yes, the 60 bay Unison units can handle this. That would be 3.6PB per 4U unit, 10x 4U per 48U rack, 36PB raw per rack: 100PB in 3 racks, and about 30 racks for an exabyte (EB).

The issue would be the storage bandwidth wall height. Doing the math: 60TB/(1GB/s) → 6 × 10⁴ seconds to empty/fill a single drive. We can drive these at about 50GB/s in a box, so a full box would be 3600TB/(50GB/s), or 7.2 × 10⁴ seconds to empty/fill. Network bandwidth would be the biggest issue … we could get 2x 100Gb NICs going full speed, but even that would still be 20% of where we need to be to keep it fully loaded.
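The arithmetic, for anyone who wants to check it (decimal units assumed throughout: 1 TB = 10¹² bytes, 1 GB/s = 10⁹ bytes/s):

```python
# Fill/empty ("bandwidth wall") times for the drives and boxes above.

def fill_time_s(capacity_tb, bandwidth_gb_s):
    """Seconds to stream capacity_tb terabytes at bandwidth_gb_s GB/s."""
    return capacity_tb * 1e12 / (bandwidth_gb_s * 1e9)

print(fill_time_s(60, 1))      # one 60TB drive at 1 GB/s -> 60000.0 s, i.e. 6 x 10^4
print(fill_time_s(3600, 50))   # a 60-bay box at 50 GB/s  -> 72000.0 s, i.e. 7.2 x 10^4
```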

This would need to be for an archive, and you’d need a mixture of object store and erasure codes on this. No way would you even consider a RAID on such a beast.

Viewed 35228 times by 1995 viewers