M&A: Vertical integration plays

Two items of note here. First, Cavium is acquiring QLogic. This is interesting on several levels, as QLogic has been a long-time player in storage (and networking). There are many QLogic FC switches out there, as well as some older InfiniBand gear (pre-Intel sale). Cavium is more of a processor shop, having built a number of interesting SoCs and general-purpose CPUs. I am not sure the combination will be a serious contender against Intel or others in the data center space, but I think they will be working on carving out a specific niche. More in a moment.

Second, Samsung grabbed Joyent. This is Bryan Cantrill’s take on it; his is denser with the meat of the why, and less filled (though there is some) with marketing blather about synergies, culture, yadda yadda yadda. This is a move by Samsung Mobile, one of the Samsung companies. Joyent is famous for starting the Node.js project, as well as for its cloud built on Triton (a data-center-as-a-container system), Manta (object storage that moves processing to the data for in-place computing … very similar in concept to what we’ve been pushing for the last decade), and of course SmartOS.

First off, I don’t see any of the dependency stack going away. Triton lives atop SmartOS. If anything, I see SmartOS benefiting from this massively, as Samsung may add weight to getting drivers operational on SmartOS. This is, IMO, an important weakness in SmartOS, and one that I hope will now be rectified. We were successful in convincing Chelsio to port to SmartOS/Illumos a few years ago, so we had a decent 10GbE driver. But I want 100GbE, and a few other things (NVMe, etc.) that I’d have to hire Illumos kernel devs for. Given Samsung’s focus on NVMe (not mobile, but the other folks), I’ll ping them about helping out with this … NVMe on SmartOS + 100GbE would be AWESOME. (And for what it’s worth, the major siCloud installation we built a few years ago started out with SmartDC and moved to Fifo for a number of reasons … but our systems/code all support SmartOS/SDC/Fifo, as long as we have working drivers.)

Ok, bigger picture.

This is vertical integration in both cases. Bring more of the stack in-house, and focus on the value these pieces can bring. Joyent + Samsung gives you data-center-wide container engines. Great for mobile. But wildly awesome for other things (think of what OpenStack would like to do, which is already available with Triton). Then QLogic + Cavium gives a vertical integration play for a set of DC niches: in storage, in NPUs (possibly), in hyperscale systems …

Both of these are very interesting.


About that cloud “security”

Wow … might want to rethink what you do and how you do it. See here.

Put in simple terms, why bother to encrypt if your key is (trivially) recoverable?

I did not realize that side channel attacks were so effective. Will read the paper. If this isn’t just a highly over-specialized case, and is actually applicable to real-world scenarios, we’ll need to make sure we understand methods to mitigate it.


Ah Gmail … losing more emails

So … my wife and I have private Gmail addresses, not related to the day job. She sends me an email from there. It never arrives.

Gmail to Gmail.

Not in the spam folder.

But to Gmail.

So I have her send it to this machine.

Gets here right away.

We moved the day job’s support email address off Gmail (it’s just a reflector now) onto the same tech running inside our FW. Because it was losing mail, pissing off customers.

Though in one of those cases, the customer had a “best practice” rule (read: a random rule implemented without a compelling real problem it “solved” or risk it “reduced” … i.e., a fad, and a bad one at that, which likely caught MANY vendors up in it) that also messed with email.

It’s not that this is getting old. It’s that I am now actively looking at Gmail-based mail as a risk to be mitigated. Mail gets lost, with no logs to trace what happened.

So … do I want to spend the time to manage our own mail, or do I want to continue to lose mail? That is the business question. What is the value of the lost mail, or lost good-will due to the lost mail?


That moment in time where you realize that you must constrain the support people from doing anything other than what you direct them to do

This is Comcast. And my internet connection in my home office. The cable modem spontaneously started rebooting on me over the last few months. Looks like it started after they replaced my older cable modem, which was working nicely, with the new one … which isn’t.

First call this week, after it kicked out a whole bunch of times while I was working on customer machines with hard deadlines … they scheduled a tech after I requested a replacement cable modem. They promised/swore he would have one with him and would replace it.

Instead, he blamed filters outside the house (that Comcast had installed previously), that he removed.

This morning while working on a machine in the UK, and this afternoon while working on a machine in Ohio, it kicked out on me again. With hard deadlines (one was a bank, the other a genomics medical site) to get the work done.

Fed up, I called them back. On the phone now. Will insist they simply replace the box. They seem to get that this is an issue. Will see if they actually do this correctly.



Real scalability is hard, aka there are no silver bullets

I talked about hypothetical silver bullets in the recent past, at a conference and to customers and VCs. Basically, there is no such thing as a silver bullet … no magic pixie dust, magical card, or superfantastic software you can add to a system to make it incredibly faster.

Faster, better-performing systems require better architecture (physical, algorithmic, etc.). You really cannot throw a metric ton of machines at a problem and hope that scaling is simple and linear. It really never works like that. A pile of inefficient cheap-and-deep machines has no hope whatsoever of beating a very well architected, massively parallel IO engine at moving/analyzing data. It’s almost embarrassing how badly these piles of machines run IO/compute-intensive code, when their architecture effectively precludes performance.

Software matters. So does hardware.

What prompted this post (I’ve been very busy, but I felt I had to get this out) was this article on HN. I know it’s an older article, but the points made about implementation mattering in software for a distributed/scalable system matter just as much (if not more) for high-performance hardware systems.


Having to do this in a kernel build is simply annoying

So there are some macros, __DATE__ and __TIME__, that the gcc compiler knows about. And some people inject these into their kernel module builds, because, well, why not. The issue is that they can make “reproducible builds” harder. Well, no, they really don’t. That’s a side issue.

And of course, modern kernel builds use -Wall -Werror, which converts warnings like

macro "__TIME__" might prevent reproducible builds [-Werror=date-time]

into real honest-to-goodness errors. Ok, they aren’t real errors. It’s just a compiler being somewhat pissy with me. And I had to work around it. I could have disabled -Wall -Werror, but that is not what I wanted to do.

So I hand-preprocessed the code. In the makefile include. Before starting the compile.


# build date and time, expanded once at make time
__D__=$(shell date +%x)
__T__=$(shell date +%R)

# rewrite the macros in place before the compiler ever sees the source
# (recipe lines must be tab-indented in a real makefile)
target_prep: source.c
       sed -i 's|__DATE__|"${__D__}"|g' source.c
       sed -i 's|__TIME__|"${__T__}"|g' source.c
       touch target_prep

Which, I dunno … sorta … kinda … blows chunks … mebbe ? Working around an issue by not fixing what was broke, but instead introducing a new path so I don’t subvert the intentions of the kernel build system?
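To make concrete what those sed rules actually do, here is the same substitution done by hand on a stand-in file (not the real source.c):

```shell
# a stand-in source line using the offending macros
echo 'printk("built %s %s", __DATE__, __TIME__);' > /tmp/source.c

# same substitution as the makefile rules above
__D__=$(date +%x)
__T__=$(date +%R)
sed -i "s|__DATE__|\"${__D__}\"|g" /tmp/source.c
sed -i "s|__TIME__|\"${__T__}\"|g" /tmp/source.c

# the macros are gone before the compiler ever sees them
grep -c '__DATE__\|__TIME__' /tmp/source.c   # prints 0
cat /tmp/source.c
```

The date/time strings are frozen at make time, so -Werror=date-time never fires … at the cost of having modified the source in place, which is exactly the part that blows chunks.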


Talk from #Kxcon2016 on #HPC #Storage for #BigData analytics is up

See here, which was largely about how to architect high performance analytics platforms, and a specific shout out to our Forte NVMe flash unit, which is currently available in volume starting at $1 USD/GB.

Some of the more interesting results from our testing:

  • 24GB/s bandwidth, largely insensitive to block size.
  • 5+ million IOPs of random IO (5+ MIOPs), sensitive to block size.
  • 4k random reads (100% read) were well north of 5M IOPs.
  • 8k random reads were well north of 2M IOPs.

Over a single 100Gb IB connection with our standard PFS BeeGFS running, we sustained 11.6 GB/s and 11.8 GB/s write and read bandwidth respectively.
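As a sanity check on those numbers (my arithmetic, not from the talk): 5M IOPs at 4k blocks works out to about 20.5 GB/s, comfortably under the 24 GB/s bandwidth ceiling, and 11.6 GB/s delivered over a 100Gb link (12.5 GB/s raw) is roughly 93% of line rate:

```shell
awk 'BEGIN {
    # 5e6 IOPs * 4096-byte blocks, in decimal GB/s
    printf "4k @ 5M IOPs   : %.1f GB/s\n", 5e6 * 4096 / 1e9
    # 11.6 GB/s delivered over a 100 Gb/s (= 12.5 GB/s) link
    printf "link efficiency: %.0f%%\n", 11.6 / 12.5 * 100
}'
```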


Going to #KXcon2016 this weekend to talk #NVMe #HPC #Storage for #kdb #iot and #BigData

This should be fun! This is being organized and run by my friend Lara of Xand Marketing. Excellent talks are scheduled, plus fun bits (Raspberry Pi based kdb+!!!).

Some similarities with the talk I gave this morning, but more of a focus on specific analytics issues relevant for people with massive time series data sets and a need to analyze them.

Looking forward to getting out to Montauk … haven’t been there since I did my undergrad at Stony Brook. Should be fun (the group always is). Sneaking a day off on Friday to visit with my family, then driving out Saturday morning.


Success with rambooted Lustre v2.8.53 for #HPC #storage

[root@usn-ramboot ~]# uname -r

[root@usn-ramboot ~]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           8.0G  4.3G  3.8G  53% /
[root@usn-ramboot ~]# 

[root@usn-ramboot ~]# rpm -qa | grep lustre

This means that we can run Lustre 2.8.x atop Unison.

Still pre-alpha, as I have to get an updated kernel into this, as well as update all the drivers.

These images don’t simply have Lustre in them; they also have BeeGFS, and we’ll have a few more goodies by the time beta rolls around in a few weeks.
