Why we go to LinuxFest Northwest

For the second year in a row since I moved to Redmond, I'll be joining the Microsoft crew sponsoring and attending LinuxFest Northwest in Bellingham, Washington. This is one of the largest, if not the largest, Linux & open source events in the region, and it draws large crowds of smart geeks from Canada, the United States and other countries, as well as corporate sponsors like us.

One of the questions I get the most is why Microsoft sponsors and participates in this event. Microsoft has been sponsoring and participating in many open source conferences, projects and events around the world, but some people wonder why we'd show up at a non-corporate, pure Linux event, and others are naturally skeptical about it.

I don't think there's a single reason why we rally to convince our bosses to do it, but we have been trying to do more closer to home when it comes to open source. There is a vibrant Linux and open source ecosystem in Redmond, the Puget Sound area and the Pacific Northwest, and while we have been very active in Europe and the Bay Area, we haven't done a good job of connecting with the people closer to home.

For example, I recently had the fantastic opportunity to help the Pacific Northwest Seismic Network at the University of Washington run their Ubuntu-based Node.js applications for their "Quake Shake". I think being able to help with that project, or with any other project or conference in any other part of the globe, is a good thing – but there's no distance excuse for Bellingham!

Another great reason is the LFNW community itself. We love the crowd, the lively discussions, the sharing and learning spirit. And as long as the community welcomes us, we'll continue to seek opportunities to connect with it. Plus, this is a really cool conference. This year, I'm cutting my vacation short to attend the event. A coworker is skipping church duty to help. We have heard from many engineers and program managers that they will be attending and want to carpool and staff the booth. And my friend has been investing all this time in logistics, ensuring we have a meaningful presence.

The community invites some of the sponsors to bring unique content that is relevant to the participants. Last year I had the opportunity to demo a Raspberry Pi device connected to Office via Azure. Most people in the room didn’t know Office runs in a browser, or that Azure could run Linux. But they listened and they thought it was cool. Some of them are now partners, helping customers do more with open source in Azure.

This year, I want to bring more Debian to this event, because I have been working a lot inside of Microsoft to get more people up to speed with Debian-based development and we have serious community momentum around Debian in Azure. In true Microsoft tradition, we will have a cake to celebrate the arrival of Debian 8. I'll have in mind all of those friends in the Debian community with whom I've been working for years to make sure we don't drop the ball when it comes to responding to what our customers, partners and the community want from Debian.

And, hopefully, next year we’ll be back again in Bellingham for LinuxFest Northwest 2016!

Thoughts on growth and open source services

For many years I was infatuated with the idea of creating value out of open source professional services. To a certain extent, this is a function of when, where and how I was exposed to open source. Even today, after acknowledging (the hard way) the challenges of this model, I find myself spending time modelling what would need to change in order to innovate on it.

While today there are virtually no skeptics of the tremendous impact that open source software has had in and beyond the IT industry, the prevailing thinking is that the open source opportunity doesn't lie in professional services.

It's commonly accepted that only a handful of players have found success in this model. In fact, some would argue that there can only be one, and that it exhausts the opportunity for everybody else. Media commentators frown on rising startups whose business model smells too much of support and services.

As Ben Werdmüller recently wrote (motivating me to write this article), those services are neither recurring nor scalable. And there's also proof in the market that well-designed, talented and recognized organizations can eventually fail in their efforts to seize the open source consulting business.

Back in 2008, after 5 years selling open source services either as a freelancer or in small firms, I was invited to lead technical strategy for an open source focused system integrator in Venezuela. The organization had recently scored a support agreement with a large multinational hardware vendor for a subset of their customers’ Linux needs, and they were looking for a portfolio and an attractive environment for talent and for growth.

I spent the next 3 years building a team of 50+ in several countries in Latin America, shipping open source products and solutions and managing large consulting projects for customers in the public and private sector. That support agreement became 3 partnership agreements with large IT multinationals. Yet for all the impact, the subtleties and complexities of the open source professional services business remained unaddressed.

I took numerous learnings from that experience, ranging from managing a team of talented professionals who went on to highly successful roles in Europe and the Americas, to the art of marketing something as bland and commoditized as open source consulting.

Among the fun learnings: with a highly mobile talent pool in multiple countries we managed our daily operations via IRC. We also built a lean-and-mean sales process led by the delivery teams, not sales, embraced document and knowledge management and invested in the communities and ecosystem that help open source be successful.

But I digress. Portfolio-wise, we had organized our offering into three core areas (infrastructure, applications and databases) and a number of incubation areas that gave us a unique competitive advantage, such as knowledge management and end-user experience (we focused a lot on Linux on the desktop) or business intelligence and unified communications. All with open source, all with Linux.

Yet market disruptions, such as government policy in an economy where the public sector concentrates an overwhelming amount of spending power, helped mask what we hadn't addressed. Since 2004, there had been a stated pro-open source policy in the public sector, which evolved into a number of unstated policies trickling down to the public and private sector alike.

When this policy was introduced, there was only a small talent pool to cover the complex needs of a public sector that sprawled beyond its vertical, with plenty of Oil & Gas, Financial Services, Manufacturing and other needs. Furthermore, virtually no relevant foreign organization took advantage of this opportunity due to general market conditions, a difference from how similar policies were rolled out in, for example, Ecuador (where the US dollar is the local currency).

Therefore, the reality of supply and demand made margin management, a critical discipline in the services business, an afterthought. Plus, the depth and quality of our technical results was a catalyst for business opportunities, so marketing wasn't really in the picture. We were a go-to open source consulting company, and we got away with selling bland OpenLDAP clusters and Asterisk IPBXs as if they were actual products, repeatable and scalable.

And in exploring other models we found support was something we actually enjoyed: we were proactive and fanatical about it and, generally speaking, never had to sell a support agreement. On the training side of things we managed to set consistency standards across courses and deployments, but it all accrued to that non-recurring base of services, to that dreaded hourly rate. So neither was ever a differentiated source of growth, as everything always converged into a consulting project.

At some stage we did invest in a products team that explored all the right ideas, which years later hit the market (agile embedded with general purpose Linux OS, SaaS and cloud-powered IPBXs, analytics and insights, etc.), but the reality is that the core of our operation was built on a professional services foundation, which made it unrealistic to detach. We tried using a different brand for our product labs, but the talent we had attracted and developed thrived in services.

I still see the boundaries between a VAR, an ISV and an SI as pretty artificial in the open source world, just as I find it less relevant to look at the boundaries between development and IT professionals with an open source hat on. Of course the business models are different; some are based on volume and depend on marketing and channel, while others are based on margin and depend on trust and references. This mix is not different from what we're seeing today in open source startup IPOs.

Today I don’t struggle to articulate a value proposition or find demand for the open source capabilities I’m selling. I’m struggling to find the right partner to help me scale. And I refuse to believe I can only go to a global SI or a well-known Bay Area ISV for those needs, when I have lots of VARs, SIs and ultimately great people in local markets who can land meaningful solutions. Yet I’m wary about putting all the eggs in the basket of building value out of open source professional services.

We're now living in interesting times where the successful players in this space are crowdsourcing services growth via the channel. This is a fascinating move from an open source support and services behemoth, and it has a lot of potential if it can connect local talent with a consistency that accrues to growth.

In the meantime, common sense will still indicate that entering the market to sell non-repeatable open source professional services can be highly rewarding in developing people, acquiring and developing know-how and making an impact. It can even help reduce the consumption gap for a complex product and help build market share. It just doesn’t seem to be a high-growth strategy for most people out there.

Rebasing CoreOS for ephemeral cloud storage

The convenience and economy of cloud storage is indisputable, but cloud storage also presents an I/O performance challenge. For example, applications that rely too heavily on filesystem semantics and/or shared storage generally need to be rearchitected or at least have their performance reassessed when deployed in public cloud platforms.

Some of the most resilient cloud-based architectures out there minimize disk persistence across most of the solution's components and try to either consume tightly engineered managed services (for databases, for example) or persist in a very specific part of the application. This reality is even more evident in container-based architectures, despite the many methods of cooperating with the host operating system to provide cross-host storage functionality (i.e., volumes).

Like other public cloud vendors, Azure presents an ephemeral disk to all virtual machines. This device is generally /dev/sdb1 in Linux systems, and is mounted either by the Azure Linux agent or by cloud-init at /mnt or /mnt/resource. This is an SSD device local to the rack where the VM is running, so it is very convenient to use it for any application that requires non-permanent persistence with higher IOPS. Users of MySQL, PostgreSQL and other servers regularly use this method for, say, batch jobs.
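If you want to confirm where the resource disk landed on a given image, a quick check along these lines should do it (a sketch, assuming a walinuxagent-based image; the option names come from /etc/waagent.conf):

# Where did the agent mount the ephemeral (resource) disk?
mount | grep sdb
# What the Azure Linux agent is configured to do with it
grep ResourceDisk /etc/waagent.conf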

Today, you can roll out Docker containers in Azure via Ubuntu VMs (the azure-cli and walinuxagent components will set it up for you) or via CoreOS. But a seasoned Ubuntu sysadmin will find that simply moving or symlinking /var/lib/docker to /mnt/resource in a CoreOS instance and restarting Docker won't cut it to run containers on the higher-IOPS disk. This article is designed to help you do that by explaining a few key concepts that are different in CoreOS.

First of all, in CoreOS stable, Docker runs containers on btrfs. /dev/sdb1 is normally formatted with ext4, so you'll need to unmount it (sudo umount /mnt/resource) and reformat it with btrfs (sudo mkfs.btrfs /dev/sdb1). You could also change Docker's behaviour so it uses ext4, but that requires more systemd intervention.
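Putting those two commands together, the sketch below assumes the ephemeral disk is /dev/sdb1 and is currently mounted at /mnt/resource; double-check with lsblk before wiping anything:

# Confirm which device is the ephemeral disk before reformatting
lsblk
# Unmount it and reformat it with btrfs (this destroys its contents)
sudo umount /mnt/resource
sudo mkfs.btrfs /dev/sdb1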

Once this disk is formatted with btrfs, you need to tell CoreOS it should use it as /var/lib/docker. You accomplish this by creating a unit that runs before docker.service. This unit can be passed as custom data via azure-cli or, if you have SSH access to your CoreOS instance, by dropping it at /etc/systemd/system/var-lib-docker.mount (the file name needs to match the mountpoint) with the following contents:

[Unit]
Description=Mount ephemeral to /var/lib/docker
Before=docker.service

[Mount]
What=/dev/sdb1
Where=/var/lib/docker
Type=btrfs

After systemd reloads the unit (for example, by issuing a sudo systemctl daemon-reload), the next time you start Docker this unit should be called and /dev/sdb1 should be mounted at /var/lib/docker. Try it with sudo systemctl start docker. You can also start var-lib-docker.mount independently. Remember, there's no service command in CoreOS, and /etc/fstab is largely irrelevant thanks to systemd. If you wanted to use ext4, you'd also have to replace the Docker service unit with your own.
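A quick way to verify the rebase took, as a sketch (unit and device names as above; docker info output varies by version):

sudo systemctl daemon-reload
sudo systemctl start var-lib-docker.mount
df -h /var/lib/docker           # should now show /dev/sdb1
sudo systemctl start docker
docker info | grep -i storage   # should report the btrfs driver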

This is a simple way to rebase your entire CoreOS Docker service to an ephemeral mount without using volumes or changing how prebaked containers write to disk (CoreOS describes something similar for EBS). Just extrapolate this to, say, a striped LVM, RAID 0 or RAID 10 setup for higher IOPS and persistence across reboots. And, while not meant for benchmarking, here's the difference between the out-of-the-box /var/lib/docker and the ephemeral-based one:
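For reference, the numbers below came from something like the hypothetical invocation that follows, run from inside each directory (assuming ioping is available, e.g. via a container on CoreOS; exact flags vary by version):

cd /var/lib/docker && sudo ioping -c 20 .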

# In OS disk

--- . ( ) ioping statistics ---
20 requests completed in 19.4 s, 88 iops, 353.0 KiB/s
min/avg/max/mdev = 550 us / 11.3 ms / 36.4 ms / 8.8 ms

# In ephemeral disk

--- . ( ) ioping statistics ---
15 requests completed in 14.5 s, 1.6 k iops, 6.4 MiB/s
min/avg/max/mdev = 532 us / 614 us / 682 us / 38 us

Understanding records in Koha

Throughout the years, I've looked at several open source ILSs, and most of them try to water down the way librarians have catalogued resources for years. Yes, we all agree ISO 2709 is obsolete, but MARC has proven to be very complete, and most of the efforts out there (Dublin Core, etc.) reduce the level of expression a librarian can have. If your beef is with ISO 2709, there's MARC-XML, which is easier to debug in terms of encoding and the like.

That said, Koha faces a challenge: it needs to balance the expressiveness of MARC with the rigidness of SQL. It also needs to balance the convenience of SQL against the potential shortcomings of its database of choice (MySQL) with large collections (over a couple thousand records), particularly when it comes to searching and indexing.

Koha's approach to solving this problem is to bring Zebra into the mix. Zebra is a very elegant, but very difficult to understand, piece of Danish open source software that is very good at indexing and searching resources that can come from, say, MARC. It runs as a separate process (not part of the Web stack) and it can also be enabled as a Z39.50 server (Koha itself is a Z39.50 consumer, courtesy of Perl).
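As an aside, you can poke at any Z39.50 server with yaz-client, from the same YAZ toolkit Zebra belongs to. A hypothetical session against a Koha instance, assuming its public Z39.50 server has been enabled on port 2100 with the default biblios database, might look like this:

yaz-client localhost:2100/biblios
Z> find @attr 1=4 "dune"
Z> show 1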

The purpose of this post is to help readers navigate how records are managed in Koha and avoid frustrations when deploying Koha instances and migrating existing records.

Koha has a very simple workflow for cataloguing new resources, either from Z39.50, from a MARC (ISO 2709 or XML) file or from scratch. It has templates for cataloguing, it has the Z39.50 and MARC capabilities, and it has authorities. The use case of starting a library from scratch in Koha is actually a very solid one.

But all of the libraries I've worked with in the last 7 years already have a collection. This collection might live in ISIS, Documanager, another SQL database or even a spreadsheet. Few of them have MARC files, and even when they do (i.e., vendors provide them), they still want ETLs to be applied (normalization, Z39.50 validations, etc.) that require processing.

So, how do we incorporate records massively into Koha? There are two methods, MARC import or fiddling with SQL directly, but only one answer: MARC import.

See, MARC can potentially have hundreds of fields and subfields, and we don’t necessarily know beforehand which ones are catalogued by the librarians, by other libraries’ librarians or even by the publisher. Trying to water it down by removing the fields we don’t “want” is simply denying a full fidelity experience for patrons.

But, on the other hand, MySQL is not designed to accommodate a random, variable number of columns. So Koha takes the most used attributes (like title or author) and "burns" them into SQL. For multivalued attributes, like subjects or items, it uses additional tables. And then it takes the whole MARC-XML and shoves it into a single field.
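You can see this split for yourself. A hypothetical query, assuming the Debian koha-common packages, an instance called mybiblio and the schema as it stood around Koha 3.x (where the XML lives in biblioitems.marcxml; newer releases moved it to a biblio_metadata table):

echo 'SELECT b.biblionumber, b.title, LEFT(bi.marcxml, 60)
      FROM biblio b JOIN biblioitems bi USING (biblionumber)
      LIMIT 3;' | sudo koha-mysql mybiblio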

Whoa. So what happens if a conservatorium is making heavy use of 383b (Opus number) and then wants to search massively for this field/subfield combination? Well, you can't just tell Koha to wait while MySQL loads all the XML into memory, blows it up and traverses it – it's just not going to happen within the timeout.

At this point you must have figured out that the obvious solution is to drop the SQL database and go with a document-oriented database. If someone just wants to catalogue 14 field/subfield combinations, and eventually a super detailed librarian comes in and starts using 150, you would be fine.

Because right now, without that, it's Zebra that kicks in. It behaves more like an object store, and it's very good at searching and indexing (and it serves as a Z39.50 server, which is nice), but it's a process running separately, and managing it can sometimes be harsh.

Earlier we discussed the use case where Koha excels: creating records from scratch. Does this mean that Koha won’t work for an existing collection? No. It just means the workflows are a tad more complicated.

I write my own Perl code to migrate records (some scripts available here, on the move to GitHub), and the output is always MARC. In the past I've done ISO 2709, yes, but I only do MARC-XML now. Although it can potentially use up more disk space, and it can be a bit slower to load, it has a quick benefit for us non-English speakers: it lets you solve encoding issues faster (with the binary format, I had to do hexadecimal seds and other weird things, and it messed up the headers, etc.).
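If you're handed binary MARC and want MARC-XML, yaz-marcdump (also from the YAZ toolkit) can do the conversion and the charset fix in one pass; a sketch, with hypothetical file names:

# Convert ISO 2709 in MARC-8 to MARC-XML in UTF-8
yaz-marcdump -i marc -o marcxml -f MARC-8 -t UTF-8 records.mrc > records.xml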

Sometimes I do one record per file (depending on the I/O reality I have to face), but you can do several at a time: a "collection" in just one file, which tends to use up more RAM but also makes it more difficult to pinpoint and solve problems with specific records. I use the bulkmarcimport tool. I make sure the holdings (field 942 in Koha, unless you change it) are there before loading, otherwise I really mess up the DB. And my trial-and-error process usually involves using mysql's dump and restore facilities and removing the contents of the /var/lib/koha/zebradb directory, effectively starting from scratch.
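That loop might look something like the sketch below, assuming the Debian packages, an instance called mybiblio and a MARC-XML file at a hypothetical path; double-check the flags against your Koha version before running it:

# Snapshot the instance so a bad batch can be rolled back
sudo koha-dump mybiblio
# Load bibliographic records from MARC-XML inside the instance's environment
sudo koha-shell -c "bulkmarcimport.pl -b -m MARCXML -file /tmp/records.xml" mybiblio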

Koha requires indexing, and it can be very frustrating to learn, after you import all your records, that you still can't find anything on the OPAC. Most distro packages for Koha ship a helper script called koha-rebuild-zebra to help with this process. Actually, in my experience deploying large Koha installations, most of the management and operational issues have something to do with indexing. The APT packages for Koha install a cron task to rebuild Zebra, underscoring how much everything depends on this process.
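From memory, that shipped cron entry looks roughly like the line below; check /etc/cron.d/koha-common on your system for the real thing:

*/5 * * * * root koha-rebuild-zebra -q $(koha-list --enabled)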

Since Koha now works with instance names (a combination of Zebra installations, MySQL databases and template files) you can rebuild using something like:

koha-rebuild-zebra -b -v -f mybiblio

Feel free to review how that script works and what other (Perl) scripts it calls. It's fun and useful to understand how old pieces of Koha fit a generally new paradigm. That said, it's time to embrace cloud patterns and practices for open source ILSs – imagine using a message bus topic for selective dissemination of information or circulation, and an abstract document-oriented cloud store for the catalogue, with extensive object caching for searches. And to do it all without VMs, which are usually a management nightmare for understaffed libraries.

Surviving NYE: Times Square

Ailé and I spent the last week of 2013 travelling in the United States' Northeast. We had decided on spending December 31st in New York City, but we wondered whether ringing in 2014 in Times Square would be a sane decision. We did it, and we loved it. And in the process, we noticed that a lot of blogs and webpages, as well as most of the people we met in New York City, actually discouraged it. So in true Internet spirit, we'll share our learnings from a great evening with a million of our friends from around the world.

Spending New Year's Eve in Times Square means different things to different people. Coming to terms with your expectations is the first step in this process. Is seeing the ball drop your main expectation? Or is it to be on TV? Or to receive party favors? Or to see fireworks? Or to get drunk with your friends? Or to enjoy the artists that play during the evening? I would say that if any of these is a priority for you, then spending the evening in Times Square like we did may not be the best option for you.

We committed to spending the evening and ringing in the new year in Times Square because we wanted to be part of it: being with strangers from around the globe, feeling the energy of a million people and just being dazzled by Times Square. We didn't care much about Macklemore or Miley Cyrus, or about the fireworks, or even about the ball itself, which frankly is a very small object compared to the screens and everything else in Times Square.

With this in mind, there are options for everyone. You can reserve early in the year for one of the few rooms with a view, you can pay from a couple hundred to a couple grand for one of the parties (with no actual view), you can spend it elsewhere in New York City, like in Central Park, or you can do as we did and join the public event in Times Square: the one that is actually broadcast on TV, and the one attended by some 40% of the people in Manhattan. Keep reading if you want to learn more about this option.

Surprisingly, there are few sources of information about what exactly happens in Times Square on NYE. One of the most complete sources is the Times Square Alliance, which has a useful FAQ and a discussion of NYE parties, tickets and so on, but also a wealth of detail on the actual public event. It might be worth monitoring local news sites in the 72 hours before, as well as sources like Twitter.

But, since we found a lot of New Yorkers don't actually attend the party in Times Square, there are few first-person accounts of how exactly this goes. So we'll try to explain from our experience, and hopefully provide some useful insights if you ever plan to do this.

The first question is when to arrive. We arrived at 2 PM. By that time they had closed some 3 blocks and we were at 48th Street. After we got in, they closed 49th, 50th and so on, all the way up to Central Park. This means we spent 10 hours there. After the ball drops, confetti rains, and both Auld Lang Syne and New York, New York play, the NYPD will allow people to leave. This is around 12:10 AM.

How do you enter the event? The event actually happens on Broadway and 7th Avenue, so the NYPD closes the cross streets and puts entry points on 6th and 8th Avenues. Our recommendation is to take the subway, exit at 50th or even 59th, walk towards Times Square on either 8th or 6th Avenue and try to get in as far up front as you can. The worst option is to exit at 42nd and start walking north trying to find an entry point.

Once you get in, you'll see pens set up, and depending on the time, the NYPD might have started allowing people in. There will be a metal detector wand and a bag check. Then you're free to stand wherever inside the pen you want. Our suggestion is to secure a spot where you can lean against the fences. Up front is best, because you have an uninterrupted view. The fence helps you cope with the crowds pushing: you can lean on it, you can sit and spread your legs, and so on.

Notice the crowd will be a living organism. People will start leaving when they realize they have to stand for 10 hours. People will definitely leave to use restrooms (nowhere to be found), only to discover that they can't come back to the pen. All this will result in natural crowd movements. Every time someone leaves, the crowd will push to the front and, depending on what the NYPD says, even move to the pen in front of yours. Try to anticipate. Choose good crowd neighbors, at least for the first couple of hours. Be polite, smile!

The lack of restrooms does not necessarily mean you will stand in a puddle of urine and feces from other attendees, as some blogs say. We sat on the pavement for most of our 10 hours there and never had to cope with such a situation. Note we were at the front of our pen. The pavement is very cold, though. More on cold below.

For most of the evening there will be people selling pizza, water and hot chocolate. Drinks are a no-no because there are no restrooms, and pizza is a no-no because it will make you thirsty. If you have to, it's something like $20, and I suggest you wait until 10 PM or so. Don't wait TOO long, though, because the NYPD will tighten security as midnight approaches, and this includes not letting pizza be sold. We had just one bottle of water reserved for the two of us, had a pizza before 11 PM and had a few sips of water just before midnight. We also had granola bars for earlier in the afternoon.

The evening was very cold. We actually had something similar to snow for a few minutes while the sun was still up, and then we had the fortune of a clear but very cold evening, around -4°C or so. Preparation was key. I had thermal underwear and snow pants, and then a thermal shirt, a T-shirt, a sweater, a synthetic fleece jacket and a heavy fleece jacket on top. I had ear muffs, a thermal hat and thermal gloves. We had double wool socks and both hand and feet warmers (feet warmers are not awesome, but hand warmers are amazing). Also, you will get a runny nose.

You have to think about how you will kill 10 hours. I had bought a couple of e-books to read on my Kindle app, but I did not read as much as I wanted, as my gloves were not touchscreen-ready: I had to control the phone with my nose. We talked a lot to each other (most of our crowd neighbors were Chinese, Japanese or Korean) and listened to music for a while. AT&T data service was not bad for such a big crowd. The event officially starts at 6 PM, and they will keep you entertained with things like the sound checks, an hourly countdown, some videos, etc. We really liked the NASA NYE Video and the AP 2013 Review.

We learned some interesting things. For example, people from some of the parties were allowed out into "expanded pens" 15 minutes before the ball dropped; they just didn't know which lucky ones would be allowed to do so. Also, some bystanders were allowed in some 40-50 seconds before the ball dropped, so if you just want to see the confetti and take a quick picture before the crowds are released, that's also an option.

Getting back to sleep requires preparation, too. Going underground is impossible, and so is taking a cab. We had committed to walking to Columbus Circle, but we were actually surprised that most people were walking towards Times Square and not away from it, so walking against the crowd and within the NYPD barricades was a bit awkward. We did end up in Central Park near the Bolívar statue, which was a photo op for Ailé, and then were surprised that the Columbus Circle station for the Uptown 1 was not crowded.

And finally, was it worth it? Just take a look at this or this. It was absolutely worth it!

