Software Is a Social Activity
Three years of Practical Software Engineering at Ticketmaster
Copyright © 2015 by Cam Riley All Rights Reserved
Why Write This?
When I left Lifelock in 2012 I wrote a thirty thousand word essay on my experiences there titled, Everything is Engineering Now. The original motivation for doing a long form essay was the book, “Coders At Work” which I thought did not cover any of the issues that a day to day coder in the modern software industry went through.
At Lifelock I honed my opinions and experiences when it came to quality. While there I pushed unit testing as a leading indicator of quality and it paid off: the Tempe Engineering Group produced the highest quality code in the company. At Ticketmaster I had a similar experience, but this time with a Front End group rather than a Middleware group. Unit Testing, Integration Testing and Functional Testing are radically different with Front End technologies.
The three years I was at Ticketmaster also saw some amazing change. When I started there, the main product of the team I worked with took six months to get into production, had no unit tests and carried over two hundred bugs in production. Three years later the same engineers could get an artifact into production within fifty minutes, had 100% unit test coverage and were carrying zero bugs from release to release.
At Ticketmaster I acted in the role of Principal Engineer, Technical Lead, Technical Manager and as a Manager for other Managers. Hopefully this long form essay helps out other software engineers facing the challenges that come through dealing with an organization transitioning to scrum, a big bang project, changing focus and direction and the challenges that come with focusing on high throughput and high quality.
Interviewing With Ticketmaster
In late 2012 Ticketmaster embarked on the Jetson Project. This was a big bang project which was intended to replace the existing twenty or so ticketing systems. With the big bang project came several hundred million dollars to expand the engineering organization. According to the Vice President [VP] who interviewed me, I was the first Jetson hire in the Scottsdale office.
The way The Valley, or Phoenix, tech community works is that at any moment in time there is at least one company going through an investment cycle where they are hiring as many software engineers as they can gobble up. Either the local office is expanding, they are ramping up for an IPO or there is a big project happening and the company is trying to gather as many software engineers under their roof as they can. In late 2012, that company was Ticketmaster. The investment that Ticketmaster was making in itself coincided with my being ready to move on from Lifelock. As a result I interviewed with Ticketmaster in September of 2012.
The Jetson investment was coming with a great deal of change. Ticketmaster was moving over to a Spring and Java tech stack. The company was also converting all their engineering offices in Hollywood, Scottsdale, Seattle and Quebec to Scrum. These were big changes for a company that had a small but incredibly capable engineering group. In the Scottsdale office the VP was interested in interviewing software engineers that had dealt with Scrum, Spring, and Test Driven Development [TDD]. My experience fit that bill almost perfectly.
Interviewing is hard. It is stressful and nerve wracking. The main issue is you only do it every three to five years or so. TLDR; you suck at your first few interviews. When I interviewed with Ticketmaster I made a couple of faux pas. The main one was I was given a white boarding question and the first thing I did was rub the question off. I had it in my head, but one of the interviewers later told me that behind my back one of the other interviewers mimed, “Why did he do that?”
One thing I do try to do when I interview is get feedback on how well I have done. I interview infrequently enough that the fastest way to improve is to ask the interviewers at the end of the interview what I can do to get better. As it turned out, that was the question that got me the job offer at Ticketmaster; it was what tipped the scales in my favor.
One of the discussions we had during the interview was about quality and teams unit testing. I have been in companies that don’t unit test and it is impossible. The quality is poor, throughput is slow and the teams cannot function as software engineers or individuals; the operational pressure from low quality code is just too great. I had no interest in being in that position ever again. The interviewers wanted to unit test and were comfortable with any fall out that came with the change in testing. One of the big problems when you introduce unit testing is not everyone wants to do it. That proved to be the case at Ticketmaster where there were some dogged hold outs.
The Original Phoenix Tech Startup
Ticketmaster was a Phoenix startup in the 1970s that was the brain child of four Arizona State University [ASU] students. Their idea was not original, there were a couple of other companies at the time that were trying to digitize ticket management with the new mainframe and computerized systems, but the Ticketmaster software that came out of the decade would prove irreplaceable.
That software was known universally in the company as “The Host”. It should probably be called “THE HOST” such an amazing piece of software the Host is. Originally it was just known as “the system” but one of the engineers in the late 80s started calling it “The Host” and it stuck as a name, noun, adjective and verb.
In 2015 the Host was helping Ticketmaster do ten billion a year in revenue and defying multiple attempts to replace it with more modern ticketing software systems. The Jetson project was to spend nearly 320 million on trying to replace the Host only to discover it was impossible. This was not the first attempt to replace the Host either; there had been several attempts and the Host outperformed and outshone them all.
The genius of Ticketmaster was not only the Host software but also the business model. Prior to Ticketmaster, ticketing was a cost center. It cost venues money. When Ticketmaster came along they made ticketing a fee which was added to the price of reserving a seat for a performance. Suddenly the venues got more money because they didn’t have to pay for tickets and later on they would split the fee money with Ticketmaster. In return Ticketmaster asked for exclusive contracts over several years. Venues were happy to do business with Ticketmaster, where once ticketing cost them money, it was now adding to their profits.
Ticketmaster’s biggest competitor when they started was Ticketron which was another digitized ticketing company. The biggest difference between the two companies was that Ticketron could only do best available seating. Ticketmaster’s software was so sophisticated that it allowed you to choose your seat if you wanted. This seems trivial in 2015, but in the 1970s, this was the difference between dropping off into nothing and becoming a multi billion dollar global company. Such are the tiny differences that economic and technical dominance is made of. Ticketmaster bought the largely irrelevant Ticketron in 1991.
The second gem in Ticketmaster’s ticketing arsenal was Archtics. This was the ticketing system that was built toward solving the problems of season ticket holders and the arts community. Most of the NFL teams, NBL teams, NBA teams and NHL teams used this software. It was especially useful in dealing with season ticket holders that had a different problem domain to the On-Sales that the Host took care of. With these two systems and the sharing of fees with clients, Ticketmaster had a strong hold on the ticketing market in North America and large parts of the globe.
Ticketmaster’s proud secret is that On-Sales are only possible because of the Host. When a tour goes On-Sale for a major artist the Ticketmaster website gets hammered by fans, by brokers, by bots and by anything else that happens to be crawling the web. The ticketing system backing this deluge of requests for tickets, for reservations and for purchases is the Host; and it handles it with ease, with aplomb and a snub to the newer e-commerce technologies.
During 2013 we were trying to hire engineers for the Resale project. At the time I interviewed an engineer from Hautelook dot com which was a flash sale website. My wife used Hautelook and when I mentioned we were interviewing an engineer from that company she said, “Their site is really slow.” I remember replying that when it came to On-Sales the only company that had solved that problem was Ticketmaster; and it was courtesy of the Host.
It is not just with On-Sales that the Host shines, it is also in the Box Office. These days we think of buying tickets online through websites but in the 1970s nearly all tickets were sold through the Box Office. Sales this way are a very different paradigm to the e-commerce one. At the Box Office clearing out a line is the most important goal. This means that latency, rather than the ability to handle load, is what matters most. Once again, the Host can do both; it is so flexible that load and latency can be taken care of with the same software.
My first day at Ticketmaster I was put in the same cube row as the project managers and product managers. My first thought was that it would be a very noisy cube row. As it turned out they were all lovely people and were all into craft brews. It was a case of “You like IPAs? I like IPAs too! We must be best friends.”
I don’t want to say Ticketmaster had a strong drinking culture, but Thursday evening happy hours at Papago Brewing were nearly sacrosanct. The Phoenix office throws a lot of company parties too during the work day where food and drink is brought in. More than any other place I have worked at. The cube row I worked in eventually got christened as “IPA Row” replete with its own street sign.
It is easy to forget that Ticketmaster is a global company, and because Host development is in Phoenix a lot of people from out of town come to Phoenix for a few days and then take off again. When I first started working at Ticketmaster people didn’t say hello to me in the hallways. Once it was obvious I was not an out of towner but a new employee, more and more people started saying hello. It is a strange quirk as Ticketmaster is the nicest place I have ever worked but it wasn’t obvious the first week or so.
Prior to my arrival Live Nation and Ticketmaster merged together as one company. It was supposed to be a merger of equals but Live Nation largely chopped the executive head off Ticketmaster. As one engineer put it, “Live Nation is a services company, and Ticketmaster a tech company. Live Nation didn’t really know what to do about Ticketmaster.” But everyone loves Ticketmaster’s reliable revenue each year and Live Nation is heavily saddled with debt.
With Live Nation and Ticketmaster merging it meant there were two systems for everything: timesheets, PTO, expense reports, wiki, etc, etc. To make it worse, with the new Jetson project there were also duplicates of all the platform tools. It was a mess. Each system demanded its own username and password which meant everyone was writing them down on stickies attached to their monitor. Not good. This problem got solved over the three years I was at Ticketmaster but when I arrived it was unnecessarily complicated.
When I arrived at Ticketmaster the bandwidth in the office was horrible. A common joke was that working at the office got better on Fridays because so many people worked from home and were not sucking up bandwidth. This too got solved and to the IT department’s credit they put in a WiFi solution that was company and location wide. After they put that WiFi system in, I did not use a cat5 connection again. I could also travel to Hollywood or Seattle, flip my laptop up and it connected seamlessly. It was an excellent solution.
One of the questions I liked to ask was, “What is Jetson?” The question was answered irrevocably in 2015 but prior to that, everyone had a different answer to the question. The answers could range from public APIs, to Java tech stack, to new ticketing system. One of the major issues with Jetson was that it was ambiguous as to what it was and what it really was solving. As a result everyone had a different idea of what it actually was. Consequently everyone thought they were doing the right thing even though they were often working against each other.
Part of the intent of Jetson was to update Ticketmaster to a universal Java stack. This included Spring, ActiveMQ, Camel, Hibernate, etc etc. Basically it was intended all the middleware stack would be the same and there would be universal libraries for logging, metrics, etc. Previously the tech stack at Ticketmaster was a hodge podge of different technologies that differed from project to project. One good thing about the Jetson project was that it made the middleware consistent.
All in all though the standardization of the technologies as part of the Jetson stack was a good thing. It made it easier to stand up new products and meant that there was a common language between the products and the services. Prebuilt client libraries were easier to share, as were a lot of the platform engineering libraries. In the end there was a Spring Boot maven archetype which gave all the platform engineering technologies straight out of the box.
The first Jetson project was ‘Resale’. Ticketmaster dominates the primary market of tickets but the secondary market [Resale] was getting larger and larger as StubHub and other companies made it easy and trustworthy to resell an unused or unneeded ticket. Previously you sold it to a scalper outside the venue, or you put it on craigslist and hoped that the buyer - and the seller - were legit.
Ticketmaster’s bet was the buyers in the market for tickets to a concert wouldn’t pay much attention if a ticket was being sold for the first time or second time, third time, etc when they were looking on ticketmaster.com for tickets to a concert. This turned out to be correct and the resale market is generating a significant amount of money for Ticketmaster and helping the company reach its goal of selling a billion tickets a year.
The resale project was the company’s first attempt at setting up a modern SOA stack on the Java platform with all the standard e-commerce technologies like Oracle databases and NoSQL databases. There were a lot of lessons learned. The main one was that MongoDB couldn’t handle the pressures of ticketing. The Microservice architecture was also slow. Fortunately Ticketmaster’s draw is big enough that we had time to fix some of these issues as time went on.
The Concerts Jetson project was supposed to be the primary ticketing system to rule all primary ticketing systems. It was intended to retire the Host, Archtics, Ticketweb, Microflex and the other seventeen ticketing systems Ticketmaster had bought over the years. Back in 2010 Ticketmaster was noticing a drop off in their direct clients. It was only a small percentage each year but when they asked the clients the response was that the tools for event management were dated and difficult. When Live Nation left the Ticketmaster fold and tried to create their own ticketing system, Ticketmaster decided to act. When Live Nation’s in-house ticketing system was not up to scratch they decided to buy Ticketmaster.
The Concerts Jetson project was mainly to create modern web based Event Management and Inventory Management systems. There had already been some movement in that direction: Inventory systems could be used through a modern UI in the Ticketmaster portal. But the intent was to make a big bang and have everything (Event, Inventory, Box Office, Online and Payments) work through the new Jetson system.
Suffice to say it failed. I don’t want to go into too much detail but it was a very frustrating time for the teams I worked with as it felt like we were spinning our wheels far too often and the direction was broken. Ultimately the new Jetson system was too complicated and too slow. There was no way it would have worked. I suspect the executive management would have bulldogged it out except the roll out was too slow and they were having to pay for several working ticketing systems already.
While Jetson itself stopped, the products and technology that came from it did not. The product that I worked on with the SALES and SUPP teams was grafted on top of the Host inventory system. The most famous of the Jetson technologies is probably Harmonia which amazingly increased the speed of the already blindingly fast Host system by sixty times.
Jetson Is Dead, Long Live Jetson
When Jody Mulkey came on as CTO you could tell there was a change in direction. Mostly it was behind the scenes. Too much had been invested in the Jetson stack to just stop outright. I recall there was a Town Hall meeting that Jody held and I asked him about the Box Office Point Of Sale software we were developing. I said:
“We have built this high quality and good looking web application, but it has no customers.”
This was one of the frustrations of the Jetson project. We were developing applications for the venues, but because they were on the Jetson ticketing system, until that went live, no-one was using the software. Jody’s reply was:
“What have you done about it?”
Which made me grind my teeth in frustration. The good side was I felt I had been both challenged and given a green light to prove that the Jetson Box Office could sell Host inventory. Something that Product was adamant would never happen. The next Friday the two teams I worked with spent the entire day in a hackathon trying to get the Jetson Box Office to sell Host inventory. By 3:00 pm that day, thanks to some very talented engineers, we had a printed ticket in our hands from the Host system. We had reserved, sold and printed a Host ticket!
I sent an email out that evening with a photo taken from my phone of the ticket to our VP and Jody. The email must have done the rounds because I got an email from the head of product who asked,
“Is it crazy to think that Jetson Box Office can sell Host inventory?”
I replied that we had built the Point of Sale system to sell anything. It was a Point of Sale system first and completely unaware of what it was selling. Making the Point of Sale sell Host, Archtics, Resale, etc inventory was just a case of hooking the appropriate inventory system underneath it.
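One way to picture an inventory agnostic Point of Sale is a thin selling layer that only knows about an interface, with each ticketing system plugged in underneath. The sketch below is an illustration under assumptions, not the Ticketmaster code: `InventorySystem`, `FakeHostInventory` and `PointOfSale` are hypothetical names, and the fake adapter just builds a string where a real one would call the Host.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PointOfSaleSketch {

    /** Hypothetical contract that every inventory system adapter would implement. */
    interface InventorySystem {
        String name();

        /** Reserves a seat and returns a reservation id. */
        String reserve(String eventId, String seat);
    }

    /** Hypothetical stand-in for a Host adapter; a real one would talk to the Host. */
    static final class FakeHostInventory implements InventorySystem {
        @Override
        public String name() {
            return "HOST";
        }

        @Override
        public String reserve(String eventId, String seat) {
            return name() + ":" + eventId + ":" + seat;
        }
    }

    /** The selling layer: it looks up an inventory system by name and delegates to it. */
    static final class PointOfSale {
        private final Map<String, InventorySystem> systems = new LinkedHashMap<>();

        void register(InventorySystem system) {
            systems.put(system.name(), system);
        }

        String sell(String systemName, String eventId, String seat) {
            InventorySystem system = systems.get(systemName);
            if (system == null) {
                throw new IllegalArgumentException("Unknown inventory system: " + systemName);
            }
            return system.reserve(eventId, seat);
        }
    }

    public static void main(String[] args) {
        PointOfSale pos = new PointOfSale();
        pos.register(new FakeHostInventory());
        System.out.println(pos.sell("HOST", "EVT1", "A12"));
    }
}
```

In a design like this, supporting another ticketing system means writing one more adapter and registering it; the selling code does not change.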
We were not the only teams proving that Jetson technology had immediate use with the current ticketing systems. The Harmonia project was the biggest of them. When I had been challenged to hook the Point of Sale into the Host, Jody was also meeting with the BEDROCK team about Harmonia and how it could make the Host even faster.
The ENTRY team which develops the software for the barcode scanners at the venue doors took a similar approach. They already had to support all the current ticketing systems. To them Jetson was just another ticketing system and they developed their software that way. It meant they were Jetson minimum viable product [MVP] six months before expected. Approaching Jetson that way was contentious with Product but it was the right decision.
The barcode scanner software is pretty cool. Like the Host it was developed in a different time and is being updated bit by bit to modern standards. I asked the Lead Architect for the ENTRY software, of the billion tickets Ticketmaster expects to sell, how many the scanning software would scan in.
“A billion.” He replied.
In late 2015 when Jetson collapsed under its own weight, the new Jetson technologies became part of a pivot where anything Jetson that could be grafted into the existing ticketing infrastructure was. The Point of Sale system we developed for Jetson was one of the two big Promises of Value that were bet on at the end of 2015.
Thankfully we developed the Point Of Sale system well enough that it was inventory agnostic and the changes to formally sell Host inventory through the Jetson Point Of Sale were pretty minor.
The 100% Team
When I started working with the team that would be known as SUPP we decided to stake out a simple goal. Every line of code should be covered by a measurable test. In front end engineering that usually means unit tests as the integration and functional testing side is pretty spotty. So the goal was 100% unit test coverage.
There were multiple reasons for this. Firstly the team had not unit tested before and we wanted to keep the goal simple and measurable. Basically any code you added should be tested. Secondly, most of the team had less than three years software engineering experience and was not experienced enough to make value decisions as to whether something was testable enough. In these situations the simplest dividing line is to test everything.
Thirdly engineers love to argue, “But the models!”, “But the DTOs!”. The answer was to find a way to test them that is meaningful because you are not in a position to make value decisions. Usually the “But the models!” argument is really an argument for, “I don’t want to do it.” and this was not acceptable.
Finally the idea of testing every line of code you write was a sign of completeness, professionalism and ensuring all your code was not only fully tested but also testable. One of the main differences between unit tested code and code which is not tested is that non tested code is quite literally non-testable. It is ugly, useless and a mess. I agree with Michael Feathers, it is instantly legacy code without unit tests.
When the Java code base started getting close to 100%, two other engineers and I stayed late to get every line of code covered with unit tests. We had got it down to two lines that were not covered according to Cobertura and sure enough one of them was hiding a bug. There was a break statement instead of a continue statement. Consequently when I get asked why you should test to 100%, I tell the story of how the Resale Support Tool had two lines uncovered by tests and one of them was hiding a bug.
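To show how a break in place of a continue can hide on an uncovered line, here is a small hypothetical reconstruction. The real Resale Support Tool code is not shown in this essay, so the method names and the blank-id scenario are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BreakVersusContinue {

    /** Buggy version: the first blank id ends the whole loop, dropping everything after it. */
    static List<String> nonBlankIdsWithBreak(List<String> ids) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (id.isEmpty()) {
                break; // only runs when a blank id appears, so tests without one never cover it
            }
            result.add(id);
        }
        return result;
    }

    /** Intended version: skip the blank id and keep processing the rest. */
    static List<String> nonBlankIdsWithContinue(List<String> ids) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (id.isEmpty()) {
                continue;
            }
            result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("A", "", "B");
        System.out.println(nonBlankIdsWithBreak(ids));
        System.out.println(nonBlankIdsWithContinue(ids));
    }
}
```

Both methods behave identically on input with no blank ids, which is why a suite that never exercises that branch stays green. Covering the last line forces a test with a blank id in the middle of the list, and that test is the one that exposes the difference.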
During the Resale project the SUPP scrum team got known as the 100% team. At the end of each sprint I would send out a summary of the sprint which would include the user stories, velocity, unit test coverage and number of functional tests. I would forward these emails on to Vice Presidents, Team Leads and other engineers that were interested. Ultimately the SUPP team got known for their approach to unit testing.
In 2015, one of the younger SUPP engineers went to a hackathon in Hollywood. He was talking to his team lead for the hackathon and mentioning how 100% was the norm for the team. The lead for the hackathon was amazed at the level of unit testing code coverage. Ironically the junior engineer had not known anything different. When he came back to Arizona he asked me:
“Don’t all teams unit test to 100%?”
I had to reply “No, no they don’t.”
When I first came to Ticketmaster I was asked to do a Tech Talk on Product Quality. The talk I did was the second Tech Talk and about three hundred engineers were on the call. I had done a talk to the Scottsdale engineers prior to doing it in front of the entire company, but the Tech Talk was done over a Webex and the phone. The slides had an amusing take on “Fight Club” but when you are talking over the phone when everyone is muted, it can be really odd and paranoia inducing when you are the only one laughing at your own jokes.
“Why should you unit test?” An audience member asked.
“Why even bother coming to work.” Was his terse reply.
I love that answer. He was saying if you are not going to unit test, then why turn up each morning, why even get out of bed! He was intimating that unit testing was the new bare minimum for engineering competency. If you were not prepared to unit test then you are a liability to your team members and further you were just half arseing your job. As he said, if that is the case, why even come to work?
When I started at Ticketmaster source control was done through SVN. As a source control system SVN is pretty good but it is weak when it comes to merging branches. Additionally you tend to end up with a lot of folders that are branches, tags, etc, which can become confusing over time.
The BEDROCK Team had their own server which had Mercurial on it. They had recently installed Git on the same server and one of our principal engineers wanted us to move over to it for the Resale project. One of the cultural values in Ticketmaster is embrace change, so embrace we did and moved over to Git.
I had not used Git before, but rather than being the new hot thing, Git was flat out replacing most previous source control systems. The advantages of Git over SVN are largely in merging. Git does it so much better and with fewer merge conflicts. The downside of Git is that it feels more complicated than SVN and it takes a while to get people used to it. Later when Atlassian’s Source Tree came out, Git repos became easier to manage for most engineers. The Source Tree UI made its branching and merging seem more intuitive and less fraught with error.
The Principal Engineer that argued for the team to move over to Git was also a Git master. With a few piped commands in the terminal he could fix any error, any branch mess ups; basically anything to do with Git. Since we did feature branching this was invaluable, as every now and then with Git you can get your local repository into a state that looks like it can never be fixed. We were fortunate to be able to run to someone who could fix anything within minutes.
The same Principal Engineer was indicative of some of the amazing engineering talent that existed at Ticketmaster. I was talking to the same engineer about the movie “The Martian” and he off handedly replied that code he had written for the Mars Pathfinder’s camera system was in the movie. He quite literally has code on another planet! That just blows my mind.
As Git became more and more popular throughout the company, Ticketmaster got a GitLab license and all projects moved over to GitLab as the primary interface for product repos and individual repos. As Ticketmaster was doing that, it was also moving all its apps over to single sign on as well, so the move to Git repos on GitLab was not a big deal.
Some teams adopted Git Flow, others adopted the GitLab style of merge requests; we stuck to Feature Branches. This is where an engineer or engineers doing a User Story created a new branch for that user story and then merged it into master once it was completed. As long as you merged master into your branch every morning, this style of development was good. Even when we had two scrum teams working in the one codebase, this style of Feature Branching proved effective and manageable.
It took me a while to get used to the different nature of Git from SVN, CVS and Perforce which I had used in the past. As I mentioned earlier, Atlassian’s Source Tree made it easy to manage both remote and local branches as well as keeping master up to date. Anything other than creating branches, committing locally and merging master I tended to do from the command line.
Spring was one of the first widely adopted Dependency Injection [DI] or Inversion of Control [IoC] containers. For this reason alone it is awesome. Dependency injection allows you to change the injected object at runtime. The awesome thing is runtime can be when it is in production or when it is being tested. This allows you to inject different objects to satisfy both. This means DI applications are super super super super easy to unit and functional test.
Spring has some other things going for it. It runs on Tomcat and Jetty which are lighter weight than the JEE containers that are Spring+Tomcat’s main competitors. Spring also has a lot of standard libraries that solve standard business problems. It is easy to wire in integration points such as REST services, databases, messaging systems, etc. Spring supports most of these systems through configuration which means you have less setup code littering your own business code.
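The testing benefit can be sketched in plain Java without any Spring dependency. In the example below the wiring a container would normally do is done by hand, and `PaymentGateway` and `CheckoutService` are hypothetical names invented for illustration. The point is only that the service receives its collaborator from outside, so a test can hand it a fake.

```java
public class InjectionSketch {

    /** Hypothetical collaborator; in production this would call a real payment provider. */
    interface PaymentGateway {
        boolean charge(String account, long cents);
    }

    /** The service never constructs its own gateway, it is given one. */
    static final class CheckoutService {
        private final PaymentGateway gateway;

        CheckoutService(PaymentGateway gateway) {
            this.gateway = gateway;
        }

        String checkout(String account, long cents) {
            return gateway.charge(account, cents) ? "SOLD" : "DECLINED";
        }
    }

    public static void main(String[] args) {
        // In a test, inject fakes that approve or decline on demand.
        CheckoutService approving = new CheckoutService((account, cents) -> true);
        CheckoutService declining = new CheckoutService((account, cents) -> false);
        System.out.println(approving.checkout("acct-1", 4500));
        System.out.println(declining.checkout("acct-1", 4500));
    }
}
```

With a container such as Spring, the same constructor would be satisfied by whichever implementation the configuration supplies, which is what makes swapping the production object for a test double so cheap.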
Another advantage of modern Spring is that configuration is now done through annotations which become compile time checks. This fits perfectly with Deming’s approach of test up front rather than inspection at the end. It does mean in a lot of cases that configuration is checked by the compiler courtesy of Spring making configuration both metadata and compilable.
The downside of Spring is that DI comes at a cost and that is complexity. It is hard for engineers to understand how the Spring container works and how a Spring Bean is different from an instantiated object. Spring Beans are very close to singletons and programming with them leads to a functional programming style but it can be hard for engineers to wrap their brains around how it all works and how injection works which can lead to bugs and odd behavior.
Spring can also be all or nothing. You have to embrace Spring 100% to make it work. The other all or nothing aspect is if something is not configured correctly then you get five hundred lines of stack traces and exceptions to dig through. In amongst those five hundred lines are two words that are the clue as to why your configuration blew up at runtime. For younger engineers getting comfortable with Spring stack traces and configuration can be a daunting task.
Enterprise Service Bus
The Enterprise Service Bus [ESB] is a wonderful idea but I have never seen it actually work in production. With the advent of the popularity of REST it has even less reason to exist in the middleware ecosystem. The original idea behind a Service Bus is that existing services can be hidden behind it and the Service Bus can provide a strong aggregated or orchestrated contractual interface. It turns out though that it is easier to do that in the Java layer and expose it as a service.
The Service Bus is also fighting the direction that services are going. As more and more services were made, and the idea of a Microservice became popular, the heavily centralized nature of the Service Bus became harder and harder to maintain and justify. At Ticketmaster as services became more and more automated it was easier to stand a service up and then expose it to the outside world via a VIP. The Service Bus got left behind.
The main use for the Service Bus in Ticketmaster was for a centrally managed set of SSL certificates. The services that required certificates such as PII or PCI were put behind the Service Bus. As certificates become decently managed then it is likely the Service Bus will be retired, but for now, the Service Bus has little to justify its existence.
SOAP vs REST
Netcraft now confirms that SOAP is dying. Actually it is worse than that, SOAP has lost and REST is the dominant winner. When the Jetson project started it was intended that the Front End applications would talk to their Java layer via REST and AJAX but would make calls down to the Enterprise Service Bus using SOAP to get data from the middleware.
The downside of SOAP was that it made calls binary, they either worked and a response body was returned or they failed and a fault was returned. The fault was easy to turn into an exception in Java using some conversion layer like CXF. But the reality is that REST is more human readable, it is not uncommon for engineers to hit a GET REST interface through a browser to check that the data is coming back.
The other advantage of REST is that the HTTP Status Codes give a good deal more information about the service response. For the most part only a few status codes are useful but they give more contextual feedback from the server than a fault object does. The downside of REST is that HTTP Status Codes were never meant to be used that way and it can be pretty confusing when you get a status code that you weren’t prepared to handle explicitly.
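A client usually ends up with a small mapping from status codes to the handful of outcomes it actually cares about, plus a catch-all for everything it did not plan for. The sketch below is one hypothetical way to write that mapping. The `Outcome` names and the choice of which codes to treat specially are assumptions for illustration, not a standard.

```java
public class StatusCodeSketch {

    /** Hypothetical outcomes a client might distinguish between. */
    enum Outcome { SUCCESS, NOT_FOUND, CLIENT_ERROR, RETRY_LATER, UNEXPECTED }

    /** Maps an HTTP status code to one of the outcomes this client knows how to handle. */
    static Outcome classify(int status) {
        if (status >= 200 && status < 300) {
            return Outcome.SUCCESS;
        }
        if (status == 404) {
            return Outcome.NOT_FOUND;
        }
        if (status >= 400 && status < 500) {
            return Outcome.CLIENT_ERROR;
        }
        if (status >= 500 && status < 600) {
            return Outcome.RETRY_LATER;
        }
        // Anything else (for example a redirect this client never expected) lands here.
        return Outcome.UNEXPECTED;
    }

    public static void main(String[] args) {
        for (int status : new int[] {200, 404, 418, 503, 302}) {
            System.out.println(status + " -> " + classify(status));
        }
    }
}
```

The explicit `UNEXPECTED` branch is the design choice worth noting: it gives the status codes you weren’t prepared for a defined path instead of letting them fall through silently.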
Eclipse to Intellij
One of the big changes Intellij allowed was that we didn’t have to check in the Intellij equivalent of .project files. The import feature for Intellij using maven was very strong and meant that anyone from an intern to a junior engineer to a principal level engineer could get a project up and running very quickly. This was a big difference from Eclipse projects that tended to be complicated enough that it was easier to check in the .project file and everyone use that as a baseline.
IRC and Slack
Every workplace requires electronic collaboration of some kind. My favorite tool for this job has always been IRC. However the world is changing and most people preferred IM clients. Jabber was the most used internal one. Fortunately Ticketmaster has a lot of older hardcore engineers that remember the internet when it was burgeoning to life. As a result there was a Ticketmaster IRC server and in typical Ticketmaster style it was on a server under someone’s desk.
The main team on the IRC server was the TMOL team. Their lead, Jon Philpott, was super chatty and I made friends with him through IRC before I ever met him in person. The first time I went to Hollywood I ended up messaging him through IRC and saying where are you? I remember introducing myself and several people saying, “Oh you’re Cam!”
IRC was never that popular inside Ticketmaster, it was mainly the older developers using it. When Slack was introduced into the company it took over pretty quickly. As it turned out many different teams were already using Slack for their products and eventually it got subsumed into one big Ticketmaster Slack channel.
Slack is basically IRC with sugar thrown in. For instance the ability to post code into a Slack channel that is nicely formatted is a good feature. When URLs or images are posted into a Slack channel then they are nicely formatted and embedded as well. That all wrapped up in a nice UI makes for a pleasant IRC plus style experience. I can fully understand Slack’s popularity.
Ticketmaster’s head office is in Hollywood. You can’t walk down Hollywood Boulevard without seeing a million tourist shops all selling cheap little gold Logie figurines. When User Experience [UX] started working closely with the teams producing the Point of Sale system the working relationship continually improved and when the final UX design was completed I bought Gold Logies for the User Experience team members and thanked them for making the product beautiful. The UX members thought they were awesome. It was a great way to reward a lot of hard work and patience.
The use of trophies spread through the company as well. Other teams and Scrum Masters started purchasing them to celebrate projects being completed or products going out to production. It was not unusual to see someone’s desk with three or four trophies documenting the different products they had worked on.
Hiring For Diversity
I believe diversity in software teams is important. That doesn’t say much in and of itself as everyone agrees that is true, but the reality is most people don’t actually do anything to improve diversity, especially in the tech industry which tends to be male dominated due to the common entry requirement of a Computer Science degree.
It is important to hire for diversity as it appeals to our better angels. When you have a white male frat boy software team all the worst aspects of human behavior come out. In those situations any joke, no matter how mild, about sex, gender or race has to be stopped immediately. Otherwise these behaviors creep in and become the culture of the team.
When you have a software team that is a mix of women, minorities and in some cases transgender people then the better angels come out in the individuals that are a part of the team. Software is a social activity now and to get a User Story done it often takes from two to six people working together to get that software in production. With a diverse team working together comes friendship, empathy and ultimately understanding. It makes us better as individuals. As a manager your goal is to improve people and teams, and hiring for diversity takes care of a lot of that on its own.
In my time at Ticketmaster I hired two women and five men. When I started managing other managers, my two direct reports were both female managers, which was good. They were both exceptionally talented engineers and managers who faced some very difficult challenges and responded exceptionally well.
I also like to try and use our production code as I want to see their reaction to what is commonplace code for us. There was one engineer I interviewed who tested our code, found a bug in it, fixed the bug and then handed me my laptop back. I ran maven install, then checked it into master. A couple of weeks later the same engineer joined us working on the resale project. It is very easy to say yes when something like that happens.
In front end teams I like to hire a dedicated CSS/UX developer. My belief is that products are treated better when they look great. Again I like to give candidates real world problems that we face day to day. We had a problem with a div dropping below the ones next to it, which made the application look ugly. I gave the candidate my laptop and that problem and asked him what he could do about it. He spent most of the interview focused on it, enough that he waved off a couple of questions while trying to solve it before he realized what he had done. Which was amusing. Again, that was an easy yes.
It is harder hiring engineers out of college as you are taking a big bet on their capabilities. We did pretty well at that. One engineer we hired, who turned out to be an excellent engineer capable of shouldering senior level tasks, was hired because of his pragmatism. I can recall chatting to him about his Capstone Project that ASU got him to do in his final year. I asked about how he deployed to production and he told me how hard it was and how it took ages to work out all the issues before it even deployed successfully. Sounds like real life.
Ticketmaster changed their approach in 2013 by starting a very large internship project. This ended up being a great program for introducing talented young engineers to Ticketmaster. We were able to see them work in teams on User Stories and Bugs and how they responded and learnt. It was normal for the good interns to be kept on part time until they finished their degree and then given a full time job.
The interns became a major source of energy in the office as well. The teams I worked with had several well established junior engineers that the interns would crowd around. It meant that our cube rows were always full of life and laughter which is very important.
It is the same with unit tests. All our Spring beans had protected as the visibility for the methods and the reason was so they could be tested directly and in isolation rather than through the one or two public methods. Uncle Bob Martin wrote, “Tests trump Encapsulation. No test can be denied access to a variable simply to maintain encapsulation.” Testability is so important it should drive design decisions.
The best design is a fully testable one.
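As a rough sketch of the protected visibility idea, assuming a hypothetical `PriceCalculator` bean with an invented flat 10% fee (in a Spring application the class would also carry an annotation such as `@Component`, which is left out here so the example needs nothing beyond the JDK). Because protected members are visible to other classes in the same package, a test living alongside the bean can call the helper directly instead of reaching it through the public method.

```java
// Hypothetical bean; names and the fee rule are illustrative assumptions.
class PriceCalculator {

    // The single public entry point other beans would call.
    public long totalCents(long faceValueCents, int quantity) {
        return applyFee(faceValueCents * quantity);
    }

    // Protected rather than private so a same-package test can exercise it in isolation.
    protected long applyFee(long subtotalCents) {
        return subtotalCents + subtotalCents / 10; // assumed flat 10% fee
    }
}

// Stand-in for a JUnit test class in the same package as the bean.
class PriceCalculatorTest {

    static void applyFeeAddsTenPercent() {
        PriceCalculator calculator = new PriceCalculator();
        if (calculator.applyFee(1000) != 1100) {
            throw new AssertionError("expected a 10% fee on 1000 cents");
        }
    }

    public static void main(String[] args) {
        applyFeeAddsTenPercent();
        System.out.println("applyFee tested directly, without going through totalCents");
    }
}
```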
Culture Of Testing
When you have a strong enough culture of testing then you can convert your SDETs and QA into engineers so that they are adding to the throughput by doing User Stories. The culture of testing also becomes team based where the teams are responsible for the quality of the product they are producing. Rather than testing being a hand off to another team and another organizational structure the team as a group starts to assert minimum standards of quality in their product. That kind of pride and consensus makes for great products and the high quality results feed on themselves.
We found that when engineers moved to other teams that did not have this culture of testing, the first thing they did in their new teams was set up unit tests, code coverage, functional tests, continuous integration and continuous deployment pipelines. Which was fantastic. For those engineers the culture of testing had become the norm and the path to high quality products. When their new teams were not at that standard they started introducing that culture on their own initiative, which was wonderful.
Service Oriented Architecture
Microservices are often compared to Monolithic applications where all the code is one artifact and there is one big database underneath it. That is not how most places are structured. Most companies have already split out accounts, orders, billing, fulfillment and product into some kind of SOA. The question for Microservices is how much further do you split out these basic business functions?
Product is important as this is what you are selling. Ticketmaster’s product is a barcode. It is what buys you a lease on a seat in a venue for a particular time slot. The barcode is what is scanned and gets you into the venue. At Ticketmaster product is supplemented with discovery and offers.
Order Capture is a snapshot in time. It has to answer:
- Can I bill you?
- Can I fulfill you?
- Are you real? (fraud)
Usually a cart or some other mechanism gathers enough information that the purchaser can be billed for the product and can be fulfilled for the product they have purchased. In the case of Ticketmaster this means tickets being delivered to the purchaser’s mailbox, their email, txt message or phone app. There is also the step to make sure the purchaser is a real person and that the purchase is not fraudulent.
It is important to note that Order Capture has no history. It is a moment in time. Billing and Fulfillment have histories. With billing there may be changes such as refunds, chargebacks, etc. With fulfillment there may be changes such as tickets not coming through the mail and being sent via email instead. So the basic SOA structure of any business should follow those patterns:
- Order Capture
Most companies also have:
- Digital Marketing
- Data Science
Websites are sophisticated enough these days with A/B testing that digital marketing can work on them to improve the conversion rate. Most companies also have some kind of Reporting or Business Intelligence [BI]. Accounts is a no brainer, you want to decrease the friction in purchasing by saving billing and fulfillment information in an account. An account is the logical place to store emails, phone numbers and addresses along with a connection to the billing system for the primary credit card.
These days Data Sciences is becoming more and more important. Large businesses now have so much data they don’t know what to do with it. The Data Sciences group provides mechanisms to stream pre-computed data and schemas into the firehose that is Data Sciences. When schema’d data is being sent to Data Sciences it tends to impinge on reporting’s main function.
With Ticketmaster the largest chunk of data falls under Product. There are events, venues, artists, seats, sections, rows, season passes, etc. So most of the complexity is weighted in the product area. As a result Product also includes:
Discovery is where a product is matched as seamlessly as possible with a potential purchaser. Offers aid the decision to purchase, such as 10% off parking when you buy a ticket, etc. So all those categories above are the broad sweep of what makes a modern digital business that is selling products. Any reasonably mature business will already have these broken out into services whether it is old time SOAP/XML or the increasingly popular REST/JSON services.
When you compare the pros and cons of Microservices, it should not be against Monolithic applications but against a broad business separation of concerns such as above where services provide these business functions.
Conway’s Law is a wonderful observation on human behavior and it is present here. Of the chunky services I have listed above, each will have its own team, own artifact and own deployment cycles. This level of service is compatible with the two pizza teams, with Devops, with continuous integration and continuous deployment. Breaking business concerns at this level is also scalable both horizontally and vertically. So where is the advantage in Microservices over this structure?
SOA usually comes with the expectation of an ESB. History has proven ESBs to be a horrific technology and they are usually pulled out of the infrastructure as quickly as they can be after it is discovered ESBs cause more problems than they solve. So once you get past the ESB and start standing up services behind a proxy and load balancer, what is the difference between SOA and Microservices?
The only real difference seems to be size of the service. In a Microservice architecture there are more services and they are smaller in size than in a more traditional SOA which are organized around the major business components and steps. Microservices make autonomy and loose coupling more prevalent simply because there are more of them.
If you have about thirty services you can standardize. When you get to two hundred plus, the only standard becomes that they are available, they have an API and that API is REST based.
One of the failings of modern software practice is how we succumb to fashion so quickly and easily. Software engineers like to think of themselves as meritocratic, driven by rational thought, but we are not. We are the normal hodge podge of human emotions and anxieties with strong wills to belong and be accepted. In the software community this means we are suckers for software fashions. I have often made the joke that we are “software beliebers”.
The current fashion is Microservices and Ticketmaster embraced Microservices with unbridled enthusiasm. I went through the whole XML fashion where it was going to create the semantic web. XHTML was a nightmare to work with because it was too strict and XSLT … fair dinkum. Just horrible.
Then there was the Cloud, except the Cloud wasn’t just fashion, it was an amazing technology. Basically servers got put behind an API and once something goes behind an API it becomes super powerful. Because of all the fashion, marketing and beliebing around XML I kind of ignored the Cloud as it started growing in use, functionality and adoption. I was wrong. When anything goes behind an API it is a big deal. One of the reasons JUnit is so powerful is that it put testing behind a universal API. But cynicism of software fashion often blinds us to true leaps in technology.
One of the ironies of software engineering is that middleware drives a lot of the decision making. For some bizarre reason middleware is seen as ‘real’ software engineering and front end engineering an afterthought. Which is the wrong way around as it is the front end that interacts with real customers. Companies are more and more exposing their data through B2B APIs but the majority of sales are still coming through the standard B2C, B2B and Mobile front end websites and applications. The push to Microservices is very much a middleware push.
So what about Microservices? I think I am willing to pronounce them fashion simply because engineers have no self control or discipline. One of the Microservices we had to integrate with was for a boolean. We had to do all the overhead of handling 200, 404, 500, etc all for a boolean. We also had to make an HTTP hop in a performance constrained product all for a boolean. We had to aggregate that boolean into another data structure all for a boolean.
Worse, where we aggregated that boolean is where that boolean should have come from in the first place. I remember when Thoughtworks got involved with Ticketmaster and the Jetson project that I was interviewed by one of Thoughtworks’ consultants and engineers. I mentioned that Microservices were being taken to absurd lengths and their engineer agreed that the services should be ‘chunkier’.
There are two very real problems with Microservices:
In a Microservice architecture atomicity is nearly impossible. In one product we worked on there was a requirement to update a customer account. The underlying customer account system was a massive CMS that spanned two databases and two systems. The customer account update for the product we were working on was designed for the information such as name, status, phone numbers, emails and addresses to all be collected at the same time and then the changes saved to the underlying customer account CMS systems.
The main problem was the name, the status, the phone numbers and addresses were all updated through different APIs on the customer account service. There was no way our product could do an atomic update on the customer account information as it required the orchestration of eleven different REST calls to make all the updates, possible deletes and possible creation of data that had to happen invisibly to the user of the product.
There was no atomicity.
The second problem was when we asked for a single update service we were told it was impossible based on the underlying systems, also it would most likely mean they could not meet their SLAs. So basically our product couldn’t meet its SLAs anyway because of all the calls, and, we could not guarantee atomicity.
In the end we saved a mix of state in the DOM and in our product’s aggregation layer along with a message to the end user of the product’s user interface that something went wrong and we were only able to save some of the information.
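A minimal sketch of that kind of non-atomic orchestration, assuming a hypothetical `AccountUpdateOrchestrator` in which each `Runnable` stands in for one REST call to the customer account service. It is not the code we shipped. It only illustrates the shape of the problem: every step is attempted, nothing is rolled back, and the names of the failed steps are all that is left to tell the user which parts were not saved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical orchestrator; step names and behavior are illustrative assumptions.
final class AccountUpdateOrchestrator {
    private AccountUpdateOrchestrator() {}

    // Runs every step in order, even after a failure, and returns the names of the steps that failed.
    // There is no rollback: steps that already succeeded stay applied.
    static List<String> runAll(Map<String, Runnable> steps) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            try {
                step.getValue().run();
            } catch (RuntimeException e) {
                failed.add(step.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Runnable> steps = new LinkedHashMap<>();
        steps.put("name", () -> { /* pretend the name update succeeded */ });
        steps.put("phone", () -> { throw new IllegalStateException("simulated 500 from the phone API"); });
        steps.put("address", () -> { /* pretend the address update succeeded */ });

        List<String> failed = runAll(steps);
        if (!failed.isEmpty()) {
            System.out.println("Only some of the information was saved. Failed: " + failed);
        }
    }
}
```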
The SLA problem manifested itself several times as well. One of the problems with Microservice architecture is that the front end applications - which interact with real customers - end up having to do a lot of aggregation and orchestration.
One of the products we developed required the aggregation of multiple orders, accounts, payments and a third system that contained the connection between all these separate systems exposed to us through an API as Ids. We could never get all the aggregation together under a suitable SLA so we asked the downstream service to do the aggregation for us so we could meet our SLAs. We were told no, as it would break their SLAs.
We were the product that had real customers, but the downstream Microservice was adamant they wouldn’t do aggregation as that would break their SLAs. It was frustrating. The downstream services were too slow once they were aggregated together into something that a real user would actually need. We ended up having to ask the proxy in front of us to increase their timeout. We were not the only front end team that had to ask that either.
It is easy to underestimate the impact of multiple HTTP calls. All it takes is for an aggregation service to start hopping across data centers and a quick aggregation in the Stage environment suddenly becomes dead slow in production. For instance a single call from an Arizona datacenter to a datacenter in Virginia can take about 200 milliseconds round trip. So if the aggregation application starts cluster hopping, an aggregation process that was under 80 milliseconds can quickly grow to three seconds depending on how chatty the aggregation service is.
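The arithmetic can be sketched in a few lines of Java. The 200 millisecond cross datacenter figure is the one from above. The fifteen sequential calls and the 5 millisecond in-datacenter call time are assumptions picked only to reproduce the "under 80 milliseconds" and "three seconds" numbers, and `AggregationLatency` is a hypothetical helper.

```java
// Back-of-the-envelope latency for an aggregation that makes its calls one after another.
final class AggregationLatency {
    private AggregationLatency() {}

    // Total wall-clock time when each call waits for the previous one to finish.
    static long sequentialMillis(int calls, long perCallMillis) {
        return (long) calls * perCallMillis;
    }

    public static void main(String[] args) {
        int calls = 15; // assumed number of downstream calls
        System.out.println("same datacenter:  " + sequentialMillis(calls, 5) + " ms");
        System.out.println("cross datacenter: " + sequentialMillis(calls, 200) + " ms");
    }
}
```

With those assumed inputs the same fifteen calls cost 75 milliseconds inside one datacenter and 3000 milliseconds once every call crosses the country, which is why chattiness matters more than any single call.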
This is not an argument against services, they are fantastic; it is an argument against bottom up Microservices where the middleware is god and the end customer a second thought. Where micro purity, rather than an end user, decides what becomes a service and what doesn’t. Where services are stood up and viewed as throwaway rather than as delivering a feature that lets a customer do something they would like to do today, tomorrow, next week and next year. If software is anything, it is not throwaway, and Ticketmaster was a good example of that. The Host has been making Ticketmaster billions for the last thirty or so years. We haven’t been able to replace it yet despite many attempts.
So when do Microservices make sense? It appears that the fashion is pushing Microservices into the nano arena with Amazon’s Lambda product where code can be executed without even having to bother with the overhead of a Microservice framework. Microservices didn’t appear in a vacuum; there are obviously lots of good reasons for companies to head in that direction.
The number of scrum teams and engineering teams keeps exploding as software becomes increasingly important to modern business. Ticketmaster has about fifty scrum teams whereas companies like Microsoft have closer to two thousand. Enabling these scrum teams to remain fully productive without stepping on each other’s toes requires some separation of concerns.
Ironically we are probably looking at Conway’s Law taken to extremes as scrum teams become responsible for smaller and smaller pieces of software through Microservices. This push to smaller and smaller discrete products and deployments has been followed by technology. The Linux kernel can isolate individual processes, which enables technologies like Docker that run a process in a small Unix based container and allow teams to bundle code and configuration together.
Microservices are probably an understanding that Scrum Teams are the smallest unit of productivity and delivery. The cloud technology has helped software flow in that direction and where software goes so do other products which make it faster and simpler to push to production at this level.
So what makes a Microservice? Basically it is a small amount of business logic exposed through a public contract of HTTP and REST on top of a framework that commodifies logging, health checks, heartbeat and discovery. At Ticketmaster there was a service template project based on Spring Boot that had all the commodified actions built in. But even then some teams were starting to break away from the standard Java middleware and explore Nodejs, Scala, etc.
Microservices understand that SLAs are an issue; precompute and caching are both used very heavily in Microservices to try and serve up data as quickly as possible despite the HTTP hop being made. There are a lot of patterns around trying to hide the number of service calls being made in reality. Often though, if you need the most up to date information there is no getting around that services must call out to other services who call out to other services and so on.
One of the good things to come out of Microservices is team and product autonomy. The idea is that the REST contract is the only deliverable. What happens behind it is completely up to the engineering team. This means the application maintains its own data and data storage system.
REST has won. It has beaten SOAP handily and heavily. The problem with REST though is that it is really hard to make your real world data structures match pure REST outside of anything that is a simple data structure. Making a ToDo list, a Blog article, or a Blog Comment with a clean REST structure is easy. Making an Account that comes from a CMS system is a lot harder.
So what are the best practices for a REST system? The most concise discussion of REST best practices I have seen is from Ismail Elshareef. It is a fantastic summary of what a REST service should look like. We used this methodology on our internal products inside Ticketmaster.
REST versioning should be done in the URL. It is the best way to do it. With Java it is difficult to version through the headers. Especially as most developers hit REST interfaces through the browser when testing or viewing what is coming back. Putting the version in the URL removes all ambiguity as to what version of the API is being hit. For instance:
Scrum teams like to call their products silly names based on Star Trek, Star Wars, or acronyms they made up. The URL exposed should be utilitarian and describe what the service is doing. If it is exposing orders then you should not call it:
In RESTful APIs the URL does mean something. So putting jokes or team names into the URL is not a good idea. It impedes discoverability and intuitiveness.
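The two URL rules above can be sketched together in Java. The path layout `/api/v<major>/<resource>` and the `VersionedPath` helper are assumptions made up for this illustration, not the scheme Ticketmaster used. The point is only that the version and a plain resource name such as `orders` can both be read straight off the URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that reads the API version and resource name out of a request path.
final class VersionedPath {
    private VersionedPath() {}

    // Assumed layout: /api/v<major>/<resource>, optionally followed by more path segments.
    private static final Pattern LAYOUT = Pattern.compile("^/api/v(\\d+)/([a-z]+)(/.*)?$");

    static int version(String path) {
        return Integer.parseInt(match(path).group(1));
    }

    static String resource(String path) {
        return match(path).group(2);
    }

    private static Matcher match(String path) {
        Matcher matcher = LAYOUT.matcher(path);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a versioned API path: " + path);
        }
        return matcher;
    }

    public static void main(String[] args) {
        String path = "/api/v2/orders/42"; // hypothetical example path
        System.out.println("version " + version(path) + ", resource " + resource(path));
    }
}
```

Anyone pasting that hypothetical path into a browser can see at a glance which version they are hitting and what the service exposes, which is the whole argument for doing it in the URL.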
For REST data structures the JSON API format for JSON responses is the best summary of what a good JSON API should pass back and forth. It is nice and consistent, not overly verbose, and there are plenty of libraries that support it. The JSON API largely defines the HTTP Status Codes which are one of the most confusing parts of the REST approach. There are tonnes of HTTP Status codes and REST kind of shoe horns itself into using them. The main ones to use are 200, 201, 400, 401, 403, 404, 429 and 500.
With these responses it is best to throw RuntimeExceptions that have enough information for the JSON API Error data structure to be created at the product’s public REST interface.
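A minimal sketch of such an exception, assuming a hypothetical `ApiException` class. The JSON is assembled by hand only to keep the example free of dependencies, and a real service would use a JSON library. The member names `errors`, `status`, `title` and `detail` follow the JSON API error object, where `status` is carried as a string.

```java
// Hypothetical exception carrying enough information to build a JSON API error response.
class ApiException extends RuntimeException {
    private final int status;
    private final String title;

    ApiException(int status, String title, String detail) {
        super(detail); // the detail doubles as the exception message
        this.status = status;
        this.title = title;
    }

    int status() {
        return status;
    }

    // Renders a JSON API style error document with a single error object.
    String toJsonApiError() {
        return "{\"errors\":[{"
                + "\"status\":\"" + status + "\","
                + "\"title\":\"" + escape(title) + "\","
                + "\"detail\":\"" + escape(getMessage()) + "\""
                + "}]}";
    }

    // Minimal escaping for backslashes and quotes only; not a complete JSON encoder.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Code deep in the application can then throw something like `new ApiException(404, "Not Found", "Order 42 does not exist")` and leave the translation into an HTTP response to a single handler at the public REST interface.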
One of the great things about a REST API is that it can be made to be walkable. It can announce through self, related, links and href the information that is relevant to the URL. I always like that the root URL announces who the API is, the copyright and where to start walking through the API and what it exposes. I like the idea of the root URL being the beginning of the API being self documenting.
One of the Junior Engineers asked that I do a talk on Spring as many of the engineers were confused by it. The main question was why would you use Spring and the answer is: Dependency Injection, Dependency Injection, Dependency Injection, Dependency Injection.
The main advantage of an Inversion of Control (IoC) or Dependency Injection (DI) container is that the injected beans can be switched out when testing and new beans injected or mocked in. That is the main reason to use Spring. It makes automated testing at the unit and integration level easier.
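As a sketch of why that matters for testing, here is plain constructor injection in Java with no Spring at all. The `InventoryClient` interface and `AvailabilityService` class are hypothetical names invented for the example. In a Spring application the container would supply the real `InventoryClient` bean, while a test hands in a fake.

```java
// Hypothetical dependency: in production this would call a remote inventory system.
interface InventoryClient {
    int seatsAvailable(String eventId);
}

// Hypothetical service that receives its dependency instead of creating it.
class AvailabilityService {
    private final InventoryClient inventory;

    // The dependency is injected through the constructor, so it can be swapped in a test.
    AvailabilityService(InventoryClient inventory) {
        this.inventory = inventory;
    }

    boolean isSoldOut(String eventId) {
        return inventory.seatsAvailable(eventId) == 0;
    }
}

// Stand-in for a unit test: the fake client is a lambda, no network or container needed.
class AvailabilityServiceTest {
    public static void main(String[] args) {
        AvailabilityService soldOut = new AvailabilityService(eventId -> 0);
        AvailabilityService onSale = new AvailabilityService(eventId -> 12);

        if (!soldOut.isSoldOut("event-1")) throw new AssertionError("expected sold out");
        if (onSale.isSoldOut("event-1")) throw new AssertionError("expected seats available");
        System.out.println("AvailabilityService tested with injected fakes");
    }
}
```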
There are some other reasons. Spring and REST go well together. It is simple to annotate out a new service interface to the outside world. Spring Boot with its embedded application server also makes standing up new services pretty quick. But these days NodeJS is a strong competitor and it is easier and faster to set up a NodeJS instance with a service interface courtesy of express. Less code too than is required for Spring and the Spring container. I expect Spring will start losing to NodeJS in the next five years or so.
The other advantage of Spring at Ticketmaster was that the entire Platform Engineering stack was geared for it. Configuration came out of git repos. Deployment was through Rundeck and was built for Spring artifacts. The Platform Engineering around Continuous Integration in Jenkins was also built for the Spring stack. So at Ticketmaster it was pretty easy to spin up a new Spring product and have it flow through the CI and Release mechanisms out to production.
The front end frameworks at the moment are the wild west. New ones are appearing every day and teams are jumping on each one. We also have teams like AngularJS introducing major incompatibilities between versions, enough that teams are scared to adopt the technology. We also have light frameworks like Backbone that encourage too much business logic in a view and then super light ones like React which only solve one issue.
Backbone is a pretty decent framework when used in conjunction with underscore and jQuery. The saying with Backbone apps is they grow one view at a time until you have too many views with a mix of controller and view logic in them. The best way to deal with Backbone apps is to treat the view as being for DOM manipulation only and move all the other logic out into an application logic layer which should make it more testable. More importantly testable without requiring the DOM as a dependency.
The other issue with Backbone apps is that jQuery is used for DOM manipulation. jQuery was written in a time before unit testing and is not the easiest to unit test when it is in a Backbone view. Often you end up spying too much on the jQuery actions. The alternative is to use fixtures and let jQuery directly manipulate a mock DOM in the test. Are you really testing the function or are you testing behavior or are you testing jQuery’s ability to change the DOM? These are all hard questions with no good answers.
The Backbone Model ended up being a bit black magic as well and took engineers a while to come to terms with how it worked. We had the other issue that the service interface at the Java level was not very RESTy and the Backbone app had an issue dealing with model structures that were not perfectly RESTable. This ended up leading to explicit AJAX calls being made out of the Backbone Model rather than the default one.
This ended up being a very strong pattern as it meant we defined our business state and operations at the domain level. With the Point of Sale system we developed we ended up defining a Domain Model that was for selling tickets and it did not need to know which inventory system the ticket came from. When the Point Of Sale was hooked into existing inventory systems that change over was very small because of the strong domain model.
Ticketmaster had engaged Thoughtworks to work with us to improve the structure of the Backbone application. We spent several days with Thoughtworks breaking out business logic that did not interact with the DOM into an application logic layer. This proved to be a good model for Backbone application development.
Often you can speed things up through social changes and mechanisms. One of the approaches we took with User Stories was that you can work on nothing else until that User Story is in production. This meant that often a release would be cut and three or more engineers would have nothing else to do until it was pushed up to production and verified. It meant that the engineers were swarming on testing the release in stage and then testing it in production. If a bug was found it was either fixed immediately or if it was not a big deal or it was a large amount of work it would be put into the backlog to be prioritized.
When a release artifact was cut a simple shell script was run from the repo that created the release email. The release email had what the artifact version was, the User Stories and Bugs as JIRA links that were included in the release, which team members were responsible for taking it through to production and then the links to the production documentation such as the Runbook, Backup Plan etc. For the most part the production documentation was static in Confluence. I would have loved to have the artifact contain the production documentation but it was not really feasible the way that Ticketmaster was structured.
After the release script was run an email was sent out and from there on in, the engineers mentioned in the email focused on deploying to the stage environment, verifying that the new features worked, then doing a quick spot check as well to make sure nothing was affected functionality wise or visually. The next steps were to release to production and then once again test the new features and spot test the application to make sure no functionality or visual aspects were broken or not working.
This process worked exceptionally well when there were two scrum teams working on the one product. It forced the two teams to work together to get the artifact out and into production. This enabled the teams to get a lot of throughput done. Prior to the teams changing to Kanban, they were releasing up to twenty five artifacts to production each sprint. That is close to three a day from two teams with about eight engineers each. Quite remarkable.
I can recall I was in a meeting where the client representatives were being shown the new Point Of Sale system. The lead Product for the Point of Sale was saying how my teams were releasing changes very quickly. He asked:
“How many releases have they done this sprint?”
“Seventeen”, I replied but in the Australian manner of dropping the t.
“Seven! They have released seven new features!”
“Sorry, that is my accent, it is seventeen.” I said this time saying the t in the heavy manner of the American accent.
As I finished the sentence my inbox lit up with another release going out to production. I didn’t bother correcting the number but it was a good indication of how the throughput just kept on trundling along and became the norm that release upon release would go out to production without drama.
This release methodology also kept pushing down cycle time. There was one demo that a Product Manager was doing for the executives where he found a bug as he was going through his demo plan. It was the day before the demo and the pressure was on. One of the engineers saw the bug come in, fixed it, deployed it in stage then production and, fifty minutes after the bug was reported, the fix was in production.
There were limits on how quickly an artifact could be released to the virtual machines. The artifact had to go through stage first and be tested there before it could be called ready for production. So there were distinct physical limits on how low we could get that number. That being said, when I arrived at Ticketmaster, a similar Point of Sale system had to go through a three to six month QA cycle before it could be released to production. So over the three years we had reduced that cycle time from months to minutes.
We had also made releases ordinary. They became just part of the daily practice of software engineering. The previous Point Of Sale system that went through a multi-month release cycle was also difficult to release into production. The release was done at night, and if any issue was found it would be rolled back. When the previous Point of Sale system went into production and was not rolled back, a party was held afterwards. It was quite literally a big event.
By doing the constant releases during a sprint, and later in Kanban, releases went from an event to something you do once a sprint, to something done for every User Story or Bug. It became a non-event and something you do to finish your User Story. Releases were a normal part of the rhythm of developing software.
Releases became boring.
The last ten years have seen large changes in how work is organized. We have gone from Waterfall to Scrum, Kanban, DevOps, Swarming, Hackathons, Pair Programming, etc. With each change the process of creating software has increasingly become a social activity. No longer do you see the lone engineer coding away in isolation on some problem and surfacing every few months to push a release to production.
Now it is scrum teams of six to ten people working quickly to stay in step with their customers’ requirements and move each slice of code out to production as quickly as possible so they can get feedback on the change from a customer. The teams have changed too. Where once you had specialists and hand offs from one specialist to another, such as engineer to quality analyst, now the team drives the release process in pairs or swarms to get a feature out into production. The specialist knowledge is still there, but it is owned by the team rather than an individual.
When I came to Ticketmaster the company was just starting to convert over to Scrum. One of the reasons I was hired was because I had experience working in scrum environments. It has been my experience that unless the entire company adopts scrum then it is doomed to fail. It has to be accepted at the highest levels that scrum is the new way that work will be organized.
This involves quite an investment as every team must then have a Scrum Master and Product Owner. If existing project managers, program managers, business analysts, engineering leads etc are going to start performing the roles required in Agile and Scrum then they need training or outside expertise needs to be brought in to train and coach. Ticketmaster took the external training approach and most of us were sent off to train in Scrum Master and Product Owner certifications in Denver, CO.
One of the best parts of the training was that it heavily solidified the roles of Scrum Master, Product Owner and Team in the scrum process. For instance the Product Owner determines the what and why, the Team determines how and how quickly, while the Scrum Master is there to ensure that the process is adhered to, and this can be as little as daily stand ups and sprint retrospectives. As it turns out the Scrum Master also ends up occupying the hand off space between teams and products which leads to the Scrum Master trying to unblock blockers.
The Scrum Master also exists to stop outside interference to the team such as an Executive tapping a developer on the shoulder and saying, “You need to spend a week on this other product because X said so.” It is ok to have team members work on other stuff, but it needs to be prioritized with the other work the team does as it can affect velocity and hence planning.
A Scrum Master should not be a project manager or a program manager. These are very different roles to that of Scrum Master. This is something that previous project managers or program managers have difficulty with when they transition to a Scrum Master. For them Scrum Master seems like less responsibility and less involvement in a project. Their world shrinks to stand ups, retrospectives and a single product of features rather than the expansive world of Gantt charts, massive projects and driving software engineers into overtime to get that important feature done for a demo.
There is a reason why the Scrum Master has such a small set of responsibilities and it is because the most important part of Scrum is the daily stand ups and the retrospective. Those are so important it is worth one person’s salary to ensure they are done. Not only must they be done, but the Scrum Master has to keep managers and executives out of the stand ups and retrospectives. The Team has to feel comfortable in those, enough that they can admit they made mistakes to the Team and not worry they will be judged by managers or executives for saying they made a mistake.
Ticketmaster moved over to Kanban after having done Scrum for a couple of years. I am not sure what the reason to change over to Kanban was. The main advantage of Kanban is its pull nature but Ticketmaster did not use it that way. No longer were stand ups and retrospectives a part of the structure as they are with Scrum. Some teams continued doing them, but others did not. I asked one of the teams I worked with what they liked about Kanban vs Scrum. Their reply was the thing they liked most about Kanban was that there was no sprint deadline. The thing they liked least was that there was no sprint deadline. They felt the deadline put them under pressure to finish their user stories, which was missing in Kanban. However, the sprint deadline often put too much stress and pressure on them.
When the teams I worked with moved over to Kanban I hoped that the three columns we had on the Scrum board could be collapsed to one. Basically, “what is the most important thing we can do for a customer right now”. That is the only column. For some reason the Kanban boards had nine columns on them, and then they included swim lanes and other stuff. Blech. A simple process where you only worked on something when it was needed by a downstream request had been mangled into a complicated process with multiple meaningless steps that was completely divorced from the pull process of Kanban.
As it turned out the teams I worked with had to adopt the same Kanban boards as everyone else. Apparently a downstream team saw the Kanban boards we tried to build with fewer columns and they complained. So in the end we gave in and adopted the same boards as everyone else. JIRA couldn’t support jumping multiple columns at once and had to be modified with a new version to do this. From my perspective the reality that teams were jumping multiple columns meant those columns were not necessary in the first place.
The next issue with Kanban is how do you know you are getting better? In Kanban there are two concepts that are used. Firstly, to make everything as uniform as possible, there is the concept of ‘right sizing’. This is where every User Story in the Backlog is about the same size. At Ticketmaster a right sized story was 5 story points. There was some deviation around a 5 but for the most part all features or work of value was about a 5.
This leads to the second concept which is ‘cycle time’. This is the time it takes for a User Story to go from being pulled from the Backlog into Work In Progress to when it is deployed to production for a customer to use. The reason that User Stories are right sized is so this metric is more useful and reliable.
The way you know you are getting better is you deliver more User Stories in a time period and you shrink your cycle time. The major issue with JIRA was that there was one puny graph that didn’t really show you cycle time that well. None of these metrics were pullable from JIRA’s REST API either, so I could not automate the process of determining cycle time. Because the data was not pullable from the JIRA REST API I could not determine our cycle time when we were doing Scrum either.
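As a rough illustration of the definition, cycle time is just the difference between two timestamps per User Story, which can then be averaged over a period. The sketch below assumes you already have those two timestamps for each story; the story keys and dates are invented for the example and do not come from any real board.

```python
from datetime import datetime
from statistics import mean

# Hypothetical data: (pulled into Work In Progress, deployed to production).
stories = {
    "STORY-1": (datetime(2015, 3, 2, 9, 0), datetime(2015, 3, 4, 15, 0)),
    "STORY-2": (datetime(2015, 3, 3, 10, 0), datetime(2015, 3, 4, 10, 0)),
    "STORY-3": (datetime(2015, 3, 5, 9, 0), datetime(2015, 3, 5, 9, 50)),
}


def cycle_time_hours(started: datetime, deployed: datetime) -> float:
    """Elapsed wall-clock hours between starting work and reaching production."""
    return (deployed - started).total_seconds() / 3600


cycle_times = {key: cycle_time_hours(*stamps) for key, stamps in stories.items()}
average_hours = mean(cycle_times.values())

for key, hours in cycle_times.items():
    print(f"{key}: {hours:.1f} hours")
print(f"Average cycle time: {average_hours:.1f} hours")
```

Tracking the average (and the count of stories) per week or month is enough to see whether the trend is heading in the right direction, provided the stories are roughly the same size.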
So how did I know we became better when we moved to Kanban and how did I know we were improving when practicing Kanban? I didn’t. The entire company moved over to Kanban so we went with the flow but the reality was there was no visibility into whether we were improving or not as a software team that delivers value to a customer.
Ticketmaster did large twenty four hour hackathons that were on a theme. The first time people did them they were excited but the next time people did not want to. Twenty four hours is too long, people have families they have to get home to, and no-one wants to sleep on an office floor or in an office chair.
Outside of the big company wide hackathons, we used hackathons as little spurts of productivity. A good example was a post MVP feature that the Chief Strategy Officer [CSO] mentioned he would love to have as MVP for the product we were working on. We got myself, a UX Engineer and two front end engineers to work on the problem. We went off site to a little craft coffee and beer cafe in Old Town Scottsdale called SIP and spent the day working on the problem.
Collaboratively we brought the feature down to its most basic functionality. We decided to use exact match instead of having to sort through results, we broke the UI down to its most basic and eight hours later we had a thin slice of the original feature completed. It went out into production a couple of days later and I was able to email the CSO that a post-MVP feature was now in production courtesy of a hackathon and negotiating the feature down to its basic essence.
Ticketmaster had 10% time for engineering driven innovation. This was intended as time for the teams to work on something they cared about but was outside of the normal User Stories and product requirements. One of the problems we faced as engineering teams was that our time spent was heavily controlled by product, which gave us very little room to spend on exploring ways to improve our products from an engineering point of view, or ways to improve the customer experience through engineering means. The 10% time was an attempt to give engineering room to innovate outside of the strict constraints of product and program.
I found that unless the 10% time was focused on a problem, a challenge or set of issues then engineers would default to doing work. Their plates were stacked heavy with work and the 10% time became a way of getting through that work. Consequently I would do the 10% time off site, often at local incubators like Gangplank or at work friendly places such as SIP Coffee and Beer. I would also have a focus to the 10% time so we were all working toward a goal.
One 10% time we focused the goal on adding SMS messaging to the Point of Sale product. The normal approach was to have a local Product Owner walk us through the steps we would need to identify a customer problem, come up with a problem statement and then convert that to an engineering problem. Normally we would split the teams up into small groups of two to four engineers and they would work through the product problem themselves. We did that for an hour and then the teams would say what customer problem they had defined and how they planned on solving that business problem with technology.
Next we would spend an hour integrating with a new library or an external service. The teams would focus on different approaches such as using a direct HTTP connection, using CXF, using Spring or some other integration technology. The team that solved it first and robustly would check their code in and the rest of the teams would use that integration code for the rest of the day.
After that we would break for lunch and then spend the afternoon hacking on the business problem. Depending on how quickly the teams got through their problems we would present in a demo at about 4:00 pm what everyone had done and achieved. I usually took screenshots and photos of the teams during the day and then emailed out the next day to interested folks in engineering and product what the teams had created. It was a lot of fun and we did some very cool stuff.
Agile is often equated with Scrum but in reality it is more an approach and a mindset than a specific process. The agile manifesto states:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
So how well did Ticketmaster do on these things as it changed the organization of work from Waterfall, to Scrum and then to Kanban?
Individuals and interactions over processes and tools? With the software teams that I worked with this was true. Swarming became the norm as features moved faster and faster out to production. The teams were Junior Engineer heavy which meant the Senior Engineers had to spend a lot of time working with Junior Engineers on User Stories and Bugs. One of the teams I worked with heavily gelled as a group of workmates, friends and support group. Their cohesion as a team was amazing. As per the title of this essay, software had become a social activity with the teams I worked with. The software was higher quality and released faster due to this approach.
Working software over comprehensive documentation? This was true of the products I worked on. There was very little documentation outside of a Getting Started document and the standard SDLC production support documents. The products I worked on were heavily unit tested so the software itself was documented through those tests.
Customer collaboration over contract negotiation? We failed as a company and as teams at this. The Jetson project was a big bang product change over where we knew best what the client and customers wanted. It turned out we could deliver nothing at all. When the Jetson project collapsed under its own weight and poor quality, the pieces that got picked up were very small indeed. Harmonia was the big take away from Jetson and it was developed by one team that was a year ahead of its commitments. Harmonia was developed by a principal level team that did not have a Product Owner. The engineers in that team built what they thought was necessary for the product and it turned out to be the best feature to come out of the Jetson investment.
Another failure in collaboration over contracts was that the features and user experience of the products we were building were not up for discussion. It was Product’s way or the highway. This made the software teams feel disconnected from the products. Additionally internal demos were seen as more important than quality. As a result a lot of poor quality products and components were released that made it even less likely that the Jetson system would ever see a real customer. The final cherry on the cake was that the work the teams did was meaningless as no customer was using it. Why should the teams care when the only consumer of their products was their functional tests and QA?
Responding to change over following a plan? Ticketmaster failed on this one. Jetson was a hugely orchestrated plan over multiple years that collapsed in on itself. There was little to no deviation allowed amongst the teams as to what they could do other than follow the plan tightly and keep producing features for the demo. Worse, once Jetson was stopped, the changes in direction came too quickly and too fast for existing products to be put in production in front of real customers. So more time went to waste, and more features never saw use.
As one Ticketmaster engineer said to me, it wasn’t always like that and my view of Ticketmaster is jaundiced by being the first Jetson hire in Scottsdale and then leaving Ticketmaster when the Jetson project stopped and changed to Promises of Value. So my entire experience of Ticketmaster was during the Jetson investment.
Micro Managing Engineers
We talk about Scrum, Kanban, etc and they are improvements over Waterfall and other types of development. What needs to be recognized is the extent that these methodologies micro-manage individual engineers. We restrict our engineers so heavily. The pressure from the process, from program, from product, from UX, from management and from executives is constant. It isn’t enough that we give constant feedback on where we are through User Stories completed, bugs created and velocity. Engineers are micro managed down to the size of a User Story they are working on, if they are blocked, if they are going over the original estimate, etc.
I don’t know what the answer is. The Agile methodologies are used because they produce better outcomes than what we have used to organize work previously. However, I think we have to recognize the cost, and that is we micro manage engineers and teams, putting a lot of pressure on them at the smallest level which is the User Story.
This level of micro management is taken for granted. It hit me when the company pivoted away from Jetson and the teams I worked with suddenly had nothing to do for a couple of weeks. They didn’t really know what to do with themselves. The reason is that we map their every day out so heavily with processes like Scrum and Kanban.
Two weeks after I had been at Ticketmaster for three years I handed in my notice. It had been a good run and we had done a lot of good work. The teams I had worked with had gone from a multiple month release cycle to fifty minutes. They had gone from zero unit tests to 100% coverage and the product the team released as part of the Resale project had two bugs in 2015 of which one required a code change. All in all it was a remarkable progression for a team that had struggled with poor quality code, slow releases and a crippling operational burden the year before I joined Ticketmaster.
I am hugely proud of these achievements and the engineers I worked with. Without their willingness to work hard, their focus on incremental improvement and their understanding that the highest standards are what they would deliver, we would never have achieved these results. I have made a lot of friends at Ticketmaster and will always remember my time there fondly. For now, off to the next challenge!
Simple Steps To Higher Quality Code And Faster Code To Production
In the previous long form essay I wrote of my time at Lifelock I added a series of points at the bottom that teams should follow in order to improve quality and reduce cycle time to production. Reading back on them, there are a few things I would change from my experiences at Ticketmaster, but not many.
Unit Test. If you only do one thing, this should be what you do. Unit testing is a leading indicator of quality so it is a good metric to graph. I agree with Uncle Bob Martin when he asks the question, “Why would you write a line of code that is not tested?” I also agree with Michael Feathers when he defines “Legacy Code” as code without unit tests. The difficulty with unit tests is that they are an upfront investment, as per W. Edwards Deming’s advice of test up front rather than inspection at the end. For that reason it is often difficult to get managers and product to understand why it is so important. Unit Testing is a skill and it can take up to nine months to train an engineer in how to test well. So patience is required. Further, a culture of testing needs to be created and this too takes time. Getting Unit Testing to the level where W. Edwards Deming, Uncle Bob Martin and Michael Feathers would be happy takes time and patience but it is so, so necessary.
Functional Test. This goes hand in hand with unit testing. The Functional Test is an end to end test which hits services and databases. This is a necessary form of testing to make sure your product actually does what it says it does. Functional Tests are hard to write and difficult to maintain. Generally there are only a few of them, whereas there are thousands of Unit Tests.
Javadoc. It is important to Javadoc. The teams I worked with had most of their engineers with less than three years experience at software engineering. As a consequence they would go into a code base and have no idea what is going on. Javadocs at least tell them what everything should do and where the arguments are coming from and what they should be. That being said Oracle made a mistake in Java 1.8 by making the Javadocs so restrictive. That will scare engineers off from Javadocs rather than encouraging them.
Testable. Code needs to be testable first and foremost. Design decisions that enable full testability are better than design decisions which do not. The code base and the product must be testable at the unit and functional level and it should be designed and architected with this in mind.
Delete commented out code. We have source code repositories to remember what the code base was previously. There is no need to have commented out code. It is usually a sign of a lack of confidence. Unit testing should give confidence so there is no reason to have any commented out code.
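One small illustration of a design decision that favors testability is to pass in anything the code depends on that would otherwise be hard to control, such as the current time. The sketch below is a made-up example (the `SaleWindow` class and its `clock` parameter are hypothetical, and it is written in Python rather than the Java the teams used), but the same idea applies in any language.

```python
from datetime import datetime
from typing import Callable


class SaleWindow:
    """Hypothetical example: decides whether a ticket sale is currently open."""

    def __init__(self, opens_at: datetime, closes_at: datetime,
                 clock: Callable[[], datetime] = datetime.now):
        self.opens_at = opens_at
        self.closes_at = closes_at
        # The clock is injected so a test can control what "now" means.
        self._clock = clock

    def is_open(self) -> bool:
        return self.opens_at <= self._clock() < self.closes_at


def test_sale_window() -> None:
    opens = datetime(2015, 6, 1, 10, 0)
    closes = datetime(2015, 6, 1, 18, 0)
    during = SaleWindow(opens, closes, clock=lambda: datetime(2015, 6, 1, 12, 0))
    at_close = SaleWindow(opens, closes, clock=lambda: datetime(2015, 6, 1, 18, 0))
    assert during.is_open()
    assert not at_close.is_open()


test_sale_window()
```

Had `is_open` called `datetime.now()` directly, the unit test would only pass at certain times of day, which is exactly the kind of design that makes a code base hard to test.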
Automation. Automate all the things. There are tools to help such as Jenkins for Continuous Integration, etc. If there isn’t a tool for a specific application then a shell script will suffice and if it needs to be run by more than one person run the shell script from Jenkins or something else similar.