Open Source in Sports
A Q&A with a data engineer at local MLS club Atlanta United, and how Open Source benefits the sports data and analytics community.
Open Source is everywhere, including in places like our hobbies and interests that bring us joy. Ahead of Saturday's Atlanta United victory over the New England Revolution in Major League Soccer, we talked with the very talented data engineer Akshay Easwaran from United's sports data and analytics department. Akshay describes his journey in Open Source and how it has helped him down the path of a career with his favorite team.
This Q&A has been edited for length and clarity.
To start, tell me about your current role at Atlanta United.
I am the data engineer at Atlanta United, so I'm mainly responsible for building – I like to say it's building a digital infrastructure to support sporting-side decision making. Building data syncing pipelines, building applications, building databases. Anything that gets us to making better data-driven decisions, and being able to use the data in an efficient manner to help inform those decisions.
It’s always interesting to see how Open Source fits into different organizations in different ways. How is it approached and used at Atlanta United?
A lot of the technology we use is in some way based on Open Source libraries, and even in previous roles that I've had we made a lot of use of Open Source utilities and modules. We're a big R programming language shop – R and Python – at Atlanta United. And I think it's fun to see how the Open Source sports analytics community has grown over the last few years. We use a bunch of tools from there, right? There are so many smart people working on stuff out in the open, and they're lowering the barrier to entry for doing advanced analysis. We do analysis and we have our own way of doing it because we're focused on a club. So we're able to take those tools – that baseline that they've built for us already – and then apply our own secret sauce on top of it.
The secret sauce leads to my next question. I’d imagine in the sports data and analytics space, teams are always trying to keep a competitive edge, which probably leads to wanting proprietary/closed systems. How does Open Source fit into this picture?
I think the way that I would draw the line is that everyone is trying to solve the same problem on the back end. Everyone's effectively building the same software systems just to get to the same square one. And then from there, that's where you add the more proprietary secret sauce. The metrics, the models, whatever, right?
But to even build your databases, to build your pipelines, you are always interfacing with some sort of tooling, and training yourself in some sort of tooling. Docker for example: a bunch of the images are Open Source. We make a lot of use of the images, but we have our own stuff that we customize. We say those are our own base images and we build applications and tools on top of them. So I think that's a good example of how that fits into our flow. We build up our systems using Open Source and then that proprietary edge is what goes on top of the platform.
When you're working behind the curtain of a club, you have this motivation or this desire to keep delivering, and you also don't want to reinvent the wheel in pursuit of that goal. Part of interfacing with these libraries is: hey, people have solved problems for us already. Like: you want to get data from this specific place? Okay, well, someone has already built a tool that puts it in an efficient format that you can then store in a database, or use in a visualization, or a graphic, or whatever.
We want to take advantage of the foundational tools that are already out there, and then what use we put them to is obviously unique to the club and unique to whatever situation that we find ourselves in.
It's all about taking advantage of who came before you and the challenges that they've solved and building on top of that.
Can you give me some examples of the tools that you use most often that are specific to the sports space and these foundational layers?
I'll give you a couple here that I've been involved with, but outside of work. I've been involved a lot with the SportsDataverse initiative, which is a library of a bunch of these kinds of packages across Python and R where you're able to pull in sports data. So from ESPN, or I think one of these libraries works with KenPom, and you also have Transfermarkt and FBref and all these different types of sources. It's just a mini universe, like it says in the name, of these types of sports data packages, curated by a small Discord community and led by Saiem Gilani. He's really sharp, too. He worked on the first college football data scraper, called cfbscrapeR, that's all built in R. He's really pioneered this community of open source data and democratizing access to sports data, not just keeping it behind this curtain of paywalls and only accessible to clubs and that sort of thing.
For things that get custom built in-house, are there ever considerations about releasing them as Open Source, or is there an individual contributor policy for contributing things back to any of the tools?
I'm familiar with those policies. I think for us, we aren't that mature of a technology organization to have more formal policies like that. I know previously, when I worked at Salesforce, there was a big community of making sure that if you solve problems in an open source library, you had the ability to fix those bugs in the upstream version so that everyone could benefit. There's a very robust open source contributor policy there.
But I think currently it's just two of us. We're not really at the maturity level to have one of those.
Maybe someday, right?
Yeah, hopefully, we'll have a veritable army of analysts and engineers, and we'll be able to tackle those big sports data platform problems.
Is there a separate group doing similar things but over on the football side with the Atlanta Falcons?
The Falcons have a number of engineers and analysts as well. Obviously the NFL makes a good chunk more money than MLS, so they have a larger team. They're doing really cool stuff. They're really sharp people. They have a lot more – I think it's the same realm of maturity, right? They're just a more mature technology organization. They've been there a lot longer. They've had some of these databases and pipelines a lot longer, and so the tools that they have for their staff are just more robust.
You touched on this a little bit, but tell me more about your individual experience with Open Source. How did you first get involved with it?
Oh man, this goes back a ways. The first time that I got into Open Source – and this is going to be a throwback for me and hopefully it is for you – was probably 2013-2014. It was when jailbreaking iPhones was still in vogue. Around then, maybe a little earlier, I originally was working on jailbreak tweaks. Just little custom pieces of software that were able to be run because you had access to the internals of iOS. I think there were a couple of projects that I saw that were Open Source at the time. I think one of them specifically had been kind of abandoned by its developer, and I was like, okay, well, I have some rudimentary iOS programming skills, and this is an area that I'm already involved in, jailbreaking and tweaking out my phone, so why don't I try my hand at developing something like this? That's kinda how I got kicked on to it.
That was the first step, I think. Then going further into the future, 2016-2017. I was really fortunate to get the opportunity to port an open source Android game called College Football Coach over to iOS. There's no reason I would've been able to do that if the code wasn't Open Source. I talked to the guy, the original developer, and got his permission and blessing to develop it for iOS. Just being able to cross-reference my implementation against his and being able to build off of a known foundation was really useful.
More recently, going closer to present day, it's funny: that College Football Coach project and my involvement with the SportsDataverse initiative are sort of intertwined. In 2019 I was getting into college football advanced analytics for the first time because of more and more articles on Twitter and more and more articles on the Internet about stats.
I thought it would be really fun to include an expected-points model and a win-probability model in the College Football Coach game. Because the cfbscrapeR and the models were Open Source, I was able to reach out to Saiem and his co-collaborators and talk to them a little bit and say "Hey, like, what does it take to take this model that you've built in R and port it over to iOS?"
He added me to his community and we had a couple discussions about how to build these sorts of models, and that's where I got involved with sports data. That's how that community, and my involvement with that community, kind of grew from there.
We've worked on a couple more projects from there: Game on Paper, that's all built with Open Source tools and all the SportsDataverse. cfbfastR, that's the next evolution of cfbscrapeR, all of that is open source. I've tried to be an advocate for it even if I am behind the curtain of a club. I still use these tools to this day outside of work, the ones that I can outside of soccer, and I try to be a good steward and keep these resources clean, keep them usable, keep them sustainable.
Like many people involved in Open Source, you've had a lot of side projects. What's your favorite project you worked on outside of your current and previous jobs?
Game on Paper. It's been the most fun because it just started as two people that had expertise in two different parts of the world of sports analytics technology. I had the software engineering side and an interest in sport, and Saiem had the data science side locked down.
We were able to build something that met both of our desires, right? We wanted to democratize access to these sorts of advanced analytics. They're all usually behind ... they're either used internally at teams or they're behind the paywall at PFF in some cases, or at the very least, they're just not available to the layman. But we knew that we had access to ESPN's data via their website. We knew that we had the data science chops to turn that into something, and the knowledge of how these models worked. So we were able to put together this really awesome thing that blended both sides of our knowledge to build something that a lot of people have gotten utility out of.
I think that's my favorite part of it. We get feedback all the time from journalists, especially student journalists, who are data curious and data savvy and are writing about their teams for either their student newspaper or their team sites.
I've seen a couple of posts using Game on Paper data on Kansas football's team site. So being able to see so many places where our efforts to bring this data to the layman have actually impacted people, I think that's my favorite part.
If you had one wish to create a new Open Source project and project community to help something at Atlanta United, what would it be?
What would make my life easier at Atlanta United?
Actually, let me take a different tack on it. I think when we're talking about Game on Paper, one of the things that would really help us is if ESPN provided an Open Source module themselves to track their data. As it stands, I think that there is a level of risk with these projects, because there's no SLA, there's no guarantee of support. They can change the data source at any time. At the end of the day, the APIs that they put out in the world are meant to support their products. That's not necessarily something we can change. But ESPN did have one: they had an API that looked a lot like this back in 2014, and they shuttered it, I think, around 2015. Reopening their API access and providing modules for more sustainable and – what's the word I'm looking for – uniform and, for lack of a better term, clean-looking data, I think that helps us out a lot. We've had a couple of issues here, and they come from just working with data that's sometimes not fit for use. ESPN will sometimes mis-mark plays. Like there's one that I'm thinking of where there's a pass interference penalty from a couple of seasons ago that was mis-marked for like 11,000 yards.
This is stuff that's obviously wrong, and I think the accuracy of the data, some level of support there, a good relationship with them where we can report issues and fix them, and then also obviously giving us a sustainable way to access that data would all really help us. Because every season we inevitably end up having to change something or find a new place to pull something from in their data. It just adds to that idea of risk. Eventually maybe this goes away, and we just don't know what the expiration date is going to be.
The flip side would be if they opened it up to contribution in an Open Source way, then there's a whole universe of people capable of helping them make their product better, right?
Yeah, and I think you've seen a really successful version of that with Microsoft. A lot of their Azure resources are developed out in the open, and I think we've gotten a lot of value out of being able to see what challenges other people are facing. Whenever we hit a roadblock with something that we're doing with our resources, we can just dial up their GitHub repo and see: well, someone else was facing this issue, is this something that we're also facing?
Or is this something that we need to tackle with support, or some implementation problem of our own? Having that openly available issue list, not even an open line of communication, just the openly available issue list, and being able to cross-reference it helps us solve problems faster.
I'm sure Microsoft gets a lot out of it, too. They're able to see: we built this piece of software one way, but here's how our customers are actually using it, and they're able to tailor their future product decisions based on that feedback.
Events
March 12, 2024: ODSC Atlanta Data Science Webinar, Online
March 26, 2024: The Atlanta JavaScript Meetup Group, Tech Square ATL
April 9-11, 2024: Devnexus 2024, Georgia World Congress Center, Atlanta
October 22-23, 2024: SOSS Fusion 24, The Hotel at Avalon, Alpharetta
Jobs
Senior Manager, Software Engineering, Full Stack, Capital One
We aim to feature available Open Source jobs in the Atlanta area in this space each week. Use the contact form on the opensourceatlanta.com site to submit your job posting if you are hiring.
About
Open Source Atlanta's mission is to help support and build the Open Source Software communities in the Atlanta area. We will start with a weekly newsletter focused on Open Source, and we plan to organize regular events including meetups and camps.