Why Should I Become a Data Engineer?

NOTE: Originally published on my previous domain blog, blog.alkemist.io, which I’ve retired in case the .io domain goes away. Original publish date was 2023-01-23. Reproduced here with no edits, even though it needs some. :D

From what I’ve heard, data engineering usually isn’t most people’s first choice of tech career. Let’s delve into the motivations for doing data engineering and figure out whether it’s the right option for you.

Before venturing too deep down the rabbit hole and making sweeping generalizations that are only useful for someone running for Prime Minister of Dataland, one thing is abundantly clear: it’s completely okay to stumble into data engineering, or to do it because you’re good at it and it’s a high-quality, well-paying job. With that said, let’s take a more deliberate look at the various reasons why you might want to get into this space.

I’ll start with my personal story. I knew at some point, even before the data craze began circa 2011, that I somehow wanted to be involved in the technical aspects of analytics. The interest started in late high school with exposure to the power of statistics. It could predict a number of phenomena with high accuracy, as if by magic. My life path took me towards the things I was inherently good at, like networking and systems, until I got the opportunity to work at a supercomputer center and learned about the difficulties of analyzing large volumes of data and all that it entailed.

It still took me over a decade to get my technical skills and acumen to a point where I could justify transitioning to data engineering in my day job (part of this was imposter syndrome, but that’s a different day and a different post). Over the past 5 years as a data engineer, I’ve been able to try both data engineering and analytics engineering. After building my first, rather complex, data model from scratch, I quickly learned that analytics engineering is not what I want to do every day, if given the option. After joining the dbt Slack, helping completely migrate a data warehouse backend, and listening to a few episodes of the Data Engineering Podcast, I knew data engineering was the real hook for me.

Others’ motivations vary widely. Some people like the challenge of breaking down and performance-tuning the analysis of obscene amounts of data that can’t be easily tackled by a single backend system. Alternatively, it can be very gratifying to build a platform that helps a cause you believe in. For instance, you might be building a fast-response analysis system that identifies anomalies in medical image scans to help detect cancer. Or you might be generating protein combinations to find targeted medications with less severe side effects before they go to trial. Sometimes you’re there for the technical challenges. Other times you’re interested in the problem space and enjoy the technical aspects of finding solutions in that problem space.

One cautionary note: if you’re getting into data engineering because you think it’s a cool gig… you’re going to be disappointed very quickly. It’s the least glamorous role in the data ecosystem and arguably gets the least end-user/client exposure and attention. Your primary stakeholders are your immediate data team. You’ll usually get requests proxied from end users through data scientists and analytics engineers. There are some truly incredible data engineering pioneers and names out there, but like most fields, the big names are 0.01% of the total number of practitioners. They might not even be the “top” people in the field, however you want to interpret the meaning of “top”. The point I’m trying to make is that the data engineering community is relatively small, and trying to achieve rockstar status, even if you’re incredible, is a bad reason to get into it.

All this being said, it’s a fascinating field to work in. If you’re drawn to it after reading this, hop in! We need all the help we can get.