Eric Khumalo
May 2023
Privacy Engineering in Action: Learning About Trackers

I love my three dogs. To train them, I search for instructional videos on YouTube. I watch a few, learn some things, and before turning my attention back to training, I get lost in a rabbit hole of music. After procrastinating some, I return to YouTube and the first ad is offering a free course on “How to train your dog the same way the experts do using five simple steps.” I know how this works - trackers! As we interact with websites, there are numerous trackers processing data behind the scenes. They are scripts (or code) designed to collect information about how we use a website.

Before I went to college and majored in data science, I had a vague idea of what trackers do and thought using “incognito” mode would prevent me from being tracked. Deep, nuanced, and accurate knowledge didn’t come until later, and I’m still learning. This post is part of a series on Privacy Engineering in Action, which documents my path to starting a career in privacy engineering.

In college I interned with Twitter’s anti-spam and anti-abuse team, where I built actual trackers. Our goal was to deploy trackers to prevent hostile takeovers of users’ accounts and to protect our users from harm. These trackers helped us detect activity that departed from the usual way a user interacted with the app, such as logging in from a new device or location. In this case, the trackers were used to maintain security, which was a new challenge for me. I had never taken a class on this, so while writing the code I consulted Stack Overflow and various blogs to learn “best practices.” Prior knowledge had taught me that trackers can be bad, but this experience showed me that trackers can also be good.
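To make the idea concrete, here is a minimal sketch of flagging a login from a new device or location. It is not the code I wrote at Twitter; the names `LoginHistory` and `check_login` are hypothetical, and a real system would weigh many more signals than these two.

```python
from dataclasses import dataclass, field


@dataclass
class LoginHistory:
    """Devices and locations previously seen for one account (hypothetical)."""
    devices: set = field(default_factory=set)
    locations: set = field(default_factory=set)


def check_login(history, device_id, location):
    """Return the reasons a login looks unusual, then record the login."""
    reasons = []
    if device_id not in history.devices:
        reasons.append("new device")
    if location not in history.locations:
        reasons.append("new location")
    # Remember this login so the same device and location are not flagged again.
    history.devices.add(device_id)
    history.locations.add(location)
    return reasons
```

In this sketch the very first login is always flagged, because nothing has been seen yet. Whether a flag leads to an extra verification step or just a notification is a separate decision.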

Two years ago, I started working at Good Research as a privacy engineer. Nathan Good and Will Monge introduced me to the wide, wide world of trackers. Thanks to them, I learned trackers are not good or bad. It’s all about context. But first I had to learn more about data, something I only knew a little bit about from college and my internships.

Working on a project about mis- and disinformation opened my eyes to all the different kinds of data and all the different ways people collect and use data. Just like I had a vague idea about trackers, I used to think data is valuable - like oil! - but as a new privacy engineer, it was surprising to learn that combining and aggregating data makes it even more valuable.

Not only is data valuable, it’s critical to running the internet. Websites and apps are built to collect data. They have to function! The browser and internet provider you use, the sites you visit, and the forms you fill out all collect data about you. A little bit here and there is useful, but in aggregate, over time, those pieces of personal data become more valuable and also put you at risk.

After that project, I started reading more about privacy, privacy engineering, and high-risk data. Will gave me a reading list that included books, papers, articles, and presentations ranging from motivated intruder attacks to de-identification to differential privacy to regulation to ethics, and more!

Today I don't build trackers. Instead, as a privacy engineer, I make judgments about the legal or ethical use of the data collected by trackers. Whether they are good or bad depends on what data is being collected, by whom, and for what purpose. I answer these questions:

  1. Who is sending the information, and who is receiving it? Is it a third party?
  2. What type of information and pieces of data are being collected?
  3. How is this data used? What could they possibly do with the information?

To answer these questions, I use what I call the Three Cs of Privacy Engineering: classify, contextualize, and communicate. First I analyze and organize the data to understand what type of information is being transmitted. Then I contextualize: I figure out who is sending, who is receiving, whether this data is needed, and for what purpose. Finally, I communicate my findings and my judgment. Who is asking determines how I communicate and in what format.
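As a rough illustration, the three steps above could be sketched in Python for a single observed network request. Everything here is an assumption made for the example: the `Request` fields, the `CATEGORIES` mapping, and the `.example` hostnames are hypothetical, and comparing hostnames exactly is a simplification of how third parties are identified in practice.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical mapping from request parameter names to data categories.
CATEGORIES = {
    "email": "contact",
    "lat": "location",
    "lon": "location",
    "uid": "identifier",
}


@dataclass
class Request:
    page_url: str     # page the user is visiting (the sender)
    request_url: str  # endpoint the tracker sends data to (the receiver)
    params: dict      # data carried by the request


def classify(req):
    """Classify: label each transmitted field with a data category."""
    return sorted({CATEGORIES.get(name, "unknown") for name in req.params})


def contextualize(req):
    """Contextualize: who sends, who receives, and is it a third party?"""
    sender = urlparse(req.page_url).hostname
    receiver = urlparse(req.request_url).hostname
    # Simplification: any differing hostname counts as a third party.
    return {"sender": sender, "receiver": receiver,
            "third_party": sender != receiver}


def communicate(req):
    """Communicate: summarize the finding in one plain sentence."""
    ctx = contextualize(req)
    party = "third party" if ctx["third_party"] else "first party"
    data = ", ".join(classify(req))
    return f"{ctx['sender']} sends {data} data to {ctx['receiver']} ({party})"
```

For a request from `https://shop.example/cart` to `https://metrics.example/collect` carrying `uid` and `lat`, `communicate` returns "shop.example sends identifier, location data to metrics.example (third party)". The harder part, judging whether the data is needed and for what purpose, is not something this sketch attempts; that still takes a person looking at the context.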

I am far from done learning about trackers, data, and privacy engineering. For a software engineer, privacy is a highly subjective field. However, like the rest of software engineering, it’s a quickly changing field, especially as new regulations and new technologies emerge. I’ll keep you updated on what I learn!

Thanks to Will Monge and Jessica Traynor.