Blog: Confessions of My Youth: A Journey to Responsible AI
This is a post about how I came to the ideas of responsible AI and AI ethics, and ultimately to the views I hold today on the usage of AI and ML.
It was around 1991. It all started out good. I bent my problem-solving skills to things that (still today) advance the social good. I worked with NOAA to build algorithms and systems to predict the potential for tsunamis based on real-time oceanographic data. I also created algorithms to govern geographically distributed systems that automate the equitable distribution and management of scarce water resources, systems that predicted and automated hydro-electric operations to ensure downstream fish spawning grounds were not destabilized at critical times, satellite image processing for NASA, and other projects like these that created what I thought of as a net positive benefit for the world. I was quickly gaining a skillset in applying data analytics and supervised ML approaches to large-scale, distributed, and often real-time/continuous data streams, and it was both fun and heady stuff.
But then… “When I was younger, so much younger than today”, the challenge of problem solving and creating technology became the beginning, the means, and the end, and I took on some projects that today when I look back on, “I’m not so self assured (and now I find) Now I find I’ve changed my mind”.*
In the mid-1990s, a number of cities were contemplating or even deploying networks of video cameras in and around urban cores. The question we were asked was whether images of faces from these cameras could be compared, in near real time, with a database of mug shots and pictures of known criminals and people with outstanding warrants. Could we automate the identification of individuals from still images of the video feed? At the time, it seemed like a social good to help law enforcement find and get criminal actors off the street and, to be honest, a really cool CS problem to work on. Sadly, at no time did any of us on the team contemplate the utilization of this technology for broader purposes, such as population surveillance (despite having read pretty much every dystopian-future sci-fi ever published). So we developed the technology and deployed it. While certainly not as sophisticated as the convolutional neural networks being used for facial recognition today, the system used a probabilistic approach to match the incoming images against those in the facial database and learned from human operator feedback that labeled the accuracy of the matches. We created a number of novel and interesting approaches to pull this off, and achieved a throughput rate and accuracy level that essentially allowed for identification, notification, and response within seconds.
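To make the general idea concrete, here is a minimal toy sketch of a probabilistic matcher whose confidence thresholds are adjusted by operator feedback. Every name, the cosine-similarity scoring, and the threshold-nudging scheme are my illustrative assumptions for this post; none of it is the actual system we built back then.

```python
import math


class FeedbackMatcher:
    """Toy matcher: scores a probe feature vector against a gallery of
    known faces and adjusts per-subject match thresholds based on
    human operator feedback. Purely illustrative, not the original system."""

    def __init__(self, gallery, base_threshold=0.8):
        self.gallery = gallery  # {subject_id: feature vector}
        self.thresholds = {sid: base_threshold for sid in gallery}

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two feature vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def match(self, probe):
        """Return (subject_id, score) for the best gallery match whose
        score clears that subject's learned threshold, else None."""
        best = max(self.gallery, key=lambda s: self._cosine(probe, self.gallery[s]))
        score = self._cosine(probe, self.gallery[best])
        return (best, score) if score >= self.thresholds[best] else None

    def feedback(self, subject_id, correct, step=0.05):
        """Operator labels a reported match as correct or incorrect;
        lower the threshold on confirmation, raise it on a false match."""
        t = self.thresholds[subject_id] + (-step if correct else step)
        self.thresholds[subject_id] = min(max(t, 0.5), 0.99)
```

The point of the sketch is the feedback loop: each operator label shifts a per-subject threshold, so the system becomes more or less willing to report that subject over time, which is exactly where unexamined human bias can leak into the model.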
When I think about how facial recognition and AI/ML are being used today, with all the potential for bias inherent across the entire life-cycle, this project seems naive and ill-conceived in hindsight. The technology we developed was primitive and crude by today's standards, but no less powerful in its potential for unintended consequences. Today we have scores of evidence that facial recognition is compromised by severe forms of bias and is unreliable; we also understand that the potential for misuse of this technology is more than just probable, it's certain, because it has already happened. It's odd to me, then, that today there are companies, police departments, and even real-estate owners that still choose to pursue these types of systems. I read in the news almost daily of misidentifications, underrepresented populations being over-targeted, and bias come to life in these systems.
Because I still hadn’t learned the core lesson after the early facial recognition project, I took on another project in the early 2000s that was supposed to catch bad actors, this time as they crossed points of entry into the U.S. My company at the time had developed some unique technology we were marketing. It went like this: feed in your data, and we’ll find all the relationships, generate a graph out of it complete with relationship strengths, and let your users explore the results in a rich, highly interactive web-based application. The system was fed both data from the entry event (a license plate number, for example) and a number of databases containing information about outstanding warrants and the like. Our system was essentially a data-hungry inferential network algorithm that could build and update a graph in real time. The first pass returned a network with an average of three degrees of linkage from any given node. If any of these linked entities carried a flag, someone was notified and further investigation was triggered.
Somehow, at the time, we thought this was a good idea, and the customer was paying well for the adaptation of our tech to the purpose. But looking back in hindsight, what scares me today is the power this system conveyed, without governance or accountability, to a small group of people, coupled with both the potential for misuse and no real understanding of the data sources or any inherent bias present in the data powering it. The small bit of good news here is that (to my knowledge) this system was never fully deployed; at least, the customer never purchased the production licenses that would have been required to do so. It was a pilot project, one that I hope died and stayed dead. That said, the current political landscape around U.S. border security, along with modern ML automation technology, seems to me somewhat ripe to revive the kind of thinking that funds these kinds of systems, so I wouldn’t be surprised to learn that something similar has been developed in the intervening years.
Somewhere between then and now, I began to develop an understanding and appreciation of the darker side of these kinds of technologies. I think of it like the evolution of smoking tobacco: our grandparents didn’t know how bad it was for them; in fact, they may even have thought there were health benefits. Today, of course, we know better, and that smoking shortens life significantly. Choosing to start smoking in 2019 goes against everything rational. In my mind, the same is true of arming anyone (private or public) with technologies that automate the invasion of privacy and skew toward bias against underrepresented populations. Just say no; it’s not cool.
I am happy to report that in recent years I’ve returned to using ML for good: things like edtech, health improvement, understanding disease processes, and enabling precision medicine. Less socially impactful, but good economically, I’ve also spent a good bit of time helping client companies implement so-called “digital transformation” capabilities using automation, and thus serve their customers more successfully.
With AI and ML, we now have real-world evidence of the bias and harm that certain types of AI systems can and do wreak, mostly on underrepresented populations. We also know that these algorithms can amplify extremism and penetrate our very sense of privacy. We need to choose not to smoke, and instead find ways to create and promote the use of responsible AI that benefits everyone. We need the automation used by government to be transparent. We need the government to be accountable for its use of and reliance on models, and we need the companies building and selling these systems to be accountable for the bias and harm that result from their use. Transparency is key, and yet it is the single thing we are most unlikely to get in the current landscape.
*lyrics quoted from Help!, The Beatles