Privacy in Machine Learning: PII
Privacy is not a value explicitly written into the US Constitution, but the essentials are there. As a democratic republic, we expect privacy, because a lack of privacy is tied to tyranny. Our nation was founded in opposition to tyranny, at least ideologically, even though we have had some major issues on that front. Over time, we have fixed many of them, and the major issue du jour is privacy in machine learning. So, what is PII, and why is it so important to the future of machine learning?
On April 10th, 2018, the term PII was introduced to the American people thanks to Mark. Mark had a small company that violated people's privacy by mishandling their Personally Identifiable Information (PII), and Congress wanted to chat with him. In the introduction to the hearing, the committee chair used the term PII, and my heart jumped. We had been having the PII discussion ever since I started at Apple, and the protocols for keeping it secure had only grown stricter. PII is also close to my heart because it had been important to me, personally and professionally, since I started grad school; I just didn't have a better name for it than privacy.
PII covers any data that could be linked back to the original subject, either on its own or in combination with other data. Face images are inherently PII. Other data becomes PII because, when combined with other information about the subject, it could reveal the subject's identity. The implications for Face ID are clear, but with health data they may not be obvious to everyone. For example, suppose I participate in a user study and a health issue is discovered. If my health insurance company got hold of that data, it might raise my rates. I'm not sure what they would actually do with it, but people have been known to misuse data, including PII, before.
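To make the "combination of data" point concrete, here is a minimal Python sketch. The records, field names, and the `reidentifiable` helper are all invented for illustration, and the check is a simple uniqueness count (the idea behind k-anonymity), not a process any team I describe here actually used. No single field singles anyone out, but together they do:

```python
from collections import Counter

def reidentifiable(records, fields, k=2):
    """Return the records whose values for `fields` are shared by fewer than k records.

    With the default k=2, these are the records that the combination of
    `fields` singles out on its own. (Hypothetical helper for illustration.)
    """
    def key(record):
        return tuple(record[f] for f in fields)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]

# Invented de-identified study records: a subject ID, no names.
records = [
    {"subject_id": "S001", "zip": "12345", "birth_year": 1982, "sex": "F"},
    {"subject_id": "S002", "zip": "12345", "birth_year": 1975, "sex": "M"},
    {"subject_id": "S003", "zip": "67890", "birth_year": 1982, "sex": "M"},
    {"subject_id": "S004", "zip": "67890", "birth_year": 1975, "sex": "F"},
    {"subject_id": "S005", "zip": "12345", "birth_year": 1982, "sex": "F"},
]

# No single attribute singles anyone out...
for field in ["zip", "birth_year", "sex"]:
    print(field, len(reidentifiable(records, [field])))  # prints 0 for each field

# ...but together they single out everyone except S001 and S005.
combo = ["zip", "birth_year", "sex"]
print([r["subject_id"] for r in reidentifiable(records, combo)])  # ['S002', 'S003', 'S004']
```

Raising `k` asks that every combination be shared by at least `k` people, which is one simple way to reason about whether a set of attributes can be combined to pick out an individual.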
Originally, I wanted to work on autonomous vehicles, but I ended up doing biometrics. Back in 2006, biometrics paid the bills for many computer vision projects. My lab collected data every year. People would sign an Informed Consent Form (ICF), get 5 Domer dollars, and go through a few stations to give some biometric data. Their data was de-identified with a subject ID, but there was a list tying the two together in case someone withdrew from the study. Access to this list was more restricted, for privacy reasons of course. The collected data could be used in publications, so people's faces could appear in papers, though without names attached. This was a standard part of the ICF.
However, I did not have my own data collected, which made me an anomaly among the researchers. I didn't feel comfortable with it, and data collection was voluntary. Not everyone was happy that I would ask for other people's help but wouldn't contribute myself. I stubbornly held that the system was not private enough. I didn't want my picture in research papers either. Ultimately, my advisors said nothing about it, which was consistent with data collection being voluntary. I proposed a system that would keep no information tying data to a subject and would instead use a face or iris scan to recognize returning participants and enroll new subjects in the database. That would have been a dream.
I then went on to collect 4,600 3D face scans of about 500 subjects, wrote my dissertation, and graduated.
I went to work at Digital Signal Corp (DSC), and again I declined to participate in data collection. This time was a bit different: I was the one asking for more data. We didn't do external data collections, and we didn't even use an ICF. People were asked and volunteered, and I think my not participating rubbed even more people the wrong way. I would have participated if I had had some assurance that my privacy would be protected.
DSC had a long-range (15 to 25 meters) 3D face scanner that could provide decent scans even while the subject was walking. Two years after I started, two of them were mounted in the hallways for tests and demos, and we had a few others for data collection. However, these two scanners were constantly collecting data as people walked down the hall. Again, no ICF was signed by the employees or any visitors. Most people didn't mind. I did, and I covered my face every time I walked by. I was so good at it that two years later, some guys in QA were thrilled when the face detector captured part of my chin.
The ICF issue was eventually addressed by baking consent into the employment agreement. I'm not sure of the legal implications of that approach, but it meant they wanted every employee to sign a new employment contract. Many refused at first. They offered stock options, but that's just paper to me. Unless the company has a chance at going public, stock options are worth only the paper they're printed on. The amount also paled in comparison to what I had been given in a round of stock options a year earlier.
I may have been the last holdout. I refused to sign a brand-new agreement and would sign only an addendum. The addendum stated that if they happened to capture data of me in the office, they could use it. After this fun event, I lost all trust in them. I was also in charge of the data we had, and I continued to ask for more because algorithms always need more data. I worked on the algorithms, scan-quality assessment, data validation, experimental design for new collections, and failure analysis.
I don’t regret the stance I took, especially given that the company went under a few years after I left. I have no idea what happened to the PII they had. The company was bought, and without a lawsuit, there's a fat chance anyone could get their data deleted.
I left DSC to come out to California. There, I participated in data collection because it was heart-rate data and the healthcare benefits seemed very clear. I also wasn't giving my image. Something interesting happened: I got to see how they handled private information, and I saw how seriously everyone, including me, took privacy. We had ICFs for everything, and every user study that collects data from people goes through a Human Studies Review Board (HSRB).
On top of that, I got more insight into the metrics collected from customers who opt in. No metric could be used, alone or in combination with others, to uniquely identify a specific person. The only desirable data was data that could help make a better product without compromising the customer's privacy. Privacy is an essential part of the company's DNA. It was just as important as the user experience because it was part of the user experience.
PII: To consent or not to consent, that is the question.
Then I switched to Face ID. Would I participate in data collection? Would I use my own data to improve the customer experience? More importantly, could I trust my company to securely store and use my PII?
I decided I could trust them. I had seen how they acted over the previous two years, and nothing from anyone I worked with gave me reason to think otherwise. Everyone I've worked with has been professional and has shown steadfast integrity with respect to PII.
Privacy is a virtue I hold dear.
My PII journey may seem very boring, but if you've read this far, maybe you found it interesting. Only when a user experience matters to us do we care enough to get it right for our users. I have cherished my privacy for years, and I'm glad to have worked on a project that collected so much data while working so hard to keep it secure and safe.