Last up in Privacy Tech for #enigma2021, @xchatty speaking about "IMPLEMENTING DIFFERENTIAL PRIVACY FOR THE 2020

Differential privacy was invented in 2006. That seems like a long time ago, but it's not long for a fundamental scientific invention. It took longer than that to get from the invention of public key cryptography to even the first version of SSL.
But even in 2020, we still can't meet user expectations.
* Data users expect consistent data releases
* Some people call synthetic data "fake data" like "fake news"
* It's not clear what "quality assurance" and "data exploration" mean in a DP framework
We just did the 2020 US census
* required to collect it by the constitution
* but required to maintain privacy by law
But that's hard! What if there were 10 people on the block, all the same sex and age? If you published a table like that, anyone would know the sex and age of everyone on the block.
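To make the block example concrete, here is a minimal sketch of how an exact tabulation gives everyone away when a block is homogeneous. The function names and the sample block are made up for illustration; this is not the Census Bureau's tabulation code.

```python
from collections import Counter


def published_table(block):
    """Exact counts per (sex, age) cell, as an unprotected release would publish."""
    return Counter(block)


def disclosed_attributes(table):
    """If only one cell is nonzero, every resident's sex and age is revealed."""
    cells = [cell for cell, count in table.items() if count > 0]
    return cells[0] if len(cells) == 1 else None


if __name__ == "__main__":
    # Hypothetical block: 10 residents, all with the same sex and age.
    block = [("F", 34)] * 10
    print(disclosed_attributes(published_table(block)))
```

With a mixed block the same check returns `None`, since no single cell pins down every resident.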
Previously used a method called "swapping" with secret parameters
* differential privacy is open and we can talk about privacy loss/accuracy tradeoff
* swapping assumed limitations of the attackers (e.g. limited computational power)
Needed to design the algorithms to get the accuracy we need and tune the privacy loss based on that.
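As a rough illustration of the privacy loss/accuracy trade-off, here is a sketch using the textbook Laplace mechanism on a single count. This is an assumption for illustration only: the talk notes don't say which noise distribution the Census algorithm uses, and the `epsilon` values and the count of 10 are made up.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """A counting query has sensitivity 1, so the Laplace scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 20000, seed: int = 0) -> float:
    """Average absolute error of the noisy count over many independent draws."""
    rng = random.Random(seed)
    true_count = 10  # hypothetical block population
    return statistics.fmean(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )


if __name__ == "__main__":
    # Smaller epsilon means less privacy loss but more error, and vice versa.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean abs error ~ {mean_abs_error(eps):.2f}")
```

The expected absolute error of Laplace noise equals its scale, so the error here shrinks roughly as 1/epsilon. Picking the accuracy you need therefore fixes how much privacy loss you have to accept.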

Change in the meaning of "privacy" as relative -- it requires a lot of explanation and overcoming organizational barriers.
By 2017 thought they had a good understanding of how differential privacy would fit -- just use the new algorithm where the old one was used, to create the "microdata detail file".
Surprises:
* different groups at the Census thought that meant different things
* before, states were processed as they came in. Differential privacy requires everything to be computed at once
* required a lot more computing power
* the differential privacy system has to be developed with real data; simulated data won't do, because the algorithms in the literature weren't designed for data anything like as complex as the real data (multiracial people, different kinds of households, etc.)
* to understand the privacy/accuracy trade-off requires a lot of runs, representing a *lot* of computer time
Census bureau was 100% behind the move
* initial implementation was by Dan Kifer, who took a sabbatical
* expanded the team with Simson and others
* 2018 end to end test
* original development was on an on-prem Linux cluster
* then got to move to AWS Elastic compute... but the monitoring wasn't good enough and had to create their own dashboard to track execution
* it wasn't a small amount of compute
* republished the 2010 census data using the differentially private algorithm and then had a conference to talk about it
* ... it wasn't well-received by the data users who thought there was too much error
For example: if we add a random value to a child's age, we might get a negative value, which can't be a real age.

If you avoid that, you might add bias to the data. How to avoid that? Let some data users get access to the measurement files [I don't follow]
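A small simulation shows where the bias comes from. It is only a sketch under assumptions: Laplace noise with a made-up scale of 2.0, a hypothetical true age of 1, and clamping at zero as the fix for negative values. None of these come from the talk. One possible reading of the "measurement files" point is that they are the raw noisy values before any such fix, so they stay unbiased.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate(true_age: float = 1.0, scale: float = 2.0,
             trials: int = 20000, seed: int = 0):
    """Return (mean of raw noisy ages, mean after clamping negatives to 0)."""
    rng = random.Random(seed)
    raw = [true_age + laplace_noise(scale, rng) for _ in range(trials)]
    # Forcing ages to be non-negative removes only the low outliers,
    # which pulls the average upward.
    clamped = [max(0.0, value) for value in raw]
    return statistics.fmean(raw), statistics.fmean(clamped)


if __name__ == "__main__":
    raw_mean, clamped_mean = simulate()
    print(f"raw mean: {raw_mean:.2f}, clamped mean: {clamped_mean:.2f}")
```

The raw noisy values average out close to the true age, while the clamped values come out noticeably higher. A data user who only sees the clamped numbers can't easily undo that shift.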
In summary, this is retrofitting the longest-running statistical program in the country with differential privacy. Data users have had some concerns, but believe it will all come out.
Code is up on github and papers are up online. (@xchatty have some links?)

[end of talk]
