Last up in Privacy Tech for #enigma2021, @xchatty speaking about "IMPLEMENTING DIFFERENTIAL PRIVACY FOR THE 2020

Differential privacy was invented in 2006. Seems like a long time but it's not a long time since a fundamental scientific invention. It took longer than that between the invention of public key cryptography and even the first version of SSL.
But even in 2020, we still can't meet user expectations.
* Data users expect consistent data releases
* Some people call synthetic data "fake data" like
"fake news"
* It's not clear what "quality assurance" and "data exploration" means in a DP framework
We just did the 2020 US census
* required to collect it by the constitution
* but required to maintain privacy by law
But that's hard! What if there were 10 people on the block and all the same sex and age? If you posted something like that, then you would know what everyone's sex and age was on the block.
Previously used a method called "swapping" with secret parameters
* differential privacy is open and we can talk about privacy loss/accuracy tradeoff
* swapping assumed limitations of the attackers (e.g. limited computational power)
Needed to design the algorithms to get the accuracy we need it and tune the privacy loss based on that.

Change in the meaning of "privacy" as relative -- it requires a lot of explanation and overcoming organizational barriers.
By 2017 thought they had a good understanding of how differential privacy would fit -- just use the new algorithm where the old one was used, to create the "micodata detail file".
Surprises:
* different groups at the Census thought that meant different things
* before, states were processed as they came in. Differential privacy requires everything be computed on at once
* required a lot more computing power
* differential privacy system has to be developed with real data; can't use simulated data to do this because the algorithms in the literature weren't designed for dats anything like as complex as the real data (multiracial people, different kinds of households, etc)
* to understand the privacy/accuracy trade-off requires a lot of runs, representing a *lot* of computer time
Census bureau was 100% behind the move
* initial implementation was by Dan Kiefer, who took a sabbatical
* expanded team to with Simson and others
* 2018 end to end test
* original development was on an on-prem Linux cluster
* then got to move to AWS Elastic compute... but the monitoring wasn't good enough and had to create their own dashboard to track execution
* it wasn't a small amount of compute
* republished the 2010 census data using the differentially private algorithm and then had a conference to talk about it
* ... it wasn't well-received by the data users who thought there was too much error
For example: if we add a random value to a child's age, we might get a negative value, which probably won't happen to a child's age.

If you avoid that, you might add bias to the data. How to avoid that? Let some data users get access to the measurement files [I don't follow]
In summary, this is retrofitting the longest-running statistical program in the country with differential privacy. Data users have had some concerns, but believe it will all come out.
Code is up on github and papers are up online. (@xchatty have some links?)

[end of talk]

More from Lea Kissner

More from Tech

I could create an entire twitter feed of things Facebook has tried to cover up since 2015. Where do you want to start, Mark and Sheryl? https://t.co/1trgupQEH9


Ok, here. Just one of the 236 mentions of Facebook in the under read but incredibly important interim report from Parliament. ht @CommonsCMS
https://t.co/gfhHCrOLeU


Let’s do another, this one to Senate Intel. Question: “Were you or CEO Mark Zuckerberg aware of the hiring of Joseph Chancellor?"
Answer "Facebook has over 30,000 employees. Senior management does not participate in day-today hiring decisions."


Or to @CommonsCMS: Question: "When did Mark Zuckerberg know about Cambridge Analytica?"
Answer: "He did not become aware of allegations CA may not have deleted data about FB users obtained through Dr. Kogan's app until March of 2018, when
these issues were raised in the media."


If you prefer visuals, watch this short clip after @IanCLucas rightly expresses concern about a Facebook exec failing to disclose info.
Thought I'd put a thread together of some resources & people I consider really valuable & insightful for anyone considering or just starting out on their @SorareHQ journey. It's by no means comprehensive, this community is super helpful so no offence to anyone I've missed off...

1) Get yourself on the official Sorare Discord group
https://t.co/1CWeyglJhu, the forum is always full of interesting debate. Got a question? Put it on the relevant thread & it's usually answered in minutes. This is also a great place to engage directly with the @SorareHQ team.

2) Bury your head in @HGLeitch's @SorareData & get to grips with all the collated information you have to hand FOR FREE! IMO it's vital for price-checking, scouting & S05 team building plus they are hosts to the forward thinking SO11 and SorareData Cups 🏆

3) Get on YouTube 📺, subscribe to @Qu_Tang_Clan's channel https://t.co/1ZxMsQR1kq & engross yourself in hours of Sorare tutorials & videos. There's a good crowd that log in to the live Gameweek shows where you get to see Quinny scratching his head/ beard over team selection.

4) Make sure to follow & give a listen to the @Sorare_Podcast on the streaming service of your choice 🔊, weekly shows are always insightful with great guests. Worth listening to the old episodes too as there's loads of information you'll take from them.

You May Also Like

1/12

RT-PCR corona (test) scam

Symptomatic people are tested for one and only one respiratory virus. This means that other acute respiratory infections are reclassified as


2/12

It is tested exquisitely with a hypersensitive non-specific RT-PCR test / Ct >35 (>30 is nonsense, >35 is madness), without considering Ct and clinical context. This means that more acute respiratory infections are reclassified as


3/12

The Drosten RT-PCR test is fabricated in a way that each country and laboratory perform it differently at too high Ct and that the high rate of false positives increases massively due to cross-reaction with other (corona) viruses in the "flu


4/12

Even asymptomatic, previously called healthy, people are tested (en masse) in this way, although there is no epidemiologically relevant asymptomatic transmission. This means that even healthy people are declared as COVID


5/12

Deaths within 28 days after a positive RT-PCR test from whatever cause are designated as deaths WITH COVID. This means that other causes of death are reclassified as
Still wondering about this 🤔


save as q