Worried about data overload or AI overlords? Here’s how the CDH Social Data School can help

By Anne Alexander

Ahead of the CDH Social Data School application Q&A on May 4, Dr Anne Alexander, Director of Learning at Cambridge Digital Humanities (CDH), explains how the programme provides the digital research tools necessary for the data-driven world.

The world we live in has long been shaped by the proliferation of data – companies, governments and even our fellow citizens all collect and create data about us every day of our lives.

Most of our communication is relayed digitally, the buildings we live in and the urban spaces we pass through have been turned into sensors, and we work, play and even sleep with our digital devices. Particularly over the past year, as the pandemic has dramatically reduced in-person interactions for many, the deluge of data has come to seem overwhelming.

The CDH Social Data School (June 16-29), which Cambridge Digital Humanities is organising in collaboration with the Minderoo Centre for Technology and Democracy, is aimed at people working with data in the media, in NGOs and civil society organisations, and in education who want to equip themselves with new skills in designing and carrying out digital research projects, but who don't enjoy easy access to education in data collection, management and analysis.

We want to make the methods of inquiry and the technical skills we teach to students and staff at the University of Cambridge available to a much wider audience.

This year’s CDH Social Data School will include modules exploring the ethical and societal implications of new applications in Machine Learning, with a specific focus on the problems of structural injustice which permeate the computer vision techniques underpinning technologies such as facial recognition and image-based demographic profiling. 

We are keen to hear from participants whose work supports public interest journalism, human rights advocacy, trade unionism and campaigns for social justice, environmental sustainability and the decolonisation of education. 

Although criticism of the deployment of these technologies is now much more widespread than in the past, it often focuses on the problems with specific use cases rather than more general principles.

In the CDH Social Data School we will take a “bottom-up” approach by providing an accessible introduction to the technical fundamentals of machine learning systems, in order to equip participants with a better understanding of what can (and usually does) go wrong when such systems are deployed in wider society. 

We will also engage with these ideas through an experimental approach to learning, giving participants access to easy-to-use tools and methods allowing them to pose the questions which are most relevant to their own work. 

Participants are not expected to have any prior knowledge of programming, though familiarity with basic office tools such as spreadsheets will be helpful. We will be using free or open-source software to reduce barriers to participation.

We are particularly interested in applications from participants from countries, communities and groups which suffer from under-resourcing, marginalisation and discrimination.

The CDH Social Data School will run online from June 16-29.

Apply now for the CDH Social Data School 2021

Please join us for a Q&A session with the teaching team:

Tuesday 4 May, 2–2.45pm BST

Registration essential: Sign up here

Read more on the background and apply for your place at the School here.

The flight from WhatsApp

John Naughton:

Not surprisingly, Signal has been staggering under the load of refugees from WhatsApp following Facebook’s ultimatum about sharing their data with other companies in its group. According to data from Sensor Tower, Signal was downloaded 8.8m times worldwide in the week after the WhatsApp changes were first announced on January 4. Compare that with 246,000 downloads the week before and you get some idea of the step-change. I guess the tweet — “Use Signal” — from Elon Musk on January 7 probably also added a spike.

In contrast, WhatsApp downloads during the period showed the reverse pattern — 9.7m downloads in the week after the announcement, compared with 11.3m before, a 14 per cent decrease.

This isn’t a crisis for Facebook — yet. But it’s a more serious challenge than the June 2020 advertising boycott. Evidence that Zuckerberg & Co are taking it seriously comes from announcements that Facebook has cancelled the February 8 deadline in its ultimatum to users. It now says that it will instead “go to people gradually to review the policy at their own pace before new business options are available on May 15.” As Charles Arthur has pointed out, the contrast between the leisurely pace at which Facebook has moved on questions of hate speech posted by alt-right outfits and its lightning response to the exodus from WhatsApp is instructive. It shows what really matters to the top brass.

Signal seems an interesting outfit, incidentally, and not just because of its technology. It’s a not-for-profit organisation, for one thing. Its software is open source — which means it can be independently assessed. And it’s been created by interesting people. Brian Acton, for example, is one of the two co-founders of WhatsApp, which Facebook bought in 2014 for $19B. He pumped $50m of that into Signal, and no doubt there’s a lot more where that came from. And Moxie Marlinspike, the CEO, is not only a cryptographer but also a hacker, a shipwright, and a licensed mariner. The New Yorker had a nice profile of him a while back.

Trust in/distrust of public sector data repositories

Posted by JN

My eye was caught by an ad for a PhD internship in the Social Media Collective, an interesting group of scholars in Microsoft Research’s NYC lab. What’s significant is the background they cite for the project.

Microsoft Research NYC is looking for an advanced PhD student to conduct an original research project on a topic under the rubric of “(dis)trust in public-sector data infrastructures.” MSR internships provide PhD students with an opportunity to work on an independent research project that advances their intellectual development while collaborating with a multi-disciplinary group of scholars. Interns typically relish the networks that they build through this program. This internship will be mentored by danah boyd; the intern will be part of both the NYC lab’s cohort and a member of the Social Media Collective. Applicants for this internship should be interested in conducting original research related to how trust in public-sector data infrastructures is formed and/or destroyed.

Substantive Context: In the United States, federal data infrastructures are under attack. Political interference has threatened the legitimacy of federal agencies and the data infrastructures they protect. Climate science relies on data collected by NOAA, the Department of Energy, NASA, and the Department of Agriculture. Yet, anti-science political rhetoric has restricted funding, undermined hiring, and pushed for the erasure of critical sources of data. And then there was Sharpie-gate. In the midst of a pandemic, policymakers in government and leaders in industry need to trust public health data to make informed decisions. Yet, the CDC has faced such severe attacks on its data infrastructure and organization that non-governmental groups have formed to create shadow sources of data. The census is democracy’s data infrastructure, yet it too has been plagued by political interference.

Data has long been a source of political power and state legitimacy, as well as a tool to argue for specific policies and defend core values. Yet, the history of public-sector data infrastructures is fraught, in no small part because state data has long been used to oppress, colonize, and control. Numbers have politics and politics has numbers.  Anti-colonial and anti-racist movements have long challenged what data the state collects, about whom, and for what purposes. Decades of public policy debates about privacy and power have shaped public-sector data infrastructures. Amidst these efforts to ensure that data is used to ensure equity — and not abuse — there have been a range of adversarial forces who have invested in polluting data for political, financial, or ideological purposes.

The legitimacy of public-sector data infrastructures is socially constructed. It is not driven by either the quality or quantity of data, but how the data — and the institution that uses its credibility to guarantee the data —  is perceived. When data are manipulated or political interests contort the appearance of data, data infrastructures are at risk. As with any type of infrastructure, data infrastructures must be maintained as sociotechnical systems. Data infrastructures are rendered visible when they break, but the cracks in the system should be negotiated long before the system has collapsed.

At the moment, I suspect that this is a problem that’s mostly confined to the US.  But the stresses of the pandemic and of alt-right disruption may mean that it’s coming to Europe (and elsewhere) soon.
