update sybil post

jaseg 2020-09-10 12:32:58 +02:00
parent e13c0259dd
commit 9fc934f9d2

@@ -1,5 +1,5 @@
---
title: "Sybil Resistance and Digital Identity"
title: "Theia Attack Resistance and Digital Identity"
date: 2020-09-09T15:00:00+02:00
---
@@ -11,7 +11,7 @@ date: 2020-09-09T15:00:00+02:00
</figure>
Sybil in Cyberspace
Theia in Cyberspace
===================
In informatics, the term *distributed system* is used to describe the aggregate behavior of a complex network made up of
@@ -19,33 +19,44 @@ individual computers. For decades, computer scientists to some success have been
individual computers that make up such a distributed system need to be programmed for the resulting amalgamation to
behave in a predictable, maybe even a desirable way. Though seemingly simple on its surface, this problem has a
surprising depth to it that has yielded research questions for a whole field for several decades now. One particular
as-of-yet unsolved problem is resistance against so-called *sybil attacks*. Named after the 1973 book by Flora Rheta
Schreiber on dissociative identity disorder, in distributed systems a sybil attack is an attack where one computer
acts to the rest of the network as if it were multiple, independent systems. The core insight is that there cannot be
any technological way of preventing such an attack, and any practical countermeasure must be grounded in some authority
or ground truth that is external to the systems—bridging from technology to its social or political context.
as-of-yet unsolved problem is resistance against *theia attacks* (or "sybil" attacks in older terminology).
Named after the 1973 book by Flora Rheta Schreiber on dissociative identity disorder, a sybil attack is an
attack where one computer in a distributed system pretends to be multiple computers to gain an advantage. From my
standpoint, naming a type of computer security attack after a medical condition was an unfortunate choice. For this
reason this post uses the term *Theia attack* to refer to the same concept. This is named after a Greek goddess of
light and glitter and alludes to the attacker performing something akin to an optical illusion, causing the attacked to
perceive multiple distinct images that in the end are all only reflections of the same attacker.
The core insight of computer science research on theia attacks is that there cannot be any technological way of
preventing such an attack, and any practical countermeasure must be grounded in some authority or ground truth that is
external to the systems—bridging from technology to its social or political context.
Looking around, we can see a parallel between this question ("which computer is a real computer?") and a social issue
that recently has been growing in importance: Just like computers can pretend to be other computers, they can also
pretend to be humans. As can humans. Be it within the context of election manipulation or down-to-earth astroturfing_,
the recurring issue is that in todays online communities, it is hard for an individual to tell who of their online
the recurring issue is that in today's online communities, it is hard for an individual to tell who of their online
acquaintances are who they seem to be. Different platforms attempt different solutions to this problem, and all fail in
some way or another. Facebook employs good old snitching, turning people against each other and asking them "Do you know
this person?". Twitter is more laid-back and instead of such Stasi_ methodology simply opts to require a working mobile
phone number from its subjects, essentially short-circuiting identity verification to the phone company's check of their
this person?". Twitter is more laid-back and avoids this Stasi_ methodology in favor of requiring a working mobile phone
number from its subjects, essentially short-circuiting identity verification to the phone company's check of their
subscriber's national passport.
.. the preceding is a simplified representation of these platforms' practices. In particular, Facebook uses several
methods depending on the case. I think this abbreviated discussion should be ok for the sake of the argument. I am
not 100% certain of the accuracy of the statement though. Does Facebook still do the snitching thing? Is
Twitter usually content with a phone number?
Trusting Crypto-Anarchist Authorities
=====================================
Beyond these centralistic solutions to the problem, crypto-anarchists and anarcho-capitalists have been brewing some
interesting novel approaches to this issue based on *blockchain* distributed ledger technology. Distributed ledgers,
often colloquially called "blockchains", are a distributed systems design pattern that yields a system that works like
an append-only logbook. Participants with the right permissions can create new entries in this logbook, but
noone—neither the original author, nor other participants—can retroactively change a logbook entry once it has been
committed to the log. In the blockchain model, past entries are essentially written into stone. This near-perfect
immutability is the property that opens them for a number of use cases from cryptographic pseudo-currencies
[#cryptocurrency]_.
Beyond these centralistic solutions to the problem, crypto-anarchists and anarcho-capitalists have been brewing on some
interesting novel approaches to online identity based on *blockchain* distributed ledger technology. Distributed
ledgers are a distributed systems design pattern that yields a system that works like an append-only logbook.
Participants can create new entries in this logbook, but no one—neither the original author, nor other participants—can
retroactively change a logbook entry once it has been written. In the blockchain model, past entries are essentially
written into stone. This near-perfect immutability is what opens them for a number of use cases from cryptographic
pseudo-currencies [#cryptocurrency]_.
An overview of a variety of these unconventional blockchain identity verification approaches can be found in `this
unpublished 2020 survey by Siddarth, Ivliev, Siri and Berman <https://arxiv.org/ftp/arxiv/papers/2008/2008.05300.pdf>`_.
@@ -61,16 +72,18 @@ social contacts. These computers then run an algorithm derived from the SybilGua
of random-walk based algorithms. These algorithms assume that authentic social graphs are small world graphs: Everyone
knows everyone else through a friend's friend's friend. They also assume that there is an upper bound on how many
connections with authentic users an attacker can forge: Anyone who is not embedded into the graph well enough is cut
out. Disregarding the catastrophic privacy issues of storing large amounts of data on social relationships on someone
else's computer, this second assumption is where this model unfortunately breaks down. Applying common sense, it is
completely realistic for an attacker to forge a large number of social connections: This is precisely what most of
social media marketing is about! A more malicious angle on this would be to consider how in meatspace [#meatspacefn]_
multi-level marketing schemes are successful in coaxing people to abuse their social graphs to disastrous consequences
to the well-being of themselves and others. Similar schemes would certainly be possible in cyberspace as well.
out. In this way, they put an upper limit on the number of theia identities an attacker can assume given a certain number
of connections to real people.
An additional point to consider is that the upper limit SybilGuard_ and others place on the number of fake identities
one can have is simply not that strict at all. An attacker could still get away with a reasonable number of false
identities before getting caught by any such algorithm.
Disregarding the catastrophic privacy issues of storing large amounts of data on social relationships on someone else's
computer, this second assumption is where this model unfortunately breaks down. Applying common sense, it is completely
realistic for an attacker to forge a large number of social connections: This is precisely what most of social media
marketing is about! A more malicious angle on this would be to consider how in meatspace [#meatspacefn]_ multi-level
marketing schemes are successful in coaxing people to abuse their social graphs to disastrous consequences to the
well-being of themselves and others. Similar schemes would certainly be possible in cyberspace as well. An additional
point to consider is that the upper limit SybilGuard_ and others place on the number of fake identities one can have is
simply not that strict at all. An attacker could still get away with a reasonable number of false identities before
getting caught by any such algorithm.
.. Duniter
@@ -79,17 +92,20 @@ them, and who is at most a few degrees removed from one of several pre-determine
vulnerable to conmen and other scammers, this system has the glaring flaw of roundly refusing to recognize any person
who is not willing or able to engage with multiple of its members. Along with the system's informal requirement for
members to only vouch for people they have physically met this leads to a nonstarter in a cyberspace that has grown
specifically *because* it transcends national borders and physical distance.
specifically *because* it transcends national borders and physical distance, the two most serious obstacles to in-person
communication.
.. Idena Network
The last scheme I will outline in this post is based around a set of `Turing tests`_, that is, quizzes that are designed
The last scheme I will outline in this post is based around a set of `Turing tests`_; that is, quizzes that are designed
to tell apart man and machine. In this system, all participants have to simultaneously undergo a Turing test once in a
fortnight. The system uses a particular type of picture classification-based Turing test and does not seem to be
designed with the blind or mentally disabled in mind with accessibility concerns nowhere to be found in the so-called
"manifesto" published by its creators. But even ignoring that, the system obviously fails at an even more basic level:
The idea that everyone takes a Turing test at the same time only works in a world without time zones. Or jobs for that
matter. Also, it assumes that an attacker cannot simply hire a small army of people someplace else to fool the system.
fortnight. The idea is that this limits the number of theia identities an attacker can assume since they can only solve
that many Turing tests at the same time. The system uses a particular type of picture classification-based Turing test
and does not seem to be designed with the blind or mentally disabled in mind with accessibility concerns nowhere to be
found in the so-called "manifesto" published by its creators. But even ignoring that, the system obviously fails at an
even more basic level: The idea that everyone takes a Turing test at the same time only works in a world without time
zones. Or jobs for that matter. Also, it assumes that an attacker cannot simply hire a small army of people someplace
else to fool the system.
.. _SybilLimit: https://www.comp.nus.edu.sg/~yuhf/yuh-sybillimit.pdf
.. _SybilGuard: http://www.math.cmu.edu/~adf/research/SybilGuard.pdf
@@ -108,29 +124,29 @@ the important question remains unasked:
Departing from all the systems outlined above, I want to make a suggestion on how we can approach this topic in a more
practical, less discriminatory [#discriminatory]_ manner. I think both using people's social connections and proxying
the decisions of external authorities such as the state are bad systems to decide who is a person and who is not. Let us
now illustrate this point a bit. Let us think about how many digital identities a human beign might have. Let us first
the decisions of external authorities such as the state are bad systems to decide who is a person and who is not. I will
now illustrate this point a bit. Let us think about how many digital identities a human being might have. First,
consider the case of n=0, someone who simply wants no business with the system at all. For simplicity, let us assume
that we have solved this issue of consent, i.e. every person who is identified by the system consents to this practice.
For n=1, the approaches outlined above all provide some approximate solution. States may not grant every human
sufficient ID (e.g. children, mentally disabled or prisoners might be left out), and the social systems might fail to
catch people who simply do not have any friends, but otherwise their approximations hold. Maybe. But what about n=2,
sufficient ID (e.g. children, the mentally disabled or prisoners might be left out), and the social systems might fail
to catch people who simply do not have any friends, but otherwise their approximations hold. Maybe. But what about n=2,
n=3, ...? None of these systems adequately consider cases where a human being might legitimately wish to hold multiple
identities, non-maliciously.
Consider the case of a lesbian, conservative politician. An active social media presence is a core component of a modern
politician's carreer. At the same time, "conservative homophobe" is still well within the realm of tautology and it
would be legitimate for this politician to wish to not disclose this aspect of their private life to the world at large,
and have a separate online identity for matters related to it. For this politician, the social relationship-based
systems referenced above would either have outing them as a design feature, or they would force them to choose either of
these identities: Requiring them to choose between private life and carreer. When deferring to the state as the decider
over personhood, at least the platform's operator would know about the outrageously sensitive link between the
politician's online identities. Clearly, none of these systems are socially just.
Consider a hypothetical lesbian, conservative politician. An active social media presence is a core component of a
modern politician's career. At the same time, "conservative homophobe" is still well within the realm of tautology and
it would be legitimate for this politician to wish to not disclose a large fraction of their private life to the world
at large. They might have a separate online identity for matters related to it. For this politician, the social
relationship-based systems referenced above would either incorporate outing as a design feature, or they would force
the politician to choose either of their two identities: To choose between private life and career. When deferring to
the state as the decider over personhood, at least the platform's operator would know about the outrageously sensitive
link between the politician's online identities. Clearly, no such solution can be considered socially just.
Let us try not to be caught up on saving the world at this point. The issue of conservative homophobia is out of the
scope of our consideration, and it is not one that anyone can solve in the near future. Least of all can true change be
forced through contracts, legislation or other rules. There is a case for legitimate uses of multiple, separate digital
identities, and we do not have a technical or political answer to it. All hope is not lost yet, though. We can easily
scope of our consideration, and it is not one that anyone can solve in the near future. Magical realism aside, least of
all can some technological thing beckon this change. There is a case for legitimate uses of multiple, separate digital
identities, and we do not have a technical or political answer to it. All hope is not lost yet, though. We can easily
undo this Gordian knot by acknowledging an unspoken assumption that underlies any social relationships between real
people, past the procrustean bed of computer systems or organizational structures these relationships are cast into.
@@ -140,10 +156,10 @@ people, past the procrustean bed of computer systems or organizational structure
Thinking beyond the straw man politician above, this is evident in more subtle ways in almost all our everyday
relationships: Some people may know me by my legal name, some by my online nickname. To some I may be a computer
scientist, to some a flatmate. None of my friends and acquaintances have ever wanted to see my passport, or asked to
take my DNA to ascertain that I am in fact a differnet human than the others they know. It would simply be exceedingly
weird for someone I know to snoop around the other people I know, trying to build a map of where these people know me
from and whether they think the same about me. Yet, this concept of a consistent, global identity is exactly what up to
now all technological solutions to the identity problem are about.
take my DNA to ascertain that I am a distinct human being from the other humans they know. Also, it would simply be
exceedingly weird for someone I know to snoop around the other people I know, trying to build a map of where these
people know me from and whether they think the same about me. Yet, this concept of a single, consistent, global, true
identity is exactly what up to now all technological solutions to the identity problem are trying to achieve.
Building Bridges
================