Dataset Highlight: Large Scale Sybil Detection

The term “Sybil attack” is a metaphor drawn from Sybil (1973), a book about a psychotherapist treating a woman with dissociative identity disorder (DID). In the book, the therapist describes never knowing exactly which “version” of the patient she was talking to. The patient, Sybil Dorsett, manifested sixteen different identities over the course of treatment, each with its own accent, personality, and background.

Nearly thirty years after the book’s publication, Microsoft Research’s John Douceur published the seminal paper “The Sybil Attack” (2002), using a name suggested by his colleague Brian Zill. The paper drew a parallel between Sybil’s disorder and the ability of attackers in decentralized systems to create numerous fake identities originating from the same entity. For years, the Sybil attack was a fundamental barrier to building decentralized networks, since attackers could easily flood the system with fake identities and manipulate the consensus process.

It wasn’t until the creation of Bitcoin in 2008 that the Sybil attack was economically mitigated at the consensus level by Proof-of-Work. As Sybil protection mechanisms, both Proof-of-Work and Proof-of-Stake have since proven effective in practice at preventing Sybil attacks on consensus. These systems make it costly for attackers to manipulate consensus by simply creating fake identities, thereby removing the need for institutional trust.

While the Sybil problem has largely been solved at the consensus layer, it has become one of the most pervasive issues for applications built on decentralized networks. The pseudonymous nature of network participants creates challenges, particularly when there are incentives to fake activity in order to harvest tokens or manipulate governance. These attacks have become the silent killers of crypto applications: they undermine airdrops, corrupt onchain governance, distort user analytics, and consume valuable time and resources of protocol leaders and builders.

In this post, we explore the many ways that Sybil attacks are negatively impacting the adoption of crypto applications and the promise of user-owned networks. We also examine current methods for building onchain Sybil resistance, and share how we at Portex are approaching this challenge with a fresh perspective.

unfairdrops: How Sybils Polluted the Idealistic Goals of Airdrops

Airdrops first emerged as a way to reward early adopters of crypto applications and to experiment with aligning incentives. In crypto vernacular, an “airdrop” refers to a token generation event (TGE) in which some portion of the newly issued tokens is sent to early users who contributed to the protocol’s growth. The protocol team sets eligibility criteria based on historical onchain data, which can be as lenient as a single interaction with the application or can require deeper engagement, after which eligible users can claim their tokens.

The original premise behind airdrops was both compelling and idealistic: by distributing tokens to users who participated early or demonstrated commitment to an ecosystem, developers could create a decentralized set of owners who would directly shape the protocol’s future through governance. Airdrop proponents posed questions like: what if Uber could have verifiably rewarded the first 1,000 users and drivers on its network? After all, those were the people who took a chance on a new product and knew it best.

Early airdrops such as Uniswap's 2020 launch of UNI inspired many projects to follow suit, turning token distribution into a key strategy for bootstrapping a user base. The benefits were clear: airdrops created excitement, incentivized new users to test the robustness of young infrastructure, and distributed value to those who had already shown dedication to the community, in many cases tens or even hundreds of thousands of users.

However, once users came to expect them, these incentives also became a lucrative target for exploitation. With significant value on the line, opportunistic actors began gaming these systems, creating hundreds or even thousands of addresses to interact with an application onchain. What began as a novel way to reward community members has become a breeding ground for industrial-scale Sybil attacks, a phenomenon that now threatens the primary goal of airdrops (getting tokens into the hands of real users) and risks turning onchain governance into a plutocracy.

Into the Mind of a Sybil

In the context of blockchain applications, a Sybil attack involves a malicious actor creating multiple addresses to gain disproportionate influence or rewards within the application. This manipulation can have devastating effects, particularly for applications that rely on fair token distribution, voting-based governance, or any system designed to reward distinct, unique participants for their actions.

One way to visualize a Sybil attack at the application layer is as a cluster of seemingly unique addresses, all controlled by a single adversary. Such clusters are most easily detected through common funding patterns, as illustrated in the cluster below, but attackers increasingly conceal them through more sophisticated funding strategies in an onchain game of cat and mouse.

Shoutout @nansen_ai chads for finding this this single cluster of 60,995! pic.twitter.com/5LDvo1OLiN
— Bryan Pellegrino (臭企鹅) (@PrimordialAA) June 8, 2024

The dynamics of Sybil attacks come down to a simple inequality: as long as the expected value (EV) of an attack exceeds its cost, the application remains vulnerable. Sybil attacks are fundamentally an economic problem, as attackers weigh the potential gains from manipulating token distributions, governance, or incentives against the cost of executing the attack. Sybil attacks at scale (as we'll see below) require time, resources, and enough plausible identities to evade detection. For many airdrops, the value at stake (there have now been numerous billion-dollar airdrops) is high enough that the rewards of a successful attack far outweigh the costs, creating strong incentives for Sybil actors.
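To make the inequality concrete, here is a toy Python model of an airdrop farming campaign. Every input (per-wallet cost, per-wallet reward, and a per-wallet detection probability) is a hypothetical assumption rather than data from any real airdrop, and the model ignores secondary effects such as token price impact. It simply compares expected reward with upfront cost and solves for the detection rate at which farming stops paying.

```python
from dataclasses import dataclass


@dataclass
class SybilCampaign:
    """Toy cost/benefit model of airdrop farming; every field is an assumed input."""

    num_wallets: int
    cost_per_wallet: float    # gas, bridging and funding overhead per wallet (USD)
    reward_per_wallet: float  # airdrop value a wallet receives if not filtered (USD)
    detection_prob: float     # chance that any given wallet is caught and excluded

    def expected_reward(self) -> float:
        # Only wallets that slip past detection are paid out.
        return self.num_wallets * (1 - self.detection_prob) * self.reward_per_wallet

    def total_cost(self) -> float:
        # Costs are sunk up front, whether or not a wallet is later caught.
        return self.num_wallets * self.cost_per_wallet

    def expected_profit(self) -> float:
        return self.expected_reward() - self.total_cost()

    def is_attractive(self) -> bool:
        # The application stays vulnerable while expected value exceeds cost.
        return self.expected_profit() > 0


def break_even_detection(cost_per_wallet: float, reward_per_wallet: float) -> float:
    """Detection rate p at which (1 - p) * reward == cost, i.e. farming breaks even."""
    return 1 - cost_per_wallet / reward_per_wallet
```

In this toy model, with a hypothetical $20 cost and $300 reward per wallet, farming stays profitable until roughly 93% of wallets are caught, which illustrates why partial filtering alone may not flip the economics for large farms.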

Case Study 1: LayerZero’s ZRO Token Launch

If any single recent airdrop illustrates just how dire the Sybil problem in crypto has become, it is LayerZero's launch of its ZRO token. LayerZero is a popular multi-chain interoperability protocol that allows users to move data across blockchain networks. Anticipating a token launch, Sybils interacted with LayerZero aggressively in hopes of exploiting the airdrop.

To address the anticipated Sybil problem, the LayerZero team took significant action to identify and exclude Sybil participants. They partnered with onchain data analysis firms Chaos Labs and Nansen to hunt down Sybil attackers and remove them from the airdrop list. These firms applied clustering techniques to detect patterns indicative of Sybil activity, such as multiple addresses exhibiting similar transaction behavior. As a result, the LayerZero team identified and excluded over 800K addresses suspected of being Sybils.
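The simplest of these clustering heuristics groups addresses by a shared funding source. The sketch below is a minimal Python illustration, not Chaos Labs' or Nansen's actual methodology: it assumes you already have, for each address, the address that sent its first native-token transfer, and it uses a union-find structure so that chained funding (A funds B, B funds C) collapses into one cluster. The `ignore_funders` parameter is a hypothetical allowlist for exchange hot wallets and bridges, which fund huge numbers of unrelated users and would otherwise merge everyone into a single cluster.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_first_funder(first_funding, min_cluster_size=3, ignore_funders=frozenset()):
    """Group funded addresses that trace back to a common funding source.

    first_funding: dict of address -> address that sent its first funding transfer
                   (an assumed, precomputed input).
    ignore_funders: hypothetical set of shared infrastructure (exchanges, bridges)
                    whose recipients should not be linked together.
    Returns clusters (sets of funded addresses) with at least min_cluster_size members.
    """
    uf = UnionFind()
    for addr, funder in first_funding.items():
        if funder not in ignore_funders:
            uf.union(funder, addr)
    groups = defaultdict(set)
    for addr in first_funding:
        groups[uf.find(addr)].add(addr)
    return [g for g in groups.values() if len(g) >= min_cluster_size]
```

In practice a common-funder rule would be combined with behavioral similarity signals such as the similar transaction patterns mentioned above; on its own it mainly catches the least careful clusters.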

Another interesting aspect of the LayerZero token launch was a self-reporting incentive: Sybils who self-reported their addresses could retain 15% of their allocation rather than being fully disqualified. Inspired by similar strategies used by Hop Protocol and Safe, the LayerZero team also launched a community bounty hunt, rewarding onchain vigilantes with tokens that would otherwise have gone to Sybil accounts. When it finally came time to claim tokens, LayerZero added yet another step, a “Proof-of-Donation” that required users to donate $0.10 to the Ethereum Protocol Guild for each ZRO they claimed.
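The self-report offer can be read as a simple expected-value choice for a farmer. The Python sketch below assumes the farmer knows their probability of being caught (in reality they can only guess) and compares a guaranteed 15% with an uncertain full allocation; the token price in the usage note is a made-up figure. Because the $0.10-per-token donation scales both options equally, it drops out of the comparison whenever the token trades above $0.10.

```python
SELF_REPORT_SHARE = 0.15       # share of allocation kept by self-reported Sybils
DONATION_PER_TOKEN_USD = 0.10  # Proof-of-Donation: $0.10 per claimed token


def net_claim_value(tokens: float, token_price_usd: float) -> float:
    """USD value of claiming `tokens` after paying the per-token donation."""
    return tokens * (token_price_usd - DONATION_PER_TOKEN_USD)


def self_report_is_better(detection_prob: float) -> bool:
    """Expected-value comparison for a farmer; detection_prob is an assumed input.

    Self-reporting keeps a certain 15%; hiding keeps 100% with probability
    (1 - detection_prob) and nothing otherwise. The per-token donation scales
    both sides equally, so it does not change the comparison.
    """
    keep_if_report = SELF_REPORT_SHARE
    expected_keep_if_hide = 1.0 - detection_prob
    return keep_if_report > expected_keep_if_hide


def break_even_detection() -> float:
    # Indifferent when 0.15 == 1 - p, i.e. p == 0.85.
    return 1.0 - SELF_REPORT_SHARE
```

Under these assumptions, a farmer who expects more than 85% of their wallets to be caught is better off self-reporting, and at a hypothetical $3 token price, 1,000 claimed ZRO nets about $2,900 after donations.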

Despite the monumental effort to detect Sybils, the aggregate user statistics around the LayerZero airdrop were telling. Daily active addresses interacting with the protocol dropped sharply, revealing the extent of Sybil involvement prior to the airdrop. The number of active addresses fell by close to 90% after LayerZero announced on May 1st that the first “airdrop snapshot” (the determination of eligible addresses based on historical data) had been completed. This highlights yet another downside of Sybil farms: they distort user analytics, which can mislead protocols about the true level of user engagement and adoption.

Case Study 2: zkSync

The recent zkSync airdrop provides a sharp contrast to LayerZero, showing the consequences of having no Sybil resistance strategy. Unlike LayerZero, zkSync chose not to implement any Sybil detection measures for its ZK token airdrop. While the reasoning is not entirely clear, the decision may have stemmed from the time and cost associated with existing Sybil detection processes.

As one observer pointed out, had the zkSync team simply reused the Sybil list generated for the LayerZero airdrop, they could have filtered out a significant number of Sybil addresses and kept millions of tokens out of the hands of Sybil farms.

Shocking data from ZKSync Eligibility List

Sybil accounts are bagging 2,000,000+ ZK tokens by depositing identical ETH amounts on the same day, each receiving 15,000 tokens per wallet

What's more, nearly all of them are flagged on the @LayerZero_Labs sybil list

Link below⬇️ pic.twitter.com/hd9uipFzuj
— Artemis the Sybil Hunter (@artemis_rsch) June 11, 2024
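Both observations, reuse of an earlier Sybil list and wallets that deposited identical ETH amounts on the same day, translate into simple filters. The Python sketch below is a hypothetical illustration, not how zkSync or the observer quoted above computed anything: it assumes deposits are available as `(address, day, amount_wei)` tuples and uses integer wei so that "identical amount" is an exact comparison rather than a floating-point one. The `min_group` cutoff is an arbitrary assumption.

```python
from collections import defaultdict


def flag_identical_deposits(deposits, min_group=10):
    """Flag addresses that deposited the exact same amount on the same day.

    deposits: iterable of (address, day, amount_wei) tuples (assumed input format).
    An address is flagged when at least `min_group` addresses share its
    (day, amount_wei) pair. Integer wei makes "identical" an exact comparison.
    """
    groups = defaultdict(set)
    for addr, day, amount_wei in deposits:
        groups[(day, amount_wei)].add(addr)
    flagged = set()
    for members in groups.values():
        if len(members) >= min_group:
            flagged |= members
    return flagged


def filter_eligibility(eligible, prior_sybil_list, deposits, min_group=10):
    """Drop eligible addresses found on a prior Sybil list or in identical-deposit groups.

    Returns (kept addresses in original order, set of removed suspicious addresses).
    """
    eligible_set = set(eligible)
    suspicious = (eligible_set & set(prior_sybil_list)) | (
        flag_identical_deposits(deposits, min_group) & eligible_set
    )
    kept = [a for a in eligible if a not in suspicious]
    return kept, suspicious
```

Exact-amount matching is easy for farmers to evade by adding small random offsets, so in practice it would be one signal among many rather than a standalone rule.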

Further suggesting that zkSync faced a serious Sybil problem, the network also saw a downward trend in activity following the airdrop announcement, though not as sharp as LayerZero's. Once again, this example shows how Sybil farms harm not only fair token distribution but also the overall picture of user engagement.

Case Study 3: Gitcoin Grants

The onchain crowdfunding platform Gitcoin offers yet another example of a system under constant threat of Sybil attacks. Gitcoin is a platform for funding public goods: initiatives that are openly accessible and offered free of charge, and that as a result struggle to fund themselves through traditional business models. Gitcoin helps teams building public goods in the Ethereum ecosystem raise funds through a Grants model that implements an approach known as Quadratic Funding (QF). QF operates on the principle that the number of unique donors to a cause matters more than the total amount donated. This is done via a matching function: a pool of capital is set aside not to match each dollar 1:1 but to weight matches toward the number of unique contributors to a project, creating a more democratic process.

However, this also creates an incentive to conduct Sybil attacks: by spinning up many addresses and donating small amounts from each, an attacker can inflate a project's QF matching. Gitcoin began facing Sybil challenges years ago and has since been at the forefront of research and development of Sybil resistance solutions, largely out of sheer necessity to safeguard the protocol.

To visualize an example, we pulled data for donations sent during Gitcoin Grants Round 9, which experienced significant Sybil problems. In the cluster below, we can see 5 unique addresses donating to Gitcoin, likely belonging to a single entity given the funding pattern we observe. The Sybil attack below is executed in a chain-like manner: the original funder (0x86F…62c) funded 0xC71…EE0, which donated to Gitcoin and funded 0x3A8…5Ec, which also donated to Gitcoin and funded 0xe72…C43, which also donated, and so on.

Note that all of the donations were also made within roughly the same day. Furthermore, the transfers back and forth between addresses lend more credibility to the hypothesis that these addresses are controlled by a single entity. This is a small cluster, but it illustrates the effect well: by splitting its donations across 5 separate addresses rather than donating the whole amount from one, this entity gained outsized influence over the QF mechanism.
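The chain-like pattern described above can be recovered with a breadth-first walk over funding transfers. In this Python sketch the address labels are hypothetical placeholders (the truncated addresses above are not reproduced), the funding graph and donation timestamps are assumed to be precomputed, and the one-day window is an assumption mirroring the observation that the donations landed within roughly the same day. Tracking visited addresses means back-and-forth transfers between the same wallets cannot cause an infinite loop.

```python
from collections import deque


def trace_funding_chain(root, funding_edges, donations, window_secs=86_400):
    """Breadth-first walk of funding transfers starting at `root`.

    funding_edges: dict of sender -> list of recipients (assumed precomputed).
    donations: dict of address -> unix timestamp of its grant donation.
    Returns donors reachable from `root` whose donations fall within
    `window_secs` of the earliest donation in the reachable set.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        addr = queue.popleft()
        for nxt in funding_edges.get(addr, []):
            if nxt not in seen:  # skip cycles from back-and-forth transfers
                seen.add(nxt)
                queue.append(nxt)
    donors = {a: donations[a] for a in seen if a in donations}
    if not donors:
        return set()
    start = min(donors.values())
    return {a for a, ts in donors.items() if ts - start <= window_secs}
```

The time window matters: widening it pulls in more of the chain, while a tight window keeps only donations that landed close together.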

Given the importance and severity of the problem, Gitcoin even incubated a Sybil resistance solution called Passport. Gitcoin Passport is a form of “Proof-of-Personhood” that assigns a score to a given address based on offchain attestations and onchain activity patterns. Passport is an example of one of the two major approaches to fighting Sybils today, which we explore in more detail below.
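Stamp-based scoring of this kind is, at its core, a weighted sum compared against a threshold. The Python sketch below shows that shape only: the stamp names, weights, and threshold are invented for illustration and are not Gitcoin Passport's actual values or API. It also hints at the forgery problem discussed later, since cheap stamps add up just like expensive ones.

```python
# Hypothetical stamp weights; NOT Gitcoin Passport's real values.
STAMP_WEIGHTS = {
    "github": 1.5,
    "twitter": 0.5,
    "poap": 1.0,
    "gov_id": 5.0,
    "onchain_history": 2.0,
}


def humanity_score(stamps, weights=STAMP_WEIGHTS):
    """Sum the weights of distinct verified stamps; unknown stamps count as zero."""
    return sum(weights.get(s, 0.0) for s in set(stamps))


def passes_threshold(stamps, threshold=5.0, weights=STAMP_WEIGHTS):
    """Hypothetical gate: an address qualifies once its score reaches `threshold`."""
    return humanity_score(stamps, weights) >= threshold
```

Deduplicating stamps with `set` keeps a user from inflating the score by submitting the same credential twice, but nothing here stops an attacker from acquiring several cheap, distinct stamps.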

Fighting Back the Bots: The Current Approach

Current Sybil resistance approaches fall into two major categories: analytics-based (used in the LayerZero airdrop) and Proof-of-Personhood (PoP) systems (like Gitcoin Passport). Here, we dive into the pros and cons of each. Note that full KYC/whitelisting (e.g. Goldfinch UID) and biometric verification (e.g. Worldcoin) are also forms of Sybil resistance, but they have raised concerns about privacy, feasibility, adoption, or user comfort.

Analytics-Based Approach

As explored above in the LayerZero example, onchain analytics can be effective at finding Sybil attackers. Crypto data and analytics firms like Nansen and Chaos Labs have stepped in to offer services (in consulting-like arrangements) that detect and deter Sybil attackers by leveraging onchain data to identify clusters and patterns associated with Sybils. In some cases, community bounty hunts have also been used to incentivize individuals to help identify Sybils.

We see a few advantages to the analytics approach.

Clustering heuristics can be highly effective at finding the most egregious Sybil attackers through common funding patterns
Blockchain data is highly accessible and transparent, and data tooling is steadily improving, making it easier for anyone to analyze onchain activity at scale
The mere existence of these measures can deter potential attackers and raises the cost of concealing Sybil activity through more sophisticated wallet funding strategies
Because an address’s behavior is the primary Sybil determinant, the pseudonymous privacy model is preserved

But there are also some cons.

Expensive: Projects must allocate a significant portion of their treasury to combat Sybils, which is costly, especially for smaller projects that do not have the resources to hire large analytics firms
Time-consuming: Projects have to negotiate ad hoc contracts with analytics firms and evaluate their performance. Even a community bounty hunt requires sifting through thousands of submissions one by one to determine their validity

Overall, onchain analytics is an important step forward in fighting Sybils, but it is not scalable in its current form.

User Authentication/Proof-of-Personhood (PoP)

Another emerging approach to fighting Sybils involves user authentication and Proof-of-Personhood (PoP) methods that use offchain attestations of data and credentials. Solutions like Gitcoin Passport and zkPass let users build up a wallet’s “humanity score” through attestations and “stamps” such as social media accounts, POAPs, and real-world IDs.

These solutions aim to strike a balance between user privacy and identity verification without fully compromising decentralization, and we see a number of benefits.

Pros:

Easier to implement than more sophisticated data analytics.
Provides a basic level of Sybil resistance, which is better than having no resistance.
Potentially cheaper, though pricing for large-scale use is still not well defined.

But there are also some limitations.

Low cost of forgery: It can be relatively inexpensive for attackers to create fake social accounts or identities to bypass these checks.
Adoption challenges: Encouraging users to adopt such systems can be difficult, as seen with Gitcoin Passport.
Cost of ZK: Verifying zero-knowledge proofs (ZKPs) onchain can still be non-trivially expensive at this stage of the technology.
Trust: Requires a trusted party to fetch credentials and attest to user legitimacy. While attestation committees and mechanisms like EigenLayer AVSs (Actively Validated Services) can help minimize trust, this still introduces a dependency.
Requires active user engagement: Users must take part in the verification process, which imposes a non-trivial cost in setup and time. Users may also need to refresh these credentials, adding further burden.

The Future of Sybil Resistance: An Onchain CAPTCHA?

Both major types of Sybil resistance solutions on the market today offer clear benefits; however, we believe the next-generation solution will need to combine aspects of both to overcome the challenges discussed above.

At Portex, we’re excited about the potential for new data types and products to be commercialized onchain and push innovation in the space forward. We believe Sybil resistance is one of the most salient issues in crypto that can be addressed with better data.

This is why we’re working on a data product called Wilbur, which is a machine learning model trained on the transaction graph of real users. Wilbur functions like a CAPTCHA for onchain applications.

Wilbur builds on our work with Reputation Oracles, but instead of classifying contracts as reputable or malicious, it gives a probabilistic assessment of whether an address is a Sybil or a human. Wilbur combines the best of the two leading approaches to Sybil resistance today: it uses connections with Web2 APIs to prove human activity, while building a robust defense from onchain data patterns that help identify Sybil activity.

There are a number of benefits to this approach. First, for projects, it shifts the cost of proving non-Sybil status onto users and greatly reduces the money and time projects must invest in Sybil resistance. As seen with LayerZero’s airdrop and its Proof-of-Donation component, many users are willing to accept a small, reasonable cost to claim tokens. For users, Wilbur is far more passive than PoP solutions: because it relies on a user’s transaction graph, it only requires action when that user’s baseline probability of being human falls below the project's requirement. And even when a real user is misclassified (a false negative), they can always take actions that improve their probability.

Wilbur also gives projects flexibility by letting them choose where along the probability distribution to draw the line, and therefore how lenient or stringent to be with Sybil detection. For example, a project could choose to allow airdrop claims only from addresses with a probability of being human above 40%, 50%, 75%, etc. This is illustrated in the hypothetical graph below. The x-axis represents an address's probability of being human, while the y-axis represents how addresses in a hypothetical airdrop snapshot are distributed across those probabilities. The project conducting the airdrop would simply choose the position of the vertical dashed line, which determines which addresses are in and which are out.
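Choosing the dashed line amounts to picking a cutoff on per-address probabilities and counting who clears it. The Python sketch below uses made-up scores in place of any real model output (it is not Wilbur's interface), and it uses a strict "above the cutoff" comparison to match the wording of the example thresholds.

```python
def admitted(scores, threshold):
    """Addresses whose estimated P(human) is strictly above `threshold`.

    scores: dict of address -> probability in [0, 1] (hypothetical model output).
    """
    return sorted(addr for addr, p in scores.items() if p > threshold)


def admission_sweep(scores, thresholds=(0.40, 0.50, 0.75)):
    """Share of the snapshot admitted at each candidate cutoff."""
    n = len(scores)
    return {t: (len(admitted(scores, t)) / n if n else 0.0) for t in thresholds}
```

Sweeping several cutoffs like this lets a team see, before committing, how much of the snapshot each choice would exclude.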

Conclusion

Sybil resistance has become one of the most pressing issues in crypto, threatening the idealistic goal of turning protocol users into protocol owners. There are now numerous case studies showing the devastating effects of Sybil attacks on token distribution, governance, user analytics, and more. Some solutions have emerged, but none has yet thwarted attacks in a reasonably priced, scalable manner that preserves users' pseudonymity. While Sybil resistance will always be a game of cat and mouse, we believe there are new solutions to explore that use data to address these core issues.