Where Has Furong Been? Behind the Scenes of Our NeurIPS Competition

Lately, it seems I’ve mastered the art of being elusive. My family probably hasn’t noticed much change; to them, my absence is hardly out of the ordinary. But now even my colleagues are starting to wonder where I’ve disappeared to. Amidst a sabbatical at Capital One, I’m also steering my research group at UMD and navigating the adventurous world of parenting. My days start early, shuffling between prepping lunch boxes for my energetic pre-K toddler and beating the notorious DC traffic to dive into the real-world challenges of finance by 8 AM. After work, it’s all about quality time until my child’s bedtime, after which my academic cap goes back on for a night shift of paper reading, proposal writing, and engaging with my research and students. But things are still mildly under control—give or take a minor meltdown or two.

How did it all start?

The real chaos, however, kicked off with a casual chat at a PI meeting in San Jose. Alongside AI notables like Nicholas Carlini of Google DeepMind, Bo Li from UChicago, and UMD colleague Tom Goldstein, we delved into the critical topic of distinguishing AI-generated content from human creations and the role of watermarks in digital products.

Modern watermarks, especially those invisibly embedded within images by generative AI models, are designed with the best intentions. However, they are not without their pitfalls. A primary concern is the false sense of security they may impart. You might believe your watermark’s detection accuracy—largely measured by the recall rate of watermarked instances—is reliably high. But what happens when these images are slightly altered using readily available editing tools? Can watermarks withstand such modifications? Furthermore, ensuring the precision remains high is crucial to avoid false positives, where genuine, non-AI-generated images are mistakenly flagged as synthetic. My colleagues might contend that the damage from a false positive—incorrectly accusing someone of using generative AI—is far greater than that of a false negative, where an AI-generated image goes undetected. In a world rife with misinformation and deceptive practices—where criminals might wear fake fingers to manipulate evidence, creating the illusion that security camera footage was AI-generated and therefore fake—the task of discerning synthetic from real images becomes even more challenging.
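
To make the recall-versus-precision point concrete, here is a minimal sketch of how a detector’s accuracy might be scored. It is not the competition’s actual evaluation code, and `detect` stands in for a hypothetical watermark detector that returns True when it believes an image carries a watermark:

```python
def detection_metrics(watermarked_images, clean_images, detect):
    """Score a hypothetical watermark detector on recall and precision."""
    true_positives = sum(detect(img) for img in watermarked_images)   # watermarked images correctly flagged
    false_positives = sum(detect(img) for img in clean_images)        # genuine images wrongly flagged as synthetic

    recall = true_positives / len(watermarked_images)
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 1.0
    return recall, precision

# Recall can look excellent on pristine model outputs yet collapse once the same
# images pass through cropping, compression, or a filter app, while even a single
# false positive wrongly brands a human-made image as synthetic.
```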

That is not the end of the story. More critically, if a model claims not to have generated an image, can we ensure that no other model has produced it either? This is often why watermarks are criticized: it is nearly impossible to have universal, invisible watermarks (people may adopt their own schemes), and it is nearly impossible to universally detect watermarks given the diversity of watermark types, models, embedded messages, and so on. That same diversity, admittedly, makes the watermarks harder to compromise. Ideally, if regulations were enforced, every model—both open and closed-source—would incorporate a watermark of its own (putting aside the feasibility of such enforcement for open-source models), with detection APIs publicly available. However, with open APIs, the risk of malicious users reverse-engineering and removing these watermarks becomes significantly higher. Measures such as randomizing seeds or increasing the number of encryption bits could enhance security, yet the issue remains largely unresolved.

The situation may appear chaotic and daunting. Yet, what if we could initiate a step towards addressing these challenges? Many studies assert the fragility of watermarks and the success of attacks against them, typically presupposing knowledge of the watermark or the model that generates or detects them, with fixed and/or known keys. However, real-world scenarios often lack such transparency. What if we could organize a competition that simulates this black-box environment, challenging participants to erase the watermark while preserving the image’s semantic integrity and quality? 
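
To give a feel for what such a black-box attempt might look like, here is a toy sketch (my own illustration, not the competition’s demo or starter kit): it applies mild, off-the-shelf edits, namely JPEG re-compression plus faint noise, to a watermarked image while keeping it visually close to the original. The file names are hypothetical, whether this strips any particular watermark depends on the (unknown) scheme, and a real evaluation would also score perceptual quality.

```python
import io

import numpy as np
from PIL import Image


def naive_removal_attempt(path_in, path_out, jpeg_quality=60, noise_std=2.0):
    """Lightly edit a (hypothetically) watermarked image while keeping it recognizable."""
    img = Image.open(path_in).convert("RGB")

    # 1) Re-encode as JPEG: lossy compression disturbs the high-frequency details
    #    where many invisible watermarks tend to live.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    img = Image.open(buf).convert("RGB")

    # 2) Add faint Gaussian noise, small enough that the image still looks intact.
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, noise_std, size=arr.shape)
    Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8)).save(path_out)


# Hypothetical file names, for illustration only.
naive_removal_attempt("watermarked.png", "attacked.png")
```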

Inspired and perhaps a bit naïve about the workload, we decided to launch a competition at NeurIPS to tackle these challenges head-on. What started as a spirited discussion under the Californian twilight quickly spiraled into the kind of last-minute submission frenzy academics know all too well. Tom missed half of his own group meeting as we scrambled to refine our competition design just hours before the submission deadline. At that point, we thought our draft was nearly ready—little did we know the sheer amount of work this endeavor would demand. Unlike submitting a conference paper, where you’re essentially done post-submission except for some rebuttals and social media promotion, this was just the beginning.

After we submitted in April, we were thrilled to hear, in May, that our competition was accepted—Hooray! But then reality struck: who was actually going to do the work? Our lead student authors were either wrapped up in internships or deep into their own research. Some were even on the job market, scouting their next big opportunity post-graduation. We began to recruit more student interns, sought assistance from fellow organizers, and held numerous brainstorming sessions. Eventually, we launched our competition website, set up a Google Form for submissions, and even published a starter kit online. We were proud of our progress and confident about meeting the upcoming deadlines. Easy-peasy—or so we thought.

The summer was bustling. We were all immersed in our projects, and the NeurIPS competition seemed like just another side project—until Kaggle reached out. Known for its robust competition platform, Kaggle has both research and community tracks and works closely with organizers to tailor the competition experience. Their outreach and responsiveness were commendable, but setting things up as organizers proved to be a headache. Our point person, Mucong, faced numerous challenges with the platform’s user interface and with getting technical support, which Kaggle offers only for research competitions.

To Be, or Not To Be: choosing the right platform

Kaggle’s involvement presented an enticing opportunity: if we could secure over $50K in prize money and go with the research competition option, our competition would be featured prominently on their homepage. This visibility was crucial for a robust stress test of our watermark, and it was simply too tempting—it felt almost criminal not to seize the opportunity. However, turning this into a research competition required navigating a thicket of legal documents—a daunting task for someone more familiar with academic papers than contracts. As I liaised with legal teams from UMD and Kaggle, the complexities became overwhelming, and it was clear we might miss our planned launch date.

The financial gap was stark; initial sponsorship estimates were around $3K-$5K, far from the $50K needed for front-page status. Despite reaching out to numerous potential sponsors and even considering using foundation grants (a big no-no, our legal team advised), securing sufficient funds seemed impossible. We are grateful for the Department of Defense’s interest; however, the required bureaucratic processes unfortunately hindered timely funding.

Frustration mounted as the importance of the competition clashed with logistical realities. I considered personal sponsorship, and Tom kindly offered to share half the cost, but ultimately, UMD’s legal advisors rejected Kaggle’s contract terms. Reluctantly, we scaled back our ambitions for prominent Kaggle placement and shifted focus to making the best of a community competition.

Despite these setbacks, we persisted with Kaggle until the technical support limitations became insurmountable. Nightly stand-up meetings intended to be brief turned into lengthy strategy sessions, straining both personal time and team stamina. As the situation grew increasingly challenging and the launch deadline loomed, we reached a tipping point and opted to switch to Codabench—an open-source platform that allowed us greater control and customization. This move required rapid adaptation, assigning team members to front-end, back-end, and resource management roles just two weeks before going live. We still want to extend our sincere gratitude to the Kaggle team—thank you for all the help you’ve provided, which went beyond your obligations!

Continuing the nightly stand-up: the final push

Though more complex, this pivot to Codabench invigorated the team. We dove into coding and problem-solving with a clear goal: TO CREATE THE MOST ROBUST WATERMARK POSSIBLE. These days and nights of intense brainstorming were challenging yet strangely fulfilling, offering a stark contrast to the earlier logistical frustrations.

We successfully launched our competition on schedule, complete with a handy demo illustrating how to manipulate a watermarked image. Our team is diligently setting up an automated evaluation pipeline to enable a real-time leaderboard. We understand that participants are eager for immediate feedback, so we appreciate your patience. Currently, our organizers are juggling this project amidst the demanding ICLR submission cycle.

That sums it up. I’ve done quite a bit of reflecting, and yes, it’s been a mix of chaos and insights. However, organizing this competition has been profoundly educational. Most significantly, I’ve gained a deeper appreciation for the often-overlooked service work in academia. We typically celebrate intellectual achievements, yet organizing such events demands intellectual rigor that is every bit as challenging and crucial. It’s vital to recognize and value the efforts of those who step up to build platforms that foster the creation and exchange of ideas.

From the bottom of my heart, I extend my gratitude to all the organizers, with special thanks to Mucong Ding, Tahseen Rabbani, Bang An, Chenghao Deng, and Tom Goldstein. This endeavor has been a pivotal milestone in my career, enriching my experience in ways that will undoubtedly continue to resonate with me.

A Call for Sponsors!

As we advance our competition platforms, we are actively seeking partnerships with sponsors who share our commitment to integrity in AI. Personally, I am deeply concerned about the growing challenge of misinformation—the alarming prospect of not being able to distinguish synthesized content from reality motivates this competition. It’s a crucial effort to safeguard clarity and authenticity in our digital world. By supporting this initiative, you help soothe our collective soul and contribute to a clearer, more trustworthy future. Reach out to erasinginvisible@googlegroups.com; your support can make a significant difference.

Signing off (and finally getting some sleep),
Furong ‘Competition Conductor’ Huang