If you’ve been paying close attention to YouTube lately, you may have noticed the growing trend of so-called “faceless YouTube channels” that never feature a visible human speaking in the video frame. While some of these channels are simply run by camera-shy people, many more are fully automated, using AI-powered tools to generate everything from the scripts and voiceovers to the imagery and music. Unsurprisingly, this is often sold as a way to make a quick buck off the YouTube algorithm with minimal human effort.
It isn’t hard to find YouTubers complaining about a flood of these faceless channels stealing their embedded transcript files and running them through AI summarizers to generate their own instant knock-offs. But one YouTuber is trying to fight back, seeding her transcripts with junk data that’s invisible to humans but toxic to any AI that dares to try to work from a poached transcript file.
The power of the .ass
YouTuber F4mi, who creates some excellent deep dives on obscure technology, recently detailed her efforts “to poison any AI summarizers that were trying to steal my content to make slop.” The key to F4mi’s technique is the .ass subtitle format, created decades ago as part of the fansubbing software Advanced SubStation Alpha. Unlike simpler and more popular subtitle formats, .ass supports fancy features like fonts, colors, positioning, bold, italic, underline, and more.
It’s these fancy features that let F4mi hide AI-confounding garbage in her YouTube transcripts without affecting the subtitle experience for her human viewers. For each chunk of actual text in her subtitle file, she also inserted “two chunks of text out of bounds using the positioning feature of the .ass format, with their size and transparency set to zero so they are completely invisible.”
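To make the mechanism concrete, here is a minimal Python sketch of how this kind of poisoned .ass file could be generated. This is an illustration of the general technique, not F4mi's actual script: the function name, the chosen off-screen coordinates, and the sample text are all hypothetical, but the `\pos`, `\fs`, and `\alpha` override tags are standard .ass styling codes.

```python
def make_poisoned_ass(events):
    """Build an .ass subtitle script where each visible line is paired
    with invisible decoy lines (hypothetical sketch of the technique)."""
    header = (
        "[Script Info]\n"
        "ScriptType: v4.00+\n"
        "PlayResX: 1280\n"
        "PlayResY: 720\n"
        "\n"
        "[Events]\n"
        "Format: Layer, Start, End, Style, Name, "
        "MarginL, MarginR, MarginV, Effect, Text\n"
    )
    # Standard .ass override tags: \pos moves the line far off-screen,
    # \fs0 sets the font size to zero, and \alpha&HFF& makes it fully
    # transparent -- so players render nothing visible.
    invisible = r"{\pos(-2000,-2000)\fs0\alpha&HFF&}"
    lines = []
    for start, end, real_text, decoys in events:
        prefix = f"Dialogue: 0,{start},{end},Default,,0,0,0,,"
        lines.append(prefix + real_text)               # what viewers see
        for decoy in decoys:
            lines.append(prefix + invisible + decoy)   # what scrapers ingest
    return header + "\n".join(lines) + "\n"
```

A scraper that strips the styling tags and keeps only the raw text would then see the decoy sentences interleaved with (and outnumbering) the real dialogue:

```python
script = make_poisoned_ass([
    ("0:00:01.00", "0:00:04.00", "Real subtitle line.",
     ["Decoy sentence one.", "Decoy sentence two."]),
])
```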
In those “invisible” subtitle boxes, F4mi added text from public domain works (with certain words replaced with synonyms to avoid detection) or her own LLM-generated scripts full of completely made-up facts. When those transcript files were fed into popular AI summarizer sites, the junk text ended up overwhelming the actual content, producing a completely unrelated script that would be useless to any faceless channel trying to exploit it.