Big seeds and viral marketing

Having a product or campaign go viral is no small feat. As someone who has made a number of software tools that no one else on Earth has used (whether I've published the code to the world or not!), I can promise you that non-virality is a common state for a product to be in.

This lack of virality goes beyond my own failed efforts at helping the world with my code, though, and extends to most products and campaigns. While there are numerous pieces written about how to create viral content, the simple truth is that not every ad gets watched a billion times on YouTube (and please don't click all those links; I was just trying to make a point).

To be sure, there are some similarities between pieces of content that do go viral. They are fun or funny, and they resonate with their audiences. Some people have even created multiple campaigns that have all gone viral.

However, even if you follow all of the advice that you find on how to create viral content, you still aren't guaranteed to make a viral hit every time (life's unfair, I know). We know this scientifically from the great work that Bakshy, Hofman, Mason, and Watts did quantifying influence on Twitter. In this work there are two main takeaways:

  • Users that had started cascades in the past were likely to start another cascade (Unsurprising)
  • Features of the content do not improve predictions (Surprising)

So let's tackle these two points.

Some users are better at starting cascades than others

This is the part where you say "HA! THERE IS A KEY TO GOING VIRAL! YOU LIED TO ME!" and I say "Hold on, let's explain some boring stats".

In the paper, they construct a model to predict the influence that a single user will have. They feed into this model the following features for each user:

  • Number of followers
  • Number of friends
  • Number of tweets
  • Date of joining
  • Past total influence
  • Past local influence

where local influence is the number of reposts by immediate followers and total influence refers to the total cascade size. Once that model is fit to the data there are only two features that end up mattering:

  • Number of followers
  • Past local influence

All this points to the fact that paying someone who has a large number of followers, and whose immediate followers frequently repost their content, is a solid strategy to go viral. And it is.

But that model (which is the best fitting one!) only explains 31% of the variation in the data. That means that 69% of the variance in the data is unexplained by the model. It also means that if you try to predict how big of a cascade your piece of content will get, you will have a huge error associated with the prediction.
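To make the "explains 31% of the variation" idea concrete, here is a minimal sketch of how the usual R² (coefficient of determination) is computed. The cascade sizes and predictions below are numbers I made up for illustration; they are not from the paper, and the paper's own fitting procedure may differ from this plain calculation.

```python
def r_squared(actual, predicted):
    """Fraction of the variance in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    # Total variation: how far the data sits from its own mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Leftover variation: how far the data sits from the predictions.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical cascade sizes and hypothetical model predictions.
actual_sizes = [1, 2, 2, 3, 10, 1, 4, 20]
predicted_sizes = [3, 3, 4, 4, 7, 3, 5, 9]

r2 = r_squared(actual_sizes, predicted_sizes)
print(f"explained share: {r2:.2f}, unexplained share: {1 - r2:.2f}")
```

An R² of 0.31 means the "unexplained share" printed here would be 0.69, which is why any single prediction comes with such a wide error.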

So how does content not matter???

The authors had a sample of the content labelled by Mechanical Turkers (a cheap way to have humans do simple tasks on Amazon). The turkers rated the content on a few different categories, specifically these ones:

  • Rated interestingness
  • Perceived interestingness to an average person
  • Rated positive feeling
  • Willingness to share
  • Type of URL (Social Media, Blog, Other, News)
  • Category of content (Sports, Business, Gaming, etc.)

When these different categories were analyzed, there were differences in the average cascade size across the different types of URLs and categories of content (see Figures 8 and 9 of the paper). However, when these features were added to the existing model, none of the content features improved the predictive ability of the model. That point is where we derive the sensational headline that content does not matter.
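It can feel contradictory that averages differ by content type while content adds nothing to the predictions. Here is a toy sketch of one way that can happen. Everything in it is my own assumption: the synthetic data, the `big` and `news` flags, and the "predict the group mean" stand-in for a model. It is not the paper's data or method.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the toy data is reproducible


def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


def make_tweet():
    """One synthetic tweet. Every number here is invented for illustration."""
    big = random.random() < 0.3  # poster has many followers
    # News links are more common among big accounts but have no effect
    # of their own on the cascade size.
    news = random.random() < (0.7 if big else 0.2)
    size = (10 if big else 2) + random.uniform(0, 12)
    return {"big": big, "news": news, "size": size}


def fit_group_means(rows, key):
    """Stand-in 'model': predict the training mean of the row's group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["size"])
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    return lambda row: means[key(row)]


def mean_size(rows):
    return sum(r["size"] for r in rows) / len(rows)


train = [make_tweet() for _ in range(2000)]
held_out = [make_tweet() for _ in range(2000)]

# Step 1: raw averages differ by content type.
news_gap = mean_size([r for r in held_out if r["news"]]) - mean_size(
    [r for r in held_out if not r["news"]]
)

# Step 2: does content type help once the follower flag is already known?
base_model = fit_group_means(train, key=lambda r: r["big"])
full_model = fit_group_means(train, key=lambda r: (r["big"], r["news"]))

actual = [r["size"] for r in held_out]
r2_base = r_squared(actual, [base_model(r) for r in held_out])
r2_full = r_squared(actual, [full_model(r) for r in held_out])

print(f"news vs. non-news average gap: {news_gap:.2f}")
print(f"R^2 with followers only:       {r2_base:.3f}")
print(f"R^2 with followers + content:  {r2_full:.3f}")
```

In this toy setup the news tweets do have larger cascades on average, but only because big accounts post more news. Once the follower flag is in the model, the content flag has nothing new to say, so the held-out R² barely moves.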

Oh geez, what do these points mean?

These points make sense if we think about all of the data at our disposal. If you tweet, you know that some of your tweets are positive and engage your audience, but very few of them are retweeted broadly. Even if you've had successes before, most of the content that you author doesn't go viral either. However, if you have created a cascade before, I'd much rather have you tweet about something than an egg (a Twitter egg is someone who hasn't even added a profile picture to their account).

And that is the really important point that this paper drove home. Most of the articles that you've read about creating viral content really only examined observed successes. The reality is that there are hundreds of thousands more pieces of content that have all of the same features as those successes but went nowhere!

So how can we combat this? (Back to the post name)

Watts mentions this insight in the manuscript, but I feel like the HBR article is a more accessible way to read about it. The key to running a successful 'viral' (note the quotes!) campaign is not to plan on going viral, but to try to achieve as much reach as possible.

Typically when creating advertising content you would plan on just creating an ad that you show to specific people. But if you make that ad worth sharing (either through an explicit call to action to your viewers or through creating shareable content, like a gif of a cat using your product) then you will get your viewers to share your content for you. This sharing is what leads to increased reach for your content, even if it doesn't go viral.
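Some back-of-the-envelope arithmetic shows why sharing helps even when nothing goes viral. This is a minimal sketch, assuming a simple branching model that I'm supplying for illustration: each person who sees the content passes it on to `reproduction_rate` new people on average, and that rate stays below 1 (so no runaway cascade). The function name and the numbers are hypothetical.

```python
def expected_reach(seed_size, reproduction_rate, generations=None):
    """Expected total audience when each viewer recruits
    `reproduction_rate` new viewers on average.

    With a rate below 1, the generations form a geometric series:
    seed * (1 + r + r^2 + ...) = seed / (1 - r).
    """
    if not 0 <= reproduction_rate < 1:
        raise ValueError("this sketch only covers rates in [0, 1)")
    if generations is None:
        # Closed form for the infinite series.
        return seed_size / (1 - reproduction_rate)
    # Finite number of sharing rounds after the initial seed.
    return seed_size * sum(reproduction_rate ** g for g in range(generations + 1))


# Hypothetical campaign: the ad is shown to 10,000 people up front.
for rate in (0.0, 0.2, 0.5, 0.8):
    total = expected_reach(10_000, rate)
    print(f"rate {rate:.1f}: about {total:,.0f} people reached")
```

Under these assumptions, a rate of 0.5 (far from viral) still doubles the audience you paid for, and the gain scales with the size of the seed. That is the case for starting big.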

What do you take away from this?

  • Don't build your campaign around the expectation that it will go viral. That's a fool's errand.
  • Do use viral features to increase your reach beyond your built-in audience. Hopefully this will lead to more subscriptions and a bigger audience for you to seed in future rounds (i.e. ever-expanding reach until saturation).
  • Don't spend large amounts of money to seed a single user (i.e. paying for a promotional tweet from a celebrity) unless your acquisition costs are extravagant and that user is relevant to your product sector.
  • Do spend that same budget to seed hundreds or thousands of users. Expect most of those tweets to not travel very far, but at least you've gotten some decent reach for your budget.