Authors: Jakub Podolak, Krzysztof Chartanowicz
Every executive and marketing specialist knows how challenging it is to choose the best ad when managing multiple advertising campaigns. When you showcase your product with a visual ad, you want to pick the one that delivers the highest possible performance and discard the ones that are not engaging enough.
One of the most popular performance metrics is the Conversion Rate (CR). It measures what fraction of ad views resulted in an action that matters for the business (for example, a sale). The more engaging the ad, the higher the chance that a client will click and buy our product, which also means a higher CR. This is why the visual side of the ad is so important and why CR depends heavily on the image.
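For example, an ad that is shown 10,000 times and leads to 50 purchases has a CR of 50 / 10,000 = 0.5%.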
However, seasonal sales, language variants, and tuning ads for specific markets can easily multiply the number of images and overwhelm media specialists. At some point, it is virtually impossible for a person or a team to know all the past trends or what the expected CR of a given ad creative would be.
In this article, we would like to present how we can rate the performance of ads, based solely on the ad creative, using Robust Neural Networks. This could be part of a broader algorithm that also takes other data like text and metadata into account, but it works just as well as a standalone component that can identify the more clickable features in an image. With the help of our solution, you can prepare better ad graphics or choose which ads are worth investing in and which ones should be improved.
Everything Comes Down to Your Dataset
Our client manages about 150k different ad creatives in multiple ad campaigns covering several categories like weight supplements, cosmetics, hair conditioners, etc. Our goal was to develop an automated model that predicts CR for newly created ads and rates their performance. The module focused only on the image component of an ad, regardless of other metadata, text, product type, etc.
The biggest challenge we faced in that task was related to data quality. In our client’s dataset, we found large groups of similar pictures/ads that differed only in the language used or their aspect ratios (Fig. 1). Some groups consisted of hundreds of such close matches.
Because the model focused only on the image and its influence on CR, it should give similar predictions for similar images. This would give us confidence that its predictions for new ads are indeed based on meaningful, interpretable visual features.
Neural Networks to the rescue
One of the desired properties of machine learning models, specifically neural networks, is that input data samples that are similar to each other should have similar outputs. This property is called robustness: even if we introduce small variations in the input data, the results and performance remain stable.
There are several approaches to achieving this goal, such as modifying the training procedure or incorporating adversarial examples into the training dataset. The authors of such approaches claim that one of the benefits is improved model performance thanks to learning better feature representations. You can read more about robustness here.
In our experiments, we used a ResNet18 classifier pre-trained on the ImageNet dataset in a robust manner. This model takes an image as input and classifies it into one of many classes, for example “dog” or “car” (Fig. 2).
The authors of robust convolutional neural networks claim that such models are able to distinguish significant features in an image (by this they mean features that are obvious to a human analyzing the image).
Creating a model to predict CR
Our first approach to predicting the CR of an ad creative was to take historical CR data for each image and use Transfer Learning to adapt the network to our task. We started with a ResNet18 already trained for classification, removed its last classification-specific layer, and replaced it with a regression block able to predict continuous CR values. We then trained this new network using images as input and historical CR values as targets.
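As an illustration, here is a minimal PyTorch sketch of this setup. It assumes the robust ResNet18 weights are loaded separately (checkpoint loading is omitted), and the regression block size and training loop are simplified rather than our exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a ResNet18 backbone (in our case, the robustly trained ImageNet model;
# loading of the robust checkpoint is omitted here).
backbone = models.resnet18(weights=None)

# Replace the 1000-class classification head with a small regression block
# that outputs a single continuous CR value.
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(images, historical_cr):
    """One optimization step: predicted CR vs. historical CR for a batch of ads."""
    optimizer.zero_grad()
    predictions = backbone(images).squeeze(1)
    loss = criterion(predictions, historical_cr)
    loss.backward()
    optimizer.step()
    return loss.item()
```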
This approach is appealing because the pre-trained convolutional layers of the network still perform feature extraction, while the new regression layer predicts CR. Unfortunately, the results were not satisfactory: the predicted CR values were not diverse enough and were concentrated around the mean of the historical CR.
We could still utilize the pre-trained convolutional layers, though, since we believed they actually extracted significant features from an image – it was the last regression block that was the issue. We asked ourselves: what would happen if we removed it and took the “raw” features from the neural network? With such a representation, we could estimate the CR of an ad by how close its representation is to the representations of other ads.
It turned out that if we pass an ad image through the network and retrieve the penultimate layer’s activations, we get a nice 512-dimensional feature representation of the image. Because of the robust nature of this network, similar ads should get similar feature representations.
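A minimal sketch of this extraction step, again assuming the robust weights are loaded into a standard torchvision ResNet18 (checkpoint loading omitted):

```python
import torch
import torch.nn as nn
from torchvision import models

def build_feature_extractor():
    """ResNet18 with the classification head removed: the forward pass now
    returns the 512-dimensional penultimate-layer representation."""
    model = models.resnet18(weights=None)  # load the robust checkpoint here
    model.fc = nn.Identity()               # drop the final classification layer
    model.eval()
    return model

@torch.no_grad()
def extract_features(model, images):
    """images: (N, 3, 224, 224) tensor of preprocessed ads -> (N, 512) features."""
    return model(images)
```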
We can verify the results of feature extraction by calculating the Euclidean distance between feature vectors of similar and different ads. Fortunately, similar images have smaller distances, while different ones have larger distances.
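Continuing the sketch above, with two language variants of the same creative and one unrelated creative (hypothetical, preprocessed image tensors), the check boils down to:

```python
model = build_feature_extractor()
features = extract_features(
    model, torch.stack([ad_variant_a, ad_variant_b, ad_unrelated])
)

dist_similar = torch.dist(features[0], features[1])    # expected to be small
dist_different = torch.dist(features[0], features[2])  # expected to be much larger
```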
As a next step, we wanted to return a prediction for an ad: the CR of similar ad graphics. We can implement this by clustering the ads’ feature vectors with K-Means and calculating the mean CR in each cluster. We determine the number of clusters using the “elbow method” and the Silhouette Score, which tells us how good a particular clustering is. For the weight supplements group (the most important one) we obtained 4 distinct clusters with visual similarities within each cluster and different mean CRs between them.
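A simplified scikit-learn version of this step could look as follows; `features` (N × 512) and `historical_cr` (length N) are assumed to be NumPy arrays built from the extraction step above, and the cluster count is picked by inspecting inertia (elbow) and the Silhouette Score.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Scan candidate cluster counts to pick one via the elbow method / silhouette.
for k in range(2, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    print(k, km.inertia_, silhouette_score(features, km.labels_))

# Final clustering (4 clusters for the weight supplements group).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)

# Mean historical CR per cluster becomes the prediction for that cluster.
cluster_cr = {
    c: float(historical_cr[kmeans.labels_ == c].mean())
    for c in range(kmeans.n_clusters)
}
```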
Now we can predict the CR of a new ad by assigning its image to the closest existing cluster and assuming it will perform similarly. This approach has its limitations – when we receive a novel ad graphic with a completely different feature set, we may not have a good prediction for it. That’s why it is important to regularly recalculate the clusters, updating them and creating new ones for new ad types.
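Prediction for a new ad then reduces to a cluster lookup (continuing the sketch above):

```python
def predict_cr(new_ad_features):
    """new_ad_features: (1, 512) representation of a new ad creative."""
    cluster = kmeans.predict(new_ad_features)[0]
    return cluster_cr[cluster]
```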
Validating Business Value
We conducted an offline experiment on 900 ad campaigns, simulating what would happen if we predicted CR with our cluster-similarity method. We used it to find the 30% worst-performing ads and checked what would happen if each of them was replaced with a better version recommended by our approach. Because this is a simulation based on historical data, we were able to check the actual distribution of clicks and conversions over these ads and see whether changing them would be profitable or not.
For example, let’s suppose we have 3 ads from the same ad campaign (meaning the same product, market, and time period). We can predict the CR of these ads (for example: 0.1, 0.4, 0.2) and we also know how many clicks each one got in real life. Then we check what would happen if the worst one was blocked and received 0 clicks – the second and third ones would receive more clicks, because we would invest in their ad space instead. Since we know the real CR of these ads, e.g. 0.2, 0.25, and 0.3, we can calculate the number of conversions and the profit in this alternate reality. After processing one campaign in this way, we update the mean CR of our clusters, so the next campaign can use more recent data.
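To make the arithmetic concrete, here is a rough sketch of that single-campaign step. The click counts below are made up for illustration, the even redistribution of blocked clicks is a simplifying assumption rather than our exact procedure, and the cluster-mean update between campaigns is omitted.

```python
def simulate_campaign(predicted_cr, clicks, real_cr, block_fraction=1/3):
    """Block the ads with the lowest predicted CR, move their clicks to the
    remaining ads, and compare conversions before and after."""
    n_block = max(1, int(len(predicted_cr) * block_fraction))
    order = sorted(range(len(predicted_cr)), key=lambda i: predicted_cr[i])
    blocked, kept = set(order[:n_block]), order[n_block:]

    # Conversions with the original click distribution.
    baseline = sum(clicks[i] * real_cr[i] for i in range(len(clicks)))

    # Redistribute the freed clicks evenly over the remaining ads (a simplification).
    extra = sum(clicks[i] for i in blocked) / len(kept)
    simulated = sum((clicks[i] + extra) * real_cr[i] for i in kept)
    return baseline, simulated

# The example from the text: the first ad has the lowest predicted CR and is blocked.
print(simulate_campaign([0.1, 0.4, 0.2], [100, 100, 100], [0.2, 0.25, 0.3]))
# -> (75.0, 82.5): more conversions from the same total traffic
```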
This approach resulted in a 3.5% increase in the number of conversions and an 11.5% increase in total profit. The profit increase comes from the fact that transferring traffic between ads does not increase their cost; it only improves their effectiveness.
Conclusions
In this article, we presented our algorithm for predicting the conversion rate of ad creatives from images alone. With this algorithm, we can increase profit by transferring traffic from less successful to more successful ad creatives, based on the prediction of which creative is going to perform better.
To ensure that the model learns from image data only, we used robust networks, which are also well suited to noisy datasets. This solution can be part of a bigger and more complex ad management system, like the one we developed for our client at MIM Solutions, or work as a standalone automated tool that helps you gain even more profit from ads.