Why Not SOTA?

Archan Ghosh
4 min read · Apr 22, 2021

When it comes to deep learning, everyone building a model aims for SOTA-level performance. But is it right or wrong to go for a SOTA model every time?

Let us Start with,

What is SOTA?

A SOTA, or State Of The Art, model refers to a neural network model that aims to outperform every other existing model on a given task. Every year at top conferences we see hundreds of models being published, each competing for that SOTA title in its respective field for whatever time it has before a new model appears. But….. (this "but" will be answered later.)

Next, we come to,

Why SOTA?

Every year, at conferences like ICCV, ICML, NeurIPS, ECCV, IJCAI, ICLR, and several others, we see new models introduced with novel architectures, larger datasets, more resources, and, most importantly, greater complexity. Previously, a breakthrough in ML was judged mostly on how coherent and concrete its results were in a test environment. With deep learning, the paradigm has completely shifted. NVIDIA, Google, Facebook, and independently run labs like DeepMind and OpenAI have pushed the boundary of how far machine learning can go. With examples like AlphaGo and GPT, they have been breaking every limit that previously constrained how ML researchers looked at and traversed problems.

And Finally, we are at,

The most important question that lingers is:

DO YOU REALLY NEED SOTA?

Machine learning, and especially deep learning, promises to be such a revolutionary technology that some people term it the "4th Industrial Revolution."

Every company has implemented, or is trying to implement, machine learning in its services, platforms, or products. Yet most of these businesses do not use SOTA-level neural network models, for the following reasons:

  • Training Time
  • Computation Scaling/Power Needed
  • Deployability and Scalability
  • And, Functionality

But,

more often than not, I have observed that beginners try to implement SOTA architectures for projects that may not even require such an ordeal. This is not to say that SOTA architectures shouldn't be used, but one should understand that they are not the only solution out there.

What should be the main Focus?

Whenever you approach a problem, instead of looking for some state-of-the-art solution that already exists, the focus should be on how the problem can be made simple. Sometimes even linear regression can provide more promising results than a neural-network-based approach.
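As a minimal sketch of that point, consider a hypothetical task where the signal is (nearly) linear. The data below is synthetic and purely illustrative; the baseline is plain ordinary least squares, fit with NumPy in a fraction of a second, with no GPUs and no tuning:

```python
import numpy as np

# Hypothetical regression task: a mostly linear signal with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # 200 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])           # assumed "true" weights
y = X @ true_w + 0.1 * rng.normal(size=200)   # linear signal + noise

# Ordinary least squares: add a bias column and solve directly.
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

pred = Xb @ w
mse = float(np.mean((pred - y) ** 2))
print(f"learned weights: {w[:3].round(2)}, MSE: {mse:.4f}")
```

On a problem like this, a deep network would at best match the baseline while costing far more to train, tune, and deploy; the simple model is the right tool.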

What should you include in your design?

As mentioned earlier,

Training time is a key component, a bit like choosing between recursion and a loop. It matters because it determines how many kinds of models you can actually try out before you publish your work. Less time per model means more models, but more models do not always mean better results; it is a trade-off that can be balanced by how carefully you design the model.

Computation scaling and power contribute majorly to the success or failure of a model. With large amounts of data, a complex model can suffer both in training time and in performance. Building large models also has its own problems, such as carbon footprint: the energy consumed by multi-GPU training can be immense, and running those models again and again can sometimes be drastic.

Deployability and scalability should also stay in the minds of developers. Without proper deployability and scalability, a model lacks its ultimate purpose, which is to serve. Even if a model has a high performance score, it falls short if it cannot be deployed or used in some manner on a servable end like the web, mobile, or edge devices.
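One quick, illustrative way to sanity-check deployability is to measure the serialized footprint of a model's parameters before targeting a constrained platform. The two "models" below are hypothetical weight arrays, not from any real system; the point is only that parameter count translates directly into the megabytes a phone or edge device has to download and hold:

```python
import pickle
import numpy as np

# Hypothetical parameter sets: a small layer vs. a large one.
small_model = {"weights": np.zeros((128, 64), dtype=np.float32)}     # ~32 KB
large_model = {"weights": np.zeros((4096, 4096), dtype=np.float32)}  # ~64 MB

small_kb = len(pickle.dumps(small_model)) / 1024
large_kb = len(pickle.dumps(large_model)) / 1024
print(f"small: {small_kb:.0f} KB, large: {large_kb:.0f} KB")
```

A check like this, run early, can rule out architectures that will never fit on the serving end, long before any accuracy numbers are compared.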

Finally, functionality. How diversely can the model be modified after its deployment? Is it prone to attacks? How far can you stretch the model's limits? Can the model be used for something else? All of these should be tested and experimented upon; only then will you understand where you, the designer, and the model can practically help each other. The more time you spend fine-tuning a model, the better you will understand the depth of machine learning.

What Should you Do?

No one is stopping you from practicing or experimenting with SOTA models. But you should always keep in mind that SOTA is not the answer to every problem. The more you look around, the more hidden secrets and treasures you will find to guide you to better modeling. Like the "5 Whys", asking simple questions can help you in the long run: what should your approach be, what should you do with your data and how should you structure it, and how will your model understand it all?


Archan Ghosh

Machine Learning & Data Science Enthusiast | Learner by day, gamer by night and streamer by passion |