Unsupervised Learning: A Deep Dive

Article contributed by:

Mario Favaits
Executive Education Fellow

Explore unsupervised learning with Mario Favaits, an executive education fellow at ACE.

When people talk about AI and machine learning, they usually mean supervised learning and, more recently, generative AI. In supervised learning, we give the machine a set of labelled examples to learn from. Given enough examples during a process that we call training, the machine is able to predict an outcome or label (Diabetic or Non-Diabetic) from a set of inputs or features, such as Height, Weight and BMI.

The majority of datasets in this world, however, are unlabelled. In this scenario, we would be given only a very large list of individuals with their Height, Weight and BMI. To train a model that can predict whether somebody is Diabetic, we would first need to add a label column to the dataset. For large datasets this can be very costly, because the labelling is done by humans.

Unsupervised learning is not constrained by predetermined outcomes or labels. It ventures into uncharted data, discovering hidden structures and patterns that would otherwise go unnoticed. Given a large list of people with their Height and Weight, a T-shirt manufacturer can rely on a clustering algorithm to find groups in the data that correspond to T-shirt sizes: XS, S, M, L, and XL.
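The T-shirt example can be sketched in a few lines of Python. What follows is a minimal illustration, not a production recipe: it uses k-means, one common clustering algorithm, and the measurements are synthetic values generated purely for this sketch. The names `kmeans`, `size_for` and `group_centres` are hypothetical helpers introduced here.

```python
import random

SIZES = ["XS", "S", "M", "L", "XL"]


def squared_distance(p, q):
    """Squared Euclidean distance between two (height_cm, weight_kg) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=50):
    """Plain k-means (Lloyd's algorithm) on (height_cm, weight_kg) tuples."""
    # Deterministic start: pick k points spread evenly over the data sorted by height.
    ordered = sorted(points)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters


# Synthetic, unlabelled measurements (assumption: five loose body-size groups).
random.seed(0)
group_centres = [(155, 50), (165, 60), (175, 72), (185, 85), (195, 100)]
people = [(random.gauss(h, 2), random.gauss(w, 3))
          for h, w in group_centres for _ in range(50)]

centroids, clusters = kmeans(people, k=len(SIZES))
# Order the discovered groups from smallest to largest so they line up with SIZES.
centroids = sorted(centroids)


def size_for(height_cm, weight_kg):
    """Suggest the size whose cluster centre is closest to the given person."""
    point = (height_cm, weight_kg)
    nearest = min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))
    return SIZES[nearest]


print(size_for(158, 52), size_for(176, 74), size_for(193, 98))
```

Note that the algorithm never sees a size label. It only finds five groups, and mapping them to XS through XL is an interpretation added afterwards. With real data one would normally also rescale Height and Weight so that neither feature dominates the distance.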

Generative AI is unsupervised at its core: its models learn from unlabelled data, typically by predicting parts of the data from the rest. It has emerged as a creative force, conjuring new instances of data that mirror the complexity and beauty of its training data. It crafts worlds of synthetic artistry and generates text, music, images and video so realistic that they blur the line between the created and the actual.

Recommender engines can be seen as an application of semi-supervised learning, as we can rely only on a very small labelled dataset, such as the movie ratings given by an individual. The algorithm is nevertheless able to predict how a specific user will rate a specific movie. This technology drives personalised experiences on platforms like e-commerce sites, streaming services, and social media, enhancing user engagement and satisfaction.
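To make the rating prediction concrete, here is a small sketch of user-based collaborative filtering, one of several techniques a recommender engine might use. The users, movies and ratings are invented for illustration, and `cosine_similarity` and `predict_rating` are hypothetical helper names, not part of any particular library.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; a missing movie means "not rated yet".
ratings = {
    "ana":   {"Alien": 5, "Blade Runner": 4, "Notting Hill": 1},
    "ben":   {"Alien": 4, "Blade Runner": 5, "Notting Hill": 2, "The Matrix": 5},
    "chloe": {"Alien": 1, "Blade Runner": 2, "Notting Hill": 5, "The Matrix": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users, computed over the movies both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        weight = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[movie]
        total_weight += weight
    # None signals that nobody comparable has rated the movie.
    return weighted_sum / total_weight if total_weight else None


print(round(predict_rating("ana", "The Matrix"), 2))
```

Because ana's tastes resemble ben's far more than chloe's, the prediction lands closer to ben's rating than to chloe's. Production systems typically work with millions of sparse ratings and use more elaborate models, but the principle of filling in unknown ratings from a few known ones is the same.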

Identifying outliers is another widely used application. An outlier is a data point that deviates from the norm, and spotting such points is crucial for fraud detection in finance, identifying faulty equipment in manufacturing, and monitoring network security for unusual activities. Reinforcement learning, which is usually treated as a paradigm of its own, also learns without a label for every example: the model learns from reward signals instead. OpenAI has used this technology to make ChatGPT’s output more human-like, ethical and safe.
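The idea of a data point that "deviates from the norm" can be sketched without any labels. The example below uses the modified z-score, a simple robust statistic based on the median and the median absolute deviation (MAD), with the commonly cited cut-off of 3.5. The transaction amounts are made up for illustration, and `find_outliers` is a hypothetical helper name.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    centre = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # at least half the values are identical; the score is undefined
    return [v for v in values if 0.6745 * abs(v - centre) / mad > threshold]


# Hypothetical card transactions in euros; one of them looks suspicious.
amounts = [42, 38, 45, 40, 39, 41, 44, 43, 37, 950]
print(find_outliers(amounts))
```

The median is used instead of the mean because a single extreme value inflates both the mean and the standard deviation, which can hide the very point we are trying to find. Real fraud or equipment monitoring usually combines many features and more sophisticated detectors.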

These applications underscore the versatility and efficiency of unsupervised learning: it harnesses the vast amounts of unlabelled data available and so reduces the reliance on costly annotation by humans.

This makes unsupervised learning an indispensable tool in the advancement of AI technologies, offering scalable solutions across diverse domains.