Primer

Understanding Generative AI

Generative AI is nothing new in computer science circles; however, 2022 saw the proliferation of high-quality tools designed for mass audiences. Meet the next pivotal technology.

OpenAI has been working in their corner of Silicon Valley pushing the boundaries of artificial intelligence through the application of machine learning to generative tools. Where does generative AI have its roots, what should we expect as this wave of new tools and models is released, and what impacts can we expect on industries in the future?

Pattern Recognition

Researchers have been pushing the boundaries for years, looking for ways the computer could understand the world. For a long time, computers were just calculators - complex machines that carried out rather straightforward calculations. As the calculations became more complex, the applications became more numerous. A turning point came when the computer was not only a way to carry out repeated tasks but also a device into which data could be fed. Those tasks, or patterns, were nothing more than an ordinary set of instructions and assets; however, the number of ways data was collected - via sensors, databases, the internet, and manual entry - opened a new world of interacting with data streams. Gone were the doorman and the price lookup from a book, replaced by motion sensors and barcode readers.

This exploded in the 1970s and 1980s as data streams from a wide variety of sources introduced new digital processing techniques and algorithms. Standards like barcodes and QR codes represent the linear, top-down, designed-and-specified systems that computers relied on for decades, but the demand for more complex systems grew as the computer took on a more generative role. Image editing techniques, such as those employed in 3D modeling and photo manipulation, pushed the boundaries of introducing patterns on the fly and made the process more dynamic, even automatic. Photoshop, for example, has long had the ability to replicate and blend arrays of pixels to create images not renderable by hand or with a camera. This mathematical detection and replication of patterns is what starts to set the course to where we are today.
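As a rough illustration of that replicate-and-blend idea, the sketch below mixes two small arrays of pixel values; the arrays and weights are invented for the example and are not drawn from any particular editing tool.

```python
# A minimal sketch of blending two patches of pixels, in the spirit of the
# clone/blend tools described above. The pixel values and weights are invented.
import numpy as np

patch_a = np.array([[10, 20], [30, 40]], dtype=float)      # intensities from one region
patch_b = np.array([[200, 180], [160, 140]], dtype=float)  # intensities from another

# A weighted blend produces pixels that never existed in either source region.
blended = 0.7 * patch_a + 0.3 * patch_b
print(blended)  # [[67. 68.] [69. 70.]]
```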

Text AI

The first generative AI tools aren't only focused on text.

Learning Models

These same techniques found their rise in rendering and were pushed further through the development of machine learning algorithms that evaluated and discovered patterns as part of reviewing images. While the algorithms advance and become more capable at detecting patterns, the true value over the last 10 years has been associating metadata with those patterns and "learning" about them. Consider when Google's reCAPTCHA asks you if you are a robot and tests your abilities by asking you to click on the traffic lights or buses in an image grid. This is where we, the users, take on a new role in introducing computers to a basis of understanding of our world. It's not about the patterns as much as it is what those are patterns of.
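To make that concrete, here is a small, hypothetical sketch of how clicks from an image-grid challenge could be pooled into labels. The data and the majority-vote rule are invented for illustration and are not a description of how reCAPTCHA actually works.

```python
# Hedged sketch: aggregating crowd clicks on grid tiles into labels.
from collections import Counter

# Each entry: which tiles a user clicked as "traffic light" in a 3x3 grid.
user_clicks = [
    {0, 1, 4},
    {0, 4},
    {0, 1, 4, 8},
]

votes = Counter(tile for clicks in user_clicks for tile in clicks)

# Tiles selected by a majority of users get the label; the rest stay unlabeled.
threshold = len(user_clicks) / 2
labels = {tile: "traffic light" for tile, count in votes.items() if count > threshold}
print(labels)  # {0: 'traffic light', 1: 'traffic light', 4: 'traffic light'}
```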

What makes the computer so good at this is actually mimicking what makes humans so good at it. We know a dog when we see one because of the arrangement of light and shadows that reflect off the dog and reach our eyes. We think it is a dog because of its physical characteristics (floppy ears, shape of the nose and eyes, etc.), but that can deceive us as we try to understand the patterns that actually separate dogs from other similarly sized or shaped animals. It may have nothing to do with the shape and everything to do with its size or behavior. When I look out the window at night and see only a light bouncing along the dark street around 6pm, I know it is a dog wearing a light on its collar, but not because of the dog itself. The bounce of the light, the characteristics of its movement, and past sightings of dog walkers at this time of day all work together to convince me it is, in fact, a dog. Similarly, this is how we know that a Great Dane and a Chihuahua are both dogs even though they vary in color, markings, shape, and facial features, among many other characteristics. They vary more than other pairings of distinct species of animals. My ability to identify a dog is the culmination of observations, teachings, instructions, and other patterns that contribute to my mental model for identifying objects and behaviors.

Now that we know how to identify a dog, the next big step becomes knowing that the word "dog" matches the patterns we know as identifying dogs. This is a bit of metadata that computers don't know without explicit instruction.

Google's reCAPTCHA does more than test if you are human. The challenges serve two purposes - one is to see if your behavior matches that of an automated system or robot, and the other is to continue to train computers and help them establish a training model of their own. If I handed you a stack of cards with pictures and names of birds, and another stack with pictures and names of dogs, you could review them and, when tested later, easily identify whether an image was a dog or a bird. Think of these stacks of cards as "training models". The names of the images are labels and provide the missing piece as computers "learn" about the world and the patterns they find and evaluate. Without these labels, or stacks of cards with images and names for each, the computer doesn't know what the images are. Generating models can be time consuming and tedious, but in our age of big data, there are ways to speed this process up.
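As a toy sketch of the stacks-of-cards idea, the snippet below pairs tiny feature vectors with labels and classifies a new observation by its nearest labeled "card". The features, numbers, and labels are all invented for illustration; real systems learn from far richer data.

```python
# A minimal sketch of labeled "cards" acting as a training model.
from math import dist

# Each card pairs a tiny feature vector (ear_length_cm, body_mass_kg)
# with a label naming what the pattern is a pattern *of*.
training_cards = [
    ((12.0, 30.0), "dog"),   # large, floppy-eared dog
    ((9.0, 4.0), "dog"),     # small dog
    ((2.0, 0.5), "bird"),
    ((3.0, 1.2), "bird"),
]

def identify(features):
    """Label a new observation by its closest labeled card (1-nearest neighbor)."""
    _, label = min(training_cards, key=lambda card: dist(card[0], features))
    return label

print(identify((10.0, 25.0)))  # -> "dog"
print(identify((2.5, 0.8)))    # -> "bird"
```

Without the labels on those cards, the same code could still group similar observations together, but it would have no way to say what any group actually is.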

The biggest shortcut is offering free software for the purpose of training models, and the big tech companies have been harnessing this for a while as they develop their own learning models. The early days of Facebook's and Apple's photo albums allowed users to tag friends as a way to identify people in images. They made software to find patterns in these tagged images that matched faces. This pattern recognition was a process of looking at an image and separating the pixels that were a face from the pixels that weren't. As more images were tagged, the algorithms became better trained at finding faces in images that were not tagged. Companies and researchers went a step further and started to match objects by using information in the descriptions and text associated with the uploaded image. Instead of comparing pixels against each other in a single image, they started to compare them across many, many images. This represented the improved ability of algorithms to find known objects without being specifically told where they are in a photo. Modern sharing platforms have expanded the metadata, or labeling, process to include location, coordinates, and other collections of information that increase the understanding of the content of an image simply by finding patterns in matching images.
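The sketch below gestures at that cross-image idea: pixel patterns from many tagged crops are pooled into a single template, and a new crop is scored against it. The arrays, crop size, and correlation measure are simplified assumptions for illustration, not the actual techniques these platforms used.

```python
# Hedged sketch: pooling many tagged "face" crops into a template,
# then scoring a new crop against the shared pattern.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for 8x8 grayscale crops that users tagged as faces.
tagged_faces = rng.random((50, 8, 8))

# Averaging across many tagged examples surfaces the pattern they share.
face_template = tagged_faces.mean(axis=0)

def face_score(crop: np.ndarray) -> float:
    """Higher when a crop's pixels correlate with the pooled face pattern."""
    return float(np.corrcoef(crop.ravel(), face_template.ravel())[0, 1])

new_crop = rng.random((8, 8))
print(face_score(new_crop))
```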

Pattern Finding + Learning Models

Machine learning algorithms are amazing at mimicking this same process. When yo

Photo Credit and Caption: Underwater image of fish in Moofushi Kandu, Maldives, by Bruno de Giusti (via Wikimedia Commons)

Cite this page:

Wittmeyer, S. (2023, January 16). Understanding Generative AI. Retrieved from https://seanwittmeyer.com/article/generative-ai-primer

Understanding Generative AI was updated January 16th, 2023.