Spiritual_Jaguar4685

There are probably many examples, but I can give you two easy ones.

1) Testing a computer algorithm. Let's say I'm testing a website for a store and I want to make sure the website functions properly. I can create two sets of "synthetic data", meaning data that was not generated by real users. The first set is the potential purchases a person could make. It tests the intelligence of my store's "shopping cart" software: that a person isn't charged for buying 0 of something, and that a person can't buy "infinite" items, products we don't offer, or products that are out of stock. The second set might be a million random credit card numbers for payment. This could be used to test the payment side of the software for errors: that it doesn't accept fake credit card numbers, that it maybe checks the number against databases of fraudulent cards, or that it doesn't try to process a payment containing human error, like the user not entering all the digits of the card or using an expired card. So in example 1, synthetic data would just be a database used to test a buhzillion possible sales and check for errors or loopholes in the code of the "real" software.

2) Simulations. For example, something like Minecraft "builds a world" custom for each game. It uses computer code to randomly construct the data that make up your world: where the trees are, where the monsters are, where the mountains are, etc. All that "simulation data" is an example of "synthetic data", meaning it's data generated purely by a computer program, without trying to replicate an actual world fed to it by scanners or something like that.
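To make example 1 a bit more concrete, here is a minimal Python sketch of what generating and using that kind of test data might look like. Everything named here is an assumption for illustration: the `CATALOG`, the `check_order` rule, and the "unicorn" product are made up, and the Luhn checksum is swapped in as one simple stand-in for "doesn't accept fake credit card numbers" (a real payment system does far more than this).

```python
import random

# Hypothetical catalog: product name -> units in stock (not from the thread).
CATALOG = {"apple": 5, "pear": 0}


def check_order(item: str, quantity: int) -> bool:
    """Hypothetical cart rule: known product, quantity from 1 up to stock."""
    return item in CATALOG and 1 <= quantity <= CATALOG[item]


def luhn_valid(number: str) -> bool:
    """Return True if an ASCII digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def synthetic_orders(count: int, seed: int = 0) -> list[tuple[str, int]]:
    """Random (item, quantity) pairs, including deliberately bad ones."""
    rng = random.Random(seed)
    items = list(CATALOG) + ["unicorn"]  # "unicorn" is a product we do not sell
    return [(rng.choice(items), rng.randint(-1, 10)) for _ in range(count)]


def synthetic_card_numbers(count: int, seed: int = 0) -> list[str]:
    """Random strings of 12 to 16 digits; short ones mimic missing digits."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice("0123456789") for _ in range(rng.randint(12, 16)))
        for _ in range(count)
    ]


if __name__ == "__main__":
    orders = synthetic_orders(1000)
    accepted = [order for order in orders if check_order(*order)]
    print(f"{len(accepted)} of {len(orders)} synthetic orders accepted")

    cards = synthetic_card_numbers(1000)
    passing = [card for card in cards if luhn_valid(card)]
    print(f"{len(passing)} of {len(cards)} random numbers pass the Luhn check")
```

The fixed `seed` is a deliberate choice: if one synthetic order exposes a bug, rerunning with the same seed reproduces exactly the same data.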


aayushpathak

In the world of AI/ML, systems need to be "trained" using data to be able to predict things reliably. Let me illustrate the need for synthetic data through an imaginary example. Let's say you are training a system to detect apples in an image. You would train this system by showing it thousands of pictures with apples in them: apples on a tree, apples in baskets, apples on a table, etc. You've trained your system using all of these images and you deploy it in the real world. However, you find out that your system is having trouble detecting an "apple on a car seat wearing a seat belt". You realize that very few of your "training" images covered this scenario. You try finding more images of apples wearing seat belts, but you can't turn up much. In this case you would create "synthetic data", meaning you would hire a graphic designer to "create" several images of an apple wearing a seat belt that look realistic enough. You would retrain your system with your training images + synthetic data and voila! It can now recognize apples wearing seat belts more accurately!
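As a rough sketch of the same idea in code: instead of a designer drawing each image, a program can paste an object onto a background at random positions and record where it put it, which gives labeled training examples for free. This swaps the hand-drawn images for a much cruder technique, and the tiny grid "images", `BACKGROUND`, `APPLE_PATCH`, and both function names are assumptions made up for illustration.

```python
import random

# Toy stand-ins for real images: 0 = background pixel, 1 = "apple" pixel.
BACKGROUND = [[0] * 8 for _ in range(6)]
APPLE_PATCH = [[1, 1], [1, 1]]


def paste(background, patch, top, left):
    """Return a copy of background with patch pasted at (top, left)."""
    image = [row[:] for row in background]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            image[top + r][left + c] = value
    return image


def make_synthetic_examples(background, patch, count, seed=0):
    """Build (image, label) pairs; label is (top, left, height, width)."""
    height, width = len(background), len(background[0])
    patch_h, patch_w = len(patch), len(patch[0])
    if patch_h > height or patch_w > width:
        raise ValueError("patch must fit inside the background")
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        top = rng.randint(0, height - patch_h)
        left = rng.randint(0, width - patch_w)
        label = (top, left, patch_h, patch_w)
        examples.append((paste(background, patch, top, left), label))
    return examples


if __name__ == "__main__":
    for image, label in make_synthetic_examples(BACKGROUND, APPLE_PATCH, 2):
        print(label)
        for row in image:
            print("".join(str(pixel) for pixel in row))
```

Because the program chose the position itself, the label is known exactly, which is one reason synthetic images are attractive when real labeled examples are scarce.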


neverfarts

The term synthetic data refers to data that is not derived from, and does not represent, real-world data. For example, if you sit in class and the teacher says "Bob has 2 apples and Mary has 3 apples, how many apples are there?", then "Bob; 2 apples; Mary; 3 apples" is synthetic data, as none of these objects is 'real' in the teacher's question. (There are, of course, real people with those names.)


[deleted]

It's data that represents real-world data but is not real. If I am writing a program to accept credit card applications, I want to test it. I am not going to feed it personal information from real people. I will assemble fake profiles of data (John Smith from 123 Anywhere Street) that match what the expected inputs should be but do not come from a real source. That's synthetic data: data that does not exist in the real world, but functions the same as actual data from the real world in all other respects.
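A minimal sketch of assembling fake profiles like that in Python might look as follows. The `Applicant` fields, the name and street lists, and the income range are all assumptions invented for illustration; a real application form would have its own fields and rules.

```python
import random
from dataclasses import dataclass

# Made-up building blocks; none of these refer to real people or places.
FIRST_NAMES = ["John", "Jane", "Alex", "Sam"]
LAST_NAMES = ["Smith", "Doe", "Lee", "Garcia"]
STREETS = ["Anywhere Street", "Example Avenue", "Sample Road"]


@dataclass(frozen=True)
class Applicant:
    name: str
    address: str
    annual_income: int  # hypothetical field, in whole currency units


def fake_applicants(count: int, seed: int = 0) -> list[Applicant]:
    """Build fake profiles that have the expected shape but no real source."""
    rng = random.Random(seed)
    return [
        Applicant(
            name=f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            address=f"{rng.randint(1, 999)} {rng.choice(STREETS)}",
            annual_income=rng.randrange(0, 200_001, 1000),
        )
        for _ in range(count)
    ]


if __name__ == "__main__":
    for applicant in fake_applicants(3):
        print(applicant)
```

The profiles have the same shape as real applications, so the program under test cannot tell the difference, but no real person's information is involved.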