{"id":9983,"date":"2023-07-28T13:00:00","date_gmt":"2023-07-28T13:00:00","guid":{"rendered":"https:\/\/nft.runfyers.com\/index.php\/2023\/07\/28\/if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count\/"},"modified":"2023-07-28T13:00:00","modified_gmt":"2023-07-28T13:00:00","slug":"if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count","status":"publish","type":"post","link":"https:\/\/nft.runfyers.com\/index.php\/2023\/07\/28\/if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count\/","title":{"rendered":"If AI Image Generators Are So Smart, Why Do They Struggle to Write and\u00a0Count?"},"content":{"rendered":"<p><\/p>\n<div>\n<p class=\"has-drop-cap\">Generative AI tools such as <a href=\"https:\/\/nftnow.com\/ai\/now-ai-midjourney-multi-prompt-secrets-googles-news-writing-ai-more\/\" target=\"_blank\" rel=\"noopener\">Midjourney<\/a>, Stable Diffusion, and DALL-E 2 have astounded us with their ability to produce remarkable images in <a href=\"https:\/\/www.zdnet.com\/article\/best-ai-art-generator\/\" target=\"_blank\" rel=\"noopener\">a matter of seconds<\/a>.<\/p>\n<p>Despite their achievements, however, there remains a puzzling disparity between what AI image generators can produce and what we can. 
For instance, these tools often won\u2019t deliver satisfactory results for seemingly simple tasks such as counting objects and producing accurate text.<\/p>\n<p>If <a href=\"https:\/\/nftnow.com\/features\/an-ai-generated-nude-just-sold-for-340000-so-why-are-people-angry\/\" target=\"_blank\" rel=\"noopener\">generative AI<\/a> has reached such unprecedented heights in creative expression, why does it struggle with tasks even a primary school student could complete?<\/p>\n<p>Exploring the underlying reasons helps shed light on the complex numerical nature of AI, and the nuance of its capabilities.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-ai-s-limitations-with-writing\">AI\u2019s limitations with writing<\/h2>\n<p>Humans can easily recognize text symbols (such as letters, numbers, and characters) written in many different fonts and handwriting styles. We can also produce text in different contexts, and understand how context can change meaning.<\/p>\n<p>Current AI image generators lack this inherent understanding. They have no true comprehension of what text symbols mean. These generators are built on artificial neural networks <a href=\"https:\/\/www.assemblyai.com\/blog\/how-dall-e-2-actually-works\/\" target=\"_blank\" rel=\"noopener\">trained on<\/a> massive amounts of image data, from which they \u201clearn\u201d associations and make predictions.<\/p>\n<p>Combinations of shapes in the training images are associated with various entities. For example, two inward-facing lines that meet might represent the tip of a pencil or the roof of a house.<\/p>\n<p>But when it comes to text and quantities, the associations must be incredibly accurate, since even minor imperfections are noticeable. 
Our brains can overlook slight deviations in a pencil\u2019s tip or a roof \u2013 but not as much when it comes to how a word is written, or the number of fingers on a hand.<\/p>\n<p>As far as text-to-image models are concerned, text symbols are just combinations of lines and shapes. Since text comes in so many different styles \u2013 and since letters and numbers are used in seemingly endless arrangements \u2013 the model often won\u2019t learn how to effectively reproduce text.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><figcaption class=\"wp-element-caption\">AI-generated image produced in response to the prompt \u2018KFC logo.\u2019 | Credit: The Conversation<\/figcaption><\/figure>\n<\/div>\n<p>The main reason for this is insufficient training data. AI image generators require much more training data to accurately represent text and quantities than they do for other tasks.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-the-tragedy-of-ai-hands\">The tragedy of AI hands<\/h2>\n<p>Issues also arise when dealing with smaller objects that require intricate details, <a href=\"https:\/\/www.buzzfeednews.com\/article\/pranavdixit\/ai-generated-art-hands-fingers-messed-up\" target=\"_blank\" rel=\"noopener\">such as hands<\/a>.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1160\" height=\"552\" src=\"https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-Hands.png\" alt=\"\" class=\"wp-image-47742\" srcset=\"https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-Hands.png 1160w, https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-Hands-700x333.png 700w, https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-Hands-768x365.png 768w, https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-Hands-150x71.png 150w\" sizes=\"(max-width: 1160px) 100vw, 1160px\"\/><figcaption class=\"wp-element-caption\">Two AI-generated images produced in response 
to the prompt \u2018young girl holding up ten fingers, realistic.\u2019 | Credit: The Conversation<\/figcaption><\/figure>\n<\/div>\n<p>In training images, hands are often small, holding objects, or partially obscured by other elements. It becomes challenging for AI to associate the term \u201chand\u201d with the exact representation of a human hand with five fingers.<\/p>\n<p>Consequently, AI-generated hands <a href=\"https:\/\/twitter.com\/cantdohands?lang=en\" target=\"_blank\" rel=\"noopener\">often look misshapen<\/a>, have extra or missing fingers, or are partially covered by objects such as sleeves or purses.<\/p>\n<p>We see a similar issue when it comes to quantities. AI models lack a clear understanding of quantities, such as the abstract concept of \u201cfour.\u201d As such, an image generator may respond to a prompt for \u201cfour apples\u201d by drawing on what it has learned from myriad images featuring many quantities of apples \u2013 and return an output with the wrong number of apples.<\/p>\n<p>In other words, the huge diversity of associations within the training data impacts the accuracy of quantities in outputs.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1156\" height=\"364\" src=\"https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-soda.png\" alt=\"\" class=\"wp-image-47743\" srcset=\"https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-soda.png 1156w, https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-soda-700x220.png 700w, https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-soda-768x242.png 768w, https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/AI-soda-150x47.png 150w\" sizes=\"(max-width: 1156px) 100vw, 1156px\"\/><figcaption class=\"wp-element-caption\">Three AI-generated images produced in response to the prompt \u20185 soda cans on a table.\u2019 | Credit: The Conversation<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\" 
id=\"h-will-ai-ever-be-able-to-write-and-count\">Will AI ever be able to write and count?<\/h2>\n<p>It\u2019s important to remember that text-to-image and text-to-video generation is a relatively new concept in AI. Current generative platforms are \u201clow-resolution\u201d versions of what we can expect in the future.<\/p>\n<p>With <a href=\"https:\/\/theconversation.com\/ai-develops-human-like-number-sense-taking-us-a-step-closer-to-building-machines-with-general-intelligence-116820\" target=\"_blank\" rel=\"noopener\">advancements being made<\/a> in training processes and AI technology, future AI image generators will likely be much more capable of producing accurate visualizations.<\/p>\n<p>It\u2019s also worth noting that most publicly accessible AI platforms don\u2019t offer the highest level of capability. Generating accurate text and quantities demands highly optimized and tailored networks, so paid subscriptions to more advanced platforms will likely deliver better results.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<p>This article is republished from <a href=\"https:\/\/theconversation.com\" target=\"_blank\" rel=\"noopener\">The Conversation<\/a> under a Creative Commons license. 
Read the <a href=\"https:\/\/theconversation.com\/if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count-208485\" target=\"_blank\" rel=\"noopener\">original article<\/a> by <a href=\"https:\/\/theconversation.com\/profiles\/seyedali-mirjalili-1320951\" target=\"_blank\" rel=\"noopener\">Seyedali Mirjalili<\/a>, Professor, Director of Centre for Artificial Intelligence Research and Optimisation, <em><a href=\"https:\/\/theconversation.com\/institutions\/torrens-university-australia-899\" target=\"_blank\" rel=\"noopener\">Torrens University Australia<\/a><\/em>.<\/p>\n<\/p><\/div>\n<p><a href=\"https:\/\/nftnow.com\/features\/if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count\/\" target=\"_blank\" rel=\"noopener\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Generative AI tools such as Midjourney, Stable Diffusion, and DALL-E 2 have astounded us with their ability to produce remarkable images in a matter of seconds. Despite their achievements, however, there remains a puzzling disparity between what AI image generators can produce and what we can. 
For instance, these tools often won\u2019t deliver satisfactory results [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":9984,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[10],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/nftnow.com\/wp-content\/uploads\/2023\/07\/072523_AI_Editorial_Graphic_2_feature-scaled.jpg","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/9983"}],"collection":[{"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=9983"}],"version-history":[{"count":0,"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/9983\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/9984"}],"wp:attachment":[{"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=9983"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=9983"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nft.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=9983"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}