<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI XHIELD]]></title><description><![CDATA[AI Security and Safety newsletter by Alde]]></description><link>https://blog.aixhield.com</link><image><url>https://substackcdn.com/image/fetch/$s_!lW8x!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f65eb9e-4c53-4c6f-95f0-09e3a632d060_2000x2000.jpeg</url><title>AI XHIELD</title><link>https://blog.aixhield.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 02 May 2026 11:46:29 GMT</lastBuildDate><atom:link href="https://blog.aixhield.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Alde Gonzalez]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[alde@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[alde@substack.com]]></itunes:email><itunes:name><![CDATA[Alde]]></itunes:name></itunes:owner><itunes:author><![CDATA[Alde]]></itunes:author><googleplay:owner><![CDATA[alde@substack.com]]></googleplay:owner><googleplay:email><![CDATA[alde@substack.com]]></googleplay:email><googleplay:author><![CDATA[Alde]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Understanding the Black Box - Part 2]]></title><description><![CDATA[Agents are opaque and we are embedding them into every digital interaction that we have]]></description><link>https://blog.aixhield.com/p/understanding-the-black-box-part-26d</link><guid isPermaLink="false">https://blog.aixhield.com/p/understanding-the-black-box-part-26d</guid><dc:creator><![CDATA[Alde]]></dc:creator><pubDate>Sun, 09 Nov 2025 17:08:17 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!q3OO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before we jump into the next steps, here is a quick recap of Part 1.</p><div><hr></div><p>Since their rise in 2022, LLMs built on the transformer architecture such as ChatGPT, Gemini, and Claude have revolutionized how humans interact with AI, software, and computers. By 2025, their influence has expanded into image and video generation with systems like OpenAI&#8217;s Sora, Meta&#8217;s Vibes, and xAI&#8217;s Grok. Yet, despite their transformative capabilities, the mechanisms driving their intelligence remain largely mysterious. Unlike traditional software, which follows explicit, human-written instructions, LLMs learn from vast amounts of text data. Through this training process, they develop a dense network of trillions of parameters capable of <strong>encoding knowledge, reasoning, and creativity</strong>, but with <strong>little </strong><em><strong>interpretability</strong></em>. This opacity has given rise to the field of mechanistic interpretability, which aims to uncover how these systems actually work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI XHIELD is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The first article in this series introduced how transformers process information during training. It explained how text is first tokenized into numerical representations, how those tokens are transformed into embedding vectors that capture meaning, and how information flows through the residual stream, a shared workspace where <strong>each transformer layer refines understanding</strong>. Together, these steps form the foundation for how transformers represent meaning and context.</p><div><hr></div><h4>Step 4: How attention heads let transformers use context and move information between tokens</h4><p>The embedding matrix gives each word its standalone meaning, but understanding language requires more than that. The real breakthrough in transformers is the <strong>attention mechanism</strong>, <em>which enables models to connect words across a sentence and interpret them in context</em>.</p><p>Take the word <strong>&#8220;bank&#8221;</strong>. It means something entirely different in <em>I swam near the river bank</em> versus <em>I got cash from the bank</em>. Attention allows the model to figure out which meaning fits by relating words to one another.</p><p>An attention layer contains multiple attention heads that operate in parallel, each focusing on different relationships between tokens. Every head has two core components:</p><ul><li><p><strong>QK (Query&#8211;Key) circuit:</strong> Decides where to look for relevant information. For each token being processed (the query), it scores how related it is to every previous token (the keys). 
These scores turn into probabilities, effectively telling the model how much attention to give to each earlier token.</p></li><li><p><strong>OV (Output&#8211;Value) circuit:</strong> Determines what information to bring over. Each source token (key) produces a value vector. The destination token (query) then receives a weighted average of these values, with weights coming from the attention pattern learned by the QK circuit. This new information is added back into the residual stream at that token&#8217;s position.</p></li></ul><p>When a token gives another a high attention score, it&#8217;s like saying, &#8220;That&#8217;s the information I need.&#8221; </p><p>Importantly, a query token can only attend to tokens that came before it, never to future ones.</p><p>Intuition: Think of each query as asking a question about all earlier words, and the keys and values as providing the answers.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qnfe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qnfe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 424w, https://substackcdn.com/image/fetch/$s_!Qnfe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Qnfe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 1272w, https://substackcdn.com/image/fetch/$s_!Qnfe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qnfe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp" width="720" height="432" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dot-product attention procedure&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dot-product attention procedure" title="Dot-product attention procedure" srcset="https://substackcdn.com/image/fetch/$s_!Qnfe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 424w, https://substackcdn.com/image/fetch/$s_!Qnfe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Qnfe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 1272w, https://substackcdn.com/image/fetch/$s_!Qnfe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde67aea-2776-4f00-a26a-f69007fcc06c_720x432.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>A key mechanism: Induction heads</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uqSk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uqSk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 424w, https://substackcdn.com/image/fetch/$s_!uqSk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 848w, https://substackcdn.com/image/fetch/$s_!uqSk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 1272w, https://substackcdn.com/image/fetch/$s_!uqSk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uqSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png" width="1158" height="583" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:1158,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;From Magic to Mechanics: The Induction Head Hypothesis Explained |  
DataDrivenInvestor&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="From Magic to Mechanics: The Induction Head Hypothesis Explained |  DataDrivenInvestor" title="From Magic to Mechanics: The Induction Head Hypothesis Explained |  DataDrivenInvestor" srcset="https://substackcdn.com/image/fetch/$s_!uqSk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 424w, https://substackcdn.com/image/fetch/$s_!uqSk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 848w, https://substackcdn.com/image/fetch/$s_!uqSk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 1272w, https://substackcdn.com/image/fetch/$s_!uqSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d923dd-81db-4619-8b05-1969aec904cc_1158x583.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 
4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One particularly interesting type of attention head is the <strong>induction head</strong>, which powers what&#8217;s known as in-context learning, a model&#8217;s ability to pick up patterns or rules directly from examples in the prompt.</p><p>An induction head follows a simple algorithm:</p><p>If token A was followed by token B earlier in the text, then the next time A appears, predict that B will follow again.</p><p>This allows the model to generalize patterns it has never explicitly seen during training.</p><p>In practice, the induction circuit involves two heads:</p><ol><li><p>The previous-token head in the first layer copies information from one token to the next (for example, copying from sat to on).</p></li><li><p>The induction head in the second layer looks back to find where the current token appeared before, attends to the token that followed it (on in this case), and boosts the probability of generating that token next.</p></li></ol><p>This behaviour shows that transformers can learn algorithms, not just memorize data, and since induction heads only appear in models with at least two layers, they&#8217;re evidence that deeper models develop qualitatively new 
reasoning abilities.</p><p>In attention visualisations, induction heads appear as off-center diagonal patterns, showing how tokens in repeated phrases attend to the next token in their earlier counterparts.</p><h4>Understanding attention through indirect object identification (IOI)</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q3OO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q3OO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q3OO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q3OO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q3OO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q3OO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg" width="1456" height="737" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:737,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!q3OO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q3OO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q3OO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q3OO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7551ae1d-9c8f-4ea0-a076-ae11c4b80948_2334x1182.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><strong>Rowan Wang Tweet https://x.com/rowankwang/status/1587601532639494146</strong></figcaption></figure></div><p>Another fascinating example of how attention works comes from a task called indirect object identification (IOI), for instance:</p><p>When Mary and John went to the store, John gave a drink to...</p><p>The correct answer is Mary. 
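Stepping back for a moment to the QK/OV mechanics from Step 4: a single attention head can be sketched in a few lines of NumPy. This is a minimal illustration with invented dimensions and random placeholder weights, not any real model's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 4   # illustrative sizes, not from a real model

x = rng.normal(size=(seq_len, d_model))        # residual-stream vectors, one per token
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_head)) for _ in range(3))
W_O = rng.normal(size=(d_head, d_model))

# QK circuit: score how related each query token is to every key token
scores = (x @ W_Q) @ (x @ W_K).T / np.sqrt(d_head)

# Causal mask: a query may only attend to itself and earlier tokens, never future ones
scores = np.where(np.tri(seq_len, dtype=bool), scores, -np.inf)

# Softmax turns each query's scores into attention probabilities
pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
pattern /= pattern.sum(axis=-1, keepdims=True)

# OV circuit: each destination token receives a weighted average of value vectors,
# projected back and added into the residual stream at its position
out = x + (pattern @ (x @ W_V)) @ W_O
print(pattern[2].round(2))   # row 2 attends only to tokens 0, 1, and 2
```

Note how the causal mask zeroes out attention to future positions, matching the rule that a query token can only attend to tokens that came before it.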
In 2022, Redwood Research reverse-engineered how transformers solve this using a network of specialized attention heads arranged in a three-step circuit:</p><ol><li><p>Identify all names in the sentence (Mary, John, John).</p></li><li><p>Filter out duplicates (John).</p></li><li><p>Output the remaining name (Mary).</p></li></ol><p>These steps are carried out by three main groups of heads:</p><ul><li><p>Duplicate Token Heads: Detect repeated names and connect the later one to its earlier instance.</p></li><li><p>S-Inhibition Heads: Suppress duplicate tokens, preventing them from influencing the model&#8217;s next prediction.</p></li><li><p>Name Mover Heads: Copy the correct (non-duplicated) name to the final position, ensuring the model predicts Mary.</p></li></ul><p>This IOI circuit highlights how complex reasoning can emerge from the coordination of many attention heads, each performing a small, specialized role within the larger mechanism of understanding.</p><p><strong>Source</strong></p><ul><li><p>Indirect Object Identification in GPT-2: https://arxiv.org/abs/2211.00593</p></li><li><p>Neel Nanda: <em><a href="https://www.neelnanda.io/mechanistic-interpretability/walkthrough-ioi">A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy &amp; Alexandre 
Variengien)</a></em></p></li><li><p>https://www.alignmentforum.org/posts/3ecs6duLmTfyra3Gp/some-lessons-learned-from-studying-indirect-object</p></li><li><p>https://transformer-circuits.pub/2021/framework/index.html</p></li><li><p>https://aignishant.medium.com/unraveling-the-magic-of-q-k-and-v-in-the-attention-mechanism-with-formulas-035cb0781905</p></li><li><p>https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html</p></li><li><p>https://medium.com/data-science/what-are-query-key-and-value-in-the-transformer-architecture-and-why-are-they-used-acbe73f731f2</p></li><li><p>https://www.lesswrong.com/posts/XGHf7EY3CK4KorBpw/understanding-llms-insights-from-mechanistic</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI XHIELD is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Understanding the Black Box - Part 1]]></title><description><![CDATA[Agents are opaque and we are embedding them into every digital interaction that we have]]></description><link>https://blog.aixhield.com/p/understanding-the-black-box-part</link><guid isPermaLink="false">https://blog.aixhield.com/p/understanding-the-black-box-part</guid><dc:creator><![CDATA[Alde]]></dc:creator><pubDate>Sat, 18 Oct 2025 16:02:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3-Bj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Since their launch in 2022, LLMs built on the transformer architecture such as ChatGPT, Gemini, and Claude have reshaped the world with their ability to produce remarkably human-like text. Today, in 2025, we are witnessing their rapid expansion into images and videos through OpenAI&#8217;s Sora, Meta&#8217;s Vibes, and xAI&#8217;s Grok. Yet behind this astonishing capability lies a deep mystery: </p><div class="pullquote"><p><em><strong>we still don&#8217;t fully understand how these systems function.</strong></em></p></div><p>Traditional software is explicitly programmed by humans, written line by line in interpretable code. LLMs, however, are not designed in this way; they are trained. 
Their behaviour emerges from learning to predict the next word across immense amounts of internet text, producing a dense web of trillions of parameters that somehow encode knowledge, reasoning, and creativity. This process yields extraordinary performance but little transparency. These models are undeniably powerful, yet the mechanisms driving their success remain largely opaque.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI XHIELD is a reader-supported publication. To receive new posts and support my work, consider becoming a subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>As AI adoption accelerates, understanding why LLMs say what they say has become paramount. This is where <strong>mechanistic interpretability</strong> comes in: the field dedicated to uncovering the inner workings of these black boxes and bringing clarity to the most powerful technology of our time.</p><p>As an investor in AI, I often find it difficult to distinguish <strong>genuine innovation from noise</strong>. Every technological wave attracts opportunistic or casual entrepreneurs, and the AI boom is no exception. </p><p>With software itself becoming increasingly agentic, understanding the brain behind these agents has never been more crucial, so this series of essays explores the inner mechanics of LLMs: how they learn, represent knowledge, and generate meaning. 
</p><div class="pullquote"><p>My goal is to explore the inner mechanics of LLMs, both to deepen my understanding and to help navigate the AI investment landscape with greater insight.</p></div><p>Today, the transformer, an ML model architecture introduced in 2017, is the most popular architecture for building LLMs. How a transformer LLM works depends on whether the model is generating text (inference) or learning from training data (training).</p><h2>LLMs during Training</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3-Bj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3-Bj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 424w, https://substackcdn.com/image/fetch/$s_!3-Bj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 848w, https://substackcdn.com/image/fetch/$s_!3-Bj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 1272w, https://substackcdn.com/image/fetch/$s_!3-Bj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3-Bj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png" width="394" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/539b895e-5508-4e14-bdea-dc11786166fd_394x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:394,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52159,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.aixhield.com/i/175898507?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3-Bj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 424w, https://substackcdn.com/image/fetch/$s_!3-Bj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 848w, https://substackcdn.com/image/fetch/$s_!3-Bj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 1272w, https://substackcdn.com/image/fetch/$s_!3-Bj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539b895e-5508-4e14-bdea-dc11786166fd_394x512.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>During training, the transformer produces predictions for every token in a sentence. For each input position <em>i</em>, the model predicts the token that follows it, <em>i + 1</em>. 
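</p><p>A minimal sketch of this shifted-target setup (toy token IDs, not produced by a real tokenizer):</p><pre><code># Toy illustration: during training, the target at each position is
# simply the next token, so one sentence yields a prediction task
# at every position in parallel.
token_ids = [21, 58, 77, 204]   # "Bright stars shine tonight"

inputs = token_ids[:-1]    # what the model sees at each position
targets = token_ids[1:]    # what it must predict (shifted by one)

for position, (inp, tgt) in enumerate(zip(inputs, targets)):
    print(f"position {position}: given token {inp}, predict token {tgt}")</code></pre><p>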
Generating multiple predictions simultaneously allows for more efficient training.</p><p>These predictions are compared against the actual tokens in the training data, and the resulting errors are used to adjust the model&#8217;s parameters to improve its performance.</p><h4>Step 1 - Tokenization: converting text into tokens </h4><p>When the model receives a sentence such as &#8220;Bright stars shine tonight,&#8221; it first splits the text into smaller units called tokens. A token could be a complete word (e.g., &#8220;bright&#8221;), a segment of a word (e.g., &#8220;shine&#8221; and &#8220;s&#8221; from &#8220;shines&#8221;), or punctuation.</p><p>Each token in the model&#8217;s vocabulary is then mapped to a unique numeric ID. For example, &#8220;Bright stars shine tonight&#8221; might be represented as [21, 58, 77, 204]. This numeric sequence is the tokenized version of the text, produced by the tokenizer.</p><p>The model then adds positional embeddings to these token vectors to encode the order in which the tokens appear in the sentence.</p><h4><strong>Step 2 - Embeddings: giving meaning to tokens</strong></h4><p>After text is broken into tokens, each token is turned into an embedding vector, which is a list of numbers that represents its meaning. This is done by using each token ID to select the corresponding row of an embedding matrix. 
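</p><p>As a toy sketch (hypothetical four-word vocabulary and three-dimensional vectors; real models use vocabularies of tens of thousands of tokens and much larger dimensions):</p><pre><code># Toy sketch: map words to token IDs, then look up each ID as a row
# of an embedding matrix. All numbers here are made up.
vocab = {"Bright": 21, "stars": 58, "shine": 77, "tonight": 204}

embedding_matrix = {
    21: [0.1, 0.7, 0.3],
    58: [0.9, 0.2, 0.4],
    77: [0.5, 0.5, 0.1],
    204: [0.2, 0.8, 0.6],
}

token_ids = [vocab[word] for word in "Bright stars shine tonight".split()]
embeddings = [embedding_matrix[tid] for tid in token_ids]
print(token_ids)   # [21, 58, 77, 204]</code></pre><p>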
</p><p>The embedding matrix has a size of <code>[vocabulary size, embedding dimension]</code>, meaning:</p><p>&#8226; Each word in the vocabulary has one row.</p><p>&#8226; Each row is the embedding vector for that token.</p><h5>So how do these vectors capture meaning?</h5><p>During training, the model learns to assign similar vectors to words with similar meanings such as &#8220;see,&#8221; &#8220;look,&#8221; and &#8220;watch.&#8221; In this high-dimensional space, similar words end up pointing in similar directions, so the angle between their vectors is small.</p><h4><strong>Step 3 - The residual stream: How data flows</strong></h4><p>Inside a transformer, information moves through something called the residual stream. It is a shared workspace where different parts of the model write down and read information. Each transformer layer takes the current information in the stream, updates it, and passes it along.</p><p>At the start, the residual stream only contains the individual meaning of each word, without context. As the data flows through the transformer blocks, each layer refines those meanings by taking previous words into account. Over time, the model builds a richer understanding of each token in context.</p><h5>Residual Stream Technical Architecture:</h5><p>This stream is simply a list of vectors, one for each token in the input. 
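</p><p>A minimal sketch of that flow (toy numbers; in a real transformer the update comes from attention and MLP blocks):</p><pre><code># Toy residual stream: one vector per token. Each "layer" reads the
# stream and adds its update back in, so information accumulates.
stream = [[0.1, 0.2], [0.4, 0.3], [0.0, 0.5]]  # sequence length 3, model dimension 2

def layer_update(vector):
    # Stand-in for what an attention/MLP block would compute.
    return [0.01 * v for v in vector]

for i, vec in enumerate(stream):
    update = layer_update(vec)
    stream[i] = [v + u for v, u in zip(vec, update)]  # residual addition</code></pre><p>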
Its shape is <code>[sequence length, model dimension],</code> which matches the shape of the embedding layer&#8217;s output.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aSeP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aSeP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 424w, https://substackcdn.com/image/fetch/$s_!aSeP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 848w, https://substackcdn.com/image/fetch/$s_!aSeP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 1272w, https://substackcdn.com/image/fetch/$s_!aSeP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aSeP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png" width="615" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:615,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Refer to caption&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Refer to caption" title="Refer to caption" srcset="https://substackcdn.com/image/fetch/$s_!aSeP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 424w, https://substackcdn.com/image/fetch/$s_!aSeP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 848w, https://substackcdn.com/image/fetch/$s_!aSeP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 1272w, https://substackcdn.com/image/fetch/$s_!aSeP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff28ae06c-d42c-4c3a-b95b-61d84af2c58c_615x813.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h6><em>                                                            Source: <a href="https://arxiv.org/html/2312.12141v1">Exploring the Residual Stream of Transformers</a></em></h6><p></p><div><hr></div><p><em>LLM architectures are advanced deep learning models. My intention is that this does not become overwhelming for readers, so I will keep dissecting their inner workings in following posts.</em></p><div><hr></div><h6>If you want to dig deeper and check sources:</h6><ul><li><p>https://arxiv.org/html/2312.12141v1</p></li><li><p>https://arbs.io/2024-01-14-demystifying-tokens-and-embeddings-in-llm</p></li><li><p>https://www.lesswrong.com/posts/XGHf7EY3CK4KorBpw/understanding-llms-insights-from-mechanistic</p><div id="youtube2-7xTGNNLPyMI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;7xTGNNLPyMI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe 
src="https://www.youtube-nocookie.com/embed/7xTGNNLPyMI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"></p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Project Castellana: Safety implementation of a VC Agent]]></title><description><![CDATA[Implementation of an AI Agent with Open Source Models and Single-Agent Safety]]></description><link>https://blog.aixhield.com/p/project-castellana-safety-implementation</link><guid isPermaLink="false">https://blog.aixhield.com/p/project-castellana-safety-implementation</guid><dc:creator><![CDATA[Alde]]></dc:creator><pubDate>Sun, 25 May 2025 06:15:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/405640fe-e74a-4762-a40e-739d9a7e681a_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI Agents are rapidly transforming how software is built. Since this is the agentic era and software engineering has changed, I wanted to create a project that would teach me how to write software securely.</p><p>Software ate the world, AI is eating software, and venture capital is no exception. 
At our firm, writing memos is a core part of our investment process. These memos explain our analysis, due diligence, and investment thesis, regardless of the startup&#8217;s stage.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI XHIELD is a newsletter about AI Security. It is not investment advice</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>To support this process, I&#8217;ve been using tools like Perplexity to assist with market analysis. It has significantly reduced research time from around one week to just a few hours. 
However, while Perplexity is great for accelerating research, it doesn&#8217;t meet all the requirements needed for seamless integration into our internal investment workflows.</p><p>This led to <em><strong>Project Castellana</strong></em>, a prototype AI agent that can help write investment memos, built with safety engineering principles from day one.</p><h2><strong>The Problem: How Do We Actually Build Useful, Safe AI Agents?</strong></h2><p>To build a functioning AI agent, we need a few key components:</p><ol><li><p>An agentic framework &#8211; a software development kit (SDK) that lets us orchestrate interactions between tools and large language models (LLMs).</p></li><li><p>A clear role and task division &#8211; defining what each agent in the system should do.</p></li><li><p>Tools &#8211; custom-built or external tools that each agent can use to complete its tasks safely and accurately.</p></li></ol><p>Some of the most popular open-source agentic frameworks include:</p><ul><li><p>LangChain</p></li><li><p>LlamaIndex</p></li><li><p>CrewAI</p></li><li><p>AgentStack</p></li></ul><p>For Project Castellana, I chose CrewAI because it allows for structured, multi-agent collaboration in a modular way.</p><pre><code><code>from crewai import Agent
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from crewai_tools import EXASearchTool</code></code></pre><div><hr></div><h2><strong>The Agent Architecture</strong></h2><p>The system follows a hierarchical multi-agent approach, where each agent has a well-defined responsibility:</p><h3>Strategic Advisor Agent</h3><ul><li><p>Goal: Oversee and coordinate the crew&#8217;s work, ensuring high-quality, relevant, and non-generic outputs.</p></li><li><p>Context: Acts as an experienced project manager focused on aligning output with market-specific investment needs.</p></li></ul><pre><code>def get_strategy_advisor(trace_id=None):</code></pre><pre><code>return create_agent(
  role='Project Manager',
  goal='Efficiently manage the crew and ensure high-quality task completion, with a focus on results that are specific and relevant rather than generic or too zoomed out',
  backstory="""You're an experienced project manager, skilled in overseeing complex projects and guiding teams to success. Your role is to coordinate the efforts of the crew members, ensuring that each task is completed on time and that the results are relevant and specific to the market.""",
  tools=[],
  trace_id=trace_id,
  agent_name='strategy_advisor'
)</code></pre><h3>Competitor Research Agent</h3><ul><li><p>Goal: Identify and analyze real startups in defined AI subsegments.</p></li><li><p>Context: Specialized in spotting emerging, verifiable startups excluding well-known players like Google, Meta, Anthropic, OpenAI, etc.</p></li></ul><pre><code>def <strong>get_competitor_analyst</strong>(<em>trace_id</em>=None):
  return create_agent(
    role='AI Startup Intelligence Specialist',
    goal='Identify and analyze relevant AI startups within specific AI subsegment markets',
    backstory="""Expert in mapping competitive landscapes for specific AI verticals. Specialized in identifying real, named emerging startups and scale-ups rather than tech giants like IBM, OpenAI, Google, META, Anthropic, HuggingFace. Known for finding verifiable information about startups' funding, technology, and market focus.""",
    tools=[exa_search_tool],
    trace_id=trace_id,
    agent_name='competitor_analyst'
)</code></pre><h3><strong>Tools the Agents Use</strong></h3><p>To support the above agents, I developed the following tools:</p><ul><li><p>Market Size Tool &#8211; Estimates the total addressable market for a given segment.</p></li></ul><pre><code>def <strong>estimate_market_size</strong>(<em>data</em>: str) -&gt; str:
  return f"Estimated market size based on: {data}"

market_size_tool = Tool(
  name="Market Size Estimator",
  func=estimate_market_size,
  description="Estimates market size based on provided data."
)</code></pre><ul><li><p>CAGR Calculator &#8211; Automatically computes compound annual growth rates from public or private data sources.</p></li></ul><pre><code>def <strong>calculate_cagr</strong>(<em>initial_value</em>: float, <em>final_value</em>: float, <em>num_years</em>: int) -&gt; float:
  cagr = (final_value / initial_value) ** (1 / num_years) - 1
  return cagr</code></pre><pre><code>cagr_tool = Tool(
  name="CAGR Calculator",
  func=calculate_cagr,
  description="Calculates CAGR given initial value, final value, and number of years."
)</code></pre><ul><li><p>Search Tool (via Exa) &#8211; Allows agents to access real-time web search results, optimized for sourcing startup-specific information.</p></li></ul><pre><code>class CustomEXASearchTool(EXASearchTool):
  def __init__(self):
    super().__init__(
      type='neural',
      use_autoprompt=True,
      startPublishedDate='2021-10-01T00:00:00.000Z',
      endPublishedDate='2023-10-31T23:59:59.999Z',
      excludeText=['OpenAI', 'Anthropic', 'Google', 'Mistral', 'Microsoft', 'Nvidia', 'general AI market', 'overall AI industry', 'IBM'],
      numResults=10
    )

exa_search_tool = CustomEXASearchTool()</code></pre><p><strong>Embedding Safety Engineering Principles in Project Castellana</strong></p><p>The objective of Project Castellana is to build the agentic system with <strong>safety engineering principles</strong> so that AI agents are reliable and deployable in high-stakes professional contexts like investment decision-making.</p><div><hr></div><h3><strong>Risk Decomposition</strong></h3><p>Project Castellana starts by identifying potential failure points:</p><ul><li><p><strong>Data inaccuracy</strong> (e.g., hallucinated market size)</p></li><li><p><strong>Non-compliant output</strong> (e.g., biased or misleading content)</p></li><li><p><strong>Oversight failures</strong> (e.g., one agent missing red flags)</p></li></ul><p>These are broken down in terms of <em>likelihood</em>, <em>severity</em>, and <em>exposure</em>, allowing the design to target the most impactful risks early.</p><div><hr></div><h3><strong>Safe Design Principles</strong></h3><p><strong>Redundancy</strong></p><p>Agent outputs support cross-verification of key findings by triggering human-in-the-loop reviews of the sources the agent used.</p><p><strong>Separation of Duties</strong></p><p>The multi-agent structure ensures no single agent performs all tasks. Each agent has a tightly scoped responsibility, which limits cascading failure risks.</p><p><strong>Principle of Least Privilege</strong></p><p>Agents only have access to the tools and data relevant to their roles. 
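</p><p>A minimal sketch of how such a constraint could be enforced (hypothetical names, not the actual Castellana code):</p><pre><code># Hypothetical sketch: a deny-by-default registry of which tools each
# agent may use, checked before any tool call is dispatched.
ALLOWED_TOOLS = {
    "strategy_advisor": [],                      # coordinates only, no direct tool access
    "competitor_analyst": ["exa_search_tool"],
}

def check_tool_access(agent_name, tool_name):
    allowed = ALLOWED_TOOLS.get(agent_name, [])
    if tool_name not in allowed:
        raise PermissionError(f"{agent_name} may not use {tool_name}")
    return True

check_tool_access("competitor_analyst", "exa_search_tool")  # allowed</code></pre><p>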
For instance, the Strategic Advisor cannot directly query Exa&#8212;it relies on outputs from specialized agents.</p><p><strong>Fail-Safes</strong> (In Progress)</p><p>Future iterations may include uncertainty estimates that flag outputs for human review if the confidence falls below a defined threshold.</p><p><strong>Transparency</strong></p><p>Outputs include tool provenance (e.g., &#8220;Market Size sourced from X, calculated via Y&#8221;), and internal reasoning steps can be logged and reviewed. This improves human interpretability.</p><p><strong>Defense in Depth</strong></p><p>The system is being designed to include multiple validation layers before an output is accepted into a memo&#8212;agent-level verification, tool-level checks, and optional human review.</p><div><hr></div><h3><strong>Systemic Safety and Accident Models</strong></h3><p>Rather than focusing solely on the reliability of individual components&#8212;such as the Get Competitors Agent&#8212;Project Castellana is being developed with <strong>systemic risk</strong> in mind: the kinds of failures that emerge not from a single malfunction, but from the interactions and dependencies between agents, tools, and user feedback loops.</p><p>This mirrors safety models used in high-stakes domains like aviation, where accidents typically arise from a chain of events rather than one isolated breakdown. In complex systems, failures rarely occur in isolation; they are often the result of cascading errors, misaligned assumptions, or silent coordination breakdowns.</p><p>Castellana applies principles from systems engineering and accident modeling to proactively manage these risks, ensuring the entire agentic workflow behaves robustly and predictably&#8212;even under pressure.</p><p>Here's how:</p><p><em><strong>1. Agent-to-Agent Communication Monitoring</strong></em></p><p>Each agent in Castellana operates with a well-defined role, but their outputs are often inputs for others. 
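</p><p>One way to sketch such a handoff is an envelope that carries metadata alongside the findings (hypothetical field names):</p><pre><code>import time

# Hypothetical handoff envelope: agents pass findings together with
# metadata so the downstream agent can judge how much to trust them.
def make_handoff(agent_name, findings, source_quality, uncertainty):
    return {
        "agent": agent_name,
        "findings": findings,
        "source_quality": source_quality,  # e.g. "verified" or "unverified"
        "uncertainty": uncertainty,        # 0.0 (confident) to 1.0 (a guess)
        "timestamp": time.time(),
    }

handoff = make_handoff("competitor_analyst", ["Startup A", "Startup B"],
                       source_quality="verified", uncertainty=0.2)</code></pre><p>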
For example, the Get Competitors Agent provides findings to the Strategic Advisor, who integrates them into the memo. Systemic risks arise if:</p><ul><li><p>The Get Competitors Agent misinterprets the prompt and outputs incomplete data.</p></li><li><p>The Strategic Advisor assumes the data is comprehensive and doesn't seek corroboration.</p></li></ul><p>To counteract this, Castellana introduces explicit handoff protocols, where agents pass metadata along with their output (e.g., source quality, timestamp, uncertainty), giving downstream agents richer context to assess validity.</p><p><em><strong>2. Tool-Agent Interaction Governance</strong></em></p><p>Agents rely on external tools&#8212;like Exa for search or a CAGR calculator&#8212;for critical data. Systemic risk surfaces when tools fail silently, return outdated data, or are misused. For example:</p><ul><li><p>If Exa delivers results from 2020 without date metadata, an agent might incorrectly interpret them as current.</p></li><li><p>A parsing error in the Market Size Tool could propagate false estimates across the memo.</p></li></ul><p>Castellana addresses this by:</p><ul><li><p>Adding tool wrappers that enforce input/output validation and context tagging.</p></li><li><p>Logging all tool interactions so anomalies can be traced post-hoc.</p></li></ul><div><hr></div><h3><strong>Tail Events and Black Swans</strong></h3><p>Even if 99% of memos are accurate, the 1% that are confidently wrong pose significant reputational or financial risk. 
Black swan scenarios could include:</p><ul><li><p>A flawed valuation that makes it into a partner meeting</p></li><li><p>A hallucinated startup cited as a key competitor</p></li><li><p>An inappropriate thesis generated from faulty data</p></li></ul><p>By embracing the <strong>precautionary principle</strong> and horizon scanning (e.g., agents flagging &#8220;unknown unknowns&#8221; or anomalous outputs), Castellana aims to mitigate such risks even if they can&#8217;t be predicted.</p><div><hr></div><h3><strong>Implementation Gaps and Next Steps</strong></h3><p>While the <strong>structure and intent</strong> of Project Castellana align strongly with safety engineering principles, not all principles are fully implemented yet. For instance:</p><ul><li><p><strong>Fail-safe mechanisms and confidence thresholds</strong> are being explored.</p></li><li><p><strong>Redundancy and defense in depth</strong> are currently manual but will be automated.</p></li><li><p><strong>Comprehensive logging and explainability</strong> will require further development.</p></li></ul><div><hr></div><h3><strong>Application of Single Agent Safety</strong></h3><p>Beyond classical safety engineering, the sources describe AI-specific safety concerns such as monitoring, robustness, alignment, and systemic safety. 
Here&#8217;s how these apply to <em>Project Castellana</em>:</p><h4><strong>Monitoring</strong></h4><p>Monitoring involves identifying hazards, reducing exposure, understanding internal representations, detecting anomalies, and increasing transparency.</p><ul><li><p><em>Project Castellana</em> already emphasizes <strong>transparency</strong> as a safety feature, with outputs indicating the <strong>tool provenance</strong> (e.g., &#8220;Market Size sourced from X&#8221;) to improve human interpretability and accountability.</p></li></ul><p>To support monitoring and observability, <em>Project Castellana</em> uses<a href="https://portkey.ai/"> Portkey.ai</a>, a platform for managing and monitoring LLM-based agents in production. Portkey provides telemetry, error tracking, and prompt/response inspection capabilities that align with the <strong>monitoring</strong> and <strong>systemic safety</strong> goals described above. This operational layer helps bridge theory (AI safety principles) and practice (safe deployment of Castellana agents)</p><pre><code>try:
   from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL
   PORTKEY_AVAILABLE = True
except ImportError:
   PORTKEY_AVAILABLE = False
   print("Portkey not available, falling back to direct OpenAI usage")


def get_portkey_llm(trace_id=None, span_id=None, agent_name=None):
   if PORTKEY_AVAILABLE:
       headers = createHeaders(
           provider="openai",
           api_key=os.getenv("PORTKEY_API_KEY"),
           trace_id=trace_id,
       )
       if span_id:
           headers['x-portkey-span-id'] = span_id
       if agent_name:
           headers['x-portkey-span-name'] = f'Agent: {agent_name}'


       return ChatOpenAI(
           model="gpt-4o",
           base_url=PORTKEY_GATEWAY_URL,
           default_headers=headers,
            api_key=os.getenv("OPENAI_API_KEY")
       )
   else:
       # Fallback to direct OpenAI usage
       return ChatOpenAI(
           model="gpt-4",
           api_key=os.getenv("OPENAI_API_KEY")
       )</code></pre><p>Future enhancements could include:</p><ul><li><p>Developing <strong>benchmarks and evaluations</strong> to assess the accuracy and quality of investment memo outputs.</p></li><li><p>Implementing <strong>anomaly detection</strong> to flag unexpected or potentially hazardous agent behavior.</p></li><li><p>Exploring <strong>mechanistic interpretability</strong> to better understand agents&#8217; decision processes, though this remains a challenging area.</p></li></ul><h4><strong>Robustness</strong></h4><p>Robustness addresses vulnerabilities in AI systems, including resistance to adversarial examples and Trojans.</p><ul><li><p><em>Project Castellana</em> acknowledges key risks like <strong>data inaccuracies</strong> and <strong>non-compliant outputs</strong>.</p></li><li><p>It applies <strong>redundancy</strong> (cross-verifying information across sources) and <strong>defense in depth</strong> (multiple validation layers, such as automated consistency checks and human-in-the-loop reviews), both critical in mitigating robustness failures.</p></li><li><p>Further steps could involve:</p><ul><li><p>Ensuring <strong>adversarial robustness</strong> for the models and tools used.</p></li><li><p>Auditing against <strong>Trojans</strong>, especially if open-source or externally trained models are incorporated.</p></li></ul></li></ul><h4><strong>Alignment</strong></h4><p>Alignment is about ensuring that AI agents act in line with human intent, avoiding deceptive or unintended behavior.</p><ul><li><p><em>Castellana</em> uses <strong>separation of duties</strong> and the <strong>principle of least privilege</strong> to constrain agent behavior.</p></li></ul><p>A <strong>Strategic Advisor Agent</strong> oversees outputs for quality and specificity, supporting high-level alignment with the memo-writing goal.<br></p><div class="subscription-widget-wrap-editor" 
data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI XHIELD is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[AI Xhield]]></title><description><![CDATA[AI for Security, Security for AI and AI Infrastructure insights]]></description><link>https://blog.aixhield.com/p/ai-shield</link><guid isPermaLink="false">https://blog.aixhield.com/p/ai-shield</guid><dc:creator><![CDATA[Alde]]></dc:creator><pubDate>Mon, 27 Jan 2025 16:50:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Jj4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aixhield.com/subscribe?"><span>Subscribe now</span></a></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;07b54a6d-4b4e-4319-a94d-ea495a33f4e6&quot;,&quot;duration&quot;:null}"></div><blockquote><p><strong>Hello, 
network!</strong></p><p>It&#8217;s been two years since <a href="https://www.linkedin.com/article/edit/7253384228376129536/#">OpenAI</a>'s ChatGPT was launched, and the world has embraced AI like never before.</p><p>We&#8217;ve seen tech giants investing heavily in infrastructure, particularly by purchasing NVIDIA H100s and making them available in their cloud services.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Jj4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Jj4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0Jj4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0Jj4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0Jj4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Jj4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg" width="1456" 
height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0Jj4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0Jj4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0Jj4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0Jj4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17422975-59cf-4034-9c92-165cf24cefdf_1488x834.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">https://www.reddit.com/r/pcmasterrace/comments/1awtso6/nvidia_made_29b_from_gaming_last_quarter_vs_184b/</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q4Jm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q4Jm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 424w, 
https://substackcdn.com/image/fetch/$s_!q4Jm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 848w, https://substackcdn.com/image/fetch/$s_!q4Jm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 1272w, https://substackcdn.com/image/fetch/$s_!q4Jm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q4Jm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png" width="1456" height="815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!q4Jm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 424w, 
https://substackcdn.com/image/fetch/$s_!q4Jm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 848w, https://substackcdn.com/image/fetch/$s_!q4Jm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 1272w, https://substackcdn.com/image/fetch/$s_!q4Jm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bf0eba-5eff-45d3-87af-141cf65bfda9_1473x825.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">https://www.jika.io/post/c90a9ecc-427f-11ee-8080-80013ec0134c</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C5eO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C5eO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 424w, https://substackcdn.com/image/fetch/$s_!C5eO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 848w, https://substackcdn.com/image/fetch/$s_!C5eO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!C5eO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C5eO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png" width="1213" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1213,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!C5eO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 424w, https://substackcdn.com/image/fetch/$s_!C5eO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 848w, https://substackcdn.com/image/fetch/$s_!C5eO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!C5eO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61665e-eede-4e83-bddf-db35125382f0_1213x1000.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">https://sherwood.news/tech/meta-amazon-microsoft-massive-ai-capex-spending-quarterly-earnings/</figcaption></figure></div><blockquote><p>But what really stands out to me is Meta&#8217;s approach. They&#8217;re acquiring the infrastructure (H100s) to train models and seamlessly integrating them into consumer apps, offering these models almost for free in a semi-open-source format. This strategy is pushing the industry forward in a big way.</p><p>From an investment perspective, we&#8217;ve witnessed an explosion of startups focused on both the application layer and infrastructure, simplifying the creation of AI-native companies and products with GenAI through various models and techniques.</p><p>However, the topic I&#8217;m most interested in is <strong>security in AI systems</strong>. 
This is where we need more innovation to enable enterprises, mid-sized businesses, and SMEs to integrate AI securely.</p><p>After meeting with many companies and reading cybersecurity blogs such as <a href="https://www.linkedin.com/article/edit/7253384228376129536/#">Francis Odum</a>, <a href="https://www.linkedin.com/article/edit/7253384228376129536/#">Ross Haleliuk</a>, <a href="https://www.linkedin.com/article/edit/7253384228376129536/#">Return on Security</a>, <a href="https://www.linkedin.com/article/edit/7253384228376129536/#">Strategy of Security</a>, and <a href="https://www.linkedin.com/article/edit/7253384228376129536/#">Altitude Cyber</a>, I believe there is a growing opportunity in <strong>AI Security</strong> and <strong>Security for AI</strong>. As AI becomes the next compute platform, this focus will be critical. The resources mentioned already provide great coverage of the current cybersecurity ecosystem, but more attention is needed in this specific area.</p></blockquote><div><hr></div><blockquote><p>Please let me know in the comments which topics you want us to discuss:</p></blockquote><ul><li><p>Generative AI Regulation and Compliance</p></li><li><p>State of the Art on AI Explainability</p></li><li><p>Data Security for Generative AI</p></li><li><p>Other</p></li></ul><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.aixhield.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI XHIELD! 
Subscribe for free to receive new posts and support my work.</p></div></div></div>]]></content:encoded></item></channel></rss>