<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by David Mezzetti on Medium]]></title>
        <description><![CDATA[Stories by David Mezzetti on Medium]]></description>
        <link>https://medium.com/@davidmezzetti?source=rss-bd6fa5e6030------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/2*3fwd97jd7MTO4d26XMZk1w.png</url>
            <title>Stories by David Mezzetti on Medium</title>
            <link>https://medium.com/@davidmezzetti?source=rss-bd6fa5e6030------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 20 Jun 2026 08:34:33 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@davidmezzetti/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Introducing ncoder]]></title>
            <link>https://medium.com/neuml/introducing-ncoder-c3d2dff7f55b?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/c3d2dff7f55b</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[jupyter-notebook]]></category>
            <category><![CDATA[open-code]]></category>
            <category><![CDATA[ai-coding]]></category>
            <category><![CDATA[open-source]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Thu, 22 Jan 2026 18:34:24 GMT</pubDate>
            <atom:updated>2026-01-22T21:12:54.291Z</atom:updated>
            <content:encoded><![CDATA[<h4>💫 Open-Source AI coding agent that integrates with Jupyter Notebooks</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Q73TFljRv2ilHGiMM3SlRQ.png" /></figure><p><em>A </em><a href="https://www.youtube.com/watch?v=S5k-vrukEJk"><em>video version of this article</em></a><em> is also available.</em></p><p><a href="https://github.com/neuml/ncoder">ncoder is an open-source</a> AI coding agent that integrates with Jupyter Notebooks. This project uses the OpenAI API client to connect to any OpenAI-compatible endpoint and enable collaborative coding with AI.</p><p>ncoder provides a sandboxed <a href="https://hub.docker.com/r/neuml/ncoder">base Docker image</a> that supports coding with <a href="https://opencode.ai/">OpenCode</a> in <a href="https://opencode.ai/docs/server/">server mode</a>, a <a href="https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF">quantized Qwen3-Coder 30B model</a> for lightweight local inference and/or any other <a href="https://github.com/neuml/txtai">txtai process</a>.</p><p>ncoder is designed for Developers, AI Engineers and Data Scientists that spend a lot of their time inside of Jupyter Notebooks. 
If you do your research and/or prototyping inside of notebooks, this gives you an easy way to pull in new ideas.</p><h3>Getting Started</h3><p>ncoder consists of two parts: a sandboxed Docker image with an AI coding agent and a local Jupyter Notebook.</p><p>The coding agent can be started in one of the following ways.</p><pre># DEFAULT: Run with opencode backend, sends data to `opencode serve` endpoint<br>docker run -p 8000:8000 --gpus all --rm -it neuml/ncoder<br><br># ALTERNATIVE 1: Run with qwen3-coder, keeps all data local<br>docker run -p 8000:8000 -e CONFIG=qwen3-coder.yml --gpus all \<br>--rm -it neuml/ncoder<br><br># ALTERNATIVE 2: Run with a custom txtai workflow<br>docker run -p 8000:8000 -v config:/config -e CONFIG=/config/config.yml \<br> --gpus all --rm -it neuml/ncoder</pre><p>Running in a sandboxed environment decouples AI coding from your local working environment. Running in isolation provides assurance that the agent won’t modify your workspace directly.</p><p>Next, install the Jupyter Notebook extension on your local machine.</p><pre>pip install ncoder</pre><p>Jupyter Notebooks can be created in <a href="https://code.visualstudio.com/docs/datascience/jupyter-notebooks">Visual Studio Code</a> or your preferred notebook platform. Add the following two cells to any notebook to test.</p><pre># Load ncoder extension<br>%load_ext ncoder</pre><pre># Test it out<br>%ncoder Write a Python Hello World Example</pre><p>An <a href="https://github.com/neuml/ncoder/blob/master/example.ipynb">example notebook</a> is available for reference.</p><p>The ncoder Jupyter Notebook extension works with any LLM API that has OpenAI API compatibility. It’s simply a matter of setting the correct environment variables.</p><pre>%env OPENAI_BASE_URL=LLM API URL (e.g. 
https://api.openai.com/v1)<br>%env OPENAI_API_KEY=api-key<br>%env API_MODEL=gpt-5.2<br><br>%load_ext ncoder</pre><p>These same parameters can be used if the sandboxed Docker coding agent is being run using a different configuration (<em>the default URL is </em><a href="http://localhost:8000/v1"><em>http://localhost:8000/v1</em></a>).</p><h3>Demo</h3><p>The short video clip below gives a brief overview of how to use ncoder.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Ab4hk-Z9Y5ci93rb.gif" /></figure><h3>Wrapping up</h3><p>This article introduced ncoder, an open-source AI coding agent that integrates with Jupyter Notebooks. ncoder also provides a sandboxed Docker image to decouple code generation from your working environment.</p><p>If you do your research and/or prototyping inside of notebooks, this gives you an easy way to pull in new ideas. Give it a try!</p><hr><p><a href="https://medium.com/neuml/introducing-ncoder-c3d2dff7f55b">Introducing ncoder</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[NeuML — 2025 Year in Review]]></title>
            <link>https://medium.com/neuml/neuml-2025-year-in-review-019158d998f7?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/019158d998f7</guid>
            <category><![CDATA[year-in-review]]></category>
            <category><![CDATA[2025]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Thu, 01 Jan 2026 13:05:40 GMT</pubDate>
            <atom:updated>2026-01-06T20:49:51.128Z</atom:updated>
            <content:encoded><![CDATA[<h3>NeuML — 2025 Year in Review</h3><h4>Recapping 2025 and looking ahead to 2026</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*hS6oR2jv_xVC8XsLPL3Kjg.png" /></figure><p><em>Check out </em><a href="https://www.youtube.com/watch?v=lnaZ94s0RZ8"><em>this video recap</em></a><em> for a more in depth view of NeuML’s 2025.</em></p><p><a href="https://neuml.com">NeuML</a> is the company behind txtai, an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.</p><p>In 2025, NeuML continued to deliver AI-driven functionality both in open source and with paid consulting efforts. This article recaps the progress made in 2025 and looks ahead to 2026.</p><h3>TxtAI</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*G7J-woRxJzAlxxmQV1tC0w.png" /><figcaption><a href="https://github.com/neuml/txtai">https://github.com/neuml/txtai</a></figcaption></figure><p><a href="https://github.com/neuml/txtai">txtai</a> is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows. This is the foundational piece of software that all of our work stands on.</p><p>Highlights for txtai in 2025:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DnljEuaPj9tJrc_Z4x9Rtg.png" /><figcaption><a href="https://www.star-history.com/#neuml/txtai&amp;type=date">https://www.star-history.com/#neuml/txtai&amp;type=date</a></figcaption></figure><ul><li>⭐<strong>2,142</strong> stars on GitHub to bring the total to ⭐<strong>11,975</strong></li><li><strong>270</strong> total commits on GitHub</li><li><strong>145</strong> total issues resolved on GitHub</li><li><strong>11</strong> releases. 
Entered the year at v8.1 and finished at v9.3</li><li><strong>10</strong> articles and example notebooks added</li></ul><p>Let’s recap the major functionality added.</p><h4>New in 2025</h4><p>In 2025, txtai had one major release — 9.0 along with 10 minor releases.</p><p><a href="https://medium.com/neuml/whats-new-in-txtai-9-0-d522bb150afa">💡 What’s new in txtai 9.0</a></p><p>txtai 9.0 was released in <strong>August 2025</strong>. This release added SPLADE, ColBERT, MUVERA and Reranking pipelines.</p><p>Below is a summary of the major features added in 2025.</p><ul><li><a href="https://github.com/neuml/txtai/blob/master/examples/76_Whats_new_in_txtai_9_0.ipynb">SPLADE / ColBERT / MUVERA Vectorization</a></li><li><a href="https://github.com/neuml/txtai/blob/master/examples/79_RAG_is_more_than_Vector_Search.ipynb">RAG with any function</a></li><li><a href="https://github.com/neuml/txtai/blob/master/examples/71_Analyzing_LinkedIn_Company_Posts_with_Graphs_and_Agents.ipynb">smolagents as agents backend</a></li><li><a href="https://github.com/neuml/txtai/blob/master/examples/77_GraphRAG_with_Wikipedia_and_GPT_OSS.ipynb">Graph Vector Search</a></li><li><a href="https://github.com/neuml/txtai/blob/master/examples/73_Chunking_your_data_for_RAG.ipynb">Improved Chunking</a></li><li><a href="https://github.com/neuml/txtai/blob/master/examples/78_Accessing_Low_Level_Vector_APIs.ipynb">GGUF Vector Store</a></li><li><a href="https://neuml.github.io/txtai/observability/">Observability</a></li><li><a href="https://github.com/neuml/txtai/issues/695">Embeddings indexing checkpoints</a></li><li><a href="https://github.com/neuml/txtai/blob/master/examples/74_OpenAI_Compatible_API.ipynb">OpenAI Compatible API</a></li><li><a href="https://neuml.github.io/txtai/api/mcp/">Model Context Protocol (MCP)</a></li></ul><p>At the end of 2025, txtai is a growing and major player in the AI Orchestration Framework space. 
This illustration is frequently cited on LinkedIn and other social platforms showing txtai&#39;s place in the RAG ecosystem.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/0*PW_I93IG7eYf3s6n.jpeg" /></figure><p>NeuML is one of the few bootstrapped non-VC backed companies on this list!</p><p>Looking ahead into 2026, we’ll continue to evangelize txtai as the premier minimalist AI framework for semantic search, LLM orchestration and language model workflows.</p><p>In addition to txtai, NeuML has a number of other open source projects that continue to evolve. This includes <a href="https://github.com/neuml/paperai">PaperAI</a>, <a href="https://github.com/neuml/paperetl">PaperETL</a>, <a href="https://github.com/neuml/rag">RAG</a> and <a href="https://github.com/neuml/annotateai">AnnotateAI</a>. Check out our <a href="https://github.com/neuml">GitHub page</a> for more.</p><h3>Open Models</h3><p>NeuML believes in open-source AI. As part of that, we’ve released a number of public models on the Hugging Face Hub. At the end of 2025, NeuML has <strong>61 models available</strong> on the Hub and <strong>12 datasets</strong>.</p><p>Here are the highlights from 2025.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/0*R27WCfLBO4666OnF" /><figcaption><a href="https://huggingface.co/NeuML/pubmedbert-base-embeddings">https://huggingface.co/NeuML/pubmedbert-base-embeddings</a></figcaption></figure><p>Our most popular model on the Hub! This model receives over <strong>150K downloads a month</strong> and has almost <strong>10 million lifetime downloads</strong>! It’s also been <a href="https://scholar.google.com/scholar?start=0&amp;q=%22pubmedbert+embeddings%22&amp;hl=en&amp;as_sdt=0,47">cited at least 56 times according to Google Scholar</a>. 
This includes popular journals such as Nature, Springer and Elsevier.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/0*-9FVXFFKKmAiDZtd" /><figcaption><a href="https://huggingface.co/collections/NeuML/bert-hash-nano-models">https://huggingface.co/collections/NeuML/bert-hash-nano-models</a></figcaption></figure><p>Next up is a model series we’re particularly proud of. We’ve discussed the concept of “micromodels” as far back as 2023 (see article below).</p><p><a href="https://medium.com/neuml/the-big-and-small-of-txtai-4ca405c1b82">The big and small of txtai</a></p><p>The BERT Hash Nano Models series introduces a simple technique to significantly reduce model parameter counts. Instead of the embeddings layer mapping directly to the hidden size, a projection layer is added in.</p><p>The article below discusses this in more detail.</p><p><a href="https://medium.com/neuml/training-tiny-language-models-with-token-hashing-b744aa7eb931">Training Tiny Language Models with Token Hashing</a></p><p>These models perform surprisingly well and come in under 1 million parameters. There are also <a href="https://huggingface.co/NeuML/colbert-muvera-nano">fine-tunes available for ColBERT</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/0*cjdkS8Jap83kv2FR" /><figcaption><a href="https://huggingface.co/NeuML/biomedbert-hash-nano">https://huggingface.co/NeuML/biomedbert-hash-nano</a></figcaption></figure><p>Building on the BERT Hash Series is BiomedBERT Nano. 
This is a 970K parameter <a href="https://arxiv.org/abs/1810.04805">BERT</a> encoder-only model trained on data from <a href="https://pubmed.ncbi.nlm.nih.gov/">PubMed</a>.</p><p>Additional fine-tunes are also available.</p><ul><li><a href="https://huggingface.co/NeuML/biomedbert-hash-nano-embeddings">BiomedBERT Hash Nano Embeddings</a></li><li><a href="https://huggingface.co/NeuML/biomedbert-hash-nano-colbert">BiomedBERT Hash Nano ColBERT</a></li></ul><p>See the following article for all the details on these models.</p><p><a href="https://huggingface.co/blog/NeuML/biomedbert-hash-nano">Encoding the World&#39;s Medical Knowledge into 970K</a></p><p>There are also a <a href="https://huggingface.co/NeuML/models">number of other models available</a> for Text to Speech, Static Vectorization, Language Identification and more.</p><h3>Consulting Services</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*h2_g0uOo7Ha-5fOk" /><figcaption><a href="https://neuml.com">https://neuml.com</a></figcaption></figure><p>NeuML provides consulting services around our open-source stack:</p><ul><li><strong>Generative AI</strong> Build agents, retrieval-augmented generation (RAG), large language model (LLM) orchestration and chat with your data systems</li><li><strong>AI-driven Literature Analysis</strong> Automate analysis of unstructured medical, scientific and technical literature</li><li><strong>Model Development</strong> Create AI, Embeddings and/or LLM models that excel in industry-specific domains</li><li><strong>Advisory and Strategy</strong> Leverage our expertise to plan your data, engineering and AI strategy</li><li><strong>Proposals</strong> Integrate AI into your technical proposals utilizing our knowledge of industry trends</li><li><strong>Development Support</strong> Meet with us, get txtai implementation guidance and/or outsource development</li></ul><p>While we keep the details of our consulting engagements private, this is the primary revenue stream 
for NeuML. It’s crucial that both our open source projects and open models demonstrate our core capabilities in a way that can be translated into paid work.</p><p>The ideal consulting client for NeuML is a small to medium-sized company. One misconception is that some projects “aren’t worth our time”. NeuML sets out to <em>apply machine learning to solve everyday problems</em>. We’re interested in solving real problems!</p><p>While we’ve developed Subject Matter Expertise in the medical space, our techniques apply to almost any business area. <a href="https://cal.com/neuml/intro">Schedule a meeting</a> or <a href="mailto:info@neuml.com">send a message</a> to learn more.</p><h3>Rating our progress in 2025</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*YpqmSWlr5wRRXxrZ.png" /></figure><p>We’ve covered quite a lot of information recapping 2025. Next, let’s discuss how we stacked up against what we set out to do back in January. Each goal will be rated from 1–5 with 5 being the highest and 1 the lowest.</p><p>These were the goals set at the beginning of the year. Each goal is an abbreviated version of the one in <a href="https://medium.com/neuml/neuml-2024-year-in-review-d446deaf5390">NeuML’s 2024 Year in Review</a> article.</p><h4>TxtAI 10K</h4><p><em>Surpass 10K stars on GitHub</em></p><p>⭐⭐⭐⭐⭐ (5 of 5)</p><p>🚀 Mission accomplished! We’re ending the year a little under 12K stars.</p><p>While the overall star growth was lower in 2025 vs previous years, this is mainly due to the lack of trending posts. Our efforts were focused elsewhere rather than on getting txtai to trend on social.</p><p>With that being said, 10K stars is a great accomplishment and helps validate txtai as a major player in the space.</p><h4>NeuML as a leading voice in AI Community</h4><p><em>Be a vocal leader in the AI Community and a trusted voice</em></p><p>⭐⭐⭐⭐ (4 of 5)</p><p>NeuML strives to be a voice of reason in a space of unreasonable expectations that is AI. 
We’re measured and realistic. With that being said, we do believe in the promise of AI.</p><p>We’ve built a vibrant community of followers on <a href="https://www.linkedin.com/company/neuml/">LinkedIn</a>, <a href="https://x.com/neumll">X</a>, <a href="https://www.facebook.com/people/NeuML/100057403391445/">Facebook</a> and more recently <a href="https://www.reddit.com/r/txtai/">Reddit</a>. In late 2025, Reddit experienced a large surge in activity.</p><p>Below is where we stand as of late 2025.</p><p><strong>LinkedIn</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/564/1*7_mXkwJt6yXHv6anaZ2tgA.png" /><figcaption><a href="https://www.linkedin.com/company/neuml">https://www.linkedin.com/company/neuml</a></figcaption></figure><p><strong>X aka Twitter</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/590/1*StpIpUYmjRCaeJJbycRsSg.png" /><figcaption><a href="https://x.com/neumll">https://x.com/neumll</a></figcaption></figure><p><strong>Facebook</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/543/1*suM4kI4ZJHGxovMtUfxyyg.png" /><figcaption><a href="https://www.facebook.com/people/NeuML/100057403391445/">https://www.facebook.com/people/NeuML/100057403391445/</a></figcaption></figure><p><strong>Reddit</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/916/1*yegDmK5pVOd1_UJ1WfTIEw.png" /><figcaption><a href="https://reddit.com/r/txtai">https://reddit.com/r/txtai</a></figcaption></figure><p>While LinkedIn is our largest audience, Reddit is surging rapidly! We’ve added almost 800 members to <em>r/txtai</em> in the last month of 2025. If this pace holds, Reddit will overtake our other social platforms in 2026.</p><p>Overall, our engagement has been great. 
The only reason this isn’t a 5 is that we didn’t do in-person conferences or speaking engagements.</p><h4>Monetization of our place in the AI space</h4><p><em>Convert open source and open model work into revenue streams</em></p><p>⭐⭐⭐ (3 of 5)</p><p>While this data is not being shared publicly, we’re generally on the right track. It’s notoriously difficult to translate open source work into paying customer streams. Many open source companies build a large following and project only to find there isn’t a viable path to income. People like the project but aren’t going to pay for it.</p><p>Our current focus on consulting work has added value. We’ve also received 300+ submissions of interest for <a href="http://txtai.cloud">txtai.cloud</a>. So that is also an area to explore.</p><p>Custom models fine-tuned to specific business areas or tasks are also a growth area. Custom models are AI framework agnostic, which potentially could add customers who aren’t using our stack.</p><h4>Overall</h4><p><em>In 2025, the self-proclaimed score for NeuML is </em>🥁 🎶</p><p>⭐⭐⭐⭐ (12 of 15)</p><p>This averages out to a <strong>4 out of 5.</strong></p><p>NeuML has much to be proud of on this journey to date. Back in 2020, building an open source project with over 10K stars seemed unimaginable. Even 1,000 stars seemed far-fetched. Getting here is the result of an amazing amount of dedicated effort over a long period of time. But there is still much to do.</p><h3>Playbook for 2026</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*GtnMSso_q-eg_U-l.png" /></figure><p>Looking ahead to 2026, we’ll focus on the following areas.</p><h4>Monetization of our place in the AI space</h4><p>Normally we don’t like duplicating our goals year over year but this one deserves continued emphasis.</p><p>More customers, more projects, more revenue is the mantra here. Making it easier to engage with NeuML is also important. Not all consulting engagements have to be long lasting. 
We should ensure we have methods to make working with NeuML in a limited manner easy (e.g. a simple payment page).</p><p>We should also investigate non-consulting revenue streams such as deciding what to do with <a href="http://txtai.cloud">txtai.cloud</a>.</p><h4>Be a leader in the vector retrieval space</h4><p>txtai has its roots as a vector database and a retrieval platform. While it has many pipelines for AI orchestration, it’s built on a foundation of an embeddings database.</p><p>In 2026, we’ll work to grow our presence in the vector retrieval space. This is less about frameworks and more about developing models and techniques, similar to our work with BERT Hash.</p><h4>Publish papers covering our work</h4><p>While we publish plenty of blog posts summarizing our work, to maximize visibility we need to publish it as papers. We can start by submitting to preprint servers such as <a href="https://arxiv.org/">arXiv</a>.</p><p>Even AI has <a href="https://gist.githubusercontent.com/davidmezzetti/153b016f5f97b7072d589ab3a138a077/raw/8ed38ae88b7f5dcc6cc73118828a0c01af636df0/txtai.pdf">written a paper covering our work</a> 😀</p><h3>Ways to find NeuML</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*2p4gTy22E4DE-a3S.png" /></figure><p>The full list of ways to interact with NeuML is shown below.</p><p><strong>Contact us<br></strong><a href="https://neuml.com/">Website</a> | <a href="https://cal.com/neuml/intro">Meet</a> | <a href="mailto:info@neuml.com">Email</a> | <a href="https://join.slack.com/t/txtai/shared_invite/zt-37c1zfijp-Y57wMty6YOx_hyIHEQvQJA">Slack</a></p><p><strong>Code<br></strong><a href="https://github.com/neuml">GitHub</a> | <a href="https://hub.docker.com/u/neuml">Docker Hub</a> | <a href="https://hf.co/neuml">HF Spaces</a> | <a href="https://txtai.cloud">Cloud</a></p><p><strong>Social Media<br></strong><a href="https://www.linkedin.com/company/neuml">LinkedIn</a> | <a 
href="https://twitter.com/neumll">Twitter</a> | <a href="https://www.facebook.com/neuml-106140420955354">Facebook</a> | <a href="https://www.youtube.com/@neuml">YouTube</a> | <a href="https://reddit.com/r/txtai">Reddit</a></p><p><strong>Articles<br></strong><a href="https://medium.com/neuml">Medium</a> | <a href="https://neuml.hashnode.dev">Hashnode</a> | <a href="https://dev.to/neuml">dev.to</a> | <a href="https://neuml.substack.com/">Newsletter</a></p><p><strong>Consulting Support<br></strong>Need help with txtai? Struggling to build your own datasets to power AI systems? Want to train your own embeddings models? Need AI strategy support?</p><p><a href="https://cal.com/neuml/intro">Book an intro meeting</a> or <a href="mailto:info@neuml.com">email us</a> to discuss how NeuML can provide advisory support and/or development assistance.</p><h3>Wrapping up</h3><p>This article covered the state of NeuML at the end of 2025 and our plans for 2026. Thank you for reading. Please follow along and check in on how we’re doing over the course of 2026.</p><p><em>Interested in NeuML’s history? 
Then read the recaps from </em><a href="https://medium.com/neuml/being-thankful-in-2020-c69a0bc1f67e"><em>2020</em></a><em>, </em><a href="https://medium.com/neuml/neuml-2021-year-in-review-ef051fcdbbda"><em>2021</em></a><em>, </em><a href="https://medium.com/neuml/neuml-2022-year-in-review-b17787179b7e"><em>2022</em></a><em>, </em><a href="https://medium.com/neuml/neuml-2023-year-in-review-560457b97fdb"><em>2023</em></a><em> and </em><a href="https://medium.com/neuml/neuml-2024-year-in-review-d446deaf5390"><em>2024</em></a><em>.</em></p><hr><p><a href="https://medium.com/neuml/neuml-2025-year-in-review-019158d998f7">NeuML — 2025 Year in Review</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Training Tiny Language Models with Token Hashing]]></title>
            <link>https://medium.com/neuml/training-tiny-language-models-with-token-hashing-b744aa7eb931?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/b744aa7eb931</guid>
            <category><![CDATA[search]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[vector-database]]></category>
            <category><![CDATA[large-language-models]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Thu, 09 Oct 2025 17:42:50 GMT</pubDate>
            <atom:updated>2025-10-10T12:47:38.838Z</atom:updated>
            <content:encoded><![CDATA[<h4>Learn how a simple tweak can drastically reduce model sizes</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*_UNlOc5dk8y7JQwHUO8thg.png" /></figure><p>This article introduces the <a href="https://huggingface.co/collections/NeuML/bert-hash-nano-models-68e4101a43f17f06ef5198aa">BERT Hash series</a> of encoder models. Encoder models are the engine behind generating vector embeddings, text classification, entity extraction and more. The BERT Hash architecture is a simple tweak of the <a href="https://arxiv.org/abs/1810.04805">well known and legendary BERT model</a>.</p><p>The Generative AI era is all about building bigger models: more parameters, more GPUs and more training data. Bigger is better is the mantra. Few are exploring the other side of the spectrum: efficiency and doing more with less. Sure, LLMs can do some of the same tasks an encoder model can do, but the encoder model is often a better choice.</p><p>Let’s jump in.</p><h3>Challenges with Tiny Language Models</h3><p>Language models can be made small in a number of ways. The following is a non-exhaustive list.</p><ul><li>Changing the vocabulary size</li><li>Changing the number of hidden dimensions</li><li>Changing the number of attention heads or layers</li><li>Creating a custom model architecture</li></ul><p>When models get into the single digit millions of parameters, the vast majority of the parameter count is dedicated to the token embeddings layer.</p><p>With <a href="https://huggingface.co/google/bert_uncased_L-2_H-128_A-2">BERT Tiny</a>, a 4.43M parameter model, 3.91M of the parameters are allocated to the token embeddings as shown below.</p><blockquote>30,522 Vocabulary Size * 128 Hidden Dimensions = 3,906,816 parameters</blockquote><p>The first step with BERT is running the tokens through a 30,522 x 128 embeddings matrix. 
These parameters are learned at training time and are known as the token embeddings.</p><p>The BERT Tiny model has already reduced the size of the hidden dimensions from 768 to 128 and the number of layers and attention heads from 12 each to 2. This is how the model size was reduced from 110M parameters to 4.43M.</p><p>Changing the vocabulary size is one way to reduce this further. Let’s say we limit the vocabulary to 1,000 tokens.</p><blockquote>1,000 Vocabulary Size * 128 Hidden Dimensions = 128,000 parameters</blockquote><p>The problem with this is that now we’ll generate many more tokens to represent the same text, which could in the end <em>increase</em> overall computation time.</p><p>What if the same vocabulary could be used while still reducing the number of parameters used by the token embeddings layer? Enter BERT Hash.</p><h3>BERT Hash Architecture</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/562/1*5H2FFLdBfc--72lQk2UVJQ.png" /></figure><p>BERT Hash is a very straightforward modification of the token embeddings layer of a BERT model. 
Instead of the embeddings layer mapping directly to the hidden size, a projection layer is added in.</p><p>Let’s say we use 16 projections with 128 hidden dimensions.</p><blockquote>30,522 Vocabulary size * 16 projections + 16 * 128 hidden = 490,400 parameters</blockquote><p>This change alone brings the parameter count for a BERT Tiny model down to 950K from 4.4M, only 22% of the original model size.</p><p>The code for this component is quite simple.</p><pre>from torch import nn<br><br>class BertHashTokens(nn.Module):<br>    def __init__(self, config):<br>        super().__init__()<br>        self.config = config<br><br>        # Token embeddings: vocab_size x projections lookup table<br>        self.embeddings = nn.Embedding(<br>            config.vocab_size,<br>            config.projections,<br>            padding_idx=config.pad_token_id<br>        )<br><br>        # Token embeddings projections: projections -> hidden_size<br>        self.projections = nn.Linear(<br>            config.projections,<br>            config.hidden_size<br>        )<br><br>    def forward(self, input_ids):<br>        # Project embeddings to hidden size<br>        return self.projections(self.embeddings(input_ids))</pre><p>That’s it. Everything else stays the same! 
This method is inspired by <a href="https://arxiv.org/abs/2405.19504">MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings</a>.</p><h3>BERT Hash Nano Model Series</h3><p>Pre-trained models are available on the Hugging Face Model Hub for a number of configurations using this strategy.</p><p><a href="https://huggingface.co/collections/NeuML/bert-hash-nano-models-68e4101a43f17f06ef5198aa">BERT Hash Nano Models - a NeuML Collection</a></p><p>These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025) as recommended in the paper <a href="https://arxiv.org/abs/1908.08962">Well-Read Students Learn Better: On the Importance of Pre-training Compact Models</a>.</p><p>Below is a subset of GLUE scores for these models.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/849/1*HHPVwIrcxYcLRvjDpwLQyQ.png" /></figure><p>Note that the <em>nano</em> model has a small drop off from the original BERT Tiny model but remember it’s only 22% of the size.</p><p>The training scripts are available on the model pages. The process is extremely straightforward. See <a href="https://neuml.hashnode.dev/train-a-language-model-from-scratch">this article</a> for more on training language models from scratch.</p><p>ColBERT models are also trained on top of these models. See this collection for more.</p><p><a href="https://huggingface.co/collections/NeuML/colbert-68cb248ce424a6d6d8277451">ColBERT - a NeuML Collection</a></p><h3>Ideas for Future Work</h3><p>These models were trained on the standard BERT training dataset. But we can take advantage of all the great open datasets being released for LLM training.</p><p>The dataset below, FineWeb, could be a great place to start for a general model.</p><p><a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb">HuggingFaceFW/fineweb · Datasets at Hugging Face</a></p><p>There is also a version of this with domain labels. 
This would enable training a tiny domain-specific model on, say, the medical or sports domain.</p><p><a href="https://huggingface.co/datasets/m-a-p/FineFineWeb">m-a-p/FineFineWeb · Datasets at Hugging Face</a></p><p>While the method described here is for an encoder model, the same idea could be explored for decoder / generative models. Perhaps it could be combined with the ideas from this paper to build a tiny reasoning LLM.</p><p><a href="https://arxiv.org/abs/2510.04871v1">Less is More: Recursive Reasoning with Tiny Networks</a></p><h3>Wrapping Up</h3><p>This article introduced the BERT Hash model series. It covers the often under-explored area of small models and getting more for less.</p><p>Let’s see what you can do with it!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b744aa7eb931" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/training-tiny-language-models-with-token-hashing-b744aa7eb931">Training Tiny Language Models with Token Hashing</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What’s new in txtai 9.0]]></title>
            <link>https://medium.com/neuml/whats-new-in-txtai-9-0-d522bb150afa?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/d522bb150afa</guid>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[vector-database]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[agents]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Thu, 28 Aug 2025 17:16:17 GMT</pubDate>
            <atom:updated>2025-08-28T17:16:17.008Z</atom:updated>
            <content:encoded><![CDATA[<h4>SPLADE, ColBERT, MUVERA and Reranking pipelines</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*itzB2oPDkMj509N9b4f09Q.png" /></figure><p><a href="https://github.com/neuml/txtai">txtai</a> is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.</p><p>The 9.0 release adds first-class support for sparse vector models (e.g. <a href="https://en.wikipedia.org/wiki/Learned_sparse_retrieval">SPLADE</a>), late interaction models (e.g. <a href="https://huggingface.co/colbert-ir/colbertv2.0">ColBERT</a>), fixed dimensional encoding (i.e. <a href="https://arxiv.org/abs/2405.19504">MUVERA</a>) and reranking pipelines ✨</p><p>The embeddings framework was overhauled to seamlessly support both sparse and dense vector models. Previously, sparse vector support was limited to keyword/term indexes. Now learned sparse retrieval models such as SPLADE are supported. These models can help improve the accuracy of retrieval/search operations, which also improves RAG and Agents.</p><p>Support for late interaction models, such as ColBERT, was also added to the embeddings framework. Unlike traditional vector models that pool outputs into a single vector, late interaction models produce multiple vectors. These models are paired with the MUVERA algorithm to transform multiple vectors into fixed dimensional single vectors for search.</p><p>LLMs are quickly converging to produce similar outputs for similar inputs and becoming standard commodities. The retrieval or context layer makes or breaks projects. 
This is known as putting the R in RAG!</p><p><strong>Standard upgrade disclaimer below</strong></p><p>While everything is backwards compatible, it’s prudent to back up production indexes before upgrading and test before deploying.</p><h3>Install dependencies</h3><p>Install txtai and all dependencies.</p><pre>pip install txtai[ann,vectors]</pre><h3>Sparse vector indexes</h3><p>The first major change added with this release is support for learned sparse retrieval models (aka sparse vector indexes). This effort was multi-faceted in that it required changes both to how vectors are generated and to how they are stored.</p><p>txtai uses approximate nearest neighbor (ANN) search for its vector search operations. The default library is <a href="https://github.com/facebookresearch/faiss">Faiss</a>. There is support for other libraries but in all cases the existing ANN backends only supported dense (i.e. NumPy) vectors.</p><p>There aren’t many options out there for sparse ANN search that support txtai requirements, so IVFSparse was introduced. IVFSparse is an inverted file (IVF) index with flat vector file storage and sparse array support. 
There is also support for storing sparse vectors in Postgres via <a href="https://github.com/pgvector/pgvector">pgvector</a>.</p><p>Let’s see it in action.</p><pre>from txtai import Embeddings<br><br># Works with a list, dataset or generator<br>data = [<br>  &quot;US tops 5 million confirmed virus cases&quot;,<br>  &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &quot;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&quot;,<br>  &quot;The National Park Service warns against sacrificing slower friends in a bear attack&quot;,<br>  &quot;Maine man wins $1M from $25 lottery ticket&quot;,<br>  &quot;Make huge profits without work, earn up to $100,000 a day&quot;<br>]<br># Create an embeddings<br>embeddings = Embeddings(sparse=True, content=True)<br>embeddings.index(data)<br>embeddings.search(&quot;North America&quot;, 10)</pre><pre>[{&#39;id&#39;: &#39;0&#39;,<br>  &#39;text&#39;: &#39;US tops 5 million confirmed virus cases&#39;,<br>  &#39;score&#39;: 0.019873601198196412},<br> {&#39;id&#39;: &#39;1&#39;,<br>  &#39;text&#39;: &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &#39;score&#39;: 0.018737798929214476}]</pre><h3>Late interaction models</h3><p>Late interaction models encode data into multi-vector outputs. In other words, multiple input tokens map to multiple output vectors. Then at search time, the maximum similarity algorithm is used to find the best matches between the corpus and a query. This algorithm has achieved excellent results on retrieval benchmarks such as <a href="https://github.com/embeddings-benchmark/mteb">MTEB</a>.</p><p>The downside of this approach is that it produces multiple vectors as opposed to a single vector for each input. 
For example, if a text element tokenizes to many input tokens, there will be many output vectors vs a single one as with standard pooled vector approaches.</p><p>Starting with the 9.0 release, late interaction models are supported with embeddings instances. Late interaction vectors will be transformed into fixed dimensional vectors using the MUVERA algorithm. See below.</p><pre>from txtai import Embeddings<br><br># Works with a list, dataset or generator<br>data = [<br>  &quot;US tops 5 million confirmed virus cases&quot;,<br>  &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &quot;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&quot;,<br>  &quot;The National Park Service warns against sacrificing slower friends in a bear attack&quot;,<br>  &quot;Maine man wins $1M from $25 lottery ticket&quot;,<br>  &quot;Make huge profits without work, earn up to $100,000 a day&quot;<br>]<br># Create an embeddings<br>embeddings = Embeddings(path=&quot;colbert-ir/colbertv2.0&quot;, content=True)<br>embeddings.index(data)<br>embeddings.search(&quot;North America&quot;, 10)</pre><pre>[{&#39;id&#39;: &#39;0&#39;,<br>  &#39;text&#39;: &#39;US tops 5 million confirmed virus cases&#39;,<br>  &#39;score&#39;: 0.04216160625219345},<br> {&#39;id&#39;: &#39;1&#39;,<br>  &#39;text&#39;: &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &#39;score&#39;: 0.029944246634840965},<br> {&#39;id&#39;: &#39;3&#39;,<br>  &#39;text&#39;: &#39;The National Park Service warns against sacrificing slower friends in a bear attack&#39;,<br>  &#39;score&#39;: 0.015931561589241028}]</pre><h3>Reranking pipeline</h3><p>Another major new component in this release is the Reranker pipeline. This pipeline takes an embeddings instance, a similarity instance and uses the similarity instance to rerank outputs. 
This is a key component of the MUVERA paper: using the standard vector index to retrieve candidates, then reranking the outputs using the late interaction model.</p><pre>from txtai import Embeddings<br>from txtai.pipeline import Reranker, Similarity<br><br># Works with a list, dataset or generator<br>data = [<br>  &quot;US tops 5 million confirmed virus cases&quot;,<br>  &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &quot;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&quot;,<br>  &quot;The National Park Service warns against sacrificing slower friends in a bear attack&quot;,<br>  &quot;Maine man wins $1M from $25 lottery ticket&quot;,<br>  &quot;Make huge profits without work, earn up to $100,000 a day&quot;<br>]<br># Create an embeddings<br>embeddings = Embeddings(path=&quot;colbert-ir/colbertv2.0&quot;, content=True)<br>embeddings.index(data)<br>similarity = Similarity(path=&quot;colbert-ir/colbertv2.0&quot;, lateencode=True)<br>ranker = Reranker(embeddings, similarity)<br>ranker(&quot;North America&quot;)</pre><pre>[{&#39;id&#39;: &#39;1&#39;,<br>  &#39;text&#39;: &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &#39;score&#39;: 0.3324427008628845},<br> {&#39;id&#39;: &#39;0&#39;,<br>  &#39;text&#39;: &#39;US tops 5 million confirmed virus cases&#39;,<br>  &#39;score&#39;: 0.24423550069332123},<br> {&#39;id&#39;: &#39;3&#39;,<br>  &#39;text&#39;: &#39;The National Park Service warns against sacrificing slower friends in a bear attack&#39;,<br>  &#39;score&#39;: 0.16353240609169006}]</pre><p>Notice that while the outputs are the same, the scoring and order are different.</p><p>Let’s try a more interesting example.</p><pre>from txtai import Embeddings<br>from txtai.pipeline import Reranker, Similarity<br><br># Create an embeddings<br>embeddings = 
Embeddings()<br>embeddings.load(provider=&quot;huggingface-hub&quot;, container=&quot;neuml/txtai-wikipedia&quot;)<br>similarity = Similarity(path=&quot;colbert-ir/colbertv2.0&quot;, lateencode=True)<br>ranker = Reranker(embeddings, similarity)<br>ranker(&quot;Tell me about ChatGPT&quot;)</pre><pre>[{&#39;id&#39;: &#39;ChatGPT&#39;,<br>  &#39;text&#39;: &#39;ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and released on November 30, 2022. It uses large language models (LLMs) such as GPT-4o as well as other multimodal models to create human-like responses in text, speech, and images. It has access to features such as searching the web, using apps, and running programs. It is credited with accelerating the AI boom, an ongoing period of rapid investment in and public attention to the field of artificial intelligence (AI). Some observers have raised concern about the potential of ChatGPT and similar programs to displace human intelligence, enable plagiarism, or fuel misinformation.&#39;,<br>  &#39;score&#39;: 0.6639302968978882},<br> {&#39;id&#39;: &#39;ChatGPT Search&#39;,<br>  &#39;text&#39;: &#39;ChatGPT Search (originally SearchGPT) is a search engine developed by OpenAI. It combines traditional search engine features with generative pretrained transformers (GPT) to generate responses, including citations to external websites.&#39;,<br>  &#39;score&#39;: 0.6477508544921875},<br> {&#39;id&#39;: &#39;ChatGPT in education&#39;,<br>  &#39;text&#39;: &#39;The usage of ChatGPT in education has sparked considerable debate and exploration. ChatGPT is a chatbot based on large language models (LLMs) that was released by OpenAI in November 2022.&#39;,<br>  &#39;score&#39;: 0.5918337106704712}]</pre><h3>Wrapping up</h3><p>This article gave a quick overview of txtai 9.0. Updated documentation and more examples will be forthcoming. 
There is much to cover and much to build on!</p><p>See the following links for more information.</p><ul><li><a href="https://github.com/neuml/txtai/releases/tag/v9.0.0">9.0 Release on GitHub</a></li><li><a href="https://neuml.github.io/txtai">Documentation site</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d522bb150afa" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/whats-new-in-txtai-9-0-d522bb150afa">💡 What’s new in txtai 9.0</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Introducing txtai, the all-in-one AI framework]]></title>
            <link>https://medium.com/neuml/introducing-txtai-the-all-in-one-ai-framework-0660ecfc39d7?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/0660ecfc39d7</guid>
            <category><![CDATA[vector-database]]></category>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[agents]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Wed, 23 Apr 2025 12:25:02 GMT</pubDate>
            <atom:updated>2025-12-15T18:22:46.644Z</atom:updated>
            <content:encoded><![CDATA[<h4>Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FKTJPlgKN9BSGv30BmYbZw.png" /></figure><p><em>This is an updated version of the </em><a href="https://medium.com/neuml/introducing-txtai-the-all-in-one-embeddings-database-c721f4ff91ad"><em>original article</em></a><em>.</em></p><p>AI is rapidly evolving with a number of new developments. Large-scale generative language models are an exciting new capability allowing us to add amazing functionality. Innovation continues with new models and advancements coming in on what seems like a weekly basis.</p><p>It’s hard to filter through the noise and know what is realistic today. While we’re not yet at full AI automation, there are plenty of ways to integrate AI into business workflows.</p><p>This article introduces txtai, an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.</p><h3>Introducing txtai</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cBxmgtJEZsgk2sbUAYyMOw.png" /></figure><p><a href="https://github.com/neuml/txtai">txtai</a> is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.</p><p>The key component of txtai is an embeddings database, which is a union of vector indexes (sparse and dense), graph networks and relational databases.</p><p>This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.</p><p>Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more.</p><p>The following is a summary of key features:</p><ul><li>🔎 Vector search with SQL, object storage, topic modeling, graph analysis and multimodal indexing</li><li>📄 Create embeddings for text, documents, audio, images and video</li><li>💡 Pipelines powered by language 
models that run LLM prompts, question-answering, labeling, transcription, translation, summarization and more</li><li>↪️️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.</li><li>🤖 Agents that intelligently connect embeddings, pipelines, workflows and other agents together to autonomously solve complex problems</li><li>⚙️ Web and Model Context Protocol (MCP) APIs. Bindings available for <a href="https://github.com/neuml/txtai.js">JavaScript</a>, <a href="https://github.com/neuml/txtai.java">Java</a>, <a href="https://github.com/neuml/txtai.rs">Rust</a> and <a href="https://github.com/neuml/txtai.go">Go</a>.</li><li>🔋 Batteries included with defaults to get up and running fast</li><li>☁️ Run locally or scale out with container orchestration</li></ul><p>txtai is built with Python 3.10+, <a href="https://github.com/huggingface/transformers">Hugging Face Transformers</a>, <a href="https://github.com/UKPLab/sentence-transformers">Sentence Transformers</a> and <a href="https://github.com/tiangolo/fastapi">FastAPI</a>. txtai is open-source under an Apache 2.0 license.</p><blockquote><a href="https://neuml.com">NeuML</a> is the company behind txtai and we provide AI consulting services around our stack. <a href="https://cal.com/neuml/intro">Schedule a meeting</a> or <a href="mailto:info@neuml.com">send a message</a> to learn more.</blockquote><blockquote>We’re also building an easy and secure way to run hosted txtai applications with <a href="https://txtai.cloud">txtai.cloud</a>.</blockquote><h3>Install and run txtai</h3><p>txtai can be installed via <a href="https://neuml.github.io/txtai/install/">pip</a> or <a href="https://neuml.github.io/txtai/cloud/">Docker</a>. 
The following shows how to install via pip.</p><pre>pip install txtai</pre><h3>Semantic search</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p_gInlWA9OTHoet3LKr2wA.png" /></figure><p>Embeddings databases are the engine that delivers semantic search. Data is transformed into embeddings vectors where similar concepts will produce similar vectors. Indexes both large and small are built with these vectors. The indexes are used to find results that have the same meaning, not necessarily the same keywords.</p><p>The basic use case for an embeddings database is building an approximate nearest neighbor (ANN) index for semantic search. The following example indexes a small number of text entries to demonstrate the value of semantic search.</p><pre>from txtai import Embeddings<br><br># Works with a list, dataset or generator<br>data = [<br>  &quot;US tops 5 million confirmed virus cases&quot;,<br>  &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &quot;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&quot;,<br>  &quot;The National Park Service warns against sacrificing slower friends in a bear attack&quot;,<br>  &quot;Maine man wins $1M from $25 lottery ticket&quot;,<br>  &quot;Make huge profits without work, earn up to $100,000 a day&quot;<br>]<br><br># Create an embeddings<br>embeddings = Embeddings(path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;)</pre><pre># Create an index for the list of text<br>embeddings.index(data)<br><br>print(&quot;%-20s %s&quot; % (&quot;Query&quot;, &quot;Best Match&quot;))<br>print(&quot;-&quot; * 50)<br><br># Run an embeddings search for each query<br>for query in (&quot;feel good story&quot;, &quot;climate change&quot;, <br>    &quot;public health story&quot;, &quot;war&quot;, &quot;wildlife&quot;, &quot;asia&quot;,<br>    &quot;lucky&quot;, &quot;dishonest junk&quot;):<br>  # Extract uid of first result<br>  # search 
result format: (uid, score)<br>  uid = embeddings.search(query, 1)[0][0]<br><br>  # Print text<br>  print(&quot;%-20s %s&quot; % (query, data[uid]))</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*Q_jg8PA8WQktAehN.png" /></figure><p>The example above shows that for all of the queries, the query text isn’t in the data. This is the true power of transformers models over token based search. What you get out of the box is 🔥🔥🔥!</p><h3>Updates and deletes</h3><p>Updates and deletes are supported for embeddings. The upsert operation will insert new data and update existing data.</p><p>The following section runs a query, then updates a value changing the top result and finally deletes the updated value to revert to the original query results.</p><pre># Run initial query<br>uid = embeddings.search(&quot;feel good story&quot;, 1)[0][0]<br>print(&quot;Initial: &quot;, data[uid])<br><br># Create a copy of data to modify<br>udata = data.copy()<br><br># Update data<br>udata[0] = &quot;See it: baby panda born&quot;<br>embeddings.upsert([(0, udata[0], None)])<br>uid = embeddings.search(&quot;feel good story&quot;, 1)[0][0]<br>print(&quot;After update: &quot;, udata[uid])<br><br># Remove record just added from index<br>embeddings.delete([0])<br><br># Ensure value matches previous value<br>uid = embeddings.search(&quot;feel good story&quot;, 1)[0][0]<br>print(&quot;After delete: &quot;, udata[uid])</pre><pre>Initial:  Maine man wins $1M from $25 lottery ticket<br>After update:  See it: baby panda born<br>After delete:  Maine man wins $1M from $25 lottery ticket</pre><h3>Persistence</h3><p>Embeddings can be saved to storage and reloaded.</p><pre>embeddings.save(&quot;index&quot;)<br><br>embeddings = Embeddings()<br>embeddings.load(&quot;index&quot;)<br><br>uid = embeddings.search(&quot;climate change&quot;, 1)[0][0]<br>print(data[uid])</pre><pre>Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a<br>Manhattan-sized 
iceberg</pre><h3>Hybrid search</h3><p>While dense vector indexes are by far the best option for semantic search systems, sparse keyword indexes can still add value. There may be cases where finding an exact match is important.</p><p>Hybrid search combines the results from sparse and dense vector indexes for the best of both worlds.</p><pre># Create an embeddings<br>embeddings = Embeddings(<br>  hybrid=True,<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;<br>)<br><br># Create an index for the list of text<br>embeddings.index(data)<br><br>print(&quot;%-20s %s&quot; % (&quot;Query&quot;, &quot;Best Match&quot;))<br>print(&quot;-&quot; * 50)<br><br># Run an embeddings search for each query<br>for query in (&quot;feel good story&quot;, &quot;climate change&quot;, <br>    &quot;public health story&quot;, &quot;war&quot;, &quot;wildlife&quot;, &quot;asia&quot;,<br>    &quot;lucky&quot;, &quot;dishonest junk&quot;):<br>  # Extract uid of first result<br>  # search result format: (uid, score)<br>  uid = embeddings.search(query, 1)[0][0]<br><br>  # Print text<br>  print(&quot;%-20s %s&quot; % (query, data[uid]))</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*Q_jg8PA8WQktAehN.png" /></figure><p>Same results as with semantic search. Let’s run the same example with just a keyword index to view those results.</p><pre># Create an embeddings<br>embeddings = Embeddings(keyword=True)<br><br># Create an index for the list of text<br>embeddings.index(data)<br><br>print(embeddings.search(&quot;feel good story&quot;))<br>print(embeddings.search(&quot;lottery&quot;))</pre><pre>[]<br>[(4, 0.5234998733628726)]</pre><p>See that when the embeddings instance only uses a keyword index, it can’t find semantic matches, only keyword matches.</p><h3>Content storage</h3><p>Up to this point, all the examples are referencing the original data array to retrieve the input text. This works fine for a demo but what if you have millions of documents? 
In this case, the text needs to be retrieved from an external datastore using the id.</p><p>Content storage adds an associated database (e.g. SQLite, DuckDB) that stores associated metadata with the vector index. The document text, additional metadata and additional objects can be stored and retrieved right alongside the indexed vectors.</p><pre># Create embeddings with content enabled.<br># The default behavior is to only store indexed vectors.<br>embeddings = Embeddings(<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;,<br>  content=True,<br>  objects=True<br>)<br><br># Create an index for the list of text<br>embeddings.index(data)<br><br>print(embeddings.search(&quot;feel good story&quot;, 1)[0][&quot;text&quot;])</pre><pre>Maine man wins $1M from $25 lottery ticket</pre><p>The key change above is setting the <em>content</em> flag to True (the <em>objects</em> flag is used later for object storage). This enables storing text and metadata content (if provided) alongside the index. Note how the text is pulled right from the query result!</p><p>Let’s add some metadata.</p><h3>Query with SQL</h3><p>When content is enabled, the entire dictionary is stored and can be queried. In addition to vector queries, txtai accepts SQL queries. 
This enables combined queries using both a vector index and content stored in a database backend.</p><pre># Create an index for the list of text<br>embeddings.index([{&quot;text&quot;: text, &quot;length&quot;: len(text)} for text in data])<br><br># Filter by score<br>print(embeddings.search(&quot;select text, score from txtai where similar(&#39;hiking danger&#39;) and score &gt;= 0.15&quot;))<br><br># Filter by metadata field &#39;length&#39;<br>print(embeddings.search(&quot;select text, length, score from txtai where similar(&#39;feel good story&#39;) and score &gt;= 0.05 and length &gt;= 40&quot;))<br><br># Run aggregate queries<br>print(embeddings.search(&quot;select count(*), min(length), max(length), sum(length) from txtai&quot;))</pre><pre>[{&#39;text&#39;: &#39;The National Park Service warns against sacrificing slower friends in a bear attack&#39;, &#39;score&#39;: 0.3151373863220215}]<br>[{&#39;text&#39;: &#39;Maine man wins $1M from $25 lottery ticket&#39;, &#39;length&#39;: 42, &#39;score&#39;: 0.08329027891159058}]<br>[{&#39;count(*)&#39;: 6, &#39;min(length)&#39;: 39, &#39;max(length)&#39;: 94, &#39;sum(length)&#39;: 387}]</pre><p>The example above adds a simple additional field, text length.</p><p>Note the second query is filtering on the metadata field length along with a similar query clause. This gives a great blend of vector search with traditional filtering to help identify the best results.</p><h3>Object storage</h3><p>In addition to metadata, binary content can also be associated with documents. 
The example below downloads an image and upserts it along with associated text into the embeddings index.</p><pre>import urllib.request<br><br>from IPython.display import Image<br><br># Get an image<br>request = urllib.request.urlopen(&quot;https://raw.githubusercontent.com/neuml/txtai/master/demo.gif&quot;)<br><br># Upsert new record having both text and an object<br>embeddings.upsert([(&quot;txtai&quot;, {&quot;text&quot;: &quot;txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.&quot;, &quot;object&quot;: request.read()}, None)])<br><br># Query txtai for the most similar result to &quot;machine learning&quot; and get associated object<br>result = embeddings.search(&quot;select object from txtai where similar(&#39;machine learning&#39;) limit 1&quot;)[0][&quot;object&quot;]<br><br># Display image<br>Image(result.getvalue(), width=600)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*K4Lxy5G5uwOKG9Ep.gif" /></figure><h3>Topic modeling</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6zQ9_AO4Vh5fqeFEXrzu_A.png" /></figure><p>Topic modeling is enabled via semantic graphs. Semantic graphs, also known as knowledge graphs or semantic networks, build a graph network with semantic relationships connecting the nodes. 
In txtai, they can take advantage of the relationships inherently learned within an embeddings index.</p><pre># Create embeddings with a graph index<br>embeddings = Embeddings(<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;,<br>  content=True,<br>  functions=[<br>    {&quot;name&quot;: &quot;graph&quot;, &quot;function&quot;: &quot;graph.attribute&quot;},<br>  ],<br>  expressions=[<br>    {&quot;name&quot;: &quot;category&quot;, &quot;expression&quot;: &quot;graph(indexid, &#39;category&#39;)&quot;},<br>    {&quot;name&quot;: &quot;topic&quot;, &quot;expression&quot;: &quot;graph(indexid, &#39;topic&#39;)&quot;},<br>  ],<br>  graph={<br>    &quot;topics&quot;: {<br>      &quot;categories&quot;: [&quot;health&quot;, &quot;climate&quot;, &quot;finance&quot;, &quot;world politics&quot;]<br>    }<br>  }<br>)<br><br>embeddings.index(data)<br>embeddings.search(&quot;select topic, category, text from txtai&quot;)</pre><pre>[{&#39;topic&#39;: &#39;confirmed_cases_us_5&#39;,<br>  &#39;category&#39;: &#39;health&#39;,<br>  &#39;text&#39;: &#39;US tops 5 million confirmed virus cases&#39;},<br> {&#39;topic&#39;: &#39;collapsed_iceberg_ice_intact&#39;,<br>  &#39;category&#39;: &#39;climate&#39;,<br>  &#39;text&#39;: &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;},<br> {&#39;topic&#39;: &#39;beijing_along_craft_tensions&#39;,<br>  &#39;category&#39;: &#39;world politics&#39;,<br>  &#39;text&#39;: &#39;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&#39;}]</pre><p>When a graph index is enabled, topics are assigned to each of the entries in the embeddings instance. 
Topics are dynamically created using a sparse index over graph nodes grouped by <a href="https://en.wikipedia.org/wiki/Community_structure">community detection algorithms</a>.</p><p>Topic categories are also derived as shown above.</p><h3>Subindexes</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/925/1*Lu6ZJpKBoaoI-njFgFW-og.png" /></figure><p>Subindexes can be configured for an embeddings instance. A single embeddings instance can have multiple subindexes each with different configurations.</p><p>We’ll build an embeddings index having both a keyword and dense index to demonstrate.</p><pre># Create embeddings with subindexes<br>embeddings = Embeddings(<br>  content=True,<br>  defaults=False,<br>  indexes={<br>    &quot;keyword&quot;: {<br>      &quot;keyword&quot;: True<br>    },<br>    &quot;dense&quot;: {<br>      &quot;path&quot;: &quot;sentence-transformers/nli-mpnet-base-v2&quot;<br>    }<br>  }<br>)<br>embeddings.index(data)</pre><pre>embeddings.search(&quot;feel good story&quot;, limit=1, index=&quot;keyword&quot;)</pre><pre>[]</pre><pre>embeddings.search(&quot;feel good story&quot;, limit=1, index=&quot;dense&quot;)</pre><pre>[{&#39;id&#39;: &#39;4&#39;,<br>  &#39;text&#39;: &#39;Maine man wins $1M from $25 lottery ticket&#39;,<br>  &#39;score&#39;: 0.08329027891159058}]</pre><p>Once again, this example demonstrates the difference between keyword and semantic search. The first search call uses the defined keyword index, the second uses the dense vector index.</p><h3>LLM orchestration</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/911/1*QqGBVPHZirxvBEYy5-qoSg.png" /></figure><p>txtai is an all-in-one AI framework. txtai supports building autonomous agents, retrieval augmented generation (RAG), chat with your data, pipelines and workflows that interface with large language models (LLMs).</p><p>The <a href="https://neuml.github.io/txtai/pipeline/llm/rag/">RAG pipeline</a> is txtai’s spin on retrieval augmented generation (RAG). 
This pipeline extracts knowledge from content by joining a prompt, context data store and generative model together.</p><p>The following example shows how a large language model (LLM) can use an embeddings database for context.</p><pre>from txtai import Embeddings, RAG<br><br># Create embeddings<br>embeddings = Embeddings(path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;, content=True, autoid=&quot;uuid5&quot;)<br><br># Create an index for the list of text<br>embeddings.index(data)<br><br># RAG Prompt Template<br>template = &quot;&quot;&quot;<br>  Answer the following question using the provided context.<br><br>  Question:<br>  {question}<br><br>  Context:<br>  {context}<br>&quot;&quot;&quot;<br><br># Create and run RAG instance<br>rag = RAG(embeddings, &quot;Qwen/Qwen3-0.6B&quot;, template=template, output=&quot;reference&quot;)<br>rag(&quot;What country is having issues with climate change?&quot;)</pre><pre>{&#39;answer&#39;: &#39;Canada is having issues with climate change.&#39;,<br> &#39;reference&#39;: &#39;da633124-33ff-58d6-8ecb-14f7a44c042a&#39;}</pre><p>The logic above first builds an embeddings index. It then loads an LLM and uses the embeddings index to drive an LLM prompt.</p><p>The RAG pipeline can optionally return a reference to the id of the best matching record with the answer. That id can be used to resolve the full answer reference. 
Note that the embeddings above used a <a href="https://neuml.github.io/txtai/embeddings/configuration/general/#autoid">uuid autosequence</a>.</p><pre>uid = rag(&quot;What country is having issues with climate change?&quot;)[&quot;reference&quot;]<br>embeddings.search(f&quot;select id, text from txtai where id = &#39;{uid}&#39;&quot;)</pre><pre>[{&#39;id&#39;: &#39;da633124-33ff-58d6-8ecb-14f7a44c042a&#39;,<br>  &#39;text&#39;: &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;}]</pre><p>LLM inference can also be run standalone.</p><pre>from txtai import LLM<br><br># Default LLM is granite-4.0-350m<br># Supports any LLM (Hugging Face, llama.cpp, Ollama, vLLM, OpenAI, Claude etc)<br># See https://neuml.github.io/txtai/pipeline/llm/llm<br>llm = LLM()<br>llm(&quot;Say the name of 1 place to visit in Washington, DC&quot;)</pre><pre>One of the most popular and iconic places to visit in Washington, DC is the National Mall.</pre><h3>Language model workflows</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p8tO__oIOutrVd3n0oV17A.png" /></figure><p>Language model workflows, also known as semantic workflows, connect language models together to build intelligent applications.</p><p>Workflows can run right alongside an embeddings instance, similar to a stored procedure in a relational database. Workflows can be written in either Python or YAML. 
We’ll demonstrate how to write a workflow with YAML.</p><pre># Embeddings instance<br>writable: true<br>embeddings:<br>  path: sentence-transformers/nli-mpnet-base-v2<br>  content: true<br>  functions:<br>    - {name: translation, argcount: 2, function: translation}<br><br># Translation pipeline<br>translation:<br><br># Workflow definitions<br>workflow:<br>  search:<br>    tasks:<br>      - search<br>      - action: translation<br>        args:<br>          target: fr<br>        task: template<br>        template: &quot;{text}&quot;</pre><p>The workflow above loads an embeddings index and defines a search workflow. The search workflow runs a search and then passes the results to a translation pipeline. The translation pipeline translates results to French.</p><pre>from txtai import Application<br><br># Build index<br>app = Application(&quot;embeddings.yml&quot;)<br>app.add(data)<br>app.index()<br><br># Run workflow<br>list(app.workflow(<br>  &quot;search&quot;, <br>  [&quot;select text from txtai where similar(&#39;feel good story&#39;) limit 1&quot;]<br>))</pre><pre>[&#39;Maine homme gagne $1M à partir de $25 billet de loterie&#39;]</pre><p>SQL functions, in some cases, can accomplish the same thing as a workflow. The function below runs the translation pipeline as a function.</p><pre>app.search(&quot;select translation(text, &#39;fr&#39;) text from txtai where similar(&#39;feel good story&#39;) limit 1&quot;)</pre><pre>[{&#39;text&#39;: &#39;Maine homme gagne $1M à partir de $25 billet de loterie&#39;}]</pre><p>LLM chains with templates are also possible with workflows. Workflows are self-contained, they operate both with and without an associated embeddings instance. 
The following workflow uses a LLM to conditionally translate text to French and then detect the language of the text.</p><pre>llm:<br>  path: Qwen/Qwen3-4B-Instruct-2507<br><br>workflow:<br>  chain:<br>    tasks:<br>      - task: template<br>        template: Translate text &#39;{statement}&#39; to {language} if the text is English, otherwise keep the original text<br>        action: llm<br>      - task: template<br>        template: What language is the following text. Only print the answer? {text}<br>        action: llm</pre><pre>inputs = [<br>  {&quot;statement&quot;: &quot;Hello, how are you&quot;, &quot;language&quot;: &quot;French&quot;},<br>  {&quot;statement&quot;: &quot;Hallo, wie geht&#39;s dir&quot;, &quot;language&quot;: &quot;French&quot;}<br>]</pre><pre>app = Application(&quot;workflow.yml&quot;)<br>list(app.workflow(&quot;chain&quot;, inputs))</pre><pre>[&#39;French&#39;, &#39;German&#39;]</pre><h3>Wrapping up</h3><p>AI is advancing at a rapid pace. Things not possible even a year ago are now possible. This article introduced txtai, an all-in-one AI framework. The possibilities are limitless and we’re excited to see what can be built on top of txtai!</p><p>Visit the links below for more.</p><p><a href="https://github.com/neuml/txtai">GitHub</a> | <a href="https://neuml.github.io/txtai">Documentation</a> | <a href="https://neuml.github.io/txtai/examples/">Examples</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=0660ecfc39d7" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/introducing-txtai-the-all-in-one-ai-framework-0660ecfc39d7">Introducing txtai, the all-in-one AI framework</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to succeed with AI Agents — it starts with your data]]></title>
            <link>https://medium.com/neuml/ai-agents-how-to-be-successful-e8087b35f90d?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/e8087b35f90d</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[agents]]></category>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Fri, 18 Apr 2025 17:03:24 GMT</pubDate>
            <atom:updated>2025-04-18T17:14:35.073Z</atom:updated>
            <content:encoded><![CDATA[<h4>Let’s talk about the 🔥 topic, what’s next and how to win in 2025</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rMREysq2b1d1UcEQONo25A.png" /></figure><p>We’re about a third of the way through 2025 (at the time of this article). One topic we can’t hide from is “AI Agents”. It’s an overused term that means many things to many people.</p><p>For the purpose of this article, we’ll define an AI Agent as:</p><ul><li>Connects to a Large Language Model (LLM)</li><li>Has access to a list of tools and/or other agents</li><li>Breaks a task into an iterative series of steps, looping until completion</li></ul><p>Anyone can come up with a simple demo that shows how to build a trip booking assistant, a web research agent or a coding assistant. The real power for enterprise businesses is the ability to connect an agent with internal data and knowledge.</p><p>In this article, we’ll step through the main components of an AI Agent, talk about where this is all heading and how to be successful.</p><h3>Large Language Models (LLMs)</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/781/1*Gg4S7CgKUw-5SYaKJckoUg.png" /></figure><p>LLMs are the engine behind most AI Agents. The LLM takes a generated prompt with data, tools and a user query then generates actions.</p><p>An AI Agent shouldn’t be hardwired to a specific LLM. It should be easy to switch between LLM providers. An AI Agent shouldn’t care if the LLM is local or run via an API.</p><p>So while it’s easy to just use the OpenAI Python library, it’s best to use a provider that abstracts LLM inference. 
Examples of this are <a href="https://github.com/neuml/txtai">txtai</a> (a library built by NeuML), <a href="https://www.langchain.com/">LangChain</a>, <a href="https://www.llamaindex.ai/">Llama Index</a> and <a href="https://www.litellm.ai/">LiteLLM</a>.</p><p>In the case of txtai agents, the LLM provider is automatically inferred.</p><pre>from txtai import Agent<br><br># Local Transformers model<br>agent = Agent(model=&quot;meta-llama/Meta-Llama-3.1-8B-Instruct&quot;)<br><br># LLM APIs - must also set API key via environment variable<br>agent = Agent(model=&quot;gpt-4o&quot;)<br>agent = Agent(model=&quot;claude-3-5-sonnet-20240620&quot;)</pre><h3>Access to tools and data</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/910/1*fLSmBgUvZuIp77wwvOzujQ.png" /></figure><p>In the world of AI Agents, tools are how external knowledge and data are integrated. A tool could be a web search, vector search, API call or another agent.</p><p>For an enterprise business, this is by far the most crucial step of the process. Internal data needs to be made accessible to the AI Agent. There are a number of ways to do this. One successful pattern is building compendiums of knowledge that help the AI Agent access the right data more quickly.</p><p>With txtai, internal functions and embeddings databases are supported. Services available via the <a href="https://modelcontextprotocol.io/introduction">Model Context Protocol (MCP)</a> are also supported.</p><p>What exactly is a compendium of knowledge? 
Here are a few examples:</p><ul><li><a href="https://huggingface.co/NeuML/txtai-wikipedia">NeuML/txtai-wikipedia · Hugging Face</a></li><li><a href="https://huggingface.co/NeuML/txtai-arxiv">NeuML/txtai-arxiv · Hugging Face</a></li><li><a href="https://huggingface.co/NeuML/txtai-neuml-linkedin">NeuML/txtai-neuml-linkedin · Hugging Face</a></li><li><a href="https://huggingface.co/NeuML/txtai-astronomy">NeuML/txtai-astronomy · Hugging Face</a></li></ul><p>Each of these datasets is preprocessed and summarized. For example, the Wikipedia embeddings database stores knowledge-dense abstracts for every Wikipedia article. The same goes for the ArXiv dataset. The LinkedIn dataset is a list of all NeuML LinkedIn posts and the Astronomy dataset is a curated dataset joining astronomy datasets together.</p><p>The goal here is to make it easier for the Agent to reach a final answer. This reduces the overall latency and cost as it lowers the number of LLM calls.</p><h3>Where is this all heading?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/811/1*4IN9CcqHZuDBJfZcua05bg.png" /></figure><p>The name of the game is augmentation. The end goal of AI Agents is automating enterprise tasks and augmenting humans.</p><p>Many talk about AI Agents as replacements for humans. That conversation is a moral and ethical one. But at the time of this article, we’re still a ways away from that even for those who want it.</p><p>In 2025, there are a number of tangible and realistic ways AI Agents can help your business.</p><ul><li><em>Augmented data analysis</em> — Enable your analysts to go through and understand data more quickly.</li><li><em>Research Agents</em> — Domain-specific agents that can provide an initial automated analysis of datasets. 
Example domains include medical, scientific, legal and national security.</li><li><em>Software issue triage </em>— Task agents to analyze software bug reports and provide an initial triage.</li></ul><p>These are all examples of augmenting your workforce. It’s about taking an individual, giving them new capabilities and freeing them up to focus on other matters and do more.</p><h3>How to be successful</h3><p>A successful strategy involves selecting the right tools, enabling Agents to have access to the best and most concise data and setting realistic expectations.</p><p>If one expects AI Agents to fully replace a development team, heartache is ahead. Coding assistants certainly can augment humans but there are a host of issues as of 2025 regarding the reliability, security and effectiveness of code generated by AI.</p><p>If one expects AI Agents to replace your team of data analysts, it likely doesn’t end well.</p><p>If one wants to use AI Agents to augment an existing team and perhaps enable a small team to punch above their weight class, now we’re talking. There is much to gain with a strategy like this. This strategy revolves around preparing your data for success with AI Agents and using the right toolkit.</p><p>There is much to do and many gains ahead. If you set yourself on the right path and have the right mindset, you’ll be ahead of the curve. Good luck!</p><p><em>Want help with your AI Strategy? Need development guidance and assistance with txtai? 
Then </em><a href="https://cal.com/neuml/intro"><em>book a meeting</em></a><em> or </em><a href="mailto:info@neuml.com"><em>email us</em></a><em> to hear more!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e8087b35f90d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/ai-agents-how-to-be-successful-e8087b35f90d">How to succeed with AI Agents — it starts with your data</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[NeuML — 2024 Year in Review]]></title>
            <link>https://medium.com/neuml/neuml-2024-year-in-review-d446deaf5390?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/d446deaf5390</guid>
            <category><![CDATA[year-in-review]]></category>
            <category><![CDATA[2024]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Wed, 01 Jan 2025 20:36:47 GMT</pubDate>
            <atom:updated>2025-01-01T20:36:47.138Z</atom:updated>
            <content:encoded><![CDATA[<h3>NeuML — 2024 Year in Review</h3><h4>Recapping 2024 and looking ahead to 2025</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*5slMdbkyCiWyyIdI.png" /></figure><p><a href="https://neuml.com">NeuML</a> is the company behind txtai, an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. We are building a suite of applications to bridge the gap between research and production.</p><p>NeuML continued to build on its strong open-source foundation in 2024. The majority of our focus throughout the year was on txtai and our consulting efforts. This article will recap the progress made in 2024 and look ahead to 2025.</p><h3>TxtAI</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*9CZnd9zxlez98nNF.png" /></figure><p><a href="https://github.com/neuml/txtai">txtai</a> is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. This is the foundational piece of software that all of our work stands on.</p><p>Highlights for txtai in 2024:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KwcNpnEg4f5WFX7HN5nHFA.png" /><figcaption><a href="https://star-history.com/#neuml/txtai&amp;Date">https://star-history.com/#neuml/txtai&amp;Date</a></figcaption></figure><ul><li>⭐<strong>4,043</strong> stars on GitHub to bring the total to ⭐<strong>9,833</strong></li><li><strong>284</strong> total commits on GitHub</li><li><strong>232</strong> total issues resolved on GitHub</li><li><strong>10</strong> releases. 
Entered the year at v6.2.0 and finished at v8.1.0</li><li><strong>16</strong> articles and example notebooks added</li></ul><p>Let’s recap the major functionality added.</p><h4>New in 2024</h4><p>In 2024, txtai had two major releases, 7.0 and 8.0.</p><p><a href="https://medium.com/neuml/whats-new-in-txtai-7-0-855ad6a55440">💡 What’s new in txtai 7.0</a></p><p>txtai 7.0 was released in <strong>February 2024</strong>. This release added GraphRAG, LoRA / QLoRA training support, an improved embeddings storage format when content is disabled and binary content support via the API.</p><p><a href="https://medium.com/neuml/whats-new-in-txtai-8-0-2d7d0ab4506b">💡 What’s new in txtai 8.0</a></p><p>txtai 8.0 was released in <strong>November 2024</strong>. This release added Agents and Model2Vec support.</p><p>Below is a list of all the major features added in 2024.</p><ul><li><a href="https://neuml.hashnode.dev/whats-new-in-txtai-80">Agents</a></li><li><a href="https://neuml.hashnode.dev/advanced-rag-with-graph-path-traversal">Graph RAG</a></li><li><a href="https://neuml.hashnode.dev/speech-to-speech-rag">Speech to Speech RAG</a></li><li><a href="https://neuml.hashnode.dev/integrate-txtai-with-postgres">PGVector and Postgres persistence</a></li><li><a href="https://neuml.hashnode.dev/rag-with-llamacpp-and-external-api-services">LLM backends for llama.cpp and LLM API services (e.g. OpenAI, Claude, AWS Bedrock etc)</a></li><li><a href="https://neuml.hashnode.dev/speech-to-speech-rag">Streaming LLM generation</a></li><li><a href="https://gist.github.com/davidmezzetti/235be648308f2f151d5224fc709c2da8">New Textractor integration with Docling and improved Markdown support</a></li><li><a href="https://neuml.hashnode.dev/whats-new-in-txtai-80#heading-vectorization-with-model2vec">Model2Vec, llama.cpp and LLM API embeddings vectorization</a></li></ul><p>It’s hard to believe none of this existed on Jan 1, 2024! 
Let’s briefly cover each of these new additions.</p><p><strong>Agents </strong><em>(</em><a href="https://github.com/neuml/txtai/releases/tag/v8.0.0"><em>Added in 8.0</em></a><em>)</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*UM3kk2pPD8sps_7I" /></figure><p>Agents automatically create workflows to answer multi-faceted user requests. Agents iteratively prompt and/or interface with tools to step through a process and ultimately come to an answer for a request.</p><p>The following example articles show how txtai uses agents to iteratively solve complex multi-hop problems.</p><ul><li><a href="https://neuml.hashnode.dev/analyzing-hugging-face-posts-with-graphs-and-agents">Analyzing Hugging Face Posts with Graphs and Agents</a></li><li><a href="https://neuml.hashnode.dev/granting-autonomy-to-agents">Granting autonomy to agents</a></li></ul><p><strong>GraphRAG </strong><em>(</em><a href="https://github.com/neuml/txtai/releases/tag/v7.0.0"><em>Added in 7.0</em></a><em>)</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*cK847Zy-OUFrrLA1" /></figure><p>Graph path traversal opens up a different type of RAG process. A standard RAG process typically runs a single vector search query and returns the closest matches. Those matches are then passed into a LLM prompt and used to limit the context and help ensure more factually correct answers are generated. Graphs enable more complex analysis.</p><p><a href="https://neuml.hashnode.dev/advanced-rag-with-graph-path-traversal">Advanced RAG with graph path traversal</a></p><p><strong>Speech to Speech RAG </strong><em>(</em><a href="https://github.com/neuml/txtai/releases/tag/v7.5.0"><em>Added in 7.5</em></a><em>)</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*q8bcFus4nTur-ES5" /></figure><p>A Speech to Speech (S2S) RAG workflow starts with a microphone pipeline, which streams and processes input audio. 
The microphone pipeline has voice activity detection (VAD) built-in. When speech is detected, the pipeline returns the captured audio data. Next, the speech is transcribed to text and then passed to a RAG pipeline prompt. Finally, the RAG result is run through a text to speech (TTS) pipeline and streamed to an output audio device.</p><p><a href="https://neuml.hashnode.dev/speech-to-speech-rag">Speech to Speech RAG</a></p><p><strong>PGVector and Postgres persistence </strong><em>(</em><a href="https://github.com/neuml/txtai/releases/tag/v7.2.0"><em>Added in 7.2</em></a><em>)</em></p><p>txtai can now integrate with <a href="https://www.postgresql.org/">Postgres</a>, a powerful, production-ready and open source object-relational database system. All major components can be stored in Postgres: vectors, content and graphs.</p><p><a href="https://neuml.hashnode.dev/integrate-txtai-with-postgres">Integrate txtai with Postgres</a></p><p><strong>LLM backends for llama.cpp and LLM API services</strong> (<a href="https://github.com/neuml/txtai/releases/tag/v6.3.0">Added in 6.3</a>)</p><p>txtai has been and always will be a local-first framework. It was originally designed to run models on local hardware using Hugging Face Transformers. As the AI space has evolved over the last year, so has txtai. Recent changes have added the ability to use these frameworks for vectorization and made it easier to use for LLM inference.</p><p><a href="https://neuml.hashnode.dev/rag-with-llamacpp-and-external-api-services">RAG with llama.cpp and external API services</a></p><p><strong>Streaming LLM generation</strong><em> (</em><a href="https://github.com/neuml/txtai/releases/tag/v7.3.0"><em>Added in 7.3</em></a><em>)</em></p><p>Prior to this change, all LLM inference calls had to fully wait for the entire LLM response. 
Streaming generation enables getting results token by token, which reduces the perceived response time to a user.</p><p>The Speech to Speech RAG workflow chains a number of streaming pipelines together. See below.</p><p><a href="https://neuml.hashnode.dev/speech-to-speech-rag#heading-s2s-workflow-in-yaml">Speech to Speech RAG</a></p><p><strong>Textractor integration with Docling </strong><em>(</em><a href="https://github.com/neuml/txtai/releases/tag/v8.1.0"><em>Added in 8.1</em></a><em>)</em></p><p>Up until v8.1, txtai only supported text extraction via <a href="https://tika.apache.org/">Apache Tika</a>. While Apache Tika is a battle-tested project, it depends on Java. This has proven to be problematic for some integrations. Additionally, it doesn’t have support for complex PDF elements such as tables.</p><p><a href="https://github.com/DS4SD/docling">Docling</a> is a new open-source text extraction library that gained popularity in late 2024. It has impressive support for complex PDFs (supports tables, formatting, sections).</p><p>See <a href="https://gist.github.com/davidmezzetti/235be648308f2f151d5224fc709c2da8">this link</a> for an in-depth review.</p><p><strong>Model2Vec </strong><em>(</em><a href="https://github.com/neuml/txtai/releases/tag/v8.0.0"><em>Added in 8.0</em></a><em>)</em></p><p><a href="https://github.com/MinishLab/model2vec">Model2Vec</a> is a technique to turn any sentence transformer into a really small static model, reducing model size by 15x and making the models up to 500x faster, with a small drop in performance.</p><p>We’ve long had a <a href="https://neuml.hashnode.dev/train-a-language-model-from-scratch">goal to build micromodels</a>. Model2Vec is one way we’re on this path. We’re also planning to release an exciting Model2Vec model in early 2025, based on work done in late 2024. 
Stay tuned!</p><p><a href="https://neuml.hashnode.dev/whats-new-in-txtai-80#heading-vectorization-with-model2vec">💡 What&#39;s new in txtai 8.0</a></p><p>Phew — that was a lot! Let’s now take a look at other projects.</p><h3>Other Projects</h3><p>In addition to txtai, a number of subprojects have been created over the years. The strategy with each of these projects is to build an initial implementation and support future work based on interest. This is an evolving target, some projects fade away and then even come back!</p><p>The following sections cover the major “other project” initiatives in 2024.</p><h4>RAG</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*lwHouz5BiyrBxt9o.png" /></figure><p>We introduced an <a href="https://github.com/neuml/rag">open-source Retrieval Augmented Generation (RAG) application</a> built on top of txtai. This application adds a front-end with Streamlit to txtai RAG. It supports Vector RAG and Graph RAG.</p><p>This application had <strong>9 releases</strong> in 2024 and earned <strong>309 ⭐’s</strong> on GitHub. More can be read in the article below.</p><p><a href="https://medium.com/neuml/introducing-rag-with-txtai-f3456977cf91">Introducing RAG with txtai</a></p><h4><strong>AnnotateAI</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*xoUo7ozO5i7lsDrM.png" /></figure><p><a href="https://github.com/neuml/annotateai">annotateai</a> automatically annotates papers using Large Language Models (LLMs).</p><p>A one-line call does the following:</p><ul><li>Reads the paper</li><li>Finds the title and important key concepts</li><li>Goes through each page and finds sections that best emphasize the key concepts</li><li>Reads the section and builds a concise short topic</li><li>Annotates the paper and highlights those sections</li></ul><p>annotateai incorporates <a href="https://github.com/neuml/txtmarker">txtmarker</a> to highlight PDFs. 
This is notable as 2024 brought the first new release of txtmarker since 2020! This illustrates the point that some projects fade away then come back.</p><p>annotateai had <strong>2 releases</strong> in 2024 and earned <strong>206 ⭐’s</strong> on GitHub. More can be read in the article below.</p><p><a href="https://medium.com/neuml/introducing-annotateai-aecda8851ce5">Introducing AnnotateAI</a></p><h4><strong>PaperAI / PaperETL</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*2C_U0Dtq_oGdqvzA.png" /></figure><p>The biggest current downstream project is <a href="https://github.com/neuml/paperai">paperai</a>. paperai is a semantic search and workflow application for medical/scientific papers. It helps automate tedious literature reviews, allowing researchers to focus on their core work. <a href="https://github.com/neuml/paperetl">paperetl</a> is a companion project for parsing medical literature. The paperai stack has been integrated in a number of NeuML’s consulting efforts.</p><p>paperai had <strong>1 release</strong> in 2024 and earned <strong>212 ⭐’s</strong> on GitHub. paperetl had <strong>1 release</strong> in 2024 and earned <strong>93 ⭐’s</strong> on GitHub.</p><h4>Public Models</h4><p>NeuML believes in open-source AI. As part of that, we’ve released a number of public models on the Hugging Face Hub. In 2024, we released <strong>8 new or updated models</strong> to the Hub!</p><p><a href="https://huggingface.co/NeuML">NeuML (NeuML)</a></p><p><a href="https://github.com/orgs/neuml/repositories">Our repositories page on GitHub</a> is continuously updated and projects that are no longer supported are marked as “Public Archive”. 
Otherwise, projects on that page will continue to be supported on an <em>as-needed</em> basis.</p><h3>Consulting Services</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ajGRdnTYyVXRexzcmmWx_w.png" /></figure><p>NeuML provides consulting services around our open-source stack:</p><ul><li><strong>Generative AI</strong> Build agents, retrieval-augmented generation (RAG), large language model (LLM) orchestration and chat with your data systems</li><li><strong>AI-driven Literature Analysis</strong> Automate analysis of unstructured medical, scientific and technical literature</li><li><strong>Model Development</strong> Create AI, Machine Learning and/or NLP models that excel in industry-specific domains</li><li><strong>Advisory and Strategy</strong> Leverage our expertise to plan your data, engineering and AI strategy</li><li><strong>Speaking Engagements</strong> Discuss txtai, industry trends, insights and developments in the space with your team</li><li><strong>Paid Support</strong> Meet with us, receive a private Slack channel to ask questions and get implementation guidance</li></ul><p>Our efforts in 2024 were once again centered around txtai. Consulting work is symbiotic with our open-source projects, each helping to push the other ahead.</p><p>Revenue separates hobby projects from projects that are part of a company. Consulting is the main source of revenue for NeuML and required for the viability of the company, as it is structured today.</p><p>In 2024, NeuML had a mix of hands-on consulting projects, primarily focused on Medical RAG and advisory support. Some projects are long-term and continuing, others are shorter periods of support.</p><h3>Rating our progress in 2024</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*aDcMpTIrfQRvWpof.png" /></figure><p>We’ve covered quite a lot of information already recapping 2024. Next, let’s discuss how we stacked up against what we set out to do back in January. 
Each goal will be rated from 1–5 with 5 being the highest and 1 the lowest.</p><p>These were the goals set at the beginning of the year. Each goal is an abbreviated version from <a href="https://medium.com/neuml/neuml-2023-year-in-review-560457b97fdb">NeuML’s 2023 Year in Review</a> article.</p><h4>Generative knowledge graphs</h4><p><em>Retrieval augmented generation (RAG) powered by knowledge graphs.</em></p><p>⭐⭐⭐⭐⭐ (5 of 5)</p><p>txtai was one of the first if not <em>the</em> first project that saw the promise of graphs for search and RAG. GraphRAG was added in txtai 7.0.</p><p>This is powered by graph path traversal. Using a path of related nodes as context enables a breadth of knowledge not possible with simple vector search.</p><p>GraphRAG was featured in a number of articles and examples in 2024. Switching between vector search and knowledge graph search is now seamless.</p><h4><strong>Micromodels</strong></h4><p><em>Models that can run on limited-resourced systems such as microcontrollers, phones and embedded devices.</em></p><p>⭐⭐⭐⭐⭐ (5 of 5)</p><p>Model2Vec was integrated in txtai 8.0. This adds the possibility of embeddings models in the 1M+ parameter range. Work was done on this in late 2024 and expected to be released in early 2025. Stay tuned!</p><p>NeuML also added a <a href="https://huggingface.co/NeuML/Llama-3.1_OpenScholar-8B-AWQ">quantized version of the OpenScholar model</a>. While this isn’t for embedded devices, it does enable running on lesser resourced devices.</p><p>Last but not least, we released <a href="https://huggingface.co/NeuML/pubmedbert-base-embeddings-matryoshka">PubMedBERT Embeddings Matryoshka</a> which allows creating vectors as small as 64 dimensions. It’s important to note this only helps with storage, not processing speed. 
But that is where the Model2Vec model will come in soon!</p><h4>Cloud offering</h4><p><em>Adding a cloud offering enables rapid development, especially for those with small and/or overloaded technical teams.</em></p><p>⭐⭐ (2 of 5)</p><p>As of January 1, 2025 we’ve received over<strong> 180 responses </strong>to our <a href="https://txtai.cloud">txtai.cloud form</a>. There has been a sizable interest in this initiative.</p><p>There has also been a strong push by the large cloud vendors with new offerings such as <a href="https://aws.amazon.com/bedrock/">AWS Bedrock</a>, <a href="https://cloud.google.com/vertex-ai">GCP Vertex AI</a> and <a href="https://azure.microsoft.com/en-us/solutions/ai">Azure AI</a>.</p><p>We’ve taken a “wait and see” approach to decide the best way to do this.</p><h4><strong>Consulting 2x</strong></h4><p><em>In 2024, we’ll set out to double our consulting efforts over what was done in 2023.</em></p><p>⭐⭐⭐⭐ (4 of 5)</p><p>It was mission accomplished in terms of doubling our consulting efforts. 2024 brought a good mix of short-term advisory work paired with longer-term development projects. With this approach, there just needs to be a constant focus on building a pipeline of customers given that needs change over time.</p><p>There’s certainly room for growth but it was a solid effort in 2024.</p><h4><strong>Community engagement and training</strong></h4><p><em>Speaking engagements and training provide immense value to our open-source projects. We’ll look to do a higher volume of this in 2024.</em></p><p>⭐⭐⭐⭐ (4 of 5)</p><p>We’ve been actively engaged online and on social media. Our social media presence grew a great deal in 2024. NeuML just needs to literally, not only virtually, “get out of the building”.</p><p>This will continue to be an area of focus in 2025.</p><h4>Overall</h4><p><em>In 2024, the self-proclaimed score for NeuML is </em>🥁 🎶</p><p>⭐⭐⭐⭐ (20 of 25)</p><p>This averages out to a <strong>4 out of 5. </strong>Goals are that, goals. 
Sometimes you hit them and other times you don’t and you learn how to do better.</p><p>It was a <strong>5 out of 5 for </strong>txtai<strong> in 2024</strong>, couldn’t have been better! But in terms of capitalizing on this project from a business perspective, we’re looking to do even better in 2025!</p><h3>Playbook for 2025</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*TNoFAvvHnhEfkt-M.png" /></figure><p>Looking ahead to 2025, we’ll focus on the following areas. Our goals are more concise this year than in years past. And all three goals are symbiotic and work in tandem.</p><h4>TxtAI 10K</h4><p>With txtai sitting at 9.8K stars, this doesn’t seem very ambitious. This goal is here to celebrate the hard work and fortitude it takes to grow a project with <em>real</em> interest in 2024. While many projects have more stars, txtai has grown organically with <em>real</em> people.</p><p>It’s also expected that reaching this milestone will bring new people in and build confidence that txtai is indeed a project that enterprises can build around. It’s been supported now for 5 years and just keeps getting better!</p><h4>NeuML as a leading voice in the AI community</h4><p>NeuML has a growing following in what is now a crowded space. In 2025, more should be done to let the world know what NeuML can do, what txtai can do and what our other projects can do.</p><p>In addition to our robust online and social media presence, NeuML needs to do more in engaging with customers and the community. Conferences, meetups and other events are opportunities to meet directly with those looking to integrate AI into their workflows in 2025.</p><h4>Monetization of our place in the AI space</h4><p>Our primary revenue-generating activity to date has been providing consulting support. 
These consulting projects have driven a number of new initiatives in txtai as they illuminate real-world challenges.</p><p>Goals like “Consulting 2x” and “txtai.cloud” are great but it ultimately comes down to how best to “monetize” NeuML’s work. AI is a very dynamic space and things change fast, certainly over the course of 12 months.</p><p>For the value that txtai is bringing to the space, it still feels like NeuML is leaving upside on the table. The goal is to be better here in 2025.</p><p>To sum it up, growing txtai to 10K stars brings new interest to NeuML, which should help the company become a larger voice in the AI space. That foundation, along with a more concerted focus on customer engagement, will lead to more revenue-generating activities.</p><h3>Ways to find NeuML</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*chLJl8T4PqOgVPnH.png" /></figure><p>The full list of ways to interact with NeuML is shown below.</p><p><strong>Contact us<br></strong><a href="https://neuml.com/">Website</a> | <a href="mailto:info@neuml.com">Email</a> | <a href="https://txtai.slack.com/join/shared_invite/zt-1cagya4yf-DQeuZbd~aMwH5pckBU4vPg#/shared-invite/email">Slack</a></p><p><strong>Code<br></strong><a href="https://github.com/neuml">GitHub</a> | <a href="https://hub.docker.com/u/neuml">Docker Hub</a> | <a href="https://hf.co/neuml">HF Spaces</a> | <a href="https://txtai.cloud">Cloud</a></p><p><strong>Social Media<br></strong><a href="https://www.linkedin.com/company/neuml">LinkedIn</a> | <a href="https://twitter.com/neumll">Twitter</a> | <a href="https://www.facebook.com/neuml-106140420955354">Facebook</a> | <a href="https://www.youtube.com/@neuml">YouTube</a> | <a href="https://reddit.com/r/txtai">Reddit</a></p><p><strong>Articles<br></strong><a href="https://medium.com/neuml">Medium</a> | <a href="https://neuml.hashnode.dev">Hashnode</a> | <a href="https://dev.to/neuml">dev.to</a> | <a href="https://neuml.substack.com/">Newsletter</a></p><p><strong>Consulting 
Support<br></strong>Need help with txtai? Struggling to build your own datasets to power AI systems? Want a fractional CTO to help with your overall direction?</p><p><a href="mailto:info@neuml.com">Reach out</a> to discuss how NeuML can provide advisory support and/or development assistance.</p><h3>Wrapping up</h3><p>This article covered the state of NeuML at the end of 2024 and our plans for 2025. We’re incredibly optimistic on the future of the AI space and NeuML!</p><p>Thank you for reading. Please follow along and check in on how we’re doing over the course of 2025.</p><p><em>Interested in NeuML’s history? Then read the recaps from </em><a href="https://medium.com/neuml/being-thankful-in-2020-c69a0bc1f67e"><em>2020</em></a><em>, </em><a href="https://medium.com/neuml/neuml-2021-year-in-review-ef051fcdbbda"><em>2021</em></a><em>, </em><a href="https://medium.com/neuml/neuml-2022-year-in-review-b17787179b7e"><em>2022</em></a><em> and </em><a href="https://medium.com/neuml/neuml-2023-year-in-review-560457b97fdb"><em>2023</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d446deaf5390" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/neuml-2024-year-in-review-d446deaf5390">NeuML — 2024 Year in Review</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Introducing AnnotateAI]]></title>
            <link>https://medium.com/neuml/introducing-annotateai-aecda8851ce5?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/aecda8851ce5</guid>
            <category><![CDATA[search]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[gen-ai-tools]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Thu, 19 Dec 2024 01:58:16 GMT</pubDate>
            <atom:updated>2025-12-01T16:05:00.788Z</atom:updated>
            <content:encoded><![CDATA[<h4><strong>Automatically annotate papers using LLMs</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/705/1*MCR28wwRJq4PQZ9igpffHw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4jW1O1rLVVO9WHxp7jwIgg.png" /></figure><p>The volume of research papers is growing at an astronomical rate💫. With modern tooling it’s easier than ever to create a paper. And the breadth of topics is also growing fast. No one person can possibly understand all that is going on heading into 2025.</p><p>While LLMs can summarize papers, search papers and build generative text about papers, what about providing human readers with context as they read?</p><p>This article introduces annotateai, a project that automatically annotates papers using Large Language Models (LLMs).</p><h3>Introducing AnnotateAI</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UPFdU-4nEzUey07oMT_2lg.png" /></figure><p><a href="https://github.com/neuml/annotateai">annotateai</a> automatically annotates papers using Large Language Models (LLMs).</p><p>A one-line call does the following:</p><ul><li>Reads the paper</li><li>Finds the title and important key concepts</li><li>Goes through each page and finds sections that best emphasize the key concepts</li><li>Reads the section and builds a concise topic</li><li>Annotates the paper and highlights those sections</li></ul><p>annotateai is built with Python 3.9+ and is open-source under an Apache 2.0 license.</p><h3>Install and run AnnotateAI</h3><p>The following shows how to install annotateai.</p><pre>pip install annotateai</pre><h3>Examples</h3><p>annotateai can annotate any PDF but it works especially well for medical and scientific papers. 
The following shows a series of examples using papers from <a href="https://arxiv.org/">arXiv</a>.</p><p>This project also works well with papers from <a href="https://pubmed.ncbi.nlm.nih.gov/">PubMed</a>, <a href="https://www.biorxiv.org/">bioRxiv</a> and <a href="https://www.medrxiv.org/">medRxiv</a>!</p><h4>Setup</h4><p>The primary input parameter is the path to the LLM. This project is backed by <a href="https://github.com/neuml/txtai">txtai</a> and it supports any <a href="https://neuml.github.io/txtai/pipeline/text/llm/">txtai-supported LLM</a>.</p><pre>from annotateai import Annotate<br><br># Lightweight but powerful default model<br>annotate = Annotate(&quot;Qwen/Qwen3-4B-Instruct-2507&quot;)<br><br># The previous default model uses the now deprecated AutoAWQ library<br># Run pip install autoawq to enable<br># Note as time goes on, this may require pinning to older versions of transformers &amp; torch<br>annotate = Annotate(&quot;NeuML/Llama-3.1_OpenScholar-8B-AWQ&quot;)<br><br># llama.cpp version of the above model<br># Run pip install llama-cpp-python to enable<br>annotate = Annotate(<br>  &quot;bartowski/Llama-3.1_OpenScholar-8B-GGUF/Llama-3.1_OpenScholar-8B-Q4_K_M.gguf&quot;<br>)</pre><h4>Annotate paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”</h4><p>This paper proposed RAG before most of us knew we needed it.</p><pre>annotate(&quot;https://arxiv.org/pdf/2005.11401&quot;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*2Wp9k_r_5dZs7an3.png" /><figcaption><em>Source: </em><a href="https://arxiv.org/pdf/2005.11401"><em>https://arxiv.org/pdf/2005.11401</em></a></figcaption></figure><h4>Annotate paper “HunyuanVideo: A Systematic Framework For Large Video Generative Models”</h4><p>This paper builds the largest open-source video generation model as of Dec 2024.</p><pre>annotate(&quot;https://arxiv.org/pdf/2412.03603v2&quot;)</pre><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/0*edJewqOiq8AEQtKi.png" /><figcaption><em>Source: </em><a href="https://arxiv.org/pdf/2412.03603v2"><em>https://arxiv.org/pdf/2412.03603v2</em></a></figcaption></figure><h4>Annotate paper “OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset”</h4><p>This paper was presented at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks.</p><pre>annotate(&quot;https://arxiv.org/pdf/2406.14657&quot;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*DaTYQ2pxZFuzXYg1.png" /><figcaption><em>Source: </em><a href="https://arxiv.org/pdf/2406.14657"><em>https://arxiv.org/pdf/2406.14657</em></a></figcaption></figure><h3>Docker Web Application</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*28zz6p19ijUYW532.gif" /></figure><p><a href="https://hub.docker.com/r/neuml/annotateai">neuml/annotateai</a> is a web application available on Docker Hub.</p><p>This can be run with the default settings as follows.</p><pre>docker run -d --gpus=all -it -p 8501:8501 neuml/annotateai</pre><p>The LLM can also be set via ENV parameters.</p><pre>docker run -d --gpus=all -it -p 8501:8501 -e LLM=unsloth/gpt-oss-20b-GGUF/gpt-oss-20b-Q4_K_M.gguf -e MAXLENGTH=10000 -e n_ctx=4096 neuml/annotateai</pre><p>The code for this application can be found in the project’s <a href="https://github.com/neuml/annotateai/tree/master/app">app folder</a>.</p><h4>Prompts</h4><p>The following LLM prompts power annotateai</p><p><strong>Find and extract title</strong></p><p>The title is extracted from the paper and a highlight is created for it.</p><pre>Extract the paper title from the following text. Only return the title.</pre><p><strong>Generate keywords</strong></p><p>Keywords or concepts are generated for the paper. These keywords drive the highlighting process. 
The process goes page by page and highlights sections that best cover the keywords and concepts.</p><pre>Generate the best highly descriptive keywords for the paper.<br>Only return the keywords as comma separated.</pre><p><strong>Generate topic</strong></p><p>Once a section is highlighted, a topic is generated for it. This prompt distills the section down into simple terms for the reader.</p><pre>Create a simple, concise topic name in less than 5 words for the<br>following text. Only return the topic name.</pre><h3>Wrapping up</h3><p>This article introduced annotateai, a project that automatically annotates papers using Large Language Models (LLMs). None of us is an expert in every topic. annotateai is here to help broaden our horizons and learn more!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=aecda8851ce5" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/introducing-annotateai-aecda8851ce5">Introducing AnnotateAI</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Postgres is all you need for vectors]]></title>
            <link>https://medium.com/neuml/postgres-is-all-you-need-for-vectors-fb065e09ec64?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/fb065e09ec64</guid>
            <category><![CDATA[vector-search]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[postgresql]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Wed, 11 Dec 2024 19:41:20 GMT</pubDate>
            <atom:updated>2024-12-11T20:02:25.332Z</atom:updated>
            <content:encoded><![CDATA[<h4>Start small with SQLite and move up to Postgres for the win</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nwhFeHhB3_Lb0ejDIg9hnQ.png" /></figure><p><a href="https://github.com/neuml/txtai">txtai</a> is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*jW0nyUZQLv-I4peq" /></figure><p>Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases.</p><p>This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.</p><p>txtai enables rapid development via local persistence with a SQLite + Faiss ensemble. This setup scales surprisingly well (millions of records) and lets one explore the world of vector search and AI.</p><p>From there, txtai makes it easy to switch persistence from local to client-server databases and beyond.</p><p>txtai supports the following persistence formats for each of its components.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zZoziBqRpaoC5A2r_bKYWw.png" /></figure><p>This article is going to cover txtai persistence with Postgres.</p><h3>Introduction to Vector Search</h3><p>Over the last few years, vector search burst onto the scene. For those only familiar with keyword search and not exactly sure of the benefits of semantic (aka vector) search, check out the article below for more.</p><p><a href="https://medium.com/neuml/getting-started-with-semantic-search-a9fd9d8a48cf">Getting started with semantic search</a></p><p>Vector search has helped improve the accuracy, recall and precision of search overall. 
As you would guess, vector search requires building vectors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9QfQyvrOzmFncLN6QBb1Ew.png" /></figure><p>At a high level, text is tokenized and run through an embeddings model, which returns a series of floating point numbers. These vectors are designed to be compared with other vectors for similarity. The most common metric used is <a href="https://en.wikipedia.org/wiki/Cosine_similarity">cosine similarity</a>.</p><p>The paper <a href="https://arxiv.org/abs/1908.10084">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</a> is a great way to learn much more on this topic.</p><p>Vector search is also often paired with Large Language Models (LLMs) for more advanced tasks like Retrieval Augmented Generation (RAG). RAG helps reduce the risk of hallucinations by limiting the context in which an LLM can generate answers.</p><p>If you’d like to learn more on RAG, see the article below.</p><p><a href="https://medium.com/neuml/introducing-rag-with-txtai-f3456977cf91">Introducing RAG with txtai</a></p><p>With vectors comes the requirement of building the infrastructure to store and search vectors. A whole ecosystem of vector search databases has sprung up.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/735/1*RMKEraMhDwOim_tY9ESmUQ.png" /><figcaption>Source: <a href="https://www.sequoiacap.com/article/generative-ai-act-two/">https://www.sequoiacap.com/article/generative-ai-act-two/</a></figcaption></figure><p>These systems are all great in their own way. 
Yet in other ways, we’re going to have to rebuild all the production and reliability tools built into more established databases.</p><p>What if we could store vectors with something we already know?</p><h3>Introduction to Postgres</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/194/1*Lk-ekese3-A4akSR5c_zlw.png" /></figure><p><a href="https://www.postgresql.org/">Postgres</a> is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.</p><p>It’s also one of the most popular databases.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/820/1*v9EugjuEEwTGmkXHCsq33A.png" /><figcaption>Source: <a href="https://db-engines.com/en/ranking">https://db-engines.com/en/ranking</a></figcaption></figure><p>Postgres is a battle-tested system with years of production experience in <a href="https://www.postgresql.org/docs/current/maintenance.html">maintenance</a>, <a href="https://www.postgresql.org/docs/current/backup.html">backups</a>, <a href="https://www.postgresql.org/docs/current/high-availability.html">high availability</a>, <a href="https://www.postgresql.org/docs/current/monitoring.html">monitoring</a> and <a href="https://www.postgresql.org/docs/current/ddl-rowsecurity.html">security</a>. On top of that, there is also a whole ecosystem of analytical and BI tools built to work with Postgres.</p><p>Postgres doesn’t have a vector type or vector index capability built-in. Originally, there wasn’t a choice and we had to store vectors somewhere else. 
Now there is a popular open-source vector type and index provided by <a href="https://github.com/pgvector/pgvector">pgvector</a>.</p><p>txtai supports storing content, vectors, graph nodes and more with Postgres + pgvector.</p><h3>Postgres + pgvector + txtai</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1008/1*AUVL3_IvRSaT_C40_kdLZw.png" /></figure><p>The <em>Introducing txtai</em> article goes into the basic use cases for txtai. The first example shows how to store text elements with a Faiss index. It also covers Faiss + SQLite.</p><p><a href="https://medium.com/neuml/introducing-txtai-the-all-in-one-embeddings-database-c721f4ff91ad">Introducing txtai, the all-in-one embeddings database</a></p><p>The example below shows how to build a Faiss + SQLite embeddings index.</p><pre>from txtai import Embeddings<br><br># Works with a list, dataset or generator<br>data = [<br>  &quot;US tops 5 million confirmed virus cases&quot;,<br>  &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>  &quot;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&quot;,<br>  &quot;The National Park Service warns against sacrificing slower friends in a bear attack&quot;,<br>  &quot;Maine man wins $1M from $25 lottery ticket&quot;,<br>  &quot;Make huge profits without work, earn up to $100,000 a day&quot;<br>]<br><br># Create an embeddings instance<br>embeddings = Embeddings(<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;,<br>  content=True<br>)<br><br># Create an index for the list of text<br>embeddings.index(data)<br>embeddings.save(&quot;index&quot;)<br><br>print(embeddings.search(&quot;feel good story&quot;, 1))</pre><p>Running this code will show the entry for:</p><blockquote><em>Maine man wins $1M from $25 lottery ticket</em></blockquote><p>Looking at the <em>index</em> directory, we’ll see the files:</p><ul><li><em>config.json</em>: index configuration</li><li><em>documents</em>: 
SQLite database with the content</li><li><em>embeddings</em>: Faiss index with the vectors</li></ul><p>How hard is it to switch persistence to a running Postgres database?</p><pre># Set these ENV variables<br># ANN_URL=&quot;postgresql+psycopg2://user:pass@localhost/postgres&quot;<br># CLIENT_URL=&quot;postgresql+psycopg2://user:pass@localhost/postgres&quot;<br><br>from txtai import Embeddings<br><br>with Embeddings(<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;,<br>  content=&quot;client&quot;,<br>  backend=&quot;pgvector&quot;) as embeddings:<br><br>  # Works with a list, dataset or generator<br>  data = [<br>    &quot;US tops 5 million confirmed virus cases&quot;,<br>    &quot;Canada&#39;s last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg&quot;,<br>    &quot;Beijing mobilises invasion craft along coast as Taiwan tensions escalate&quot;,<br>    &quot;The National Park Service warns against sacrificing slower friends in a bear attack&quot;,<br>    &quot;Maine man wins $1M from $25 lottery ticket&quot;,<br>    &quot;Make huge profits without work, earn up to $100,000 a day&quot;<br>  ]<br><br>  # Create an index for the list of text<br>  embeddings.index(data)<br><br>  print(embeddings.search(&quot;feel good story&quot;, 1))</pre><p>As we can see, not much is different here! Just simply changing the <em>content</em> and <em>backend</em> to point to Postgres. The URLs can be directly set in the code although it’s strongly advised to use environment variables so those aren’t stored in plain text. <a href="https://neuml.github.io/txtai/embeddings/configuration/database/#content">Read more about this here</a>.</p><p>For good measure, let’s query the database and see what we have. 
The diagrams below are created with <a href="https://www.pgadmin.org/">pgAdmin</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/857/1*v0nR8ZGWqSos9kMFg_kakw.png" /><figcaption>Database Schema</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/979/1*Pu-Ka4-F3IS95oOkdpPbtw.png" /><figcaption>Text data</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/983/1*199k60sqqJgyUforrEKZww.png" /><figcaption>Vector data</figcaption></figure><p>As we can see, we have tables storing text and vectors!</p><p>Now all our data is in Postgres. We get access to all those features we mentioned earlier (maintenance, backups, monitoring, security etc).</p><h3>16-bit and Binary Embeddings</h3><p>The pgvector backend also supports 16-bit and binary embeddings.</p><pre>Embeddings(<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;,<br>  content=&quot;client&quot;,<br>  backend=&quot;pgvector&quot;,<br>  pgvector={<br>    &quot;precision&quot;: &quot;half&quot;<br>  }<br>)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/862/1*_rmZq6ACqb_bn0eYesae1w.png" /><figcaption>txtai schema with <strong>halfvec (16-bit)</strong></figcaption></figure><pre>Embeddings(<br>  path=&quot;sentence-transformers/nli-mpnet-base-v2&quot;,<br>  content=&quot;client&quot;,<br>  backend=&quot;pgvector&quot;,<br>  quantize=1<br>)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/876/1*glgw4i0GQKNbVCn7rKpwtw.png" /><figcaption>txtai schema with <strong>BIT (384 1-bit numbers)</strong></figcaption></figure><p>These options are a tradeoff between storage space and accuracy. In some use cases, the tradeoff might be worth the slight accuracy loss.</p><h3>Wrapping up</h3><p>This article covered how txtai integrates with Postgres + pgvector. 
While this setup works perfectly well with self-hosted systems, it also works with cloud-hosted Postgres instances.</p><p>We’re all in with this setup and believe it’s a solid choice in building enterprise systems ready for production!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=fb065e09ec64" width="1" height="1" alt=""><hr><p><a href="https://medium.com/neuml/postgres-is-all-you-need-for-vectors-fb065e09ec64">Postgres is all you need for vectors</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Grow your open-source project 2.0]]></title>
            <link>https://medium.com/neuml/grow-your-open-source-project-2-0-59d7da5ffdfb?source=rss-bd6fa5e6030------2</link>
            <guid isPermaLink="false">https://medium.com/p/59d7da5ffdfb</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[open-source]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[github]]></category>
            <dc:creator><![CDATA[David Mezzetti]]></dc:creator>
            <pubDate>Tue, 26 Nov 2024 16:26:06 GMT</pubDate>
            <atom:updated>2024-11-26T16:32:10.911Z</atom:updated>
            <content:encoded><![CDATA[<h4>How It Started vs. How It’s Going</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*x5IHLn_zzevLjRoCYUS5HQ.png" /></figure><p><em>This is a follow-on update to the </em><a href="https://medium.com/neuml/grow-your-open-source-project-5b439cc9ca1a"><em>original article</em></a><em> written two years ago.</em></p><p>Developers often dream of finding the time and energy to build an open-source software project. Building a project to help the community is the goal, especially when we’ve benefited so much from open source.</p><p>Many of us search GitHub for an existing library, install it and move on. Maybe we leave a star but often we don’t. Without open-source, our jobs as developers would be significantly harder. Developers always want to learn and love learning about new projects that can make future work easier.</p><p>This article focuses on the journey 🚀 of <a href="https://github.com/neuml/txtai">txtai</a> over the last two years since the original article was written. That article was written right before the <a href="https://openai.com/index/chatgpt/">release of ChatGPT </a>and the start of the AI craze. 
The original article has more basic tips if you’re just getting started and is still worth a read.</p><h3>About txtai</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/715/1*FuXkiNX3DtRGXn8j9ASuRA.png" /><figcaption><a href="https://github.com/neuml/txtai">GitHub Summary for txtai as of 2024-11-26</a></figcaption></figure><p><a href="https://github.com/neuml/txtai">txtai</a> is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.</p><p>It’s a foundational vector database with tooling built-in to support autonomous agents, retrieval augmented generation (RAG), large language model (LLM) orchestration and of course vector search.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j4DY0sFJ6jS4tPVgJxrC_w.png" /><figcaption><a href="https://star-history.com/#neuml/txtai&amp;Date">GitHub Star history for txtai</a></figcaption></figure><p>Since the original article two years ago, txtai went from 2.2K ⭐’s to 9.5K ⭐’s today. That’s over 300% growth!</p><p>There is a “rising tide lifts all ships” phenomenon in the AI space. Nonetheless, the project has seen steady growth due to its continued activity, being in a hot space and solving a problem people think is worth solving.</p><h3>A Rapidly Evolving Landscape</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1aPNKCZLsR5GUmdgrr0yKw.png" /><figcaption>Source: <a href="https://www.sequoiacap.com/article/generative-ai-act-two/">https://www.sequoiacap.com/article/generative-ai-act-two/</a></figcaption></figure><p>There is no shortage of vector databases and LLM frameworks in late 2024.</p><ul><li><a href="http://google.com/search?q=vector+database">Search for vector databases</a></li><li><a href="http://google.com/search?q=llm+framework">Search for LLM frameworks</a></li></ul><p>This includes open-source vector databases (e.g. Weaviate, Qdrant, Chroma) and closed ones (e.g. 
Pinecone).</p><p>On the LLM framework side the big ones are LangChain and LlamaIndex.</p><p>txtai fits into all three of the categories above and more. Many of the tools above didn’t exist when the original article was written in 2022.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/989/1*dHW2RisgvobZ_HBdyI29JQ.png" /><figcaption>Source: <a href="https://ossinsight.io/collections/vector-search-engine/">https://ossinsight.io/collections/vector-search-engine/</a></figcaption></figure><p>With all that being said, txtai is doing quite well and holding its own against some heavyweights.</p><h3>How it’s going</h3><p>With the increasingly crowded AI space and how everyone now says “powered by AI”, it’s often hard to get any air in the space. But all is not lost.</p><p>Traditional social media channels are still a viable way to share your project. It’s just more work and it takes being more persistent and savvy to actually reach people in 2024, especially when using the word “AI”.</p><p>Most still appreciate an honest approach with a touch of humility. You’ll stand out against those who are selling “AI Snake oil”.</p><p>Here are the main methods that have helped txtai grow.</p><h4>Hacker News</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fQj8zeN_wPJuRAg57DqKYg.png" /><figcaption><a href="https://news.ycombinator.com/item?id=41024362">Trending Hacker News Post</a></figcaption></figure><p>Trending on Hacker News is a surefire way to get a lot of traffic and stars.</p><p>Having a good project and putting thought into your headline goes a long way. Hacker News is a community that responds to minimalism.</p><p>If you can get a post visible and trending, there is still no better way to grow. 
The post above helped add 800+ ⭐’s in one week alone.</p><h4>LinkedIn</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/537/1*2vMX5OYoLRdRwDcAaP5o3w.png" /><figcaption><a href="https://www.linkedin.com/posts/neuml_alrightso-here-we-go-were-proud-activity-7264489976346636288-XhBn/">LinkedIn Release Post</a></figcaption></figure><p>LinkedIn is a community for serious conversation. It’s also a place where people love celebrating accomplishments! Releases, milestones and other celebratory posts will get good traction and visibility.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/546/1*ElhIdsQawLiKJT5_v5wM8g.png" /><figcaption><a href="https://www.linkedin.com/posts/khuyen-tran-1401_semanticsearch-llm-activity-7159208252541390849-Xp8E/">LinkedIn Post</a></figcaption></figure><p>It always helps when someone else with a large network organically posts about your project too!</p><h4>X (Formerly known as Twitter)</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/591/1*BLG--5UYshnGaC_VReP9Aw.png" /><figcaption><a href="https://x.com/neumll/status/1809322170256154706">Post on X</a></figcaption></figure><p>The best posts on X will be technical. If you can show how your project solves a problem and/or how it compares to other known projects, you’ll get good traction.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/588/1*cioeV03u_SGc9Outvc5PCg.png" /><figcaption><a href="https://x.com/tom_doerr/status/1855365755962835255">Trending Post on X</a></figcaption></figure><p>It’s always better if someone else shares your work. 
This is the power⚡ of open source at its finest.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/578/1*V2YhcX-QpIiSo-nL8h3FmA.png" /><figcaption><a href="https://x.com/shao__meng/status/1838199816842801575">Trending Post on X</a></figcaption></figure><p>And perhaps someone shares it from the other side of the world 🌏.</p><h4>Reddit</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/768/1*eRla2B2OxNz5V6NsG4b3Hg.png" /><figcaption><a href="https://www.reddit.com/r/LocalLLaMA/comments/1guovi1/txtai_80_released_an_agent_framework_for/">Reddit Post</a></figcaption></figure><p>Reddit is a great place to have deep technical discussions. There are many communities focused on development. The best one for txtai as of 2024 is <a href="https://www.reddit.com/r/LocalLLaMA">r/LocalLLaMA</a>.</p><p>There are a lot of nice people who provide valuable and insightful feedback. But you need thick skin in some cases. It’s best not to engage with users who are looking to pick a fight. It won’t go anywhere positive. Maybe they’re just having a bad day 😀.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/753/1*Yx_r-yqmusMkNRUEHaVatg.png" /><figcaption><a href="https://www.reddit.com/r/LocalLLaMA/comments/1e1toel/llms_are_not_the_only_way/">Reddit Post</a></figcaption></figure><p>Technical and intellectually stimulating topics will do well.</p><h4>Medium/HashNode/Technical Blogs</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/874/1*hjBjIA9VDpL00lTeLbKxAQ.png" /><figcaption><a href="https://medium.com/p/59d7da5ffdfb">Is this an infinite loop?</a></figcaption></figure><p>Sharing information about your project in long form is also important. 
This can be done directly with examples or indirectly, as with this article.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/869/1*pqXOC0UF60f7nBT73BN1eA.png" /><figcaption><a href="https://neuml.hashnode.dev/granting-autonomy-to-agents">Granting autonomy to agents</a></figcaption></figure><p>One thing that has worked great is to write an article as a Jupyter notebook and port that to the blogging site. Then link to the notebook as a “ready-to-run” example.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/876/1*6GZN2Vjh6yRvK0_CptQCOg.png" /><figcaption><a href="https://neuml.hashnode.dev/analyzing-hugging-face-posts-with-graphs-and-agents">Analyzing Hugging Face Posts with Graphs and Agents</a></figcaption></figure><p>Writing deep dive articles with code examples is crucial in showing developers how to use your project.</p><h4>Facebook</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/405/1*YqN_M5z4aOSzSEhQwrg2mA.png" /><figcaption><a href="https://www.facebook.com/people/NeuML/100057403391445/">NeuML Facebook Page</a></figcaption></figure><p>While content is shared on Facebook, txtai hasn’t been able to build an engaged community there. It’s unclear whether this is a great place to share technical content.</p><h4>GitHub Trending</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/990/0*EoeSa45crRh-q9O2" /><figcaption>Source: <a href="https://github.com/trending">https://github.com/trending</a> July 2024</figcaption></figure><p>When all goes right and your project is seeing enough activity, you might get another boost if you’re lucky! This time from GitHub Trending. Projects that trend on the front page can easily add 100s if not 1,000s of ⭐’s.</p><p>You just have to get lucky here. txtai has actually never trended on the GitHub front page, only the front page for Python. 
But even that has added a lot of stars.</p><h3>Wrapping up</h3><p>This article covered the journey 🚀 of <a href="https://github.com/neuml/txtai">txtai</a> over the last two years and shared ways to grow your own project.</p><p>Getting traction is a feast-or-famine activity and it can be frustrating. There will be good days and bad days. But there has never been a time in history when one person could have as much influence as they can today!</p><p>If you have a great project and stick with it, the community will come!</p><hr><p><a href="https://medium.com/neuml/grow-your-open-source-project-2-0-59d7da5ffdfb">Grow your open-source project 2.0</a> was originally published in <a href="https://medium.com/neuml">NeuML</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>