Stories by David Mezzetti on Medium

Introducing ncoder

David Mezzetti — Thu, 22 Jan 2026 18:34:24 GMT

💫 Open-Source AI coding agent that integrates with Jupyter Notebooks

A video version of this article is also available.

ncoder is an open-source AI coding agent that integrates with Jupyter Notebooks. This project uses the OpenAI API client to connect to any OpenAI-compatible endpoint and enable collaborative coding with AI.

ncoder provides a sandboxed base Docker image that supports coding with OpenCode in server mode, with a quantized Qwen3-Coder 30B model for lightweight local inference, or with any other txtai process.

ncoder is designed for Developers, AI Engineers and Data Scientists who spend a lot of their time inside Jupyter Notebooks. If you do your research and/or prototyping inside notebooks, this gives you an easy way to pull in new ideas.

Getting Started

ncoder consists of two parts: a sandboxed Docker image with an AI coding agent and a local Jupyter Notebook.

The coding agent can be started in one of the following ways.

# DEFAULT: Run with opencode backend, sends data to `opencode serve` endpoint
docker run -p 8000:8000 --gpus all --rm -it neuml/ncoder

# ALTERNATIVE 1: Run with qwen3-coder, keeps all data local
docker run -p 8000:8000 -e CONFIG=qwen3-coder.yml --gpus all \
  --rm -it neuml/ncoder

# ALTERNATIVE 2: Run with a custom txtai workflow
# (mounts a local ./config directory containing config.yml)
docker run -p 8000:8000 -v "$(pwd)/config:/config" -e CONFIG=/config/config.yml \
  --gpus all --rm -it neuml/ncoder

Running in a sandboxed environment decouples AI coding from your local working environment. Running in isolation provides assurance that it won’t modify your workspace directly.

Next, install the Jupyter Notebook extension on your local machine.

pip install ncoder

Jupyter Notebooks can be created in Visual Studio Code or your preferred notebook platform. Add the following two cells to any notebook to test it out.

# Load ncoder extension
%load_ext ncoder

# Test it out
%ncoder Write a Python Hello World Example

An example notebook is available for reference.

The ncoder Jupyter Notebook extension works with any LLM API that has OpenAI API compatibility. It’s simply a matter of setting the correct environment variables.

# Replace with your LLM API URL
%env OPENAI_BASE_URL=https://api.openai.com/v1
%env OPENAI_API_KEY=api-key
%env API_MODEL=gpt-5.2

%load_ext ncoder

The same environment variables can be used to connect to the sandboxed Docker coding agent when it runs with a different configuration (the default URL is http://localhost:8000/v1).

Demo

The short video clip below gives a brief overview of how to use ncoder.

Wrapping up

This article introduced ncoder, an open-source AI coding agent that integrates with Jupyter Notebooks. ncoder also provides a sandboxed Docker image to decouple code generation from your working environment.

If you do your research and/or prototyping inside of notebooks, this gives you an easy way to pull in new ideas. Give it a try!

Introducing ncoder was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

NeuML — 2025 Year in Review

David Mezzetti — Thu, 01 Jan 2026 13:05:40 GMT


Recapping 2025 and looking ahead to 2026

Check out this video recap for a more in-depth view of NeuML’s 2025.

NeuML is the company behind txtai, an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.

In 2025, NeuML continued to deliver AI-driven functionality both in open source and with paid consulting efforts. This article recaps the progress made in 2025 and looks ahead to 2026.

TxtAI

https://github.com/neuml/txtai

txtai is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows. This is the foundational piece of software that all of our work stands on.

Highlights for txtai in 2025:

https://www.star-history.com/#neuml/txtai&type=date

⭐2,142 stars on GitHub to bring the total to ⭐11,975
270 total commits on GitHub
145 total issues resolved on GitHub
11 releases. Entered the year at v8.1 and finished at v9.3
10 articles and example notebooks added

Let’s recap the major functionality added.

New in 2025

In 2025, txtai had one major release (9.0) along with 10 minor releases.

💡 What’s new in txtai 9.0

txtai 9.0 was released in August 2025. This release added SPLADE, ColBERT, MUVERA and Reranking pipelines.

Below is a summary of the major features added in 2025.

At the end of 2025, txtai is a growing and major player in the AI Orchestration Framework space. This illustration is frequently cited on LinkedIn and other social platforms showing txtai's place in the RAG ecosystem.

NeuML is one of the few bootstrapped, non-VC-backed companies on this list!

Looking ahead into 2026, we’ll continue to evangelize txtai as the premier minimalist AI framework for semantic search, LLM orchestration and language model workflows.

In addition to txtai, NeuML has a number of other open source projects that continue to evolve. This includes PaperAI, PaperETL, RAG and AnnotateAI. Check out our GitHub page for more.

Open Models

NeuML believes in open-source AI. As part of that, we’ve released a number of public models on the Hugging Face Hub. At the end of 2025, NeuML has 61 models and 12 datasets available on the Hub.

Here are the highlights from 2025.

https://huggingface.co/NeuML/pubmedbert-base-embeddings

Our most popular model on the Hub! This model receives over 150K downloads a month and has almost 10 million lifetime downloads! It’s also been cited at least 56 times according to Google Scholar, including in publications from Nature, Springer and Elsevier.

https://huggingface.co/collections/NeuML/bert-hash-nano-models

Next up is a model series we’re particularly proud of. We’ve long discussed the concept of “micromodels” all the way back to 2023 (see article below).

The big and small of txtai

The BERT Hash Nano Models series introduces a simple technique to significantly reduce model parameter sizes. Instead of the embeddings layer mapping directly to the hidden size, a projection layer is added in.

The article below discusses this in more detail.

Training Tiny Language Models with Token Hashing

These models perform surprisingly well and come in under 1 million parameters. There are also fine-tunes available for ColBERT.

https://huggingface.co/NeuML/biomedbert-hash-nano

Building on the BERT Hash Series is BiomedBERT Nano. This is a 970K parameter BERT encoder-only model trained on data from PubMed.

Additional fine-tunes are also available.

See the following article for all the details on these models.

Encoding the World's Medical Knowledge into 970K

There are also a number of other models available for Text to Speech, Static Vectorization, Language Identification and more.

Consulting Services

https://neuml.com

NeuML provides consulting services around our open-source stack:

Generative AI: Build agents, retrieval-augmented generation (RAG), large language model (LLM) orchestration and chat-with-your-data systems
AI-driven Literature Analysis: Automate analysis of unstructured medical, scientific and technical literature
Model Development: Create AI, Embeddings and/or LLM models that excel in industry-specific domains
Advisory and Strategy: Leverage our expertise to plan your data, engineering and AI strategy
Proposals: Integrate AI into your technical proposals utilizing our knowledge of industry trends
Development Support: Meet with us, get txtai implementation guidance and/or outsource development

While we keep the details of our consulting engagements private, this is the primary revenue stream for NeuML. It’s crucial that both our open source projects and open models demonstrate our core capabilities in ways that can be translated into paid work.

The ideal consulting client for NeuML is a small to medium-sized company. One misconception is that some projects “aren’t worth our time”. NeuML sets out to apply machine learning to solve everyday problems. We’re interested in solving real problems!

While we’ve developed subject matter expertise in the medical space, our techniques apply to almost any business area. Schedule a meeting or send a message to learn more.

Rating our progress in 2025

We’ve covered quite a lot of information recapping 2025. Next, let’s discuss how we stacked up against what we set out to do back in January. Each goal will be rated from 1–5 with 5 being the highest and 1 the lowest.

These were the goals set at the beginning of the year. Each goal is an abbreviated version from NeuML’s 2024 Year in Review article.

TxtAI 10K

Surpass 10K stars on GitHub

⭐⭐⭐⭐⭐ (5 of 5)

🚀 Mission accomplished! We’re ending the year at a little under 12K stars.

While the overall star growth was lower in 2025 than in previous years, this is mainly due to the lack of trending posts. Our efforts were focused elsewhere rather than on getting txtai to trend on social media.

With that being said, 10K stars is a great accomplishment and helps validate txtai as a major player in the space.

NeuML as a leading voice in AI Community

Be a vocal leader in the AI Community and a trusted voice

⭐⭐⭐⭐ (4 of 5)

NeuML strives to be a voice of reason in AI, a space full of unreasonable expectations. We’re measured and realistic. With that being said, we do believe in the promise of AI.

We’ve built a vibrant community of followers on LinkedIn, X, Facebook and more recently Reddit. In late 2025, Reddit experienced a large surge in activity.

Below is where we stand as of late 2025.

LinkedIn

https://www.linkedin.com/company/neuml

X aka Twitter

https://x.com/neumll

Facebook

https://www.facebook.com/people/NeuML/100057403391445/

Reddit

https://reddit.com/r/txtai

While LinkedIn is our largest audience, Reddit is surging rapidly! We’ve added almost 800 members to r/txtai in the last month of 2025. If this pace holds, Reddit will overtake our other social platforms in 2026.

Overall, our engagement has been great. The only reason this isn’t a 5 is that we didn’t do in-person conferences or speaking engagements.

Monetization of our place in the AI space

Convert open source and open model work into revenue streams

⭐⭐⭐ (3 of 5)

While this data is not being shared publicly, we’re generally on the right track. It’s notoriously difficult to translate open source work into revenue. Many open source companies build a large following and a popular project only to find there isn’t a viable path to income. People like the project but aren’t going to pay for it.

Our current focus on consulting work has added value. We’ve also received 300+ submissions of interest for txtai.cloud, so that is another area to explore.

Custom models fine-tuned to specific business areas or tasks are also a growth area. Custom models are AI-framework agnostic, which could bring in customers who aren’t using our stack.

Overall

In 2025, the self-proclaimed score for NeuML is 🥁 🎶

⭐⭐⭐⭐ (12 of 15)

This averages out to a 4 out of 5.

NeuML has much to be proud of on this journey to date. Back in 2020, building an open source project with over 10K stars seemed unimaginable. Even 1,000 stars seemed far-fetched. Getting here took an amazing amount of dedicated effort over a long period of time. But there is still much to do.

Playbook for 2026

Looking ahead to 2026, we’ll focus on the following areas.

Monetization of our place in the AI space

Normally we don’t like duplicating our goals year over year but this one deserves continued emphasis.

More customers, more projects, more revenue is the mantra here. Making it easier to engage with NeuML is also important. Not all consulting engagements have to be long-lasting. We should ensure there are easy ways to work with NeuML in a limited capacity (e.g. a simple payment page).

We should also investigate non-consulting revenue streams such as deciding what to do with txtai.cloud.

Be a leader in the vector retrieval space

txtai has its roots as a vector database and retrieval platform. While it has many pipelines for AI orchestration, it’s built on the foundation of an embeddings database.

In 2026, we’ll work to grow our presence in the vector retrieval space. This is less about frameworks and more about developing models and techniques, similar to our work with BERT Hash.

Publish papers covering our work

While we publish plenty of blog posts summarizing our work, maximizing visibility requires publishing formal papers. We can start by submitting to preprint servers such as arXiv.

Even AI has written a paper covering our work 😀

Ways to find NeuML

The full list of ways to interact with NeuML is shown below.

Contact us
Website | Meet | Email | Slack

Code
GitHub | Docker Hub | HF Spaces | Cloud

Social Media
LinkedIn | Twitter | Facebook | YouTube | Reddit

Articles
Medium | Hashnode | dev.to | Newsletter

Consulting Support
Need help with txtai? Struggling to build your own datasets to power AI systems? Want to train your own embeddings models? Need AI strategy support?

Book an intro meeting or email us to discuss how NeuML can provide advisory support and/or development assistance.

Wrapping up

This article covered the state of NeuML at the end of 2025 and our plans for 2026. Thank you for reading. Please follow along and check in on how we’re doing over the course of 2026.

Interested in NeuML’s history? Then read the recaps from 2020, 2021, 2022, 2023 and 2024.

NeuML — 2025 Year in Review was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

Training Tiny Language Models with Token Hashing

David Mezzetti — Thu, 09 Oct 2025 17:42:50 GMT

Learn how a simple tweak can drastically reduce model sizes

This article introduces the BERT Hash series of encoder models. Encoder models are the engine behind generating vector embeddings, text classification, entity extraction and more. The BERT Hash architecture is a simple tweak of the well-known, legendary BERT model.

The Generative AI era is all about building bigger models: more parameters, more GPUs and more training data. Bigger is better is the mantra. Few are exploring the other end of the spectrum: efficiency and doing more with less. Sure, LLMs can handle some of the same tasks as an encoder model, but an encoder model is often the better choice.

Let’s jump in.

Challenges with Tiny Language Models

Language models can be made small in a number of ways. The following is a non-exhaustive list.

Changing the vocabulary size
Changing the number of hidden dimensions
Changing the number of attention heads or layers
Creating a custom model architecture

When models get into the single digit millions of parameters, the vast majority of the parameter count is dedicated to the token embeddings layer.

With BERT Tiny, a 4.43M parameter model, 3.91M of the parameters are allocated to the token embeddings as shown below.

30,522 Vocabulary Size * 128 Hidden Dimensions = 3,906,816 parameters

The first step with BERT is running the tokens through a 30,522 x 128 embeddings matrix. These parameters are learned at training time and are known as the token embeddings.

The BERT Tiny model already reduces the size of the hidden dimensions from 768 to 128 and the number of layers and attention heads from 12 each to 2. This is how the model size was reduced from 110M parameters to 4.43M.

Changing the vocabulary size is one way to reduce this. Let’s say we limit the vocabulary to 1,000 tokens.

1,000 Vocabulary Size * 128 Hidden Dimensions = 128,000 parameters

The problem is that a smaller vocabulary needs many more tokens to represent the same text, which could end up increasing overall computation time.

What if the same vocabulary could be used but we can still reduce the number of parameters used by the token embeddings layer? Enter BERT Hash.

BERT Hash Architecture

BERT Hash is a very straightforward modification of the token embeddings layer of a BERT model. Instead of the embeddings layer mapping directly to the hidden size, a projection layer is added in.

Let’s say we use 16 projections with 128 hidden dimensions.

30,522 Vocabulary size * 16 projections + 16 * 128 hidden = 490,400 parameters

This change alone brings the parameter count for a BERT Tiny model down to 950K from 4.4M, only 22% of the original model size.
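As a quick check on the arithmetic above, the short sketch below recomputes the token embedding parameter counts in plain Python. The helper function names are illustrative only and, like the formulas in the text, the counts ignore the 128 bias terms that `nn.Linear` adds to the projection layer (490,528 parameters with bias).

```python
# Token embedding parameter counts for the BERT Tiny configuration discussed above.
# Bias terms are ignored, matching the formulas in the text.
VOCAB_SIZE = 30_522
HIDDEN_SIZE = 128


def dense_token_params(vocab_size: int, hidden_size: int) -> int:
    """Standard BERT: a single vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


def hashed_token_params(vocab_size: int, projections: int, hidden_size: int) -> int:
    """BERT Hash: a vocab_size x projections table plus a projections x hidden_size projection."""
    return vocab_size * projections + projections * hidden_size


dense = dense_token_params(VOCAB_SIZE, HIDDEN_SIZE)
small_vocab = dense_token_params(1_000, HIDDEN_SIZE)
hashed = hashed_token_params(VOCAB_SIZE, 16, HIDDEN_SIZE)

print(f"Dense, full vocabulary:   {dense:,}")        # 3,906,816
print(f"Dense, 1,000 token vocab: {small_vocab:,}")  # 128,000
print(f"Hashed, 16 projections:   {hashed:,}")       # 490,400
print(f"Token embedding reduction: {1 - hashed / dense:.1%}")  # 87.4%
```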

The code for this component is quite simple.

from torch import nn

class BertHashTokens(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config

        # Token embeddings: vocab_size x projections (instead of vocab_size x hidden_size)
        self.embeddings = nn.Embedding(
            config.vocab_size,
            config.projections,
            padding_idx=config.pad_token_id
        )

        # Token embeddings projections: projections -> hidden_size
        self.projections = nn.Linear(
            config.projections,
            config.hidden_size
        )

    def forward(self, input_ids):
        # Project embeddings to hidden size
        return self.projections(self.embeddings(input_ids))

That’s it. Everything else stays the same! This method is inspired by MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

BERT Hash Nano Model Series

Pre-trained models are available on the Hugging Face Model Hub for a number of configurations using this strategy.

BERT Hash Nano Models - a NeuML Collection

These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025) as recommended in the paper Well-Read Students Learn Better: On the Importance of Pre-training Compact Models.

Below is a subset of GLUE scores for these models.

Note that the nano model has a small drop-off compared to the original BERT Tiny model, but remember it’s only 22% of the size.

The training scripts are available on the model pages. It’s extremely straightforward. See this article for more on training language models from scratch.

ColBERT models are also trained on top of these models. See this collection for more.

ColBERT - a NeuML Collection

Ideas for Future Work

These models were trained on the standard BERT training dataset. But we can take advantage of all the great open datasets being released for LLM training.

The FineWeb dataset below could be a great place to start for a general model.

HuggingFaceFW/fineweb · Datasets at Hugging Face

There is also a version of this dataset with domain labels. This would enable training a tiny domain-specific model on, say, the medical or sports domain.

m-a-p/FineFineWeb · Datasets at Hugging Face

While the method described here is for an encoder model, the same idea could be explored for decoder/generative models. Perhaps it could be combined with the ideas from this paper to build a tiny reasoning LLM.

Less is More: Recursive Reasoning with Tiny Networks

Wrapping Up

This article introduced the BERT Hash model series. It explores the often overlooked area of small models and getting more for less.

Let’s see what you can do with it!

Training Tiny Language Models with Token Hashing was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

What’s new in txtai 9.0

David Mezzetti — Thu, 28 Aug 2025 17:16:17 GMT

SPLADE, ColBERT, MUVERA and Reranking pipelines

txtai is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.

The 9.0 release adds first class support for sparse vector models (e.g. SPLADE), late interaction models (e.g. ColBERT), fixed dimensional encoding (e.g. MUVERA) and reranking pipelines ✨

The embeddings framework was overhauled to seamlessly support both sparse and dense vector models. Previously, sparse vector support was limited to keyword/term indexes. Now learned sparse retrieval models such as SPLADE are supported. These models can help improve the accuracy of retrieval/search operations, which also improves RAG and Agents.

Support for late interaction models, such as ColBERT, was also added to the embeddings framework. Unlike traditional vector models that pool outputs into a single vector, late interaction models produce multiple vectors. These models are paired with the MUVERA algorithm to transform multiple vectors into fixed dimensional single vectors for search.

LLMs are quickly converging to produce similar outputs for similar inputs and becoming standard commodities. The retrieval or context layer makes or breaks projects. This is known as putting the R in RAG!

Standard upgrade disclaimer below

While everything is backwards compatible, it’s prudent to back up production indexes before upgrading and test before deploying.

Install dependencies

Install txtai and all dependencies.

pip install txtai[ann,vectors]

Sparse vector indexes

The first major change in this release is support for learned sparse retrieval models (aka sparse vector indexes). This effort was multi-faceted in that it required changes both to how vectors are generated and to how they are stored.
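To make the vector generation side concrete, here is a dependency-free sketch of SPLADE-style pooling. It assumes the model has already produced masked language model logits over the vocabulary for every input token, and it uses the max-pooling form, weight = max over tokens of log(1 + ReLU(logit)), from later SPLADE versions. The logits are made up, and this is an illustration rather than txtai’s implementation.

```python
import math


def splade_pool(token_logits):
    """Pool per-token vocabulary logits into a single sparse vector.

    token_logits[i][j] is the logit for vocabulary term j at input position i.
    Each term weight is the max over positions of log(1 + relu(logit)); terms
    whose weight is zero are never stored, which keeps the vector sparse.
    """
    weights = {}
    for row in token_logits:
        for term, logit in enumerate(row):
            weight = math.log1p(max(logit, 0.0))
            if weight > weights.get(term, 0.0):
                weights[term] = weight
    return weights


def sparse_dot(a, b):
    """Dot product that only visits terms present in both vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


# Made-up logits: 2 input tokens over a 5-term vocabulary
doc = splade_pool([[2.0, -1.0, 0.0, 0.5, -3.0],
                   [0.0, 1.0, -2.0, 3.0, -1.0]])
query = splade_pool([[0.0, 0.0, 0.0, 1.0, 0.0]])

print(sorted(doc))             # [0, 1, 3]: only terms with a positive logit survive
print(sparse_dot(query, doc))  # only term 3 is shared by both vectors
```

Because most terms end up with zero weight, only the surviving {term_id: weight} pairs need to be stored, which is why sparse vectors call for a different index than dense NumPy arrays.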

txtai uses approximate nearest neighbor (ANN) search for its vector search operations. The default library is Faiss. There is support for other libraries, but in all cases the existing ANN backends only supported dense (i.e. NumPy) vectors.

There aren’t many options for sparse ANN search that meet txtai’s requirements, so IVFSparse was introduced. IVFSparse is an inverted file (IVF) index with flat vector file storage and sparse array support. There is also support for storing sparse vectors in Postgres via pgvector.

Let’s see it in action.

from txtai import Embeddings

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]
# Create an embeddings
embeddings = Embeddings(sparse=True, content=True)
embeddings.index(data)
embeddings.search("North America", 10)

[{'id': '0',
  'text': 'US tops 5 million confirmed virus cases',
  'score': 0.019873601198196412},
 {'id': '1',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  'score': 0.018737798929214476}]
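For intuition on the IVF part of IVFSparse, the toy index below groups {term_id: weight} sparse vectors into cells around centroids and only scores the cells closest to a query. This is a hypothetical sketch, not txtai’s IVFSparse code: a real IVF index learns its centroids with clustering (typically k-means) and stores vectors far more compactly, whereas here the centroids are hand-picked to keep the example short. `ToyIVFSparse` and its parameters are illustrative names.

```python
import heapq


def sparse_dot(a, b):
    """Dot product of two {term_id: weight} sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


class ToyIVFSparse:
    """Hypothetical inverted file (IVF) index over sparse vectors.

    Each vector is stored in the cell of its closest centroid. A search only
    scores vectors in the query's `nprobe` closest cells.
    """

    def __init__(self, centroids, nprobe=1):
        self.centroids = centroids
        self.nprobe = nprobe
        self.cells = [[] for _ in centroids]

    def closest_cells(self, vector, n):
        scores = [(sparse_dot(vector, c), i) for i, c in enumerate(self.centroids)]
        return [i for _, i in heapq.nlargest(n, scores)]

    def add(self, uid, vector):
        self.cells[self.closest_cells(vector, 1)[0]].append((uid, vector))

    def search(self, query, limit=3):
        candidates = []
        for cell in self.closest_cells(query, self.nprobe):
            candidates.extend((sparse_dot(query, v), uid) for uid, v in self.cells[cell])
        return [(uid, score) for score, uid in heapq.nlargest(limit, candidates)]


# Hand-picked centroids; a real index would learn these (e.g. with k-means)
index = ToyIVFSparse([{0: 1.0, 1: 1.0}, {2: 1.0, 3: 1.0}], nprobe=1)
index.add("a", {0: 2.0})
index.add("b", {0: 0.5, 1: 1.0})
index.add("c", {3: 1.5})

print(index.search({0: 1.0}, limit=2))  # [('a', 2.0), ('b', 0.5)]
```

Raising `nprobe` scores more cells, improving recall at the cost of speed; with `nprobe` equal to the number of cells, the search becomes exhaustive.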

Late interaction models

Late interaction models encode data into multi-vector outputs. In other words, each input token maps to its own output vector. Then at search time, the maximum similarity (MaxSim) algorithm is used to find the best matches between the corpus and a query. This algorithm has achieved excellent results on retrieval benchmarks such as MTEB.

The downside of this approach is that it produces multiple vectors as opposed to a single vector for each input. For example, if a text element tokenizes to many input tokens, there will be many output vectors versus a single one with standard pooled vector approaches.
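The maximum similarity step described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the per-token vectors are made up and tiny, whereas a real late interaction model such as ColBERT outputs one normalized vector per token. It also makes the storage cost visible, since `doc1` needs two vectors where a pooled model would store one.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors, doc_vectors):
    """Late interaction score: match every query token vector to its most
    similar document token vector, then sum those best-match scores."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Made-up 2-dimensional token vectors
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc1": [[0.9, 0.1], [0.2, 0.8]],  # each query token has a strong match
    "doc2": [[0.5, 0.5]],              # one vector, a weaker match for both
}

ranked = sorted(docs, key=lambda uid: maxsim(query, docs[uid]), reverse=True)
print(ranked)  # ['doc1', 'doc2']
```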

Starting with the 9.0 release, late interaction models are supported with embeddings instances. Late interaction vectors will be transformed into fixed dimensional vectors using the MUVERA algorithm. See below.

from txtai import Embeddings

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]
# Create an embeddings
embeddings = Embeddings(path="colbert-ir/colbertv2.0", content=True)
embeddings.index(data)
embeddings.search("North America", 10)

[{'id': '0',
  'text': 'US tops 5 million confirmed virus cases',
  'score': 0.04216160625219345},
 {'id': '1',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  'score': 0.029944246634840965},
 {'id': '3',
  'text': 'The National Park Service warns against sacrificing slower friends in a bear attack',
  'score': 0.015931561589241028}]
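The MUVERA transformation can be sketched in simplified form. This is a single-repetition illustration of the fixed dimensional encoding idea, not txtai’s implementation: random hyperplanes (SimHash) assign each token vector to a bucket, query vectors are summed per bucket, document vectors are averaged per bucket, and the buckets are concatenated into one flat vector whose dot product is meant to approximate the MaxSim score. The repetitions, projections and empty-bucket filling from the MUVERA paper are omitted, and `dim`, `k_sim` and the toy vectors are assumptions.

```python
import random


def simhash_bucket(vector, hyperplanes):
    """Bucket id built from the sign of the vector's dot product with each hyperplane."""
    bucket = 0
    for plane in hyperplanes:
        positive = sum(p * x for p, x in zip(plane, vector)) > 0
        bucket = (bucket << 1) | int(positive)
    return bucket


def fde(vectors, hyperplanes, is_query):
    """Single-repetition fixed dimensional encoding of a set of token vectors.

    Queries sum the vectors that fall in each bucket; documents average them.
    Empty buckets stay zero here (the MUVERA paper fills them for documents).
    """
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(2 ** len(hyperplanes))]
    counts = [0] * len(sums)
    for vector in vectors:
        b = simhash_bucket(vector, hyperplanes)
        counts[b] += 1
        sums[b] = [acc + x for acc, x in zip(sums[b], vector)]
    if not is_query:
        sums = [[x / c for x in acc] if c else acc for acc, c in zip(sums, counts)]
    # Concatenate the buckets into one flat vector of length 2**k_sim * dim
    return [x for acc in sums for x in acc]


rng = random.Random(42)
dim, k_sim = 4, 2
hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k_sim)]

query = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
doc = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.8, 0.2, 0.0], [0.0, 0.0, 0.0, 1.0]]

q_fde = fde(query, hyperplanes, is_query=True)
d_fde = fde(doc, hyperplanes, is_query=False)
print(len(q_fde), len(d_fde))  # 16 16

# One dot product between the two encodings stands in for MaxSim
print(sum(a * b for a, b in zip(q_fde, d_fde)))
```

Because every encoding has the same length, the resulting vectors can be stored in a standard dense ANN index, which is what lets multi-vector outputs be searched as fixed dimensional single vectors.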

Reranking pipeline

Another major new component in this release is the Reranker pipeline. This pipeline takes an embeddings instance and a similarity instance, then uses the similarity instance to rerank the embeddings search results. This is a key component of the MUVERA paper: use the standard vector index to retrieve candidates, then rerank the outputs using the late interaction model.

from txtai import Embeddings
from txtai.pipeline import Reranker, Similarity

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]
# Create an embeddings
embeddings = Embeddings(path="colbert-ir/colbertv2.0", content=True)
embeddings.index(data)
similarity = Similarity(path="colbert-ir/colbertv2.0", lateencode=True)
ranker = Reranker(embeddings, similarity)
ranker("North America")

[{'id': '1',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  'score': 0.3324427008628845},
 {'id': '0',
  'text': 'US tops 5 million confirmed virus cases',
  'score': 0.24423550069332123},
 {'id': '3',
  'text': 'The National Park Service warns against sacrificing slower friends in a bear attack',
  'score': 0.16353240609169006}]

Notice that the same results are returned, but the scores and order are different.
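The two-stage idea behind the Reranker pipeline (retrieve candidates cheaply, then rescore them with a late interaction model) can be mimicked with a few lines of plain Python. This is an assumption-heavy toy, not the pipeline’s code: stage one here scores mean-pooled vectors as a stand-in for the MUVERA-encoded index, stage two applies MaxSim to the candidates only, and `retrieve_then_rerank` with its parameters is hypothetical. As in the txtai output above, the same documents can come back in a different order.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors):
    """Collapse token vectors into one vector (stand-in for a single-vector index)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def maxsim(query_vectors, doc_vectors):
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def retrieve_then_rerank(query_vectors, corpus, candidates=2, limit=2):
    """Stage 1: cheap single-vector scoring selects `candidates` documents.
    Stage 2: late interaction (MaxSim) rescores only those candidates."""
    pooled = mean_pool(query_vectors)
    first_stage = sorted(
        corpus, key=lambda uid: dot(pooled, mean_pool(corpus[uid])), reverse=True
    )[:candidates]
    reranked = sorted(
        first_stage, key=lambda uid: maxsim(query_vectors, corpus[uid]), reverse=True
    )
    return reranked[:limit]


# Made-up token vectors
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "a": [[0.6, 0.6]],              # best pooled score (0.6)
    "b": [[1.0, 0.0], [0.0, 1.0]],  # pooled score 0.5, but best MaxSim (2.0)
    "c": [[0.0, 0.2]],              # weakest on both
}

print(retrieve_then_rerank(query, corpus))  # ['b', 'a']
```

Only the candidates from stage one are ever rescored, so the expensive late interaction step stays bounded no matter how large the corpus is.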

Let’s try a more interesting example.

from txtai import Embeddings
from txtai.pipeline import Reranker, Similarity

# Create an embeddings
embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")
similarity = Similarity(path="colbert-ir/colbertv2.0", lateencode=True)
ranker = Reranker(embeddings, similarity)
ranker("Tell me about ChatGPT")

[{'id': 'ChatGPT',
  'text': 'ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and released on November 30, 2022. It uses large language models (LLMs) such as GPT-4o as well as other multimodal models to create human-like responses in text, speech, and images. It has access to features such as searching the web, using apps, and running programs. It is credited with accelerating the AI boom, an ongoing period of rapid investment in and public attention to the field of artificial intelligence (AI). Some observers have raised concern about the potential of ChatGPT and similar programs to displace human intelligence, enable plagiarism, or fuel misinformation.',
  'score': 0.6639302968978882},
 {'id': 'ChatGPT Search',
  'text': 'ChatGPT Search (originally SearchGPT) is a search engine developed by OpenAI. It combines traditional search engine features with generative pretrained transformers (GPT) to generate responses, including citations to external websites.',
  'score': 0.6477508544921875},
 {'id': 'ChatGPT in education',
  'text': 'The usage of ChatGPT in education has sparked considerable debate and exploration. ChatGPT is a chatbot based on large language models (LLMs) that was released by OpenAI in November 2022.',
  'score': 0.5918337106704712}]

Wrapping up

This article gave a quick overview of txtai 9.0. Updated documentation and more examples will be forthcoming. There is much to cover and much to build on!

See the following links for more information.

💡 What’s new in txtai 9.0 was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

Introducing txtai, the all-in-one AI framework

David Mezzetti — Wed, 23 Apr 2025 12:25:02 GMT

Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more

This is an updated version of the original article.

AI is rapidly evolving with a number of new developments. Large-scale generative language models are an exciting new capability allowing us to add amazing functionality. Innovation continues, with new models and advancements arriving on what seems like a weekly basis.

It’s hard to filter through the noise and know what is realistic today. While we’re not yet at full AI automation, there are plenty of ways to integrate AI into business workflows.

This article introduces txtai, an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.

Introducing txtai

txtai is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.

The key component of txtai is an embeddings database, which is a union of vector indexes (sparse and dense), graph networks and relational databases.

This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.

Build autonomous agents, retrieval augmented generation (RAG) processes, multi-model workflows and more.

The following is a summary of key features:

🔎 Vector search with SQL, object storage, topic modeling, graph analysis and multimodal indexing
📄 Create embeddings for text, documents, audio, images and video
💡 Pipelines powered by language models that run LLM prompts, question-answering, labeling, transcription, translation, summarization and more
↪️️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.
🤖 Agents that intelligently connect embeddings, pipelines, workflows and other agents together to autonomously solve complex problems
⚙️ Web and Model Context Protocol (MCP) APIs. Bindings available for JavaScript, Java, Rust and Go.
🔋 Batteries included with defaults to get up and running fast
☁️ Run local or scale out with container orchestration

txtai is built with Python 3.10+, Hugging Face Transformers, Sentence Transformers and FastAPI. txtai is open-source under an Apache 2.0 license.

NeuML is the company behind txtai and we provide AI consulting services around our stack. Schedule a meeting or send a message to learn more.

We’re also building an easy and secure way to run hosted txtai applications with txtai.cloud.

Install and run txtai

txtai can be installed via pip or Docker. The following shows how to install via pip.

pip install txtai

Semantic search

Embeddings databases are the engine that delivers semantic search. Data is transformed into embeddings vectors where similar concepts will produce similar vectors. Indexes both large and small are built with these vectors. The indexes are used to find results that have the same meaning, not necessarily the same keywords.
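To illustrate what "similar concepts produce similar vectors" means in practice, the sketch below runs an exact, brute-force cosine-similarity search over a few hand-made vectors. It assumes the vectors already exist; in txtai they come from a transformers model, and ANN libraries such as Faiss approximate this same exhaustive search so it scales to large indexes. The vectors and helper names are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def exact_search(query, vectors, limit=1):
    """Brute-force nearest neighbors: score every stored vector, keep the best."""
    scored = sorted(((cosine(query, v), uid) for uid, v in vectors.items()), reverse=True)
    return [(uid, score) for score, uid in scored[:limit]]


# Made-up 3-dimensional vectors; real embeddings models output hundreds of dimensions
vectors = {
    "climate": [0.9, 0.1, 0.0],
    "lottery": [0.0, 0.2, 0.9],
    "wildlife": [0.3, 0.8, 0.1],
}

print(exact_search([0.8, 0.2, 0.1], vectors, limit=3))
```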

The basic use case for an embeddings database is building an approximate nearest neighbor (ANN) index for semantic search. The following example indexes a small number of text entries to demonstrate the value of semantic search.

from txtai import Embeddings

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Create an embeddings
embeddings = Embeddings(path="sentence-transformers/nli-mpnet-base-v2")

# Create an index for the list of text
embeddings.index(data)

print("%-20s %s" % ("Query", "Best Match"))
print("-" * 50)

# Run an embeddings search for each query
for query in ("feel good story", "climate change", 
    "public health story", "war", "wildlife", "asia",
    "lucky", "dishonest junk"):
  # Extract uid of first result
  # search result format: (uid, score)
  uid = embeddings.search(query, 1)[0][0]

  # Print text
  print("%-20s %s" % (query, data[uid]))

The example above shows that for all of the queries, the query text isn’t in the data. This is the true power of transformer models over token-based search. What you get out of the box is 🔥🔥🔥!

Updates and deletes

Updates and deletes are supported for embeddings. The upsert operation will insert new data and update existing data.

The following section runs a query, then updates a value changing the top result and finally deletes the updated value to revert back to the original query results.

# Run initial query
uid = embeddings.search("feel good story", 1)[0][0]
print("Initial: ", data[uid])

# Create a copy of data to modify
udata = data.copy()

# Update data
udata[0] = "See it: baby panda born"
embeddings.upsert([(0, udata[0], None)])
uid = embeddings.search("feel good story", 1)[0][0]
print("After update: ", udata[uid])

# Remove record just added from index
embeddings.delete([0])

# Ensure value matches previous value
uid = embeddings.search("feel good story", 1)[0][0]
print("After delete: ", udata[uid])

Initial:  Maine man wins $1M from $25 lottery ticket
After update:  See it: baby panda born
After delete:  Maine man wins $1M from $25 lottery ticket
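A dictionary keyed by id is enough to model the upsert and delete behavior above. This hypothetical `ToyIndex` only tracks stored text, not vectors. It also highlights a detail of the example: after `delete([0])`, id 0 is gone entirely. That means the original "US tops 5 million confirmed virus cases" record does not come back either, and the search falls back to the next best match.

```python
class ToyIndex:
    """Hypothetical in-memory store showing upsert and delete semantics."""

    def __init__(self):
        self.rows = {}

    def upsert(self, documents):
        # New ids are inserted; existing ids are overwritten in place
        for uid, text in documents:
            self.rows[uid] = text

    def delete(self, ids):
        # Deleted ids are removed entirely; unknown ids are ignored
        for uid in ids:
            self.rows.pop(uid, None)

index = ToyIndex()
index.upsert([(0, "US tops 5 million confirmed virus cases"),
              (4, "Maine man wins $1M from $25 lottery ticket")])
index.upsert([(0, "See it: baby panda born")])  # replaces the text stored at id 0
index.delete([0])                                # removes id 0 altogether

print(index.rows)
```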

Persistence

Embeddings can be saved to storage and reloaded.

embeddings.save("index")

embeddings = Embeddings()
embeddings.load("index")

uid = embeddings.search("climate change", 1)[0][0]
print(data[uid])

Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg

Hybrid search

While dense vector indexes are by far the best option for semantic search systems, sparse keyword indexes can still add value. There may be cases where finding an exact match is important.

Hybrid search combines the results from sparse and dense vector indexes for the best of both worlds.
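One common way to merge sparse and dense results is a convex combination of min-max normalized scores. The sketch below assumes that technique, and txtai's own fusion logic may differ. The `hybrid` helper, its `weight` parameter and the score values are all hypothetical.

```python
def normalize(scores):
    # Min-max normalize a {uid: score} dict into [0, 1]
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {uid: 1.0 for uid in scores}
    return {uid: (score - low) / (high - low) for uid, score in scores.items()}

def hybrid(dense, sparse, weight=0.5):
    # weight scales the dense scores and (1 - weight) the sparse scores;
    # a document missing from one result set contributes 0 from that set
    dense, sparse = normalize(dense), normalize(sparse)
    fused = {
        uid: weight * dense.get(uid, 0.0) + (1 - weight) * sparse.get(uid, 0.0)
        for uid in set(dense) | set(sparse)
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical scores: the dense index matches several documents, the keyword index only one
dense = {4: 0.9, 1: 0.5, 0: 0.1}
sparse = {4: 0.52}
print(hybrid(dense, sparse))
```

Normalizing first matters because dense and sparse scores live on different scales. Without normalization, one index would dominate the fused score.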

# Create an embeddings instance
embeddings = Embeddings(
  hybrid=True,
  path="sentence-transformers/nli-mpnet-base-v2"
)

# Create an index for the list of text
embeddings.index(data)

print("%-20s %s" % ("Query", "Best Match"))
print("-" * 50)

# Run an embeddings search for each query
for query in ("feel good story", "climate change", 
    "public health story", "war", "wildlife", "asia",
    "lucky", "dishonest junk"):
  # Extract uid of first result
  # search result format: (uid, score)
  uid = embeddings.search(query, 1)[0][0]

  # Print text
  print("%-20s %s" % (query, data[uid]))

Same results as with semantic search. Let’s run the same example with just a keyword index to view those results.

# Create an embeddings instance
embeddings = Embeddings(keyword=True)

# Create an index for the list of text
embeddings.index(data)

print(embeddings.search("feel good story"))
print(embeddings.search("lottery"))

[]
[(4, 0.5234998733628726)]

See that when the embeddings instance only uses a keyword index, it can’t find semantic matches, only keyword matches.
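The keyword-only behavior follows from how sparse scoring works: a document only scores for query terms it actually contains. Here is a minimal sketch, assuming a BM25-style scorer. It is illustrative, not txtai's implementation, and its scores will not match the values printed above.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on runs of word characters
    return re.findall(r"\w+", text.lower())

class BM25:
    """Minimal Okapi BM25 keyword index (illustrative only)."""

    def __init__(self, documents, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(doc)) for doc in documents]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        # Document frequency: number of documents that contain each term
        self.df = Counter(term for doc in self.docs for term in doc)

    def idf(self, term):
        n, total = self.df[term], len(self.docs)
        return math.log(1 + (total - n + 0.5) / (n + 0.5))

    def search(self, query, limit=3):
        results = []
        for uid, (doc, length) in enumerate(zip(self.docs, self.lengths)):
            score = 0.0
            for term in tokenize(query):
                tf = doc[term]  # 0 when the term is absent
                if tf:
                    norm = self.k1 * (1 - self.b + self.b * length / self.avgdl)
                    score += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
            if score > 0:
                results.append((uid, score))
        return sorted(results, key=lambda pair: pair[1], reverse=True)[:limit]

data = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day",
]

index = BM25(data)
print(index.search("feel good story"))  # no shared terms, so no results
print(index.search("lottery"))
```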

Content storage

Up to this point, all the examples are referencing the original data array to retrieve the input text. This works fine for a demo but what if you have millions of documents? In this case, the text needs to be retrieved from an external datastore using the id.

Content storage adds a database (e.g. SQLite, DuckDB) that stores metadata with the vector index. Document text, additional metadata and binary objects can be stored and retrieved right alongside the indexed vectors.
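Python's built-in `sqlite3` module is enough to sketch how ids get resolved against a separate datastore. The `documents` table and `resolve` helper are hypothetical. When `content=True`, txtai manages its own content database, so this is only an illustration of the idea.

```python
import sqlite3

data = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day",
]

# Content database: the text lives next to the ids used by the vector index
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT)")
db.executemany("INSERT INTO documents VALUES (?, ?)", list(enumerate(data)))

def resolve(hits):
    # Turn (uid, score) search hits into dicts carrying the stored text
    results = []
    for uid, score in hits:
        (text,) = db.execute("SELECT text FROM documents WHERE id = ?", (uid,)).fetchone()
        results.append({"id": uid, "text": text, "score": score})
    return results

# Hypothetical (uid, score) hits from a vector search
print(resolve([(4, 0.08), (0, 0.02)]))
```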

# Create embeddings with content enabled.
# The default behavior is to only store indexed vectors.
embeddings = Embeddings(
  path="sentence-transformers/nli-mpnet-base-v2",
  content=True,
  objects=True
)

# Create an index for the list of text
embeddings.index(data)

print(embeddings.search("feel good story", 1)[0]["text"])

Maine man wins $1M from $25 lottery ticket

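The key change above is setting the content flag to True. This enables storing text and metadata content (if provided) alongside the index. The objects flag additionally enables binary object storage, which is used in the object storage section. Note how the text is pulled right from the query result!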

Let’s add some metadata.

Query with SQL

When content is enabled, the entire dictionary is stored and can be queried. In addition to vector queries, txtai accepts SQL queries. This enables combined queries using both a vector index and content stored in a database backend.

# Create an index for the list of text
embeddings.index([{"text": text, "length": len(text)} for text in data])

# Filter by score
print(embeddings.search("select text, score from txtai where similar('hiking danger') and score >= 0.15"))

# Filter by metadata field 'length'
print(embeddings.search("select text, length, score from txtai where similar('feel good story') and score >= 0.05 and length >= 40"))

# Run aggregate queries
print(embeddings.search("select count(*), min(length), max(length), sum(length) from txtai"))

[{'text': 'The National Park Service warns against sacrificing slower friends in a bear attack', 'score': 0.3151373863220215}]
[{'text': 'Maine man wins $1M from $25 lottery ticket', 'length': 42, 'score': 0.08329027891159058}]
[{'count(*)': 6, 'min(length)': 39, 'max(length)': 94, 'sum(length)': 387}]

The example above adds a simple additional field, text length.

Note the second query is filtering on the metadata field length along with a similar query clause. This gives a great blend of vector search with traditional filtering to help identify the best results.
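A query like `similar('feel good story') and length >= 40` combines vector scores with ordinary SQL predicates. The sketch below approximates that with `sqlite3`. It exposes hypothetical precomputed vector scores to SQL through a user-defined `similarity` function. This illustrates the idea and is not how txtai actually translates its SQL.

```python
import sqlite3

data = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day",
]

# Hypothetical vector scores for the query "feel good story", keyed by document id
scores = {0: 0.02, 3: 0.04, 4: 0.08}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE txtai (id INTEGER PRIMARY KEY, text TEXT, length INTEGER)")
db.executemany("INSERT INTO txtai VALUES (?, ?, ?)", [(uid, text, len(text)) for uid, text in enumerate(data)])

# Expose the vector scores to SQL; ids outside the vector results score 0
db.create_function("similarity", 1, lambda uid: scores.get(uid, 0.0))

rows = db.execute(
    "SELECT text, length, similarity(id) AS score FROM txtai "
    "WHERE similarity(id) >= 0.05 AND length >= 40 ORDER BY score DESC"
).fetchall()
print(rows)
```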

Object storage

In addition to metadata, binary content can also be associated with documents. The example below downloads an image and upserts it, along with associated text, into the embeddings index.

import urllib.request

from IPython.display import Image

# Get an image
request = urllib.request.urlopen("https://raw.githubusercontent.com/neuml/txtai/master/demo.gif")

# Upsert new record having both text and an object
embeddings.upsert([("txtai", {"text": "txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.", "object": request.read()}, None)])

# Query txtai for the most similar result to "machine learning" and get associated object
result = embeddings.search("select object from txtai where similar('machine learning') limit 1")[0]["object"]

# Display image
Image(result.getvalue(), width=600)

Topic modeling

Topic modeling is enabled via semantic graphs. Semantic graphs, also known as knowledge graphs or semantic networks, build a graph network with semantic relationships connecting the nodes. In txtai, they can take advantage of the relationships inherently learned within an embeddings index.

# Create embeddings with a graph index
embeddings = Embeddings(
  path="sentence-transformers/nli-mpnet-base-v2",
  content=True,
  functions=[
    {"name": "graph", "function": "graph.attribute"},
  ],
  expressions=[
    {"name": "category", "expression": "graph(indexid, 'category')"},
    {"name": "topic", "expression": "graph(indexid, 'topic')"},
  ],
  graph={
    "topics": {
      "categories": ["health", "climate", "finance", "world politics"]
    }
  }
)

embeddings.index(data)
embeddings.search("select topic, category, text from txtai")

[{'topic': 'confirmed_cases_us_5',
  'category': 'health',
  'text': 'US tops 5 million confirmed virus cases'},
 {'topic': 'collapsed_iceberg_ice_intact',
  'category': 'climate',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"},
 {'topic': 'beijing_along_craft_tensions',
  'category': 'world politics',
  'text': 'Beijing mobilises invasion craft along coast as Taiwan tensions escalate'}]

When a graph index is enabled, topics are assigned to each of the entries in the embeddings instance. Topics are dynamically created using a sparse index over graph nodes grouped by community detection algorithms.

Topic categories can also be derived, as shown above.
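The grouping idea fits in a few lines of Python. This sketch swaps in a simpler technique than community detection: it finds connected components with union-find over hypothetical edges between similar documents. Each topic is then labeled with its most frequent terms joined by underscores, loosely mirroring the topic names shown above. The documents, edges and helpers are all illustrative.

```python
from collections import Counter

def components(n, edges):
    # Union-find: merge the endpoints of every edge, then group nodes by their root
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())

def topics(documents, edges, terms=2):
    # Label every node in a group with the group's most frequent terms
    labels = {}
    for group in components(len(documents), edges):
        counts = Counter(token for node in group for token in documents[node].lower().split())
        label = "_".join(token for token, _ in counts.most_common(terms))
        for node in group:
            labels[node] = label
    return labels

# Hypothetical documents and similarity edges (pairs above some score threshold)
documents = ["virus cases rise", "virus cases fall", "ice shelf collapsed", "iceberg from ice shelf"]
edges = [(0, 1), (2, 3)]

print(topics(documents, edges))
```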

Subindexes

Subindexes can be configured for an embeddings instance. A single embeddings instance can have multiple subindexes, each with a different configuration.

We’ll build an embeddings index having both a keyword and dense index to demonstrate.

# Create embeddings with subindexes
embeddings = Embeddings(
  content=True,
  defaults=False,
  indexes={
    "keyword": {
      "keyword": True
    },
    "dense": {
      "path": "sentence-transformers/nli-mpnet-base-v2"
    }
  }
)
embeddings.index(data)

embeddings.search("feel good story", limit=1, index="keyword")

[]

embeddings.search("feel good story", limit=1, index="dense")

[{'id': '4',
  'text': 'Maine man wins $1M from $25 lottery ticket',
  'score': 0.08329027891159058}]

Once again, this example demonstrates the difference between keyword and semantic search. The first search call uses the defined keyword index, the second uses the dense vector index.

LLM orchestration

txtai is an all-in-one AI framework. txtai supports building autonomous agents, retrieval augmented generation (RAG), chat with your data, pipelines and workflows that interface with large language models (LLMs).

The RAG pipeline is txtai’s spin on retrieval augmented generation (RAG). This pipeline extracts knowledge from content by joining a prompt, context data store and generative model together.

The following example shows how a large language model (LLM) can use an embeddings database for context.

from txtai import RAG

# Create embeddings
embeddings = Embeddings(path="sentence-transformers/nli-mpnet-base-v2", content=True, autoid="uuid5")

# Create an index for the list of text
embeddings.index(data)

# RAG Prompt Template
template = """
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
"""

# Create and run RAG instance
rag = RAG(embeddings, "Qwen/Qwen3-0.6B", template=template, output="reference")
rag("What country is having issues with climate change?")

{'answer': 'Canada is having issues with climate change.',
 'reference': 'da633124-33ff-58d6-8ecb-14f7a44c042a'}

The logic above first builds an embeddings index. It then loads an LLM and uses the embeddings index to drive an LLM prompt.
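The prompt-building step of a RAG pipeline comes down to two moves: retrieve context, then fill the template. In this sketch, `retrieve` is a hypothetical word-overlap retriever standing in for the embeddings search. That substitution is exactly where semantic search matters. The article's climate change question shares no words with the Canada ice shelf text, so a word-overlap retriever would miss it.

```python
TEMPLATE = """
Answer the following question using the provided context.

Question:
{question}

Context:
{context}
"""

def retrieve(question, documents, limit=1):
    # Hypothetical retriever: rank documents by the number of words shared with the question
    words = set(question.lower().split())
    ranked = sorted(documents, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
    return ranked[:limit]

def build_prompt(question, documents):
    # Join the retrieved documents into the context slot of the template
    context = "\n".join(retrieve(question, documents))
    return TEMPLATE.format(question=question, context=context)

documents = ["The ice shelf collapsed into the sea", "A man wins the lottery"]
print(build_prompt("Which ice shelf collapsed?", documents))
```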

The RAG pipeline can optionally return a reference to the id of the best matching record along with the answer. That id can be used to resolve the full answer reference. Note that the embeddings above used a UUID autosequence (autoid="uuid5").

uid = rag("What country is having issues with climate change?")["reference"]
embeddings.search(f"select id, text from txtai where id = '{uid}'")

[{'id': 'da633124-33ff-58d6-8ecb-14f7a44c042a',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"}]

LLM inference can also be run standalone.

from txtai import LLM

# Default LLM is granite-4.0-350m
# Supports any LLM (Hugging Face, llama.cpp, Ollama, vLLM, OpenAI, Claude etc)
# See https://neuml.github.io/txtai/pipeline/llm/llm
llm = LLM()
llm("Say the name of 1 place to visit in Washington, DC")

One of the most popular and iconic places to visit in Washington, DC is the National Mall.

Language model workflows

Language model workflows, also known as semantic workflows, connect language models together to build intelligent applications.

Workflows can run right alongside an embeddings instance, similar to a stored procedure in a relational database. Workflows can be written in either Python or YAML. We’ll demonstrate how to write a workflow with YAML.

# Embeddings instance
writable: true
embeddings:
  path: sentence-transformers/nli-mpnet-base-v2
  content: true
  functions:
    - {name: translation, argcount: 2, function: translation}

# Translation pipeline
translation:

# Workflow definitions
workflow:
  search:
    tasks:
      - search
      - action: translation
        args:
          target: fr
        task: template
        template: "{text}"

The workflow above loads an embeddings index and defines a search workflow. The search workflow runs a search and then passes the results to a translation pipeline. The translation pipeline translates results to French.
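A workflow can be thought of as a list of tasks, where each task maps a batch of inputs to a batch of outputs. The sketch below is a hypothetical Python analogue of the YAML search workflow. `search` returns a canned result, and `shout` is a placeholder action standing in for the translation pipeline.

```python
from typing import Callable, List

Task = Callable[[List[str]], List[str]]

def run(tasks: List[Task], elements: List[str]) -> List[str]:
    # Tasks run in order; the output batch of one task is the input batch of the next
    for task in tasks:
        elements = task(elements)
    return elements

# Hypothetical canned search results standing in for an embeddings query
CORPUS = {"feel good story": "Maine man wins $1M from $25 lottery ticket"}

def search(queries):
    return [CORPUS.get(query, "") for query in queries]

def template(texts):
    # Mirrors the YAML template "{text}", which passes each text through unchanged
    return ["{text}".format(text=text) for text in texts]

def shout(texts):
    # Placeholder action; a real workflow would call a translation model here
    return [text.upper() for text in texts]

print(run([search, template, shout], ["feel good story"]))
```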

from txtai import Application

# Build index
app = Application("embeddings.yml")
app.add(data)
app.index()

# Run workflow
list(app.workflow(
  "search", 
  ["select text from txtai where similar('feel good story') limit 1"]
))

['Maine homme gagne $1M à partir de $25 billet de loterie']

SQL functions can, in some cases, accomplish the same thing as a workflow. The query below runs the translation pipeline as a SQL function.

app.search("select translation(text, 'fr') text from txtai where similar('feel good story') limit 1")

[{'text': 'Maine homme gagne $1M à partir de $25 billet de loterie'}]

LLM chains with templates are also possible with workflows. Workflows are self-contained; they operate both with and without an associated embeddings instance. The following workflow uses an LLM to conditionally translate text to French and then detect the language of the text.

llm:
  path: Qwen/Qwen3-4B-Instruct-2507

workflow:
  chain:
    tasks:
      - task: template
        template: Translate text '{statement}' to {language} if the text is English, otherwise keep the original text
        action: llm
      - task: template
        template: What language is the following text. Only print the answer? {text}
        action: llm

inputs = [
  {"statement": "Hello, how are you", "language": "French"},
  {"statement": "Hallo, wie geht's dir", "language": "French"}
]

app = Application("workflow.yml")
list(app.workflow("chain", inputs))

['French', 'German']

Wrapping up

AI is advancing at a rapid pace. Things not possible even a year ago are now possible. This article introduced txtai, an all-in-one AI framework. The possibilities are limitless and we’re excited to see what can be built on top of txtai!

Visit the links below for more.

GitHub | Documentation | Examples

Introducing txtai, the all-in-one AI framework was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to succeed with AI Agents — it starts with your data

David Mezzetti — Fri, 18 Apr 2025 17:03:24 GMT

Let’s talk about the 🔥 topic, what’s next and how to win in 2025

We’re about a third of the way through 2025 (at the time of this article). One topic we can’t hide from is “AI Agents”. It’s an overused term that means many things to many people.

For the purpose of this article, we’ll define an AI Agent as:

Connects to a Large Language Model (LLM)
Has access to a list of tools and/or other agents
Breaks a task into an iterative series of steps, looping until completion
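The three-part definition above maps onto a small loop. The agent asks the LLM for the next action, runs the selected tool and records the observation. It repeats until the LLM produces a final answer or a step limit is reached. This is a hypothetical sketch with a scripted stand-in for the LLM, not txtai's `Agent` implementation.

```python
def agent(task, llm, tools, max_steps=5):
    # Iterate: ask the LLM for the next action, run the chosen tool, record the observation
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action, argument = llm(history)
        if action == "final_answer":
            return argument
        observation = tools[action](argument)
        history.append(f"{action}({argument!r}) -> {observation!r}")
    return None  # no final answer within max_steps

def scripted_llm(history):
    # Hypothetical stand-in for an LLM: search once, then answer with the observation
    if len(history) == 1:
        return "search", "capital of France"
    return "final_answer", history[-1].split(" -> ")[1].strip("'")

# Hypothetical tool registry; a real tool might call a search API or an embeddings index
tools = {"search": lambda query: "Paris"}

print(agent("What is the capital of France?", scripted_llm, tools))
```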

Anyone can come up with a simple demo that shows how to build a trip booking assistant, a web research agent or a coding assistant. The real power for enterprise businesses is the ability to connect an agent with internal data and knowledge.

In this article, we’ll step through the main components of an AI Agent, talk about where this is all heading and how to be successful.

Large Language Models (LLMs)

LLMs are the engine behind most AI Agents. The LLM takes a generated prompt with data, tools and a user query then generates actions.

An AI Agent shouldn’t be hardwired to a specific LLM. It should be easy to switch between LLM providers. An AI Agent shouldn’t care if the LLM is local or run via an API.

So while it’s easy to just use the OpenAI Python library, it’s best to use a library that abstracts LLM inference. Examples include txtai (a library built by NeuML), LangChain, LlamaIndex and LiteLLM.

In the case of txtai agents, the LLM provider is automatically inferred.

from txtai import Agent

# Local Transformers model
agent = Agent(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

# LLM APIs - must also set API key via environment variable
agent = Agent(model="gpt-4o")
agent = Agent(model="claude-3-5-sonnet-20240620")

Access to tools and data

In the world of AI Agents, tools are how external knowledge and data are integrated. A tool could be a web search, vector search, API call or another agent.

For an enterprise business, this is by far the most crucial step of the process. Internal data needs to be made accessible to the AI Agent. There are a number of ways to do this. One successful pattern is building compendiums of knowledge that help the AI Agent access the right data more quickly.

With txtai, internal functions and embeddings databases are supported. Services available via the Model Context Protocol (MCP) are also supported.

What exactly is a compendium of knowledge? Here are a few examples:

Each of these datasets is preprocessed and summarized. For example, the Wikipedia embeddings database stores knowledge-dense abstracts for every Wikipedia article. The same goes for the ArXiv dataset. The LinkedIn dataset is a list of all NeuML LinkedIn posts and the Astronomy dataset is a curated dataset joining astronomy datasets together.

The goal here is to make it easier for the Agent to reach a final answer. This reduces overall latency and cost, as it lowers the number of LLM calls.

Where is this all heading?

The name of the game is augmentation. The end goal of AI Agents is automating enterprise tasks and augmenting humans.

Many talk about AI Agents as replacements for humans. That conversation is a moral and ethical one. But at the time of this article, we’re still a ways away from that even for those who want it.

In 2025, there are a number of tangible and realistic ways AI Agents can help your business.

Augmented data analysis — Enable your analysts to go through and understand data more quickly.
Research Agents — Domain-specific agents that can provide an initial automated analysis of datasets. Example domains include medical, scientific, legal and national security.
Software issue triage — Task agents to analyze software bug reports and provide an initial triage.

These are all examples of augmenting your workforce. It’s about taking an individual, giving them new capabilities and freeing them up to focus on other matters and do more.

How to be successful

A successful strategy involves selecting the right tools, enabling Agents to have access to the best and most concise data and setting realistic expectations.

If one expects AI Agents to fully replace a development team, heartache is ahead. Coding assistants certainly can augment humans but there are a host of issues as of 2025 regarding the reliability, security and effectiveness of code generated by AI.

If one expects AI Agents to replace your team of data analysts, it likely doesn’t end well.

If one wants to use AI Agents to augment an existing team and perhaps enable a small team to punch above their weight class, now we’re talking. There is much to gain with a strategy like this. This strategy revolves around preparing your data for success with AI Agents and using the right toolkit.

There is much to do and many gains ahead. If you set yourself on the right path and have the right mindset, you’ll be ahead of the curve. Good luck!

Want help with your AI Strategy? Need development guidance and assistance with txtai? Then book a meeting or email us to hear more!

How to succeed with AI Agents — it starts with your data was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

NeuML — 2024 Year in Review

David Mezzetti — Wed, 01 Jan 2025 20:36:47 GMT


Recapping 2024 and looking ahead to 2025

NeuML is the company behind txtai, an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. We are building a suite of applications to bridge the gap between research and production.

NeuML continued to build on its strong open-source foundation in 2024. The majority of our focus throughout the year was on txtai and our consulting efforts. This article will recap the progress made in 2024 and look ahead to 2025.

TxtAI

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. This is the foundational piece of software that all of our work stands on.

Highlights for txtai in 2024:

Star history chart: https://star-history.com/#neuml/txtai&Date

⭐4,043 stars on GitHub to bring the total to ⭐9,833
284 total commits on GitHub
232 total issues resolved on GitHub
10 releases. Entered the year at v6.2.0 and finished at v8.1.0
16 articles and example notebooks added

Let’s recap the major functionality added.

New in 2024

In 2024, txtai had two major releases, 7.0 and 8.0.

💡 What’s new in txtai 7.0

txtai 7.0 was released in February 2024. This release added GraphRAG, LoRA / QLoRA training support, an improved embeddings storage format when content is disabled and binary content support via the API.

💡 What’s new in txtai 8.0

txtai 8.0 was released in November 2024. This release added Agents and Model2Vec support.

Below is a list of all the major features added in 2024.

It’s hard to believe none of this existed on Jan 1, 2024! Let’s briefly cover each of these new additions.

Agents (Added in 8.0)

Agents automatically create workflows to answer multi-faceted user requests. Agents iteratively prompt and/or interface with tools to step through a process and ultimately come to an answer for a request.

The following example articles show how txtai uses agents to iteratively solve complex multi-hop problems.

GraphRAG (Added in 7.0)

Graph path traversal opens up a different type of RAG process. A standard RAG process typically runs a single vector search query and returns the closest matches. Those matches are then passed into an LLM prompt and used to limit the context and help ensure more factually correct answers are generated. Graphs enable more complex analysis.
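Here is a minimal sketch of path traversal. Breadth-first search finds the shortest path between two nodes, and the text of every node on that path becomes the context. The adjacency dict and node texts are hypothetical; in txtai, the graph is derived from relationships in the embeddings index.

```python
from collections import deque

def shortest_path(graph, start, end):
    # Breadth-first search: the first path that reaches `end` has the fewest hops
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return []

# Hypothetical graph: node ids map to neighbor lists and to node text
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
texts = {
    "a": "RAG joins retrieval with generation",
    "b": "Retrieval uses vector search",
    "c": "Generation uses an LLM",
    "d": "Vector search finds similar embeddings",
}

path = shortest_path(graph, "a", "d")
context = "\n".join(texts[node] for node in path)  # context for the LLM prompt
print(path)
print(context)
```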

Advanced RAG with graph path traversal

Speech to Speech RAG (Added in 7.5)

A Speech to Speech (S2S) RAG workflow starts with a microphone pipeline, which streams and processes input audio. The microphone pipeline has voice activity detection (VAD) built-in. When speech is detected, the pipeline returns the captured audio data. Next, the speech is transcribed to text and then passed to a RAG pipeline prompt. Finally, the RAG result is run through a text to speech (TTS) pipeline and streamed to an output audio device.

Speech to Speech RAG

PGVector and Postgres persistence (Added in 7.2)

txtai can now integrate with Postgres, a powerful, production-ready and open source object-relational database system. All major components can be stored in Postgres: vectors, content and graphs.

Integrate txtai with Postgres

LLM backends for llama.cpp and LLM API services (Added in 6.3)

txtai has been and always will be a local-first framework. It was originally designed to run models on local hardware using Hugging Face Transformers. As the AI space has evolved over the last year, so has txtai. Recent changes added the ability to use llama.cpp and LLM API services for vectorization and made it easier to use them for LLM inference.

RAG with llama.cpp and external API services

Streaming LLM generation (Added in 7.3)

Prior to this change, every LLM inference call had to wait for the full LLM response. Streaming generation returns results token by token, which reduces the perceived response time for a user.

The Speech to Speech RAG workflow chains a number of streaming pipelines together. See below.

Speech to Speech RAG

Textractor integration with Docling (Added in 8.1)

Up until v8.1, txtai only supported text extraction via Apache Tika. While Apache Tika is a battle-tested project, it depends on Java. This has proven to be problematic for some integrations. Additionally, it doesn’t have support for complex PDF elements such as tables.

Docling is a new open-source text extraction library that gained popularity in late 2024. It has impressive support for complex PDFs (supports tables, formatting, sections).

See this link for an in-depth review.

Model2Vec (Added in 8.0)

Model2Vec is a technique to turn any sentence transformer into a really small static model, reducing model size by 15x and making the models up to 500x faster, with a small drop in performance.
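With a static model, inference reduces to a table lookup plus pooling, which is where the speedup comes from. This sketch assumes a hypothetical two-dimensional token table and whitespace tokenization. Model2Vec itself uses a subword tokenizer and a vocabulary table distilled from the sentence transformer.

```python
def embed(text, table, dims):
    # Look up a precomputed vector for each known token and average them (mean pooling)
    vectors = [table[token] for token in text.lower().split() if token in table]
    if not vectors:
        return [0.0] * dims
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical token table; a real static model stores one vector per vocabulary entry
table = {"ice": [1.0, 0.0], "shelf": [0.0, 1.0]}

print(embed("Ice shelf", table, 2))
print(embed("lottery", table, 2))
```

No transformer forward pass happens at inference time, only dictionary lookups and an average. That trades some accuracy for large gains in size and speed.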

We’ve long had a goal to build micromodels, and Model2Vec is one step on this path. Building on work done in late 2024, we’re also planning to release an exciting Model2Vec model in early 2025. Stay tuned!

💡 What's new in txtai 8.0

Phew — that was a lot! Let’s now take a look at other projects.

Other Projects

In addition to txtai, a number of subprojects have been created over the years. The strategy with each of these projects is to build an initial implementation and support future work based on interest. This is an evolving target, some projects fade away and then even come back!

The following sections cover the major “other project” initiatives in 2024.

RAG

We introduced an open-source Retrieval Augmented Generation (RAG) application built on top of txtai. This application adds a front-end with Streamlit to txtai RAG. It supports Vector RAG and Graph RAG.

This application had 9 releases in 2024 and earned 309 ⭐’s on GitHub. More can be read in the article below.

Introducing RAG with txtai

AnnotateAI

annotateai automatically annotates papers using Large Language Models (LLMs).

A one-line call does the following:

Reads the paper
Finds the title and important key concepts
Goes through each page and finds sections that best emphasize the key concepts
Reads the section and builds a concise short topic
Annotates the paper and highlights those sections

annotateai incorporates txtmarker to highlight PDFs. This is notable as 2024 saw the first new release of txtmarker since 2020! This illustrates the point that some projects fade away and then come back.

annotateai had 2 releases in 2024 and earned 206 ⭐’s on GitHub. More can be read in the article below.

Introducing AnnotateAI

PaperAI / PaperETL

The biggest current downstream project is paperai. paperai is a semantic search and workflow application for medical/scientific papers. It helps automate tedious literature reviews allowing researchers to focus on their core work. paperetl is a companion project for parsing medical literature. The paperai stack has been integrated in a number of NeuML’s consulting efforts.

paperai had 1 release in 2024 and earned 212 ⭐’s on GitHub. paperetl had 1 release in 2024 and earned 93 ⭐’s on GitHub.

Public Models

NeuML believes in open-source AI. As part of that, we’ve released a number of public models on the Hugging Face Hub. In 2024, we released 8 new or updated models to the Hub!

NeuML (NeuML)

Our repositories page on GitHub is continuously updated and projects that are no longer supported are marked as “Public Archive”. Otherwise, projects on that page will continue to be supported on an as-needed basis.

Consulting Services

NeuML provides consulting services around our open-source stack:

Generative AI: Build agents, retrieval-augmented generation (RAG), large language model (LLM) orchestration and chat with your data systems
AI-driven Literature Analysis: Automate analysis of unstructured medical, scientific and technical literature
Model Development: Create AI, Machine Learning and/or NLP models that excel in industry-specific domains
Advisory and Strategy: Leverage our expertise to plan your data, engineering and AI strategy
Speaking Engagements: Discuss txtai, industry trends, insights and developments in the space with your team
Paid Support: Meet with us, receive a private Slack channel to ask questions and get implementation guidance

Our efforts in 2024 were once again centered around txtai. Consulting work is symbiotic with our open-source projects, each helping to push the other ahead.

Revenue separates hobby projects from projects that are part of a company. Consulting is the main source of revenue for NeuML and required for the viability of the company, as it is structured today.

In 2024, NeuML had a mix of hands-on consulting projects, primarily focused on Medical RAG and advisory support. Some projects are long-term and continuing, others are shorter periods of support.

Rating our progress in 2024

We’ve covered quite a lot of information already recapping 2024. Next, let’s discuss how we stacked up against what we set out to do back in January. Each goal will be rated from 1–5 with 5 being the highest and 1 the lowest.

These were the goals set at the beginning of the year. Each goal is an abbreviated version from NeuML’s 2023 Year in Review article.

Generative knowledge graphs

Retrieval augmented generation (RAG) powered by knowledge graphs.

⭐⭐⭐⭐⭐ (5 of 5)

txtai was one of the first, if not the first, projects to see the promise of graphs for search and RAG. GraphRAG was added in txtai 7.0.

This is powered by graph path traversal. Using a path of related nodes as context enables a breadth of knowledge not possible with simple vector search.

GraphRAG was featured in a number of articles and examples in 2024. Switching between vector search and knowledge graph search is now seamless.

Micromodels

Models that can run on limited-resourced systems such as microcontrollers, phones and embedded devices.

⭐⭐⭐⭐⭐ (5 of 5)

Model2Vec was integrated in txtai 8.0. This opens up the possibility of embeddings models in the 1M+ parameter range. Work on this was done in late 2024 and is expected to be released in early 2025. Stay tuned!

NeuML also added a quantized version of the OpenScholar model. While this isn’t for embedded devices, it does enable running on less-resourced devices.

Last but not least, we released PubMedBERT Embeddings Matryoshka which allows creating vectors as small as 64 dimensions. It’s important to note this only helps with storage, not processing speed. But that is where the Model2Vec model will come in soon!
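Matryoshka-style truncation keeps the leading dimensions of a vector and typically re-normalizes it. As noted above, the model still computes the full vector, so only storage shrinks. In this sketch, a hypothetical 3-dimensional vector stands in for a real embedding.

```python
import math

def truncate(vector, dims):
    # Keep the leading dimensions, then rescale to unit length so cosine scores stay comparable
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

print(truncate([3.0, 4.0, 12.0], 2))
```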

Cloud offering

Adding a cloud offering enables rapid development, especially for those with small and/or overloaded technical teams.

⭐⭐ (2 of 5)

As of January 1, 2025, we’ve received over 180 responses to our txtai.cloud form. There has been sizable interest in this initiative.

There has also been a strong push by the large cloud vendors with new offerings such as AWS Bedrock, GCP Vertex AI and Azure AI.

We’ve taken a “wait and see” approach to decide the best way to do this.

Consulting 2x

In 2024, we’ll set out to double our consulting efforts over what was done in 2023.

⭐⭐⭐⭐ (4 of 5)

It was mission accomplished in terms of doubling our consulting efforts. 2024 brought a good mix of short-term advisory work paired with longer-term development projects. With this approach, there just needs to be a constant focus on building a pipeline of customers given that needs change over time.

There’s certainly room for growth but it was a solid effort in 2024.

Community engagement and training

Speaking engagements and training provide immense value to our open-source projects. We’ll look to do a higher volume of this in 2024.

⭐⭐⭐⭐ (4 of 5)

We’ve been actively engaged online and on social media, and our social media presence grew a great deal in 2024. NeuML just needs to literally, not only virtually, “get out of the building”.

This will continue to be an area of focus in 2025.

Overall

In 2024, the self-proclaimed score for NeuML is 🥁 🎶

⭐⭐⭐⭐ (20 of 25)

This averages out to 4 out of 5. Goals are just that: goals. Sometimes you hit them and other times you don’t, and you learn how to do better.

It was a 5 out of 5 for txtai in 2024, couldn’t have been better! But in terms of capitalizing on this project from a business perspective, we’re looking to do even better in 2025!

Playbook for 2025

Looking ahead to 2025, we’ll focus on the following areas. Our goals are more concise this year than in years past, and all three goals are symbiotic and work in tandem.

txtai 10K

With txtai sitting at 9.8K stars, this doesn’t seem very ambitious. This goal is here to celebrate the hard work and fortitude it takes to grow a project with real interest in 2024. While many projects have more stars, txtai has grown organically with real people.

It’s also expected that reaching this milestone will bring new people in and build confidence that txtai is indeed a project that enterprises can build around. It’s been supported now for 5 years and just keeps getting better!

NeuML as a leading voice in the AI community

NeuML has a growing following in what is now a crowded space. In 2025, more should be done to let the world know what NeuML can do, what txtai can do and what our other projects can do.

In addition to our robust online and social media presence, NeuML needs to do more to engage with customers and the community in 2025. That means conferences, meetups and other opportunities to meet directly with those looking to integrate AI into their workflows.

Monetization of our place in the AI space

Our primary revenue-generating activity to date has been providing consulting support. These consulting projects have driven a number of new initiatives in txtai, as they illuminate real-world challenges.

Goals like “Consulting 2x” and “txtai.cloud” are great, but it ultimately comes down to how best to “monetize” NeuML’s work. AI is a very dynamic space and things change fast, certainly over the course of 12 months.

For the value that txtai is bringing to the space, it still feels like NeuML is leaving upside on the table. The goal is to be better here in 2025.

To sum it up, growing txtai to 10K stars brings new interest to NeuML, which should help the company become a larger voice in the AI space. That foundation, combined with a more concerted focus on customer engagement, will lead to more revenue-generating activities.

Ways to find NeuML

The full list of ways to interact with NeuML is shown below.

Contact us
Website | Email | Slack

Code
GitHub | Docker Hub | HF Spaces | Cloud

Social Media
LinkedIn | Twitter | Facebook | YouTube | Reddit

Articles
Medium | Hashnode | dev.to | Newsletter

Consulting Support
Need help with txtai? Struggling to build your own datasets to power AI systems? Want a fractional CTO to help with your overall direction?

Reach out to discuss how NeuML can provide advisory support and/or development assistance.

Wrapping up

This article covered the state of NeuML at the end of 2024 and our plans for 2025. We’re incredibly optimistic on the future of the AI space and NeuML!

Thank you for reading. Please follow along and check in on how we’re doing over the course of 2025.

Interested in NeuML’s history? Then read the recaps from 2020, 2021, 2022 and 2023.

NeuML — 2024 Year in Review was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

Introducing AnnotateAI

David Mezzetti — Thu, 19 Dec 2024 01:58:16 GMT

Automatically annotate papers using LLMs

The volume of research papers is growing at an astronomical rate💫. With modern tooling it’s easier than ever to create a paper. And the breadth of topics is also growing fast. No one person can possibly understand all that is going on heading into 2025.

While LLMs can summarize papers, search papers and build generative text about papers, what about providing human readers with context as they read?

This article introduces annotateai, a project that automatically annotates papers using Large Language Models (LLMs).

Introducing AnnotateAI

annotateai automatically annotates papers using Large Language Models (LLMs).

A one-line call does the following:

Reads the paper
Finds the title and important key concepts
Goes through each page and finds sections that best emphasize the key concepts
Reads the section and builds a concise topic
Annotates the paper and highlights those sections

annotateai is built with Python 3.9+ and is open-source under an Apache 2.0 license.

Install and run AnnotateAI

The following shows how to install annotateai.

pip install annotateai

Examples

annotateai can annotate any PDF but it works especially well for medical and scientific papers. The following shows a series of examples using papers from arXiv.

This project also works well with papers from PubMed, bioRxiv and medRxiv!

Setup

The primary input parameter is the path to the LLM. This project is backed by txtai and it supports any txtai-supported LLM.

from annotateai import Annotate

# Lightweight but powerful default model
annotate = Annotate("Qwen/Qwen3-4B-Instruct-2507")

# The previous default model uses the now deprecated AutoAWQ library
# Run pip install autoawq to enable
# Note as time goes on, this may require pinning to older versions of transformers & torch
annotate = Annotate("NeuML/Llama-3.1_OpenScholar-8B-AWQ")

# llama.cpp version of the above model
# Run pip install llama-cpp-python to enable
annotate = Annotate(
  "bartowski/Llama-3.1_OpenScholar-8B-GGUF/Llama-3.1_OpenScholar-8B-Q4_K_M.gguf"
)

Annotate paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”

This paper proposed RAG before most of us knew we needed it.

annotate("https://arxiv.org/pdf/2005.11401")

Source: https://arxiv.org/pdf/2005.11401

Annotate paper “HunyuanVideo: A Systematic Framework For Large Video Generative Models”

This paper builds the largest open-source video generation model as of Dec 2024.

annotate("https://arxiv.org/pdf/2412.03603v2")

Source: https://arxiv.org/pdf/2412.03603v2

Annotate paper “OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset”

This paper was presented at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks.

annotate("https://arxiv.org/pdf/2406.14657")

Source: https://arxiv.org/pdf/2406.14657

Docker Web Application

neuml/annotateai is a web application available on Docker Hub.

This can be run with the default settings as follows.

docker run -d --gpus=all -it -p 8501:8501 neuml/annotateai

The LLM can also be set via ENV parameters.

docker run -d --gpus=all -it -p 8501:8501 -e LLM=unsloth/gpt-oss-20b-GGUF/gpt-oss-20b-Q4_K_M.gguf -e MAXLENGTH=10000 -e n_ctx=4096 neuml/annotateai

The code for this application can be found in the project’s app folder.

Prompts

The following LLM prompts power annotateai.

Find and extract title

The title is extracted from the paper and a highlight is created for it.

Extract the paper title from the following text. Only return the title.

Generate keywords

Keywords or concepts are generated for the paper. These keywords drive the highlighting process. The process goes page by page and highlights sections that best cover the keywords and concepts.

Generate the best highly descriptive keywords for the paper.
Only return the keywords as comma separated.

Generate topic

Once a section is highlighted, a topic is generated for it. This prompt distills the section down into simple terms for the reader.

Create a simple, concise topic name in less than 5 words for the
following text. Only return the topic name.
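The three prompts fit together as a small chain: extract the title, generate keywords once for the paper, then go page by page, pick the section that best covers the keywords and name it with a short topic. The sketch below wires the prompt wording from above into that loop. The `llm` callable, the `build` prompt format and the keyword-count `coverage` score are hypothetical stand-ins (this is not annotateai's actual section-selection logic), and a fake LLM is used so the example runs offline.

```python
from typing import Callable, List, Tuple

# Prompt wording taken from the article
TITLE_PROMPT = "Extract the paper title from the following text. Only return the title."
KEYWORDS_PROMPT = (
    "Generate the best highly descriptive keywords for the paper. "
    "Only return the keywords as comma separated."
)
TOPIC_PROMPT = (
    "Create a simple, concise topic name in less than 5 words for the "
    "following text. Only return the topic name."
)


def build(prompt: str, text: str) -> str:
    """Hypothetical prompt format: instruction followed by the text."""
    return f"{prompt}\n\n{text}"


def coverage(section: str, keywords: List[str]) -> int:
    """Count keywords present in a section (simple stand-in for semantic similarity)."""
    lowered = section.lower()
    return sum(1 for keyword in keywords if keyword.lower() in lowered)


def annotate_pages(pages: List[List[str]], llm: Callable[[str], str]) -> Tuple[str, list]:
    """Return the title plus (page number, section, topic) highlights."""
    # Title: ask the LLM to extract it from the first section of the first page
    title = llm(build(TITLE_PROMPT, pages[0][0])).strip()

    # Keywords: generated once for the whole paper, parsed from comma-separated output
    text = "\n".join(section for page in pages for section in page)
    keywords = [k.strip() for k in llm(build(KEYWORDS_PROMPT, text)).split(",") if k.strip()]

    highlights = []
    for number, sections in enumerate(pages, start=1):
        # Highlight the section on this page that covers the most keywords
        best = max(sections, key=lambda section: coverage(section, keywords))
        if coverage(best, keywords) > 0:
            topic = llm(build(TOPIC_PROMPT, best)).strip()
            highlights.append((number, best, topic))

    return title, highlights


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM so the sketch runs without a model."""
    if prompt.startswith("Extract the paper title"):
        return "A Toy Paper"
    if prompt.startswith("Generate the best"):
        return "retrieval, generation"
    return "Retrieval Meets Generation"


pages = [
    ["A Toy Paper", "We combine retrieval with generation."],
    ["Unrelated text about the weather.", "Retrieval helps answer questions."],
]

title, highlights = annotate_pages(pages, fake_llm)
print(title)
for highlight in highlights:
    print(highlight)
```

Swapping `fake_llm` for a real model call is where a txtai-supported LLM would plug in.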

Wrapping up

This article introduced annotateai, a project that automatically annotates papers using Large Language Models (LLMs). None of us are experts in every topic. annotateai is here to help broaden our horizons and learn more!

Introducing AnnotateAI was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

Postgres is all you need for vectors

David Mezzetti — Wed, 11 Dec 2024 19:41:20 GMT

Start small with SQLite and move up to Postgres for the win

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases.

This foundation enables vector search and/or serves as a powerful knowledge source for large language model (LLM) applications.

txtai enables rapid development via local persistence with a SQLite + Faiss ensemble. This setup scales surprisingly well (millions of records) and lets one explore the world of vector search and AI.

From there, txtai makes it easy to switch persistence from local to client-server databases and beyond.

txtai supports the following persistence formats for each of its components.

This article is going to cover txtai persistence with Postgres.

Introduction to Vector Search

Over the last few years, vector search has burst onto the scene. For those only familiar with keyword search and not exactly sure of the benefits of semantic (aka vector) search, check out the article below for more.

Getting started with semantic search

Vector search has helped improve the accuracy, recall and precision of search overall. As you would guess, vector search requires building vectors.

At a high level, text is tokenized, run through an embeddings model and a series of floating point numbers are returned. These vectors are designed to be compared with other vectors for similarity. The most common metric used is cosine similarity.
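Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than magnitude. The minimal sketch below ranks two documents against a query. The 3-dimension vectors are hand-picked toy values, not output from a real embeddings model.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


# Toy 3-dimension vectors; real embeddings models return hundreds of dimensions
query = [0.9, 0.1, 0.0]
docs = {
    "lottery win": [0.8, 0.2, 0.1],
    "ice shelf collapse": [0.1, 0.9, 0.3],
}

# Highest similarity first
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked)
```

Vectors pointing the same way score 1.0 and orthogonal vectors score 0.0, regardless of length.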

The paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks is a great way to learn much more on this topic.

Vector search is also often paired with Large Language Models (LLMs) for more advanced tasks like Retrieval Augmented Generation (RAG). RAG helps reduce the risk of hallucinations by limiting the context in which an LLM can generate answers.
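Limiting the context usually comes down to a prompt that contains only the top search results and instructs the model to answer from them. The sketch below shows one such template. It is an illustrative assumption, not txtai's RAG pipeline or its default prompt; the `rag_prompt` helper and the hard-coded search results are hypothetical.

```python
def rag_prompt(question, passages, limit=2):
    """Build a prompt that restricts the LLM to the top retrieved passages."""
    context = "\n".join(f"- {passage}" for passage in passages[:limit])
    return (
        "Answer the question using only the context below. "
        "If the answer isn't in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical vector search results, already ordered by similarity
results = [
    "Maine man wins $1M from $25 lottery ticket",
    "US tops 5 million confirmed virus cases",
    "Make huge profits without work, earn up to $100,000 a day",
]

print(rag_prompt("Who won the lottery?", results))
```

The `limit` parameter is the knob that trades recall for a tighter, more focused context.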

If you’d like to learn more on RAG, see the article below.

Introducing RAG with txtai

With vectors comes the requirement of building the infrastructure to store and search vectors. A whole ecosystem of vector search databases has sprung up.

Source: https://www.sequoiacap.com/article/generative-ai-act-two/

These systems are all great in their own way. Yet in other ways, we’re going to have to rebuild all the production and reliability tools built into more established databases.

What if we could store vectors with something we already know?

Introduction to Postgres

Postgres is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

It’s also one of the most popular databases.

Source: https://db-engines.com/en/ranking

Postgres is a battle-tested system with years of production experience in maintenance, backups, high availability, monitoring and security. On top of that, there is also a whole ecosystem of analytical and BI tools built to work with Postgres.

Postgres doesn’t have a vector type or vector index capability built-in. Originally, there wasn’t a choice and we had to store vectors somewhere else. Now there is a popular open-source vector type and index provided by pgvector.

txtai supports storing content, vectors, graph nodes and more with Postgres + pgvector.

Postgres + pgvector + txtai

The Introducing txtai article goes into the basic use cases for txtai. The first example shows how to store text elements with a Faiss index. It also covers Faiss + SQLite.

Introducing txtai, the all-in-one embeddings database

The example below shows how to build a Faiss + SQLite embeddings index.

from txtai import Embeddings

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Create an embeddings database
embeddings = Embeddings(
  path="sentence-transformers/nli-mpnet-base-v2",
  content=True
)

# Create an index for the list of text
embeddings.index(data)
embeddings.save("index")

print(embeddings.search("feel good story", 1))

Running this code will show the entry for:

Maine man wins $1M from $25 lottery ticket

Looking at the index directory, we’ll see the files:

config.json: index configuration
documents: SQLite database with the content
embeddings: Faiss index with the vectors

How hard is it to switch persistence to a running Postgres database?

# Set these ENV variables
# ANN_URL="postgresql+psycopg2://user:pass@localhost/postgres"
# CLIENT_URL="postgresql+psycopg2://user:pass@localhost/postgres"

from txtai import Embeddings

with Embeddings(
  path="sentence-transformers/nli-mpnet-base-v2",
  content="client",
  backend="pgvector") as embeddings:

  # Works with a list, dataset or generator
  data = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day"
  ]

  # Create an index for the list of text
  embeddings.index(data)

  print(embeddings.search("feel good story", 1))

As we can see, not much is different here! The only change is setting content and backend to point to Postgres. The URLs can be set directly in code, although it’s strongly advised to use environment variables so that credentials aren’t stored in plain text. Read more about this here.

For good measure, let’s query the database and see what we have. The diagrams below are created with pgAdmin.

Database Schema

Text data

Vector data

As we can see, we have tables storing text and vectors!

Now all our data is in Postgres, and we get access to all the features mentioned earlier (maintenance, backups, monitoring, security, etc.).

16-bit and Binary Embeddings

The pgvector backend also supports 16-bit and binary embeddings.

from txtai import Embeddings

# Store vectors as 16-bit floats (pgvector halfvec)
embeddings = Embeddings(
  path="sentence-transformers/nli-mpnet-base-v2",
  content="client",
  backend="pgvector",
  pgvector={
    "precision": "half"
  }
)

txtai schema with halfvec (16-bit)

# Store vectors as 1-bit binary values (pgvector BIT)
embeddings = Embeddings(
  path="sentence-transformers/nli-mpnet-base-v2",
  content="client",
  backend="pgvector",
  quantize=1
)

txtai schema with BIT (384 1-bit numbers)

These options are a tradeoff between storage space and accuracy. In some use cases, the tradeoff might be worth the slight accuracy loss.
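Rough per-vector sizes make the tradeoff concrete. The sketch below assumes a 768-dimension model and ignores database per-row overhead. The sign-based `binarize` step and Hamming distance comparison are a common form of 1-bit quantization shown for illustration; txtai's `quantize=1` implementation may differ in detail.

```python
def storage_bytes(dims, precision):
    """Approximate raw bytes for one vector at a given precision."""
    bits = {"full": 32, "half": 16, "binary": 1}[precision]
    return dims * bits // 8


def binarize(vector):
    """1-bit quantization: keep only the sign of each dimension."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Number of positions where two bit vectors differ (lower = more similar)."""
    return sum(x != y for x, y in zip(a, b))


dims = 768  # assumed embedding size for illustration
for precision in ("full", "half", "binary"):
    print(precision, storage_bytes(dims, precision))

# Toy 4-dimension vectors: only the third dimension changes sign
a = binarize([0.3, -0.2, 0.5, -0.1])
b = binarize([0.4, -0.1, -0.2, -0.3])
print(a, b, hamming(a, b))
```

Half precision cuts storage in half, while binary vectors are 32x smaller than full precision, which is where the accuracy loss comes from: everything except each value's sign is discarded.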

Wrapping up

This article covered how txtai integrates with Postgres + pgvector. While this setup works perfectly well with self-hosted systems, it also works with cloud-hosted Postgres instances.

We’re all in with this setup and believe it’s a solid choice for building enterprise systems ready for production!

Postgres is all you need for vectors was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.

Grow your open-source project 2.0

David Mezzetti — Tue, 26 Nov 2024 16:26:06 GMT

How It Started vs. How It’s Going

This is a follow-on update to the original article written two years ago.

Developers often dream of finding the time and energy to build an open-source software project. Building a project to help the community is the goal, especially when we’ve benefited so much from open source.

Many of us search GitHub for an existing library, install it and move on. Maybe we leave a star but often we don’t. Without open-source, our jobs as developers would be significantly harder. Developers always want to learn and love learning about new projects that can make future work easier.

This article focuses on the journey 🚀 of txtai over the last two years since the original article was written. That article was written right before the release of ChatGPT and the start of the AI craze. The original article has more basic tips for those just getting started and is still worth a read.

About txtai

GitHub Summary for txtai as of 2024-11-26

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

It’s a foundational vector database with tooling built-in to support autonomous agents, retrieval augmented generation (RAG), large language model (LLM) orchestration and of course vector search.

GitHub Star history for txtai

Since the original article two years ago, txtai went from 2.2K ⭐’s to 9.5K ⭐’s today. That’s over 300% growth!

There is a “rising tide lifts all ships” phenomenon in the AI space. Nonetheless, the project has seen steady growth due to its continued activity, being in a hot space and solving a problem people think is worth solving.

A Rapidly Evolving Landscape

Source: https://www.sequoiacap.com/article/generative-ai-act-two/

There is no shortage of vector databases and LLM frameworks in late 2024.

This includes open-source vector databases (e.g. Weaviate, Qdrant, Chroma) and closed ones (e.g. Pinecone).

On the LLM framework side the big ones are LangChain and LlamaIndex.

txtai fits into all three of the categories above and more. Many of the tools above didn’t exist when the original article was written in 2022.

Source: https://ossinsight.io/collections/vector-search-engine/

With all that being said, it’s doing quite well and holding its own against some heavyweights.

How it’s going

With the increasingly crowded AI space and everyone now saying “powered by AI”, it’s often hard to break through the noise. But all is not lost.

Traditional social media channels are still a viable way to share your project. It’s just more work and it takes being more persistent and savvy to actually reach people in 2024, especially when using the word “AI”.

Most people still appreciate an honest approach with a touch of humility. You’ll stand out against those who are selling “AI snake oil”.

Here are the main methods that have helped txtai grow.

Hacker News

X (formerly known as Twitter)

Post on X

The best posts on X will be technical. If you can show how your project solves a problem and/or how it compares to other known projects, you’ll get good traction.

Medium/HashNode/Technical Blogs

Is this an infinite loop?

Sharing information about your project in long form is also important. This can be direct examples or indirectly as with this article.

Granting autonomy to agents

One thing that has worked great is to write an article as a Jupyter notebook and port that to the blogging site. Then link to the notebook as a “ready-to-run” example.

Analyzing Hugging Face Posts with Graphs and Agents

Writing deep dive articles with code examples is crucial in showing developers how to use your project.

Facebook

NeuML Facebook Page

While content is shared on Facebook, txtai hasn’t been able to build an engaged community there. It’s not clear that this is a great place to share technical content.

GitHub Trending

Source: https://github.com/trending July 2024

When all goes right and your project sees enough activity, you may get lucky and receive another boost, this time from GitHub Trending. Projects that trend on the front page can easily add 100s if not 1,000s of ⭐’s.

You just have to get lucky here. txtai has actually never trended on the GitHub front page, only the front page for Python. But even that has added a lot of stars.

Wrapping up

This article covered the journey 🚀 of txtai over the last two years and shared ways to grow your own project.

Getting traction is a feast or famine activity and it can be frustrating. There will be good days and bad days. There has never been a time in history when one person could potentially have as much influence as they can today!

If you have a great project and stick with it, the community will come!

Grow your open-source project 2.0 was originally published in NeuML on Medium, where people are continuing the conversation by highlighting and responding to this story.