<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>High Dimensional Research</title>
        <link>https://hdr.is</link>
        <description>We connect and empower new intelligences</description>
        <lastBuildDate>Tue, 07 Oct 2025 00:23:29 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Introducing kips]]></title>
            <link>https://hdr.is/blog/kips</link>
            <guid>https://hdr.is/blog/kips</guid>
            <pubDate>Tue, 10 Dec 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[Keep your personal data across sessions with MCP clients like Claude Desktop.]]></description>
            <content:encoded><![CDATA[
**Keep a personal database of your notes, tasks, past conversations, and even authentication details for MCP clients like Claude Desktop with `kips`. Run `npx kips config` to add it to Claude Desktop. You can add, update and delete entries with Claude directly or read more below.**

As we continue to build out our stack for the next generation of model-integrated applications, data custody and permissioning across otherwise ephemeral sessions has been high on the list of priorities. In the past our team has scoped out a utility just like `kips` over and over, but it never found a clean placement inside our suites of applications.

`kips`, named after lovely [Kips Bay](https://en.wikipedia.org/wiki/Kips_Bay%2C_Manhattan), is a tidy utility that creates and custodies your own personal database.

## Usage

`kips` [has a command line interface](https://github.com/hdresearch/kips) driving its own tiny SQLite database, but you can get going entirely inside Claude if that's what you so choose.

Just one line to add it to Claude Desktop:

```bash
npx kips config
```

`kips` is designed to custody the following:

- Authentication (username/password details)
- Notes
- Tasks
- Conversations

Notes, tasks and conversations can be tagged and all can be gathered by tag. If you want to ingest a note, you can tag it as in the following example:

```bash
npx kips import --type note ~/note.txt --tag "big-deal another-tag"
```

What if it's not a text file, but a PDF? What if you just screenshot conversations? Then you can still break it down into text with Claude as a transcription agent.

![](https://content.hdr.is/ingest3.png)

Want to get Claude to remember things to do? Try integrating with `mcp-shell` and gathering tasks from `gh issue list`. Suddenly you have a local database of to-do items Claude can reference and correct for you within your own shell.

![](https://content.hdr.is/mcp3.png)

But what if you wanted to map over every task at once? What if, indeed? We have more to talk about quite soon, but for now, we encourage you to experiment with your own integrations with `kips`.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Introducing mcp-shell]]></title>
            <link>https://hdr.is/blog/mcp-shell</link>
            <guid>https://hdr.is/blog/mcp-shell</guid>
            <pubDate>Tue, 26 Nov 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[An additional utility while we enter the next era.]]></description>
            <content:encoded><![CDATA[
**We've written a utility, `mcp-shell`, for integrating your shell with Claude using the [Model Context Protocol](https://modelcontextprotocol.io/introduction). Check out [the source code](https://github.com/hdresearch/mcp-shell) or just run `npx mcp-shell`.**

A lot has been happening under the surface at High Dimensional Research.

Careful readers may have noticed slight changes to our website as we enter our next era, including some new faces in the shop. We'll have more to say as we go, but our same basic mission continues in a new form: whereas before we "connected and extended new intelligence," we now "build the infrastructure for intuition." What does this mean?

We've historically thought of models and intelligences as aspirations toward an extension of the human mind -- not necessarily literal, in-your-brain extensions (though who knows what the future will bring), but as extensions of yourself to handle and organize information. When we move our bodies, we simply "intuit" what to do. The command is not announced to ourselves, but inferred. Interface design, at its best, turns intention into an intuitive gestalt. Likewise, when we work with machines, at their best they extend our own intuitions into the world. Likewise, how else could we describe the probabilistic mechanisms of a generative model as anything other than "intuitive"? Models don't high-level reason so much as ... infer, slot in a little at a time. And so we've refined our mission toward the broader goal of enforcing that relationship with new intelligence, both its and our intuitive leverage.

## `mcp-shell`

In the meantime, we've been experimenting with Anthropic's recent [Model Context Protocol](https://modelcontextprotocol.io/introduction) and built a basic utility to help Claude Desktop users integrate Claude with their own desktop: a package called [`mcp-shell`](https://github.com/hdresearch/mcp-shell). 

Just run `npx mcp-shell` and Claude Desktop will automatically pick up the server and notice it can run a shell as a local function call.

{% video src="https://content.hdr.is/music.webm" %}
{% /video %}

Of course, we've put in some basic safeguards to avoid disastrous results, but it should enable you to work with your own files in a fun way.

]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Psychotransport]]></title>
            <link>https://hdr.is/blog/transport</link>
            <guid>https://hdr.is/blog/transport</guid>
            <pubDate>Tue, 15 Oct 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[Nolita and Index: rearchitected. Additional products in the works.]]></description>
            <content:encoded><![CDATA[
**[Nolita](https://nolita.ai) is being reconstructed with a new Python codebase, built around preliminary navigation planning and hierarchical agent delegation. For more information on what this means, read below.**

[Wormholes](https://hdr.is/blog/wormhole) form lateral transport: within the same universe; across different universes; and through gravitational sleights of hand you could theoretically [travel through time](https://en.wikipedia.org/wiki/Wormhole#Time_travel), though for the moment it remains science fiction. Still, nothing stops us from lateral transport within the realms of imagination. We entered our exploratory mode willing to reinvent anything so long as it perfected our mission to the broadest possible audience: to connect and extend new intelligence.

In our first missive today we describe a new architecture for our autonomous navigation stack, [Nolita](https://nolita.ai), and its [Memory Index](https://nolita.ai/memory), currently in preview.

## Northern Little Italy

While redeveloping the Memory Index, it was evident that a lot of the work involved in more intelligent objective assessment and optimisation was tied into Nolita proper. 

Whether we rely on the Index, or on Nolita, for different functionalities depended on where we delivered the most extensible and robust experience for developers. High Dimensional Research crafts tools and provides supplemental, data-oriented extensions for those tools as our core service; our tools therefore need to be as easy to integrate within your own stack as they possibly can be. For this reason, we've historically avoided serving a black boxed model of our own. 

We believe in treating models as relatively agnostic slots within a stack, where you can use weaker or stronger models for the task as necessary. We also believe in being more resourceful than bringing a bigger gun, so to speak, when the work involved requires higher precision.

### State of affairs

Our pre-existing Nolita codebase takes a monolithic agent approach; it's comparatively naive in its simplicity, and yet it's gotten decent results for simpler tasks.

1. We enter an objective and a starting URL and give it to Nolita.
2. It enters into a loop between itself and an LLM where we go from page to page, see all our previous actions on the page (with an assessment of our current progress) and generate new steps.
3. Inbetween steps, our Memory Index can substitute actions if we've seen similar objectives and actions in the past. Given the intent of the user query and the pages in question, we assess if we have similar trajectories in the Index to use to advance the state without generating steps with the LLM.

So how can we improve upon this? Either we can improve the way we fetch steps in the Index, or we can reconsider the single agent approach entirely.

## Navigation hivemind

Obviously, we chose the latter. Instead of one agent, we have **three** high-level agents and over a **dozen** low-level agents handled by an orchestrator.

A planning agent, an execution agent, and an evaluation agent all cooperate to more robustly navigate through tasks. Imagine the difference between holding the high-level picture behind a task in your head as you do it -- and having a manager hold onto the bigger picture while only telling you what you need to focus on. When the plan goes awry, we refer back to our planner to amend the plan. The execution agent has a team of agents filling in the details when using any function it can perform in a web browser -- and all of these agents can integrate with and substitute steps from the Index.

Most importantly, we no longer have to pass a full DOM tree analysis into context; we can preprocess it with a dedicated agent and stream the page down to relevant details -- this, too, can use previous sessions from the Index to analyse what each DOM element "does" for a user.

The goal behind these changes is to continue to take load off of an LLM for a task, leaving its main speciality to be its continuous malleability when encountering new situations.

### Why Python?

When performing this rewrite, we found that orchestration support was simply superior inside a Python codebase. Additionally, we've found in user interviews that developers are integrating Nolita inside predominantly Python codebases. It's seemed overdue that we change to match.

For current users of Nolita, we intend to retire Nolita as written, leaving it as the `nolita-legacy` codebase, and stripping calls to the Index, leaving it in a workable state for basic workflows in TypeScript.

## Preview

We'll have more to say when we can show this version of Nolita publicly. We're currently in private testing with design partners. If you're interested in testing the next generation of Nolita, get in touch with Tynan at [tynan.daly@hdr.is](mailto:tynan.daly@hdr.is).]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Into the Wormhole]]></title>
            <link>https://hdr.is/blog/wormhole</link>
            <guid>https://hdr.is/blog/wormhole</guid>
            <pubDate>Tue, 20 Aug 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[Dimensionality means pursuing lines of flight.]]></description>
            <content:encoded><![CDATA[
In the wake of [GrowthHax](https://hdr.is/blog/growthhax-aftermath) we've continued to hone our suite of products. This means not just fixing bugs in Nolita, but making decisions about the future of the suite. Autonomous web navigation is still an active research area for many companies, and we have a very idiosyncratic approach, selectively (and agnostically) deploying models in the automation pipeline. We want to rely on the strengths of our approach, but stay open to redesigning other aspects of our approach that still require more careful planning.

Some of these changes require some heads-down work, and we anticipate to be in an exploratory mode as a company for the next quarter in addition to our regular maintenance cadence.

## In the short term

We're currently incorporating local models into Nolita as a provider, specifically [Ollama](https://github.com/ollama/ollama) as a starting point. We also have begun work incorporating explicit planning into the agentic framework before beginning the navigation process.

## Looking ahead

With this work as a first step, our goal is to then reconstruct the way we prompt the Memory Index with planning in mind; we currently rely more on being clever with searches and queries for replaying navigation steps, but we plan to move toward evaluating graph similarity for identifying similar pages.

In addition, we will continue to complement our suite of applications that [extend and connect new intelligence](/people). We're going to begin incubating additional projects that supplement and compliment Nolita.

We will post updates we have more to share about our efforts. As always, we are grateful for those of you following our work.]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Recap: GrowthHax]]></title>
            <link>https://hdr.is/blog/growthhax-aftermath</link>
            <guid>https://hdr.is/blog/growthhax-aftermath</guid>
            <pubDate>Tue, 16 Jul 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[The aftermath of 60 hackers in New York City.]]></description>
            <content:encoded><![CDATA[
On July 13th, High Dimensional Research hosted a hackathon, [GrowthHax](https://lu.ma/h3j3tqc3?tk=r5tEDQ), at Betaworks in New York City. This event gathered developers and entrepreneurs to explore innovative ways to drive growth using our product suite, [Nolita](https://nolita.ai) and the [Collective Memory Index](https://hdr.is/memory).

GrowthHax saw 13 teams present their projects, with around 60 participants engaging throughout the day. The projects ranged from playful oddities to businesses and product features that we expect to see in market soon.

It was an invaluable opportunity for the HDR team to watch developers use our products, troubleshoot, and answer their questions. We’re thankful for everyone who showed up. Check out a list of projects below.

### **First Place:** Apply Bot

Automates job applications by uploading a resume, finding matching jobs, customizing the resume for each job, and applying automatically.

### **Second Place:** Developer Trading Card Generator

Uses Nolita to scrape Dev.to profiles and generate trading cards based on the user's profile information.

### **Third Place:** What's New NYC

Quickly finds tech jobs in New York with an automated search tool.

### Honorable Mentions

- **AI Code Review:** Provides thorough code reviews from an AI model every time code is pushed to GitHub.
- **Rube Goldberg Agent:** Turns everyday goals into over-engineered adventures with AI-powered planning.
- **Reddit Raider:** Creates detailed personas and finds potential Reddit promoters, curating matches to user profiles to save time and boost engagement.
- **Akari KYC:** Uses AI for instant banking onboarding, eliminating the need for human review of documents and online information to improve fintech product conversion rates.
- **Dev Pages:** Generates a developer website from a GitHub username.

]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[2024-07 Release Notes]]></title>
            <link>https://hdr.is/blog/07-09-24-changelog</link>
            <guid>https://hdr.is/blog/07-09-24-changelog</guid>
            <pubDate>Tue, 09 Jul 2024 12:40:00 GMT</pubDate>
            <description><![CDATA[Nolita 2.1: Claude 3.5 support, replay sessions, unified auth.]]></description>
            <content:encoded><![CDATA[
**Nolita 2.1 is now available for developers [on GitHub](https://github.com/hdresearch/nolita) or just by running npx nolita@latest.**

## What's new?

While a small version iteration, a lot's changed under the hood.

### Additional model support

Our previous dependency packages for chat completion caught us unable to support all the models we've liked. In particular, Nolita was limited to Claude 2.1 support and local models were fairly limited. We've finally resolved that issue; you can use any Anthropic model [currently available](https://docs.anthropic.com/en/docs/about-claude/models#model-names).

The obvious next step is local model support. You get the idea.

### Replay sessions

Of note is a new method in the Page API, [followRoute](https://docs.nolita.ai/reference/classes/Page.html#followroute), which takes in a memory ID and replays the navigation steps across the remembered session. A "memory ID" is, of course, the same thing as `page.pageId`. 

If you use the new `--record` flag in `npx nolita`, you can get an idea of the flow.

![](https://content.hdr.is/record.gif)

This ID can then be used with `--replay` (though it will only report that the replay was successful) or used in Nolita scripts. Our intention with this feature is allowing Nolita users to use `Page.do()` and `Page.browse()` for a session, dictating the trajectory in natural language, before treating the trajectory as a retained session for future runs.

### Unified authentication

All authentication for Nolita is now done through `npx nolita auth`. In the past, you could provide Nolita keys (both model providers and HDR API access) via

- environment variables;
- a `config.json` file;
- inline, as a flag to the `npx nolita` runner;
- passed in each and every payload to the `npx nolita serve` server;
- or passed in each and every call to the Nolita `Page` and `Browse` APIs

These all still work. We just recommend you set your keys **once**, with `npx nolita auth`, and then never think about them again. All our interfaces will look for saved keys if none are provided, including the server and any create projects, so the calls to Nolita are exclusively about managing agentic browser sessions, navigation, and trajectories.

## What's next?

We've been busy with the upcoming hackathon. [GrowthHax](https://lu.ma/h3j3tqc3?tk=r5tEDQ), set for this Saturday, has over a hundred people signed up and is co-sponsored by [645 Ventures](https://645ventures.com/) and [Skej](https://skej.com/), not to mention our initial partners at [Betaworks](https://www.betaworks.com/).

We've been preparing to deploy a Python SDK for connecting with `npx nolita serve` sessions. Finally, as stated before, we believe that local models are an important part of Nolita's future, and we intend to add first-class functionality in the short-term.

{% iframe src="https://cdn.forms-content-1.sg-form.com/66216f3a-275e-11ef-b4f3-f6b8622304d9" %}{% /iframe %}]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Join us for GrowthHax]]></title>
            <link>https://hdr.is/blog/growthhax</link>
            <guid>https://hdr.is/blog/growthhax</guid>
            <pubDate>Tue, 25 Jun 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[07.13.24. Betaworks. Hackers invited.]]></description>
            <content:encoded><![CDATA[
We at High Dimensional Research are happy to announce an inaugural hackathon around [Nolita](https://nolita.ai) and the [Memory Index](https://hdr.is/memory), called **GrowthHax**.

On July 13, 2024, at [Betaworks](https://www.betaworks.com/) in New York, we invite teams of hackers to create tools around the concept of **growth**: whether user growth, revenue growth, or anything to do with "growth" as a central theme. We invite creative twists on the idea.

We believe Nolita enables novel product design around visible and invisible agentic flows, and the team will be in person to help out (and provide our own twists on the theme). Food, drinks and prizes all included.

**Interested?** For more information, or to register, check out the event on [Luma](https://lu.ma/h3j3tqc3).]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Memory Index is publicly available]]></title>
            <link>https://hdr.is/blog/cmi-release</link>
            <guid>https://hdr.is/blog/cmi-release</guid>
            <pubDate>Wed, 19 Jun 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[Get started with integration, with more features on the way.]]></description>
            <content:encoded><![CDATA[
We're pleased to allow anyone to sign up for Memory Index access today.

The Memory Index functions as a repository of user-anonymised web trajectories -- storing page structures as a graph, and navigation across those structures as searchable data.

In practice, this means that when an AI model uses Nolita to accomplish an objective on the web, it can share and recall patterns of navigation from other successful objectives when accomplishing subsequent tasks.

## What to expect

As the Memory Index grows, it becomes better, as more tasks and pages become searchable. This means at first improvement begins subtly, refining over time toward more intelligent application of steps. We also are continuing to improve our search and replay functionality on the Index and in Nolita respectively to improve global performance. We're happy to have our earliest users on the ground floor with us.

## How to get started

It's easy. Get an account on [our dashboard](https://dashboard.hdr.is). Every user gets 10,000 requests per month by default, with more available via subscription.

With `npx nolita auth`, you'll be prompted to access the dashboard, generate a key at [the API page](https://dashboard.hdr.is/keys), and input it into Nolita. It will be saved as a dotfile on disk.

From then on, all requests will utilize memories in the index.

If you use Nolita as a package import, you pass your HDR key with your `.env` file or by passing it into the Browser class:

```ts
  const agent = makeAgent(
    { apiKey: process.env.OPENAI_API_KEY!, provider: "openai" },
    { model: "gpt-4" }
  );

  const browser = await Browser.launch(false, agent, undefined, {
    apiKey: process.env.HDR_API_KEY,
  });

  const page = await browser.newPage();
```

## Next steps

- We intend to surface route following more explicitly. You'll soon be able to use `nolita record` and `nolita replay` to apply web navigation trajectories to different sessions.
    - Likewise, we'd like to surface memory management on the dashboard so that organizations can configure and retain successful memory identifiers to apply directly.
- We have a rewrite of our agent engine, supplanting the `zod-gpt` package, in pre-alpha, coming soon after.
    - Once the rewrite is in place (allowing for long-overdue Claude 3 integration), we anticipate local model integration to come soon after that.]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why agents fail]]></title>
            <link>https://hdr.is/blog/bernoulli</link>
            <guid>https://hdr.is/blog/bernoulli</guid>
            <pubDate>Wed, 12 Jun 2024 12:40:00 GMT</pubDate>
            <description><![CDATA[Learn the iron law.]]></description>
            <content:encoded><![CDATA[
## Why agents fail

The thing about agents based on AI models is that right now, they are really bad at doing things. They can’t reliably do even basic addition and multiplication, let alone pilot a web browser. And this is not a problem that is going away any time soon.

While a lot of this has to do with aspects of how large language models work as a category of technology, there’s still a more fundamental issue at play about their basic mechanics. We’re gonna do a little bit of basic probability, but I promise it won’t be that bad.

If your chance of success at each step is {% math %}p{% /math %}, then the probability you are always successful for {% math %}n{% /math %} steps can be calculated using this formula:

{% math block=true %}
P(n) = p^n
{% /math %}

For example, say you want to know what the chance of flipping a coin and having it come up heads every time, five times in a row.

{% math block=true %}
P(5) = 0.5^5 \newline 
= 0.03125 \newline = 3.125\\%
{% /math %}

Quite a rare event!

This is Bernoulli’s iron law: [the law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers). Over time, no matter how many times we try an inherently probabilistic process, we discover its true value: the true reliability of the task.

![](https://content.hdr.is/ironlaw.jpg)

You can understand how the same basic principles can be applied to agents. When we give an agent a task and some starting conditions, we are hoping that it choses the correct action to take at every single step.

An agent that chooses the correct step 90% of the time — and I gotta say, if you have an agent like this, please contact us because you’ve made a fundamental breakthrough — has a 59.05% chance of completing a five step task. And at ten steps we’re looking at success rate of 34.87%. 

In the context of web agents, this means that even if the [three-click rule](https://en.wikipedia.org/wiki/Three-click_rule) holds true, the true success rate of that agent is more like 72.90%.

Software that only works 72.90%, 95%, or even 99% of the time is bad software. It is too unreliable to use. Reliability is measured in the number of nines. For example, AWS’s S3 has 11 nines. That is 99.999999999% reliability. In practice this means if you store 10,000 objects, you will only lose one once every 10 million years [1].

In order to elevate agents to production, they need to be basically perfect every time. This is why we see a lot of agent companies, but few agents actually in production. 

## When to (not) be agentic

Our solution is simple: reduce the probabilistic action space. We can still retain a lot of fundamental technology — falling back to models for exploring and reasoning about next potential actions — but by being clever with search and embedding, we can substitute in known actions as a kind of “guard rail” over a task.

Let’s work with an example. 

How many steps are involved in booking a hotel? Really simply: we aren’t even researching prices. We’re just booking the hotel.

1. You’ll visit the hotel homepage.
2. You’ll see something like a ‘Book now’ button and click it.
3. You may need to pick a type of room, and hit ‘search’.
4. You’ll see a calendar with dates and select the start and end dates.
5. You’ll see an estimation of the price, potential upsells, etc.
6. You’ll enter a payment flow. Optionally, you’ll be prompted to enter in details for a rewards program or something like that.

You can guess that while the details might differ, hotel websites don’t structurally differ all that much. And if we know the objective, and we can create a structure of the page content — leveraging accessibility tools in the process — we can take out a whole bunch of these steps. You may need the model for reasoning — what dates matter for us? What are the payment details to input? — we can substitute out most of these steps, leaving the new content and the state-specific reasoning to a model.

Our approach is based on our Collective Memory Index, a search index over web trajectories and actions. If you’d like to experiment with it, we invite you to learn more at [hdr.is/memory](http://hdr.is/memory) and get on the waitlist to use it with our agentic web automation framework, [Nolita](https://nolita.ai).

---

[1] [https://aws.amazon.com/blogs/aws/new-amazon-s3-reduced-redundancy-storage-rrs/](https://aws.amazon.com/blogs/aws/new-amazon-s3-reduced-redundancy-storage-rrs/)]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[2024-06 Release Notes]]></title>
            <link>https://hdr.is/blog/06-07-24-changelog</link>
            <guid>https://hdr.is/blog/06-07-24-changelog</guid>
            <pubDate>Fri, 07 Jun 2024 12:40:00 GMT</pubDate>
            <description><![CDATA[Nolita 2.0: introducing the natural language, drop-in Page API.]]></description>
            <content:encoded><![CDATA[
**Nolita 2.0 is now available for developers on [GitHub](https://github.com/hdresearch/nolita) or just by running `npx nolita@latest`.**

## What's new?

Forgive the silence. After presenting at Betaworks' Agent Camp [Demo Day](https://vimeo.com/947116133), we worked closely with other developers in agentic navigation and began redesigning the Nolita APIs for broader purpose.

We had several takeaways:

- objective-oriented, agentic navigation isn't always the ideal for quick tasks and composability;
- meeting Python developers where they are is important, even if our codebase is in TypeScript;
- finally, operating as simply as a drop-in replacement is important for developer ergonomics

With all these in mind, we started to consider how Nolita might operate as a more natural language drop-in for Puppeteer. By organising our API around [`Page`](https://docs.nolita.ai/reference/classes/Page.html) instances, we found that you didn't have to wait for one tab to complete, or even one Browser window to finish its task: you can begin to map over the typed responses from previous objectives.

We've updated [our example code](https://github.com/hdresearch/nolita/blob/main/examples/findEmail.ts) for an idea of the ergonomics available for developers. 

```ts
  const browser = await Browser.launch(false, agent);

  const page = await browser.newPage();

  await page.goto("https://hdr.is");

  await page.do("click on the company link");
  const answer = await page.get(
    "Find all the email addresses on the page",
    z.object({
      emails: z
        .array(z.string())
        .describe("The email addresses found on the page"),
    })
  );
```

`Page.do` provides a single, action-oriented command to an agent; `Page.get` poses an objective to accomplish. Passing an expected, typed response from your agent is as easy as passing a parameter of the shape.

We hope it is evident with use how much this enables for developers to construct applications using the same browser session -- not just one-off objective sessions.

## Increasing our documentation

In addition, we've begun expanding our inline source code documentation for all our exported classes. Our goal is to be as legible as possible in your IDE and in our documentation. With our new [Nolita documentation](https://docs.nolita.ai), you can see the exact location of all exported definitions and parameters, and of course, find example usage. We're still expanding our guides for common use cases, especially in the wake of our new API.

Since we never mentioned it on the company blog: **we named `@hdr/browser` Nolita**! It's named after our favourite neighbourhood in Manhattan. It's also important to mention that we increased the scope of the Nolita API itself to also include a server and bootstrapping full-stack applications built on Nolita. 

You can see examples of all interfaces in real-life, moving GIF [on GitHub](https://github.com/hdresearch/nolita).

## Going forward

So what's next for us?

- It would be powerful to control Page objects as an abstraction over remote instances of Chrome, so we intend to facilitate this functionality soon.
- Of course, even with all this Nolita work, we're still working on the Memory Index! You may have seen the documentation deployed for the [API endpoint](https://api.hdr.is/docs). We have a lot of work to do on increasing the scope of our Memory Index documentation, and signups are still closed, but we intend to softly onboard over the following month. The waitlist is still available [here on hdr.is](https://re8zt94ow1u.typeform.com/to/qo0GQ398).
- Finally, we intend to provide a Python SDK as part of our short-term efforts as well.

Thank you for being with us as we tirelessly hone our vision for open-source, web-first agents. Whether you want to just dictate, step by step, how you want to navigate the web programmatically -- or whether you want to quickly pipe an answer into your bash terminal, based on nothing more than a question and a single command -- we want to help you do it.

{% iframe src="https://cdn.forms-content-1.sg-form.com/66216f3a-275e-11ef-b4f3-f6b8622304d9" %}{% /iframe %}]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Collective memory, fragment one]]></title>
            <link>https://hdr.is/blog/cmi</link>
            <guid>https://hdr.is/blog/cmi</guid>
            <pubDate>Wed, 27 Mar 2024 10:00:00 GMT</pubDate>
            <description><![CDATA[The largest conquest: probability.]]></description>
            <content:encoded><![CDATA[
The current AI paradigm is inherently probabilistic. What that means is that AI-powered web agents fail – a lot.{% note #1 /%} And they fail in unpredictable ways. It is, therefore, extremely hard to build and commercialize products around AI. Yet, LLMs are incredibly powerful. How do you harness their power while taming their unpredictability?

This is the problem that HDR set out to solve. We’re building a framework for launching web agents to automate arbitrary tasks – everything from complex research projects to AI-powered e-commerce. The cornerstone of our framework is an entirely novel product concept and a new contribution to the agent developer’s toolkit: the Collective Memory Index (CMI). The goal of the CMI is to offer developers an off-the-shelf solution to the agent reliability problem by feeding their agents “memories” of how to complete complex workflows. How do we do this? There are three steps: 1) observe; 2) record; 3) recall.

First, we observe agents using a browser to accomplish the tasks that their users set for them. When they do so successfully, we record the set of actions they took to reach the desired end state as a memory. Those memories take the shape of a set of state and action pairs which are mapped to the web pages through which the AI navigated toward its goal. In cases where AIs persistently fail to accomplish their goals, we allow a human user to drive the browser to generate the necessary memory. Last, we make those memories searchable at web scale, empowering any agent that needs to accomplish the same or a substantially similar task to recall them. It is in this sense that the memories are collective. If no memory exists for a given website, we search for similar site structures and offer agents use of those memories as a cognate.

In short, every agent that uses our product contributes memories of successful actions which future agents can recall and then, in turn, they will contribute more. This generates a profound network effect: the more people use our product, the better it becomes, which draws in more users, and so on. It also puts us in a unique position within the entire AI ecosystem. We are not competitive with the state-of-the-art models. They simply make our product better.

The CMI solves other problems that contribute to customer churn as well. Since developers can now use state-of-the-art models to pathfind towards a goal and record a memory of their successful action sequences, they can subsequently employ smaller, cheaper, more specialized models to accomplish complex tasks. Additionally, the CMI can guide agents on the shortest path from their starting point to their ending point, reducing latency. What this means is that, using the collective memory index, developers can launch web agents to reliably perform complex tasks at a lower cost and latency.

{% note #1 %}Benchmarking of state-of-the-art LLMs suggests that they fail at complex tasks approximately 86% of the time.{% /note %}
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[@hdr/browser is open source]]></title>
            <link>https://hdr.is/blog/browser</link>
            <guid>https://hdr.is/blog/browser</guid>
            <pubDate>Fri, 08 Mar 2024 12:40:00 GMT</pubDate>
            <description><![CDATA[Drive your own browser. Structure your data. Locally.]]></description>
            <content:encoded><![CDATA[
_Image generated by the [Lenia](https://github.com/Chakazul/Lenia) cellular automaton._

We're pleased to announce that we've open sourced our browser automation framework, `@hdr/browser`, [on GitHub](https://github.com/hdresearch/hdr-browser) under the MIT license.

As we work on our collective memory infrastructure, we felt a framework that could showcase memory integration as a core feature while still being primarily objective-driven and agent-specific had many benefits: for one thing, you can run it directly:

```bash

npx @hdr/browser --agentProvider [provider] --agentModel [model]
```

This command alone will procure Chrome and get to work. If you don't specify an objective and a place on the internet to start browsing, it will ask you for them.

For more complex applications, we recommend importing the framework as a module. We have included an example of using our browser to dictate an expected typed response from an agentic task, defining the type ourselves and structuring the data, which you can find [here](https://github.com/hdresearch/hdr-browser/blob/396c4ed125fa3087aa18c7f198ba5b386005db49/examples/).

In general, the `@hdr/browser` module exports several classes that are expected to be constructed together, but can be modified at each level:

- the `Agent` class, which expects a chat completion API;
- the `Browser` class, which procures the browser locally;
- the `Logger` class, which defines the log level of the session;
- the `AgentBrowser` class, which takes in the preceding classes and is called with an objective.

For more documentation, you can see the [README](https://github.com/hdresearch/hdr-browser?tab=readme-ov-file#usage-as-an-imported-module) for an example.

We hope that this release helps encourage more people to build small LLM applications for everyday tasks and, as we continue to build out our collective memory, that developers at any stage can use smaller, simpler, even local models to build agents without an expensive training process.

For developers preferring a consolidated product, we still offer our Browser API endpoints that merge Collective Memory with our browser infrastructure.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[2024-01 Release Notes]]></title>
            <link>https://hdr.is/blog/01-31-24-changelog</link>
            <guid>https://hdr.is/blog/01-31-24-changelog</guid>
            <pubDate>Wed, 31 Jan 2024 12:40:00 GMT</pubDate>
            <description><![CDATA[Life after collective memory.]]></description>
            <content:encoded><![CDATA[
High Dimensional Research spent much of 2023 iterating our core product offering toward increasingly specific objectives: we started out by trying to optimize for what excited **us** about what we wanted to do with AI, but over time, those exciting ideas had technical prerequisites that weren't met by anyone yet. It became apparent that we should gear ourselves toward those very same prerequisites, lest we simply reinvent some rudimentary solutions to stand up lofty ideas. Over time, we worked our way down to face the core problems of inter-model coordination, inter-model memory, and inter-model reflexivity.

Why these problems? It's obvious in retrospect that the brain of an animal organizes and reinterprets incoming sensory information through systems that developed on the scale of millions of years. For some reason, we expect non-human intelligences to not have a learning period, to work **now**, to be adapted to their environments **now**.

And one of the most common tasks people want models to face involves interaction with the internet. As a single piece of software for autonomous agents, a web browser is easy to implement and difficult to implement well. You can write your browser around vision models at a slow and expensive clip with limited interaction, or you can try to parse the DOM for an intelligence without eyes or human reflexive cognition. We believed that the answer lay in an index of something we call a "reflexive cognitive process." Memory Index, as described [on our product page](/product), sits between a passthrough model and the browser. It preprocesses web content for legibility and it introduces something analogous to "muscle memory": Memory Index anonymizes those sessions, ignores any personal or sensitive information, and creates a bank of possible actions collectively taken across all sessions.

We began by getting infrastructure in place to support a core loop between model and controller. We finished our first version of that infrastructure last November. Then we began work on Memory Index, and finally shipped it at the start of the year.

We believe, over time, that we can continually improve success rates for autonomous tasks performed in-browser through use and improvements of Memory Index and the whole product suite. To track it, last week we also deployed a benchmarking suite (and wrote a [post about it](/blog/webarenas)) and implemented evaluators for benchmark tasks.

We also implemented some network comparison algorithms for comparing website structures, and altered the behavior behind our passthrough model endpoint consumption; we include inner page HTML into model messages and slice text into appropriately-sized messages consistently.

Finally, as part of this general slate of work, we also began to assess and record successful tasks into Memory Index. We believe that we have now laid the groundwork for assessing where we can improve and expand our work in terms of extending model agency on the internet, and we encourage you, if you are not building, testing our services with us, or using our services in the areas we are confident they excel, to visit from time to time and check in on our progress.

{% iframe src="https://cdn.forms-content-1.sg-form.com/66216f3a-275e-11ef-b4f3-f6b8622304d9" %}{% /iframe %}]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Agents in the Arena]]></title>
            <link>https://hdr.is/blog/webarenas</link>
            <guid>https://hdr.is/blog/webarenas</guid>
            <pubDate>Mon, 29 Jan 2024 17:44:58 GMT</pubDate>
            <description><![CDATA[Agents have a problem: they fail. A lot.]]></description>
            <content:encoded><![CDATA[
Agents have a problem: they fail. A lot.

## Why failure matters

Agent performance is the current name of the game. To understand why, consider the following toy model for agent value. Normally, we can calculate the net value of a given query with the following equation:

{% math block=true %}
Value_i = E[Query_i] - E[Cost_i]
{% /math %}

However agents behave a bit differently. We can break out {% math %}E[Cost_i]{% /math %} into two components; cost of the query itself, denoted as {% math %}Cost(Query_i){% /math %}, and the expected cost of the failure state of query _i_, denoted as {% math %}E[Failure(Query_i)]{% /math %}.

{% math block=true %}
Value_i = E[Query_i] - E[Failure(Query_i)] - Cost(Query_i)
{% /math %}

The cost of a query here is not expected since both people and LLM APIs charge set rates.

This disambiguation is important because the set of possible failure states is essentially unbounded. To illustrate, consider the case where you want an agent to order groceries. For simplicity’s sake, let’s say you would be willing to pay ten dollars for someone to do this task, {% math %}E[Query_i] = \$10{% /math %}, and the cost of employing an agent in this case is ten cents, {% math %}Cost(Query_i) = \$0.1{% /math %}.

The value our agent can deliver is 100x that of the cost. Pretty good right? Not so fast. We still need to consider {% math %}E[Failure(Query_i)]{% /math %}. Consider the following end states our agent might find itself in:

1. The agent successfully orders your groceries;
2. The agent fails order groceries;
3. The agent buys the wrong items; or
4. The agent puts the wrong address in the delivery information.

In scenario one, the value is $10.00. In scenario two, the value is -$0.10 since no transaction actually happens. Scenarios three and four are where things really start to get bad. Not only did the agent fail to complete the task, but now you have to clean up the mess the agent has made. If it takes the same amount of time to cancel, we’re looking at an expected cost of failure of -$10.10 dollars. If seriously bungles the order, the cost can balloon well beyond that.

## Benchmark environments for web agents

Generating benchmarks from live sites is a pain. You can get hit with captchas, rate limited, and have to explain to your credit card company that those purchases weren’t technically fraud since the AI was acting on your behalf but — nonetheless — you don’t actually want 32 handbags from Nordstrom. (Not that we have any experience with that.)

But the biggest problem is that sites change. They can change for normal reasons like old products disappearing, new user reviews getting written, or a CRM updating the sales total for each new order. However, the tricky thing about trying to benchmark a web agent is that the agent itself is often asked to change the state of a website. As a result, benchmarking on actual websites is out of the question.

That’s why we’re using [WebArena](https://webarena.dev/). WebArena is comprised of eight (we’re using six) realistic websites that are either open source clones or designed to simulate a real website. They are fully functional in the sense that you can browse products, make posts, lookup directions, etc. But they are neutered so you’ll never be charged for testing a checkout flow or get yelled at because your agent deleted all the draft blog posts 😅.

Arenas divorce actions taken on the web from real world outcomes. This means that any actions carried out within an arena, such as browsing products, making posts, or interacting with website features, have no actual impact beyond the arena environment. Developers can freely experiment and test their agents without concerns about unintended consequences or affecting real users.

But the most significant advantage of arenas is that they have a reset button. Resetting allows the arena's state, including any modifications made by the agent, to be effortlessly reverted. This aspect is especially critical when evaluating web agents since these agents frequently need to alter the website's state during their tasks. By being able to reset the state, each benchmarking attempt begins from a pristine and consistent state when needed, enabling precise comparisons and evaluations.

### Sample WebArena tasks

{% div className="overflow-x-auto" %}

{% table %}

- Arena
- Task ID
- Objective

---

- [http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin](http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin)
- `https://api.hdr.is/benchmarks/task/000`
- What is the top-1 best-selling product in 2022

---

- [http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:9999/forums/all](http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:9999/forums/all)
- `https://api.hdr.is/benchmarks/task/21`
- List out reviewers, if exist, who mention about ear cups being small

---

- [http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:3000/](http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:3000/)
- `https://api.hdr.is/benchmarks/task/424`
- Find the page of the place where Mr. Rogers was filmed on the map.

---

- [http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing](http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing)
- `https://api.hdr.is/benchmarks/task/427`
- Find the page of the university that has most Turning Award winners on the map.

---

- [http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:8023/explore](http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:8023/explore)
- `https://api.hdr.is/benchmarks/task/811`
- Assign the issue regarding 404 in a11yproject to myself.
  {% /table %}

{% /div %}

The WebArena benchmark dataset contains 812 tasks. These tasks range from simple information retrieval, such as listing negative reviewers, to complex manipulation tasks such as querying a content management system.

As you might imagine from such a range of tasks, success rates vary wildly. For some tasks, agents reliably perform tasks almost 100% of the time. For others, the failure rate is 100%. From our experience working on these benchmarks over the past few weeks, agent failure rate is directly correlated with the horizon length of the task. In plain english, the more steps a tasks takes the higher the failure rate.

However, our experience also shows that it is possible to bring error rates down. We think of long horizon tasks as being composed of a series of primitive tasks. In order to buy your groceries, an agent needs to navigate to the page, use the search bar, add items to cart, etc. Therefore, the first step to agents that work is getting _really_ good at the basics.

That’s why we’re releasing our benchmark [status page](https://hdr.is/status). We want builders to have a clear understanding of what it is possible to do with agents while giving them a place to test things out (audit us) that limits the blast radius when things go wrong. We’re still in private beta, but we encourage you to check back here for updates and sign up for [the waitlist](https://re8zt94ow1u.typeform.com/to/qo0GQ398).
]]></content:encoded>
        </item>
    </channel>
</rss>