Stories by Mihai Criveti on Medium

MCP Gateway: The Missing Proxy for AI Tools

Mihai Criveti — Sat, 07 Jun 2025 16:52:24 GMT

ContextForge MCP Gateway: The Missing Proxy & Registry for AI Tools

AI agents and tool integration are exciting — until you actually try to connect them. Different authentication systems (or none), fragmented documentation, and incompatible protocols quickly turn what should be simple integrations into debugging nightmares.

MCP Gateway is a smart proxy that sits between your MCP Clients and MCP Servers

I’ve released ContextForge MCP Gateway as open source to solve this problem. It sits between your AI clients and tool servers, giving you a clean, secure endpoint for everything, and it supports both REST and MCP upstream and downstream (including the stdio, SSE and Streamable HTTP transports, with auth). If you find it useful, leave us a star on GitHub ⭐!

ContextForge MCP Gateway: Adding an MCP Server and Creating a Virtual Gateway

In this article, I’ll share how to deploy MCP Gateway along with two MCP Servers (time and git), federate them in the gateway, and connect to them using an MCP Client (GitHub Copilot in Visual Studio Code).

The MCP Integration Problem

The Model Context Protocol promises to standardize how AI models call external tools, but the reality is messy. The ecosystem now has over 15,000 MCP-compatible servers, but they’re anything but uniform:

Transport chaos and incomplete implementations: Some servers only support STDIO, others stream over Server-Sent Events (SSE), and a few expose Streamable HTTP endpoints. They can’t talk to each other without custom adapters. And while the MCP specification evolves (it now deprecates SSE in favor of Streamable HTTP), MCP clients and servers are slow to catch up.

Security gaps: Many test servers skip authentication entirely or use weak schemes. Depending on what the client supports, you’ll want to use either a stdio wrapper, SSE with Bearer JWT auth, or full-blown OAuth.

Writing new MCP Servers: Your existing API endpoints aren’t available as MCP Servers yet. You have to write new servers and test them.

Everything lives somewhere different: Your prompt library runs on Server A, the vector database on Server B, and your custom tools on Server C. Managing URLs, keys, and retry logic across all of them becomes a full-time job.

https://www.slideshare.net/slideshow/contextforge-mcp-gateway-the-missing-proxy-for-ai-agents-and-tools/280961197

How MCP Gateway Fixes This

Instead of handling retry logic and managing multiple MCP servers directly in your agent or client application, MCP Gateway centralizes all that complexity. You can create multiple virtual servers for different clients or use cases, each with their own tool configurations and access controls.

The MCP Gateway Wrapper allows you to connect to the gateway securely, using a JWT token, while exposing a local STDIO server.

The gateway acts as a smart proxy that normalizes everything behind a single endpoint:

One Consistent Interface

The gateway converts STDIO, SSE, and HTTP into consistent HTTPS+JSON-RPC, so your clients only need to know one protocol.
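Every MCP transport (STDIO, SSE, Streamable HTTP) carries the same JSON-RPC 2.0 messages, which is what lets the gateway normalize them. As a minimal sketch, the two requests a client typically sends to discover and then call a tool look like this. The tool name and argument are illustrative, and how the payload is delivered (the /rpc endpoint, SSE, or the wrapper) is left out.

```python
from __future__ import annotations

import itertools
import json

# JSON-RPC 2.0 matches each response to its request by "id", so ids must be unique per session.
_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: dict | None = None) -> str:
    """Serialize one JSON-RPC 2.0 request message."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the tools a (virtual) server exposes...
list_tools = jsonrpc_request("tools/list")

# ...then invoke one by name. "get_current_time" is the time server's tool; the exact
# name the gateway exposes may differ (for example, if tools are namespaced by server).
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "get_current_time", "arguments": {"timezone": "Europe/Dublin"}},
)

print(list_tools)
print(call_tool)
```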

Your MCP clients, whether agents built with LangChain, AutoGen or CrewAI, or Visual Studio Code extensions such as GitHub Copilot, can connect over SSE with JWT auth, or locally via STDIO (using mcpgateway-wrapper to connect securely to the gateway).

Complete Tool Discovery and Debugging

The gateway automatically discovers all connected servers and presents every tool, prompt, and resource in one catalog.

Tools can be enabled, disabled and tested, and each tool’s JSON Schema and description can be viewed easily.
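To make the catalog and virtual-server ideas concrete, here is a toy in-memory model: tools discovered from several upstream servers, a global enable/disable switch, and a virtual server that exposes only a chosen subset. The `Catalog` and `Tool` classes and the `server.tool` naming are hypothetical illustrations, not the gateway's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    server: str  # the upstream MCP server the tool was discovered on
    enabled: bool = True


@dataclass
class Catalog:
    tools: dict[str, Tool] = field(default_factory=dict)

    def discover(self, server: str, tool_names: list[str]) -> None:
        """Register the tools reported by an upstream server's tools/list response."""
        for name in tool_names:
            # Key by "server.tool" so two servers can both expose, say, "search".
            self.tools[f"{server}.{name}"] = Tool(name=name, server=server)

    def set_enabled(self, key: str, enabled: bool) -> None:
        self.tools[key].enabled = enabled

    def virtual_server(self, selected: list[str]) -> list[str]:
        """Expose only the selected tools that exist and are currently enabled."""
        return [key for key in selected if key in self.tools and self.tools[key].enabled]


catalog = Catalog()
catalog.discover("time", ["get_current_time", "convert_time"])
catalog.discover("git", ["git_status", "git_log"])
catalog.set_enabled("git.git_log", False)  # disabled globally, so hidden from every virtual server

copilot_tools = catalog.virtual_server(["time.get_current_time", "git.git_status", "git.git_log"])
print(copilot_tools)  # ['time.get_current_time', 'git.git_status']
```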

Non-MCP API Support

Wrap any REST API endpoint and expose it as a fully-typed MCP tool with automatic retries and schema validation.
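The pattern behind wrapping a REST endpoint as a tool is roughly: validate the arguments against the tool's JSON Schema, then call the endpoint with retries. The sketch below assumes a hypothetical forecast endpoint and checks only a tiny subset of JSON Schema (required keys and primitive types) with the standard library. It is not the gateway's implementation, and a real validator such as the `jsonschema` package covers far more.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Hypothetical input schema for a wrapped REST endpoint (a small subset of JSON Schema).
FORECAST_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city"],
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate(arguments: dict, schema: dict) -> None:
    """Check required keys and the primitive types of declared properties; raise ValueError."""
    for key in schema.get("required", []):
        if key not in arguments:
            raise ValueError(f"missing required argument: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in arguments:
            continue
        value, expected = arguments[key], spec["type"]
        # bool is a subclass of int in Python, so True must not pass as an "integer".
        if (isinstance(value, bool) and expected != "boolean") or not isinstance(value, _PY_TYPES[expected]):
            raise ValueError(f"{key} must be of type {expected}")


def call_with_retries(fn: Callable[[dict], Any], arguments: dict, schema: dict,
                      attempts: int = 3, base_delay: float = 0.1) -> Any:
    validate(arguments, schema)  # invalid input fails fast and is never retried
    for attempt in range(attempts):
        try:
            return fn(arguments)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the upstream error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff


calls = 0


def flaky_forecast(args: dict) -> dict:
    """Stand-in for an HTTP call that times out twice before succeeding."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("upstream timeout")
    return {"city": args["city"], "forecast": "rain"}


result = call_with_retries(flaky_forecast, {"city": "Dublin", "days": 2}, FORECAST_SCHEMA, base_delay=0.01)
print(result)
```

Which exceptions count as retryable is a design choice; here only `ConnectionError` is retried, while schema errors are returned to the caller immediately.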

Built-in Observability

Every API call is timed and logged, so you can track performance and debug issues without external monitoring tools. Per-tool, per-server, and aggregated metrics are all available.
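Per-tool and aggregated metrics can be collected with little more than a timing wrapper around each call. This is a minimal in-process sketch, assuming a hypothetical `METRICS` dictionary keyed by (server, tool); the gateway's own metrics pipeline is not shown here.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: (server, tool) -> counters.
METRICS = defaultdict(lambda: {"calls": 0, "failures": 0, "total_s": 0.0})


def timed(server: str, tool: str):
    """Decorator recording call count, failures and cumulative latency for one tool."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            stats = METRICS[(server, tool)]
            stats["calls"] += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                stats["failures"] += 1
                raise
            finally:
                # Runs on success and failure alike, so slow errors are timed too.
                stats["total_s"] += time.perf_counter() - start
        return wrapper
    return decorator


def per_server(server: str) -> dict:
    """Aggregate per-tool counters into per-server totals."""
    rows = [stats for (srv, _tool), stats in METRICS.items() if srv == server]
    return {name: sum(row[name] for row in rows) for name in ("calls", "failures", "total_s")}


@timed("time", "get_current_time")
def get_current_time(timezone: str) -> str:
    return f"current time in {timezone}"  # stand-in for the real upstream call


@timed("time", "convert_time")
def convert_time(value: str) -> str:
    raise ValueError(f"cannot parse {value!r}")  # a tool call that always fails


get_current_time("Europe/Dublin")
get_current_time("UTC")
try:
    convert_time("25:99")
except ValueError:
    pass

print(METRICS[("time", "get_current_time")]["calls"])  # 2
print(per_server("time")["calls"], per_server("time")["failures"])  # 3 1
```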

Getting Started

The gateway ships as a single Docker container or pip package with no external dependencies: just a local SQLite database.

Docker / Podman

docker run -d --name mcpgateway \
  -p 4444:4444 \
  -e MCPGATEWAY_UI_ENABLED=true \
  -e MCPGATEWAY_ADMIN_API_ENABLED=true \
  -e HOST=0.0.0.0 \
  -e JWT_SECRET_KEY=my-test-key \
  -e BASIC_AUTH_USER=admin \
  -e BASIC_AUTH_PASSWORD=changeme \
  -e AUTH_REQUIRED=true \
  -e DATABASE_URL=sqlite:///./mcp.db \
  ghcr.io/ibm/mcp-context-forge:0.5.0

Python Package

pip install mcp-contextforge-gateway

# Enable the visual Admin UI (true/false)
export MCPGATEWAY_UI_ENABLED=true

# Enable the Admin API endpoints (true/false)
export MCPGATEWAY_ADMIN_API_ENABLED=true

# Must match the --secret used later with create_jwt_token
export JWT_SECRET_KEY=my-test-key

BASIC_AUTH_PASSWORD=password mcpgateway --host 127.0.0.1 --port 4444

Once running, you’ll have access to http://localhost:4444 with:

Admin Dashboard (/admin) - Manage servers and tools through a web interface.
API Documentation (/docs and /redoc) - Interactive Swagger UI and ReDoc documentation.
Version, Health and Configuration (/version and /health) - Get version, configuration and debugging information. Requires auth (log in to the admin page first, or call it as an API with a token to retrieve JSON).
JSON-RPC Endpoint (/rpc) - Send JSON-RPC requests to the gateway directly.
Metrics (/metrics) - Performance and usage statistics.

Detailed deployment across Containers, Docker, Compose, Kubernetes, OpenShift, Minikube, Helm, Code Engine, AWS and Azure is available through the project documentation page.

Integration with VS Code and GitHub Copilot

One of the most practical applications is connecting MCP Gateway to GitHub Copilot in VS Code. This gives Copilot access to all your tools through a single, secure connection.

Generate a JWT token and test it:

export MCPGATEWAY_BEARER_TOKEN=$(python -m mcpgateway.utils.create_jwt_token \
  --username admin --exp 0 --secret my-test-key)

curl -s -H "Authorization: Bearer $MCPGATEWAY_BEARER_TOKEN" \
     http://localhost:4444/version | jq
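The bearer token is a JWT signed with the shared secret (`my-test-key` here), which is why the gateway's JWT_SECRET_KEY must match the `--secret` passed to the helper. As an illustration of what signing and verification involve, here is a stdlib-only HS256 sketch. The HS256 algorithm and the `{"sub": "admin"}` claim set are assumptions about what `create_jwt_token` emits, and real code should use a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        _b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + _b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Check the signature only; real verifiers also check "alg", expiry and other claims."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hmac.new(secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))


token = sign_hs256({"sub": "admin"}, "my-test-key")
print(f"Authorization: Bearer {token}")
```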

Spin up a couple of MCP Servers:

# npx ships with npm (5.2+); install Node.js / npm first if you don't have it

# Install uvenv
pip install uvenv

# Deploy mcp-server-git (default port 8000)
npx -y supergateway --stdio "uvenv run mcp-server-git"

# Deploy mcp_server_time with a local timezone (port 8001)
npx -y supergateway \
  --stdio "uvenv run mcp_server_time -- --local-timezone=Europe/Dublin" \
  --port 8001

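Add the MCP Servers to your gateway under “Gateways (MCP Registry)”, pointing at the SSE endpoints that supergateway exposes (by default http://localhost:8000/sse and http://localhost:8001/sse).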
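Registering servers can also be scripted instead of clicked through the Admin UI. The sketch below only builds the HTTP requests with `urllib` and does not send them. The `POST /gateways` path and the `{"name", "url"}` body are assumptions to confirm against the gateway's /docs page before sending anything.

```python
import json
import os
import urllib.request

GATEWAY = "http://localhost:4444"
TOKEN = os.environ.get("MCPGATEWAY_BEARER_TOKEN", "")  # exported in the JWT step above


def register_gateway_request(name: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST that registers an upstream MCP server."""
    body = json.dumps({"name": name, "url": url}).encode()
    return urllib.request.Request(
        f"{GATEWAY}/gateways",  # assumed Admin API path; confirm it under /docs
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )


requests_to_send = [
    register_gateway_request("git", "http://localhost:8000/sse"),
    register_gateway_request("time", "http://localhost:8001/sse"),
]

# Against a running gateway, you could then send them:
# for req in requests_to_send:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, json.load(resp))
```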

Now, create a Virtual Server under the Servers Catalog tab, adding just the tools you want to share with your MCP Clients.

Next, enable MCP support in VS Code by adding "chat.mcp.enabled": true to your settings.json, then add an MCP configuration block as described in the VS Code documentation:

{
  "servers": {
    "gateway": {
      "type": "sse",
      "url": "http://localhost:4444/servers/1/sse",
      "headers": {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
      }
    }
  }
}

Press Ctrl + Alt + I to open Copilot Chat and click Tools. All your gateway-managed tools appear in the Tools panel.

Running the STDIO Wrapper (mcpgateway.wrapper)

Some AI agents (such as Claude Desktop, LangChain CLI, or custom shell-based tools) can’t authenticate over SSE. To support these securely, MCP Gateway offers a wrapper module that connects to the Gateway using JWT-authenticated HTTP and exposes a local STDIO interface. See the mcpgateway.wrapper documentation for more info.

For example, Claude Desktop and similar tools can define this wrapper in their config:

{
  "mcpServers": {
    "mcpgateway-wrapper": {
      "command": "python3",
      "args": ["-m", "mcpgateway.wrapper"],
      "env": {
        "MCP_AUTH_TOKEN": "YOUR_JWT_TOKEN",
        "MCP_SERVER_CATALOG_URLS": "http://localhost:4444/servers/1"
      }
    }
  }
}

You can now use local AI agents or dev tools that speak STDIO to access all tools exposed via your gateway’s virtual servers — authenticated, observable, and fully compatible.
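Conceptually, the wrapper is a relay: the MCP stdio transport exchanges newline-delimited JSON-RPC messages, so it reads one message per line from stdin, forwards it to the gateway, and writes any reply to stdout. This is not the `mcpgateway.wrapper` source. It is a minimal sketch in which the hypothetical `forward` callable stands in for the JWT-authenticated HTTP call, and the demo uses in-memory streams instead of a real client.

```python
from __future__ import annotations

import io
import json
from typing import Callable, Optional, TextIO


def bridge(stdin: TextIO, stdout: TextIO, forward: Callable[[dict], Optional[dict]]) -> None:
    """Relay newline-delimited JSON-RPC messages from a STDIO client to an upstream sender."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        response = forward(message)
        # Notifications carry no "id" and get no reply (JSON-RPC 2.0).
        if response is not None and "id" in message:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()


def fake_forward(message: dict) -> Optional[dict]:
    """Stand-in for an authenticated HTTP POST to the gateway."""
    if "id" not in message:
        return None
    return {"jsonrpc": "2.0", "id": message["id"], "result": {"tools": []}}


client_input = io.StringIO(
    '{"jsonrpc": "2.0", "method": "notifications/initialized"}\n'
    '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}\n'
)
client_output = io.StringIO()
bridge(client_input, client_output, fake_forward)
print(client_output.getvalue())
```

In a real relay, the same loop would run as `bridge(sys.stdin, sys.stdout, http_forward)`, with `http_forward` being a hypothetical function that posts each message to the gateway with the Bearer token.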

Troubleshooting

You can use MCP Inspector to explore the global tools catalog at http://localhost:4444/tools, and then each of the virtual servers you have created (for example, http://localhost:4444/servers/1/sse). Start it with:

npx @modelcontextprotocol/inspector

Real-World Impact

Instead of maintaining separate connections to a dozen different tool servers, each with its own authentication, error handling, and monitoring, you manage one gateway instance. When you need to add a new tool, you register it with the gateway rather than updating every client.

This architectural change becomes more valuable as your tool ecosystem grows. Whether you’re building a personal AI assistant or deploying enterprise-scale automation, MCP Gateway eliminates the integration overhead that typically consumes more time than the actual feature work.

The gateway is production-ready with Kubernetes support, Helm charts, and comprehensive monitoring. You can start with the simple Docker setup and scale up as needed.

Try It Out

Source code: GitHub repository
Documentation: Project docs
Package: PyPI listing

The Model Context Protocol represents an important step toward standardized AI tool integration, but implementations are still fragmented, and both the protocol and the MCP ecosystem will keep evolving. MCP Gateway helps bridge that gap, turning different MCP servers into a unified, secure, and observable system.

Give it a try: you’ll spend less time on plumbing and more time building the AI experiences that matter.

Reducing AI Hallucinations with RAG: Writing an entire article from a Podcast Episode

Mihai Criveti — Sun, 15 Oct 2023 00:06:08 GMT

Reducing AI Hallucinations with RAG: Automatic Podcast to blog

Generating a medium.com article automatically from a podcast episode in which Mihai Criveti was a guest on IBM Fellow Jerry Cuomo’s Art of AI Podcast.

I’m hosting a hands-on workshop on Practical GenAI with HuggingFace 🤗 models and Python soon, and decided to challenge myself a bit by generating this entire post using IBM’s watsonx.ai, LLAMA2 and a Retrieval Augmented Generation platform I’m building. This content is based only on a 15-minute podcast where I discuss Large Language Models and Hallucinations with IBM Fellow and VP of Technology, Jerry Cuomo.

I had already hand-written the article on Reducing LLM Hallucinations with RAG that the podcast was based on, and that article is not in the training data of the models used. This post takes those techniques one step further with Reciprocal Rank Fusion, hybrid search, and more.
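The post does not show its retrieval pipeline, but hybrid search with Reciprocal Rank Fusion is commonly done by running a keyword search and a vector search, then merging the two ranked lists by summing 1/(k + rank) for each document. A minimal sketch, with made-up document ids and the conventional k = 60:

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["ep4-transcript", "rag-article", "faq"]    # e.g. from a keyword (BM25) index
vector_hits = ["rag-article", "slides", "ep4-transcript"]  # e.g. from embedding similarity
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc for doc, _ in fused])  # ['rag-article', 'ep4-transcript', 'slides', 'faq']
```

Because RRF uses only ranks, it needs no calibration between keyword scores and cosine similarities, which is what makes it convenient for hybrid search.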

The content below, including all the questions and answers, was generated by watsonx using LLAMA2. The source content is “Art of AI Episode 4: Hallucinations, with guest Mihai Criveti”, a 15-minute podcast in which I discuss Large Language Models and Hallucinations with IBM Fellow and VP of Technology Jerry Cuomo, converted to text with a speech recognition model.

The result was formatted automatically, with only a few links added to Jerry’s Podcast, the hand-written article, and our LinkedIn profiles. With further metadata enhancement, or connecting to an internet search platform for further RAG context infusion, links could also be generated automatically.

Subscribe to my YouTube Channel: Practical Cloud and AI with Mihai — for more videos and discussions on AI, RAG, reducing Hallucinations and more. I’ll do a deep dive there shortly!

Executive Summary

In “The Art of AI,” Episode 4, host Jerry Cuomo and guest Mihai Criveti discuss the challenges posed by hallucinations in Large Language Models (LLMs). As a solution, Criveti suggests methods such as few-shot prompting and retrieval augmented generation (RAG), which involve providing the model with relevant context and information to improve the accuracy of its responses.

Additionally, they emphasize the importance of attribution and transparency in AI decision-making processes.

Finally, they mention related resources, such as Criveti’s paper on Understanding Generative AI, and invite listeners to continue exploring these topics further. Key takeaways include the potential benefits of RAG and the significance of developing explainable AI systems.

Participant Names

The key participants in this episode of The Art of AI are:

Jerry Cuomo — host, IBM Fellow, and VP of Technology
Mihai Criveti — guest, STSM & Principal Architect, OIC Vice Chair — Technology, ScribeFlow GenAI Lead, and Podcast Host.

Action Items

Here are some potential actions or tasks for listeners based on the contents of the podcast:

Read Mihai Criveti’s article on Medium, “Understanding GenAI Large Language Model Limitations, and How Retrieval Augmented Generation Can Help,” which is mentioned in the podcast.
Research and learn more about Large Language Models (LLMs), their limitations, and how retrieval-augmented generation can help address some of these limitations.
Consider incorporating the approach of providing contextual information and relevant databases or knowledge bases when interacting with LLMs to improve the quality and accuracy of their responses.
Explore the idea of attributable AI and understand how it works.
Look into the possibility of Fine-Tuning LLMs and compare the benefits and drawbacks of doing so versus other methods of providing context.
Investigate the cost and environmental impact of training and utilizing LLMs and consider whether there are ways to mitigate these factors.

Relevant Questions and Answers

All the questions (and answers) below have been generated by AI; no manual prompting was used.

How can you get more information

Mihai Criveti’s article “Understanding GenAI Large Language Model Limitations, and How Retrieval Augmented Generation Can Help” provides insight into utilizing large language models (LLMs) effectively and responsibly.

He highlights three essential strategies:

prompt engineering, where instructions are given to the model along with relevant context;
retrieval augmented generation, involving the integration of a knowledge base or database search into the model’s prompt;
and vector databases, which enable flexible searches based on similarity rather than exact keywords.

These methods aim to minimize hallucinations — the model’s inclination to create responses without proper context or basis in reality — and enhance the accuracy of the produced outputs by incorporating context.

Additionally, they offer advantages like cost efficiency and increased speed compared to entirely retraining the model or fine-tuning it. By implementing these techniques, it is feasible to improve the performance of LLMs and move toward Explainable AI.

What is a large language model

A large language model (LLM) is a type of artificial intelligence (AI) model that processes natural language text. It is trained on vast quantities of text data and uses machine learning algorithms to learn patterns and relationships within the language. These models can generate human-like text, complete sentences, summarize information, and answer questions.

However, despite their capabilities, LLMs have limitations. They can produce erroneous or misleading outputs, known as hallucinations, due to their lack of reasoning abilities and limited context awareness. Additionally, they have constraints regarding the amount of input data they can handle, leading to reduced quality responses beyond a certain threshold.

To address these limitations, techniques such as prompt engineering, retrieval-augmented generation, and vector databases can enhance the accuracy and relevance of LLM responses.

Prompt engineering involves providing specific instructions to guide the model’s output, while retrieval-augmented generation incorporates external knowledge sources to expand the model’s context.

Vector databases utilize vector space representations of words to identify related documents and improve the efficiency of searches. By employing these strategies, organizations can leverage LLMs more effectively and responsibly in various applications, including customer service, writing assistance, and language translation.

What are some of its limitations

Some limitations of large language models (LLMs) include:

1. Models only act on language: They cannot handle PDF files, Word documents, binary files, images, or anything other than text.

2. Limited domain expertise: Although LLMs can generate text similar to human language, they lack domain-specific knowledge; therefore, their responses may need to be corrected regarding specialized domains such as medicine, law, etc.

3. No reasoning or deduction abilities: LLMs lack reasoning skills and cannot perform mathematical calculations or operations. They rely solely on statistical patterns identified during training.

4. No direct access to external sources: LLMs do not directly interact with the Internet or utilize outside data beyond what was included in their initial training dataset. As a result, they are oblivious to current events or fresh facts not incorporated into their instructional material.

5. Hallucination generations: Due to the absence of context, the model may produce erroneous outputs known as hallucinations. These hallucinations can range from grammatical mistakes to illogical statements or blatantly wrong assertions.

6. Factual inaccuracies: Sometimes, LLMs offer replies that appear believable but contain inaccurate details. They draw upon the information given during training, which may include inaccuracies present in the source materials. Therefore, factual accuracy cannot be guaranteed.

7. Contextual misunderstandings: Without proper guidance, LLMs risk producing responses disconnected from the situation due to their restricted capacity for comprehending context. Clear directions must be provided through prompt engineering to address this limitation.

8. Memory restrictions: Typically, LLMs can accept around 4,096 tokens or 3,000 words as input before performance declines. Consequently, offering extensive context becomes impractical without careful selection of essential information.
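The 4,096-token limit in item 8 turns context selection into a budgeting problem: the prompt, the retrieved passages, and the model's answer must all fit. This sketch estimates tokens from the 4,096-tokens-to-3,000-words ratio quoted above (a real tokenizer for the target model is more accurate), reserves a hypothetical 512 tokens for the answer, and keeps ranked passages while they fit.

```python
from __future__ import annotations

CONTEXT_WINDOW = 4096    # tokens, the limit quoted in item 8
WORDS_PER_WINDOW = 3000  # "around 4,096 tokens or 3,000 words"


def estimate_tokens(text: str) -> int:
    """Rough token count from a word count, rounded up with integer arithmetic."""
    words = len(text.split())
    return (words * CONTEXT_WINDOW + WORDS_PER_WINDOW - 1) // WORDS_PER_WINDOW


def context_budget(prompt: str, reserved_output: int = 512) -> int:
    """Tokens left for retrieved context after the prompt and room for the answer."""
    return max(0, CONTEXT_WINDOW - estimate_tokens(prompt) - reserved_output)


def select_passages(ranked_passages: list[str], budget: int) -> list[str]:
    """Keep passages in rank order, skipping any that would overflow the budget."""
    chosen, used = [], 0
    for passage in ranked_passages:
        cost = estimate_tokens(passage)
        if used + cost <= budget:
            chosen.append(passage)
            used += cost
    return chosen


prompt = " ".join(["word"] * 300)  # a 300-word instruction plus question
budget = context_budget(prompt)    # 4096 - 410 - 512 = 3174
passages = [" ".join([tag] * n) for tag, n in [("a", 1000), ("b", 1000), ("c", 500), ("d", 200)]]
kept = select_passages(passages, budget)
print(budget, [p.split()[0] for p in kept])  # 3174 ['a', 'b', 'd']
```

Skipping an oversized passage instead of stopping lets a smaller, lower-ranked passage still use the leftover room; stopping at the first miss is the stricter alternative.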

In summary, despite their impressive capabilities, LLMs face various constraints. To enhance their effectiveness, strategies like prompt engineering, retrieval-augmented generation, and thoughtful context injection become vital.

What can cause them to produce hallucinations

What causes hallucinations in Large Language Models (LLMs)?

Hallucinations in LLMs are caused by several factors, including:

Lack of Context: Sometimes, the model lacks context, leading to misunderstandings and inaccurate responses.

Imperfect Training Data: LLMs are trained on vast amounts of data; however, the quality and accuracy of this data can vary greatly. As a result, biases and inconsistencies can arise in the training data, causing hallucinations.

Prediction-Based Model: LLMs rely on predictions rather than actual learning, resulting in inaccuracies and hallucinations.

Conversational Memory Limits: There are restrictions to the conversational memory of LLMs since they can only store a limited amount of information from earlier interactions. Therefore, their ability to remember crucial details may be restricted, producing hallucinations.

Poor Quality Prompts: Providing poorly constructed prompts can lead to low-quality replies from LLMs, potentially containing hallucinations.

Misunderstanding Questions: LLMs occasionally comprehend queries incorrectly, triggering erroneous and hallucinatory responses.

Overfitting: Due to overfitting, LLMs might memorize certain patterns in the training data rather than gain generalizable knowledge, contributing to hallucinations.

What can be done to mitigate this problem

“What can be done to alleviate this problem?” refers to the issue of hallucinations in Large Language Models (LLMs). According to the conversation, hallucinations in LLMs occur due to their limitation in reasoning, learning, and utilizing external sources. Instead, they rely solely on producing coherent and fluent texts based on patterns recognized within their training data. As a result, the model may create contradictory or wrong statements, such as claiming that a tomato is both a fruit and a vegetable.

To address this challenge, several methods can be employed to improve the quality of answers provided by LLMs. One approach involves supplying clear and concise prompts, including sufficient background information to assist the model in comprehending the situation.

Another strategy entails incorporating retrieval-augmented generation, which enables searching a knowledge base for pertinent articles and integrating them into the dialogue to further enhance the model’s response. Providing examples and restricting inputs to 4,096 tokens or fewer is recommended since quality deteriorates beyond that point.

Furthermore, fine-tuning the model is feasible, although it can be costly and yield mixed outcomes, particularly when dealing with real-time data. Therefore, offering adequate context in each prompt remains essential.

Additionally, it’s crucial to remember that attributing and explaining AI remain vital steps toward reducing hallucinations. By furnishing users with precise details regarding the source of the solution, such as a particular knowledge base document, PDF file, or database entry, a starting point for explainable AI and content grounding can be established, contributing to minimizing hallucinations significantly.
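Attribution is easier when each retrieved passage is numbered in the prompt and the model is asked to cite those numbers, so the answer can be traced back to a document, PDF, or database entry. A minimal sketch, assuming a hypothetical `Passage` record and a model reply supplied by hand:

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # where the text came from: a document, PDF page or database row
    text: str


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Inject numbered passages as context and ask the model to cite them as [n]."""
    context = "\n".join(f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below and cite sources as [n]. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def cited_sources(answer: str, passages: list[Passage]) -> list[str]:
    """Map [n] citations in the model's answer back to source names, ignoring invalid ones."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [passages[n - 1].source for n in numbers if 1 <= n <= len(passages)]


passages = [
    Passage("art-of-ai-ep4-transcript.txt", "RAG inserts relevant documents into the prompt."),
    Passage("rag-article.md", "Vector databases find similar passages, not exact keywords."),
]
prompt = build_grounded_prompt("How does RAG reduce hallucinations?", passages)
answer = "It grounds the model in retrieved documents [1]."  # stand-in for the model's reply
print(cited_sources(answer, passages))  # ['art-of-ai-ep4-transcript.txt']
```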

How can we get more context from the model

In summary, large language models (LLMs) have several limitations, such as not being able to reason, use logic, or access external information. They are also unable to comprehend context, leading to hallucinations or incorrect outputs.

To mitigate these issues, providing context through strategies such as writing clear prompts, injecting context into the prompt, utilizing retrieval-augmented generation, and fine-tuning models can improve response quality. However, there are tradeoffs between quality and cost, and finding the optimal balance is crucial.

What are some examples of this

Some instances of hallucinations in large language models (LLMs) include producing erroneous or misleading outputs, such as claiming that the sky is purple or that a tomato is both a fruit and a vegetable. Another illustration is a response that contradicts the request, such as getting a negative review after asking for a favorable assessment of a book.

Additionally, LLMs may offer factually wrong replies, such as stating that someone has authored 173 books when they have only written 173 articles or patents. To address these concerns, methods including prompt design, retrieval-augmented generation, and vector databases can be employed.

By giving precise directions and enough background, prompt engineering may assist in ensuring that the model produces truthful and valuable responses.

Why is this important

The Importance of Understanding the Limitations of Large Language Models

In the field of artificial intelligence, large language models (LLMs) have gained significant attention due to their ability to generate human-like language outputs. However, these models are not without their limitations, and it is crucial to recognize these limitations to avoid misusing or misunderstanding their capabilities. In this essay, we will explore the importance of understanding the limitations of LLMs and how doing so can lead to more responsible and effective usage of these models.

Firstly, it is essential to comprehend what LLMs truly are. As stated by Mihai Criveti, a renowned expert in the field, LLMs are merely models that act on language and nothing else. They cannot process images, videos, or any other forms of media; they solely operate on text inputs.

Additionally, these models lack reasoning abilities and do not utilize external sources such as the internet during their processing. Instead, they rely heavily on statistical predictions based on patterns found within the training data. Therefore, it is inaccurate to assume that LLMs possess intelligent thought processes or can engage in independent learning.

Another critical limitation of LLMs is their tendency to produce hallucinations, which refer to outputs that are either incorrect or utterly nonsensical. These hallucinations occur due to the model’s reliance on statistical probabilities rather than genuine understanding.

As highlighted by Criveti, LLMs frequently fail to grasp context, leading to bizarre statements such as “the sky is purple.” To mitigate this challenge, proper contextualization and priming are vital. By supplying adequate background information and specifying the desired topic, users can increase the likelihood of receiving relevant and accurate responses from LLMs.

Furthermore, it is crucial to acknowledge that LLMs do not inherently learn from interactions. Although some platforms may claim that their models adapt to user input, this statement is largely misleading. In reality, the model’s performance may deteriorate over time since it relies on pre-existing statistics and lacks true learning capacities.

Conversely, humans possess the unique ability to develop new connections and comprehend novel ideas through experience and education. Thus, it is unfounded to compare LLMs to human cognition.

To address the challenges posed by LLMs’ limitations, innovative techniques such as few-shot prompting and retrieval augmented generation (RAG) have emerged. RAG, specifically, employs vector databases to enhance conventional searches, allowing for more flexible and inclusive queries. By integrating tools like RAG, users can bolster the accuracy and relevancy of LLMs’ outputs.

What other approaches could be used

Other approaches that could be used to improve the quality of responses generated by large language models include:

Prompt Engineering: Providing clear and concise prompts that contain sufficient context can help elicit better responses from the model. This includes breaking down complicated tasks into smaller parts and providing relevant information along with the query.

Few-Shot Learning: Providing multiple instances or illustrations that demonstrate the desired outcome can assist the model in producing superior replies. This strategy enables the model to learn from fewer examples and deliver appropriate outcomes.

Vector Database: Using vector databases allows for quicker and more effective searches of vast volumes of data. Documents are converted into numerical representations (vectors), enabling fuzzy searches that find similar vectors rather than exact terms. As a result, vector databases may enhance the precision and relevance of retrieved material.

Conversational Memory: By incorporating prior dialogue history into the chat, conversational memory enables the model to comprehend the context of a discussion better. Doing so improves the likelihood of receiving suitable and cohesive replies from the model.

Fine-Tuning: Although costly and resource-intensive, fine-tuning models can improve their accuracy and effectiveness on particular jobs. With adequate funds and computing power, fine-tuning can boost performance.

Contextual Understanding: Ensuring that the model possesses contextual awareness and understands the intent behind a request is crucial. One method to achieve this is to specify that the model won’t create anything it isn’t familiar with. Instead, it would rely on reliable sources.

Retrieval Augmented Generation: RAG combines retrieval and generation capabilities to produce higher-quality responses. It involves inserting pertinent articles or excerpts from a database into the model’s prompt to offer context and guide the creation of improved responses.

Knowledge Base Integration: Combining a large language model with a structured repository of information can increase its capacity to produce insightful and precise answers. By integrating a knowledge base, the model can access relevant details and offer users more thorough and informed responses.
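Few-shot prompting and conversational memory both come down to what goes into the prompt. The sketch below combines an instruction, a couple of labelled examples, and only the most recent conversation turns (a simple sliding-window memory). The function name, labels, and window size are illustrative assumptions.

```python
from __future__ import annotations


def build_chat_prompt(instruction: str, examples: list[tuple[str, str]],
                      history: list[tuple[str, str]], question: str, max_turns: int = 4) -> str:
    """Combine an instruction, few-shot examples, recent conversation and the new question."""
    shots = "\n\n".join(f"Input: {q}\nOutput: {a}" for q, a in examples)
    # Conversational memory as a sliding window: older turns are dropped to save tokens.
    memory = "\n".join(f"{role}: {text}" for role, text in history[-max_turns:])
    return (
        f"{instruction}\n\nExamples:\n{shots}\n\n"
        f"Conversation so far:\n{memory}\n\nUser: {question}\nAssistant:"
    )


examples = [
    ("The model said the sky is purple.", "hallucination"),
    ("The model cited the source document correctly.", "grounded"),
]
history = [
    ("User", "What is RAG?"),
    ("Assistant", "Retrieval Augmented Generation adds retrieved text to the prompt."),
    ("User", "Does it need a vector database?"),
    ("Assistant", "Not strictly, but vector search is a common retriever."),
    ("User", "Can you classify some outputs for me?"),
    ("Assistant", "Sure, send them over."),
]
prompt = build_chat_prompt(
    "Classify each model output as 'hallucination' or 'grounded'.",
    examples, history, "The model claimed the author wrote 173 books.",
)
print(prompt)
```

A sliding window is the simplest memory strategy; summarizing older turns instead of dropping them keeps more context at the cost of an extra model call.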

What is the benefit of vector databases

Benefits of Vector Databases: Improved Search Functionality and Explainable AI

Vector databases offer various benefits over traditional SQL databases, particularly when utilized in conjunction with large language models (LLMs) for natural language processing tasks. One significant advantage is the ability to perform efficient similarity searches, which enhances the effectiveness of the model in generating relevant and accurate responses.

Unlike traditional databases that rely solely on exact keyword matching, vector databases employ vectorization techniques to transform texts into numerical representations, enabling the identification of semantically related concepts. As a result, queries yield more precise and diverse outcomes, better addressing users’ needs and minimizing hallucinations typically associated with LLMs.
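The similarity search a vector database performs can be shown with plain cosine similarity. In this toy example the 3-dimensional vectors are hand-made stand-ins for real embeddings from an embedding model: a query vector close to the "hallucinations" document ranks it first, even though a keyword match on a query like "why does the model make things up?" could miss it.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours; vector databases use approximate indexes instead."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hand-made "embeddings"; a real system gets these from an embedding model.
index = {
    "hallucinations": [0.9, 0.1, 0.0],
    "vector-db": [0.1, 0.9, 0.2],
    "fine-tuning": [0.2, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "why does the model make things up?"
results = top_k(query, index)
print([doc for doc, _ in results])  # ['hallucinations', 'vector-db']
```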

Moreover, incorporating vector databases contributes to developing explainable AI systems since it allows for tracing back the sources used to inform generated responses, promoting transparency and trustworthiness in AI interactions. By integrating vector databases into their workflows, developers can create more reliable and accountable language models, leading to improved overall performance in natural language processing applications.

What is the last word of the podcast

The final statement of the provided material is: “See you again on an upcoming episode.” Therefore, the ultimate word of the podcast is “episode”.

What is the name of the podcast Mihai was on

The name of the podcast is The Art of AI.

Understand GenAI Large Language Model limitations, and how Retrieval Augmented Generation can help

Mihai Criveti — Sat, 09 Sep 2023 06:36:32 GMT

Add context from private data and documents to GenAI LLMs to reduce hallucinations and increase performance through Retrieval Augmented Generation.

Use Cases for Large Language Models

This article is featured in the Hallucinations episode of “The Art of AI” podcast, where host Jerry Cuomo talks with Mihai Criveti about practical approaches to reducing LLM hallucinations.

I’ve also published an ‘AI Version’ of this article here, generated entirely by AI with advanced RAG and hallucination-reduction techniques, using only the audio podcast as input. Check it out!

Key use cases for Large Language Models include:

Generation: LLMs can be used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. For example, LLMs can be used to generate realistic dialogue for chatbots, write news articles, or even create poems.
Summarization: LLMs can be used to summarize text, extract the main points of an article or document, and create a shorter version that is still accurate and informative. For example, LLMs can be used to summarize research papers, news articles, or even books.
Classification: LLMs can be used to classify text, identify the topic of a document, and determine whether it is positive or negative, factual or opinion, etc. For example, LLMs can be used to classify customer reviews, social media posts, or even medical records.
Extraction: LLMs can be used to extract information from text, identify specific entities or keywords, and create a table or list of the extracted information. For example, LLMs can be used to extract contact information from a business card, product information from a website, or even scientific data from a research paper.
Q&A: LLMs can be used to answer questions in an informative way, even if they are open ended, challenging, or strange. For example, LLMs can be used to answer questions about a particular topic, provide customer support, or even respond in creative text formats.

While Generative AI Large Language Models often seem like a panacea, they suffer from a number of key issues.


Hallucinations: outputs of LLMs that deviate from facts or contextual logic. Models will ‘make stuff up’ if they don’t know an answer. They also suffer from a lack of contextual understanding. Hallucinations range from sentence contradictions, prompt contradictions and factual contradictions to plain nonsense or noise. Techniques like few-shot prompting and RAG can help.
Inference Performance: even the fastest models are slower than a dial-up modem, or a fast typist! They also suffer from latency, measured as time to first token. For most queries, expect 10–20 second response times, and even with streaming, you’ll end up waiting a few seconds for the first token to be generated!
Inference Cost: LLMs are expensive to run! Some of the top 180B parameter models may need as many as 5xA100 GPUs to run, while even quantized versions of 70B LLAMA would take up a whole GPU! That’s one query at a time. The costs add up. For example, a dedicated A100 might cost as much as $20K a month with a cloud provider! A brute force approach is going to be expensive.
Stale training data: even top models haven’t been trained on ‘recent’ data, and have a cut-off date. Remember, a model doesn’t ‘have access to the internet’. While certain ‘plugins’ do offer ‘internet search’, it’s just a form of RAG, where ‘top 10 internet search query results’ are fed into the prompt as context, for example.
Use with private data: LLMs haven’t been trained on *your* private data, and as such, cannot answer questions based on your dataset unless that data is injected through fine-tuning or prompt engineering.
Token limits / context window size: Models are limited by the TOKEN_LIMIT, and most models can process, at best, a few pages of total input/output. You can’t feed a model an entire document and ask for a summary or extract facts from it. You need to chunk documents into pages first, and perform multiple queries.
They only support text: while this sounds obvious (from the name), it also means you can’t just feed a PDF file or Word document to an LLM. You first need to convert that data to text, and chunk it to fit in the token limit, alongside your prompt and some room for output. Conversions aren’t perfect. What happens to your images, or tables, or metadata? It also means models can only output text. Formatting the text to output HTML, DOCX or other rich text formats requires a lot of heavy lifting in your pipeline.
Lack of transparency / explainability: why did the model generate a particular answer? Techniques such as RAG can help, as you are able to point at the ‘context’ that generated a particular answer, and even display the context. While the LLM answer may not necessarily be correct, you can display the source content that helped generate that answer.
Potential bias, hate, abuse, harm, ethical concerns, etc: sometimes, answers generated by an LLM can be outright harmful. Using the RAG pattern in addition to harm filters can help mitigate some of these issues. Models are also vulnerable to various forms of Prompt Hacking / Prompt Injection, where you can trick the model into responding in a way it wasn’t designed to.
Training and fine-tuning costs: to put it in perspective, a 70B model like LLAMA2 might need ~2048 A100 GPUs for a month to train, adding up to a $20–40M training cost, not to mention what it takes to download and store the data. The “Training Hardware & Carbon Footprint” section of the LLAMA2 paper reports a total of 3,311,616 GPU hours used to train the LLAMA2 family (7B, 13B, 34B and 70B)!

10 Limitations of Large Language Models and Mitigation Options

It helps to think of Large Language Models (LLMs) like mathematical functions, or your phone’s autocomplete:

f(x) = x’

Where the input (x) and the output (x’) are strings. The model starts by looking at the input, then will ‘autocomplete’ the output.
For example, f(“What is Kubernetes”) = “Kubernetes, often abbreviated as K8s, is an open-source platform designed to automate deploying, scaling, and operating application containers.”
Most chat interfaces will also provide a default system prompt. For LLAMA2, this is: “You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."
Depending on the model and interface, there may be ‘hidden’ inputs to your model. Many chat interfaces include a conversational memory, where they insert a moving window of your previous prompts into the current prompt, as context. It would look something like this: “Below are a series of dialogues between a user and an AI assistant…. [dialogues] [new content]”
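Putting those hidden inputs together, the string a chat interface actually sends to the model can be sketched as below. This is a minimal illustration under stated assumptions: the template, the shortened system prompt and the names `build_input` and `max_turns` are hypothetical, and real chat models each define their own prompt format.

```python
from typing import List, Tuple

# Shortened version of a default system prompt (illustrative only).
SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant."

def build_input(history: List[Tuple[str, str]], new_prompt: str,
                system_prompt: str = SYSTEM_PROMPT, max_turns: int = 2) -> str:
    """Assemble the single string the model actually sees.

    history is a list of (user, assistant) turns; only the last max_turns
    are kept, emulating a moving conversational-memory window.
    The layout is a hypothetical template, not any specific model's format.
    """
    window = history[-max_turns:] if max_turns > 0 else []
    lines = [system_prompt,
             "Below are a series of dialogues between a user and an AI assistant."]
    for user, assistant in window:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    lines.append(f"User: {new_prompt}")
    lines.append("Assistant:")  # the model "autocompletes" from here
    return "\n".join(lines)

history = [("Hi", "Hello!"),
           ("What is Kubernetes?", "A container orchestration platform."),
           ("Who created it?", "It was originally designed by Google.")]
print(build_input(history, "Is it open source?"))
```

The oldest turn ("Hi") falls out of the two-turn window: the model never "remembers" it, it simply stops being part of the input string.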

The inputs to a model are a little more complex though:

f(training_data, model_parameters, input_string) = output_string

training_data represents the data the model was trained on (different models will provide different answers). While not an ‘input’ as such, the data the model was trained on (and how it was trained) plays a key role in the output.
model_parameters represent things like “temperature”, “repetition penalty”, “min tokens” or “max tokens”, “top_p”, “top_k”, and other such values.
input_string is the combination of prompt and context you give to the model. Ex: “What is Kubernetes” or “Summarize the following document: ${DOCUMENT}”
the ‘prompt’ is usually an optional instruction like “summarize”, “extract”, “translate”, “classify” etc., although more complex prompts are often used, such as “Be a helpful assistant that responds to my question.. etc.”
The function can process a maximum of TOKEN_LIMIT tokens (total input and output), usually ~4096 tokens (~3000 words in English, fewer in, say, Japanese). Models with larger TOKEN_LIMITs exist, though they usually don’t perform as well beyond 4096 tokens. This means, in practice, you can’t feed a whole whitepaper to an LLM and ask it to ‘summarize this document’, for example.
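A rough budget check for the TOKEN_LIMIT constraint can be sketched as follows. It assumes the ~0.75 words-per-token ratio implied above (4096 tokens for ~3000 English words); a real application would count tokens with the model's own tokenizer instead. `estimate_tokens` and `fits` are illustrative names.

```python
import math

TOKEN_LIMIT = 4096        # typical limit cited above; varies by model
WORDS_PER_TOKEN = 0.75    # rough English heuristic (assumption)

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def fits(prompt: str, max_new_tokens: int, limit: int = TOKEN_LIMIT) -> bool:
    """Input and output share one window, so both must fit inside it."""
    return estimate_tokens(prompt) + max_new_tokens <= limit

short_doc = "Summarize the following document: " + "word " * 1000
long_doc = "Summarize the following document: " + "word " * 5000
print(fits(short_doc, 500), fits(long_doc, 500))   # True False
```

The 5000-word document alone needs an estimated 6672 tokens, so it must be chunked and summarized in several queries.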

What Large Language Models DON’T DO

Learn: A model will not ‘learn’ from interactions (unless specifically trained/fine-tuned).

Remember: A model doesn’t remember previous prompts. In fact, it’s all done with prompt trickery: previous prompts are injected into the current one. The API does a LOT of filtering and heavy lifting!

Reason: Think of LLMs like your phone’s autocomplete, it doesn’t reason, or do math.

Use your data: LLMs don’t provide responses based on YOUR data (databases or files), unless it’s included in the training dataset or the prompt (ex: RAG).

Use the Internet: An LLM doesn’t have the capacity to ‘search the internet’, or make API calls.

In fact, a model does not perform any activity other than converting one string of text into another string of text.
Any 3rd party data not in the model will need to be injected into prompts (RAG)

Cite Sources: models may ‘appear’ to cite sources, but that’s not reliable. It’s more likely to be a hallucination. Asking a model what content generated an output isn’t going to provide a relevant response. Retrieval Augmented Generation can help provide attribution / grounding though.

Adding an LLM to your software architecture:

An LLM is much, much slower than a fax modem!

Believe it or not, LLMs are much slower than even a fax modem! Using WPM = ((BPS / 10) / 5) * 60 (10 bits per character, 5 characters per word), a 9600 baud modem will generate 11520 words / minute.

At an average of 30 tokens / second (about 20 words) for LLAMA-70B, you’re getting 1200 words / minute, roughly 10x slower than the modem!
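The two throughput figures can be reproduced with a few lines of arithmetic. The 10 bits per character (8 data bits plus start and stop bits) and 5 characters per word are the assumptions behind the WPM formula above.

```python
def modem_wpm(bps: int) -> float:
    """Words per minute for a serial modem: 10 bits/char, 5 chars/word."""
    chars_per_second = bps / 10
    words_per_second = chars_per_second / 5
    return words_per_second * 60

def llm_wpm(words_per_second: float) -> float:
    """Words per minute for a model generating at a steady rate."""
    return words_per_second * 60

print(modem_wpm(9600))                  # 11520.0
print(llm_wpm(20))                      # 1200
print(modem_wpm(9600) / llm_wpm(20))    # 9.6
```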

Large models (70B) such as LLAMA2 can be painfully slow. Smaller models (20B, 13B, 7B) are faster, and require less GPU to run. Quantized models are also faster, but provide lower quality responses.

Quantize your model for faster inference

You can load and quantize your model in 8, 4, 3 or even 2 bits, sacrificing quality for faster inference speed.

This is always a trade-off, as you’re sacrificing model output quality for faster inferencing. Since a quantized model needs less GPU VRAM to run in, this helps you run large models on commodity hardware.
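To make the trade-off concrete, the sketch below estimates weight memory at different bit widths and shows a toy symmetric quantizer. It is an illustration only, assuming per-tensor scaling and counting weights alone; real schemes (for example per-group scales in 4-bit formats) are more elaborate, and activations and the KV cache need memory too.

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits / 8 / 1e9

def quantize_symmetric(weights, bits=8):
    """Toy per-tensor symmetric quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0   # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]

for bits in (16, 8, 4):
    print(bits, "bit:", weight_memory_gb(70e9, bits), "GB")   # 140.0, 70.0, 35.0 GB

w = [0.42, -0.13, 0.07, -0.5]
q4, s4 = quantize_symmetric(w, bits=4)
print(q4, dequantize(q4, s4))   # only 15 levels: coarser values, lower quality
```

Each weight's rounding error is bounded by half the scale, which is why fewer bits (a larger scale) means lower output quality.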

Why do models hallucinate?

Data Quality: The model itself has been trained on biased, noisy, old, low quality or incorrect data. For example, models trained on forums and other such data.
Generation Method: Models and their weights might be biased towards specific languages, words or data
Lack of context or contextual understanding: The input prompt is contradictory, or unclear. The prompt does not provide sufficient examples of the desired output. The model lacks context to respond to the input, either in its dataset or the prompt. This is within the control of the user, and can be improved through prompt engineering.


Reducing model hallucinations:

LLMs lack context from private data — leading to hallucinations when asked domain or company-specific questions. RAG can help reduce hallucinations by ‘injecting’ context into prompts.

Workarounds also include advanced prompt engineering, such as adding an instruction like “If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct”, or providing examples through one-shot or few-shot prompting.
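A few-shot prompt with that instruction can be assembled with simple string building. The Q:/A: layout and the example pairs below are hypothetical; the point is that worked examples and the guardrail instruction travel inside the prompt itself.

```python
GUARDRAIL = ("If a question does not make any sense, or is not factually coherent, "
             "explain why instead of answering something not correct.")

def few_shot_prompt(examples, question, instruction=GUARDRAIL):
    """Instruction, then worked Q/A examples, then the new question.
    The Q:/A: layout is an illustrative convention, not a model requirement."""
    parts = [instruction, ""]
    for q, a in examples:
        parts += [f"Q: {q}", f"A: {a}", ""]
    parts += [f"Q: {question}", "A:"]
    return "\n".join(parts)

examples = [
    ("What is Kubernetes?",
     "An open-source platform for orchestrating containers."),
    ("When did Kubernetes win the Nobel Prize?",
     "Kubernetes is software and has never won a Nobel Prize, so the question "
     "is not factually coherent."),
]
print(few_shot_prompt(examples, "What is Podman?"))
```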

Other techniques include active mitigations, such as controlling model parameters like “temperature”, which controls how ‘creative’ the response is.
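Temperature rescales the model's next-token scores before they are turned into probabilities. The sketch below uses made-up logits for three candidate tokens (hypothetical values, not from any model) to show how low temperatures concentrate probability on the top token and high temperatures spread it out.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw next-token scores into probabilities.
    Lower temperature sharpens the distribution; higher flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use argmax for greedy decoding)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature=1.0, rng=random):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["blue", "grey", "green"]   # hypothetical candidate next tokens
logits = [2.0, 1.0, 0.1]             # hypothetical model scores
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample(tokens, logits, temperature=0.7, rng=random.Random(0)))
```

At 0.2 almost all probability sits on "blue"; at 2.0 the alternatives become far more likely to be sampled.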

Another approach is Chain-of-Verification (CoVe), where an LLM drafts an initial response, generates verification questions to fact-check it, answers those questions independently, and then produces a final response consistent with those answers. As you can imagine, these are very ‘costly’ operations from a computational / time perspective, but may be required where accuracy is paramount.
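The CoVe loop can be sketched as a sequence of model calls. Here `llm` is a hypothetical callable standing in for whatever client you use, the prompts are illustrative wording rather than the ones from the CoVe paper, and `fake_llm` exists only to show the call pattern and count how many model calls one question costs.

```python
from typing import Callable, List

LLM = Callable[[str], str]   # hypothetical: any function mapping a prompt to a completion

def chain_of_verification(llm: LLM, question: str) -> str:
    """Sketch of the Chain-of-Verification steps; each step is a separate model call."""
    # 1. Draft a baseline answer.
    baseline = llm(f"Answer the question: {question}")
    # 2. Plan verification questions that fact-check the draft.
    plan = llm(f"List questions, one per line, that would fact-check this answer:\n{baseline}")
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each verification question on its own, without showing the draft,
    #    so mistakes in the draft are not simply repeated.
    checks = [(q, llm(f"Answer concisely: {q}")) for q in questions]
    # 4. Produce a final answer consistent with the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(f"Question: {question}\nDraft answer: {baseline}\n"
               f"Verification:\n{evidence}\n"
               "Rewrite the draft so it agrees with the verification.")

calls: List[str] = []

def fake_llm(prompt: str) -> str:
    """Stub model that records every call."""
    calls.append(prompt)
    if prompt.startswith("List questions"):
        return "Who created Kubernetes?\nWhen was it released?"
    return "stub answer"

print(chain_of_verification(fake_llm, "Tell me about Kubernetes."), len(calls))
```

With two verification questions, one user question already costs five model calls, which is where the extra time and compute go.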


Retrieval Augmented Generation and the importance of Vector Databases

A vector database is a specialized database designed to store and query vector embeddings efficiently. Vector embeddings are numerical representations of text, images, audio, or other data. They are used in a variety of machine learning applications, such as natural language processing, image recognition, and recommendation systems.

Near-vector search, or how to search for “Sky” and find “Blue”. Typical uses include:

Finding the most similar documents to a given document
Finding documents that contain a specific keyword or phrase
Clustering documents together based on their similarity
Ranking documents for a search query
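Under the hood, near-vector search ranks stored embeddings by similarity to the query embedding. A minimal brute-force sketch with cosine similarity follows; the three-dimensional vectors are hand-made stand-ins for real embeddings, and a vector database would use an approximate index rather than scoring every document.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query_vec, index, k=2):
    """Brute-force k-nearest-neighbour search by cosine similarity."""
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index.items()]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
index = {
    "The color blue": [0.85, 0.20, 0.05],
    "Ocean waves":    [0.70, 0.30, 0.10],
    "Banana bread":   [0.00, 0.10, 0.95],
}
sky = [0.90, 0.10, 0.00]     # pretend embedding of the query "Sky"
print(nearest(sky, index))   # ['The color blue', 'Ocean waves']
```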

Popular vector databases include ChromaDB, Weaviate and Milvus.

Advantages of using a VectorDB with your LLM, in a Retrieval Augmented Generation Pattern:

Insert your data into prompts every time
Cheap, and can work with vast amounts of data
While LLMs are SLOW, Vector Databases are FAST!
Can help overcome model limitations (such as token limits) — as you’re only feeding ‘top search results’ to the LLM, instead of whole documents.
Reduce hallucinations by providing context.
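The "top search results" step can be sketched as packing ranked chunks into a prompt until a context budget runs out. The word budget is a simplified stand-in for the token limit, and the prompt wording and the `build_rag_prompt` name are assumptions for illustration.

```python
def build_rag_prompt(question, retrieved_chunks, max_context_words=300):
    """Insert the top search results into the prompt, stopping before the
    context budget (a stand-in for the model's token limit) is exceeded."""
    context, used = [], 0
    for chunk in retrieved_chunks:        # assumed already ranked best-first
        n = len(chunk.split())
        if used + n > max_context_words:
            break
        context.append(chunk)
        used += n
    joined = "\n---\n".join(context)
    return ("Answer the question using only the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:")

chunks = ["Podman runs containers without a daemon.",
          "Buildah builds OCI images.",
          "Skopeo copies images between registries."]
print(build_rag_prompt("Does Podman need a daemon?", chunks, max_context_words=12))
```

With a 12-word budget only the first two chunks fit; the source chunks included in the prompt are also what you can show users for attribution.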

Loading Documents into your Vector Databases:

Loading data into your vector database typically requires you to convert documents to text, split the text into chunks, then vectorize those chunks using an embedding model. SentenceTransformers offers a number of pre-trained models, such as all-mpnet-base-v2 or all-MiniLM-L12-v2 that perform well for English text.
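The splitting step can be sketched as overlapping word windows. This is a simplified assumption: many pipelines split by tokens, sentences or document structure instead, and `chunk_words` with its default sizes is a hypothetical helper.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping word windows before embedding.
    Overlap keeps sentences that straddle a boundary retrievable from either chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):   # last window reached the end
            break
    return chunks

text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk would then be embedded, for example with `SentenceTransformer("all-MiniLM-L12-v2").encode(chunks)` from the sentence-transformers package, before being written to the vector database.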

Scaling factor for RAG: what to consider:

Vector Database: consider sharding and High Availability
Fine Tuning: collecting data to be used for fine tuning
Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
Chain of Reasoning and Agents
Caching embeddings and responses
Personalization and Conversational Memory Database
Streaming Responses and optimizing performance. A fine tuned 13B model may perform better than a poor 70B one!
Calling 3rd party functions or APIs for reasoning or other types of data (ex: LLMs are terrible at reasoning and prediction, consider calling other models)
Fallback techniques: fallback to a different model, or default answers
API scaling techniques, rate limiting, etc.
Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
Retraining your embedding model
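Two of these items, caching responses and falling back to a different model or a default answer, can be combined in a small wrapper. The model functions here are hypothetical stand-ins for real clients, and the cache only matches identical prompts; a production system would add expiry and possibly semantic (embedding-based) matching.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[str], str]   # hypothetical: prompt -> completion

def make_client(models: Sequence[Model],
                default_answer: str = "Sorry, please try again later."):
    """Wrap a list of models (primary first) with a response cache and fallback."""
    cache: Dict[str, str] = {}

    def ask(prompt: str) -> str:
        if prompt in cache:                  # identical prompt: skip the slow LLM call
            return cache[prompt]
        for model in models:                 # try the primary, then each fallback
            try:
                answer = model(prompt)
            except Exception:                # e.g. timeout, rate limit, model down
                continue
            cache[prompt] = answer
            return answer
        return default_answer                # every model failed; not cached

    return ask

def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model overloaded")

def small_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"

ask = make_client([flaky_primary, small_fallback])
print(ask("What is RAG?"))   # served by the fallback, then cached
```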

The RAG talk from Shipitcon is available on GitHub and YouTube.

Social media

https://twitter.com/CrivetiMihai — follow for more LLM content
https://youtube.com/CrivetiMihai — more LLM videos to follow
https://www.linkedin.com/in/crivetimihai/

Building Data Science Environments

Mihai Criveti — Fri, 04 Aug 2023 16:47:49 GMT

Data Science on the Mainframe

Docker, Python and Jupyter Notebook on zLinux.

Data Science environments with Docker Compose running Jupyter Lab, PostgreSQL, PGAdmin, Superset, Grafana and Traefik — running on zLinux on the IBM Mainframe.

Here, I’m using Red Hat Enterprise Linux 7.5 to build and deploy Jupyter notebook in an Ubuntu container, create a Redis Alpine container and a postgresql container — then link them using docker-compose.

Oh, and in case you’re wondering: why would anyone do this — check out this snippet from the z14 announcement: “Microservices can be built on z14 with Node.js, Java, Go, Swift, Python, Scala, Groovy, Kotlin, Ruby, COBOL, PL/I, and more. They can be deployed in Docker containers where a single z14 can scale out to 2 million Docker containers”.

A few basic commands:

Establish the OS release and version. We’re running on RHEL 7.5 for s390x.

[cmihai@rh74s390x ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

[cmihai@rh74s390x ~]$ uname -a
Linux rh74s390x.novalocal 3.10.0-693.17.1.el7.s390x #1 SMP Sun Jan 14 10:38:29 EST 2018 s390x s390x s390x GNU/Linux

[cmihai@rh74s390x ~]$ docker --version
Docker version 17.05.0-ce, build 89658be

Install or Upgrade Docker:

$ sudo ./icp-docker-18.03.1_s390x.bin --upgrade

$ /usr/bin/docker --version
Docker version 18.03.1-ce, build 9ee9f40

Setup regular user access, sudo and SSH keys

Create a regular user account

useradd cmihai
passwd cmihai
usermod -aG wheel cmihai
su - cmihai

Add your SSH public key to authorized_keys

Log in as your new user, and forward port 9000:

Setup docker

Create the Docker group

Start Docker

Test docker

Let’s run a simple Ubuntu interactive shell:

Building a Docker container for Jupyter Notebook

Create a file Dockerfile_jupyter from the s390x/ubuntu base image.

Build the container:

Run your new container:

Connect to Jupyter Notebook

http://localhost:9000

You can now install dependencies directly from Jupyter:

Setup docker-compose:

Set up Docker Compose to define multi-tier applications, such as connecting your Jupyter Notebook to PostgreSQL, Redis, Spark, etc.

Set up and link Jupyter to a database container (Redis):

Create a Redis container image:

Create a `Dockerfile_redis` from the alpine image, a lightweight Linux distribution:

Then build the container

Set up docker-compose to link Jupyter to Redis:

Create a file compose-redis-jupyter.yaml

Start your container

Connect Jupyter to Redis with Python code:

In Jupyter Notebook:

You now have access to Jupyter Notebook with a Python 3 kernel and key-value pair store (Redis).

Set up PostgreSQL:

Many of the official images run on multiple architectures. This includes the postgres image, which will run on s390x. Let’s connect Jupyter Notebook to postgres, using docker-compose:

In Jupyter, we can now access PostgreSQL, as shown below:

Jupyter Notebook will connect to the postgres database:

Potential next steps:

Set up other programming languages or kernels (Java, R), or even Zeppelin Notebook
Set up Spark or Kafka

For an interactive tutorial of using Docker for Data Science, check out: https://github.com/crivetimihai/docker-data-science

Manage Containers with Podman, Skopeo and Buildah

Mihai Criveti — Fri, 04 Aug 2023 16:42:56 GMT

Part 1

Install podman, buildah and skopeo

Part 2

Publish images to external registries
Quay.io and Clair

Part 3

Install CodeReady Containers, create a project called wordpress
Create users and groups and set up htpasswd authentication
Deploy mysql from registry.redhat.io/rhel8/mysql-80 and configure the secret
Deploy wordpress from docker.io/library/wordpress:5.5.0-php7.2-apache
Create a route and test wordpress, scale out

What is Podman?

podman — manage pods, containers and OCI compliant container images

How is Podman different?

Can be run as a regular user without requiring root.
Can manage pods (groups of one or more containers that operate together).
Lets you import Kubernetes definitions using podman play kube.
Fork-exec model instead of client-server model (containers are child processes of podman).
Compatible with Docker, Docker Hub or any OCI compliant container implementation.

https://www.redhat.com/en/blog/why-red-hat-investing-cri-o-and-podman
https://developers.redhat.com/blog/2019/02/21/podman-and-buildah-for-docker-users/

What is Buildah?

buildah — build container images from CLI or Dockerfiles

How is Buildah different?

Containers can be built using simple CLI commands or shell scripts instead of Dockerfiles.
Images can then be pushed to any container registry and can be used by any container engine, including Podman, CRI-O, and Docker.
Buildah is also often used to securely build containers while running inside of a locked down container by a tool like Podman, OpenShift/Kubernetes or Docker.

What is Skopeo?

skopeo — inspect and copy containers and images between different storage

How does Skopeo help?

It can copy images to and from a host, as well as to other container environments and registries.
Skopeo can inspect images from container image registries, get images and image layers, and use signatures to create and verify images.

Red Hat Image Sources Explained

Red Hat Software Collections Library (RHSCL)

For developers that need the latest versions of tools not in the RHEL release schedule.
Use the latest development tools without impacting RHEL.
Available to all RHEL subscribers.

Red Hat Container Catalog (RHCC)

Certified, curated and tested images built on RHEL.
Images have gone through a QA process.
Upgraded on a regular basis to avoid security vulnerabilities.

Quay.io

Public / private container repository.

https://github.com/sclorg?q=-container for Dockerfiles

Universal Base Image — UBI


UBI — Freely distributable OCI compliant secure container base images based on RHEL

How does UBI Help?

More than just a base image, UBI provides three base images across RHEL 7 and RHEL 8: ubi, ubi-minimal and ubi-init
And a set of language runtimes (ex: nodejs, ruby, python, php, perl, etc.)
All packages in UBI come from RHEL channels and are supported on RHEL and OpenShift.
Secure by default, maintained and supported by Red Hat.

The Red Hat Container Catalog

Certified container images from Red Hat and 3rd party vendors

Container Images with a Container Health Index

Pulling a container image

podman pull registry.access.redhat.com/ubi8/python-38

Podman Compose

What is podman-compose?

An implementation of docker-compose with Podman backend.

Why podman-compose and when to use it?

run unmodified docker-compose.yaml files, rootless
no daemon or setup required
Only depends on podman, Python 3 and PyYAML.

When NOT to use podman-compose?

When you can instead use podman pod to create pods, or podman generate kube and podman play kube to export or import Kubernetes definitions.
For single-machine development, consider CodeReady Containers
For multi-node clusters, check out Red Hat OpenShift, Kubernetes or OKD.

https://developers.redhat.com/blog/2019/01/15/podman-managing-containers-pods/

Install podman, buildah and skopeo

Fedora 32 / RHEL 8

# Install podman, buildah and skopeo on Fedora 32
sudo dnf -y install podman buildah skopeo slirp4netns fuse-overlayfs

Ubuntu / Debian

sudo apt update && sudo apt -y install podman buildah skopeo

Getting help

podman version
podman --help # list available commands
man podman-ps # or commands like run, rm, rmi, image, build
podman info # display podman system information

https://podman.io/getting-started/installation

slirp4netns is used to connect a network namespace to the internet in a rootless way.

Rootless Containers and cgroup v2

Note that our regular user has UID 1000

uid=1000(cmihai) gid=1000(cmihai) groups=1000(cmihai)

What are UIDs mapped to inside the container?

podman unshare cat /proc/self/uid_map

0 1000 1
1 100000 65536

UID 0 is mapped to my UID (1000). UID 1 is mapped to 100000, UID 2 would map to 100001, etc. That means that a container UID of 27 would map to UID 100026 on the host.

Let’s test this

mkdir test && podman unshare chown 27:27 test
ls -ld test

drwxrwxr-x. 2 100026 100026 4096 Sep 27 09:38 test
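The arithmetic behind that mapping can be sketched by parsing the `uid_map` output shown above, where each line is inside-start, outside-start, length. The 65536-wide range presumably comes from the user's entry in /etc/subuid; the helper names are illustrative.

```python
def parse_uid_map(text):
    """Parse /proc/self/uid_map lines: container_start host_start length."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(x) for x in line.split())
        ranges.append((inside, outside, length))
    return ranges

def host_uid(container_uid, ranges):
    """Translate a UID inside the user namespace to the UID seen on the host."""
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    raise ValueError(f"UID {container_uid} is not mapped")

uid_map = """
0 1000 1
1 100000 65536
"""
ranges = parse_uid_map(uid_map)
print(host_uid(0, ranges), host_uid(27, ranges))   # 1000 100026
```

This is why the directory chowned to 27:27 inside the namespace shows up as 100026 on the host.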

https://developers.redhat.com/blog/2020/09/25/rootless-containers-with-podman-the-basics/
https://podman.io/blogs/2019/10/29/podman-crun-f31.html

Running Containers with Podman

Searching for Images with podman search

Configure search sources

grep search /etc/containers/registries.conf

unqualified-search-registries = ["registry.fedoraproject.org", "registry.access.redhat.com", "registry.centos.org", "docker.io"]

Searching for images (with filters)

podman search httpd --filter=is-official

INDEX NAME DESCRIPTION STARS OFFICIAL
docker.io docker.io/library/httpd The Apache HTTP Server 3181 [OK]

podman can be configured to search multiple private or public container registries for images.

Adding a local registry configuration

Create a configuration file

mkdir -p ~/.config/containers

Add public and private registries in search order

vim $HOME/.config/containers/registries.conf

[registries.search]
registries = ["registry.access.redhat.com", "quay.io", "docker.io"]

[registries.insecure]
registries = ['localhost:5000']
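Conceptually, an unqualified name such as `httpd` is tried against each search registry in order. The sketch below is a simplified model of that lookup, not podman's actual resolver, which also consults short-name aliases and, depending on configuration, may prompt you to choose a registry.

```python
SEARCH_REGISTRIES = ["registry.access.redhat.com", "quay.io", "docker.io"]

def candidate_names(image, registries=SEARCH_REGISTRIES):
    """Simplified unqualified-name resolution: try each search registry in order.
    Docker Hub stores official single-name images under 'library/'."""
    first = image.split("/", 1)[0]
    if "/" in image and ("." in first or ":" in first or first == "localhost"):
        return [image]                      # already fully qualified
    names = []
    for registry in registries:
        path = image
        if registry == "docker.io" and "/" not in image:
            path = f"library/{image}"
        names.append(f"{registry}/{path}")
    return names

print(candidate_names("httpd"))
# ['registry.access.redhat.com/httpd', 'quay.io/httpd', 'docker.io/library/httpd']
print(candidate_names("quay.io/cmihai/apache2"))
# ['quay.io/cmihai/apache2']
```

Using fully qualified names in scripts avoids depending on the search order at all.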

Inspecting and pulling images

Inspecting Images with skopeo (ex: listing tags)

skopeo inspect docker://docker.io/library/httpd

Inspect the image with podman and show image history

podman inspect httpd:2.4.46-alpine
podman history httpd:2.4.46-alpine

Pulling Images locally with podman pull

podman pull docker.io/library/httpd:2.4.46-alpine
podman images

Skopeo lets you inspect and copy images between registries. See: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/managing_containers/finding_running_and_building_containers_with_podman_skopeo_and_buildah

Running containers in interactive mode

Run an interactive session

podman run --name ubuntu --hostname ubuntu \
--interactive --tty ubuntu /bin/bash

Reattach

podman start --attach --interactive ubuntu

Delete your container on exit

podman run --rm --name ubuntu --hostname ubuntu \
--interactive --tty ubuntu /bin/bash

At this point, you could save your modified container and load it into a registry, but there’s a better way: build your container using a Dockerfile or buildah.

Running Containers in the background

Run a sample httpd container to serve a webpage

# Run a container in the background, bind port 8080
podman run --name httpd --detach \
--publish 8080:8080/tcp \
registry.fedoraproject.org/f29/httpd

We’ve named the container httpd to make it easier to access later.
Port 8080 inside the container is redirected to 8080 on the host.
Notice that we’re using an image that binds to a non-privileged (8080) port.

Test the webpage

curl localhost:8080

Check the process

ps -ef | grep podman

Check container status and logs

Check the container status and logs

# List the running containers
podman ps

# Inspect the most recently run container; check the Env and IP sections
podman inspect -l

# Check the container logs
podman logs httpd

The Env section is especially useful for identifying the environment variables used to start the container.

Starting and stopping containers

Stop the container and check the status

podman stop httpd
podman ps -a

IMAGE CREATED STATUS
httpd:latest 5 minutes ago Exited (0) 2 seconds ago

Start the container back

podman start httpd
podman ps
CREATED STATUS PORTS NAMES
7 minutes ago Up 13 seconds ago 0.0.0.0:8080->8080/tcp httpd

Using Environment Variables

Search and inspect an image

podman search mysql-57-rhel7
skopeo inspect \
docker://registry.access.redhat.com/rhscl/mysql-57-rhel7 \
| grep usage

Deploy MySQL

podman run --name mysql \
-e MYSQL_USER=user -e MYSQL_PASSWORD=password \
-e MYSQL_DATABASE=mydb -e MYSQL_ROOT_PASSWORD=password \
--detach rhscl/mysql-57-rhel7:latest

Check the logs

podman logs mysql

Executing a command in a running container with podman-exec

Inspect the container’s IP address and environment (two different methods)

podman inspect -f '{{ .NetworkSettings.IPAddress }}' mysql
podman exec mysql env

Execute a shell inside the mysqld container

podman exec -it mysql bash
mysql -uroot
show databases;
use mydb;

exit

Execute a command

podman exec -it mysql \
/opt/rh/rh-mysql57/root/usr/bin/mysql \
-uuser -ppassword -e 'show databases;'

podman exec -it mysql /bin/bash -c "mysql -uroot -e 'show databases;'"

Note that rootless podman won’t show an IP address (see https://podman.io/getting-started/network.html); run podman with sudo to obtain one.

Container Resources

Check processes inside container

podman top -l

USER PID PPID %CPU ELAPSED TTY TIME COMMAND
default 1 0 0.000 2m4.13954806s ? 0s httpd -D FOREGROUND
default 18 1 0.000 2m4.139682033s ? 0s /usr/bin/coreutils

Display live stream of resource usage statistics

podman stats

ID NAME CPU % MEM USAGE / LIMIT MEM % NET IO BLOCK IO PIDS
00b65 httpd 0.07% 40.91MB / 67.4GB 0.06% -- / -- -- / -- 217

Check published ports

podman port -l

8080/tcp -> 0.0.0.0:8080

Committing, Saving and Loading Images

Create an image

podman run --name ubuntu-apache2 --hostname ubuntu-apache2 \
--interactive --tty ubuntu:20.04 /bin/bash

# Install Apache HTTPD
apt update && apt install -y apache2 && exit

List changed files (A — added, C — changed, D — deleted)

podman diff ubuntu-apache2

Commit image from container with entrypoint and label

podman commit --change CMD=/bin/bash --change ENTRYPOINT=/bin/sh \
--change "LABEL author=cmihai" ubuntu-apache2 ubuntu-apache2

Save an image

podman save -o ubuntu-apache2.tar ubuntu-apache2

Load an image

podman load -i ubuntu-apache2.tar

podman commit creates an image based on a changed container.

Modifying the Apache image port from 80 to 8080

Use podman commit to create a custom Apache HTTPD image that listens on port 8080.

Search for the official image and run it

podman search httpd --filter=is-official
podman run -it --name httpd-docker docker.io/library/httpd:2.4 /bin/bash

Change the listening port

sed -i 's/Listen 80/Listen 8080/g' /usr/local/apache2/conf/httpd.conf
exit

Save the changes to a new image

podman stop httpd-docker
podman diff httpd-docker
podman commit -a 'Mihai' httpd-docker cmihai/httpd:2.4
podman images | grep cmihai/httpd
podman rm httpd-docker

Test the new image

podman run --detach --publish 8080:8080 --name httpd-cmihai localhost/cmihai/httpd:2.4 httpd-foreground
curl localhost:8080

Tagging or Removing Tags from Images

Tag an image

podman tag ubuntu-apache2 cmihai/apache2
podman tag ubuntu-apache2 cmihai/apache2:latest

Remove a tag from an image

podman rmi cmihai/apache2:latest

Multiple tags can point to the same image.

Pushing an image to a Registry

Tag the image in your local repository

podman tag ubuntu-apache2 quay.io/cmihai/apache2:latest

Push to quay.io

podman push quay.io/cmihai/apache2:latest

Example build and push

podman login quay.io
podman build --layers=false -t cmihai/jupyterlab:python38 .
podman tag localhost/cmihai/jupyterlab:python38 \
quay.io/cmihai/jupyterlab:latest
podman push quay.io/cmihai/jupyterlab:latest

Podman will automatically add the latest tag if you do not specify a tag!
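A simplified sketch of that defaulting rule follows. It is an illustration, not podman's reference parser: a tag is the part after a ':' in the last path segment, so a registry port such as `localhost:5000` is not mistaken for a tag, and digest references are left alone.

```python
def with_default_tag(reference, default="latest"):
    """Append ':latest' when an image reference has no tag or digest."""
    if "@" in reference:                      # pinned by digest, e.g. name@sha256:...
        return reference
    last_segment = reference.rsplit("/", 1)[-1]
    if ":" in last_segment:                   # already tagged
        return reference
    return f"{reference}:{default}"

print(with_default_tag("quay.io/cmihai/apache2"))         # quay.io/cmihai/apache2:latest
print(with_default_tag("localhost:5000/cmihai/apache2"))  # localhost:5000/cmihai/apache2:latest
print(with_default_tag("cmihai/jupyterlab:python38"))     # cmihai/jupyterlab:python38
```

Because `latest` is just a default label, not "the newest build", pushing explicit tags (as with `python38` above) keeps deployments reproducible.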

Volumes

Create a volume directory on the host and provide permissions

mkdir myvol
podman unshare chown 999:999 myvol

https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers

Create a container and attach a volume to /data as rw

podman run --rm --name ubuntu \
--volume ./myvol:/data:Z \
--interactive --tty ubuntu /bin/bash

The :Z option tells podman to relabel the volume’s content with a private SELinux label that matches the container
podman unshare runs a command inside a modified user namespace.

SELinux Permissions — Manual approach without using unshare

Let’s check what a MySQL container runs as

podman run -ti rhscl/mysql-57-rhel7:latest grep mysql /etc/passwd

mysql:x:27:27:MySQL Server:/var/lib/mysql:/sbin/nologin

Create a directory owned by UID and GID 27, matching the mysql user inside the container

sudo mkdir mysql-data && sudo chown -R 27:27 mysql-data

Apply the SELinux container_file_t context and policy

sudo semanage fcontext -a -t container_file_t "$PWD/mysql-data(/.*)?"
sudo restorecon -Rv ./mysql-data

Running MySQL with a host directory volume

Start MySQL

podman run --name mysql \
-v ./mysql-data:/var/lib/mysql/data:Z \
-e MYSQL_USER=user -e MYSQL_PASSWORD=password \
-e MYSQL_DATABASE=mydb -e MYSQL_ROOT_PASSWORD=password \
--detach rhscl/mysql-57-rhel7:latest

Troubleshoot logs and permissions

podman logs mysql; sudo find ./mysql-data; ls -dnZ mysql-data

drwxrwxr-x. 6 100026 100026 system_u:object_r:container_file_t:s0:c303,c890 4096 Sep 27 09:09 mysql-data

Check out the permissions inside the container

podman exec mysql ls -lanZ /var/lib/mysql/data

drwxrwxr-x. 27 27 system_u:object_r:container_file_t:s0:c303,c890 .

You could just use:

mkdir mysql-data
podman unshare chown 27:27 mysql-data
ls -dZ mysql-data

system_u:object_r:container_file_t:s0:c303,c890 mysql-data

Volumes — MariaDB

Let’s try this one more time with mariadb/server

Network

Linking Containers

Pods

Cleanup

Stop and remove the container

podman stop httpd
podman rm httpd
podman ps -a # Check that the containers are deleted

Removing the container image

podman rmi registry.fedoraproject.org/f29/httpd
podman images # List images

Delete everything (stopped containers, pods, dangling images and build cache)

podman system prune

Setting a container to start at boot using systemd

Allow containers to manage cgroups under SELinux, then start your container

sudo setsebool -P container_manage_cgroup on
sudo podman run -d --name redis_server -p 6379:6379 redis

vim /etc/systemd/system/redis-container.service

[Unit]
Description=Redis container

[Service]
Restart=always
ExecStart=/usr/bin/podman start -a redis_server
ExecStop=/usr/bin/podman stop -t 2 redis_server

[Install]
WantedBy=multi-user.target

Enable the service

systemctl enable redis-container.service
systemctl start redis-container.service
systemctl stop redis-container.service
systemctl restart redis-container.service
systemctl status redis-container.service

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/managing_containers/running_containers_as_systemd_services_with_podman

Creating a Wordpress + MariaDB Server pod

Create a volume and apply permissions

mkdir mariadb-data && podman unshare chown 999:999 mariadb-data

Create a pod

podman pod create --publish 127.0.0.1:8888:80 --name wordpress-pod

Run a throwaway container to list MariaDB’s configuration options

podman run --rm mariadb:latest --verbose --help

Start MariaDB with a custom configuration and data volume

podman run --name mariadb --pod wordpress-pod \
-v ./mariadb-data:/var/lib/mysql:Z \
-e MYSQL_USER=wpsuser -e MYSQL_PASSWORD=password \
-e MYSQL_DATABASE=wordpress -e MYSQL_ROOT_PASSWORD=password \
--detach mariadb:10.5.5

Start wordpress

podman run --name wordpress --pod wordpress-pod \
-e WORDPRESS_DB_HOST=127.0.0.1:3306 \
-e WORDPRESS_DB_USER=wpsuser -e WORDPRESS_DB_PASSWORD=password \
-e WORDPRESS_DB_NAME=wordpress \
--detach wordpress:latest

Building Containers with Podman from a Containerfile

Building container images from Containerfile with podman

Create a Containerfile for Jupyter Lab starting from a UBI

FROM registry.access.redhat.com/ubi8/python-38

RUN pip install --upgrade --no-cache-dir jupyterlab

EXPOSE 8888
CMD [ "jupyter", "lab", "--ip=0.0.0.0" ]

Build the container image

podman build --layers=false -t cmihai/jupyterlab .

Test the image

podman run --name jupyterlab --detach --publish 8888:8888/tcp cmihai/jupyterlab
podman logs jupyterlab # Retrieve the token to log in

Building Containers with Buildah


container=$(buildah from fedora)
buildah run ${container} dnf install -y texlive
wget https://github.com/jgm/pandoc/releases/download/2.9.2/pandoc-2.9.2-linux-amd64.tar.gz
tar zxf pandoc-2.9.2-linux-amd64.tar.gz
buildah copy ${container} pandoc-2.9.2/bin /usr/local/bin
buildah commit ${container} cmihai/pandoc
buildah images
buildah inspect ${container}
podman run cmihai/pandoc pandoc --version

Pushing images to a container registry with Quay

Logging into Quay

After creating a quay.io account and password, log in using podman

podman login quay.io

Username: cmihai
Password: ( password here)

Creating a new container

Start an interactive container from UBI 8 Minimal, then update it and install httpd inside it

podman run --name ubi8-httpd \
-it registry.access.redhat.com/ubi8/ubi-minimal:latest \
/bin/bash

microdnf update -y
microdnf -y install httpd
microdnf clean all
rm -rf /var/cache/yum

Commit the container image

Get the container ID (or name) and commit the changes to an image

podman ps -l
podman commit ubi8-httpd quay.io/cmihai/ubi8-httpd:latest

Check that the image is there

podman images | grep cmihai/ubi8-httpd

quay.io/cmihai/ubi8-httpd latest 8535a6affc3e 15 seconds ago 209 MB

Building the same image using a Dockerfile

FROM registry.access.redhat.com/ubi8/ubi-minimal
USER root
LABEL maintainer="Mihai Criveti"

# Update image
RUN microdnf update -y && microdnf install -y httpd \
&& microdnf clean all && rm -rf /var/cache/yum \
&& echo "Apache" > /var/www/html/index.html

# Port
EXPOSE 80

# Start the service
CMD ["-D", "FOREGROUND"]
ENTRYPOINT ["/usr/sbin/httpd"]

Build and tag the image

podman build . -t cmihai/ubi8-httpd:latest
podman tag localhost/cmihai/ubi8-httpd quay.io/cmihai/ubi8-httpd:latest

Test the image

Run the image

podman run --detach --name httpd --publish 8080:80 quay.io/cmihai/ubi8-httpd:latest

Check logs and server status

podman logs httpd
podman port -l
curl localhost:8080

Push the image to quay

Push the local tagged image to quay.io

podman push quay.io/cmihai/ubi8-httpd

Check the image on quay.io

Pull the image back from the registry to check that it is there

podman pull quay.io/cmihai/ubi8-httpd

Visit the image page:

See: https://quay.io/repository/cmihai/ubi8-httpd

Customize the image information

Create a Description

Creating a build directly on quay.io

Go to https://quay.io/repository/cmihai/ubi8-httpd?tab=builds
Upload your build Dockerfile

GeoSpacial Engineering with Podman, PostGIS and QGIS

Mihai Criveti — Fri, 04 Aug 2023 16:34:54 GMT

Docker PostGIS and PGAdmin

In this article, I will show you how to:

Create a PostGIS Docker image FROM postgres and publish it to hub.docker.com.
Create a geospatial database environment in Docker Compose with PostGIS and PGAdmin4.
Use PGAdmin4 to create a database and enable geospatial extensions.

1. Build a PostGIS Docker Image

A Dockerfile contains instructions to build an image.
Start by extending an existing PostgreSQL 10 Debian image: FROM postgres:10.
Use apt-get to install required PostGIS extensions.
Use docker build to create the image, then push it to Docker Hub.

Creating the Dockerfile

# Extend the existing PostgreSQL 10 Debian image: https://hub.docker.com/_/postgres/
FROM postgres:10

LABEL maintainer="Mihai Criveti"

# Install PostGIS packages; update and install in one layer so the package index is never stale
RUN apt-get update \
    && apt-get install --no-install-recommends --yes \
       postgresql-10-postgis-2.4 postgresql-10-postgis-2.4-scripts postgresql-contrib \
    && rm -rf /var/lib/apt/lists/*

Building the image

Turn the Dockerfile into a usable image using docker build.
Tag the image with a namespace (the one used on Docker Hub): cmihai

docker build --tag cmihai/postgis .

Uploading the image to Docker Hub

Push the Dockerfile, README and docker-compose.yaml examples to GitHub
Test the image end-to-end
Push the image to Docker Hub

export DOCKER_ID_USER="username"
docker login
docker tag cmihai/postgis $DOCKER_ID_USER/my_image
docker push $DOCKER_ID_USER/my_image

2. Composing multiple images with docker compose

PGAdmin4 is a web-based PostgreSQL administration and SQL development environment.
Docker Compose can link the existing dpage/pgadmin4 image from Docker Hub to cmihai/postgis.
After running docker-compose up, log in to http://localhost:5050 with admin:admin.

Create docker-compose.yaml

version: '3.1'
services:

    postgis:
        image: cmihai/postgis
        container_name: postgis
        ports:
            - '5432:5432'
        environment:
            POSTGRES_PASSWORD: postgres
        volumes:
            - pgdata:/var/lib/postgresql/data

    pgadmin4:
        image: dpage/pgadmin4
        container_name: pgadmin4
        ports:
            - '5050:80'
        environment:
            PGADMIN_DEFAULT_EMAIL: admin
            PGADMIN_DEFAULT_PASSWORD: admin
        links:
            - postgis

volumes:
  pgdata:

Starting the services

Use docker-compose up to start the services:

docker-compose up
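On first start, PostgreSQL spends some time initializing the pgdata volume before it accepts connections. The standard tool for this check is `pg_isready -h localhost -p 5432`; where no PostgreSQL client is installed, a minimal Python sketch can send the protocol's SSLRequest message, to which a live server answers with the single byte S or N. The function name is hypothetical, and the probe only shows that the server is answering, not that the postgres:postgres credentials work.

```python
import socket
import struct

# SSLRequest message: Int32 length (8) followed by the Int32 code 80877103.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


def postgres_answers(host: str = "127.0.0.1", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a PostgreSQL server at host:port answers an SSLRequest."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SSL_REQUEST)
            reply = sock.recv(1)
    except OSError:
        return False
    # A live server replies with one byte: b"S" (will do SSL) or b"N" (will not).
    # An empty reply means the connection was closed without an answer.
    return reply in (b"S", b"N")
```

The server may log an "incomplete startup packet" message for each probe, because the client hangs up after the SSL negotiation byte.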

3. Create a database and enable PostGIS with PGAdmin4

Log in to PGAdmin4 at http://localhost:5050 with admin:admin
Add a server connection to postgis (host: postgis, the Compose service name) with user/password postgres:postgres

Create a new database and call it gis

Open the SQL Query Tool on the newly created gis database: in the Browser window, select Servers > postgis > Databases > gis, then run Tools > Query Tool from the menu.
Run the following SQL code to enable the PostGIS database extensions:

-- Enable PostGIS (includes raster)
CREATE EXTENSION postgis;

-- Enable Topology
CREATE EXTENSION postgis_topology;

-- Enable PostGIS Advanced 3D and other geoprocessing algorithms
CREATE EXTENSION postgis_sfcgal;

-- Fuzzy matching needed for Tiger
CREATE EXTENSION fuzzystrmatch;

-- Rule based standardizer
CREATE EXTENSION address_standardizer;

-- Example rule data set
CREATE EXTENSION address_standardizer_data_us;

-- Enable US Tiger Geocoder
CREATE EXTENSION postgis_tiger_geocoder;

Expected Outcome: gis database with geospatial extensions

Query returned successfully.

gis > Extensions now lists a number of GIS extensions: postgis, postgis_sfcgal, postgis_tiger_geocoder, postgis_topology, fuzzystrmatch, address_standardizer and address_standardizer_data_us.
gis > Schemas > public > Functions has been populated with a large number (1000+) of GIS-specific functions.
A new table called spatial_ref_sys is now available under gis > Schemas > public > Tables.
New schemas tiger, tiger_data and topology have been created.
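The same check can be scripted instead of clicking through PGAdmin4. A minimal Python sketch, assuming the psql client and the postgres:postgres credentials used above: list the installed extensions with `psql -h localhost -U postgres -d gis -At -c "SELECT extname FROM pg_extension;"` (-A for unaligned output, -t for tuples only, so one name per line) and pass that text to the hypothetical helper below.

```python
# Extensions created by the SQL script above.
EXPECTED_EXTENSIONS = frozenset({
    "postgis",
    "postgis_topology",
    "postgis_sfcgal",
    "fuzzystrmatch",
    "address_standardizer",
    "address_standardizer_data_us",
    "postgis_tiger_geocoder",
})


def missing_extensions(psql_output: str, expected=EXPECTED_EXTENSIONS) -> set:
    """Return the expected extensions absent from psql's one-name-per-line output."""
    installed = {line.strip() for line in psql_output.splitlines() if line.strip()}
    return set(expected) - installed
```

An empty set means every extension from the script is installed; extra entries such as plpgsql are ignored.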

Next Steps:

Load geospatial data from Shapefile, KML, GeoJSON, etc.
Connect GIS desktop clients such as QGIS.
Connect to PostGIS using Python (ex: geopandas).
Perform geospatial queries and analysis on the data.

Links and References:

GitHub repository with Dockerfile and docker-compose.yaml: https://github.com/crivetimihai/geospacial-engineering
Docker Image: https://hub.docker.com

A look at Open Data

Common GIS data formats:

QGIS can be used to import a variety of formats in PostGIS, including Shapefiles, KML and GeoJSON.

Shapefile: a popular geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products.
KML: Keyhole Markup Language is an XML notation for expressing geographic annotation and visualization within Internet-based, two-dimensional maps and three-dimensional Earth browsers.
GeoJSON: a format for encoding a variety of geographic data structures.
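Before importing a GeoJSON file, it can help to check which geometry types it holds, since each type usually ends up as its own layer. A minimal Python sketch using only the standard library, assuming the file is a FeatureCollection as defined in RFC 7946 (which also fixes coordinate order as longitude, latitude); summarize_geojson is a hypothetical name.

```python
import json
from collections import Counter


def summarize_geojson(text: str) -> Counter:
    """Count the geometry types in a GeoJSON FeatureCollection."""
    data = json.loads(text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    counts = Counter()
    for feature in data.get("features", []):
        geometry = feature.get("geometry")
        # RFC 7946 allows a null geometry for unlocated features; count those as None.
        counts[geometry["type"] if geometry else None] += 1
    return counts
```

For a file on disk, pass its contents, e.g. `summarize_geojson(open("data.geojson").read())`.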

Open Data Sources

Ireland’s Open Data Portal: https://data.gov.ie/
Irish Spatial Data Exchange: http://isde.ie
Census 2016 Open Data: http://census2016.geohive.ie/
European Data Portal: https://www.europeandataportal.eu
2600+ Open Data sources: https://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/

Interesting datasets:

Downloading data:

# Download the data:
wget http://spatial.dcenr.gov.ie/GSI_DOWNLOAD/Landslide_Susceptibility_Map_Ireland.zip

# Unzip the data:
unzip Landslide_Susceptibility_Map_Ireland.zip
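The same download-and-unzip step can be scripted in Python with only the standard library. In this sketch the function name and destination directory are hypothetical; it returns the .shp files found so you know which layers to load in QGIS. A shapefile travels as several sibling files (.shp, .shx, .dbf, often .prj), so the whole archive is extracted rather than the .shp alone.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, dest_dir: str) -> list:
    """Download a zip archive into dest_dir, extract it, and return the .shp paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Save the archive under the last path component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        # Extract everything so each .shp keeps its .shx/.dbf/.prj companions.
        zf.extractall(dest)
    return sorted(str(p) for p in dest.rglob("*.shp"))
```

For example: `download_and_extract("http://spatial.dcenr.gov.ie/GSI_DOWNLOAD/Landslide_Susceptibility_Map_Ireland.zip", "landslides")`.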

Loading data with QGIS

In this article, I will show you how to:

Install QGIS on Ubuntu Linux.
Connect QGIS to PostgreSQL/PostGIS.
Import data (shapefiles, GeoJSON) into the GIS database using DBManager.

1. Install QGIS:

On Ubuntu Linux, you can use:

sudo apt-get update
sudo apt-get install qgis

For other operating systems, follow the instructions listed at https://qgis.org/en/site/

Connect to PostGIS

Add PostGIS in QGIS:
Under Browser, right-click PostGIS > New Connection and enter Name: postgis, Host: localhost, Port: 5432.
Save the connection details.

2. Import Landslide shapefile data into QGIS, then PostgreSQL

Import the data in QGIS

First, we import the sample data into QGIS:

Open Layer > Data Source Manager > Home and find the layers.
Select the layers you wish to add and click the Add Selected Layers button.

Export the data to PostgreSQL / PostGIS

Click on Database > DB Manager > DB Manager
Select PostGIS > yourdb > your schema, then choose Table > Import Layer/File and name the table (ex: Landslide_Events)
Repeat the previous step for every layer you wish to import
Close the DB Manager

Note: The import can take a long time. You can monitor progress in the PGAdmin4 Dashboard by watching the Tuples In (Inserts) graph.

Delete the layers and reload them from the database

In Layers, right-click each layer and select Remove.
In Browser > PostGIS > postgis > public, double-click each layer (in the right order).

Using the psql commandline client and other tools

List the networks. Docker Compose should have created a network named postgis_default (the project name, which defaults to the directory name, plus _default):

docker network ls

2e2afc387fbb postgis_default bridge local

Connect to PostGIS using the psql client (from Docker):

docker run --net postgis_default -it --rm \
    --link postgis:postgis postgres psql -h postgis -U postgres -d gis

Using a local client:

psql -h localhost -U postgres -d gis

List Tables:

\dt

Run SQL Queries:

-- Events created by ABC, in descending order of creation date:
SELECT county, quaternary, slope_type, bedrock_ty, land_use_c, creationda, creator
    FROM public."Landslide_Events"
    WHERE creator = 'ABC' ORDER BY creationda DESC;

Guide to getting started with Docker, Python and Jupyter Notebook on zLinux.

Mihai Criveti — Fri, 04 Aug 2023 16:31:30 GMT

Here, I’m using Red Hat Enterprise Linux 7.5 on IBM Z (s390x) to build and deploy Jupyter Notebook in an Ubuntu container. I will go over the steps used to build and run the Docker container.

A few basic commands:

Establish the OS release and version. We’re running on RHEL 7.5 for s390x.

[cmihai@rh74s390x ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

[cmihai@rh74s390x ~]$ uname -a
Linux rh74s390x.novalocal 3.10.0-693.17.1.el7.s390x #1 SMP Sun Jan 14 10:38:29 EST 2018 s390x s390x s390x GNU/Linux

[cmihai@rh74s390x ~]$ docker --version
Docker version 17.05.0-ce, build 89658be
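When the same setup scripts run on more than one machine, it helps to check the release and architecture from code before pulling architecture-specific images such as s390x/ubuntu. A minimal Python sketch, assuming the host provides the standard /etc/os-release file (RHEL 7 does); parse_os_release and describe_host are hypothetical names.

```python
import platform
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dict (values may be quoted)."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes the shell-style quotes that os-release values use.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def describe_host(path: str = "/etc/os-release") -> str:
    """Return '<PRETTY_NAME> on <machine>' for the running host."""
    with open(path) as fh:
        info = parse_os_release(fh.read())
    # platform.machine() reports the same value as `uname -m` (s390x on IBM Z).
    return f"{info.get('PRETTY_NAME', 'unknown OS')} on {platform.machine()}"
```

A script can then refuse to continue unless `platform.machine() == "s390x"` before building from s390x base images.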

SETUP REGULAR USER ACCESS, SUDO AND SSH KEYS

CREATE A REGULAR USER ACCOUNT

useradd cmihai
passwd cmihai
usermod -aG wheel cmihai
su - cmihai

ADD YOUR SSH PUBLIC KEY TO AUTHORIZED_KEYS

mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "YOURKEYHERE" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
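Running the echo line twice appends the key twice, and with its default StrictModes setting sshd ignores authorized_keys when the file or ~/.ssh is writable by other users. A minimal Python sketch of an idempotent version, assuming a single-line OpenSSH public key; add_authorized_key is a hypothetical helper, not part of OpenSSH.

```python
from pathlib import Path


def add_authorized_key(public_key: str, home=None) -> bool:
    """Append public_key to <home>/.ssh/authorized_keys unless it is already there.

    Returns True if the key was added. home defaults to the current user's home.
    """
    ssh_dir = (Path(home) if home else Path.home()) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    ssh_dir.chmod(0o700)  # mkdir's mode is filtered by the umask, so set it explicitly
    keys_file = ssh_dir / "authorized_keys"
    key = public_key.strip()
    content = keys_file.read_text() if keys_file.exists() else ""
    added = key not in (line.strip() for line in content.splitlines())
    if added:
        with keys_file.open("a") as fh:
            # Never glue the new key onto a last line that lacks a newline.
            if content and not content.endswith("\n"):
                fh.write("\n")
            fh.write(key + "\n")
    keys_file.chmod(0o600)
    return added
```

Calling it a second time with the same key returns False and leaves the file unchanged apart from its permissions.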

LOG IN AS YOUR NEW USER AND FORWARD PORT 9000:

ssh -L 9000:127.0.0.1:9000 -i cmihai.pem cmihai@myzLinux

SETUP DOCKER

CREATE THE DOCKER GROUP

sudo groupadd docker
sudo usermod -aG docker cmihai

START DOCKER

sudo systemctl enable docker
sudo systemctl restart docker.service
sudo systemctl status docker.service

TEST DOCKER

docker run s390x/hello-world

LET’S RUN A SIMPLE UBUNTU INTERACTIVE SHELL:

docker run --name s390x-ubuntu --hostname s390x-ubuntu --interactive --tty s390x/ubuntu /bin/bash

BUILDING A DOCKER CONTAINER FOR JUPYTER NOTEBOOK

Create a Dockerfile from the s390x/ubuntu base image.

# Pin the release: Ubuntu 23.04 and later refuse system-wide pip installs (PEP 668)
FROM s390x/ubuntu:22.04
LABEL maintainer="Mihai Criveti"

# ADD AND RUN
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y python3 python3-pip \
    && pip3 install jupyter \
    && apt-get clean

# COMMAND:
CMD ["jupyter","notebook","--allow-root","--ip=0.0.0.0","--port=9000"]

# NETWORK
EXPOSE 9000

BUILD THE CONTAINER:

docker build . --tag "cmihai/jupyter-lite:v1" -f Dockerfile

RUN YOUR NEW CONTAINER:

docker run --name jupyter --hostname jupyter -p 9000:9000 cmihai/jupyter-lite:v1

CONNECT TO JUPYTER NOTEBOOK

http://localhost:9000

YOU CAN NOW INSTALL DEPENDENCIES DIRECTLY FROM JUPYTER:

!apt-get update && apt-get install --yes zlib1g-dev libjpeg-dev

Potential next steps:

Consider setting up persistence for your notebooks (ex: VOLUME ["/notebooks"] in the Dockerfile)
Set up Docker Compose and build multi-tier application specifications, such as connecting your Jupyter Notebook to PostgreSQL, Redis, Spark, etc.
Set up other programming languages or kernels (Java, R), or even Zeppelin Notebook

For an interactive tutorial of using Docker for Data Science, check out: https://github.com/crivetimihai/docker-data-science

To see the original article, check out https://www.linkedin.com/pulse/data-science-environment-docker-jupyter-ibm-mainframe-mihai-criveti/