Wow, what a powerful open-source project. This is in the class of seemingly infinite AI coders, but with the addition of “modes” (what the current hype calls agents). It’s super great, although a little confusing to use, so here are some notes:

  1. You install it as a VSCode extension, but it works fine in VSCodium as well. You get a little kangaROO icon, and then you have to set it up.
  2. Click on the Settings icon at the upper right to get to Providers. This is the most important screen, as you can select whatever providers you want. Hint: the Claude Code one is very useful if you are signed up for Claude Max, since you can then use that subscription for all your programming.
  3. This means you need to install Claude Code (npm install -g @anthropic-ai/claude-code). Then start it on the command line, and it will ask you to authenticate via a web browser.
  4. At that point, you can pick Claude Code as the API Provider and then select the model, which can be Opus or Sonnet. With Claude Max, you might as well choose Opus for planning problems and Sonnet for the others.
  5. The most useful features are the modes located at the bottom. The base ones are Architect (plan your code), “Code” (write it), and “Debug” (read it), but there are many others you can add.
  6. There is a marketplace for both modes and MCP servers, which are super useful. These include Context7 for reading public repository documentation, GitHub for making pull requests, and the millions of web browsing options.
  7. Note that to use MCP servers, you have to give a hint; for instance, “Use the Exa MCP server to search the web” works, but this is hit and miss. I find that if I say “Search the web”, it tends to default to the first web-searching tool, which in my case is Brave Search, so you have to experiment.

Strange asdf and Roo Code Interaction with Claude

OK, figure this one out: I use asdf to shim commands and versions of Python. It has a ./shims directory where it redirects commands. The annoying thing is that since Claude Code is installed globally, this command gets shimmed. And in the latest version of Roo Code, this means that when Roo reads it, it stuffs the entire node_modules tree into the prompt and fails with a -1, I think because the line is too long.

The fix is simple: get rid of asdf (on my list) or remove ~/.asdf/shims/claude, and all is good.
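A quick sketch of the check and fix (the paths are the asdf defaults; rm -f is a no-op if the shim isn’t there):

```shell
# If claude resolves into ~/.asdf/shims, asdf is intercepting the global install
command -v claude || true

# Remove just the claude shim; note that asdf recreates shims on `asdf reshim`,
# so this can come back after you reinstall or update Claude Code
rm -f "$HOME/.asdf/shims/claude"
```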

Codebase Indexing: Qdrant

Code Indexing: they just added this, and it is supposed to make finding things easier. Setting it up is a little complicated; there is a small “disk” icon at the bottom, and when you click on it, it will ask you for the indexing system and the Qdrant vector database. The easy choices are to use OpenAI for embeddings and then to sign up for a free Qdrant.io-hosted vector database.

But if you want to roll your own, then you can use Ollama; make sure that you “ollama pull” the embedding models, which are mxbai-embed-large, all-minilm, and nomic-embed-text. It is not clear which to use, and nomic-embed-code (which sounds good) is not available there. You can also use an OpenAI-compatible API server, so the world is completely your oyster for embeddings. It is a little strange since the “Ollama” entry is also OpenAI API compatible, so I think that entry is for convenience, but using the “OpenAI compatible” one lets you pick any embedding model you want, not just the limited selection in the “Ollama” entry.

Qdrant errors and too many open files: delete the database

If you are using Qdrant locally, you will see this error come up a million times: “decode: cannot decode batches with this context (use llama_encode() instead)”; this is a spurious message that you can ignore. Note that the instructions tell you how to use Docker to run Qdrant, but you can also just git clone “qdrant” and do a make run, and then it runs without the overhead of a virtual machine since it’s just a native Rust binary in the end.

The fix is pretty bleak: it looks like the storage is getting corrupted, so going to the repo and deleting the directory ./storage fixes this.
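If you ran Qdrant from a local checkout as above, the wipe is a one-liner (./storage is Qdrant’s default data path when run from the repo):

```shell
# Stop Qdrant first, then delete its on-disk storage; Roo Code will
# re-embed the code base from scratch on the next indexing run.
rm -rf ./storage
```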

How to Use Codebase Indexing

The way that this is implemented is that the LLM needs to know how to use the “codebase_search” tool, so you have to write your queries the right way, such as “find API endpoint definitions.”

Picking an Embedding Model: Open WebUI and Roo Code with Qwen

Figuring out which embedding model to use is pretty hard; there are many guides, but amongst the best are the CodeSearchNet and MTEB leaderboards. I used the MTEB Leaderboard back in January:

  1. “gte-Qwen2-7B-Instruct” is the best if you are running locally (at 62.51% for all tasks). But as of July, it looks like “Qwen3-Embedding-8B” and “Qwen3-Embedding-4B” are the best general-purpose ones at 70.58% and 69.45%. Use these if speed isn’t an issue (like when indexing a whole code base once).
  2. But if you want performance and can split the embedding from the reranking as Open WebUI does, then use a fast model (“sentence-transformers/all-MiniLM-L6-v2”) for embedding and the 7B above for reranking.
  3. ✅ Qwen3-Embedding-0.6B. This is a small model with 1024-dimensional vectors and seems to work with large code bases. MTEB says its performance is very close to the 8B.
  4. ✅✅ Note that the 4B fails unless you set the vector dimensions right, and given the sizing, it feels like this may be the best tradeoff of speed and accuracy.
  5. ✅✅✅ Qwen3-Embedding-8B. This is by far the slowest model, and for me it would hang; you need to go to http://localhost:6333/dashboard and make sure the collections have the right dimension.
  6. ✅✅✅ Nomic Embed Code. 7B parameters, but see above: it is hard to get from Ollama generically, though you can add it via the OpenAI-compatible entry. This appears to be a very good coding embedding. The base model, nomic-embed-text-v1.5, is at 44%, but this one is supposed to be tuned for code and comes recommended.

So I would probably use:

  1. Qwen3-Reranker-8B and Qwen3-Embedding-0.6B for Open WebUI, as it will typically have lots of documents
  2. nomic-embed-code for Roo Code and other coders, since this is all about coding and it is supposed to be tuned for it

Then there are the Open WebUI problems: for Open WebUI, I originally recommended using Sentence Transformers, but since that is not in Roo Code, it is probably better to switch to Ollama for all of this and manage the models from one place. Qwen3-Embedding-0.6B does very well there. But the problem is that these models have some sort of bug and do not work with Open WebUI. The simpler models that are lower case, like “manutic/nomic-embed-code”, seem to work fine, but “denchao/Qwen3-Embedding-8B” causes an “Error 400 NoneType object not iterable”.

One important note: you should really just select a single model and forget it. As I switched models, the Qdrant database got pretty confused because they have different dimensions, so pick the one that is going to work for you and stick with it.
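To make the dimension problem concrete: a Qdrant collection is created with one fixed vector size, so points embedded by a different model get rejected. A minimal sketch (the 1024 and 4096 sizes are assumed here for the 0.6B and 8B Qwen models):

```python
# Sketch of why switching embedding models confuses Qdrant: the collection
# keeps the vector size it was created with, so a new model's vectors fail.
def check_vector(collection_dim: int, vector: list[float]) -> None:
    if len(vector) != collection_dim:
        raise ValueError(
            f"expected {collection_dim} dims, got {len(vector)}; "
            "recreate the collection (or wipe storage) after changing models"
        )

collection_dim = 1024                       # indexed with the 0.6B model
check_vector(collection_dim, [0.0] * 1024)  # ok
try:
    check_vector(collection_dim, [0.0] * 4096)  # switched to the 8B model
except ValueError as err:
    print("rejected:", err)
```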

Open bugs: OpenAI API does not save in Roo Code (fixed automatically!); Open WebUI generates 400: Object not iterable

OK, so I hit two bugs trying to get this to work:

  1. Sentence Transformers works fine, and you can use “Qwen/Qwen3-Embedding-0.6B” as the embedder and “Qwen/Qwen3-Reranker-8B” as the reranker
  2. Roo Code works with Ollama as the embedding provider, but in releases before 3.23.13 only for mxbai-embed-large and nomic-embed-text, and nothing else. The latest release, 3.23.14, lets you type the model name there, so you don’t need the OpenAI server workaround. You do have to type the exact name of the model, including the tag even when it is the default “latest”, but it is working well.
  3. More importantly, the bug in the OpenAI server was fixed automatically, which is pretty crazy, as before it would not validate at all
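The exact-name rule in point 2 can be sketched as a tiny normalizer (the “:latest” default is standard Ollama naming; the need to spell it out is from my testing above):

```python
# Ollama model references are "name:tag"; Roo Code wants the full string,
# including ":latest" even though Ollama treats that tag as the default.
def canonical(model: str) -> str:
    return model if ":" in model else f"{model}:latest"

print(canonical("nomic-embed-text"))       # → nomic-embed-text:latest
print(canonical("mxbai-embed-large:335m")) # → mxbai-embed-large:335m
```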

Claude Code

This is actually a hard product to really harness because it is about “outside-in” programming; that is, instead of a complete IDE like VSCode, you are on the command line. This is actually the way I love to code normally, because NeoVim for instance is so much faster, but it is definitely more complicated with Claude Code. Some observations:

  1. You need to have multiple windows open. That’s sort of the core of “outside-in” programming: you have a window for the log, a window for the server, etc. Some folks like to use tmux to do this with preset windows, but I find remembering the tmux commands yet another layer of remembering things, and since I use Rectangle every day, it’s trivial to remember how to lay out windows.
  2. There are lots of magic slash commands and words. For instance, Claude is very good at “to-do lists,” which help the LLM stay on track, so say “make a to-do list.” Another trick is the word “ultrathink,” which is how to kick off deeper reasoning.
  3. Adding MCP tools is important; it doesn’t know that much on its own. I need to do a separate post on the best MCP tools, but letting the LLM out of its “text cage” is important.
  4. It doesn’t natively have “modes” or “agents” in the same way that Roo Code does, but you can set them up with a bunch of work.
  5. There is another tool called claude-coder which lets you use other LLMs with the Claude Code scaffolding (for those regions where Anthropic is not available).
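For point 4, the low-effort version of a “mode” in Claude Code is a custom slash command: a markdown file in .claude/commands/ shows up as a /command, with $ARGUMENTS substituted from whatever you type after it. The file name and wording here are just an example:

```markdown
<!-- .claude/commands/architect.md — becomes /architect in the REPL -->
Act as a software architect and do not write code yet.
Produce a plan for: $ARGUMENTS
- List the files you would change and why
- Call out risks and open questions first
```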

MCP Servers galore

This is one of the miracles of the last 30 days, as everyone now has a way for an LLM to access other data. The previous manual has an explanation of how it works, but here’s a list of what you should install and how to use it. The most important thing is:

You have to hint strongly in your prompt what MCP server you want, so it should be something like “Use EXA to search for things” or “when you search, you should use in order EXA, Firecrawl and then Brave”

Here are the ones I use. I’m not sure, but I think the order in the JSON file matters, so here are some tricks to force different MCP servers to get used. We are doing this by looking at the MCP text that is used to inform the system. If you don’t specify the tool, the LLM will randomly guess which MCP server to use. Sometimes you can tell (Claude tells you, and Roo Code tells you, but others like ChatGPT do not). I use Claude Code to quickly check whether these queries work, because it actually prints out the prompt help that is in the MCP server; a very simple “mcp firecrawl what queries can I run” works, and you can even ask, “what MCP servers do you have?”

  1. Brave Search for quick searches. This is pretty dependable and free, so use “brave search for a subject”. Note that you will only get 20 results per query. Example: “Brave search for local business Hanok of Seattle”.
  2. Firecrawl for deep research and crawling. Expensive but comprehensive; note there are two different Firecrawl APIs. The good ones are “Firecrawl extract pricing, names and details from amazon.com” and the really powerful “Firecrawl deep research what countries have the best subsidies for AI companies”. Also very useful: “firecrawl get the content of the page tongfamily.com”, “firecrawl list all URLs on tongfamily.com”, or the combination “firecrawl map all the URLs on tongfamily.com and scrape each page”. Then there is the slow crawl, “firecrawl crawl all blog posts from the first two levels of tongfamily.com”. Finally, you should normally use Brave Search because it is free, but you can run “firecrawl search for latest ai news” and, for your own sites, “firecrawl generate llmtxt do not index tongfamily.com”.
  3. Arxiv. This is pretty straightforward: “arxiv search on prompted LLM adapters with categories CS in 2025 with 10 max results”. “arxiv download paper on machine learning” really does the download. “arxiv list papers” is a little confusing, but it refers to papers that you’ve already downloaded, and “arxiv read paper on banking” works for papers that have been downloaded.
  4. Browserbase. This is a tool like the built-in one in Roo Code which runs a browser for the LLM and allows it to move around, so the sequence is “browserbase navigate to tongfamily.com”.
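All of this hinting works against whatever is registered in the MCP JSON (Roo Code’s MCP settings file or Claude Code’s .mcp.json). A minimal sketch with Brave listed first, since order may matter; the npm package names and environment variable names here come from the servers’ own docs, so double-check them:

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "YOUR_KEY" }
    },
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "YOUR_KEY" }
    }
  }
}
```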

