Embeddings
Setup
- Create project
mkdir Embeddings
cd Embeddings
dotnet new console --framework net9.0
- Add packages
dotnet add package Microsoft.Extensions.AI --prerelease
dotnet add package Microsoft.Extensions.AI.Ollama --prerelease
dotnet add package System.Numerics.Tensors
- Ensure Ollama is installed
ollama list
ollama pull all-minilm:latest
- Open the code in Rider
rider .
- Set up the embedding generator
using Microsoft.Extensions.AI;
using System.Numerics.Tensors;
IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator = new OllamaEmbeddingGenerator(
    new Uri("http://127.0.0.1:11434"),
    modelId: "all-minilm");
Basic Embedding
- Create a basic embedding
var result = await embeddingGenerator.GenerateAsync("Adam eats too much ice cream");
Console.WriteLine($"Vector of length {result.Vector.Length}");
foreach (var value in result.Vector.Span)
{
    Console.Write("{0:0.00}, ", value);
}
- Run the app.
What is a Vector?
Imagine a 2D graph with an x and y axis. A vector is a point in that graph, represented by its coordinates (x, y). You can also think of it as an arrow from the origin (0, 0) to the point (x, y).
Now let's add a third dimension by adding an extra number. Our vector is now represented by three numbers (x, y, z).
Now let's add 1500 dimensions, with one number for each dimension. That is all a vector is: a list of numbers, one per dimension. It's this high-dimensional space that the LLM uses to infer meaning from the data.
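To make the idea concrete, here is a minimal sketch of vectors as plain arrays of numbers. The variable names and the `Magnitude` helper are illustrative only and are not part of the demo app. The 1500-element array simply mirrors the figure used above. The actual number of dimensions depends on the embedding model.

```csharp
using System;
using System.Linq;

// A 2D vector: the point (3, 4), or the arrow from (0, 0) to (3, 4).
float[] v2 = [3f, 4f];

// Adding a third dimension just adds a third number.
float[] v3 = [3f, 4f, 12f];

// A high-dimensional vector is the same idea with many more numbers.
// 1500 is the illustrative figure from the text, not a property of any model.
float[] vN = Enumerable.Repeat(1f, 1500).ToArray();

// Hypothetical helper: the length of the arrow from the origin,
// i.e. the square root of the sum of the squared components.
static double Magnitude(float[] v) => Math.Sqrt(v.Sum(x => (double)x * x));

Console.WriteLine($"{v2.Length} dimensions, magnitude {Magnitude(v2)}");      // 2 dimensions, magnitude 5
Console.WriteLine($"{v3.Length} dimensions, magnitude {Magnitude(v3)}");      // 3 dimensions, magnitude 13
Console.WriteLine($"{vN.Length} dimensions, magnitude {Magnitude(vN):0.00}");
```

Whatever the number of dimensions, the representation stays the same: one number per dimension.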
GitHub Data
- Comment out the basic embedding code
- Create GitHub repo data to search on
// Top 100 .NET GitHub Libraries
Repository[] topDotNetRepos =
[
new("ASP.NET Core", "Cross-platform .NET framework for building modern cloud-based web applications", "https://github.com/dotnet/aspnetcore"),
new(".NET Runtime", "Home repository for .NET runtime and libraries", "https://github.com/dotnet/runtime"),
new("Entity Framework Core", "Lightweight, extensible, open source and cross-platform version of the popular Entity Framework data access technology", "https://github.com/dotnet/efcore"),
new("Newtonsoft.Json", "Json.NET is a popular high-performance JSON framework for .NET", "https://github.com/JamesNK/Newtonsoft.Json"),
new("AutoMapper", "Convention-based object-object mapper in .NET", "https://github.com/AutoMapper/AutoMapper"),
new("Dapper", "Simple object mapper for .NET", "https://github.com/DapperLib/Dapper"),
new("NLog", "Advanced and Structured Logging for Various .NET Platforms", "https://github.com/NLog/NLog"),
new("Serilog", "Simple .NET logging with fully-structured events", "https://github.com/serilog/serilog"),
new("xUnit.net", "Free, open source, community-focused unit testing tool for the .NET Framework", "https://github.com/xunit/xunit"),
new("FluentValidation", "Popular .NET validation library for building strongly-typed validation rules", "https://github.com/FluentValidation/FluentValidation"),
new("Polly", ".NET resilience and transient-fault-handling library", "https://github.com/App-vNext/Polly"),
new("SignalR", "Incredibly simple real-time web for .NET", "https://github.com/SignalR/SignalR"),
new("RestSharp", "Simple REST and HTTP API Client for .NET", "https://github.com/restsharp/RestSharp"),
new("NUnit", "NUnit testing framework for .NET", "https://github.com/nunit/nunit"),
new("MediatR", "Simple, unambitious mediator implementation in .NET", "https://github.com/jbogard/MediatR"),
new("Hangfire", "Easy way to perform background job processing in .NET and .NET Core applications", "https://github.com/HangfireIO/Hangfire"),
new("Nancy", "Lightweight, low-ceremony, framework for building HTTP based services on .Net and Mono", "https://github.com/NancyFx/Nancy"),
new("IdentityServer4", "OpenID Connect and OAuth 2.0 Framework for ASP.NET Core", "https://github.com/IdentityServer/IdentityServer4"),
new("Ocelot", ".NET Core API Gateway", "https://github.com/ThreeMammals/Ocelot"),
new("FluentAssertions", "Fluent API for asserting the results of unit tests", "https://github.com/fluentassertions/fluentassertions"),
new("StackExchange.Redis", "General purpose redis client", "https://github.com/StackExchange/StackExchange.Redis"),
new("Refit", "Automatic type-safe REST library for .NET Core, Xamarin and .NET", "https://github.com/reactiveui/refit"),
new("BenchmarkDotNet", "Powerful .NET library for benchmarking", "https://github.com/dotnet/BenchmarkDotNet"),
new("IdentityModel", "Identity and access control library for .NET", "https://github.com/IdentityModel/IdentityModel"),
new("MongoDB.Driver", "Official .NET driver for MongoDB", "https://github.com/mongodb/mongo-csharp-driver"),
new("FluentScheduler", "Automated job scheduler with fluent interface for the .NET platform", "https://github.com/fluentscheduler/FluentScheduler"),
new("Castle.Windsor", "Castle Windsor is a best of breed, mature Inversion of Control container", "https://github.com/castleproject/Windsor"),
new("Swashbuckle.AspNetCore", "Swagger tools for documenting API's built on ASP.NET Core", "https://github.com/domaindrivendev/Swashbuckle.AspNetCore"),
new("FluentEmail", ".NET Core email sending", "https://github.com/lukencode/FluentEmail"),
new("CsvHelper", "Library to help reading and writing CSV files", "https://github.com/JoshClose/CsvHelper"),
new("ReactiveUI", "An advanced, composable, functional reactive model-view-viewmodel framework", "https://github.com/reactiveui/ReactiveUI"),
new("Topshelf", "Easy service hosting framework for building Windows services using .NET", "https://github.com/Topshelf/Topshelf"),
new("MassTransit", "Distributed Application Framework for .NET", "https://github.com/MassTransit/MassTransit"),
new("Quartz.NET", "Quartz Enterprise Scheduler .NET", "https://github.com/quartznet/quartznet"),
new("Moq", "Most popular and friendly mocking framework for .NET", "https://github.com/moq/moq4"),
new("Scrutor", "Assembly scanning and decoration extensions for Microsoft.Extensions.DependencyInjection", "https://github.com/khellang/Scrutor"),
new("System.Text.Json", "High-performance JSON processor in C#", "https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Json"),
new("ImageSharp", "Fully featured 2D graphics library for .NET", "https://github.com/SixLabors/ImageSharp"),
new("EPPlus", "EPPlus-Excel spreadsheets for .NET", "https://github.com/EPPlusSoftware/EPPlus"),
new("NBitcoin", "Comprehensive Bitcoin library for the .NET framework", "https://github.com/MetacoSA/NBitcoin"),
new("ClosedXML", "ClosedXML is a .NET library for reading, manipulating and writing Excel 2007+ (.xlsx, .xlsm) files", "https://github.com/ClosedXML/ClosedXML"),
new("AngleSharp", "Ultimate angle brackets parser library parsing HTML5, MathML, SVG and CSS", "https://github.com/AngleSharp/AngleSharp"),
new("HtmlAgilityPack", "Html Agility Pack (HAP) is a free and open-source HTML parser", "https://github.com/zzzprojects/HtmlAgilityPack"),
new("NancyFx", "Lightweight, low-ceremony, framework for building HTTP based services", "https://github.com/NancyFx/Nancy"),
new("Castle.Core", "Castle Core, including Castle DynamicProxy, Logging Services and DictionaryAdapter", "https://github.com/castleproject/Core"),
new("MessagePack-CSharp", "Extremely Fast MessagePack Serializer for C#", "https://github.com/neuecc/MessagePack-CSharp"),
new("MailKit", "Cross-platform .NET library for IMAP, POP3, and SMTP", "https://github.com/jstedfast/MailKit"),
new("SSH.NET", "SSH.NET is a Secure Shell (SSH) library for .NET", "https://github.com/sshnet/SSH.NET"),
new("LiteDB", "LiteDB - A .NET NoSQL Document Store in a single data file", "https://github.com/mbdavid/LiteDB"),
new("ServiceStack", "Thoughtfully architected, obscenely fast, thoroughly enjoyable web services for all", "https://github.com/ServiceStack/ServiceStack"),
new("Owin", "OWIN (Open Web Interface for .NET) defines a standard interface between web servers and web applications", "https://github.com/owin/owin"),
new("AutoFixture", "AutoFixture is an open source library for .NET designed to minimize the Arrange phase of your unit tests", "https://github.com/AutoFixture/AutoFixture"),
new("Npgsql", "Npgsql is the .NET data provider for PostgreSQL", "https://github.com/npgsql/npgsql"),
new("MySqlConnector", "Async MySQL Connector for .NET and .NET Core", "https://github.com/mysql-net/MySqlConnector"),
new("DotNetty", "DotNetty project – a port of netty, event-driven asynchronous network application framework", "https://github.com/Azure/DotNetty"),
new("Orleans", "Cross-platform framework for building distributed applications with .NET", "https://github.com/dotnet/orleans"),
new("Blazor", "Blazor is a framework for building interactive web UIs using C#", "https://github.com/dotnet/aspnetcore/tree/main/src/Components"),
new("Nuke", "The AKEless Build System for C#/.NET", "https://github.com/nuke-build/nuke"),
new("FluentMigrator", "Fluent migrations framework for .NET", "https://github.com/fluentmigrator/fluentmigrator"),
new("Elmah", "Error Logging Modules And Handlers for ASP.NET", "https://github.com/elmah/Elmah"),
new("YamlDotNet", "YamlDotNet is a YAML library for netstandard and .NET Framework", "https://github.com/aaubry/YamlDotNet"),
new("FluentFTP", "An FTP and FTPS client for .NET & .NET Standard", "https://github.com/robinrodricks/FluentFTP"),
new("Akka.NET", "Port of Akka actors for .NET", "https://github.com/akkadotnet/akka.net"),
new("Elasticsearch.Net", "Elasticsearch .NET client", "https://github.com/elastic/elasticsearch-net"),
new("NEST", "Elasticsearch .NET client", "https://github.com/elastic/elasticsearch-net"),
new("Simple Injector", "Easy, flexible, and fast Dependency Injection library for .NET", "https://github.com/simpleinjector/SimpleInjector"),
new("Unity Container", "The Unity Container (Unity) is a lightweight, extensible dependency injection container", "https://github.com/unitycontainer/unity"),
new("Autofac", "An addictive .NET IoC container", "https://github.com/autofac/Autofac"),
new("Ninject", "Dependency injector for .NET", "https://github.com/ninject/Ninject"),
new("StructureMap", "A Dependency Injection/Inversion of Control tool for .NET", "https://github.com/structuremap/structuremap"),
new("Machine.Specifications", "MSpec is a context/specification framework that removes language noise and simplifies tests", "https://github.com/machine/machine.specifications"),
new("Should", "Should testing for .NET - the way assertions should be!", "https://github.com/erichexter/Should"),
new("NBehave", "Behaviour Driven Development framework for .NET", "https://github.com/nbehave/NBehave"),
new("SpecFlow", "Binding business requirements to .NET code", "https://github.com/SpecFlowOSS/SpecFlow"),
new("Caliburn.Micro", "A small, yet powerful framework, designed for building applications across all XAML platforms", "https://github.com/Caliburn-Micro/Caliburn.Micro"),
new("MVVM Light Toolkit", "Light MVVM framework for building XAML apps", "https://github.com/lbugnion/mvvmlight"),
new("Prism", "Prism is a framework for building loosely coupled, maintainable, and testable XAML applications", "https://github.com/PrismLibrary/Prism"),
new("MahApps.Metro", "A framework that allows developers to cobble together a better UI for their own WPF applications", "https://github.com/MahApps/MahApps.Metro"),
new("Avalonia", "Cross-platform .NET UI framework", "https://github.com/AvaloniaUI/Avalonia"),
new("Windows UI Library", "Modern UI controls and styles for your Windows apps", "https://github.com/microsoft/microsoft-ui-xaml"),
new("Material Design In XAML", "Google's Material Design in XAML & WPF, for C# & VB.Net", "https://github.com/MaterialDesignInXAML/MaterialDesignInXamlToolkit"),
new("Humanizer", "Humanizer meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities", "https://github.com/Humanizr/Humanizer"),
new("AngleSharp.Css", "Library extending AngleSharp with CSS capabilities", "https://github.com/AngleSharp/AngleSharp.Css"),
new("CommonMark.NET", "Implementation of CommonMark specification in C# for converting Markdown documents to HTML", "https://github.com/Knagis/CommonMark.NET"),
new("MarkdownSharp", "C# implementation of Markdown processor", "https://github.com/StackExchange/MarkdownSharp"),
new("ColorCode", "ColorCode is a syntax highlighting library for .NET", "https://github.com/RichardD2/ColorCode-Universal"),
new("DiffPlex", "DiffPlex is a .NET library to generate textual diffs", "https://github.com/mmanela/diffplex"),
new("DocumentFormat.OpenXml", "The Open XML SDK provides tools for working with Office Word, Excel, and PowerPoint documents", "https://github.com/OfficeDev/Open-XML-SDK"),
new("QRCoder", "A pure C# Open Source QR Code implementation", "https://github.com/codebude/QRCoder"),
new("ZXing.Net", "ZXing.Net is a port of ZXing", "https://github.com/micjahn/ZXing.Net"),
new("PDFsharp", "A .NET library for processing PDF files", "https://github.com/empira/PDFsharp"),
new("iTextSharp", "iText for .NET is the .NET port of the iText library", "https://github.com/itext/itextsharp"),
new("CommandLineParser", "Terse syntax C# command line parser for .NET", "https://github.com/commandlineparser/commandline"),
new("McMaster.Extensions.CommandLineUtils", "Command line parsing and utilities for .NET Core and .NET Framework", "https://github.com/natemcmaster/CommandLineUtils"),
new("Conholdate.Total", "Complete solution for working with popular file formats in .NET applications", "https://github.com/conholdate/"),
new("FileHelpers", "The FileHelpers are a free and easy to use .NET library to read/write data from fixed length or delimited records", "https://github.com/MarcosMeli/FileHelpers"),
new("CacheManager", "CacheManager is an open source caching abstraction layer for .NET", "https://github.com/MichaCo/CacheManager"),
new("LazyCache", "Easy to use and Thread Safe library that makes it easy to add caching to your applications", "https://github.com/alastairtree/LazyCache"),
new("Lucene.Net", "Apache Lucene.NET is a full-text search engine library", "https://github.com/apache/lucenenet"),
new("NLua", "Bridge between Lua and the .NET", "https://github.com/NLua/NLua"),
new("Jint", "Javascript Engine for .NET", "https://github.com/sebastienros/jint"),
new("Edge.js", "Run .NET and Node.js code in-process on Windows, MacOS, and Linux", "https://github.com/tjanczuk/edge"),
new("IronPython", "Python for .NET", "https://github.com/IronLanguages/ironpython3"),
new("IronRuby", "Ruby for .NET", "https://github.com/IronLanguages/ironruby"),
new("Roslyn", "The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs", "https://github.com/dotnet/roslyn"),
new("FSharp", "The F# compiler, core library & tools", "https://github.com/dotnet/fsharp"),
new("Cake", "Cake (C# Make) is a cross platform build automation system", "https://github.com/cake-build/cake"),
new("FAKE", "FAKE - F# Make", "https://github.com/fsprojects/FAKE")
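];
- Add the Repository class at the bottom of Program.cs (types must be declared after the top-level statements)
public class Repository
{
    // Separates the name from the description in the embedding string.
    private const string Separator = " | ";

    public string Name { get; }
    public string Description { get; }
    public string Url { get; }

    public Repository(string name, string description, string url)
    {
        Name = name;
        Description = description;
        Url = url;
    }

    public string GetEmbeddingString() => $"{Name}{Separator}{Description}";

    // Only the name and description are embedded, so the URL cannot be recovered here.
    public static Repository FromEmbeddingString(string embeddingString)
    {
        var parts = embeddingString.Split(Separator, 2);
        return new Repository(parts[0], parts.Length > 1 ? parts[1] : string.Empty, string.Empty);
    }
}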
Semantic Search
- Generate embeddings for the repositories
// Generate embeddings for each repository's name and description
Console.WriteLine("Generating embeddings for repos...");
var candidateEmbeddings = await embeddingGenerator.GenerateAndZipAsync(topDotNetRepos.Select(repo => repo.GetEmbeddingString()));
Console.WriteLine("Embeddings generated");
- Debug and show what the candidateEmbeddings look like
- Generate an embedding for the input
Console.ForegroundColor = ConsoleColor.White;
Console.WriteLine("What sort of repos are you looking for?");
var input = Console.ReadLine();
Console.ResetColor();
var inputEmbedding = await embeddingGenerator.GenerateAsync(input!);
- Perform a semantic search
var closest =
    from candidate in candidateEmbeddings
    let similarity = TensorPrimitives.CosineSimilarity(candidate.Embedding.Vector.Span, inputEmbedding.Vector.Span)
    orderby similarity descending
    select new { Item = candidate.Value, Similarity = similarity };
- Display the results
foreach (var item in closest.Take(3))
{
    var repo = Repository.FromEmbeddingString(item.Item);
    Console.WriteLine($"Repo: {repo.Name}, Description: {repo.Description}, Similarity: {item.Similarity:0.000}");
}
Info: We've used cosine similarity to do the semantic search in memory. This works well for a small, simple dataset like this one. With larger or more complex data, we would want to use a vector database like Pinecone, Chroma, or Qdrant to store the embeddings and perform the search.
I have a demo of using EF Core 9 with an Azure SQL vector database. If we have time towards the end of the session, remind me and I'll go through it.
https://github.com/danielmackay/dotnet-ef-core-vector/blob/main/ConsoleApp/Program.cs
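For intuition about the score used in the semantic search, here is a hand-rolled sketch of cosine similarity on tiny 2D vectors. The `Cosine` function and the sample vectors are illustrative assumptions. The demo itself calls `TensorPrimitives.CosineSimilarity`, and this sketch is not a copy of that implementation.

```csharp
using System;

// Hypothetical helper: cosine similarity = dot(a, b) / (|a| * |b|).
// It measures the angle between two vectors and ignores their lengths.
static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    float dot = 0, magA = 0, magB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];   // how much the vectors point the same way
        magA += a[i] * a[i];  // squared length of a
        magB += b[i] * b[i];  // squared length of b
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}

float[] query = [1f, 0f];
float[] sameDirection = [2f, 0f];
float[] perpendicular = [0f, 3f];
float[] opposite = [-1f, 0f];

Console.WriteLine(Cosine(query, sameDirection)); // 1
Console.WriteLine(Cosine(query, perpendicular)); // 0
Console.WriteLine(Cosine(query, opposite));      // -1
```

Scores close to 1 mean the two vectors point in nearly the same direction, which is why the search orders candidates by similarity descending.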