Stories by Kemal Akkoyun on Medium

Vibe Coding with Cursor: My R&D Week Adventure

Kemal Akkoyun — Wed, 12 Mar 2025 00:00:58 GMT

TL;DR: Spent a week building cool stuff with Cursor, an AI-powered IDE. Found it surprisingly effective for both coding and managing my second brain. When your requirements are clear, it’s almost magical! ✨

The Setup: R&D Week Vibes

You know that feeling when R&D week rolls around, and you’re caught between “I should learn something useful” and “I want to have fun”? Well, this time I decided to combine both by diving deep into Cursor, an AI-powered code editor that’s been making waves in the developer community.

The mission was simple: Use Cursor for everything — from managing my notes to building small task-specific projects. And by everything, I mean everything.

What Makes Cursor Different?

Unlike traditional IDEs that just help you write code, Cursor feels more like having a pair programmer who actually gets your context. It’s built on top of VSCode (so you get all the good stuff you’re used to) but adds a layer of AI-powered features that make development feel more… vibey? 😎

The Good Parts

Context-Aware AI: The AI understands your project structure and can help with everything from code completion to refactoring. For example, when working on a React component, it automatically suggested appropriate hooks and state management patterns based on my component’s purpose. When your requirements are clear, it’s almost magical how it can scaffold projects and implement patterns!
Rules Feature: This is where things get interesting. You can create custom rules and context for different types of work, both at the project and global level. Think project-specific coding standards, documentation patterns, and even architecture guidelines.
Notepads: Quick thoughts? Code snippets? The notepad feature is like having a smart scratchpad that understands code and can share context between different parts of your development workflow.

Second Brain Management: A Pleasant Surprise

One of my unexpected discoveries was how well Cursor handles note-taking and second brain management. Here’s what made it click for me:

The Rules Feature: A Game Changer

The rules feature deserves its own spotlight. Cursor offers two powerful ways to customize AI behavior (note that the older .cursorrules file is being deprecated in favor of this new system):

Project Rules (.cursor/rules directory):

Semantic descriptions for specific use cases
File pattern matching with glob patterns
Automatic attachment when matching files are referenced
Chain multiple rules using @file references
Version controlled with your project
Create new rules via command palette (Cmd + Shift + P > New Cursor Rule)

Global Rules (Cursor Settings):

Applied across all projects
Perfect for consistent preferences
Control output language and response style
Set universal development guidelines

Pro tip: Use project rules whenever possible — they’re more flexible, can be version controlled, and provide better granular control over different parts of your project.

I’ve set up different contexts for various types of work:

Technical blog posts (with specific writing guidelines)
Project documentation (with architecture patterns)
Personal notes (with custom templates)
Code standards (with framework-specific rules)

Each context comes with its own set of rules and AI behavior. It’s like having multiple specialized assistants at your disposal.

Notepads: Beyond Simple Notes

The Notepads feature (currently in beta) has been a revelation. Think of them as enhanced reference documents that go beyond regular .cursorrules. I use them for:

Dynamic Boilerplate Generation:

Templates for common code patterns
Project-specific scaffolding rules
Consistent code structure templates

Architecture Documentation:

Frontend specifications
Backend design patterns
Data model documentation

Development Guidelines:

Team conventions
Best practices
Project-specific rules

The ability to share context between composers and chat interactions makes them incredibly powerful. Plus, you can attach files and use @ mentions to create a web of connected knowledge.

Small Projects, Big Impact

During the week, I worked on several small, task-specific projects. The workflow typically went like this:

Create a new project with clear requirements
Set up project-specific rules and templates
Let the AI handle boilerplate and routine coding
Focus on architecture and edge cases

The AI handled a lot of the repetitive work, letting me focus on the creative aspects of each project. The clearer my requirements were, the more magical the results became. ✨

Lessons Learned

AI-Powered Doesn’t Mean AI-Dependent: Cursor enhances your workflow without taking over.
Rules Are Your Friend: Taking time to set up proper rules pays off immensely.
Context is King: The more context you provide, the better the AI assistance becomes.
Second Brain Benefits: It’s not just for coding; it’s a genuine knowledge management tool.
Clear Requirements = Magic: The more precise your task definition, the better the results.

What’s Next?

I’m planning to:

Expand my rule sets for different types of work
Create more structured templates for common architectural patterns
Explore advanced AI features like multi-file refactoring
Share my rules and templates with the community

Conclusion

R&D weeks are about trying new things and finding better ways to work. This experiment with Cursor turned out to be more than just playing with a new tool — it’s changed how I think about IDE capabilities and knowledge management.

The combination of familiar VSCode features with AI assistance, especially the rules system, makes it a powerful tool for both coding and knowledge work. It’s not perfect (what is?), but it’s definitely earned its place in my daily toolkit.

Remember: The best tools are the ones that enhance your natural workflow rather than forcing you to adapt to them. Cursor does this surprisingly well. 👍

Originally published at https://kakkoyun.me on March 12, 2025.

FOSDEM 2025: Blimey, What a Weekend!

Kemal Akkoyun — Tue, 04 Feb 2025 00:00:42 GMT

Another Year, Another FOSDEM

FOSDEM: the annual pilgrimage to Brussels for a weekend of open-source brilliance, hallway track magic, and the inevitable sleep deprivation. This year’s Free and Open Source Software Developers’ European Meeting was, as always, a whirlwind of ideas, people, and tech so bleeding-edge it practically needed bandages.

But for me? It was all about seeing friends. Catching up, syncing, and squeezing in as many conversations as humanly possible. As we always say, the hallway track is the real conference. I’m beyond grateful for the people I managed to see, and equally bummed about those I missed. But with a toddler waiting at home, even carving out this limited time was a logistical miracle.

Saturday: Go, Go, Go… and the eBPF Black Hole

Saturday kicked off with a deep dive into the Go DevRoom, before a (failed) mission to infiltrate the eBPF talks.

The Go DevRoom delivered as expected:

The eBPF DevRoom? Packed. Absolutely impenetrable. As someone put it on Twitter:

“Nobody leaves #eBPF room at #FOSDEM, so nobody gets in. 🥲”

Next year, I’m bringing a tent and camping outside the door.

Sunday: Monitoring, Metrics, and Maybe Too Many Frites

Sunday was all about observability, performance, and squeezing every bit of insight from running systems.

Observability Overload

The Monitoring and Observability DevRoom had a strong lineup:

Community Vibes

Like I said, FOSDEM is really about the people. The talks are great, but the real magic happens in the hallway track. Some of the best conversations weren’t planned; they just happened over coffee, between sessions, or during a frantic sprint between buildings.

I’m incredibly happy for the folks I got to see, and at the same time, I wish I had more time to catch up with everyone I missed. But life is about balance, and with a little one waiting at home, I had to make every moment count.

Oh, and the frites? Still undefeated.

Bonus: Trains, Chaos, and a Race Against Time

Because no trip is complete without public transport drama, my journey back home came with an extra dose of stress. Trains? Cancelled. Schedule? A mess. Plane? Hanging by a thread.

Somehow, I made it. But FOSDEM weekend wouldn’t be complete without at least one unexpected adventure.

Final Thoughts

FOSDEM 2025 delivered. Again. Already looking forward to next year. If you’re into open source and haven’t experienced FOSDEM, sort it out.

Originally published at https://kakkoyun.me on February 4, 2025.

Profiling Python with eBPF: A New Frontier in Performance Analysis

Kemal Akkoyun — Mon, 12 Feb 2024 00:00:51 GMT

Profiling Python applications can be challenging, especially in scenarios involving high-performance requirements or complex workloads. Existing tools often require code instrumentation, making them impractical for certain use cases. Enter eBPF (Extended Berkeley Packet Filter), a revolutionary Linux technology, and the open-source project Parca, which together are reshaping the landscape of Python profiling.

In this post, I’ll explore how eBPF enables continuous profiling, discuss challenges like stack unwinding in Python, and demonstrate the power of modern profiling tools.

You can also watch my full talk here or refer to the slides from the presentation.

Profiling helps optimize performance and troubleshoot issues, such as CPU spikes, memory leaks, or out-of-memory (OOM) events. For instance:

Performance optimization: Identifying bottlenecks in code.
Incident resolution: Determining which function or component caused a memory spike or CPU overload.

Traditional Python profiling tools often require application instrumentation, which isn’t always feasible, especially in production environments where code access might be restricted. This is where eBPF shines, offering non-intrusive, external profiling.

The Python ecosystem offers several profiling tools, each with unique strengths:

: A built-in module for deterministic profiling.
pyinstrument: A call stack profiler for Python.
: A sampling profiler for Python programs.
: Yet Another Python Profiler, supports multithreaded programs.
: A ptracing profiler for Python.
: A high-performance CPU and memory profiler.

While these tools are valuable, many require code instrumentation or introduce significant overhead, making them less suitable for continuous profiling in production environments.

Originally designed for network packet filtering, eBPF has evolved into a versatile event-driven system. It enables safe execution of custom programs inside the Linux kernel, using:

By leveraging eBPF with PMUs, profiling becomes faster and more efficient than traditional approaches.

Parca is an open-source project enabling continuous profiling. Its eBPF agent hooks into perf events, collects stack traces, and aggregates data for visualization. The process involves:

Hooking into CPU events to monitor active functions.
Stack unwinding to trace function calls.
Data aggregation and visualization in a web-based UI.
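The aggregation step can be illustrated with a small sketch. This is not Parca's implementation; it only shows the general idea of counting identical sampled stacks so they can later be drawn as a flame graph. The sample stacks and the fold_stacks helper are made up for illustration.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical stacks; each sample is a list of frames, root first."""
    counts = Counter()
    for frames in samples:
        # Join frames into one "folded" key, a common flame graph input shape.
        counts[";".join(frames)] += 1
    return counts


# Hypothetical samples, as if captured at a fixed sampling frequency.
samples = [
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "render"],
    ["main", "gc_collect"],
]

folded = fold_stacks(samples)
for stack, count in folded.most_common():
    print(stack, count)
```

The more often a stack shows up in the samples, the more CPU time it most likely consumed, which is what the width of a flame graph frame represents.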

Unlike traditional profilers, Parca introduces minimal runtime overhead, making it ideal for production workloads.

Profiling native code is straightforward: we unwind the stack by reading memory addresses from the CPU and resolving them into human-readable symbols using debug information (e.g., DWARF).

For Python, stack unwinding is complex due to its interpreter-based execution. Python maintains execution state in custom data structures, such as:

Interpreter state: Tracks threads and their execution context.
Thread state: A linked list of threads running in the interpreter.
Frame state: Represents the current execution frame.

To unwind Python stacks, we must traverse these structures, extract relevant information, and map them to human-readable symbols.
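As an in-process analogy only, the same chain (threads, then each thread's frames) can be walked from inside Python itself. An eBPF profiler has to do the equivalent from outside the process, by reading raw memory at known offsets, so treat this as a sketch of the data structure rather than of Parca's unwinder. The snapshot_stacks, inner and outer names are illustrative.

```python
import sys
import threading


def snapshot_stacks():
    """Return {thread_id: [function names, outermost first]} for all threads."""
    stacks = {}
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        names = []
        # Frames form a linked list through f_back, innermost first.
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        names.reverse()
        stacks[thread_id] = names
    return stacks


def inner():
    return snapshot_stacks()


def outer():
    return inner()


stacks = outer()
main_stack = stacks[threading.get_ident()]
print(main_stack[-3:])
```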

Here’s how Parca handles Python profiling:

Reverse Engineering the Python Runtime:

Analyze Python’s internal structures (e.g., thread and frame states).
Identify offsets and symbols using tools like GDB or DWARF debuggers.

Unwinding Python Stacks:
Mapping Symbols:

Resolve function addresses to readable symbols.
Encode line numbers and function names for better traceability.

Efficient Data Handling:

Use eBPF maps for kernel-to-user space communication.
Optimize symbol resolution by caching frequently seen traces.
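The caching idea from the last step can be sketched in a few lines. This is a hypothetical user-space illustration, not Parca's code: the SYMBOLS table and the resolve function are assumptions, and functools.lru_cache stands in for whatever cache a real agent would use.

```python
from functools import lru_cache

# Hypothetical symbol table: start address -> function name.
SYMBOLS = {0x1000: "main", 0x1100: "parse", 0x1200: "render"}


@lru_cache(maxsize=4096)
def resolve(address):
    """Resolve an address to the symbol with the greatest start <= address."""
    starts = [start for start in SYMBOLS if start <= address]
    return SYMBOLS[max(starts)] if starts else "<unknown>"


# Hot addresses repeat across samples, so most lookups hit the cache.
trace = (0x1004, 0x1110, 0x1110, 0x1004, 0x1210)
names = [resolve(address) for address in trace]
info = resolve.cache_info()
print(names, info.hits, info.misses)
```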

The upcoming Python 3.13 release introduces a debug offset structure that simplifies stack unwinding. It provides precomputed offsets for key runtime fields, eliminating much of the manual reverse engineering required for earlier versions. This improvement marks a significant leap forward for tools like Parca.

Parca’s UI provides a comprehensive view of application performance:

Flame graphs: Visualize stack traces over time, highlighting bottlenecks.
Filtering and Metadata: Focus on specific languages (e.g., Python) or layers (e.g., C libraries).
Continuous Insights: Compare profiles across deployments to monitor performance regressions.

For example, a flame graph might reveal inefficient recursion in a Python function, enabling developers to pinpoint and optimize the problematic code.

Parca supports profiling for Python versions from 2.7 to 3.11, with ongoing work for 3.12 and full support anticipated for 3.13. The project’s modular design allows quick adaptation to new Python runtime changes.

Profiling Python applications with eBPF and Parca represents a new frontier in performance analysis. By leveraging eBPF and continuous profiling, we can gain invaluable insights into our applications, enabling effective performance optimization. I encourage you to explore Parca, provide feedback, and contribute to the project. It’s a collaborative effort that can benefit us all as we tackle the challenges of modern software development.

Watch my full talk or check out the presentation slides. Explore Parca on GitHub and join the community. Your feedback helps improve the tooling and shape the future of observability.

Originally published at https://kakkoyun.me on February 12, 2024.

Fantastic Symbols and Where to Find Them — Part 2

Kemal Akkoyun — Thu, 27 Jan 2022 15:01:51 GMT

How profilers and debuggers translate machine addresses to human-readable symbolic names

Originally published on polarsignals.com/blog on 27.01.2022

This is a blog post series. If you haven’t read Part 1 we recommend you to do so first!

In the first blog post, we learned about the fantastic symbols (debug symbols), how the symbolization process works and lastly, how to find the symbolic names of addresses in a compiled binary.

The actual location of the symbolic information depends on the programming language implementation the program is written in. We can categorize the programming language implementations into three groups: compiled languages (with or without a runtime), interpreted languages, and JIT-compiled languages.

In this post, we will continue our journey to find fantastic symbols. And we will look into where to find them for the other types of programming language implementations.

JIT-compiled language implementations

Examples of JIT-compiled languages include Java, .NET, Erlang, JavaScript (Node.js) and many others.

Just-In-Time compiled languages compile the source code into bytecode, which is then compiled into machine code at runtime, often using direct feedback from runtime to guide compiler optimizations on the fly.

Because functions are compiled on the fly, there is no pre-built, discoverable symbol table in any object files. Instead, the symbol table is created on the fly. The symbol mappings (location to symbol) are usually stored in the memory of the runtime or virtual machine and used for rendering human-readable stack traces when needed, e.g. when an exception occurs.

The good news is that most runtimes can provide supplemental symbol mappings for just-in-time compiled code, in a format that perf can use on Linux.

perf defines an interface to resolve symbols for dynamically generated code by a JIT compiler. These files usually can be found in /tmp/perf-$PID.map, where $PID is the process ID of the process of the runtime that is running on the system.

The runtimes usually don’t enable providing symbol mappings by default. You might need to change a configuration, run the virtual machine with a specific flag/environment variable or run an additional program to obtain these mappings. For example, JVM needs an agent to provide supplemental symbol mapping files, called perf-map-agent.

Let’s see an example perf map file for NodeJS. The runtimes out there output this file with more or less the same format, more or less!

To generate a similar file for Node.js, we need to run node with --perf-basic-prof option.

# With Node.js >=v0.11.15 the following command will create a map file for NodeJS:

node --perf-basic-prof your-app.js

This will create a map file at /tmp/perf-$PID.map that looks like this:

3ef414c0 398 RegExp:[{(]
3ef418a0 398 RegExp:[})]
59ed4102 26 LazyCompile:~REPLServer.self.writer repl.js:514
59ed44ea 146 LazyCompile:~inspect internal/util/inspect.js:152
59ed4e4a 148 LazyCompile:~formatValue internal/util/inspect.js:456
59ed558a 25f LazyCompile:~formatPrimitive internal/util/inspect.js:768
59ed5d62 35 LazyCompile:~formatNumber internal/util/inspect.js:761
59ed5fca 5d LazyCompile:~stylizeWithColor internal/util/inspect.js:267
4edd2e52 65 LazyCompile:~Domain.exit domain.js:284
4edd30ea 14b LazyCompile:~lastIndexOf native array.js:618
4edd3522 35 LazyCompile:~online internal/repl.js:157
4edd37f2 ec LazyCompile:~setTimeout timers.js:388
4edd3cca b0 LazyCompile:~Timeout internal/timers.js:55
4edd40ba 55 LazyCompile:~initAsyncResource internal/timers.js:45
4edd42da f LazyCompile:~exports.active timers.js:151
4edd457a cb LazyCompile:~insert timers.js:167
4edd4962 50 LazyCompile:~TimersList timers.js:195
4edd4cea 37 LazyCompile:~append internal/linkedlist.js:29
4edd4f12 35 LazyCompile:~remove internal/linkedlist.js:15
4edd5132 d LazyCompile:~isEmpty internal/linkedlist.js:44
4edd529a 21 LazyCompile:~ok assert.js:345
4edd555a 68 LazyCompile:~innerOk assert.js:317
4edd59a2 27 LazyCompile:~processTimers timers.js:220
4edd5d9a 197 LazyCompile:~listOnTimeout timers.js:226
4edd6352 15 LazyCompile:~peek internal/linkedlist.js:9
4edd66ca a1 LazyCompile:~tryOnTimeout timers.js:292
4edd6a02 86 LazyCompile:~ontimeout timers.js:429
4edd7132 d7 LazyCompile:~process.kill internal/process/per_thread.js:173

Each line has START, SIZE and symbolname fields, separated with spaces. START and SIZE are hex numbers without 0x. symbolname is the rest of the line, so it could contain special characters.
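To make the format concrete, here is a minimal parser sketch for such a file. The MapEntry type and parse_perf_map function are names chosen for this example, not part of perf or Parca. The sample lines are copied from a map file like the one above.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    start: int
    size: int
    name: str


def parse_perf_map(text):
    """Parse perf map text into entries sorted by start address."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only twice: the symbol name may itself contain spaces.
        start_hex, size_hex, name = line.split(None, 2)
        # START and SIZE are hex numbers without a 0x prefix.
        entries.append(MapEntry(int(start_hex, 16), int(size_hex, 16), name))
    return sorted(entries)


sample = """\
3ef414c0 398 RegExp:[{(]
59ed4102 26 LazyCompile:~REPLServer.self.writer repl.js:514
4edd30ea 14b LazyCompile:~lastIndexOf native array.js:618
"""

entries = parse_perf_map(sample)
for entry in entries:
    print(hex(entry.start), entry.size, entry.name)
```

Sorting by start address is a design choice made here so that later address lookups can use a binary search.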

With the help of this mapping file, we have everything we need to symbolize the addresses in the stack trace. Of course, as always, this is just an oversimplification.
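A simplified lookup could work like the sketch below: keep the entries sorted by start address and binary-search for the range that contains the address. The ENTRIES values come from a map file like the one above; the symbolize function is a hypothetical helper, and real tools also have to handle mappings that change over time.

```python
import bisect

# (start, size, name) tuples sorted by start address.
ENTRIES = [
    (0x59ED4102, 0x26, "LazyCompile:~REPLServer.self.writer repl.js:514"),
    (0x59ED44EA, 0x146, "LazyCompile:~inspect internal/util/inspect.js:152"),
    (0x59ED4E4A, 0x148, "LazyCompile:~formatValue internal/util/inspect.js:456"),
]
STARTS = [start for start, _, _ in ENTRIES]


def symbolize(address):
    """Return the symbol whose [start, start + size) range contains address."""
    # Find the last entry that starts at or before the address.
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return None
    start, size, name = ENTRIES[index]
    return name if address < start + size else None


print(symbolize(0x59ED4110))  # inside the first range
print(symbolize(0x59ED4200))  # falls in a gap between ranges
```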

For example, these mappings might change as the runtime decides to recompile the bytecode. So we need to keep an eye on these files and keep track of the changes to resolve the address correctly with their most recent mapping.

Each runtime and virtual machine has its peculiarities that we need to adapt to. But those are out of the scope of this post.

Interpreted language implementations

Examples of interpreted languages include Python, Ruby, and again many others. There are also languages that commonly use interpretation as a stage before JIT compilation, e.g. Java. Symbolization for this stage of compilation is similar to interpreted languages.

Interpreted language runtimes do not compile the program to machine code. Instead, interpreters and virtual machines parse and execute the source code using their REPL routines, or run it on their own virtual processor. So they have their own way of executing functions and managing stacks.

If you observe (profile or debug) these runtimes using something like perf, you will see symbols for the runtime. However, you won't see the language-level context you might be expecting.

Moreover, the interpreter itself is probably written in a more low-level language like C or C++. And when you inspect the object file of the runtime/interpreter, the symbol table that you would find would show the internals of the interpreter, not the symbols from the provided source code.

Finding the symbols for our runtime

The runtime symbols are useful because they allow you to see the internal routines of the interpreter, e.g. how much time your program spends on garbage collection. And most likely, the stack traces you see in the debugger or profiler will include calls to the internals of the runtime. So these symbols are also helpful for debugging.

Most runtimes are compiled in production mode, and they most likely lack the debug symbols in their release binaries. You might need to manually compile your runtime in debug mode to actually have them in the resulting binary. Some runtimes, such as Node.js, already have them in their production distributions.

Lastly, to completely resolve the stack traces of the runtime, we might need to obtain the debug information for the linked libraries. If you remember from the first blog post, debuginfo files can help us. Debuginfo files for software packages are available through package managers in Linux distributions. Usually for an available package called mypackage there exists a mypackage-dbgsym, mypackage-dbg or mypackage-debuginfo package. There are also public servers that serve debug information. So we need to find the debuginfo files for the runtime we are using and all the linked libraries.

Finding the symbols for our target program

The symbols that we look for in our own program likely are stored in a memory table that is specific to the runtime. For example, in Python, the symbol mappings can be accessed using symtable.
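As a small illustration, the standard library's symtable module exposes the symbol tables the Python compiler builds for a piece of source code. Note the assumption here: this is a compile-time view from inside Python, not what an external profiler reads from a running process, so take it only as a way to see which names and line numbers the runtime knows about. The source snippet is made up for the example.

```python
import symtable

source = """\
def greet(name):
    message = "hello " + name
    return message

def main():
    greet("world")
"""

# Build the compile-time symbol table for the module.
module_table = symtable.symtable(source, "<example>", "exec")

functions = {}
for child in module_table.get_children():
    # Each child table describes one nested scope, here a function.
    functions[child.get_name()] = {
        "line": child.get_lineno(),
        "identifiers": sorted(child.get_identifiers()),
    }

print(functions)
```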

As a result, you need to craft a specific routine for each interpreter runtime (in some cases, each version of that runtime) to obtain symbol information. As educated eyes might have already noticed, this is not an easy undertaking considering the sheer number of interpreted languages out there. For example, a very well-known Ruby profiler, rbspy, generates code for reading internal structs of the Ruby runtime for each version.

If you were to write a general-purpose profiler, like us, you would need to write a special subroutine in your profiler for each runtime that you want to support.

Again, don’t worry, we got you covered

The good news is we got you covered. If you are using Parca Agent, we already do the heavy lifting for you to symbolize captured stack traces. And we keep extending our support for different languages and runtimes. For example, Parca already supports parsing the perf JIT interface to resolve the symbols for collected stack traces.

Check Parca out and let us know what you think on our Discord channel.

Fantastic Symbols and Where to Find Them — Part 1

Kemal Akkoyun — Sat, 15 Jan 2022 08:51:50 GMT

Originally published on polarsignals.com/blog on 13.01.2022

Symbolization is a technique that allows you to translate machine memory addresses to human-readable symbol information (symbols).

Why do we need to read what programs do anyways? We usually do not need to translate everything to a human-readable format when things run smoothly. But when things go south, we need to understand what is going on under the hood. Symbolization is needed by introspection tools like debuggers, profilers and core dumps or any other program that needs to trace the execution of another program. While a target program is executing on a machine, these types of programs capture the stack traces of the program that is being executed.

A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program.

In raw stack traces, the addresses of the functions that are being called are recorded. The addresses are hexadecimal numbers representing the memory return addresses of the functions. Symbols are needed to translate memory addresses into function and variable names precisely as in the program’s source code to be read by us humans. Without symbols, all we see are hexadecimal numbers representing the memory addresses that we have captured.

It sounds simple enough, right? Well, it’s not. As with everything else about computers, it’s a bit of sorcery. It has its challenges, such as associating addresses with the correct symbols, transforming addresses, and most importantly, actually finding the symbols! The strategies to get symbol information vary depending on the platform and the programming language implementation that the program is written in.

For the sake of simplicity, we will be focusing on Linux as the target platform and ignore Windows, macOS and many other platforms. Otherwise, I could end up writing a small book here :)

Fantastic Symbols …

A debug symbol is a special kind of symbol that attaches additional information to the symbol table of a program. This symbol information allows a debugger or a profiler to gain access to information from the program’s source code, such as the names of identifiers, including variables and functions. But where can we find these symbols?

… and Where to Find Them

If the program is a compiled one, these may be compiled together with the binary file, distributed in a separate file, or discarded during the compilation and/or linking. Or, if the program is interpreted, these may be stored in the program itself. Let’s briefly look at where and how we can find these symbols depending on the programming language implementation.

Compiled language implementations

Examples of compiled languages include C, C++, Go, Rust and many others.

The compiled languages usually have a symbol table that contains all the symbols used in the program. The symbol table is usually compiled in the executable binary file. And the binary file is typically in the ELF format (for Linux systems). Symbol tables are included in the ELF binary file, specifically for mapping the addresses to function names and object names. In rare cases, it is stored in a separate file, usually with the same name as the binary file, but with a different extension.

The ELF format is not an easy one to describe in a couple of sentences. For the purpose of this article, we will focus on what we need to know about the ELF format. Each ELF file is made up of one ELF header, followed by file data. The ELF header is a fixed size and contains information about the data sections. The relevant part for us is that the symbols can live in special sections called .symtab and .dynsym. .dynsym is the “dynamic symbol table”, a smaller version of .symtab that only contains global symbols.
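To show what "fixed size header that points at the sections" means in practice, here is a minimal sketch that decodes a 64-bit little-endian ELF header with Python's struct module. The field order follows the ELF specification; the parse_elf64_header function and the synthetic header bytes are assumptions made for this example, and real tools handle many more cases (32-bit files, big-endian targets, and so on).

```python
import struct

# ELF64 little-endian header layout: e_ident, e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data):
    """Extract the section header table location from an ELF64 header."""
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    return {
        "section_table_offset": fields[6],  # e_shoff
        "section_entry_size": fields[11],   # e_shentsize
        "section_count": fields[12],        # e_shnum
        "section_names_index": fields[13],  # e_shstrndx
    }


# Synthetic header for illustration; the values are made up.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
header = ELF64_HEADER.pack(ident, 2, 62, 1, 0x401000, 64, 4096, 0, 64, 56, 1, 64, 5, 4)
info = parse_elf64_header(header)
print(info)
```

With a real binary, the first 64 bytes of the file could be passed in instead of the synthetic header. The section header table found this way is where a tool would then look for .symtab and .dynsym.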

Contents of .dynsym and .symtab section using readelf -s /bin/go:

Symbol table '.dynsym' contains 38 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000006355e0 99 FUNC GLOBAL DEFAULT 1 crosscall2
2: 00000000006355a0 55 FUNC GLOBAL DEFAULT 1 _cgo_panic
3: 0000000000465560 25 FUNC GLOBAL DEFAULT 1 _cgo_topofstack
4: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (6)
5: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (4)
6: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (4)
7: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (4)
8: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (4)
9: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (4)
10: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (4)
...

Symbol table '.symtab' contains 13199 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS go.go
2: 0000000000401000 0 FUNC LOCAL DEFAULT 1 runtime.text
3: 0000000000401000 214 FUNC LOCAL DEFAULT 1 net(.text)
4: 00000000004010e0 214 FUNC LOCAL DEFAULT 1 runtime/cgo(.text)
5: 00000000004011c0 601 FUNC LOCAL DEFAULT 1 runtime/cgo(.text)
6: 0000000000401420 480 FUNC LOCAL DEFAULT 1 runtime/cgo(.text)
7: 0000000000401420 47 FUNC LOCAL HIDDEN 1 threadentry
8: 0000000000401600 70 FUNC LOCAL DEFAULT 1 runtime/cgo(.text)
9: 0000000000401646 5 FUNC LOCAL DEFAULT 1 runtime/cgo(.tex[...]
10: 0000000000401646 5 FUNC LOCAL HIDDEN 1 x_cgo_munmap.cold

Go has a unique table (of course). It stores its symbols in a section called .gopclntab. This is a table of functions, line numbers and addresses. Go does this because it needs to be able to render human-readable stack traces when a panic occurs at runtime.

Note that addresses in the symbol table do not move during execution so that they can be read any time during the execution of the program. They can easily be loaded into memory independent of the running program and an observer can easily read them.

We assumed that the binary file is a statically linked executable until this point. However, this might not be the case. The binary file might be dynamically linked to other libraries. From now on, we will refer to these shared library files and executables (both in ELF format) as object files. Each object file can have its own symbol table.

We need to note that when we take a snapshot of the stack (a.k.a. a stack trace), it could include addresses from linked shared libraries and kernel functions.

Kernel-level software differs as it has its own dynamic symbol table in /proc/kallsyms, which is a file that contains all the symbols that are used in the kernel. And it can grow as the kernel modules are loaded.
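Each line of /proc/kallsyms holds an address, a one-letter symbol type, a name and, for symbols from loadable modules, the module name in square brackets. A minimal parsing sketch could look like this. The parse_kallsyms function and the sample lines are made up for illustration; also keep in mind that without sufficient privileges the kernel typically shows the addresses as zeros.

```python
def parse_kallsyms(text):
    """Parse kallsyms-style lines into (address, type, name, module) tuples."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        address, kind, name = int(parts[0], 16), parts[1], parts[2]
        # Symbols from loadable modules carry a trailing "[module]" field.
        module = parts[3].strip("[]") if len(parts) > 3 else None
        symbols.append((address, kind, name, module))
    return symbols


# Made-up sample lines in the same layout as /proc/kallsyms.
sample = """\
ffffffff81000000 T _text
ffffffff810a1b20 T do_sys_open
ffffffffc0201000 t helper_fn\t[example_mod]
"""

symbols = parse_kallsyms(sample)
for symbol in symbols:
    print(symbol)
```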

We can read the object files by using binary utilities such as objdump, readelf and nm.

To read the .symtab:

nm $FILE
# or
objdump --syms $FILE
# or
readelf -a $FILE

To read the .dynsym:

nm -D $FILE
# or
objdump --dynamic-syms $FILE
# or
readelf -a $FILE

For the compiled languages, the symbol table is not the only source of symbols. There are also DWARFs!

Debuginfo

ELFs and DWARFs, welcome to fairyland.

Another way to obtain the symbols from an object file is to use the debug information, or debuginfo for short. As with the symbol table, this information can be compiled into the binary file, formatted as DWARF (Debugging With Attributed Record Formats), or shipped in a separate file.

DWARF is the debug information format most commonly used with ELF. It’s not necessarily tied to ELF, but the two were developed in tandem and work very well together. This information is split across different ELF sections (.debug_* and .zdebug_* for compressed ones), each with its own piece of information to relay. For our specific needs, we need to use the .debug_info section to find the corresponding functions and the .debug_line section to find the corresponding line numbers.

Debuginfo files for software packages are available through package managers in Linux distributions. Usually for an available package called mypackage there exists a mypackage-dbgsym, mypackage-dbg or mypackage-debuginfo package. There are also public servers that serve debug information.

One Program to bring them all, and in the darkness bind them: addr2line

Wait, what?! Isn’t that from another fantasy book?

Now that we have the symbol table or debug information, we can use addr2line (address to line) to get the source code location of a given address. addr2line converts addresses back to function and line numbers.

Let’s see it in action with addr2line -a 0x0000000000001154 -e $FILE:

For addr2line, $FILE can be any object file compiled with debug information or symbols. It can be an executable, a shared library or the output of a strip operation.

Voilà!

0x0000000000001154

main

/home/newt/Sandbox/hello-c/hello.c:14

I used a simple C executable for this example. And we have got our symbol and attached source information for the corresponding address 🎉
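To make the lookup a bit less magical, here is a minimal sketch of the core idea behind resolving an address to a symbol: keep the symbols sorted by start address and binary-search for the one whose range contains the address. The table below is made up for illustration (it is not the output of a real nm run), and Symbol and resolve are hypothetical names. Real tools also consult the DWARF line tables to get the file and line, which this sketch leaves out.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Symbol(NamedTuple):
    start: int  # address of the first byte of the function
    size: int   # length of the function in bytes
    name: str


def resolve(symbols: List[Symbol], addr: int) -> Optional[str]:
    """Return the name of the symbol whose address range contains addr.

    Assumes symbols is sorted by start address.
    """
    starts = [s.start for s in symbols]
    # Index of the last symbol that starts at or before addr.
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return None
    sym = symbols[i]
    # The address may fall into a gap between two symbols.
    if addr < sym.start + sym.size:
        return sym.name
    return None


# Hypothetical symbol table, sorted by start address.
SYMBOLS = [
    Symbol(0x1040, 0x26, "_start"),
    Symbol(0x1139, 0x1B, "greet"),
    Symbol(0x1154, 0x2A, "main"),
]

print(resolve(SYMBOLS, 0x1154))  # main
print(resolve(SYMBOLS, 0x1100))  # None (gap between _start and greet)
```

Sorting once and searching with bisect keeps each lookup logarithmic, which matters when a profiler has to resolve many addresses.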

I wish compiled language implementations were the only ones out there; then our job here would be finished. But they are not, so we need to keep digging. For that, you will need to wait another week. As we hinted in the title of this post, there will be a part 2! All the best franchises have sequels, right?! In part 2, we will see how interpreted languages and Just-In-Time compiled languages handle symbols.

Please stay tuned!

Don’t worry, we got you covered

Even though we simplified things a bit here, if you want to write a program that uses symbolization, you still have a lot of work to do. Many open-source tools out there, like perf, already handle the nitty-gritty details of symbolization.

Check Parca out and let us know what you think on the Discord channel.

Sources

Making Drone Builds 10 Times Faster!

Kemal Akkoyun — Wed, 10 Apr 2019 00:00:00 GMT

We open sourced drone-cache, a plugin for the popular Continuous Delivery platform Drone. It allows you to cache dependencies and interim files between builds to reduce your build times. This post explains why we are using Drone, why we needed a cache plugin, and what I learned while trying to release drone-cache as open source software.

Read on for the story behind drone-cache, or if you want to jump into action directly, go to github.com/meltwater/drone-cache and try it for yourself.

Originally published at underthehood.meltwater.com on April 10, 2019.

Why are we using Drone?

At Meltwater, we empower self-sufficient teams. Teams are free to choose their technology stacks. As a result, we have a diverse set of tools in our stack. In my team, we had been using a combination of TravisCI, CircleCI and Jenkins as our CI/CD pipeline.

In 2018, we decided to migrate to Kubernetes. In doing so, we wanted to simplify our toolchain and migrate to a more flexible, cloud-native and on-premise CI/CD pipeline solution. We ended up choosing Drone, and with one year of experience under our belt, we are more than happy with it.

How we made the builds faster

My team lives and breathes the “release early, release often” philosophy. We release and deploy our software to production several times a day. When we moved from CircleCI to Drone, our build times went up drastically.

Build times went up so much because, for each build, our package manager was downloading the Internet (you know, usual suspects are npm, RubyGems, etc.). This was not a problem with CircleCI because of their built-in caching facilities. So with our pace of continuous releases and increased build times, we got frustrated quickly.

Since we had been spoiled by the wonderful caching features of CircleCI, we wanted the same features in Drone, but they were not available by default. However, Drone offers plugins, which are “special Docker containers used to drop preconfigured tasks into a Pipeline”. We found dozens of plugins related to caching in Drone.

We first tried drone-volume-cache, but volumes are local to the Drone worker node that runs the build, and you cannot be sure that your next build will run on the same machine. A storage layer that persists the cache between builds, independent of the worker, would be a better option, so we quickly abandoned this approach.

Our Drone deployment runs on AWS, hence we looked for plugins that use S3 as their storage. We found lots of them and decided to use drone-s3-cache. It’s a well-written, simple Go program which follows the Drone plugin starter conventions.

Why did we decide to build our own caching plugin?

After using drone-s3-cache for a couple of weeks, we needed to pass another parameter to S3. To do so, we forked drone-s3-cache and modified it. We thought that nobody else would need those minor changes, so rather than contributing back upstream, we built a Docker image of our own and pushed it to our private registry to use as a custom Drone plugin.

Months later, I received a feature request from a colleague on a different team, and I was surprised because I didn’t think other teams used our fork. When I checked, I realised that various teams throughout Meltwater used it heavily. Then we started to get similar messages and requests from other teams.

I received that first request while I was looking for a problem to solve during our internal hackathon. What are the chances? So I decided to work on this plugin and add the requested feature. Building something that makes life easier for fellow developers always gives me pure joy. Long story short, the stars aligned, and we decided to work on our fork and improve it.

I had not worked with Go much, but I had always wanted to learn it. Thanks to this plugin, I achieved that goal too. I changed, refactored and churned a lot of code. I experimented with a lot of different ideas. I added features that nobody had asked for. I tried different things just for the sake of trying. That’s why, when I decided to open source my changes, I realised I had rewritten the plugin. So rather than sending a pull request, I created a new repository. drone-cache was born!

How does it work?

What does a Drone cache plugin actually have to accomplish? In Drone, each step in the build pipeline is a container which is thrown away after it serves its purpose. So a caching system has to persist files from the current workspace between builds. You can think of the workspace as the root of your git repository. It is a mounted volume shared by all steps in your Drone build pipeline.

With drone-cache, after your initial pipeline run, a snapshot of your current workspace will be stored. Then you can restore that snapshot in your next build, which saves you time.

The best example would be to use this plugin with your package managers such as npm, Mix, Bundler or Maven. With restored dependencies from a cache, commands such as npm install would only need to download new dependencies, rather than re-download every package on each build.
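The snapshot-and-restore idea can be sketched in a few lines. This is only an illustration of the concept, not how drone-cache is implemented (drone-cache is written in Go). The function names snapshot and restore, the gzip-compressed tar format and the node_modules layout are all assumptions made for the example.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import List


def snapshot(workspace: Path, paths: List[str]) -> bytes:
    """Pack the given workspace-relative paths into a gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for rel in paths:
            # arcname keeps paths relative, so they restore into any workspace.
            tar.add(workspace / rel, arcname=rel)
    return buf.getvalue()


def restore(archive: bytes, workspace: Path) -> None:
    """Unpack a snapshot into the workspace of a later build."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(workspace, **extra)


# Two temporary directories stand in for the workspaces of two builds.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    first_ws, second_ws = Path(first), Path(second)
    module_dir = first_ws / "node_modules" / "left-pad"
    module_dir.mkdir(parents=True)
    (module_dir / "index.js").write_text("module.exports = 1;\n")

    archive = snapshot(first_ws, ["node_modules"])  # end of the first build
    restore(archive, second_ws)                     # start of the next build

    restored = (second_ws / "node_modules" / "left-pad" / "index.js").read_text()
    print(restored, end="")
```

In a real pipeline the archive would be uploaded to a storage backend between the two steps instead of being held in memory.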

What makes drone-cache different from other Drone caching solutions?

The most useful feature of drone-cache is that you can provide your own custom cache key templates. This means you can store your cached files under keys that fit your use case. For example, with a custom key generated from a checksum of a file (say package.json), you keep your cached files until that file actually changes.
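To see why a checksum-based key behaves this way, consider the rough sketch below. It is not drone-cache's template syntax. The cache_key function, the choice of SHA-256 and the "repo/digest" key layout are assumptions made purely for illustration.

```python
import hashlib
import os
import tempfile


def cache_key(repo: str, manifest_path: str) -> str:
    """Build a cache key that changes only when the manifest file changes."""
    with open(manifest_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    # A shortened digest keeps the key readable; the length is arbitrary.
    return f"{repo}/{digest[:16]}"


with tempfile.TemporaryDirectory() as workspace:
    manifest = os.path.join(workspace, "package.json")

    with open(manifest, "w") as f:
        f.write('{"dependencies": {}}')
    before = cache_key("my-repo", manifest)
    unchanged = cache_key("my-repo", manifest)

    # Adding a dependency changes the file, and therefore the key.
    with open(manifest, "w") as f:
        f.write('{"dependencies": {"left-pad": "1.3.0"}}')
    after = cache_key("my-repo", manifest)

print(before == unchanged)  # True: same manifest, same key, cache hit
print(before == after)      # False: new manifest, new key, cache miss
```

Because the key is derived from the file's content, cache invalidation happens automatically whenever the dependency list changes.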

All other caching solutions for Drone offer only a single storage form for your cache. drone-cache, in contrast, offers two storage forms out of the box: an S3 bucket or a mounted volume. Even better, drone-cache provides a pluggable backend system, so you can implement your own storage backend.
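A pluggable backend usually boils down to a small interface that the caching logic depends on, with one implementation per storage system. The sketch below shows that shape under stated assumptions: Backend, MemoryBackend, DirectoryBackend and round_trip are hypothetical names and do not reflect drone-cache's actual Go interfaces. The directory-based backend merely stands in for a mounted volume.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class Backend(Protocol):
    """What the cache logic needs from any storage backend (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class MemoryBackend:
    """Keeps archives in a dict; handy for tests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class DirectoryBackend:
    """Stores each archive as a file under root, standing in for a mounted volume."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def _path(self, key: str) -> Path:
        # Flatten the key so that "my-repo/abc123" maps to a single file name.
        return self._root / key.replace("/", "__")

    def put(self, key: str, data: bytes) -> None:
        self._root.mkdir(parents=True, exist_ok=True)
        self._path(key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def round_trip(backend: Backend, key: str, archive: bytes) -> bytes:
    """Store an archive and read it back; works with any Backend."""
    backend.put(key, archive)
    return backend.get(key)


with tempfile.TemporaryDirectory() as volume:
    for backend in (MemoryBackend(), DirectoryBackend(volume)):
        restored = round_trip(backend, "my-repo/abc123", b"archive bytes")
        print(type(backend).__name__, restored == b"archive bytes")
```

The caching logic only ever talks to the interface, so supporting another storage system means writing one more class with the same two methods.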

Last but not least, drone-cache is a small CLI program, written in Go without any external OS dependencies. So even if you are not using Drone as your build system, you can still fork and tinker with drone-cache to make it fit your needs.

What have we learned?

Building a caching solution is hard, especially if every team in your company uses it every time they push something to their repositories. It is also fun, because it means you have users who give you feedback from the beginning. We crafted this plugin with the help of my colleagues’ feedback and feature requests.

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

What could we have done better? As I mentioned before, rather than forking and modifying the code base, we could have contributed back to the original project. We could also have applied the “release early and often” philosophy to open sourcing this repository and collected feedback from the outside world as well. However, we didn’t, and that’s mostly on me. This is the first time I have actually open sourced a project and contributed back to the community, so next time I will know better :)

At Meltwater, 20 teams now use drone-cache across 120 components. It works and gets things done for us. We learned a lot while building it, and we hope it solves similar problems for you.

Please try it in your pipeline, give us feedback, feel free to open issues and send us pull-requests. Personally, I am also very interested to discuss your experiences with open sourcing in general, so if you have any thoughts on that, please share them in the comments below.

Image Credits

xkcd.com — How standards proliferate https://xkcd.com/927/

