Leveraging Observability Practices in Software Development
Introduction
I've been playing around with observability tools recently.
So I decided to build something and break it on purpose: a simple cottage booking app with intentionally engineered database lock issues, to see whether observability could actually help me debug the mess.
Used OpenTelemetry, Prometheus, Grafana, and Loki - the usual suspects for metrics, traces, and logs.
Code's all here if you want to see: github.com/morifky/cottage-booking-app
The Setup
OpenTelemetry does most of the heavy lifting - automatically instruments GORM calls and HTTP requests. Pretty neat.
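The post doesn't show the wiring, but with the standard contrib packages it looks roughly like this. This is a sketch, not the repo's actual code: the package paths are the stock `otelgorm` tracing plugin and `otelhttp` ones, and the DSN and service name are made up.

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
	"gorm.io/plugin/opentelemetry/tracing"
)

func main() {
	db, err := gorm.Open(postgres.Open("postgres://localhost/cottage"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	// Every GORM query now emits a span on the active trace.
	if err := db.Use(tracing.NewPlugin()); err != nil {
		panic(err)
	}
	_ = db

	mux := http.NewServeMux()
	// otelhttp wraps the mux so each request starts a span and
	// propagates its context down into the handlers (and GORM).
	handler := otelhttp.NewHandler(mux, "cottage-booking-app")
	http.ListenAndServe(":8080", handler)
}
```

Two lines of setup and every HTTP request and SQL query is traced, which is why "automatic" isn't an exaggeration here.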
For metrics, I'm tracking request times, booking counts, database stuff. You know, the usual things that tell you when something's going wrong.
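For a custom counter like booking counts, the OTel metrics API version looks something like this; the meter name, metric name, and `room_id` attribute are illustrative, not necessarily what the repo uses.

```go
package metrics

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// RecordBooking bumps a bookings counter, tagged by room.
// "bookings_total" ends up scrape-able from Prometheus via the collector.
func RecordBooking(ctx context.Context, roomID int) error {
	meter := otel.Meter("cottage-booking-app")
	counter, err := meter.Int64Counter("bookings_total")
	if err != nil {
		return err
	}
	counter.Add(ctx, 1, metric.WithAttributes(attribute.Int("room_id", roomID)))
	return nil
}
```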
Logs go through Zap to Loki. The nice thing is everything connects with trace IDs, so you can jump from a log line to see the full request flow.
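The trace-ID correlation boils down to a small helper; again a sketch under the assumption that the app uses the standard OTel span context API, with `WithTraceID` being a hypothetical name.

```go
package logging

import (
	"context"

	"go.opentelemetry.io/otel/trace"
	"go.uber.org/zap"
)

// WithTraceID returns a request-scoped logger carrying the current
// trace ID, so a Loki log line can be joined back to its Tempo trace.
func WithTraceID(ctx context.Context, base *zap.Logger) *zap.Logger {
	sc := trace.SpanContextFromContext(ctx)
	if !sc.IsValid() {
		return base // no active span; log without a trace_id field
	}
	return base.With(zap.String("trace_id", sc.TraceID().String()))
}
```

Every log line written through the returned logger carries a `trace_id` field, which is what makes the log-to-trace jump in Grafana work.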
How it works
App sends everything to the OpenTelemetry Collector, which routes metrics to Prometheus, logs to Loki, traces to Tempo. Grafana pulls it all together.
                     +------------------+
                     |     Grafana      |
                     | (Visualization)  |
                     +------------------+
                       ^      ^      ^
                       |      |      |
            (queries Prometheus, Loki, and Tempo)

+-----------------------+     +------------------+     +------------------+
| Cottage Booking App   |---->|  OpenTelemetry   |---->|    Prometheus    |
| (Instrumented w/ SDK) |     |    Collector     |     |    (Metrics)     |
+-----------------------+     +--------+---------+     +------------------+
                                       |
                                       |                +------------------+
                                       +--------------->|       Loki       |
                                       |                |      (Logs)      |
                                       |                +------------------+
                                       |
                                       |                +------------------+
                                       +--------------->|  Grafana Tempo   |
                                                        |     (Traces)     |
                                                        +------------------+
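The collector side of that routing is just pipeline config. A rough sketch of what it could look like; exporter names and endpoints vary by collector version and deployment, so treat these as placeholders rather than the repo's actual config.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # Prometheus scrapes the collector here
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      exporters: [loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```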
Breaking things
Here's the fun part - I wrote some code to create database lock contention on purpose:
func (br *BookingRepository) SaveWithLock(ctx context.Context, booking *models.Booking, holdDuration time.Duration) error {
	tracer := otel.Tracer("booking-repository")
	ctx, span := tracer.Start(ctx, "SaveWithLock")
	defer span.End()

	span.SetAttributes(
		attribute.Int("visitor_id", int(booking.VisitorID)),
		attribute.Int("room_id", int(booking.RoomID)),
		attribute.String("hold_duration", holdDuration.String()),
	)

	return br.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
		// Acquire exclusive table lock
		if err := tx.Exec("LOCK TABLE bookings IN ACCESS EXCLUSIVE MODE").Error; err != nil {
			span.SetStatus(codes.Error, "Failed to acquire table lock")
			return err
		}

		// Hold the lock for the specified duration, unless the
		// request's context is cancelled first
		select {
		case <-time.After(holdDuration):
			// Continue after hold duration
		case <-ctx.Done():
			span.SetStatus(codes.Error, "Request timed out")
			return ctx.Err()
		}

		return tx.Create(booking).Error
	})
}
Without observability
This is what usually happens:
- App times out randomly
- Can't reproduce the issue
- Users are mad, you're confused
- Lots of guessing and hoping
With observability
Completely different story.
Here's how I simulate the lock contention:
Terminal 1 - grab the lock:
curl -X POST http://localhost:8080/booking/with-lock \
-H "Content-Type: application/json" \
-d '{"visitor_id":1,"room_id":1,"hold_lock_seconds":100}'
Terminal 2 - try another booking:
curl -X POST http://localhost:8080/booking \
-H "Content-Type: application/json" \
-d '{"visitor_id":2,"room_id":2}'
Now I can see exactly what's happening - when the lock was taken, how long it lasted, why other requests are waiting. No more guessing.


What I learned
Tracing changes everything
Instead of guessing what went wrong, you can see the exact sequence of events. When that database lock happened, I could watch other requests pile up in real-time.
Structured logs with trace IDs
This is probably the best part. See an error in the logs? Click the trace ID and boom - you can see the entire request flow that caused it.
Final thoughts
Observability isn't just about fixing bugs. It's about actually understanding what your system is doing. Once you have it set up properly, debugging becomes way less painful.
Definitely worth the setup time.