Setting up New Relic for SRE work can feel like building a custom dashboard for your car while it’s still on the assembly line.
Here’s how a typical New Relic APM setup looks in the wild, focusing on the core components:
Let’s say you have a simple Go application that serves HTTP requests.
```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Simulate some work
	time.Sleep(50 * time.Millisecond)
	fmt.Fprintf(w, "Hello, SRE!")
}

func main() {
	http.HandleFunc("/", handler)
	log.Println("Starting server on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
To instrument this with New Relic APM, you’d add the New Relic Go agent. This typically involves:
- Adding the agent to your go.mod:

```
go get github.com/newrelic/go-agent/v3/newrelic
```

- Initializing the agent in your main function and wrapping each handler you want traced:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/newrelic/go-agent/v3/newrelic"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Simulate some work
	time.Sleep(50 * time.Millisecond)
	fmt.Fprintf(w, "Hello, SRE!")
}

func main() {
	// Initialize New Relic.
	// Replace "My Go App" with your application name and "YOUR_LICENSE_KEY"
	// with your New Relic license key (in practice, load the key from an
	// environment variable or secret store rather than hard-coding it).
	nr, err := newrelic.NewApplication(
		newrelic.ConfigAppName("My Go App"),
		newrelic.ConfigLicense("YOUR_LICENSE_KEY"),
	)
	if err != nil {
		log.Fatal("Failed to initialize New Relic: ", err)
	}

	// Instrument the HTTP server. WrapHandleFunc returns (pattern, wrappedHandler),
	// so its result is passed straight through to http.HandleFunc.
	http.HandleFunc(newrelic.WrapHandleFunc(nr, "/", handler))

	log.Println("Starting server on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
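To make the wrapping step concrete, here is a minimal standard-library sketch of what a wrapper in the style of `WrapHandleFunc` does: it returns the route pattern together with a wrapped handler (which is why the result can be passed directly to `http.HandleFunc`) and times each request. The `txnRecord`, `recorder`, `wrapHandleFunc`, and `runDemo` names are hypothetical, and naming each record as method plus pattern is an assumption; the real agent also samples, batches, and ships this data to New Relic.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// txnRecord is a hypothetical, simplified stand-in for the per-request data an
// APM agent keeps. It is not a New Relic type.
type txnRecord struct {
	Name     string
	Duration time.Duration
}

// recorder collects records in memory. Handlers run concurrently in a real
// server, so access is guarded by a mutex.
type recorder struct {
	mu      sync.Mutex
	records []txnRecord
}

func (rec *recorder) add(t txnRecord) {
	rec.mu.Lock()
	defer rec.mu.Unlock()
	rec.records = append(rec.records, t)
}

func (rec *recorder) snapshot() []txnRecord {
	rec.mu.Lock()
	defer rec.mu.Unlock()
	return append([]txnRecord(nil), rec.records...)
}

// wrapHandleFunc mirrors the shape of newrelic.WrapHandleFunc: it returns the
// pattern and a wrapped handler, so http.HandleFunc(wrapHandleFunc(...)) compiles.
func wrapHandleFunc(rec *recorder, pattern string, h func(http.ResponseWriter, *http.Request)) (string, func(http.ResponseWriter, *http.Request)) {
	return pattern, func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		h(w, r)
		// Assumed naming scheme: HTTP method plus route pattern.
		rec.add(txnRecord{Name: r.Method + " " + pattern, Duration: time.Since(start)})
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	time.Sleep(50 * time.Millisecond) // simulate some work
	fmt.Fprintf(w, "Hello, SRE!")
}

// runDemo registers the wrapped handler, serves one in-process request, and
// returns what the wrapper recorded.
func runDemo() []txnRecord {
	rec := &recorder{}
	mux := http.NewServeMux()
	mux.HandleFunc(wrapHandleFunc(rec, "/", handler))
	mux.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.snapshot()
}

func main() {
	for _, t := range runDemo() {
		fmt.Printf("%s took %v\n", t.Name, t.Duration)
	}
}
```

Serving the request through `httptest` keeps the example self-contained; against a real listener the wrapper behaves the same way.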
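When this application runs and receives requests, the New Relic agent collects the following data (in the Go agent, some of it needs instrumentation beyond `WrapHandleFunc`):
- Transactions: Each request to a wrapped route becomes a transaction. The agent records its name, duration, HTTP method, and response status code.
- External Services: Calls your application makes to other services (for example, downstream HTTP APIs) are tracked as external calls, with their latency and response status. In Go these are not captured automatically: the HTTP client needs an instrumented transport such as `newrelic.NewRoundTripper`, and the outgoing request must carry the transaction in its context.
- Databases: For database queries, the agent captures query execution times and the SQL statements themselves (if configured). In Go this requires one of the agent's driver integration packages or manually created datastore segments.
- Errors: Responses with error status codes, and errors you report with `txn.NoticeError(err)`, are captured and reported. Panics are recorded only when `ErrorCollector.RecordPanics` is enabled in the agent configuration; it is off by default.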
This raw data is then sent to New Relic’s platform for analysis.
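External calls are the part the Go agent does not pick up on its own. The technique behind an instrumented client transport such as `newrelic.NewRoundTripper` is a wrapper around `http.RoundTripper` that times each call; the sketch below shows that technique with only the standard library. The `timingTransport`, `externalCall`, and `roundTripFunc` types, the stubbed 503 response, and the `payments.example.internal` host are all hypothetical, chosen so the example runs without a network.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// externalCall is a hypothetical record of one outbound HTTP call.
type externalCall struct {
	Host     string
	Status   int // 0 if no response was received
	Duration time.Duration
	Err      error
}

// timingTransport wraps another RoundTripper and records each call it makes.
type timingTransport struct {
	next  http.RoundTripper
	mu    sync.Mutex
	calls []externalCall
}

func (t *timingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := t.next.RoundTrip(req)
	call := externalCall{Host: req.URL.Host, Duration: time.Since(start), Err: err}
	if resp != nil {
		call.Status = resp.StatusCode
	}
	t.mu.Lock()
	t.calls = append(t.calls, call)
	t.mu.Unlock()
	return resp, err
}

// roundTripFunc adapts a function to http.RoundTripper. Here it stands in for
// the downstream service so no real network call is made.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(req *http.Request) (*http.Response, error) { return f(req) }

// runDemo makes one call through the timing transport and returns the records.
func runDemo() []externalCall {
	stub := roundTripFunc(func(req *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: http.StatusServiceUnavailable,
			Header:     make(http.Header),
			Body:       http.NoBody,
			Request:    req,
		}, nil
	})
	tr := &timingTransport{next: stub}
	client := &http.Client{Transport: tr}

	// Hypothetical downstream API.
	resp, err := client.Get("http://payments.example.internal/charge")
	if err == nil {
		resp.Body.Close()
	}
	tr.mu.Lock()
	defer tr.mu.Unlock()
	return append([]externalCall(nil), tr.calls...)
}

func main() {
	for _, c := range runDemo() {
		fmt.Printf("external call to %s: status=%d in %v\n", c.Host, c.Status, c.Duration)
	}
}
```

Because the wrapper sits in the client's transport, every request made through that client is measured without changing call sites.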
Alerts: Turning Data into Action
The core purpose of APM data for SREs is to detect and alert on problems. New Relic alerts are built on NRQL (New Relic Query Language), which is a SQL-like language for querying your New Relic data.
Example Alert: High Transaction Error Rate
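- Goal: Notify us if more than 1% of requests to `/api/v1/users` fail within a 5-minute window.
- NRQL Query:

```
SELECT percentage(count(*), WHERE error IS TRUE) FROM Transaction WHERE appName = 'My Go App' AND name LIKE '%/api/v1/users'
```

- Condition: above 1% for at least 5 minutes.
- Notification Channels: PagerDuty, Slack.

The `error` attribute is set on every transaction the agent classified as an error, including responses with error status codes, so the query needs no separate status-code check. Matching the name with `LIKE` avoids hard-coding the full transaction name, whose exact format depends on the agent version and the pattern passed to `WrapHandleFunc`.

This alert opens an incident, and notifies the configured channels, when the error percentage for that transaction stays above 1% for the whole 5-minute threshold duration.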
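The condition boils down to simple arithmetic per aggregation window. The sketch below approximates it under stated assumptions: `percentage(count(*), WHERE error IS TRUE)` becomes errors divided by total, times 100, and "above 1% for at least 5 minutes" becomes "each of the last five 1-minute windows breaches". The `minuteBucket` type and both functions are hypothetical; New Relic's real evaluation (aggregation window and gap-filling settings, among others) is configurable and more involved.

```go
package main

import "fmt"

// minuteBucket is a hypothetical per-minute aggregate of Transaction events
// for one transaction name.
type minuteBucket struct {
	Total  int // count(*)
	Errors int // count(*) WHERE error IS TRUE
}

// errorPercentage mirrors percentage(count(*), WHERE error IS TRUE) for one
// bucket. Treating an empty bucket as 0% is an assumption of this sketch.
func errorPercentage(b minuteBucket) float64 {
	if b.Total == 0 {
		return 0
	}
	return 100 * float64(b.Errors) / float64(b.Total)
}

// shouldOpenIncident approximates "above thresholdPct for at least N minutes"
// with 1-minute windows: every one of the last `windows` buckets must breach.
func shouldOpenIncident(buckets []minuteBucket, thresholdPct float64, windows int) bool {
	if len(buckets) < windows {
		return false
	}
	for _, b := range buckets[len(buckets)-windows:] {
		if errorPercentage(b) <= thresholdPct {
			return false
		}
	}
	return true
}

func main() {
	// Five hypothetical minutes of traffic to /api/v1/users.
	buckets := []minuteBucket{
		{Total: 1000, Errors: 15},
		{Total: 1000, Errors: 20},
		{Total: 1000, Errors: 30},
		{Total: 1000, Errors: 25},
		{Total: 1000, Errors: 18},
	}
	for i, b := range buckets {
		fmt.Printf("minute %d: %.1f%% errors\n", i+1, errorPercentage(b))
	}
	fmt.Println("open incident:", shouldOpenIncident(buckets, 1.0, 5)) // prints "open incident: true"
}
```

A single recovered minute (say, 0.5% errors) anywhere in the last five resets the check, which is why "for at least" conditions are less noisy than alerting on any single breaching window.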
Dashboards: Visualizing System Health
Dashboards provide a high-level overview of your application’s performance and health. They combine various charts and graphs, often drawing from APM data, infrastructure metrics, and custom events.
Example Dashboard Widget: Transaction Throughput and Latency
- Chart Type: Line Chart.
- Query 1 (Throughput), requests per minute:

```
SELECT rate(count(*), 1 minute) FROM Transaction WHERE appName = 'My Go App' TIMESERIES
```

- Query 2 (Average Response Time), in milliseconds per time bucket; `duration` is recorded in seconds, hence the `* 1000`:

```
SELECT average(duration) * 1000 FROM Transaction WHERE appName = 'My Go App' TIMESERIES
```

Plotting both queries on a single chart lets you quickly correlate latency spikes with changes in traffic volume.
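To see what the two chart lines represent, the sketch below computes them from raw events, assuming 1-minute buckets (as with `TIMESERIES 1 minute`): throughput is the event count per bucket, and response time is the mean `duration` (in seconds) multiplied by 1000. The `txnEvent` and `minutePoint` types and `bucketByMinute` are hypothetical names for illustration, not part of any New Relic API.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// txnEvent is a hypothetical, trimmed-down Transaction event: a Unix timestamp
// in seconds and a duration in seconds (the unit of the `duration` attribute).
type txnEvent struct {
	UnixSec     int64
	DurationSec float64
}

// minutePoint holds one point for each of the two chart lines.
type minutePoint struct {
	MinuteStart    int64   // Unix seconds at the start of the bucket
	RequestsPerMin float64 // like rate(count(*), 1 minute)
	AvgMillis      float64 // like average(duration) * 1000
}

// bucketByMinute groups events into 1-minute buckets, like TIMESERIES 1 minute.
// Minutes with no events are simply absent from the result.
func bucketByMinute(events []txnEvent) []minutePoint {
	type acc struct {
		count int
		sum   float64
	}
	buckets := map[int64]*acc{}
	for _, e := range events {
		start := e.UnixSec - e.UnixSec%60
		a, ok := buckets[start]
		if !ok {
			a = &acc{}
			buckets[start] = a
		}
		a.count++
		a.sum += e.DurationSec
	}
	points := make([]minutePoint, 0, len(buckets))
	for start, a := range buckets {
		points = append(points, minutePoint{
			MinuteStart:    start,
			RequestsPerMin: float64(a.count), // the bucket is exactly one minute wide
			AvgMillis:      a.sum / float64(a.count) * 1000,
		})
	}
	sort.Slice(points, func(i, j int) bool { return points[i].MinuteStart < points[j].MinuteStart })
	return points
}

func main() {
	base := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC).Unix()
	events := []txnEvent{
		{UnixSec: base + 5, DurationSec: 0.050},
		{UnixSec: base + 20, DurationSec: 0.070},
		{UnixSec: base + 65, DurationSec: 0.200},
	}
	for _, p := range bucketByMinute(events) {
		fmt.Printf("%s  %3.0f req/min  %5.1f ms avg\n",
			time.Unix(p.MinuteStart, 0).UTC().Format("15:04"), p.RequestsPerMin, p.AvgMillis)
	}
}
```

With these sample events, the 12:00 bucket shows 2 requests at a 60 ms average and the 12:01 bucket shows 1 request at 200 ms, so a latency jump is visible even though traffic fell.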
The Mental Model: How it All Connects
New Relic APM supports distributed tracing, and the agent in your application is the "instrumentation." It doesn’t just count requests; for a sample of requests it records a trace: a tree of segments covering your handler code, database queries, and external calls, which, with distributed tracing enabled, continues into downstream services that are also instrumented. That trace is what allows you to pinpoint bottlenecks. When a transaction is slow, you can drill down into its trace to see which specific function call, database query, or external API call accounts for the time. Alerts are automated checks against this data, and dashboards are curated views of the most critical metrics derived from it.
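Drilling into a trace amounts to walking a tree of segments and asking where the time went. The sketch below models a trace as nested segments and returns the one with the largest exclusive (self) time, meaning its duration minus its children's. The `segment` type, `slowest`, and the sample durations are hypothetical, and the subtraction assumes child segments ran one after another rather than in parallel.

```go
package main

import "fmt"

// segment is a hypothetical trace node: a named unit of work, its total
// duration, and the child segments that ran inside it.
type segment struct {
	Name     string
	TotalMs  float64
	Children []*segment
}

// exclusiveMs is the time spent in the segment itself rather than in its
// children. It assumes children ran sequentially, not in parallel.
func (s *segment) exclusiveMs() float64 {
	ex := s.TotalMs
	for _, c := range s.Children {
		ex -= c.TotalMs
	}
	return ex
}

// slowest walks the trace depth-first and returns the segment with the
// largest exclusive time: the first place to look when a request is slow.
func slowest(root *segment) *segment {
	best := root
	var walk func(*segment)
	walk = func(s *segment) {
		if s.exclusiveMs() > best.exclusiveMs() {
			best = s
		}
		for _, c := range s.Children {
			walk(c)
		}
	}
	walk(root)
	return best
}

func main() {
	// Hypothetical trace of one slow GET /api/v1/users request.
	trace := &segment{Name: "handler GET /api/v1/users", TotalMs: 480, Children: []*segment{
		{Name: "db: SELECT users", TotalMs: 35},
		{Name: "http: auth service", TotalMs: 390},
	}}
	s := slowest(trace)
	fmt.Printf("slowest segment: %s (%.0f ms exclusive)\n", s.Name, s.exclusiveMs())
	// prints: slowest segment: http: auth service (390 ms exclusive)
}
```

Ranking by exclusive rather than total time keeps the root handler, whose total always includes everything beneath it, from masking the real culprit.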
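The New Relic agent’s `WrapHandleFunc` doesn’t just start a transaction; it also wraps the `http.ResponseWriter` your handler receives, so it sees the status code the handler writes. Go handlers don’t return errors, so this is how failures surface without extra code: if your handler completes normally but responds with 500 Internal Server Error, the agent classifies the transaction as an error (unless that code is listed in the agent’s `ErrorCollector.IgnoreStatusCodes` setting). Errors that never reach the response, or that you want reported with their details, still need an explicit `NoticeError(err)` call on the transaction returned by `newrelic.FromContext(r.Context())`.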
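Capturing the status code rests on a common Go technique: wrapping `http.ResponseWriter` so the wrapper sees the code passed to `WriteHeader` (or the implicit 200 on the first `Write`). The sketch below shows that technique with the standard library and classifies codes of 400 and above as errors unless they are on an ignore list. `statusRecorder`, `isErrorStatus`, `observe`, the sample handlers, and the classification rule are assumptions for illustration, not the agent's implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusRecorder wraps an http.ResponseWriter and remembers the first status
// code written. Hypothetical sketch, not the New Relic agent's code.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	if s.status == 0 {
		s.status = code
	}
	s.ResponseWriter.WriteHeader(code)
}

func (s *statusRecorder) Write(b []byte) (int, error) {
	// net/http sends an implicit 200 if Write is called before WriteHeader.
	if s.status == 0 {
		s.status = http.StatusOK
	}
	return s.ResponseWriter.Write(b)
}

// isErrorStatus is an assumed rule: 400 and above is an error unless ignored.
func isErrorStatus(code int, ignored map[int]bool) bool {
	return code >= http.StatusBadRequest && !ignored[code]
}

// observe serves one in-process request with h and reports the status code
// the handler produced and whether it would be classified as an error.
func observe(h http.HandlerFunc, ignored map[int]bool) (int, bool) {
	rec := &statusRecorder{ResponseWriter: httptest.NewRecorder()}
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	if rec.status == 0 {
		rec.status = http.StatusOK // nothing written: net/http replies 200
	}
	return rec.status, isErrorStatus(rec.status, ignored)
}

// Sample handlers (hypothetical).
func okHandler(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "Hello, SRE!") }

func failingHandler(w http.ResponseWriter, r *http.Request) {
	// Completes normally, but tells the client the request failed.
	http.Error(w, "database unavailable", http.StatusInternalServerError)
}

func notFoundHandler(w http.ResponseWriter, r *http.Request) { http.NotFound(w, r) }

func main() {
	ignored := map[int]bool{http.StatusNotFound: true} // hypothetical ignore list
	cases := []struct {
		name string
		h    http.HandlerFunc
	}{
		{"ok", okHandler},
		{"failing", failingHandler},
		{"not found", notFoundHandler},
	}
	for _, c := range cases {
		status, isErr := observe(c.h, ignored)
		fmt.Printf("%-9s status=%d error=%v\n", c.name, status, isErr)
	}
}
```

Embedding the original `http.ResponseWriter` means only `WriteHeader` and `Write` need overriding; `Header()` is forwarded unchanged.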