PCA 🔥 120 Questions for Prometheus Certified Associate Exam

Hey there, fellow Prometheus enthusiasts!

As part of my preparation, I created a list of sample exam questions. These not only gauged my progress but also highlighted areas where I needed to improve. I made it a point to review my answers thoroughly, learning from my mistakes and solidifying my understanding of the Prometheus ecosystem. It was especially helpful to use AI chats for mock questions or questions on similar topics, and for digging deeper into each topic.

Furthermore, I highly recommend the book “Prometheus: Up & Running” as an essential resource to comprehend all the vital topics of Prometheus. Gaining proficiency in PromQL and queries was greatly aided by reading the informative posts from “PromLabs” and referring to the “promql-cheat-sheet”.

Good luck to everyone, happy learning and happy exam-taking!🤞

Exam Info

Duration: 90 min
Questions: 60
Pass: x ≥ 75% (min 45/60)

💬 120 Sample Exam Questions 💬

Observability Concepts (18%)

What is the preferred approach used by Prometheus to collect metrics from a target?
What is observability?
What is the RED Method?
What are the distinctions between SLO, SLI, and SLA?
In the context of tracing, what is the meaning or representation of a span?
In which scenarios is distributed tracing less beneficial or NOT as applicable?
What are typically tracked within a span of a trace?
What distinguishes a good metric from a bad one?
Which type of data is monitored by Prometheus?
In the context of monitoring and observability, what type of data is typically used to define a SLI?
What is the meaning or purpose of an error budget policy?
What is one advantage of the push model for recording metrics compared to pull models?
How do Prometheus, ELK stack, and InfluxDB differ in terms of their functionalities and use cases?
What is the definition of a metric?
What are Prometheus exemplars?
What is one of the main purposes or goals of logging?
What are the 3 core components of observability?
What is monitoring?
What is telemetry?
What are the challenges of observability?

Prometheus Fundamentals (20%)

What is the CLI utility tool for Prometheus called?
What are the limitations of Prometheus?
What is Service Discovery and which categories are there?
Which property configures the timing to scrape metrics from targets?
Which section in the Prometheus configuration file governs the selection of targets to be scraped?
Which action in the label configuration is used to delete a specific target?
How is data retention managed in Prometheus?
What are the 3 essential components of Prometheus?
What is required to be able to reload Prometheus?
What are 3 methods to reload or restart the Prometheus server?
What HTTP method does Prometheus employ for performing scrapes?
Which SD configuration is recommended for scraping EC2 instances?
Which SD configuration is recommended for nodes of Elastic Kubernetes Service on AWS?
What is the purpose of the scrape_interval configuration in Prometheus?
Which type of database does Prometheus utilize?
What component is responsible for collecting metrics from an instance and exposing them in a format that Prometheus expects?
Which component is suitable for collecting metrics from batch/cron jobs?
When is the configuration option honor_labels:true used?
What is the purpose of port 9090/9093/9100/9091/9115 in Prometheus?
What are the 2 default metric labels?
Which file systems are recommended/supported by Prometheus?
How can you configure a Blackbox Exporter probe to check the successful response of your servers to PING?
How do you configure the targets that Prometheus should scrape?
What is the agent deployment mode of Prometheus?
Which CLI command is suitable for unit testing Prometheus rules?
Which CLI command is suitable for checking validity of the config files?
How do you define the targets with SD that Prometheus should collect metrics from?
How can you delete specific time series from Prometheus?
How can you delete all time series from Prometheus?
Which format does file-based SD provide?

PromQL (28%)

What is PromQL?
What is a histogram metric in Prometheus?
Which 4 data types are used in PromQL?
What is the name of the vector in Prometheus that stores a single sample value?
Which PromQL function is used to estimate the value of a time series at a future time, t seconds from the current time, based on the range vector v?
Between what type of expressions can logical operators be defined?
Which function can be used to calculate the average of a range vector in Prometheus?
What is the difference between avg(...) and avg_over_time(...)?
With which type of metrics is the rate(...) function primarily used in Prometheus?
What does the term “offset” refer to in Prometheus?
What distinguishes the rate(...) and irate(...) query functions in Prometheus?
What distinguishes the rate(...) and deriv(...) query functions in Prometheus?
Which type of metric is suitable for measuring the internal temperature of a server?
What is the data type of Prometheus metric values?
How many unique series are generated by a histogram metric type?
What are the 4 components of the Prometheus metrics data model?
What is the difference between the ceil and floor functions?
Which query function among the following returns a result of 1 in case the specified time series does not exist?
What are the logical/arithmetic/comparison binary operators?
What is vector matching?
What are the group modifiers?
Which of these functions is NOT used with counter metrics? irate(), increase(), resets(), idelta(), avg(), rate()
How do you calculate the time in days until the LAST certificate expiration?
What is dimensional aggregation?
What is the significance of the double underscore “__” before a label name?

Instrumentation and Exporters (16%)

Which HTTP header does Prometheus set on each scrape?
Which 2 query parameters are required when configuring a Blackbox Exporter probe?
What is the exposition format of Prometheus?
Does Prometheus need to perform any format conversion on the metrics returned by a monitored Linux machine?
What is the default endpoint that Prometheus uses to scrape the metrics from the target?
Where is the version of the Prometheus exporter typically defined?
What is the most suitable exporter for monitoring an HTTP web server endpoint to verify that it returns a 200 status code?
Which Prometheus exporter is recommended for monitoring network devices?
Which networking protocol does Prometheus utilize for performing scrapes?
What is the purpose of a Prometheus metrics registry?
What is the purpose or definition of a Prometheus exporter?
In what scenarios would you use the Blackbox Exporter?
How does Prometheus identify the scrape path for its targets?
Which endpoints allow blackbox probing?
In a scenario where you have a dynamic etcd database containing scrape targets for Prometheus, how should you configure service discovery?
What are the 2 types of attributes that can be present in the /metrics endpoint?
Which exporter is the most suitable for monitoring Scala metrics among the following options?
How do you keep Pushgateway job labels? Normally they are overwritten.
How does Prometheus scrape the last batch job push time?
What are the 3 types of service systems?

Recording & Alerting & Dashboarding (18%)

Is there a way to deactivate a specific route in Alertmanager for a specific time frame?
What is considered a best practice when it comes to alerting in monitoring systems: focusing on alerting based on symptoms or alerting based on causes?
What is the meaning of “alert symptoms” and “alert causes” in the context of monitoring systems?
Which aspect, symptoms or causes, is more visible to customers in the context of an issue?
What is a good naming convention for recording rules?
What is acknowledge-based throttling and what is time-based throttling?
What are the 3 statuses of a Prometheus alert?
How can I use a PromQL query to retrieve the currently active alerts in Alertmanager?
What are recording rules in Prometheus?
How do you define recording rules?
What is alert fatigue?
Which feature of Alertmanager is responsible for formatting and customizing the alerts?
How can you configure Alertmanager to disable the grouping of alerts for a specific route effectively?
Which software is commonly used for visualizing Prometheus metrics?
What does the term “inhibiting” refer to in the context of Alertmanager?
What is the format used for defining alerting rules?
What is the significance of the for attribute in a Prometheus alert rule?
How can you temporarily mute/snooze/suppress an alert during maintenance in Prometheus?
What is the name of Prometheus native dashboarding and visualization feature?
How can you coordinate the simultaneous sending of multiple alerts with similar label sets in Prometheus?
Which feature of Alertmanager is responsible for sending alerts to the right receiver?
What is the purpose of the repeat_interval/continue/group_wait/group_interval attributes in an Alertmanager route configuration?
Which 2 attributes of an alerting rule can be used to include extra metadata?
What is required for a high-availability configuration of Alertmanager?
What are the 3 statuses of Alertmanager Silences?

💡 Answers to the Questions 💡

Observability Concepts (18%)

pull-based
Observability: understand what’s happening inside a system and predict how it will behave in the future
RED Method consists of: (Request) Rate + (Request) Errors + (Request) Duration
SLO: Service Level Objective (Goal), SLA: Service Level Agreement (Contract), SLI: Service Level Indicator (Metrics)
Span is a single operation/unit of work within a distributed system and captures the start and end times, duration, and associated metadata of a specific operation
for monolithic systems
Operation Name, Trace ID and Span ID, Start and End Timestamps, Duration, Parent Span ID
bad: a metric with a lot of variance and poor correlation with user experience; good: a metric for which it is easy to set a threshold because the good and bad states do not overlap at all.
Metrics (numeric value)
SLI is typically derived from metrics
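An error budget is a concept used in the context of SLOs and SLAs: it is the acceptable level of errors or service disruptions that a system or service can experience within a given time period (100% minus the SLO target). The error budget policy defines what the team does when that budget is used up or close to it, for example pausing feature releases.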
timely and proactive data collection (real-time or near real-time) / pushing into the centralized data system
InfluxDB is a push-based time-series database designed to handle high volumes of time-stamped data (IoT, Sensor, Analytics).
ELK stack is a push-based system, used for collecting, processing, storing, and visualizing log data.
Prometheus is a pull-based time-series database and monitoring system specifically designed for monitoring dynamic cloud-native environments.
numeric time-series data point
An exemplar is a reference to a specific trace that is representative of a measurement taken in a given time interval; it provides additional information about a specific data point.
To gather and aggregate textual event data from a service for troubleshooting
Logs, Traces, and Metrics
Monitoring: continuous observation of a system to detect and alert on abnormal behavior.
Telemetry: automated collection and transmission of data from remote sources.
Data silos, Volume, velocity, variety, and complexity of data, Lack of pre-production
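
To make the SLO and error budget answers above a bit more tangible, here is a minimal Python sketch of the arithmetic, assuming an availability SLO measured over a fixed window. The function names and the 99.9% / 30-day figures are my own illustrative assumptions, not values from the exam.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Allowed minutes of unavailability for an availability SLO.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    # The budget is whatever the SLO does not promise: 100% minus the target.
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Budget left after the downtime observed so far (negative = overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Hypothetical example: 99.9% SLO over 30 days, 10 minutes of downtime so far.
print(round(error_budget_minutes(0.999, 30), 1))    # 43.2
print(round(budget_remaining(0.999, 30, 10.0), 1))  # 33.2
```

An error budget policy would then describe what happens once the remaining budget approaches zero.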

Prometheus Fundamentals (20%)

promtool
scalability for large-scale deployments with millions of time series, long-term storage, high cardinality, HA and replication
SD is a mechanism that automatically discovers the targets and services to monitor. There are 2 categories: top-down (e.g. EC2) and bottom-up (e.g. Consul) mechanisms
scrape_interval
scrape_configs
scrape_configs -> relabel_configs -> action: drop or action: keep
with the flags --storage.tsdb.retention.time and --storage.tsdb.retention.size
Retrieval, TSDB, HTTP Server
with the flag --web.enable-lifecycle (needed for reloading over HTTP via /-/reload; a SIGHUP works without it)
Sending a SIGHUP signal to the Prometheus process, Using the Prometheus API POST or PUT + /-/reload, Using a service manager (systemctl) or orchestration tool (k8s)
HTTP GET method
ec2_sd_configs
ec2_sd_configs
how frequently Prometheus collects and updates the metrics
time-series database
Prometheus exporter
Pushgateway
honor_labels: true keeps the labels that come with the scraped data when they conflict with server-side labels such as job and instance; it is typically used for federation and when scraping the Pushgateway
9090:prometheus-server, 9093:alertmanager, 9100:node-exporter, 9091:pushgateway, 9115:blackbox-exporter
instance and job
ext4, XFS, and NTFS (local disks; NFS, including AWS EFS, is not supported)
Internet Control Message Protocol (ICMP) -> prober:icmp
scrape_configs > static_configs -> targets:xxx
Agent mode is a lightweight Prometheus mode focused on service discovery, scraping, and remote write (remote storage), aimed especially at edge computing/IoT; querying, alerting, and local storage are disabled
./promtool test rules test.yml
./promtool check config prometheus.yml (rule files: ./promtool check rules test.yml)
scrape_configs and *_sd_configs on per-job basis
starting the server with the flag --web.enable-admin-api + curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={xxxx="yyy"}'
starting the server with the flag --web.enable-admin-api + curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~".+"}', then curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones' to free the disk space
YAML and JSON
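
As a small illustration of the last answer, here is a Python sketch that renders a file-based SD target file in its JSON form: a list of groups, each with targets and optional labels. The host names, the label values, and the helper name render_file_sd are hypothetical, and this is only a sketch of the file format, not of any official tooling.

```python
import json


def render_file_sd(groups):
    """Render (targets, labels) pairs as a file_sd JSON document."""
    doc = []
    for targets, labels in groups:
        entry = {"targets": list(targets)}
        if labels:
            # Labels are optional; they are attached to every target in the group.
            entry["labels"] = dict(labels)
        doc.append(entry)
    return json.dumps(doc, indent=2)


# Hypothetical targets: two node exporters and one Pushgateway.
groups = [
    (["node1.example.internal:9100", "node2.example.internal:9100"],
     {"env": "prod"}),
    (["pushgateway.example.internal:9091"], {}),
]
rendered = render_file_sd(groups)
print(rendered)
```

A scrape job would then point at the written file through file_sd_configs -> files, and Prometheus picks up changes to that file without a restart.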

PromQL (28%)

Query Language for Prometheus
A histogram samples observations (e.g. request durations or response sizes) and counts them in configurable buckets
Scalar, String, Instant Vector, Range Vector
Instant Vector
predict_linear()
instant vectors
avg_over_time(metrics[x])
avg_over_time(...) takes a range vector as input and returns an instant vector with the average over time of each series. avg(...) takes an instant vector as input and aggregates across series (dimensions).
rate(...) needs COUNTER type metrics
offset shifts the evaluation time of a selector into the past by the given duration (e.g. offset 5m)
rate(...) calculates the per-second average rate of increase of a time series over the specified time range; irate(...) calculates the instant rate from only the last 2 data points in that range
deriv(...) operates on gauge and rate(...) operates on counter
gauge
float64
one <basename>_bucket series per bucket (distinguished by the le label, including le="+Inf"), plus <basename>_sum and <basename>_count
metric name, labels, timestamp, value
floor(...) = round a number down, ceil(...) = round a number up
absent(...)
logical => OR, AND, UNLESS, arithmetic => + - * / % ^, comparison => ==, !=, >, <, >=, <=
on, ignoring
Part of vector matching: group_left and group_right, used together with on/ignoring for many-to-one and one-to-many matches
idelta()
max(cert_expiry - time()) / 86400
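aggregating across label dimensions with operators such as sum(), min(), max(), avg(), count(), optionally with by/without
Labels starting with __ are reserved for internal use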
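
The rate(...) versus irate(...) answer above is easier to remember with numbers, so here is a simplified Python sketch of both ideas on a list of (timestamp, value) counter samples. It is deliberately not Prometheus's real implementation: it assumes the samples are exactly the ones inside the range and skips the extrapolation to the range boundaries that the real rate() performs. The function names and sample values are made up.

```python
def simple_rate(samples):
    """Average per-second increase across all samples in the range."""
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop means the counter was reset, so it restarted from zero.
        increase += value if value < prev else value - prev
        prev = value
    return increase / (samples[-1][0] - samples[0][0])


def simple_irate(samples):
    """Per-second increase between only the last two samples."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


# Hypothetical counter scraped every 15s, with a reset between t=30 and t=45.
samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 70.0)]
print(round(simple_rate(samples), 3))  # 2.167
print(simple_irate(samples))           # 4.0
```

The averaged value smooths over the whole window, while the instant value reacts only to the most recent pair of samples, which is why irate is more volatile.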

Instrumentation and Exporters (16%)

X-Prometheus-Scrape-Timeout-Seconds
target + module
text-based format for exposing metrics
No
/metrics
build_info
Blackbox Exporter
SNMP exporter
HTTP protocol
Registry serves as a central repository for collecting, storing, and managing metrics
Exporter is responsible for collecting metrics from a specific system, application, or service and exposing them for Prometheus
Network Service Monitoring, Health Checks, External Monitoring
scrape_configs > metrics_path: /metrics
Blackbox Exporter allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP, gRPC
file_sd_configs
HELP, TYPE
JMX Exporter
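honor_labels: true
PromQL > push_time_seconds (exposed by the Pushgateway for each group); the job can also push its own timestamp metric such as job_last_success_unixtime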
online-serving, offline-processing, and batch jobs
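
To show what the text-based exposition format with its HELP and TYPE lines looks like, here is a minimal Python sketch that renders one metric family by hand. In practice a client library does this for you; the helper names, the metric, and the sample value below are only illustrative assumptions.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return (str(value).replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family; samples is a list of (labels dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with two labels.
text = render_metric(
    "http_requests_total", "counter", "Total HTTP requests.",
    [({"method": "get", "code": "200"}, 1027)],
)
print(text)
```

The output is what a target would serve on /metrics: one HELP line, one TYPE line, then one line per sample.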

Recording & Alerting & Dashboarding (18%)

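Yes: define named intervals under the top-level time_intervals and reference them in the route, ex. mute_time_intervals: [holidays, offhours] (or active_time_intervals). Defining them under the top-level mute_time_intervals is DEPRECATED.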
symptom-based and NOT causes-based
symptom: The “what’s broken”, cause: “why broken”
symptom is customer visible error
<<level>>:<<metric>>:<<operations>>, e.g. job:node_cpu_seconds:avg_idle
acknowledge-based = notifications for an alert are sent to the recipient only once until the alert is acknowledged or resolved
time-based = limiting the rate of notifications based on a specific time interval (ex. group_interval, repeat_interval)
firing, pending, inactive
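Menu > Alerts, or Query > ALERTS (e.g. ALERTS{alertstate="firing"})
Recording rules precompute frequently used or expensive PromQL expressions (aggregations, filters) and store the results as new time series in the Prometheus DB
groups -> rules -> record: xxx, expr: xxx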
Alert fatigue refers to a situation where individuals or teams become overwhelmed or desensitized by a large volume of alerts
notification templates
group_by: ['...']
Grafana
Inhibiting refers to a feature that suppresses notifications for certain alerts while certain other alerts are already firing (configured with inhibit_rules)
YAML
for defines how long the alert condition must hold continuously before the alert moves from pending to firing, helping to prevent false positives and reduce noise in alerting systems
Silence
Prometheus Console
Grouping
Routing
repeat_interval: determines how long to wait before re-sending a notification for a firing alert that has already been sent successfully
continue: specifies whether to continue processing subsequent routes after sending a notification for an alert
group_wait: sets how long to initially wait before sending the first notification for a new group of alerts
group_interval: dictates how long to wait before sending a notification about new alerts added to a group that has already been notified
annotations + labels
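Run several Alertmanager instances as a cluster, configured using the --cluster-* flags, and point Prometheus at all of them instead of load balancing between them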
Active, Pending, Expired
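
The three alert statuses and the for attribute from the answers above fit together as a small state machine, which this Python sketch walks through. It is a simplified model under my own assumptions: one alert, one evaluation per listed timestamp, and no keep_firing_for. The function name and the timings are hypothetical.

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each (timestamp, condition_met) evaluation."""
    states = []
    active_since = None
    for timestamp, condition_met in evaluations:
        if not condition_met:
            # The condition cleared, so the alert goes back to inactive.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        # Pending until the condition has held for the whole `for` duration.
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Hypothetical rule with for: 2m, evaluated every 60 seconds.
evaluations = [(0, False), (60, True), (120, True),
               (180, True), (240, False), (300, True)]
print(alert_states(evaluations, for_seconds=120))
# ['inactive', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

Note how a single evaluation where the condition is false resets the timer, so a flapping condition never reaches firing.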