Datadog Agent Manager Changelog

What's new in Datadog Agent Manager 7.53.0

May 1, 2024
  • New Features:
  • Support database-monitoring autodiscovery for Aurora cluster instances. Adds a new configuration listener to poll for a specific set of Aurora cluster IDs and then create a new database-monitoring supported check configuration for each endpoint. This allows for monitoring of endpoints that scale dynamically.
  • Add new core check orchestrator_ecs to collect running ECS tasks
  • APM stats now include an is_trace_root field to indicate if the stats are from the root span of a trace.
  • The cluster-agent now collects network policies from the cluster.
  • Enable 'host_benchmarks' by default when running the security-agent compliance module.
  • OTLP ingest now has a feature flag to identify top-level spans by span kind. This new logic can be enabled by adding enable_otlp_compute_top_level_by_span_kind in DD_APM_FEATURES.
  • With this new logic, root spans and spans with a server or consumer span.kind will be marked as top-level. Additionally, spans with a client or producer span.kind will have stats computed.
  • Enabling this feature flag may increase the number of spans that generate trace metrics, and may change which spans appear as top-level in Datadog.
  • Experimental: The process-agent checks (process, container, and process-discovery) can be run from the Core Agent in Linux. This feature can be toggled on by setting the process_config.run_in_core_agent.enabled flag to true in the datadog.yaml file. This feature is disabled by default.
  • Enhancement Notes:
  • Add the container image and container lifecycle checks to the output of the Agent status command.
  • Add kubelet_core_check_enabled flag to Agent config to control whether the kubelet core check should be loaded.
  • Added LastSuccessfulTime to cronjob status payload.
  • Add a retry mechanism to Software Bill of Materials (SBOM) collection for container images. This will help to avoid intermittent failures during the collection process.
  • Add startup timestamp to the Agent metadata payload.
  • Agents are now built with Go 1.21.9.
  • Adds image repo digest string to the container payload when present
  • CWS: Add selftests report on Windows and platforms with no eBPF support.
  • CWS: Add visibility for cross container program executions on platforms with no eBPF support.
  • APM: Enable credit card obfuscation by default. There is a small chance that numbers similar to valid credit card numbers may be redacted. This feature can be disabled by using apm_config.obfuscation.credit_cards.enabled. Alternatively, it can be made more accurate through Luhn checksum verification by using apm_config.obfuscation.credit_cards.luhn; however, this increases the performance penalty of this check.
  • logs_config.expected_tags_duration now works for journald logs.
  • [oracle] Adds oracle.can_query service check.
  • [oracle] Automatically fall back to deprecated Oracle integration mode if privileges are missing.
  • [oracle] Add service configuration parameter.
  • The connections check no longer relies on the process/container check as it can now fetch container data independently.
  • The performance of Remote Config has been significantly improved when large amounts of configurations are received.
  • Send ECS task lifecycle events in the container lifecycle check.
  • dbm: add new SQL obfuscation mode normalize_only to support normalizing SQL without obfuscating it. This mode is useful for customers who want to view unobfuscated SQL statements. By default, ObfuscationMode is set to obfuscate_and_normalize and every SQL statement is obfuscated and normalized.
  • USM: Handle the HTTP TRACE method.
  • Deprecation Notes:
  • [oracle] Deprecating Oracle integration code. The functionality is fully implemented in the oracle-dbm check which is now renamed to oracle.
  • Bug Fixes:
  • The windows_registry check can be run with the check sub-command.
  • CWS: Fix very rare event corruption.
  • Fixes issue where processes for ECS Fargate containers would sometimes not be associated with the correct container.
  • Fixed a bug in the Dual Shipping feature where events were not being emitted on endpoint recovery.
  • Fix issue with display_container_name being tagged as N/A when container_name information is available.
  • Fix a Windows process handle leak in the Process Agent, which was introduced in 7.52.0 when process_collection is enabled.
  • Fixes a bug where the tagger server did not properly handle a closed channel.
  • [oracle] Set the default for metric_prefix in custom_queries to oracle.
  • [oracle] Fix global_custom_queries bug.
  • [oracle] Adds the oracle.process.pga_maximum_memory metric for backward compatibility.
  • Stop sending systemd metrics when they are not set
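
The Luhn checksum verification mentioned for apm_config.obfuscation.credit_cards.luhn in the enhancement notes above can be sketched as follows. This is only an illustration of the checksum itself, not the Agent's implementation; the function name luhn_valid is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Hypothetical helper for illustration; separators such as spaces
    or dashes are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                # Same as summing the two digits of the doubled value.
                digit -= 9
        total += digit
    return total % 10 == 0
```

A number that merely looks like a card number usually fails this check, which is why enabling it reduces false redactions at the cost of extra work per candidate.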

New in Datadog Agent Manager 7.52.1 (Apr 5, 2024)

  • Add a check to the Windows installer to verify that the caller has the correct membership to install the Agent.
  • Ensure the metadata requests are delayed at Agent startup to reduce host tag delays.

New in Datadog Agent Manager 7.52.0 (Mar 21, 2024)

  • Upgrade Notes:
  • To prevent misconfigurations, the Windows Datadog Agent installer now raises an error if the user account running the installer MSI is provided as the ddagentuser (DDAGENTUSER_NAME) account. If the account is a service account, such as LocalSystem or a gMSA account, no action is needed. If the account is a regular account, configure a different Datadog Agent service account.
  • New Features:
  • Add device_type to the device metadata.
  • Attach host tags to metrics for expected_tags_duration amount of time.
  • APM stats will now include, if present, the Git commit SHA from traces (or container tags) and the image tag from container tags.
  • Add a new packageSigning component to collect Linux package signature information and improve the signature rotation process. For more information, see the 2024 Linux key rotation page in the Datadog documentation.
  • Adds support for span links in the trace agent. This field contains a list of causal relationships between spans and is only populated when v0.4 of the Trace API is used.
  • The Windows Agent now supports CWS for process and network threats.
  • CWS: Add chdir event to allow recent container escape detection.
  • CWS: [BETA] Add File Integrity Monitoring support on Windows, supporting both files and registry.
  • CWS: The Agent now automatically suppresses benign security events if they have already been reported for a particular container image.
  • Updating process agent discovery configuration to include a Data Scrubber for obfuscating sensitive information such as passwords, API keys, or tokens.
  • Add support for pinging network devices in the SNMP integration.
  • [oracle] Add oracle.locks.transaction_duration metric.
  • APM: Add support for Single Step Instrumentation remote configuration
  • Enhancement Notes:
  • [DBM] Increase the DBM dbm-metrics-intake endpoint's defaultInputChanSize value to 500.
  • Add debug level logs when files are evicted from registry.json after their TTL expires.
  • Add the instance ID returned by the IMDSv2 metadata endpoint to the list of EC2 host aliases.
  • This change adds journald permissions to the flare in the logs_file_permissions.log file, in the form of either the journald directory or a specific file (if specified by the Agent journald configuration).
  • The Logs Agent now creates a file in the flare, called logs_file_permissions.log, which lists every file and that file's permissions that the Logs Agent can detect.
  • Add the SBOM check to the output of the Agent status command and the Agent flare.
  • Add the Software Bill of Materials (SBOM) for container images to the output of the flare command.
  • Add repo_digest to containerd ContainerImage to remove duplicate images in container images UI.
  • Agents are now built with Go 1.21.7.
  • Agents are now built with Go 1.21.8.
  • CWS: Improved coverage on platforms with no eBPF support.
  • CWS: Send context of variables in events.
  • Add DD_APM_DEBUGGER_DIAGNOSTICS_DD_URL, DD_APM_DEBUGGER_DIAGNOSTICS_API_KEY, and DD_APM_DEBUGGER_DIAGNOSTICS_ADDITIONAL_ENDPOINTS to allow sending Live Debugger / Dynamic Instrumentation diagnostic data to multiple intakes.
  • Added config that allows user to toggle on and off the collection of zombie processes in the Process Agent.
  • [oracle] Add ddagenthostname tag.
  • [oracle]: Add oracle.tablespace.maxsize metric.
  • OTLP ingest supports stable Java runtime metrics introduced in opentelemetry-java-instrumentation v2.0.0. OTLP ingest supports Kafka metrics mapping. This allows users of the JMX Receiver/JMX Metrics Gatherer and Kafka metrics receiver to have access to the OOTB Kafka Dashboard.
  • Modified the process check to populate process with the newly created field "ProcessContext"
  • Rename the kubelet_core check to kubelet and change the metrics prefix from kubernetes_core to kubernetes so that it can replace the Python kubelet check.
  • APM: Adds msgp_short_bytes reason for trace payloads dropped to distinguish them from EOF errors.
  • When getting resource tags from an ECS task with zero containers, print a warn log instead of error log.
  • Deprecation Notes:
  • Removal of the pod check from the process agent. The current check will run from the core agent.
  • This release drops support for Red Hat Enterprise Linux 6 and its derivatives.
  • [oracle] Deprecate the configuration parameter instant_client and replace it with oracle_client.
  • Removed the system-probe configuration value data_streams_config.enabled and replaced it with service_monitoring_config.enable_kafka_monitoring. This also implies that the DsmEnabled field in the AgentConfiguration proto will consistently be set to false.
  • Bug Fixes:
  • Upgrade dependencies for systemd core check. This silences excessive warning logs on systemd v252.
  • oracle: Fix wrong tablespace metrics.
  • APM: Stop dropping incoming OTel payloads when the processing channel is full and eliminate OOM issues in the trace agent and collector component in high load scenarios, making the OTel pipeline more reliable.
  • Fix dogstatsd-capture. Message PID was not set after the 7.50 release.
  • Fix a memory exception where the flare controller tries to stat a file that doesn't exist.
  • Fleet Automation filters in the Datadog UI now accurately reflect which products are enabled when deployed with the official Datadog Helm chart on Kubernetes.
  • Corrected a problem where the ignore_autodiscovery_tags parameter was not functioning correctly with pod annotations or autodiscovery version 2 (adv2) annotations. This fix ensures that when this parameter is set to true, autodiscovery tags are ignored as intended. Example:
      ad.datadoghq.com/redis.checks: |
        {
          "redisdb": {
            "ignore_autodiscovery_tags": true,
            "instances": [
              {
                "host": "%%host%%",
                "port": "6379"
              }
            ]
          }
        }
  • Moving forward, configurations that attempt to use hybrid setups (combining adv2 for check specification while also employing adv1 for ignore_autodiscovery_tags) are no longer supported by default. Users should set the configuration parameter cluster_checks.support_hybrid_ignore_ad_tags to true to enable this behavior.
  • [oracle]: Add support for more Asian character sets.
  • Prevent OOMs when collecting a large number of zombie processes.
  • Fixed race conditions caused by concurrent execution of etw.StartEtw() and etw.StopEtw() functions which may concurrently access and modify a global map.
  • Fix a minor error introduced by the recent PR #22664, which fixed a race condition in the ETW package.
  • [oracle] Add resource_manager configuration to conf.yaml.example.
  • [oracle] Fix multi-tagging bug.
  • Fixes a bug in OTLP ingest where empty histograms were not being sent to the backend in the distributions mode. Empty histograms are now mapped as if they had a single (min, max) bucket.
  • Scrub authentication bearer token of any size, even invalid, from integration configuration (when being printed through the checksconfig CLI command or other).
  • Empty UDS payloads no longer cause the DogStatsD server to close the socket.
  • Other Notes:
  • The version of Python required for tooling in README matches that which the CI uses.
  • Datadog Cluster Agent:
  • New Features:
  • Add agent sidecar injection webhook in cluster-agent Kubernetes admission controller. This new webhook adds the Agent as a sidecar container in application Pods when the environment requires it, for example in the EKS Fargate environment.
  • Enhancement Notes:
  • Introduces a new config option in the Cluster Agent to set the rebalance period when advanced dispatching is enabled: cluster_checks.rebalance_period. The default value is 10 min.
  • Bug Fixes:
  • Fix an issue where the admission controller would remove the field restartPolicy from native sidecar containers, preventing pod creation on Kubernetes 1.29+.
  • Fix missing kube_api_version tag on HPA and VPA resources.
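
To make the ignore_autodiscovery_tags fix in the 7.52.0 bug fixes above more concrete, the sketch below reads the flag out of an adv2-style annotation value shaped like the changelog's example. It is an illustration only and does not reflect how the Agent parses annotations; the function name ignores_autodiscovery_tags and the ANNOTATION constant are hypothetical.

```python
import json

# Annotation value shaped like the example in the changelog entry.
ANNOTATION = """
{
  "redisdb": {
    "ignore_autodiscovery_tags": true,
    "instances": [{"host": "%%host%%", "port": "6379"}]
  }
}
"""


def ignores_autodiscovery_tags(annotation_value: str, check_name: str) -> bool:
    """Return True if the named check sets ignore_autodiscovery_tags.

    Hypothetical helper: a missing check or a missing flag is
    treated as False.
    """
    checks = json.loads(annotation_value)
    check_config = checks.get(check_name, {})
    return bool(check_config.get("ignore_autodiscovery_tags", False))
```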

New in Datadog Agent Manager 7.50.2 (Jan 11, 2024)

  • Enhancement Notes:
  • Agents are now built with Go 1.20.12.
  • Bug Fixes:
  • The CWS configuration parameter to enable anomaly detection is now working and taken into account by the Agent.
  • Fix issue introduced in 7.47 that allowed all users to start/stop the Windows Datadog Agent services. The Windows installer now, as in versions before 7.47, grants this permission explicitly to ddagentuser.

New in Datadog Agent Manager 7.50.1 (Dec 21, 2023)

  • Fixes a bug introduced in 7.50.0 that prevented DD_TAGS from being added to kubernetes_state.* metrics.

New in Datadog Agent Manager 7.50.0 (Dec 19, 2023)

  • New Features:
  • The orchestrator check is moving from the Process Agent to the Node Agent. In the next release, this new check will replace the current pod check in the Process Agent. You can start using this new check now by manually setting the environment variable DD_ORCHESTRATOR_EXPLORER_RUN_ON_NODE_AGENT to true.
  • Adds the following CPU manager metrics to the kubelet core check: kubernetes_core.kubelet.cpu_manager.pinning_errors_total, kubernetes_core.kubelet.cpu_manager.pinning_requests_total.
  • Add a diagnosis for connecting to the agent logs endpoints. This is accessible through the agent diagnose command.
  • Add FIPS mode support for Network Device Monitoring products
  • Added support for collecting Cloud Foundry container names without the Cluster Agent.
  • The Kubernetes State Metrics Core check now collects kubernetes_state.ingress.tls.
  • APM: Added a new endpoint tracer_flare/v1/. This endpoint acts as a proxy to forward HTTP POST request from tracers to the serverless_flare endpoint, allowing tracer flares to be triggered via remote config, improving the support experience by automating the collection of logs.
  • CWS: Ability to send a signal to a process when a rule was triggered.
  • A new rule post action - 'kill' - can now be used to send a specific signal to a process that caused a rule to be triggered. By default, this signal is SIGTERM.
      - id: my_rule
        expression: ...
        actions:
          - kill:
              signal: SIGUSR1
  • CWS: Add Kubernetes user session context to events, in particular the username, UID and groups of the user that ran the commands remotely.
  • Enable container image collection by default.
  • Enable container lifecycle events collection by default. This feature helps stopped containers to be cleaned from Datadog faster.
  • [netflow] Allow collecting configurable fields for Netflow V9/IPFIX
  • Add support for Oracle 12.1 and Oracle 11.
  • Add monitoring of Oracle ASM disk groups.
  • Add metrics for monitoring Oracle resource manager.
  • [corechecks/snmp] Load downloaded profiles
  • DBM: Add configuration option to SQL obfuscator to use go-sqllexer package to run SQL obfuscation and normalization
  • Support filtering metrics from endpoint and service checks based on namespace when the DD_CONTAINER_EXCLUDE_METRICS environment variable is set.
  • The Windows Event Log tailer saves its current position in an event log and resumes reading from that location when the Agent restarts. This allows the Agent to collect events created before the Agent starts.
  • Enhancement Notes:
  • [corechecks/snmp] Support symbol modifiers for global metric tags and metadata tags.
  • Update the go-systemd package to the latest version (22.5.0).
  • Added default peer tags for APM stats aggregation which can be enabled through a new flag (peer_tags_aggregation).
  • Add a stop timeout to the Windows Agent services. If an Agent service does not cleanly stop within 15 seconds after receiving a stop command from the Service Control Manager, the service will hard stop. The timeout can be configured by setting the DD_WINDOWS_SERVICE_STOP_TIMEOUT_SECONDS environment variable. Agent stop timeouts are logged to the Windows Event Log and can be monitored and alerted on.
  • APM: OTLP: Add support for custom container tags via resource attributes prefixed by datadog.container.tag.*.
  • Agents are now built with Go 1.20.11.
  • CWS: Support for Ubuntu 23.10.
  • CWS: Reduce memory usage of ring buffer on machines with more than 64 CPU cores.
  • CSPM: Move away from libapt to run Debian packages compliance checks.
  • DBM: Bump the minimum version of the go-sqllexer library to 0.0.7 to support collecting stored procedure names.
  • Add subcommand diagnose show-metadata gohai for gohai data
  • Upgraded JMXFetch to 0.49.0 which adds some more telemetry and contains some small fixes.
  • Netflow now supports the datadog-agent status command, providing configuration information. Any configuration errors encountered will be listed.
  • Emit database_instance tag with the value host/cdb. The goal is to show each database separately in the DBM entry page. Currently, the backend initializes database_instance to host. Also, the Agent will emit the new db_server tag because we have to initialize the host tag to host/cdb.
  • Improve obfuscator formatting. Prevent spaces after parentheses. Prevent spaces before # when # is a part of an identifier.
  • Emit query metrics with zero executions to capture long runners spanning over several sampling periods.
  • Impose a time limit on query metrics processing. After exceeding the default limit of 20s, the Agent stops emitting execution plans and fqt events.
  • Add oracle.inactive_seconds metric. Add tags with session attributes to oracle.process_pga* metrics.
  • Stop overriding peer.service with other attributes in OTel spans.
  • Process-Agent: Improved parsing performance of the '/proc/pid/stat' file (Linux only)
  • [snmp_listener] Enable collect_topology by default.
  • dbm: add SQL obfuscation options to give customers more control over how SQL is obfuscated and normalized.
  • RemoveSpaceBetweenParentheses - remove spaces between parentheses. This option is only valid when ObfuscationMode is obfuscate_and_normalize.
  • KeepNull - disable obfuscating null values with ?. This option is only valid when ObfuscationMode is obfuscate_only or obfuscate_and_normalize.
  • KeepBoolean - disable obfuscating boolean values with ?. This option is only valid when ObfuscationMode is obfuscate_only or obfuscate_and_normalize.
  • KeepPositionalParameter - disable obfuscating positional parameters with ?. This option is only valid when ObfuscationMode is obfuscate_only or obfuscate_and_normalize.
  • Add logic to support multiple tags created by a single label/annotation. For example, add the following config to extract tags for chart_name and app_chart_name:
      podLabelsAsTags:
        chart_name: chart_name, app_chart_name
    Note: the format must be a comma-separated list of tags.
  • The logs collection pipeline has been refactored to support processing only the message content (instead of the whole raw message) in the journald and Windows events tailers. This feature is experimental and off by default since it changes how existing log_processing_rules behave with the journald and Windows events tailers. Note that it will be switched on by default in a future release of the Agent. A warning notifying about this is shown when the journald and Windows events tailers are used with some log_processing_rules.
  • The Datadog agent container image is now using Ubuntu 23.10 mantic as the base image.
  • The win32_event_log check now continuously collects and reports events instead of waiting for min_collection_interval to collect. min_collection_interval now controls how frequently the check attempts to reconnect when the event subscription is in an error state.
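
The comma-separated tag list described for podLabelsAsTags in the 7.50.0 notes above can be illustrated with a short sketch. It shows only the idea of one label value fanning out to several tags; it is not the Agent's code, and the function name tags_for_label is hypothetical.

```python
def tags_for_label(label_value: str, tag_names: str) -> list[str]:
    """Build one tag per name in a comma-separated list of tag names.

    Hypothetical helper: `tag_names` is the right-hand side of a
    podLabelsAsTags entry, for example "chart_name, app_chart_name".
    """
    # Split on commas and drop surrounding whitespace and empty items.
    names = [name.strip() for name in tag_names.split(",") if name.strip()]
    return [f"{name}:{label_value}" for name in names]
```

With the configuration from the changelog entry, a pod label chart_name=redis would yield both a chart_name and an app_chart_name tag carrying the value redis.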

New in Datadog Agent Manager 7.49.1 (Nov 16, 2023)

  • CWS: add arch field into agent context included in CWS events.
  • APM: Fix a deadlock issue which can prevent the trace-agent from shutting down.
  • CWS: Fix the broken lineage check for process activity in CWS.
  • APM: fix a regression in the Trace Agent that caused container tagging with UDS and cgroup v2 to fail.

New in Datadog Agent Manager 7.49.0 (Nov 2, 2023)

  • New Features:
  • Add --use-unconnected-udp-socket flag to agent snmp walk command.
  • Add support for image pull metrics in the containerd check.
  • Add kubelet stats.summary check (kubernetes_core.kubelet.*) to the Agent's core checks to replace the old kubernetes.kubelet check generated from Python.
  • APM: [BETA] Adds peer_tags configuration to allow for more tags in APM stats that can add granularity and clarity to a peer.service. To set this config, use DD_APM_PEER_TAGS='["aws.s3.bucket", "db.instance", ...]' or apm_config.peer_tags: ["aws.s3.bucket", "db.instance", ...] in datadog.yaml. Please note that DD_APM_PEER_SERVICE_AGGREGATION or apm_config.peer_service_aggregation must also be set to true.
  • Add a check to collect Windows registry values.
  • Introduces a new Windows crash detection check. Upon initial check run, it sends a Datadog event if it determines that the machine has rebooted due to a system crash.
  • Install the Aerospike integration on ARM platforms for Python 3
  • CWS: Detect patterns in processes and files paths to improve accuracy of anomaly detections.
  • Add Dynamic Instrumentation diagnostics proxy endpoint to the trace-agent http server.
  • At present, diagnostics are forwarded through the debugger endpoint on the trace-agent server to logs. Since Dynamic Instrumentation also allows adding dynamic metrics and dynamic spans, we want to remove the dependency on logs for diagnostics - the new endpoint uploads diagnostic messages on a dedicated track.
  • Adds a configurable jmxfetch telemetry check that collects additional data on the running jmxfetch JVM in addition to data about the JVMs jmxfetch is monitoring. The check can be configured by enabling the jmx_telemetry_enabled option in the Agent.
  • [NDM] Collect diagnoses from SNMP devices.
  • Adding support for Oracle 12.2.
  • Add support for Oracle 18c.
  • CWS now computes hashes for all the files involved in the generation of a Security Profile and an Anomaly Detection Event
  • [Beta] Cluster agent supports APM Single Step Instrumentation for Kubernetes. It can be enabled in a Kubernetes cluster by setting DD_APM_INSTRUMENTATION_ENABLED=true. Single Step Instrumentation can be turned on in specific namespaces using the environment variable DD_APM_INSTRUMENTATION_ENABLED_NAMESPACES. Single Step Instrumentation can be turned off in specific namespaces using the environment variable DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES.
  • Enhancement Notes:
  • Moving the Orchestrator Explorer pod check from the process agent to the core agent. In the following release we will be removing the process agent check and defaulting to the core agent check. If you want to migrate ahead of time you can set orchestrator_explorer.run_on_node_agent = true in your configuration.
  • Add new GPU metrics in the KSM Core check:
  • kubernetes_state.node.gpu_capacity tagged by node, resource, unit and mig_profile.
  • kubernetes_state.node.gpu_allocatable tagged by node, resource, unit and mig_profile.
  • kubernetes_state.container.gpu_limit tagged by kube_namespace, pod_name, kube_container_name, node, resource, unit and mig_profile.
  • Tag container entity with image_id tag.
  • max_message_size_bytes can now be configured in logs_config. This allows the default message content limit of 256,000 bytes to be increased up to 1MB. If a log line is larger than this byte limit, the overflow bytes will be truncated.
  • APM: Add regex support for filtering tags by apm_config.filter_tags_regex or environment variables DD_APM_FILTER_TAGS_REGEX_REQUIRE and DD_APM_FILTER_TAGS_REGEX_REJECT.
  • Agents are now built with Go 1.20.10.
  • CWS: Support fentry/fexit eBPF probes which provide lower overhead than kprobe/kretprobes (currently disabled by default and supported only on Linux kernel 5.10 and later).
  • CWS: Improved username resolution in containers and handle their creation and deletion at runtime.
  • CWS: Apply policy rules on processes already present at startup.
  • CWS: Reduce memory usage of BTF symbols.
  • Remote Configuration for Cloud Workload Security detection rules is enabled if Remote Configuration is globally enabled for the Datadog Agent. Remote Configuration for Cloud Workload Security can be disabled while Remote Configuration is globally enabled by setting the runtime_security_config.remote_configuration.enabled value to false. Remote Configuration for Cloud Workload Security cannot be enabled if Remote Configuration is not globally enabled.
  • Add gce-container-declaration to default GCE excluded host tags. See exclude_gce_tags configuration settings for more.
  • Add metrics for the workloadmeta extractor to process-agent status output.
  • Add a heartbeat mechanism for SBOM collection to avoid having to send the whole SBOM if it has not changed since the last computation. The default interval for the host SBOM has changed from 24 hours to 1 hour.
  • Prefix every entry in the log file with details about the database server and port to distinguish log entries originating from different databases.
  • JMXFetch internal telemetry is now included in the agent status output when the verbose flag is included in the request.
  • Sensitive information is now scrubbed from pod annotations.
  • The image_id tag no longer includes the docker-pullable:// prefix when using Kubernetes with Docker as runtime.
  • Improve SQL text collection for self-managed installations. The Agent selects text from V$SQL instead of V$SQLSTATS. If it isn't possible to query the text, the Agent tries to identify the context, such as parsing or closing cursor, and put it in the SQL text.
  • Improve the Oracle check example configuration file.
  • Collect Oracle execution plans by default.
  • Add global custom queries to Oracle checks.
  • Add connection refused handling.
  • Add the hosting-type tag, which can have one of the following values: self-managed, RDS, or OCI.
  • Add a hidden parameter to log unobfuscated execution plan information.
  • Adding real_hostname tag.
  • Add sql_id and plan_hash_value to obfuscation error message.
  • Add Oracle pga_over_allocation_count_metric.
  • Add information about missing privileges with the link to the grant commands.
  • Add TCPS configuration to conf.yaml.example.
  • The container check reports two new metrics to report the page fault counters per container:
  • container.memory.page_faults
  • container.memory.major_page_faults
  • prometheus_scrape: Adds support for multiple OpenMetrics V2 features in the prometheus_scrape.checks[].configurations[] items:
  • exclude_metrics_by_labels
  • raw_line_filters
  • cache_shared_labels
  • use_process_start_time
  • hostname_label
  • hostname_format
  • telemetry
  • ignore_connection_errors
  • request_size
  • log_requests
  • persist_connections
  • allow_redirects
  • auth_token
  • For a description of each option, refer to the sample configuration in https://github.com/DataDog/integrations-core/blob/master/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example.
  • Improved the SBOM check to communicate the status of scans and any potential errors directly to Datadog for more streamlined error management and resolution.
  • Separate init-containers from containers in the KubernetesPod structure of workloadmeta.
  • Improve marshalling performance in the system-probe -> process-agent path. This improves memory footprint when NPM and/or USM are enabled.
  • Raise the default logs_config.open_files_limit to 500 on Windows.
  • Deprecation Notes:
  • service_monitoring_config.enable_go_tls_support is deprecated and replaced by service_monitoring_config.tls.go.enabled. network_config.enable_https_monitoring is deprecated and replaced by service_monitoring_config.tls.native.enabled.
  • Security Notes:
  • APM: The Agent now obfuscates the entire Memcached command by default. You can revert to the previous behavior where only the values were obfuscated by setting DD_APM_OBFUSCATION_MEMCACHED_KEEP_COMMAND=true or apm_config.obfuscation.memcached.keep_command: true in datadog.yaml.
  • Fix CVE-2023-39325
  • Bump golang.org/x/net to v0.17.0 to fix CVE-2023-44487.
  • Bug Fixes:
  • Fix Agent Flare not including Trace Agent's expvar output.
  • Fixes a panic that occurs when the Trace Agent receives an OTLP payload during shutdown
  • Fixes a crash upon receiving an OTLP Exponential Histogram with no buckets.
  • CWS: Scope network context to DNS events only as it may not be available to all events.
  • CWS: Fix a bug

New in Datadog Agent Manager 7.48.1 (Oct 17, 2023)

  • Upgrade Notes:
  • Upgraded Python 3.9 to Python 3.9.18
  • Security Notes:
  • Bump embedded curl version to 8.4.0 to fix CVE-2023-38545 and CVE-2023-38546
  • Updated the version of OpenSSL used by Python on Windows to 1.1.1w; addressed CVE-2023-4807, CVE-2023-3817, and CVE-2023-3446
  • Bug Fixes:
  • On some slow drives, when the Agent shuts down suddenly the Logs Agent registry file can become corrupt. This means that when the Agent starts again the registry file can't be read and therefore the Logs Agent reads logs from the beginning again. With this update, the Agent now attempts to update the registry file atomically to reduce the chances of a corrupted file.
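
The atomic registry update described in the 7.48.1 bug fix above follows a common pattern: write the new content to a temporary file in the same directory, then rename it over the target. The sketch below shows that general pattern only; it is not the Logs Agent's code, and the function name write_registry_atomically and the JSON layout are assumptions made for illustration.

```python
import json
import os
import tempfile


def write_registry_atomically(path: str, registry: dict) -> None:
    """Replace `path` with the serialized registry in one rename step.

    Hypothetical helper: readers see either the old file or the
    complete new one, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".registry-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(registry, tmp_file)
            tmp_file.flush()
            # Push the data to disk before the rename makes it visible.
            os.fsync(tmp_file.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```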

New in Datadog Agent Manager 7.48.0 (Oct 10, 2023)

  • Added the kubernetes_state.pod.tolerations metric to the KSM core check
  • Grab, base64 decode, and attach trace context from message attributes passed through SNS->SQS->Lambda
  • Add kubelet healthz check (check_run.kubernetes_core.kubelet.check) to the Agent's core checks to replace the old kubernetes.kubelet.check generated from Python.
  • Tag the aws.lambda span generated by the datadog-extension with a language tag based on runtime information in dotnet and java cases
  • Extended the "agent diagnose" CLI command to allow the easy addition of new diagnostics for diverse and dispersed Agent code.
  • Add support for the otlp_config.metrics.sums.initial_cumulative_monotonic_value setting.
  • [BETA] Adds Golang language and version detection through the system probe. This beta feature can be enabled by setting system_probe_config.language_detection.enabled to true in your system-probe.yaml.
  • Add new kubelet corecheck, which will eventually replace the existing kubelet check.
  • Add custom queries to Oracle monitoring.
  • Adding new configuration setting otlp_config.logs.enabled to enable/disable logs support in the OTLP ingest endpoint.
  • Add logsagentexporter, which is used in OTLP agent to translate ingested logs and forward them to logs-agent
  • Flush in-flight requests and pending retries to disk at shutdown when disk-based buffering of metrics is enabled (for example, when forwarder_storage_max_size_in_bytes is set).
  • Added a new collector in the process agent in workloadmeta. This collector allows for collecting processes when the process_config.process_collection.enabled is false and language_detection.enabled is true. The interval at which this collector collects processes can be adjusted with the setting workloadmeta.local_process_collector.collection_interval.
  • Tag lambda cold starts and proactive initializations on the root aws.lambda span
  • APM - This change improves the acceptance and queueing strategy for trace payloads sent to the Trace Agent. These changes create a system of backpressure in the Trace Agent, causing it to reject payloads when it cannot keep up with the rate of traffic, rather than buffering and causing OOM issues.
  • This change has been shown to increase overall throughput in the Trace Agent while decreasing peak resource usage. Existing configurations for CPU and memory work at least as well, and often better, with these changes compared to previous Agent versions. This means users do not have to adjust their configuration to take advantage of these changes, and they do not experience performance degradation as a result of upgrading.
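
The backpressure idea in the APM note above can be illustrated with a bounded queue that rejects new work when full instead of buffering without limit. This is a simplified sketch, not the Trace Agent's implementation. The class `PayloadReceiver`, its methods, and the 429 status code are assumptions; the changelog only says that payloads are rejected.

```python
import queue


class PayloadReceiver:
    """Hypothetical receiver that rejects work instead of buffering without bound."""

    def __init__(self, max_pending):
        # A bounded queue caps memory use: at most `max_pending` payloads wait at once.
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, payload):
        """Return an HTTP-like status: 200 if queued, 429 if the queue is full."""
        try:
            self._pending.put_nowait(payload)
        except queue.Full:
            return 429  # assumed status code; signals the sender to slow down
        return 200

    def process_one(self):
        """Take one payload off the queue, freeing a slot for new traffic."""
        return self._pending.get_nowait()
```

With `max_pending=2`, a third `submit` call returns 429 until `process_one` frees a slot, so memory stays bounded even when senders outpace the consumer.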

New in Datadog Agent Manager 7.47.1 (Sep 21, 2023)

  • Fixes issue with NPM driver restart failing with "File Not Found" error on Windows.
  • APM: The DD_APM_REPLACE_TAGS environment variable and apm_config.replace_tags setting now properly look for tags with numeric values.
  • Fix the issue introduced in 7.47.0 that causes the SE_DACL_AUTO_INHERITED flag to be removed from the installation drive directory when the installer fails and rolls back.

New in Datadog Agent Manager 7.47.0 (Aug 31, 2023)

  • Agent:
  • Prelude:
  • Upgrade Notes:
  • Embedded Python 3 interpreter is upgraded to 3.9.17 in both Agent 6 and Agent 7. Embedded OpenSSL is upgraded to 3.0.9 in Agent 7 on Linux and macOS. On Windows, Python 3.9 in Agent 7 is still compiled with OpenSSL 1.1.1.
  • New Features:
  • Add ability to send an Agent flare from the Datadog Application for Datadog support team troubleshooting. This feature requires enabling Remote Configuration.
  • Added workloadmeta remote process collector to collect process metadata from the Process-Agent and store it in the core agent.
  • Added new parameter workloadmeta.remote_process_collector.enabled to enable the workloadmeta remote process collector.
  • Added a new tag collector to datadog.agent.workloadmeta_remote_client_errors.
  • APM: Added support for obfuscating all Redis command arguments. For any Redis command, all arguments will be replaced by a single "?". Configurable using config variable apm_config.obfuscation.redis.remove_all_args and environment variable DD_APM_OBFUSCATION_REDIS_REMOVE_ALL_ARGS. Both accept a boolean value with default value false.
  • Added an experimental setting process_config.language_detection.enabled. This enables detecting languages for processes. This feature is WIP.
  • Added an experimental gRPC server to process-agent in order to expose process entities with their detected language. This feature is WIP and controlled through the process_config.language_detection.enabled setting.
  • The Agent now sends its configuration to Datadog by default to be displayed in the Agent Configuration section of the host detail panel. See https://docs.datadoghq.com/infrastructure/list/#agent-configuration for more information. The Agent configuration is scrubbed of any sensitive information and only contains configuration you’ve set using the configuration file or environment variables. To disable this feature set inventories_configuration_enabled to false.
  • The Windows installer can now send a report to Datadog in case of installation failure.
  • The Windows installer can now send APM telemetry.
  • Add support for Oracle Autonomous Database (Oracle Cloud Infrastructure).
  • Add shared memory (a.k.a. system global area - SGA) metric for Oracle databases: oracle.shared_memory.size
  • With this release, remote_config.enabled is set to true by default in the Agent configuration file. This causes the Agent to request configuration updates from the Datadog site.
  • To receive configurations from Datadog, you still need to enable Remote Configuration at the organization level and enable Remote Configuration capability on your API Key from the Datadog application. If you don't want the Agent to request configurations from Datadog, set remote_config.enabled to false in the Agent configuration file.
  • DD_SERVICE_MAPPING can be used to rename Serverless inferred spans' service names.
  • Adds a new agent command stream-event-platform to stream the event platform payloads being generated by the agent. This will help diagnose issues with payload generation, and should ease validation of payload changes.
  • Enhancement Notes:
  • Add two new initContainer metrics to the Kubernetes State Core check: kubernetes_state.initcontainer.waiting and kubernetes_state.initcontainer.restarts.
  • Add the following sysmetrics to improve DBA/SRE/SE perspective:
  • avg_synchronous_single_block_read_latency, active_background_on_cpu, active_background, branch_node_splits, consistent_read_changes, consistent_read_gets, active_sessions_on_cpu, os_load, database_cpu_time_ratio, db_block_changes, db_block_gets, dbwr_checkpoints, enqueue_deadlocks, execute_without_parse, gc_current_block_received, gc_average_cr_get_time, gc_average_current_get_time, hard_parses, host_cpu_utilization, leaf_nodes_splits, logical_reads, network_traffic_volume, pga_cache_hit, parse_failures, physical_read_bytes, physical_read_io_requests, physical_read_total_io_requests, physical_reads_direct_lobs, physical_read_total_bytes, physical_reads_direct, physical_write_bytes, physical_write_io_requests, physical_write_total_bytes, physical_write_total_io_requests, physical_writes_direct_lobs, physical_writes_direct, process_limit, redo_allocation_hit_ratio, redo_generated, redo_writes, row_cache_hit_ratio, soft_parse_ratio, total_parse_count, user_commits
  • Pause containers from the new Kubernetes community registry (registry.k8s.io/pause) are now excluded by default for containers and metrics collection.
  • [corechecks/snmp] Add forced type rate as an alternative to counter.
  • [corechecks/snmp] Add symbol level metric_type for table metrics.
  • Adds support for including the span.kind tag in APM stats aggregations.
  • Allow ad_identifiers to be used in file based logs integration configs in order to collect logs from disk.
  • Agents are now built with Go 1.20.5
  • Agents are now built with Go 1.20.6. This version of Golang fixes CVE-2023-29406.
  • Improve error handling in External Metrics query logic by running queries with errors individually with retry and backoff, and batching only queries without errors.
  • CPU metadata is now collected without running the sysctl binary on Darwin.
  • Memory metadata is now collected without running the sysctl binary on Darwin.
  • Always send the swap size value in metadata as an integer in kilobytes.
  • Platform metadata is now collected without running the uname binary on Linux and Darwin.
  • Add new metrics for resource aggregation to the Kubernetes State Core check:
  • kubernetes_state.node.<cpu|memory>_capacity.total
  • kubernetes_state.node.<cpu|memory>_allocatable.total
  • kubernetes_state.container.<cpu|memory>_requested.total
  • kubernetes_state.container.<cpu|memory>_limit.total
  • The kube node name is now reported as a host tag kube_node
  • [pkg/netflow] Collect flow_process_nf_errors_count metric from goflow2.
  • APM: Bind apm_config.obfuscation.* parameters to new obfuscation environment variables. In particular, bind:
  • apm_config.obfuscation.elasticsearch.enabled to DD_APM_OBFUSCATION_ELASTICSEARCH_ENABLED: It accepts a boolean value with default value false.
  • apm_config.obfuscation.elasticsearch.keep_values to DD_APM_OBFUSCATION_ELASTICSEARCH_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
  • apm_config.obfuscation.elasticsearch.obfuscate_sql_values to DD_APM_OBFUSCATION_ELASTICSEARCH_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].
  • apm_config.obfuscation.http.remove_paths_with_digits to DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS: It accepts a boolean value with default value false.
  • apm_config.obfuscation.http.remove_query_string to DD_APM_OBFUSCATION_HTTP_REMOVE_QUERY_STRING: It accepts a boolean value with default value false.
  • apm_config.obfuscation.memcached.enabled to DD_APM_OBFUSCATION_MEMCACHED_ENABLED: It accepts a boolean value with default value false.
  • apm_config.obfuscation.mongodb.enabled to DD_APM_OBFUSCATION_MONGODB_ENABLED: It accepts a boolean value with default value false.
  • apm_config.obfuscation.mongodb.keep_values to DD_APM_OBFUSCATION_MONGODB_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
  • apm_config.obfuscation.mongodb.obfuscate_sql_values to DD_APM_OBFUSCATION_MONGODB_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].
  • apm_config.obfuscation.redis.enabled to DD_APM_OBFUSCATION_REDIS_ENABLED: It accepts a boolean value with default value false.
  • apm_config.obfuscation.remove_stack_traces to DD_APM_OBFUSCATION_REMOVE_STACK_TRACES: It accepts a boolean value with default value false.
  • apm_config.obfuscation.sql_exec_plan.enabled to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_ENABLED: It accepts a boolean value with default value false.
  • apm_config.obfuscation.sql_exec_plan.keep_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
  • apm_config.obfuscation.sql_exec_plan.obfuscate_sql_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].
  • apm_config.obfuscation.sql_exec_plan_normalize.enabled to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_ENABLED: It accepts a boolean value with default value false.
  • apm_config.obfuscation.sql_exec_plan_normalize.keep_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
  • apm_config.obfuscation.sql_exec_plan_normalize.obfuscate_sql_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_OBFUSCATE_SQL_VALUES It accepts a ...
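
The environment variable bindings listed above accept either a boolean or a list of strings written like ["id1", "id2"]. The sketch below shows one way such values could be read, assuming the list is JSON-formatted and that only the literal string "true" enables a boolean. Both parsing rules and all function names are assumptions for illustration, not the Agent's actual parser.

```python
import json


def parse_bool(raw, default=False):
    """Interpret an environment value as a boolean (assumed rule: only "true" is True)."""
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def parse_string_list(raw):
    """Parse a value such as '["id1", "id2"]' into a list of strings (assumed JSON)."""
    if not raw:
        return []
    values = json.loads(raw)
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError("expected a list of strings")
    return values


def read_obfuscation_settings(env):
    """Map two of the documented variables onto their apm_config.obfuscation.* keys."""
    return {
        "apm_config.obfuscation.elasticsearch.enabled": parse_bool(
            env.get("DD_APM_OBFUSCATION_ELASTICSEARCH_ENABLED")
        ),
        "apm_config.obfuscation.elasticsearch.keep_values": parse_string_list(
            env.get("DD_APM_OBFUSCATION_ELASTICSEARCH_KEEP_VALUES")
        ),
    }
```

Passing `os.environ` as `env` reads the live environment; passing a plain dict keeps the function easy to test.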

New in Datadog Agent Manager 7.45.1 (Jun 27, 2023)

  • Security Notes:
  • Bump ncurses to 6.4 in the Agent embedded environment. Fixes CVE-2023-29491.
  • Updated the version of OpenSSL used by Python to 1.1.1u; addressed CVE-2023-2650, CVE-2023-0466, CVE-2023-0465 and CVE-2023-0464.

New in Datadog Agent Manager 7.45.0 (Jun 6, 2023)

  • New Features:
  • Add Topology data collection with CDP.
  • APM: Addition of configuration to add peer.service to trace stats exported by the Agent.
  • APM: Addition of configuration to compute trace stats on spans based on their span.kind value.
  • APM: Added a new endpoint in the trace-agent API /symdb/v1/input that acts as a reverse proxy forwarding requests to Datadog. The feature using this is currently in development.
  • Add support for confluent-kafka.
  • Add support for XCCDF benchmarks in CSPM. A new configuration option, 'compliance_config.xccdf.enabled', disabled by default, has been added for enabling XCCDF benchmarks.
  • Add arguments to module load events
  • Oracle DBM monitoring with activity sampling. The collected samples form the foundation for database load profiling. With Datadog GUI, samples can be aggregated and filtered to identify bottlenecks.
  • Add reporting of container.{cpu|memory|io}.partial_stall metrics based on PSI Some values when host is running with cgroupv2 enabled (Linux only). This metric provides the wall time (in nanoseconds) during which at least one task in the container has been stalled on the given resource.
  • Adding a new option secret_backend_remove_trailing_line_break to remove trailing line breaks from secrets returned by secret_backend_command. This makes it easier to use secret management tools that automatically add a line break when exporting secrets through files.
  • Enhancement Notes:
  • Cluster Agent: User config, cluster Agent deployment and node Agent daemonset manifests are now added to the flare archive, when the Cluster Agent is deployed with Helm (version 3.23.0+).
  • Datadog Agent running as a systemd service can optionally read environment variables from a text file /etc/datadog-agent/environment containing newline-separated variable assignments. See https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment
  • Add ability to filter kubernetes containers based on autodiscovery annotation. Containers in a pod can now be omitted by setting ad.datadoghq.com/<container_name>.exclude as an annotation on the pod. Logs can now be omitted by setting ad.datadoghq.com/<container_name>.logs_exclude as an annotation on the pod.
  • Added support for custom resource definitions metrics: crd.count and crd.condition.
  • Remove BadgerDB cache for Trivy.
  • Add new custom LRU cache for Trivy backed by BoltDB and parametrized by:
  • Periodically delete unused entries from the custom cache.
  • Add telemetry metrics to monitor the cache:
  • sbom.cached_keys: Number of cache keys stored in memory
  • sbom.cache_disk_size: Total size, in bytes, of the database as reported by BoltDB.
  • sbom.cached_objects_size: Total size, in bytes, of cached SBOM objects on disk. Limited by sbom.custom_cache_max_disk_size.
  • sbom.cache_hits_total: Total number of cache hits.
  • sbom.cache_misses_total: Total number of cache misses.
  • sbom.cache_evicts_total: Total number of cache evicts.
  • Added DD_ENV to the SBOMPayload in the SBOM check.
  • Added kubernetes_state.hpa.status_target_metric and kubernetes_state.deployment.replicas_ready metrics part of the kubernetes_state_core check.
  • Add support for emitting resources on metrics from tags in the format dd.internal.resource:type,name.
  • APM: Dynamic instrumentation logs and snapshots can now be shipped to multiple Datadog logs intakes.
  • Adds support for OpenTelemetry span links to the Trace Agent OTLP endpoint when converting OTLP spans (span links are added as metadata to the converted span).
  • Agents are now built with Go 1.19.9.
  • Make Podman DB path configurable for rootless environment. Now we can set $HOME/.local/share/containers/storage/libpod/bolt_state.db.
  • Add ownership information for containers to the container-lifecycle check.
  • Add Pod exit timestamp to container-lifecycle check.
  • The Agent now uses the ec2_metadata_timeout value when fetching EC2 instance tags with AWS SDK. The Agent fetches instance tags when collect_ec2_tags is set to true.
  • Upgraded JMXFetch to 0.47.8 which has improvements aimed to help large metric collections drop fewer payloads.
  • Kubernetes State Metrics Core: Adds collection of Kubernetes APIServices metrics
  • Add support for URLs with the http|https scheme in the dd_url or logs_dd_url parameters when configuring endpoints. Also automatically detects SSL needs, based on the scheme when it is present.
  • [pkg/netflow] Add NetFlow Exporter to NDM Metadata.
  • SUSE RPMs are now built with RPM 4.14.3 and have SHA256 digest headers.
  • observability_pipelines_worker can now be used in place of the vector config options.
  • Add an option and an annotation to skip kube_service tags on Kubernetes pods.
  • When the selector of a service matches a pod and that pod is ready, its metrics are decorated with a kube_service tag.
  • When the readiness of a pod flips, so does the kube_service tag. This could create visual artifacts (spikes when the tag flips) on dashboards where the queries are missing .fill(null).
  • If many services target a pod, the total number of tags attached to its metrics might exceed a limit that causes the whole metric to be discarded.
  • In order to mitigate these two issues, it’s now possible to set the kubernetes_ad_tags_disabled parameter to a list containing kube_service to globally remove the kube_service tags on all pods.
  • It’s also possible to add a tags.datadoghq.com/disable: kube_service annotation on only the pods for which we want to remove the kube_service tag.
  • Note that kube_service is the only tag that can be removed via this parameter and this annotation.
  • Support OTel semconv 1.17.0 in OTLP ingest endpoint.
  • When otlp_config.metrics.histograms.send_aggregation_metrics is set to true, the OTLP ingest pipeline will now send min and max metrics for delta OTLP Histograms and OTLP Exponential Histograms when available, in addition to count and sum metrics.
  • The deprecated option otlp_config.metrics.histograms.send_count_sum_metrics now also sends min and max metrics when available.
  • OTLP: Use minimum and maximum values from cumulative OTLP Histograms. Values are used only when we can assume they are from the last time window or otherwise to clamp estimates.
  • The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.75.0.
  • Secrets with ENC[] notation are now supported for proxy setting from environment variables. For more information, see the Secrets Management (https://docs.datadoghq.com/agent/guide/secrets-management/) and Agent Proxy Configuration (https://docs.datadoghq.com/agent/proxy/) documentation.
  • [corechecks/snmp] Adds ability to send constant metrics in SNMP profiles.
  • [corechecks/snmp] Adds ability to map metric tag value to string in SNMP profiles.
  • [corechecks/snmp] Add support to format bytes into ip_address
  • Deprecation Notes:
  • APM OTLP: Field UsePreviewHostnameLogic is deprecated, and usage of this field has been removed. This is done in preparation to graduate the exporter.datadog.hostname.preview feature gate to stable.
  • The Windows Installer NPM feature option, used in ADDLOCAL=NPM and REMOVE=NPM, no longer controls the install state of NPM components. The NPM components are now always installed, but will only run when enabled in the agent configuration. The Windows Installer NPM feature option still exists for backwards compatibility purposes, but has no effect.
  • Deprecate otlp_config.metrics.histograms.send_count_sum_metrics in favor of otlp_config.metrics.histograms.send_aggregation_metrics.
  • Removed the --info flag in the Process Agent, which has been replaced by the status command since 7.35.
  • Security Notes:
  • Handle the return value of Close() for writable files in pkg/forwarder
  • Fixes CWE-703. Handle the return value of Close() for writable files and force writes to disk in system-probe
  • Bug Fixes:
  • APM: Setting apm_config.receiver_port: 0 now allows enabling UNIX Socket or Windows Pipes listeners.
  • APM: OTLP: Ensure that container tags are set globally on the payload so that they can be picked up as primary tags in the app.
  • APM: Fixes a bug with how stats are calculated when using single span sampling along with other sampling configurations.
  • APM: Fixed the issue where not all trace stats are flushed on trace-agent shutdown.
  • Fix an issue on the pod collection where the cluster name would not be consistently RFC1123 compliant.
  • Make the agent able to detect it is runn...
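
The enhancement in this release that accepts http|https schemes in dd_url and logs_dd_url, and derives the SSL setting from the scheme, can be sketched as follows. This is an illustrative sketch only. The function `split_endpoint`, its return shape, and the example hostnames in the usage note are hypothetical; when no scheme is present it returns None so that other settings can decide.

```python
def split_endpoint(endpoint):
    """Return (address, use_ssl); use_ssl is None when the endpoint has no scheme."""
    if "://" not in endpoint:
        # No scheme given: leave the SSL decision to other configuration.
        return endpoint, None
    scheme, address = endpoint.split("://", 1)
    scheme = scheme.lower()
    if scheme == "https":
        return address, True
    if scheme == "http":
        return address, False
    raise ValueError(f"unsupported scheme: {scheme!r}")
```

For example, `split_endpoint("https://logs.example.com:443")` yields the address without the scheme and `True` for SSL, while `split_endpoint("logs.example.com:10516")` leaves the SSL choice undecided.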

New in Datadog Agent Manager 7.44.0 (Apr 27, 2023)

  • New Features:
  • Added HTTP/2 parsing logic to Universal Service Monitoring.
  • Adding Universal Service Monitoring to the Agent status check. Now Datadog has visibility into the status of Universal Service Monitoring. Startup failures appear in the status check.
  • In the agent.log, a DEBUG, WARN, and ERROR log have been added to report how many file handles the core Agent process has open. The DEBUG log reports the info, the WARN log appears when the core Agent is over 90% of the OS file limit, and the ERROR log appears when the core Agent has reached 100% of the OS file limit. In the Agent status command, fields CoreAgentProcessOpenFiles and OSFileLimit have been added to the Logs Agent section. This feature is currently for Linux only.
  • APM: Collect trace agent startup errors and successes using instrumentation-telemetry "apm-onboarding-event" messages.
  • APM OTLP: Introduce OTLP Ingest probabilistic sampling, configurable via otlp_config.traces.probabilistic_sampler.sampling_percentage.
  • Experimental: The Datadog Admission Controller can inject the .NET APM library into Kubernetes containers for auto-instrumentation.
  • Enable CWS Security Profiles by default.
  • Support the config additional_endpoints for Data Streams monitoring.
  • Added support for collecting container image metadata when using Docker.
  • Added Kafka parsing logic to system-probe
  • Allow writing SECL rules against container creation time through the new container.created_at field, similar to the existing process.container_at field. The container creation time is also reported in the sent events.
  • [experimental] CWS generates an SBOM for any running workload on the machine.
  • [experimental] CWS events are enriched with SBOM data.
  • [experimental] CWS activity dumps are enriched with SBOM data.
  • Enable OTLP endpoint for receiving traces in the Datadog Lambda Extension.
  • On Windows, when service inference is enabled, process_context tags can now be populated by the service name in the SCM. This feature can be controlled by either the service_monitoring_config.process_service_inference.enabled config setting in the user's datadog.yaml config file, or it can be configured via the DD_SYSTEM_PROBE_PROCESS_SERVICE_INFERENCE_USE_WINDOWS_SERVICE_NAME environment variable. This setting is enabled by default.
  • Enhancement Notes:
  • Added kubernetes_state.hpa.status_target_metric and kubernetes_state.deployment.replicas_ready metrics part of the kubernetes_state_core check.
  • The status page now includes a Status render errors section to highlight errors that occurred while rendering it.
  • APM: Run the /debug/* endpoints in a separate server which uses port 5012 by default and only listens on 127.0.0.1. The port is configurable through apm_config.debug.port and DD_APM_DEBUG_PORT; set it to 0 to disable the server.
  • Scrub the content served by the expvar endpoint.
  • APM: apm_config.features is now configurable from the Agent configuration file. It was previously only configurable via DD_APM_FEATURES.
  • Agents are now built with Go 1.19.7.
  • The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.71.0.
  • Collect Kubernetes Pod conditions.
  • Added the "availability-zone" tag to the Fargate integration. This matches the tag emitted by other AWS infrastructure integrations.
  • Allow reporting all gathered data in case of partial failure of container metrics retrieval.
  • Upgraded JMXFetch to 0.47.8 which has improvements aimed to help large metric collections drop fewer payloads.
  • JMXFetch upgraded to 0.47.5 which now supports pulling metrics from javax.management.openmbean.TabularDataSupport. Also contains a fix for pulling metrics from javax.management.openmbean.TabularDataSupport when no tags are specified.
  • Updated chunking util and use cases to use generics. No behavior change.
  • [corechecks/snmp] Add interface_configs to override interface speed.
  • No longer increments TCP retransmit count when the retransmit fails.
  • The OTLP ingestion endpoint now supports the same settings and protocols as the OpenTelemetry Collector OTLP receiver v0.70.0.
  • Changes the retry mechanism of starting workloadmeta collectors so that instead of retrying every 30 seconds, it retries following an exponential backoff with initial interval of 1s and max of 30s. In general, this should help collectors that failed on the first try start sooner.
  • Added the "pull_duration" metric in the workloadmeta telemetry. It measures the time that it takes to pull from the collectors.
  • Deprecation Notes:
  • Marked the "availability_zone" tag as deprecated for the Fargate integration, in favor of "availability-zone".
  • Configuration enable_sketch_stream_payload_serialization is now deprecated.
  • Security Notes:
  • The Agent now checks containerd containers Spec size before parsing it. Any Spec exceeding 2MB will not be parsed and a warning will be emitted. This impacts the container_env_as_tags feature and %%hostname%% variable resolution for environments based on containerd outside of Kubernetes.
  • Bug Fixes:
  • APM: Fix issue where dogstatsd proxy would not work when bind address was set to localhost on macOS.
  • APM: Fix issue where setting bind_host to "::1" would break runtime metrics for the trace-agent.
  • APM: Trace Agent not printing critical init errors.
  • Fixes a bug where ignored container files (that were not tailed) were incorrectly counted against the total open files.
  • Fixes the configuration parsing of the "container_lifecycle" check. Custom config values were not being applied.
  • Corrects dogstatsd metric message validation to support all current (and some future) dogstatsd features
  • Avoid panic in kubernetes_state_core check with specific Ingress objects configuration.
  • Fixes a divide-by-zero panic when sketch serialization fails on the last metric of a given batch
  • Fix issue introduced in 7.43 that prevents the Datadog Agent Manager application from executing from the checkbox at the end of the Datadog Agent installation when the installer is run by a non-elevated administrator user.
  • Fixes a problem with USM and IIS on Windows Server 2022 due to a change in the way Microsoft reports IIS connections.
  • Fixes the labelsAsTags parameter of the kube-state metrics core check. Tags were not properly formatted when they came from a label on one resource type (for example, namespace) and turned into a tag on another resource type (for example, pod).
  • The OTLP ingest endpoint does not report the first cumulative monotonic sum value if the start timestamp of the timeseries matches its timestamp.
  • Prevent disallowlisting on empty command line for processes in the Process Agent when encountering a failure to parse; use the exe value instead.
  • Make SNMP Listener support all authProtocol.
  • Fix an issue where agent status would show incorrect system-probe status for 15 seconds as the system-probe started up.
  • Fix partial loss of NAT info in system-probe for pre-existing connections.
  • Replace ; with & in the URL to open GUI to follow golang.org/issue/25192.
  • Workloadmeta now avoids concurrent pulls from the same collector. This bug could lead to incorrect or missing data when the collectors were too slow pulling data.
  • Fixes a bug that prevents the containerd workloadmeta collector from starting sometimes when container_image_collection.metadata.enabled is set to true.
  • Fixed a bug in the SBOM collection feature. In certain cases, some SBOMs were not collected.
  • Other Notes:
  • Datadog Cluster Agent:
  • New Features:
  • Add conditions to Vertical Pod Autoscalers
  • Experimental - Support Ruby library injection through the Admission Controller on Kubernetes.
  • Enhancement Notes:
  • Add new metrics for the KSM Core check for extended resources:
  • Pod requests and limits of the network bandwidth extended resource: kubernetes_state.container.network_bandwidth_limit, kubernetes_state.container.network_bandwidth_requested
  • The capacity and allocatable network bandwidth extended resource of a node: kubernetes_state.node.network_bandwidth_allocatable, kubernetes_state.node.network_bandwidth_capacity
  • Admission Controller - Add telemetry around auto-instrumentation via remote config.
  • The UDS socket volume when using the Admission Controller is now mounted in readOnly mode.
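
The workloadmeta retry change in this release (exponential backoff with an initial interval of 1s and a maximum of 30s) produces a delay schedule like the one sketched below. The doubling factor and the absence of jitter are assumptions, since the changelog states only the initial and maximum intervals. The generator name `backoff_delays` is hypothetical.

```python
import itertools


def backoff_delays(initial=1.0, maximum=30.0, factor=2.0):
    """Yield retry delays in seconds: initial, initial*factor, ... capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        # Grow the delay for the next attempt, never exceeding the cap.
        delay = min(delay * factor, maximum)


# First seven delays under the assumed doubling factor.
first_delays = list(itertools.islice(backoff_delays(), 7))
```

Under these assumptions a collector that fails once is retried after 1 second rather than 30, and the wait only reaches the 30-second ceiling after several consecutive failures.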

New in Datadog Agent Manager 7.43.1 (Mar 7, 2023)

  • Enhancement Notes:
  • Agents are now built with Go 1.19.6.

New in Datadog Agent Manager 7.43.0 (Feb 23, 2023)

  • Agent:
  • New Features:
  • NDM: Add snmp.device.reachable/unreachable metrics to all monitored devices.
  • Add a new container_image long running check to collect information about container images.
  • Enable orchestrator manifest collection by default.
  • Add a new sbom core check to collect the software bill of materials of containers.
  • The Agent now leverages DMI (Desktop Management Interface) information on Unix to get the instance ID on Amazon EC2 when the metadata endpoint fails or is not accessible. The instance ID is exposed through DMI only on AWS Nitro instances. This will not change the hostname of the Agent upon upgrading, but will add it to the list of host aliases.
  • Adds the option to collect and store in workloadmeta the software bill of materials (SBOM) of containerd images using Trivy. This feature is disabled by default. It can be enabled by setting container_image_collection.sbom.enabled to true. Note: This feature is CPU and IO intensive.
  • Enhancement Notes:
  • Adds a new snmp.interface_status metric reflecting the same status as within NDM.
  • APM: Ported a faster implementation of NormalizeTag with a fast-path for already normalized ASCII tags. Should marginally improve CPU usage of the trace-agent.
  • The external metrics server now automatically adjusts the query time window based on the Datadog metrics MaxAge attribute.
  • Added parity to Unix-based permissions.log Flare file on Windows. The permissions.log file lists the original rights/ACL of the files copied into an Agent flare. This will ease troubleshooting permissions issues.
  • [corechecks/snmp] Add id and source_type to NDM Topology Links
  • Add an --instance-filter option to the Agent check command.
  • APM: Disable max_memory and max_cpu_percent by default in containerized environments (Docker-only, ECS and CI). Users rely on the orchestrator / container runtime to set resource limits. Note: max_memory and max_cpu_percent have been disabled by default in Kubernetes environments since Agent 7.18.0.
  • Agents are now built with Go 1.19.5.
  • To reduce "cluster-agent" memory consumption when the cluster_agent.collect_kubernetes_tags option is enabled, we introduce the cluster_agent.kubernetes_resources_collection.pod_annotations_exclude option to exclude Pod annotations from the extracted Pod metadata.
  • Introduce a new option enabled_rfc1123_compliant_cluster_name_tag that enforces the kube_cluster_name tag value to be an RFC1123 compliant cluster name. It can be disabled by setting this new option to false.
  • Allows profiling for the Process Agent to be dynamically enabled from the CLI with process-agent config set internal_profiling. Optionally, once profiling is enabled, block, mutex, and goroutine profiling can also be enabled with process-agent config set runtime_block_profile_rate, process-agent config set runtime_mutex_profile_fraction, and process-agent config set internal_profiling_goroutines.
  • Adds a new process discovery hint in the process agent when the regular process and container checks run.
  • Added new telemetry metrics (pymem.*) to track Python heap usage.
  • There are two default config files. Optionally, you can provide override config files. The change in this release is that for both sets, if the first config is inaccessible, the security agent startup process fails. Previously, the security agent would continue to attempt to start up even if the first config file is inaccessible. To illustrate this, in the default case, the config files are datadog.yaml and security-agent.yaml, and in that order. If datadog.yaml is inaccessible, the security agent fails immediately. If you provide overrides, like foo.yaml and bar.yaml, the security agent fails immediately if foo.yaml is inaccessible. In both sets, if any additional config files are missing, the security agent continues to attempt to start up, with a log message about an inaccessible config file. This is not a change from previous behavior.
  • [corechecks/snmp] Add IP Addresses to NDM Metadata interfaces
  • [corechecks/snmp] Add LLDP remote device IP address.
  • prometheus_scrape: Adds support for tag_by_endpoint and collect_counters_with_distributions in the prometheus_scrape.checks[].configurations[] items.
  • The OTLP ingest endpoint now supports the same settings and protocols as the OpenTelemetry Collector OTLP receiver v0.68.0.
  • Deprecation Notes:
  • The command line arguments to the Datadog Agent Manager for Windows ddtray.exe have changed from single-dash arguments to double-dash arguments. For example, -launch-gui must now be provided as --launch-gui.
  • system_probe_config.enable_go_tls_support is deprecated and replaced by service_monitoring_config.enable_go_tls_support.
  • Security Notes:
  • Some HTTP requests sent by the Datadog Agent to Datadog endpoints included the Datadog API key in the query parameters (in the URL). This meant that the keys could potentially have been logged in various locations, for example, in the logs of a forward or reverse proxy server that the Agent connected to. We have updated all requests to not send the API key as a query parameter. Anyone who uses a proxy to connect the Agent to Datadog endpoints should make sure their proxy forwards all Datadog headers (particularly DD-Api-Key). Failure to forward all Datadog headers could cause payloads to be rejected by our endpoints.
  • Bug Fixes:
  • The secret command now correctly displays the ACL on a path with spaces.
  • APM: Lower default incoming trace payload limit to 25MB. This more closely aligns with the backend limit. Some users may see traces rejected by the Agent that the Agent would have previously accepted, but would have subsequently been rejected by the trace intake. The Agent limit can still be configured via apm_config.max_payload_size.
  • APM: Fix the trace-agent -info command when remote configuration is enabled.
  • APM: Fix parsing of SQL Server identifiers enclosed in square brackets.
  • Remove files created by system-probe at uninstall time.
  • Fix the kubernetes_state_core check so that the host alias name creation uses a normalized (RFC1123 compliant) cluster name.
  • Fix an issue in Autodiscovery that could prevent Cluster Checks containing secrets (ENC[] syntax) from being unscheduled properly.
  • Fix panic due to uninitialized Obfuscator logger
  • On Windows, fixes bug in which HTTP connections were not properly accounted for when the client and server were the same host (loopback).
  • The Openmetrics check is no longer scheduled for Kubernetes headless services.
  • Other Notes:
  • Upgrade of the cgosymbolizer dependency to use github.com/ianlancetaylor/cgosymbolizer.
  • The Datadog Agent Manager ddtray.exe now requires admin to launch.
  • Datadog Cluster Agent:
  • New Features:
  • Starts collecting Vertical Pod Autoscalers within Kubernetes clusters.
  • Enable orchestrator manifest collection by default
  • Bug Fixes:
  • Make the cluster-agent admission controller able to inject libraries for several languages in a single pod.
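
The Security Notes entry above describes moving the API key out of the URL and into the DD-Api-Key header. The following is a minimal sketch of that idea, not the Agent's implementation: the function name move_api_key_to_header, the api_key query parameter name, and the example URL are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def move_api_key_to_header(url, headers=None):
    """Return (clean_url, headers) with the key moved from the query string to a header.

    Hypothetical helper: the 'api_key' parameter name is an assumption.
    """
    parts = urlsplit(url)
    new_headers = dict(headers or {})
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name == "api_key":
            # The key travels in a header, so it no longer appears in URL logs.
            new_headers["DD-Api-Key"] = value
        else:
            kept.append((name, value))
    clean_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean_url, new_headers


if __name__ == "__main__":
    url, headers = move_api_key_to_header(
        "https://intake.example.com/api/v1/series?api_key=abc123&foo=bar"
    )
    print(url)
    print(sorted(headers))
```

A proxy sitting between the Agent and the intake then has to pass the DD-Api-Key header through unchanged, which is the requirement the note calls out.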

New in Datadog Agent Manager 7.42.2 (Feb 16, 2023)

  • Please refer to the 7.42.2 tag on integrations-core for the list of changes on the Core Checks

New in Datadog Agent Manager 7.42.1 (Feb 2, 2023)

  • Datadog_checks_base 28.0.2
  • Snowflake 4.4.6

New in Datadog Agent Manager 7.42.0 (Jan 23, 2023)

  • Upgrade Notes:
  • Downloading and installing official checks with agent integration install is no longer supported for Agent installations that do not include an embedded python3.
  • New Features:
  • Adding the kube_api_version tag to all orchestrator resources.
  • Kubernetes Pod events generated by the kubernetes_apiserver can now benefit from the new cluster-tagger component in the Cluster-Agent.
  • APM OTLP: Added compatibility for the OpenTelemetry Collector's datadogprocessor to the OTLP Ingest.
  • The CWS agent now supports rules on mount events.
  • Adding a configuration option, exclude_ec2_tags, to exclude EC2 instance tags from being converted into host tags.
  • Adds detection for a process being executed directly from memory without the binary present on disk.
  • Introducing agent sampling rates remote configuration.
  • Adds support for secret_backend_command_sha256 SHA for the secret_backend_command executable. If secret_backend_command_sha256 is used, the following restrictions are in place:
  • Value specified in the secret_backend_command setting must be an absolute path.
  • Permissions for the datadog.yaml config file must disallow write access by users other than ddagentuser or Administrators on Windows or the user running the Agent on Linux and macOS. The agent will refuse to start if the actual SHA256 of the secret_backend_command executable is different from the one specified by secret_backend_command_sha256. The secret_backend_command file is locked during verification of SHA256 and subsequent run of the secret backend executable.
  • Collect network devices topology metadata.
  • Add support for AWS Lambda Telemetry API
  • Adds three new metrics collected by the Lambda Extension
  • `aws.lambda.enhanced.response_latency`: Measures the elapsed time in milliseconds from when the invocation request is received to when the first byte of response is sent to the client.
  • `aws.lambda.enhanced.response_duration`: Measures the elapsed time in milliseconds between sending the first byte of the response to the client and sending the last byte of the response to the client.
  • `aws.lambda.enhanced.produced_bytes`: Measures the number of bytes returned by a function.
  • Create cold start span representing time and duration of initialization of an AWS Lambda function.
  • Enhancement Notes:
  • Adds both the StartTime and ScheduledTime properties in the collector for Kubernetes pods.
  • Add an option (hostname_trust_uts_namespace) to force the Agent to trust the hostname value retrieved from non-root UTS namespaces (Linux only).
  • Metrics from Giant Swarm pause containers are now excluded by default.
  • Events emitted by the Helm check now have "Error" status when the release fails.
  • Add an annotations_as_tags parameter to the kubernetes_state_core check to allow attaching Kubernetes annotations as Datadog tags in a similar way that the labels_as_tags parameter does.
  • Adds the windows_counter_init_failure_limit option. This option limits the number of times a check will attempt to initialize a performance counter before ceasing attempts to initialize the counter.
  • [netflow] Expose collector metrics (from goflow) as Datadog metrics
  • [netflow] Add prometheus listener to expose goflow telemetry
  • OTLP ingest now uses the minimum and maximum fields from delta OTLP Histograms and OTLP ExponentialHistograms when available.
  • The OTLP ingest endpoint now reports the first cumulative monotonic sum value if the timeseries started after the Datadog Agent process started.
  • Added the workload-list command to the process agent. It lists the entities stored in workloadmeta.
  • Allows running secrets in the Process Agent on Windows by sandboxing secret_backend_command execution to the ddagentuser account used by the Core Agent service.
  • Add process_context tag extraction based on a process's command line arguments for service monitoring. This feature is configured in the system-probe.yaml with the following configuration: service_monitoring_config.process_service_inference.enabled.
  • Reduce the overhead of using Windows Performance Counters / PDH in checks.
  • The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.64.1
  • The OTLP ingest endpoint now supports the same settings and protocols as the OpenTelemetry Collector OTLP receiver v0.66.0.
  • Deprecation Notes:
  • Removes the install-service Windows agent command.
  • Removes the remove-service Windows agent command.
  • Security Notes:
  • Upgrade the wheel package to 0.37.1 for Python 2.
  • Upgrade the wheel package to 0.38.4 for Python 3.
  • Bug Fixes:
  • APM: Fix an issue where container tags weren't working because of overwriting an essential tag on spans.
  • APM OTLP: Fix an issue where a span's local "peer.service" attribute would not override a resource attribute-level service.
  • On Windows, fixes a bug in the NPM network driver which could cause a system crash (BSOD).
  • Create only endpoints check from prometheus scrape configuration when prometheus_scrape.service.endpoint option is enabled.
  • Fix how Kubernetes events forwarding detects the Node/Host.
  • Previously Nodes' events were not always attached to the correct host.
  • Pods' events from "custom" controllers might still not be attached to a host if the controller doesn't set the host in the source.host event's field.
  • APM: Fix SQL parsing of negative numbers and improve error message.
  • Fix a potential panic when df outputs warnings or errors among its standard output.
  • Fix a bug where a misconfig error does not show when hidepid=invisible
  • The agent no longer wrongly resolves its hostname on ECS Fargate when requests to the Fargate API time out.
  • Metrics reported through OTLP ingest now have the interval property unset.
  • Fix a PDH query handle leak that occurred when a counter failed to add to a query.
  • Remove unused environment variables DD_AGENT_PY and DD_AGENT_PY_ENV from known environment variables in flare command.
  • APM: Fix SQL obfuscator parsing of identifiers containing dollar signs.
  • Other Notes:
  • JMXFetch upgraded to 0.47.2
  • Bump embedded Python3 to 3.8.16.
  • Datadog Cluster Agent:
  • New Features:
  • Supports the collection of custom resource definition and custom resource manifests for the orchestrator explorer.
  • Enhancement Notes:
  • Collects Unified Service Tags for the orchestrator explorer product.
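
The secret_backend_command_sha256 entry above describes two checks: the command path must be absolute, and the executable's actual SHA256 must match the configured value. The sketch below illustrates only those two checks under stated assumptions. The function names are hypothetical, and it does not reproduce the config-file permission checks or the file locking that the Agent performs.

```python
import hashlib
import hmac
import os


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large executables are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_secret_backend(command_path, expected_sha256):
    """Return True when the executable's SHA256 matches the expected hex digest.

    Hypothetical helper: raises ValueError for a relative path, mirroring the
    absolute-path restriction described in the release note.
    """
    if not os.path.isabs(command_path):
        raise ValueError("secret_backend_command must be an absolute path")
    actual = sha256_of_file(command_path)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A caller would refuse to run the executable when verify_secret_backend returns False, which corresponds to the Agent refusing to start on a mismatch.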

New in Datadog Agent Manager 7.38.2 (Aug 10, 2022)

  • Bug Fixes:
  • Fixes a bug that caused the agent to create many zombie (defunct) processes. This bug happened only with the docker images 7.38.x when the containerized agent was launched without hostPID: true.

New in Datadog Agent Manager 7.38.1 (Aug 3, 2022)

  • Bug Fixes:
  • Fixes CWS rules with 'process.file.name !=""' expression.

New in Datadog Agent Manager 7.38.0 (Jul 26, 2022)

  • New Features:
  • Add NetFlow feature to listen to NetFlow traffic and forward it to Datadog.
  • The CWS agent now supports filtering events depending on whether they are performed by a thread. A process is considered a thread if it's a child process that hasn't executed another program.
  • Adds a diagnose datadog-connectivity command that displays information about connectivity issues between the Agent and Datadog intake.
  • Adds support for tailing modes in the journald logs tailer.
  • The CWS agent now supports writing rules on process termination.
  • Add support for new types of CI Visibility payloads to the Trace Agent, so features that until now were Agentless-only are available as well when using the Agent.
  • Enhancement Notes:
  • Tags configured with DD_TAGS or DD_EXTRA_TAGS in an EKS Fargate environment are now attached to OTLP metrics.
  • Add NetFlow static enrichments (TCP flags, IP Protocol, EtherType, and more).
  • Report lines matched by auto multiline detection as metrics and show on the status page.
  • Add a containerd_exclude_namespaces configuration option for the Agent to ignore containers from specific containerd namespaces.
  • The log_level of the agent is now appended to the flare archive name upon its creation.
  • The metrics reported by KSM core now include the tags "kube_app_name", "kube_app_instance", and so on, if they're related to a Kubernetes entity that has a standard label like "app.kubernetes.io/name", "app.kubernetes.io/instance", etc.
  • The Kubernetes State Metrics Core check now collects two ingress metrics: kubernetes_state.ingress.count and kubernetes_state.ingress.path.
  • Move process chunking code to util package to avoid cycle import when using it in orchestrator check.
  • APM: Add support for PostgreSQL JSON operators in the SQL obfuscate package.
  • The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.54.0 (OTLP v0.18.0).
  • The Agent now embeds Python-3.8.13, an upgrade from Python-3.8.11.
  • APM: Updated Rare Sampler default configuration values to sample traces more uniformly across environments and services.
  • The OTLP ingest endpoint now supports Exponential Histograms with delta aggregation temporality.
  • The Windows installer now supports grouped Managed Service Accounts.
  • Enable https monitoring on arm64 with kernel >= 5.5.0.
  • Add otlp_config.debug.loglevel to determine log level when the OTLP Agent receives metrics/traces for debugging use cases.
  • Deprecation Notes:
  • Deprecate otlp_config.metrics.instrumentation_library_metadata_as_tags in favor of otlp_config.metrics.instrumentation_scope_metadata_as_tags.
  • Bug Fixes:
  • When enable_payloads.series or enable_payloads.sketches are set to false, don't log the error Cannot append a metric in a closed buffered channel.
  • Restrict permissions for the entrypoint executables of the Dockerfiles.
  • Revert docker.mem.in_use calculation to use RSS Memory instead of total memory.
  • Add missing telemetry metrics for HTTP log bytes sent.
  • Fix panic in container, containerd, and docker when container stats are temporarily not available
  • Fix prometheus check Metrics parsing by not enforcing a list of strings.
  • Fix potential deadlock when shutting down an Agent with a log TCP listener.
  • APM: Fixed trace rare sampler's oversampling behavior. With this fix, the rare sampler will sample rare traces more accurately.
  • Fix journald byte count on the status page.
  • APM: Fixes an issue where certain (#> and #>>) PostgreSQL JSON operators were being interpreted as comments and removed by the obfuscate package.
  • Scrubs HTTP Bearer tokens out of log output
  • Fixed the triggered "svType != tvType; key=containerd_namespace, st=[]interface {}, tt=[]string, sv=[], tv=[]" error when using a secret backend reader.
  • Fixed an issue that caused the container check to show an error in the "agent status" output when it was working properly but there were no containers deployed.
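
The bug fix above about scrubbing HTTP Bearer tokens out of log output can be pictured with a small regular-expression pass over each log line. This is an illustrative sketch only: the pattern, the function name scrub_bearer_tokens, and the replacement string are assumptions and are not taken from the Agent's scrubber.

```python
import re

# Matches "Bearer <token>", where the token uses the characters commonly
# allowed in bearer credentials, followed by optional '=' padding.
_BEARER_RE = re.compile(r"(?i)\b(Bearer)\s+[A-Za-z0-9\-._~+/]+=*")


def scrub_bearer_tokens(line, replacement="********"):
    """Replace any bearer token in a log line while keeping the 'Bearer' keyword."""
    return _BEARER_RE.sub(lambda match: f"{match.group(1)} {replacement}", line)


if __name__ == "__main__":
    print(scrub_bearer_tokens("Authorization: Bearer abc.def-123=="))
```

Keeping the keyword and replacing only the credential leaves the log line readable for debugging while removing the sensitive part.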

New in Datadog Agent Manager 7.37.1 (Jun 28, 2022)

  • Bug Fixes:
  • Fixes issue where proxy config was ignored by the trace-agent.

New in Datadog Agent Manager 7.37.0 (Jun 27, 2022)

  • Upgrade Notes:
  • OTLP ingest: Support for the deprecated experimental.otlp section and the DD_OTLP_GRPC_PORT and DD_OTLP_HTTP_PORT environment variables has been removed. Use the otlp_config section or the DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT and DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT environment variables instead.
  • OTLP: Deprecated settings otlp_config.metrics.report_quantiles and otlp_config.metrics.send_monotonic_counter have been removed in favor of otlp_config.metrics.summaries.mode and otlp_config.metrics.sums.cumulative_monotonic_mode respectively.
  • New Features:
  • Adds User-level service unit filtering support for Journald log collection via include_user_units and exclude_user_units.
  • A wildcard (*) can be used in either exclude_units or exclude_user_units if only a particular type of Journald log is desired.
  • A new troubleshooting section has been added to the Agent CLI. This section will hold helpers to understand the Agent behavior. For now, the section only has two commands to print the different metadata payloads sent by the Agent (v5 and inventory).
  • APM: Incoming OTLP traces are now allowed to set their own sampling priority.
  • Enable NPM NAT gateway lookup by default.
  • Partial support of IPv6 on EKS clusters
  • Fix the kubelet client when the IP of the host is IPv6.
  • Fix the substitution of %%host%% patterns inside the auto-discovery annotations: If the concerned pod has an IPv6 address and the %%host%% pattern appears inside a URL context, then the IPv6 address is surrounded by square brackets.
  • OTLP ingest now supports the same settings and protocol version as the OpenTelemetry Collector OTLP receiver v0.50.0.
  • The Cloud Workload Security agent can now monitor and evaluate rules on bind syscall.
  • [corechecks/snmp] add scale factor option to metric configurations
  • Evaluate memory.usage metrics based on collected metrics.
  • Enhancement Notes:
  • APM: DD_APM_FILTER_TAGS_REQUIRE and DD_APM_FILTER_TAGS_REJECT can now be a literal JSON array, for example ["someKey:someValue"]. This allows for matching tag values with the space character in them.
  • SNMP Traps are now sent to a dedicated intake via the epforwarder.
  • Update SNMP traps database to include integer enumerations.
  • The Agent now supports a single com.datadoghq.ad.checks label in Docker, containerd, and Podman containers. It merges the contents of the existing check_names, init_configs (now optional), and instances annotations into a single JSON value.
  • Add a new Agent telemetry metric autodiscovery_poll_duration (histogram) to monitor configuration poll duration in Autodiscovery.
  • APM: Added /config/set endpoint in trace-agent to change configuration settings during runtime. Supports changing log level(log_level).
  • APM: When the X-Datadog-Trace-Count contains an invalid value, an error will be issued.
  • Upgrade to Docker client 20.10, reducing the duration of docker check on Windows (requires Docker >= 20.10 on the host).
  • The Agent maintains scheduled cluster and endpoint checks when the Cluster Agent is unavailable.
  • The Cluster Agent followers now forward queries to the Cluster Agent leaders themselves. This allows a reduction in the overall number of connections to the Cluster Agent and better spreads the load between leader and forwarders.
  • The kube_namespace tag is now included in all metrics, events, and service checks generated by the Helm check.
  • Include install_info in version-history.json
  • Allow nightly builds install on non-prod repos
  • Add a kubernetes_node_annotations_as_tags parameter to use Kubernetes node annotations as host tags.
  • Add more detailed logging around leadership status failures.
  • Move the experimental SNMP Traps Listener configuration under network_devices.
  • Add support for the DNS Monitoring feature of NPM to Linux kernels older than 4.1.
  • Adds segment_name and segment_id tags to PCF containers that belong to an isolation segment.
  • Make logs agent additional_endpoints reliable by default. This can be disabled by setting is_reliable: false on the additional endpoint.
  • On Windows, if a datadog.yaml file is found during an installation or upgrade, the dialogs collecting the API Key and Site are skipped.
  • Resolve SNMP trap variables with integer enumerations to their string representation.
  • [corechecks/snmp] Add profile static_tags config
  • Report telemetry metrics about the retry queue capacity: datadog.agent.retry_queue_duration.capacity_secs, datadog.agent.retry_queue_duration.bytes_per_sec and datadog.agent.retry_queue_duration.capacity_bytes
  • Updated cloud providers to add the Instance ID as a host alias for EC2 instances, matching what other cloud providers do. This should help with correctly identifying hosts where the customer has changed the hostname to be different from the Instance ID.
  • NTP check: Include /etc/ntpd.conf and /etc/openntpd/ntpd.conf for use_local_defined_servers.
  • Kubernetes pods with short-lived containers no longer have log lines duplicated with both container tags (the stopped one and the running one) when logs are collected. This feature is enabled by default. Set logs_config.validate_pod_container_id to false to disable it.
  • Security Notes:
  • The Agent is built with Go 1.17.11.
  • Bug Fixes:
  • Updates defaults for the port and binding host of the experimental traps listener.
  • APM: The Agent is now performing rare span detection on all spans, as opposed to only dropped spans. This change will slightly reduce the number of rare spans kept unnecessarily.
  • APM OTLP: This change ensures that the ingest now standardizes certain attribute keys to their correct Datadog tag counterparts, such as: container tags, "operation.name", "service.name", etc.
  • APM: Fix a bug where the APM section of the GUI would not show up in older Internet Explorer versions on Windows.
  • Support dynamic Auth Tokens in Kubernetes v1.22+ (Bound Service Account Token Volume).
  • The %%host%% autodiscovery tag now works properly when using containerd, but only on Linux and when using IP v4 addresses.
  • Enhanced the coverage of pause-containers filtering on Containerd.
  • APM: Fix the loss of trace metric container information when large payloads need to be split.
  • Fix cri check producing no metrics when running on OpenShift / cri-o.
  • Fix missing health status from Docker containers in Live Container View.
  • Fix Agent startup failure when running as a non-privileged user (for instance, when running on OpenShift with restricted SCC).
  • Fix missing container metrics (container, containerd checks and live container view) on AWS Bottlerocket.
  • APM: Fixed an issue where "CPU threshold exceeded" logs would show the wrong user CPU usage by a factor of 100.
  • Ensures that when kubernetes_namespace_labels_as_tags is set, the namespace labels are always attached to metrics and logs, even when the pod is not ready yet.
  • Add missing support for UDPv6 receive path to NPM.
  • The agent workload-list --verbose command and the workload-list.log file in the flare no longer show containers' environment variables, except for DD_SERVICE, DD_ENV and DD_VERSION.
  • Fixed a potential deadlock in the Python check runner during agent shutdown.
  • Fixes issue where trace-agent would not report any version info.
  • The DCA and the cluster runners no longer write warning logs to /tmp.
  • Fixes an issue where the Agent would panic when trying to inspect Docker containers while the Docker daemon was unavailable or taking too long to respond.
  • Other Notes:
  • Exclude teradata on Mac agents.
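
The %%host%% fix listed under "Partial support of IPv6 on EKS clusters" above wraps an IPv6 address in square brackets when the pattern sits inside a URL. The sketch below shows that bracketing rule in isolation. The function names are hypothetical, and unlike the Agent it does not detect whether the template is a URL context: it assumes the caller only passes URL templates.

```python
import ipaddress


def host_for_url(address):
    """Return the address in the form required inside a URL authority."""
    try:
        parsed = ipaddress.ip_address(address)
    except ValueError:
        # Not an IP literal (for example a DNS name): use it unchanged.
        return address
    # IPv6 literals must be bracketed so their colons are not read as a port separator.
    return f"[{address}]" if parsed.version == 6 else address


def render_url_template(template, address):
    """Substitute %%host%% in a URL template (assumed to be a URL context)."""
    return template.replace("%%host%%", host_for_url(address))


if __name__ == "__main__":
    print(render_url_template("http://%%host%%:8080/metrics", "fd00::1"))
    print(render_url_template("http://%%host%%:8080/metrics", "10.0.0.5"))
```

Without the brackets, a URL such as http://fd00::1:8080/metrics is ambiguous, because the port cannot be distinguished from the last group of the address.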

New in Datadog Agent Manager 7.36.0 (May 24, 2022)

  • Upgrade Notes:
  • Debian packages are now built on Debian 8. Newly built DEBs are supported on Debian >= 8 and Ubuntu >= 14.
  • The OTLP endpoint will no longer enable the legacy OTLP/HTTP endpoint 0.0.0.0:55681 by default. To keep using the legacy endpoint, explicitly declare it via the otlp_config.receiver.protocols.http.endpoint configuration setting or its associated environment variable, DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT.
  • Package signing keys were rotated:
  • DEB packages are now signed with key AD9589B7, a signing subkey of key F14F620E
  • RPM packages are now signed with key FD4BF915
  • New Features:
  • Adding support for IBM cloud. The agent will now detect that we're running on IBM cloud and collect host aliases (vm name and ID).
  • Added event collection in the Helm check. The feature is disabled by default. To enable it, set the collect_events option to true.
  • Adds a service check for the Helm check. The check fails for a release when its latest revision is in "failed" state.
  • Adds a kube_qos (quality of service) tag to metrics associated with kubernetes pods and their containers.
  • CWS can now track network devices creation and load TC classifiers dynamically.
  • CWS can now track network namespaces.
  • The DNS event type was added to CWS.
  • The OTLP ingest endpoint is now considered GA for metrics.
  • Enhancement Notes:
  • Traps OIDs are now resolved to names using user-provided 'traps db' files in snmp.d/traps_db/.
  • The Agent now supports a single ad.datadoghq.com/$IDENTIFIER.checks annotation in Kubernetes Pods and Services to configure Autodiscovery checks. It merges the contents of the existing "check_names", init_configs (now optional), and instances annotations into a single JSON value.
  • The DD_URL environment variable can now be used to set the Datadog intake URL just like DD_DD_URL. If both DD_DD_URL and DD_URL are set, DD_DD_URL will be used to avoid a breaking change.
  • Added a process-agent version command, and made the output mimic the core agent.
  • Windows: Add Datadog registry to Flare.
  • Add --service flag to stream-logs command to filter streamed logs in detail.
  • Support a simple date pattern for automatic multiline detection
  • APM: The OTLP ingest stringification of non-standard Datadog values such as Arrays and KeyValues is now consistent with OpenTelemetry attribute stringification.
  • APM: Connections to upload profiles to the Datadog intake are now closed after 47 seconds of idleness. Common tracer setups send one profile every 60 seconds, which coincides with the intake's connection timeout and would occasionally lead to errors.
  • The Cluster Agent now exposes a new metric cluster_checks_configs_info. It exposes the node and the check ID as tags.
  • KSM core check: add a new kubernetes_state.cronjob.complete service check that returns the status of the most recent job for a cronjob.
  • Retry more HTTP status codes for the logs agent HTTP destination.
  • COPYRIGHT-3rdparty.csv now contains each copyright statement exactly as it is shown on the original component.
  • Adds sidecar_present and sidecar_count tags on Cloud Foundry containers that run apps with sidecar processes.
  • Agent flare now includes output from the process and container checks.
  • Add the --cfgpath parameter in the Process Agent replacing --config.
  • Add the check subcommand in the Process Agent replacing --check (-check). Only warn once if the -version flag is used.
  • Adds human readable output of process and container data in the check command for the Process Agent.
  • The Agent flare command now collects Process Agent performance profile data in the flare bundle when the --profile flag is used.
  • Deprecation Notes:
  • Deprecated process-agent --version in favor of process-agent version.
  • The logs configuration use_http and use_tcp flags have been deprecated in favor of force_use_http and force_use_tcp.
  • OTLP ingest: metrics.send_monotonic_counter has been deprecated in favor of metrics.sums.cumulative_monotonic_mode. metrics.send_monotonic_counter will be removed in v7.37.
  • OTLP ingest: metrics.report_quantiles has been deprecated in favor of metrics.summaries.mode. metrics.report_quantiles will be removed in v7.37 / v6.37.
  • Remove the unused --ddconfig (-ddconfig) parameter. Deprecate the --config (-config) parameter (show warning on usage).
  • Deprecate the --check (-check) parameter (show warning on usage).
  • Bug Fixes:
  • Bump GoSNMP to fix incomplete support of SNMP v3 INFORMs.
  • APM: OTLP: Fixes an issue where attributes from different spans were merged leading to spans containing incorrect attributes.
  • APM: OTLP: Fixed an inconsistency where the error message was left empty in cases where the "exception" event was not found. Now, the span status message is used as a fallback.
  • Fixes an issue where some data coming from the Agent when running in ECS Fargate did not have task_*, ecs_cluster_name, region, and availability_zone tags.
  • Collect the "0" value for resourceRequirements if it has been set
  • Fix a bug introduced in 7.33 that could prevent the auto-discovery variable %%port_<name>%% from being resolved properly.
  • Fix a panic in the Docker check when a failure happens early (when listing containers)
  • Fix missing docker.memory.limit (and docker.memory.in_use) on Windows
  • Fixes a conflict preventing NPM/USM and the TCP Queue Length check from being enabled at the same time.
  • Fix permission of "/readsecret.sh" script in the agent Dockerfile when executing with dd-agent user (for cluster check runners)
  • For Windows, fixes problem in upgrade wherein NPM driver is not automatically started by system probe.
  • Fix Gohai not being able to fetch network information when running on a non-English Windows (when the output of commands like ipconfig was not in English). Gohai no longer relies on system commands but uses the Golang net package instead (same as Linux hosts). This bug had the side effect of preventing network monitoring data from being linked back to the host.
  • Time-based metrics (for example, kubernetes_state.pod.age, kubernetes_state.pod.uptime) are now comparable in the Kubernetes state core check.
  • Fix a risk of panic when multiple KSM Core check instances run concurrently.
  • For Windows, includes NPM driver 1.3.2, which has a fix for a BSOD on system probe shutdown.
  • Adds new --json flag to check. process-agent check --json now outputs valid json.
  • On Windows, includes NPM driver update which fixes performance problem when host is under high connection load.
  • Previously, the Agent could not log the start or end of a check properly after the first five check runs. The Agent now can log the start and end of a check correctly.
  • Other Notes:
  • Include pre-generated trap db file in the conf.d/snmp.d/traps_db/ folder.
  • Gohai dependency has been upgraded. This brings a newer version of gopsutil and a fix when fetching network information in non-English Windows (see fixes section).
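
The Enhancement Notes entry above about the single ad.datadoghq.com/$IDENTIFIER.checks annotation says it merges the existing check_names, init_configs (now optional), and instances annotations into one JSON value. The sketch below shows one way such a merge could look. The helper name is hypothetical, and the output shape (a JSON object keyed by check name, each holding init_config and instances) is an assumption that the changelog itself does not spell out.

```python
import json


def merge_ad_annotations(check_names, instances, init_configs=None):
    """Combine the three legacy annotation lists into one JSON string.

    Hypothetical helper. Inputs are already-parsed Python lists; the legacy
    annotations pair entries by position, so all lists must be the same length.
    """
    if init_configs is None:
        # init_configs is optional: default every check to an empty init_config.
        init_configs = [{} for _ in check_names]
    if not (len(check_names) == len(instances) == len(init_configs)):
        raise ValueError("check_names, init_configs and instances must align")
    merged = {}
    for name, init_config, instance in zip(check_names, init_configs, instances):
        entry = merged.setdefault(name, {"init_config": init_config, "instances": []})
        entry["instances"].append(instance)
    return json.dumps(merged, sort_keys=True)


if __name__ == "__main__":
    print(merge_ad_annotations(["redisdb"], [{"host": "%%host%%", "port": 6379}]))
```

Grouping by check name means two legacy entries for the same check end up as two instances under one key, which keeps the merged value a single JSON object.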

New in Datadog Agent Manager 7.35.2 (May 5, 2022)

  • Bug Fixes:
  • Fix a regression impacting CSPM metering

New in Datadog Agent Manager 7.35.1 (Apr 13, 2022)

  • Bug Fixes:
  • The weak dependency of datadog-agent, datadog-iot-agent and dogstatsd deb packages on the datadog-signing-keys package has been fixed to ensure proper upgrade to version 1:1.1.0.

New in Datadog Agent Manager 7.35.0 (Apr 7, 2022)

  • Upgrade Notes:
  • Agent, Dogstatsd and IOT Agent RPMs now have proper preinstall dependencies. On AlmaLinux, Amazon Linux, CentOS, Fedora, RHEL and Rocky Linux, these are:
  • coreutils (provided by package coreutils-single on certain platforms)
  • grep
  • glibc-common
  • shadow-utils
  • On OpenSUSE and SUSE, these are:
  • coreutils
  • grep
  • glibc
  • shadow
  • APM Breaking change: The default head based sampling mechanism settings apm_config.max_traces_per_second or DD_APM_MAX_TPS, when set to 0, will be sending 0% of traces to Datadog, instead of 100% in previous Agent versions.
  • The OTLP ingest endpoint is now considered stable for traces. Its configuration is located in the top-level otlp_config section.
  • Support for the deprecated experimental.otlp section and the DD_OTLP_GRPC_PORT and DD_OTLP_HTTP_PORT environment variables will be removed in Agent 7.37. Use the otlp_config section or the DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT and DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT environment variables instead.
  • New Features:
  • The Cloud Workload Security agent can now monitor and evaluate rules on signals (kill syscall).
  • CWS now allows writing SECL rules on environment variable values.
  • The security Agent now offers a command to directly download the policy file from the API.
  • Policy can now define macros with items specified as a YAML list instead of a SECL expression, as
  • ` - my_macro: values: - value1 - value2`
  • In addition, macros and rules can now be updated in later loaded policies (default.policy is loaded first, the other policies in the folder are loaded in alphabetical order).
  • The previous macro can be modified with:
  • ` - my_macro: combine: merge values: - value3`
  • It can also be overridden with:
  • ` - my_macro: combine: override values: - my-single-value`
  • Rules can now also be disabled with:
  • ` - my_rule: disabled: true`
  • Cloud Workload Security now works on Google's Container Optimized OS LTS versions, starting from v81.
  • Allow setting variables to store states through rule actions. Action rules can now be defined as follows:
  • ` - id: my_rule expression: ... actions: - set: name: my_boolean_variable value: true - set: name: my_string_variable value: a string - set: name: my_other_variable field: process.file.name`
  • These actions will be executed when the rule is triggered by an event. Right now, only set actions can be defined. name is the name of the variable that will be set by the actions. The value for the variable can be specified by using:
  • value for a predefined value (strings, integers, booleans, arrays of strings, and arrays of integers are currently supported).
  • field for the value of an event field.
  • Variable arrays can be modified by specifying append: true.
  • Variables can be reused in rule expressions like a regular variable:
      - id: my_other_rule
        expression: |-
          open.file.path == ${my_other_variable}
  • By default, variables are global. They can be bound to a specific process by using the process scope as follows:
      - set:
          name: my_scoped_variable
          scope: process
          value: true
  • The variable can be referenced in other expressions as ${process.my_scoped_variable}. When the process dies, the variable will be automatically freed.
  • Expose additional CloudFoundry metadata in the DCA API that the PCF firehose nozzles can use to reduce the load on the CC API.
  • Added new "Helm" cluster check that collects information about the Helm releases deployed in the cluster.
  • Add the process_agent_runtime_config_dump.yaml file to the core Agent flare with process-agent runtime settings.
  • Add process-agent status output to the core Agent status command.
  • Added new process-agent status command to help with troubleshooting and for better consistency with the core Agent. This command is intended to eventually replace process-agent --info.
  • CWS rules can now be written on kernel module loading and deletion events.
  • The splice event type was added to CWS. It can be used to detect the Dirty Pipe vulnerability.
  • Add two options under a new config prefix to send logs to Vector instead of Datadog. vector.logs.enabled must be set to true, along with vector.logs.url, which should point to a Vector instance configured accordingly. This overrides the main endpoints; additional endpoints remain fully functional.
  • Adds new Windows system check, winkmem. This check reports the top users of paged and non-paged memory in the Windows kernel.
  • Enhancement Notes:
  • Add support for the device_namespace tag in SNMP Traps.
  • SNMP Trap Listener now also supports protocol versions 1 and 3 on top of the existing v2 support.
  • The cluster agent has an external metrics provider feature to allow using Datadog queries in Kubernetes HorizontalPodAutoscalers. It sometimes faces issues like:
  • To mitigate this problem, use the new external_metrics_provider.chunk_size parameter to reduce the number of queries that are batched by the Agent and sent together to Datadog.
  • Added a new implementation of the containerd check based on the container check. Several metrics are not emitted anymore: containerd.mem.current.max, containerd.mem.kernel.limit, containerd.mem.kernel.max, containerd.mem.kernel.failcnt, containerd.mem.swap.limit, containerd.mem.swap.max, containerd.mem.swap.failcnt, containerd.hugetlb.max, containerd.hugetlb.failcount, containerd.hugetlb.usage, containerd.mem.rsshuge, containerd.mem.dirty, containerd.blkio.merged_recursive, containerd.blkio.queued_recursive, containerd.blkio.sectors_recursive, containerd.blkio.service_recursive_bytes, containerd.blkio.time_recursive, containerd.blkio.serviced_recursive, containerd.blkio.wait_time_recursive, containerd.blkio.service_time_recursive. The containerd.image.size now reports all images present on the host, container tags are removed.
  • Migrate the cri check to generic check infrastructure. No changes expected in metrics.
  • Tags configured with DD_TAGS or DD_EXTRA_TAGS in an ECS Fargate or EKS Fargate environment are now attached to Dogstatsd metrics.
  • Added a new implementation of the docker check based on the container check. Metrics produced do not change. Added the capability to run the docker check on Linux without access to /sys or /proc, although with a limited number of metrics.
  • The DogstatsD protocol now supports a new field that contains the client's container ID. This allows enriching DogstatsD metrics with container tags.
  • When ec2_collect_tags is enabled, the Agent now attempts to fetch data from the instance metadata service, falling back to the existing EC2-API-based method of fetching tags. Support for tags in the instance metadata service is an opt-in EC2 feature, so this functionality will not work automatically.
  • Add support for ECS metadata v4 API https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html
  • Agents are now built with Go 1.17.6.
  • On ECS Fargate and EKS Fargate, Agent-configured tags (DD_TAGS/DD_EXTRA_TAGS) are now applied to all integration-collected metrics.
  • Logs from JMXFetch will now be included in the Agent logfile, regardless of the log_level setting of the Agent.
  • KSMCore node.ready service check now reports warning instead of unknown when a node enters an unknown state.
  • Added DD_PROCESS_CONFIG_PROCESS_DD_URL and DD_PROCESS_AGENT_PROCESS_DD_URL environment variables
  • Added DD_PROCESS_CONFIG_ADDITIONAL_ENDPOINTS and DD_PROCESS_AGENT_ADDITIONAL_ENDPOINTS environment variables
  • Automatically extract the org.opencontainers.image.source container label into the git.repository_url tag.
  • The experimental OTLP ingest endpoint now supports the same settings as the OpenTelemetry Collector OTLP receiver v0.43.1.
  • The OTLP ingest endpoint now supports the same settings as the OpenTelemetry Collector OTLP receiver v0.44.0.
  • The OTLP ingest endpoint can now be configured through environment variables.
  • The OTLP ingest endpoint now always maps conventional metric resource-level attributes to metric tags.
  • OTLP ingest: the k8s.pod.uid and container.id semantic conventions are now used for enriching tags in OTLP metrics.
  • Add the DD_PROCESS_CONFIG_MAX_PER_MESSAGE env variable to set the process_config.max_per_message. Add the DD_PROCESS_CONFIG_MAX_CTR_PROCS_PER_MESSAGE env variable to set the process_config.max_ctr_procs_per_message.
  • Add the DD_PROCESS_CONFIG_EXPVAR_PORT and DD_PROCESS_AGENT_EXPVAR_PORT env variables to set the process_config.expvar_port. Add the DD_PROCESS_CONFIG_CMD_PORT env variable to set the process_config.cmd_port.
  • Add the DD_PROCESS_CONFIG_INTERNAL_PROFILING_ENABLED env variable to set the process_config.internal_profiling.enabled.
  • Add the DD_PROCESS_CONFIG_SCRUB_ARGS and DD_PROCESS_AGENT_SCRUB_ARGS env variables to set the process_config.scrub_args. Add the DD_PROCESS_CONFIG_CUSTOM_SENSITIVE_WORDS and DD_PROCESS_AGENT_CUSTOM_SENSITIVE_WORDS env variables to set the process_config.custom_sensitive_words. Add the DD_PROCESS_CONFIG_STRIP_PROC_ARGUMENTS and DD_PROCESS_AGENT_STRIP_PROC_ARGUMENTS env variables to set the process_config.strip_proc_arguments.
  • Added DD_PROCESS_CONFIG_WINDOWS_USE_PERF_COUNTERS and DD_PROCESS_AGENT_WINDOWS_USE_PERF_COUNTERS environment variables
  • Add the DD_PROCESS_CONFIG_QUEUE_SIZE and DD_PROCESS_AGENT_QUEUE_SIZE env variables to set the process_config.queue_size. Add the DD_PROCESS_CONFIG_RT_QUEUE_SIZE and DD_PROCESS_AGENT_RT_QUEUE_SIZE env variables to set the process_config.rt_queue_size. Add the DD_PROCESS_CONFIG_PROCESS_QUEUE_BYTES and DD_PROCESS_AGENT_PROCESS_QUEUE_BYTES env variables to set the process_config.process_queue_bytes.
  • Changes process payload chunking in the process Agent to take into account the size of process details such as CLI and user name. Adds the process_config.max_message_bytes setting for the target max (uncompressed) payload size.
  • When ec2_collect_tags is configured, the Agent retries API calls to gather EC2 tags before giving up.
  • Retry HTTP transaction when the HTTP status code is 404 (Not found).
  • Validate SNMP namespace to ensure it respects length and illegal character rules.
  • Include /etc/chrony.conf for use_local_defined_servers.
  • Deprecation Notes:
  • The security Agent commands check-policies and reload are deprecated. Use runtime policy check and runtime policy reload respectively instead.
  • Configuration process_config.enabled is now deprecated. Use process_config.process_collection.enabled and process_config.container_collection.enabled settings instead to control container and process collection in the process Agent.
  • Removed API_KEY environment variable from the process agent. Use DD_API_KEY instead
  • Removes the DD_PROCESS_AGENT_CONTAINER_SOURCE environment variable from the Process Agent. The list of container sources now entirely depends on the activated features.
  • Removed unused process_config.windows.args_refresh_interval config setting
  • Removed unused process_config.windows.add_new_args config setting
  • Removes the process_config.max_ctr_procs_per_message setting.
  • Bug Fixes:
  • APM: OTLP: Fixes an issue where attributes from different spans were merged leading to spans containing incorrect attributes.
  • APM: Fixed an issue which caused a panic when receiving OTLP traces with invalid data (specifically duplicate SpanIDs).
  • Silence the misleading error message No valid api key found, reporting the forwarder as unhealthy from the output of the agent check command.
  • Fixed a deadlock in the Logs Agent.
  • Exclude filters no longer apply to empty container names, images, or namespaces.
  • Fix CPU limit calculation for Windows containers.
  • Fix a rare panic in Gohai when collecting the system's Python version.
  • For Windows, includes NPM driver 1.3.2, which has a fix for a BSOD on system probe shutdown.
  • OTLP ingest now uses the exact sum and count values from OTLP Histograms when generating Datadog distributions.
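
The CWS macro loading rules listed under New Features above (default.policy is loaded first, the remaining policies alphabetically, and later policies can merge into or override an earlier macro) can be illustrated with a minimal Python sketch. This is not the Agent's implementation: the load_order and apply_macros helpers and the in-memory policy dictionaries are hypothetical, the sketch assumes that merge appends the new values to the existing ones, and raising an error when a macro is redefined without a combine mode is a choice made for this example only.

```python
from typing import Dict, List, Optional, Tuple

# One macro entry: (macro name, combine mode or None, values).
MacroEntry = Tuple[str, Optional[str], List[str]]


def load_order(filenames: List[str]) -> List[str]:
    """Return policy files with default.policy first, the rest alphabetically."""
    rest = sorted(name for name in filenames if name != "default.policy")
    first = ["default.policy"] if "default.policy" in filenames else []
    return first + rest


def apply_macros(policies: Dict[str, List[MacroEntry]]) -> Dict[str, List[str]]:
    """Fold the macro entries of every policy file, in load order."""
    macros: Dict[str, List[str]] = {}
    for filename in load_order(list(policies)):
        for name, combine, values in policies[filename]:
            if name not in macros:
                # First definition of the macro.
                macros[name] = list(values)
            elif combine == "merge":
                # Assumption: merge appends the new values to the existing list.
                macros[name] = macros[name] + list(values)
            elif combine == "override":
                # Override replaces the previous values entirely.
                macros[name] = list(values)
            else:
                # Assumption of this sketch only: a redefinition needs a combine mode.
                raise ValueError(f"macro {name!r} redefined without a combine mode")
    return macros


# Hypothetical policy folder contents, keyed by file name.
policies = {
    "zz_site.policy": [("my_macro", "override", ["my-single-value"])],
    "default.policy": [("my_macro", None, ["value1", "value2"])],
    "custom.policy": [("my_macro", "merge", ["value3"])],
}
result = apply_macros(policies)
print(result)
```

Because the override lives in the file that sorts last, it wins over both the default definition and the merged value; swapping the file names would change the outcome, which is why the load order matters.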

New in Datadog Agent Manager 7.34.0 (Mar 2, 2022)

  • Bug Fixes:
  • APM: Fix SQL obfuscation error on statements using bind variables starting with digits
  • Adds Windows NPM driver 1.3.1, which contains a fix for the system crash on system-probe shutdown under heavy load.
  • DD_CLUSTER_NAME can be used to define the kube_cluster_name on EKS Fargate.
  • On Windows the Agent now correctly detects Windows 11.
  • Fixes an issue where the Docker check would undercount the number of stopped containers in the docker.containers.stopped and docker.containers.stopped.total metrics, accompanied by a "Cannot split the image name" error in the logs.
  • Fixed a bug that caused a panic when running the docker check in cases where there are containers stuck in the "Removal in Progress" state.
  • Fixes an issue on EKS Fargate where the container check was scheduled while no suitable metrics collector was available, leading to excessive logging. Also fixes an issue with Liveness/Readiness probes failing regularly.
  • Allow Prometheus scrape tls_verify to be set to false and change label_to_hostname type to string.
  • Fixes truncated queries using temp tables in SQL Server.
  • Fixes an NPM issue on Windows where if the first packet on a UDP flow is inbound, it is not counted correctly.
  • On macOS, fix a bug where the Agent would not gracefully stop when sent a SIGTERM signal.
  • Fix missing tags with eBPF checks (OOM Kill/TCP Queue Length) with some container runtimes (for instance, containerd 1.5).
  • The experimental OTLP endpoint now ignores hostname attributes with localhost-like names for hostname resolution.
  • Fixes an issue where cumulative-to-delta OTLP metrics conversion did not take the hostname into account.
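
To see why the hostname matters in the cumulative-to-delta fix above, here is a minimal Python sketch of such a conversion. It is only an illustration under stated assumptions: the DeltaConverter class and its cache key are hypothetical, and treating a decreasing value as a counter reset is a simplification chosen for this example.

```python
from typing import Dict, Iterable, Optional, Tuple

# A series is identified by (hostname, metric name, sorted tags).
SeriesKey = Tuple[str, str, Tuple[str, ...]]


class DeltaConverter:
    """Hypothetical helper turning cumulative points into deltas."""

    def __init__(self) -> None:
        self._last: Dict[SeriesKey, float] = {}

    def delta(self, hostname: str, name: str, tags: Iterable[str],
              value: float) -> Optional[float]:
        # Including the hostname keeps identical metrics from two hosts apart.
        key = (hostname, name, tuple(sorted(tags)))
        previous = self._last.get(key)
        self._last[key] = value
        if previous is None or value < previous:
            # First point, or the counter went backwards (treated as a reset here).
            return None
        return value - previous


conv = DeltaConverter()
first_a = conv.delta("host-a", "requests", [], 10.0)
first_b = conv.delta("host-b", "requests", [], 4.0)
second_a = conv.delta("host-a", "requests", [], 15.0)
second_b = conv.delta("host-b", "requests", [], 6.0)
print(first_a, first_b, second_a, second_b)
```

If the key omitted the hostname, the point from host-b would be compared against the last value from host-a and look like a reset, so both series would produce wrong deltas.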

New in Datadog Agent Manager 7.33.1 (Feb 10, 2022)

  • Bug Fixes:
  • Fixes a panic that happens occasionally when handling tags for deleted containers or pods.
  • Fixes security module failing to start on kernels 4.14 and 4.15.

New in Datadog Agent Manager 7.33.0 (Jan 26, 2022)

  • Upgrade Notes:
  • APM: The apm_config.max_traces_per_second setting no longer affects error sampling. To change the TPS for errors, use apm_config.error_traces_per_second instead.
  • Starting from this version of the Agent, the Agent does not run on SLES 11. The new minimum requirement is SLES >= 12 or OpenSUSE >= 15 (including OpenSUSE 42).
  • Changed the default value of logs_config.docker_container_use_file to true. The agent will now prefer to use files for collecting docker logs and fall back to the docker socket when files are not available.
  • Upgrade Docker base image to ubuntu:21.10 as new stable release.
  • New Features:
  • Autodiscovery of integrations now works with containerd.
  • Metadata information sent by the Agent is now part of the flares. This allows for easier troubleshooting of issues related to metadata.
  • APM: Added credit card obfuscation. It is off by default and can be enabled using the environment variable DD_APM_OBFUSCATION_CREDIT_CARDS_ENABLED or the apm_config.obfuscation.credit_cards.enabled setting. There is also an option to enable an additional Luhn checksum check in order to eliminate false positives, but it comes with a performance cost and should not be used unless absolutely needed. The option is DD_APM_OBFUSCATION_CREDIT_CARDS_LUHN or apm_config.obfuscation.credit_cards.luhn.
  • APM: The rare sampler can now be disabled using the environment variable DD_APM_DISABLE_RARE_SAMPLER or the apm_config.disable_rare_sampler configuration. By default, the rare sampler catches 5 extra trace chunks per second on top of head-based sampling. The TPS is spread to catch all combinations of service, name, resource, http.status, and error.type missed by head-based sampling.
  • APM: The error sampler TPS can be configured using the environment variable DD_APM_ERROR_TPS or the apm_config.error_traces_per_second configuration. It defaults to 10 extra trace chunks sampled per second on top of head-based sampling. The TPS is spread to catch all combinations of service, name, resource, http.status, and error.type.
  • Add a generic container check. It generates container.* metrics based on all running containers, regardless of the container runtime used (among the supported ones).
  • Added new option "container_labels_as_tags" that allows the Agent to extract container label values and set them as metric tags values. It's equivalent to the existing "docker_labels_as_tags", but it also works with containerd.
  • CSPM: enable the usage of the print function in Rego rules.
  • CSPM: add option to dump reports to file, when running checks manually.
  • CSPM: constants can now be defined in Rego rules and will be usable from Rego rules.
  • CWS: SECL expressions can now make use of predefined variables. The ${process.pid} variable refers to the PID of the process that triggered the event.
  • Enable NPM DNS domain collection by default.
  • Exposed additional experimental configuration for OTLP metrics translation via experimental.otlp.metrics.
  • Add two options under a new config prefix to send metrics to Vector instead of Datadog. vector.metrics.enabled must be set to true, along with vector.metrics.url that should be set to point to a Vector configured accordingly.
  • The bpf syscall is now monitored by CWS; rules can be written on BPF commands.
  • Add runtime settings support to the security-agent. Currently only the log-level is supported.
  • APM: A new intake endpoint was added as /v0.6/traces, which accepts a new, more compact and efficient payload format.
  • Enhancement Notes:
  • Adds Nomad namespace and datacenter to list of env vars extracted from Docker containers.
  • Add a new On-disk storage section to agent status command.
  • Run CSPM commands as a configurable user. Defaults to 'nobody'.
  • CSPM: the findings query now defaults to data.datadog.findings
  • The docker.exit service check has a new tag exit_code. The 143 exit code is considered OK by default, in addition to 0. The Docker check supports a parameter ok_exit_codes to allow choosing exit codes that are considered OK.
  • Allow dogstatsd replay files to be fully loaded into memory as opposed to relying on MMAP. We still default to MMAPing replay targets.
  • kubernetes_state.node.* metrics are tagged with kubelet_version, container_runtime_version, kernel_version, and os_image.
  • The Kube State Metrics Core check uses ksm v2.1.
  • Lowercase the cluster names discovered from cloud providers to ease moving between different Datadog products.
  • On Windows, allow enabling process discovery in the process agent by providing PROCESS_DISCOVERY_ENABLED=true to the msiexec command.
  • Automatically extract the org.opencontainers.image.revision container label into the git.commit.sha tag.
  • The experimental OTLP endpoint now can be configured through the experimental.otlp.receiver section and supports the same settings as the OpenTelemetry Collector OTLP receiver v0.38.0.
  • The Process, APM, and Security agent now use the remote tagger introduced in Agent 7.26 by default. To disable it in the respective agent, the following settings need to be set to `false`:
  • apm_config.remote_tagger
  • process_config.remote_tagger
  • security_agent.remote_tagger
  • Allows the remote tagger timeout at startup to be configured by setting the remote_tagger_timeout_seconds config value. It also now defaults to 30 seconds instead of 5 minutes.
  • Calls to cloud metadata APIs for metadata like hostnames and IP addresses are now cached and the existing values used when the metadata service returns an error. This will prevent such metadata from temporarily "disappearing" from hosts.
  • Datadog Process Agent Service is started automatically by the core agent on Windows when process discovery is enabled in the config.
  • All packages (datadog-agent, datadog-iot-agent, and datadog-dogstatsd) now support AlmaLinux and Rocky Linux distributions.
  • If unrecognized DD_.. environment variables are set, the agent will now log a warning at startup, to help catch deployment typos.
  • Update the embedded pip version to 21.3.1 on Python 3 to allow the use of newer build backends.
  • Metric series can now be submitted using the V2 API by setting use_v2_api.series to true. This value defaults to false, and should only be set to true in internal testing scenarios. The default will change in a future release.
  • Add support for Windows 20H2 in published Docker images
  • Add a new agent command, agent workload-list, to dump the content of the workloadmeta store. The output of agent workload-list --verbose is included in the agent flare.
  • Bug Fixes:
  • Strip special characters (\n, \r and \t) from OctetString
  • APM: Fix bug where obfuscation fails for autovacuum sql text. For example, SQL text like autovacuum: VACUUM ANALYZE fake.table will no longer fail obfuscation.
  • APM: Fix SQL obfuscation failures on queries with literals that include non alpha-numeric characters
  • APM: Fix obfuscation error on SQL queries using the '!' operator.
  • Fixed Windows Dockerfile scripts to make the ECS Fargate Python check run when the agent is deployed in ECS Fargate Windows.
  • Fix a deadlock when stopping the agent right when a metadata provider is scheduled.
  • Fix a bug where container_include/exclude_metrics was applied on Autodiscovery when using Docker, preventing logs collection configured through container_include/exclude_logs.
  • Fix inclusion of registry.json file in flare
  • Fixes an issue where the agent would remove tags from pods or containers around 5 minutes after startup of either the agent itself, or the pods or containers themselves.
  • APM: SQL query obfuscation doesn't drop redacted literals from the obfuscated query when they are preceded by a SQL comment.
  • The Kube State Metrics Core check supports VerticalPodAutoscaler metrics.
  • The experimental OTLP endpoint now uses the StartTimestamp field for reset detection on cumulative metrics transformations.
  • Allow configuring process discovery check in the process agent when both regular process and container checks are off.
  • Fix disk check reporting /dev/root instead of the actual block device path and missing its tags when tag_by_label is enabled.
  • Remove occasionally hanging autodiscovery errors from the agent status once a pod is deleted.
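
The optional Luhn check mentioned in the credit card obfuscation note above is a standard checksum over the digits of a card number. The following Python function is a generic sketch of that checksum, not the trace-agent's code; the luhn_valid name is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep digits only, so separators such as spaces or dashes are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A digit sequence that merely has the right length fails this test roughly nine times out of ten, which is why the extra check trades some CPU time for more precise matching.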

New in Datadog Agent Manager 7.32.4 (Dec 22, 2021)

  • JMXFetch: Remove all dependencies on log4j and use java.util.logging instead.

New in Datadog Agent Manager 7.32.3 (Dec 16, 2021)

  • Security Notes:
  • Upgrade the log4j dependency to 2.12.2 in JMXFetch to fully address CVE-2021-44228 and CVE-2021-45046

New in Datadog Agent Manager 7.32.2 (Dec 15, 2021)

  • Set -Dlog4j2.formatMsgNoLookups=True when starting the JMXFetch process to mitigate the vulnerability described in CVE-2021-44228

New in Datadog Agent Manager 7.32.1 (Dec 10, 2021)

  • Bug Fixes:
  • On ECS, fix the volume of calls to ListTagsForResource which led to ECS API throttling.
  • Fix incorrect use of a namespaced PID with the host procfs when parsing mountinfo to ensure debugfs is mounted correctly. This issue was preventing system-probe startup in AWS ECS. This issue could also surface in other containerized environments where PID namespaces are in use and /host/proc is mounted.
  • Fixes system-probe startup failure due to kernel version parsing on Linux 4.14.252+. This specifically was affecting versions of Amazon Linux 2, but could affect any Linux kernel in the 4.14 tree with sublevel >= 252.

New in Datadog Agent Manager 7.31.1 (Oct 4, 2021)

  • Fix CSPM not sending intake protocol causing lack of host tags.

New in Datadog Agent Manager 7.30.0 (Aug 16, 2021)

  • New Features:
  • APM: It is now possible to enable internal profiling of the trace-agent. Note, however, that this incurs additional billing charges and should not be used unless agreed with support.
  • APM: Added experimental support for OpenTelemetry collection via experimental.otlp.{http_port,grpc_port} or their corresponding environment variables (DD_OTLP_{HTTP,GRPC}_PORT).
  • Kubernetes Autodiscovery now supports additional template variables: %%kube_pod_name%%, %%kube_namespace%% and %%kube_pod_uid%%.
  • Add support for SELinux related events, like boolean value updates or enforcement status changes.
  • Enhancement Notes:
  • Reveals useful information within a SQL execution plan for Postgres.
  • Add support to provide options to the obfuscator to change the behavior.
  • APM: Added additional tags to profiles in AWS Fargate environments.
  • APM: Main hostname acquisition now happens via gRPC to the Datadog Agent.
  • Make the check_sampler bucket expiry configurable based on the number of CheckSampler commits.
  • The cri check no longer sends metrics for stopped containers, in line with containerd and docker checks. These metrics were all zeros in the first place, so no impact is expected.
  • Kubernetes State Core check: Job metrics corresponding to a Cron Job are tagged with a kube_cronjob tag.
  • Environment autodiscovery is now used to selectively activate providers (kubernetes, docker, etc.) inside each component (tagger, host tags, hostname).
  • When using a secret_backend_command, STDERR is always logged with a debug log level. This eases troubleshooting a user's secret_backend_command in a containerized environment.
  • secret_backend_timeout has been increased from 5s to 30s. This improves support for slow-loading Python scripts used for secret_backend_command, which was an issue when importing large libraries in a containerized environment.
  • Increase default timeout to sync Kubernetes Informers from 2 to 5 seconds.
  • The Kube State Metrics Core check adds the global user-defined tags (DD_TAGS) by default.
  • If the new log_all_goroutines_when_unhealthy configuration parameter is set to true, when a component is unhealthy, log the stacktraces of the goroutines to ease the investigation.
  • The amount of time the agent waits before scanning for new logs is now configurable with logs_config.file_scan_period
  • Flares now include goroutine blocking and mutex profiles if enabled.
  • New flare options were added to collect new profiles at the same time as cpu profile.
  • Add a section about container inclusion/exclusion errors to the agent status command.
  • Runtime Security now provides kernel-related information as part of the flare.
  • Python interpreter sys.executable is now set to the appropriate interpreter's executable path. This should allow multiprocessing to spawn new processes, since it will try to invoke the Python interpreter instead of the Agent itself. Note, though, that the Python packages injected at runtime by the Agent are only available from the main process, not from any sub-processes.
  • Add a single entrypoint script in the agent docker image. This script will be leveraged by a new version of the Helm chart.
  • [corechecks/snmp] Add bulk_max_repetitions config
  • Add device status snmp corecheck metadata
  • [snmp/corecheck] Add interface.id_tags needed to correlate metadata interfaces with interface metrics
  • In addition to the existing /readsecret.py script, the Agent container image contains another secret helper script, readsecret.sh, which is faster and more reliable.
  • Consider pinned CPUs (cpusets) when calculating CPU limit from cgroups.
  • Bug Fixes:
  • APM: Fix SQL obfuscation on postgres queries using the tilde operator.
  • APM: Fixed an issue with the Web UI on Internet Explorer.
  • APM: The priority sampler service catalog is no longer unbounded. It is now limited to 5000 service & env combinations.
  • Apply the max_returned_metrics parameter from prometheus annotations, if configured.
  • Removes noisy error logs when collecting Cloud Foundry application containers
  • For dogstatsd captures, only serialize to disk the portion of buffers actually used by the payloads ingested, not the full buffer.
  • Fix a bug in the cgroup parser that prevented proper metrics from being reported in Container Live View when using CRI-O and the systemd cgroup manager.
  • Avoid sending duplicated datadog.agent.up service checks.
  • When tailing logs from docker with DD_LOGS_CONFIG_DOCKER_CONTAINER_USE_FILE=true and a source container label is set the agent will now respect that label and use it as the source. This aligns the behavior with tailing from the docker socket.
  • On Windows, when the host shuts down, handles the PreShutdown message to avoid the error "The DataDog Agent service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service" in Event Viewer.
  • Fix label joins in the Kube State Metrics Core check.
  • Append the cluster name, if found, to the hostname for kubernetes_state_core metrics.
  • Ensure the health probes used as Kubernetes liveness probe are not failing in case of issues on the network or on an external component.
  • Remove unplanned call between the process-agent and the DCA when the orchestratorExplorer feature is disabled.
  • [corechecks/snmp] Set default oid_batch_size to 5. High oid batch size can lead to timeouts.
  • Agent collecting Docker containers on hosts with a lot of container churn now uses less memory by properly purging the respective tags after the containers exit. Other container runtimes were not affected by the issue.
  • Other Notes:
  • APM: The trace-agent no longer warns on the first outgoing request retry, only starting from the 4th.
  • All Agent binaries are now compiled with Go 1.15.13
  • JMXFetch upgraded to 0.44.2
  • Build environment changes:
  • omnibus-software: [cacerts] updating with latest: 2021-07-05
  • omnibus-ruby: Support 'Recommends' dependencies for deb packages
  • Runtime Security doesn't set the service tag with the runtime-security-agent value by default.
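
The enhancement above about considering pinned CPUs (cpusets) when calculating the CPU limit from cgroups can be sketched as follows. This is a rough Python illustration, not the Agent's code: the count_cpus and cpu_limit helpers are hypothetical, and taking the minimum of the CFS quota ratio and the number of pinned CPUs is an assumption about how the two limits combine.

```python
def count_cpus(cpuset: str) -> int:
    """Count the CPUs in a kernel list-format string such as '0-3,8'."""
    count = 0
    for part in cpuset.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            count += int(end) - int(start) + 1
        else:
            count += 1
    return count


def cpu_limit(quota_us: int, period_us: int, cpuset: str, host_cpus: int) -> float:
    """Return the effective CPU limit, in cores, for one cgroup."""
    # An empty cpuset means no pinning, so every host CPU is usable.
    pinned = count_cpus(cpuset) or host_cpus
    if quota_us <= 0 or period_us <= 0:
        # No CFS quota: the only cap is the number of pinned CPUs.
        return float(min(pinned, host_cpus))
    # Assumption: the tighter of the quota and the pinned CPU count wins.
    return min(quota_us / period_us, float(pinned))


print(cpu_limit(-1, 100000, "0-1", 8))
print(cpu_limit(50000, 100000, "0-3", 8))
```

Without the cpuset term, a container pinned to two CPUs with no quota would be reported as having all eight host CPUs available.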

New in Datadog Agent Manager 7.29.0 (Jul 5, 2021)

  • Upgrade Notes:
  • Upgrade Docker base image to ubuntu:21.04 as new stable release.
  • New Features:
  • New extra_tags setting and DD_EXTRA_TAGS environment variable can be used to specify additional host tags.
  • Add network devices metadata collection
  • APM: The obfuscator adds two new features (dollar_quoted_func and keep_sql_alias). They are off by default. For more details see PR 8071. We do not recommend using these features unless you have a good reason or have been recommended by support for your specific use-case.
  • APM: Add obfuscator support for Postgres dollar-quoted string constants.
  • Tagger state will now be stored for dogstatsd UDS traffic captures with origin detection. The feature will track the incoming traffic, building a map of traffic source processes and their source containers, then storing the relevant tagger state into the capture file. This makes it possible not only to replay the traffic, but also to load a snapshot of the tagger state to properly tag replayed payloads in the dogstatsd pipeline.
  • New host_aliases setting can be used to add custom host aliases in addition to aliases obtained from cloud providers automatically.
  • Paths can now be resolved using an eRPC request.
  • Add time comparison support in SECL, which allows writing rules such as:
  • open.file.path == "/etc/secret" && process.created_at > 5s
  • Enhancement Notes:
  • Add the following new metrics to the kubernetes_state_core check:
  • node.ephemeral_storage_allocatable
  • node.ephemeral_storage_capacity
  • Agent can now set hostname based on Azure instance metadata. See the new azure_hostname_style configuration option.
  • Compliance agents can now generate multiple reports per run.
  • Docker and Kubernetes log launchers will now be retried until one succeeds instead of falling back to the docker launcher by default.
  • Increase payload size limit for dbm-metrics from 1 MB to 20 MB.
  • Expose new batch_max_size and batch_max_content_size config settings for all logs endpoints.
  • Adds improved cadence/resolution captures/replay to dogstatsd traffic captures. The new file format will store payloads with nanosecond resolution. The replay feature remains backward-compatible.
  • Support fetching host tags using ECS task and EKS IAM roles.
  • Improve the resiliency of the datadog-agent check command when running Autodiscovered checks.
  • Adding the hostname to the host aliases when running on GCE
  • Display more information when the error Could not initialize instance happens.
  • JMXFetch upgraded to 0.44.0
  • Kubernetes pods with short-lived containers no longer have a few lines of logs duplicated with both container tags (the stopped one and the running one) while logs are being collected. Mount /var/log/containers and use logs_config.validate_pod_container_id to enable this feature.
  • The kube state metrics core check now tags pod metrics with a reason tag. It can be NodeLost, Evicted or UnexpectedAdmissionError.
  • Implement the following synthetic metrics in the kubernetes_state_core check:
  • cronjob.count
  • endpoint.count
  • hpa.count
  • vpa.count
  • Add system.cpu.interrupt on linux.
  • Authenticate logs http input requests using the API key header rather than the URL path.
  • Upgrade embedded Python 3 from 3.8.8 to 3.8.10. See Python 3.8's changelog.
  • Show autodiscovery errors from pod annotations in agent status.
  • Paths are no longer limited to segments of 128 characters and a depth of 16. Each segment can now be up to 255 characters (kernel limit) and with a depth of up to 1740 parents.
  • Add loader as snmp_listener.loader config
  • Make SNMP Listener configs compatible with SNMP Integration configs
  • The agent stream-logs command will use less CPU while idle.
  • Security Notes:
  • Redact the whole annotation "kubectl.kubernetes.io/last-applied-configuration" to ensure we don't expose secrets.
  • Bug Fixes:
  • Imports the value of non_local_traffic to dogstatsd_non_local_traffic (in addition to apm_config.non_local_traffic) when upgrading from Datadog Agent v5.
  • Fixes the Agent using 100% CPU on macOS Big Sur.
  • Declare database_monitoring.{samples,metrics} as known keys in order to remove "unknown key" warnings on startup.
  • Fixes the container_name tag not being updated after Docker containers were renamed.
  • Fixes CPU utilization being underreported on Windows hosts with more than one physical CPU.
  • Fix CPU limit used for Live Containers page in ECS Fargate environments.
  • Fix bug introduced in 7.26 where default checks were scheduled on ECS Fargate due to changes in entrypoint scripts.
  • Fix a bug that can make the agent enable incompatible Autodiscovery listeners.
  • An error log was printed when the creation date or the started date of a Fargate container was not found in the Fargate API payload. This would happen even though these dates were expected to be missing because of the state the container was in. This is now fixed and the error is only printed when it should be.
  • Fix the default value of the configuration option forwarder_storage_path when run_path is set. The default value is RUN_PATH/transactions_to_retry where RUN_PATH is defined by the configuration option run_path.
  • In some cases, compliance checks using YAML files with JQ expressions were failing due to discrepancies between YAML parsing and gojq handling.
  • On Windows, fixes inefficient string conversion
  • Reduce CPU usage when logs agent is unable to reach an http endpoint.
  • Fixed the no_proxy deprecation warning being logged too frequently. Added better warnings for when the proxy behavior could change.
  • Ignore CollectorStatus response from orchestrator-intake in the process-agent to prevent changing realtime mode interval to default 2s.
  • Fixes an issue where the Agent would not retry resource tags collection for containers on ECS if it could retrieve only a subset of tags. Now it will keep on retrying until the complete set of tags is collected.
  • Fix noisy configuration error when specifying a proxy config and using secrets management.
  • Reduce the number of log messages on Windows when tailing log files.
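
The forwarder_storage_path fix above describes the default as RUN_PATH/transactions_to_retry. A minimal sketch of that derivation follows; the function name and the example run_path value are illustrative assumptions, not the Agent's actual code.

```python
import posixpath


def default_forwarder_storage_path(run_path: str) -> str:
    """Return RUN_PATH/transactions_to_retry for a given run_path.

    Hypothetical helper: it only mirrors the rule stated in the changelog.
    """
    return posixpath.join(run_path, "transactions_to_retry")


# The run_path below is an example value, not a documented default.
print(default_forwarder_storage_path("/opt/datadog-agent/run"))
```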

New in Datadog Agent Manager 7.28.1 (May 31, 2021)

  • Please refer to the 7.28.1 tag on integrations-core for the list of changes on the Core Checks

New in Datadog Agent Manager 7.28.0 (May 26, 2021)

  • New Features:
  • APM: Add a new feature flag component2name which determines the component tag value on a span to become its operation name. This facilitates compatibility with Opentracing.
  • Adds a functionality to allow capturing and replaying of UDS dogstatsd traffic.
  • Expose new aggregator.submit_event_platform_event python API with two supported event types: dbm-samples and dbm-metrics.
  • Runtime security reports environment variables.
  • Runtime security now reports command line arguments as part of the exec events.
  • The args_flags and args_options were added to the SECL language to ease the writing of runtime security rules based on command line arguments. args_flags is used to catch arguments that start with either one or two hyphen characters but do not accept any associated value.
  • Add support for ARM64 to the runtime security agent
  • Enhancement Notes:
  • Add oid_batch_size configuration as init and instance config
  • Add oid_batch_size config to snmp_listener
  • Group the output of agent tagger-list by entity and by source.
  • On Windows on a Domain Controller, if no domain name is specified, the installer will use the controller's joined domain.
  • Windows installer can now use the command line key EC2_USE_WINDOWS_PREFIX_DETECTION to set the config value of ec2_use_windows_prefix_detection
  • APM: The trace writer will now consider 408 errors to be retriable.
  • Build RPMs that can be installed in FIPS mode. This change doesn't affect SUSE RPMs.
  • RPMs are now built with RPM 4.15.1 and have SHA256 digest headers, which are required by RPM on CentOS 8/RHEL 8 when running in FIPS mode.
  • Note that newly built RPMs are no longer installable on CentOS 5/RHEL 5.
  • Make the check_sampler bucket expiry configurable
  • The Agent can be configured to replace colon : characters in the ECS resource tag keys by underscores _. This can be done by enabling ecs_resource_tags_replace_colon: true in the Agent config file or by configuring the environment variable DD_ECS_RESOURCE_TAGS_REPLACE_COLON=true.
  • Add jvm.gc.old_gen_size as an alias for Tenured Gen.
  • Prevent double signing of release artifacts.
  • JMXFetch upgraded to v0.44.0.
  • The kubernetes_state_core check now collects two new metrics kubernetes_state.pod.age and kubernetes_state.pod.uptime.
  • Improve logs/sender throughput by adding optional concurrency for serializing & sending payloads.
  • Make kube_replica_set tag low cardinality
  • Runtime Security now supports regexp in SECL rules.
  • Add loader tag to snmp telemetry metrics
  • Network Performance Monitoring for Windows now collects DNS stats; connections will be shown in the Networks -> DNS page.
  • Deprecation Notes:
  • For internal profiling of agent processes, the profiling option has been renamed to internal_profiling to avoid confusion.
  • The single dash variants of the system-probe flags are now deprecated. Please use --config and --pid instead.
  • Bug Fixes:
  • APM: Fixes bug where long service names and operation names were not normalized correctly.
  • On Windows, fixes a bug in process agent in which the process agent would become unresponsive.
  • The Windows installer compares the DNS domain name and the joined domain name using a case-insensitive compare. This avoids an incorrect warning when the domain names match but otherwise have different cases.
  • Replace usage of runtime.NumCPU when used to compute metrics related to CPU Hosts. On some Unix systems, runtime.NumCPU can be influenced by CPU affinity set on the Agent, which should not affect the metrics computed for other processes/containers. Affects the CPU Limits metrics (docker/containerd) as well as the live containers page metrics.
  • Fix issue where Kube Apiserver cache sync timeout configuration is not used.
  • Fix the usage of DD_ORCHESTRATOR_EXPLORER_ORCHESTRATOR_DD_URL and DD_ORCHESTRATOR_EXPLORER_MAX_PER_MESSAGE environment variables.
  • Fix a panic that could occur in the Docker AD listener when docker inspect fails.
  • Fix a small leak where the Agent in some cases keeps in memory identifiers corresponding to dead objects (pods, containers).
  • Log file byte count now works correctly on Windows.
  • releases purge /var/log folder on upgrade.
  • Packaging: ensure only one pip3 version is shipped in embedded/ directory
  • Add a validation step before accepting metrics set in HPAs. This ensures that no obviously-broken metric is accepted and goes on to break the whole metrics gathering process.
  • The Windows installer now logs only once when it fails to replace a property.
  • Windows installer will not abort if the Server service is not running (introduced in 6.24.0/7.24.0).
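
The ecs_resource_tags_replace_colon option above replaces colon characters in ECS resource tag keys with underscores. The transformation can be sketched as follows; the function name and sample tags are hypothetical, and this illustrates only the key rewrite, not how the Agent collects the tags.

```python
from typing import Dict


def replace_colons_in_keys(tags: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of tags where ':' in each key becomes '_'.

    Values are left untouched; only keys are rewritten.
    """
    return {key.replace(":", "_"): value for key, value in tags.items()}


# Sample ECS resource tags (made up for illustration).
resource_tags = {"aws:cloudformation:stack-name": "web", "team": "core"}
print(replace_colons_in_keys(resource_tags))
```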

New in Datadog Agent Manager 7.27.1 (May 11, 2021)

  • Bug Fixes:
  • On Windows, exit system-probe if process-agent has not queried for connection data for 20 consecutive minutes. This ensures excessive system resources are not used while connection data is not being sent to Datadog.

New in Datadog Agent Manager 7.27.0 (Apr 15, 2021)

  • Upgrade Notes:
  • SECL and JSON format were updated to introduce the new attributes. Legacy support was added to avoid breaking existing rules.
  • The overlay_numlower integer attribute that was reported for files and executables was unreliable. It was replaced by a simple boolean attribute named in_upper_layer that is set to true when a file is either only on the upper layer of an overlayfs filesystem, or is an altered version of a file present in a base layer.
  • New Features:
  • APM: Add support for AIX/ppc64. Only POWER8 and above is supported.
  • Adds support for Kubernetes namespace labels as tags extraction (kubernetes_namespace_labels_as_tags).
  • Add snmp corecheck implementation in go
  • APM: Tracing clients no longer need to be sending traces marked with sampling priority 0 (AUTO_DROP) in order for stats to be correct.
  • APM: A new discovery endpoint has been added at the /info path. It reveals information about a running agent, such as available endpoints, version and configuration.
  • APM: Add support for filtering tags by means of apm_config.filter_tags or environment variables DD_APM_FILTER_TAGS_REQUIRE and DD_APM_FILTER_TAGS_REJECT.
  • Dogstatsd clients can now choose the cardinality of tags added by origin detection per metrics via the tag 'dd.internal.card' ("low", "orch", "high").
  • Added two new metrics to the Disk check: read_time and write_time.
  • The Agent can store traffic on disk when the in-memory retry queue of the forwarder limit is reached. Enable this capability by setting forwarder_storage_max_size_in_bytes to a positive value indicating the maximum amount of storage space, in bytes, that the Agent can use to store traffic on disk.
  • PCF Containers custom tags can be extracted from environment variables based on an include and exclude lists mechanism.
  • NPM is now supported on Windows, for Windows versions 2016 and above.
  • Runtime security now reports command line arguments as part of the exec events.
  • Process credentials are now tracked by the runtime security agent.
  • Various user and group attributes are now collected, along with kernel capabilities.
  • File metadata attributes are now available for all events. Those new attributes include uid, user, gid, group, mode, modification time and change time.
  • Add config parameters to enable fim and runtime rules.
  • Network Performance Monitoring for Windows instruments DNS. Network data from Windows hosts will be tagged with the domain tag, and the DNS page will show data for Windows hosts.
  • Enhancement Notes:
  • Improves sensitive data scrubbing in URLs
  • Includes UTC time (unless already in UTC+0) and millisecond timestamp in status logs. Flare archive filename now timestamped in UTC.
  • Automatically set debug log_level when the '--flare' option is used with the JMX command
  • Number of matched lines is displayed on the status page for each source using multi_line log processing rules.
  • Add public IPv4 for EC2/GCE instances to host network metadata.
  • Add loader config to snmp_listener
  • Add snmp corecheck extract value using regex
  • Remove the agent MaxNumWorkers hard limit that caps the number of check runners at 25. The removal is motivated by the need for some users to run thousands of integrations like snmp corecheck.
  • APM: Change in the stats payload format leading to reduced CPU and memory usage. Use of DDSketch instead of GKSketch to aggregate distributions leading to more accurate high percentiles.
  • APM: Removal of sublayer metric computation improves performance of the trace agent (CPU and memory).
  • APM: All API endpoints now respond with the "Datadog-Agent-Version" HTTP response header.
  • Query application list from Cloud Foundry Cloud Controller API to get up-to-date application names for tagging containers and metrics.
  • Introduce a clc_runner_id config option to allow overriding the default Cluster Checks Runner identifier. Defaults to the node name to make it backwards compatible. It is intended to allow binpacking more than a single runner per node.
  • Improve migration path when shifting docker container tailing from the socket to file. If tailing from file for Docker containers is enabled, container with an existing entry relative to a socket tailer will continue being tailed from the Docker socket unless the following newly introduced option is set to true: logs_config.docker_container_force_use_file It aims to allow smooth transition to file tailing for Docker containers.
  • (Unix only) Add go_core_dump flag to generate core dumps on Agent crashes
  • JSON payload serialization and compression now uses shared input and output buffers to reduce total allocations in the lifetime of the agent.
  • On Windows the comments in the datadog.yaml file are preserved after installation.
  • Add kube_region and kube_zone tags to node metrics reported by the kube-state-metrics core check
  • Implement the following synthetic metrics in the kubernetes_state_core check to mimic the legacy kubernetes_state one:
  • persistentvolumes.by_phase
  • service.count
  • namespace.count
  • replicaset.count
  • job.count
  • deployment.count
  • daemonset.count
  • statefulset.count
  • Minor improvements to agent log-stream command. Fixed timestamp, added host name, use redacted log message instead of raw message.
  • NPM - Improve accuracy of retransmits tracking on kernels >=4.7
  • Orchestrator explorer collection is no longer handled by the cluster-agent directly but by a dedicated check.
  • prometheus_scrape.checks may now be defined as an environment variable DD_PROMETHEUS_SCRAPE_CHECKS formatted as JSON
  • Runtime security module doesn't stop on first policies file load error and now sends an event with a report of the load.
  • Sketch series payloads are now compressed as a stream to reduce buffer allocations.
  • The Datadog Agent won't try to connect to kubelet anymore if it's not running in a Kubernetes cluster.
  • Bug Fixes:
  • Fixes bug introduced in #7229
  • Adds a limit to the number of DNS stats objects the DNSStatkeeper can have at any given time. This can alleviate memory issues on hosts doing high numbers of DNS requests where network performance monitoring is enabled.
  • Add tags to snmp_listener network configs. This is needed since users switching from Python SNMP Autodiscovery will expect tags to be available with Agent SNMP Autodiscovery (snmp_listener) too.
  • APM: When UDP is not available for Dogstatsd, the trace-agent can now use any other available alternative, such as UDS or Windows Pipes.
  • APM: Fixes a bug where nested SQL queries may occasionally result in bad obfuscator output.
  • APM: All Datadog API key usage is sanitized to exclude newlines and other control characters.
  • Exceeding the conntrack rate limit (system_probe_config.conntrack_rate_limit) would result in conntrack updates from the system not being processed anymore
  • Address issue with referencing the wrong repo tag for Docker image by simplifying logic in DockerUtil.ResolveImageNameFromContainer to prefer Config.Image when possible.
  • Fix kernel version parsing when subversion/patch is > 255, so eBPF program loading does not fail.
  • Agent host tags are now correctly removed from the in-app host when the configured tags/DD_TAGS list is empty or not defined.
  • Fixes scheduling of non-working container checks introduced by environment autodiscovery in 7.26. Features can now be excluded from autodiscovery results through autoconfig_exclude_features. Example: autoconfig_exclude_features: ["docker","cri"] or DD_AUTOCONFIG_EXCLUDE_FEATURES="docker cri"
  • Fix typo in variable used to disable environment autodiscovery and make it usable in datadog.yaml. You should now set autoconfig_from_environment: false or DD_AUTOCONFIG_FROM_ENVIRONMENT=false
  • Fixes a limitation of runtime autodiscovery which did not allow running the containerd check without the cri check enabled.
  • Fixes error logs in non-Kubernetes environments.
  • Fix missing tags on Dogstatsd metrics when DD_DOGSTATSD_TAG_CARDINALITY=orchestrator (for instance, task_arn on Fargate)
  • Fix a panic in the system-probe part of the tcp_queue_length check when running on nodes with several CPUs.
  • Fix agent crashes from the Python interpreter being freed too early. This was most likely to occur as an edge case during a shutdown of the agent where the interpreter was destroyed before the finalizers for a check were invoked.
  • Do not make the liveness probe fail in case of network connectivity issue. However, if the agent loses network connectivity, the readiness probe may still fail.
  • On Windows, using process agent, fixes the virtual CPU count when the device has more than one physical CPU (package).
  • On Windows, fixes problem in process agent wherein Windows processes could not completely exit.
  • (macOS only) Apple M1 chip architecture information is now correctly reported.
  • Make ebpf compiler buildable on non-GLIBC environment.
  • Fix a bug preventing pod updates from being sent due to the Kubelet exposing unreliable resource versions.
  • Silence INFO and WARNING gRPC logs by default. They can be re-enabled by setting GRPC_GO_LOG_VERBOSITY_LEVEL to either INFO or WARNING.
  • Other Notes:
  • Network monitor now fails to load if conntrack initialization fails on system-probe startup. Set network_config.ignore_conntrack_init_failure to true to reverse this behavior.
  • When generating the permissions.log file for a flare, if the owner of a file no longer exists in the system, return its id instead of failing.
  • Upgrade embedded openssl to 1.1.1k.
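
The New Features list above says DogStatsD clients can pick the origin-detection tag cardinality per metric through the dd.internal.card tag ("low", "orch", "high"). A minimal sketch of a client building such a gauge datagram follows; the helper name, metric name, and tags are hypothetical, and the sketch only formats the text payload without sending it.

```python
from typing import Iterable

# Cardinality values listed in the changelog entry.
VALID_CARDINALITIES = ("low", "orch", "high")


def gauge_datagram(name: str, value: float, tags: Iterable[str], cardinality: str) -> str:
    """Format a DogStatsD gauge with a dd.internal.card tag appended."""
    if cardinality not in VALID_CARDINALITIES:
        raise ValueError(f"unsupported cardinality: {cardinality!r}")
    all_tags = list(tags) + [f"dd.internal.card:{cardinality}"]
    return f"{name}:{value}|g|#{','.join(all_tags)}"


# Example metric and tag, made up for illustration.
print(gauge_datagram("app.queue.depth", 12, ["env:prod"], "low"))
```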

New in Datadog Agent Manager 7.26.0 (Mar 2, 2021)

  • New Features:
  • APM: Support SQL obfuscator feature to replace consecutive digits in table names.
  • APM: Add an endpoint to receive apm stats from tracers.
  • The Agent discovers by itself which container AD features and checks should be scheduled, without having to specify any configuration. This works for Docker, Containerd, ECS/EKS Fargate and Kubernetes. It also supports heterogeneous nodes with a single configuration (for instance, a Kubernetes DaemonSet could cover nodes running Containerd and/or Docker, activating the relevant configuration depending on node configuration). This feature is activated by default and can be deactivated by setting the environment variable AUTCONFIG_FROM_ENVIRONMENT=false.
  • Adds a new agent command stream-logs to stream the logs being processed by the agent. This will help diagnose issues with log integrations.
  • Submit host tags with log events for a configurable time duration to avoid potential race conditions where some tags might not be available to all backend services on freshly provisioned instances.
  • Added no_proxy_nonexact_match as a configuration setting which allows non-exact URL and IP address matching. The new behavior uses the Go HTTP proxy function (see https://godoc.org/golang.org/x/net/http/httpproxy#Config). If the new behavior is disabled, a warning will be logged if a URL or IP proxy behavior will change in the future.
  • The Quality of Service of pods is now collected and sent to the orchestration endpoint.
  • Runtime security: new command line allowing a process cache dump to be triggered.
  • Support Prometheus Autodiscovery for Kubernetes Pods.
  • The core agent now exposes a gRPC API to expose tags to the other agents. The following settings are now introduced to allow each of the agents to use this API (they all default to false):
  • apm_config.remote_tagger
  • logs_config.remote_tagger
  • process_config.remote_tagger
  • New perf map usage metrics.
  • Add unofficial arm64 support to network tracer in system-probe.
  • system-probe: Add optional runtime compilation of eBPF programs.
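
The first APM entry above mentions replacing consecutive digits in table names during SQL obfuscation. The idea can be sketched as follows, assuming each run of digits collapses to a single "?" placeholder; the placeholder choice and function name are assumptions for illustration, and the trace-agent's real obfuscator works on tokenized SQL rather than bare names.

```python
import re

# One or more consecutive digits.
_DIGIT_RUN = re.compile(r"\d+")


def replace_digits(table_name: str, placeholder: str = "?") -> str:
    """Collapse every run of consecutive digits into one placeholder."""
    return _DIGIT_RUN.sub(placeholder, table_name)


# Table names like per-month shards then group under one obfuscated name.
print(replace_digits("orders_2021_03"))
```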

New in Datadog Agent Manager 7.25.1 (Jan 27, 2021)

  • Bug Fixes:
  • Fix "fatal error: concurrent map read and map write" due to reads of a concurrently mutated map in inventories.payload.MarshalJSON()
  • Fix an issue on arm64 where non-gauge metrics from Python checks were treated as gauges.
  • On Windows, fixes uninstall/upgrade problem if core agent is not running but other services are.
  • Fix NPM UDP destination address decoding when source address ends with .8 during offset guessing.
  • On Windows, changes the password generating algorithm to have a minimum length of 16 and a maximum length of 20 (from 12-18). Improves compatibility with environments that have longer password requirements.
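
The password length change above (minimum 16, maximum 20) can be illustrated with a short generator. This is a sketch under stated assumptions: the alphabet and function name are hypothetical, and the Windows installer's actual algorithm is not described in this changelog.

```python
import secrets
import string

# Assumed character set; the installer's real alphabet is not documented here.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(min_len: int = 16, max_len: int = 20) -> str:
    """Return a random password whose length is in [min_len, max_len]."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(len(generate_password()))
```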

New in Datadog Agent Manager 7.25.0 (Jan 15, 2021)

  • New Features:
  • Add com.datadoghq.ad.tags container auto-discovery label in AWS Fargate environment.
  • Package the gstatus command line tool binary for GlusterFS integration metric collection.
  • Queried domain can be tracked as part of DNS stats
  • APM: The agent is now able to skip top-level span computation in cases when the client has marked them by means of the Datadog-Client-Computed-Top-Level header.
  • APM: The maximum allowed key length for tags has been increased from 100 to 200.
  • APM: Improve Oracle SQL obfuscation support.
  • APM: Added support for Windows pipes. To enable it, set the pipe path using DD_APM_WINDOWS_PIPE_NAME. For more details check PR
  • Pause containers are now detected and auto excluded based on the io.kubernetes container labels.
  • APM: new datadog_agent.obfuscate_sql_exec_plan function exposed to python checks to enable obfuscation of json-encoded SQL Query Execution Plans.
  • APM: new obfuscate_sql_values option in apm_config.obfuscation enabling optional obfuscation of SQL queries contained in JSON data collected from some APM services (ES & Mongo)
  • Enhancement Notes:
  • Support the ddog-gov.com site option in the Windows GUI installer.
  • Adds config setting for ECS metadata endpoint client timeout (ecs_metadata_timeout), value in milliseconds.
  • Add loader config to allow selecting specific loader at runtime. This config is available at init_config and instances level.
  • Added additional container information to the status page when collect all container logs is enabled in agent status.
  • On Windows, it will no longer be required to supply the ddagentuser name on upgrade. Previously, if a non-default or domain user was used, the same user had to be provided on subsequent upgrades.
  • Added --flare flag to agent check to save check results to the agent logs directory. This enables flare to pick up check results.
  • Added new config option for JMXFetch collect_default_jvm_metrics that enables/disables default JVM metric collection.
  • Allow empty message for DogStatsD events (e.g. "_e{10,0}:test title|")
  • Expires the cache key for availability of ECS metadata endpoint used to fetch EC2 resource tags every 5 minutes.
  • Data coming from kubernetes pods now have new kube_ownerref_kind and kube_ownerref_name tags for each of the pod's OwnerRef property, indicating its Kind and Name, respectively.
  • We improved the way Agents get the Kubernetes cluster ID from the Cluster Agent. It used to be that the cluster agent would create a configmap which had to be mounted as an env variable in the agent daemonset, blocking the process-agent from starting if not found. Now the process-agent will start, only the Kubernetes Resources collection will be blocked.
  • Events sent by the runtime security agent to the backend use a new taxonomy.
  • Scrub container args as well for orchestrator explorer.
  • Support custom autodiscovery identifiers on Kubernetes using the ad.datadoghq.com/<container_name>.check.id pod annotation.
  • The CPU check now collects system-wide context switches on Linux.
  • Add --table option to agent check command to output results in condensed tabular format instead of JSON.
  • APM: improve performance by changing the msgpack serialization implementation.
  • APM: improve the performance of the msgpack deserialization for the v0.5 payload format.
  • APM: improve performance of trace processing by removing some heap allocations.
  • APM: improve sublayer computation performance by reducing the number of heap allocations.
  • APM: improved stats computation performance by removing some string concatenations.
  • APM: improved trace signature computation by avoiding heap allocations.
  • APM: improve stats computation performance.
  • Update from alpine:3.10 to alpine:3.12 the base image in Dogstatsd's Dockerfiles.
  • Deprecation Notes:
  • APM: remove the already deprecated apm_config.extra_aggregators config option.
  • Bug Fixes:
  • Fix macos dlopen failures by ensuring cmake preserves the required runtime search path.
  • Fix memory leak on check unscheduling, which could be noticeable for checks submitting large amounts of metrics/tags.
  • Exclude pause containers using the cdk/pause.* image.
  • Fixed some Agent environment variables missing from the flare.
  • Fix a bug that prevented the logs Agent from discovering the correct init containers source and service on Kubernetes.
  • The logs agent now uses the container image name as logs source instead of kubernetes when a standard service value was defined for the container.
  • Fixes panic on concurrent map access in Kubernetes metadata tag collector.
  • Fixed a bug that could potentially cause missing container tags for check metrics.
  • Fix a potential panic on ECS when the ECS API returns an empty Docker ID.
  • Fix systemd check id to handle multiple instances. The fix makes the check id unique for each instance.
  • Fix missing tags on pods that were not seen with a running container yet.
  • Fix snmp listener subnet loop to use correct subnet pointer when creating snmpJob object.
  • Upgrade the embedded pip version to 20.3.3 to get a newer vendored version of urllib3.
  • Other Notes:
  • The Agent, Logs Agent and the system-probe are now compiled with Go 1.14.12
  • Upgrade embedded libkrb5 Kerberos library to v1.18.3. This version drops support for the encryption types marked as "weak" in the docs of the library

New in Datadog Agent Manager 7.24.1 (Dec 17, 2020)

  • Bug Fixes:
  • Fix a bug when parsing the current version of an integration that prevented upgrading from an alpha or beta prerelease version.
  • During a domain installation in a child domain, the Windows installer can now use a user from a parent domain.
  • The Datadog Agent had a memory leak where some tags would be collected but never cleaned up after their entities were removed from a Kubernetes cluster due to their IDs not being recognized. This has now been fixed, and all tags are garbage collected when their entities are removed.
  • Other Notes:
  • Updated the shipped CA certs to latest (2020-12-08)

New in Datadog Agent Manager 7.23.1 (Oct 21, 2020)

  • The ec2_prefer_imdsv2 parameter was ignored when fetching EC2 tags from the metadata endpoint. This fixes a misleading warning log that was logged even when ec2_prefer_imdsv2 was left disabled in the Agent configuration.
  • Support of secrets in JSON environment variables, added in 7.23.0, is reverted due to a side effect (e.g. a string value of "-" would be loaded as a list). This feature will be fixed and added again in a future release.
  • The Windows installer can now install on domains where the domain name is different from the Netbios name.

New in Datadog Agent Manager 7.23.0 (Oct 8, 2020)

  • Upgrade Notes:
  • Network monitoring: enable DNS stats collection by default.
  • New Features:
  • APM: Decoding errors reported by the datadog.trace_agent.receiver.error and datadog.trace_agent.normalizer.traces_dropped metrics contain more detailed reason tags in case of EOFs and timeouts.
  • Running the agent flare with the -p flag now includes profiles for the trace-agent.
  • APM: An SQL query obfuscation cache was added under the feature flag DD_APM_FEATURES=sql_cache. In most cases where SQL queries are repeated or prepared, this can significantly reduce CPU work.
  • Secrets handles are now supported inside JSON values set through environment variables. For example, setting a secret in a list: DD_FLARE_STRIPPED_KEYS='["ENC[auth_token_name]"]' datadog-agent run
  • Add basic support for UTF16 (BE and LE) encoding. It should be manually enabled in a log configuration using encoding: utf-16-be or encoding: utf-16-le; other values are unsupported and ignored by the agent.
  • Enhancement Notes:
  • Add new configuration parameter to allow 'GroupExec' permission on the secret-backend command. Set to 'true' the new parameter 'secret_backend_command_allow_group_exec_perm' to activate it.
  • Add a map from DNS rcode to count of replies received with that rcode
  • Enforces a size limit of 64MB to uncompressed sketch payloads (distribution metrics). Payloads above this size will be split into smaller payloads before being sent.
  • APM: Span normalization speed has been increased by 15%.
  • Improve the kubelet check error reporting in the output of agent status in the case where the agent cannot properly connect to the kubelet.
  • Add space_id, space_name, org_id and org_name as tags to both autodiscovered containers as well as checks found through autodiscovery on Cloud Foundry/Tanzu.
  • Improves compliance check status view in the security-agent status command.
  • Include compliance benchmarks from github.com/DataDog/security-agent-policies in the Agent packages and the Cluster Agent image.
  • Windows Docker image is now based on Windows Server Nano instead of Windows Server Core.
  • Allow sending the GCP project ID under the project_id: host tag key, in addition to the project: host tag key, with the gce_send_project_id_tag config setting.
  • Add kubeconfig to GCE excluded host tags (used on GKE)
  • The cluster name can now be longer than 40 characters, however the combined length of the host name and cluster name must not exceed 254 characters.
  • When requesting EC2 metadata, you can use IMDSv2 by turning on a new configuration option (ec2_prefer_imdsv2).
  • When tailing logs from containers in a Kubernetes environment, long lines (usually >16kB) that got split by the container runtime (docker & containerd at least) are now reassembled, provided they do not exceed the upper message length limit (256kB).
  • Move the cluster-id ConfigMap creation, and Orchestrator Explorer controller instantiation behind the orchestrator_explorer config flag to avoid it failing and generating error logs.
  • Add caching for sending kubernetes resources for live containers
  • Agent log format improvement: logs can have kv-pairs as context to make it easier to get all logs for a given context.
  • The CRI check now supports container exclusion based on container name, image and kubernetes namespace.
  • Added a network_config config to the system-probe that allows the network module to be selectively enabled/disabled. Also added a corresponding DD_SYSTEM_PROBE_NETWORK_ENABLED env var. The network module will only be disabled if the network_config exists and has enabled set to false, or if the env var is set to false. To maintain compatibility with previous configs, the network module will be enabled in all other cases.
  • Log a warning when a log file is rotated but has not finished tailing the file.
  • The NTP check now uses the cloud provider's recommended NTP servers by default, if the Agent detects that it's running on said cloud provider.
  • Bug Fixes:
  • Allow agent integration install to work even if the datadog agent configuration file doesn't exist. This is typically the case when this command is run from a Dockerfile in order to build a custom image from the datadog official one.
  • Implement variable interpolation in the tagger when inferring the standard tags from the DD_ENV, DD_SERVICE and DD_VERSION environment variables
  • Fix a bug that caused checks and logs not to be picked up for containers targeted by container-image-based autodiscovery, or to be picked up for containers that were not targeted by it. This happened when several image names pointed to the same image digest.
  • APM: Allow digits in SQL literal identifiers (e.g. 1sad123jk)
  • Fixes an issue with not always reporting ECS Fargate task_arn tag due to a race condition in the tag collector.
  • The SUSE SysVInit service now correctly starts the Agent as the dd-agent user instead of root.
  • APM: Allow double-colon operator in SQL obfuscator.
  • UDP packets can be sent in two ways. In the "connected" way, a connect call is made first to assign the remote/destination address, and then packets get sent with the send function or the sendto function with the destination address set to NULL. In the "unconnected" way, packets get sent using the sendto function with a non-NULL destination address. This fix addresses a bug where network stats were not being generated for UDP packets sent using the "unconnected" way.
  • Fix the Windows systray not appearing sometimes (bug introduced with 6.20.0).
  • The Chocolatey package now uses a fixed URL to the MSI installer.
  • Fix logs tagging inconsistency for restarted containers.
  • On macOS, in Agent v6, the unversioned python binaries in /opt/datadog-agent/embedded/bin (example: python, pip) now correctly point to the Python 2 binaries.
  • Fix truncated cgroup name on copy with bpf_probe_read_str in OOM kill and TCP queue length checks.
  • Use double-precision floats for metric values submitted from Python checks.
  • On Windows, the ddtray executable now has a digital signature
  • Updates the logs package to get the short image name from Kubernetes ContainerSpec, rather than ContainerStatus. This works around a known issue where the image name in the ContainerStatus may be incorrect.
  • On Windows, the Agent now responds to control signals from the OS and shuts down gracefully. Coincidentally, a Windows Agent Container will now gracefully stop when receiving the stop command.
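
The UDP bug fix above distinguishes "connected" and "unconnected" sends. The two call patterns can be sketched with standard sockets on the loopback interface; the function name and payloads are made up for illustration, and the sketch shows only the client-side calls, not how system-probe tracks them.

```python
import socket
from typing import List


def send_both_ways(payload: bytes = b"ping") -> List[bytes]:
    """Send one datagram per style to a local receiver and return what arrived."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        dest = receiver.getsockname()

        # "Connected" way: connect() fixes the destination, then send().
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as connected:
            connected.connect(dest)
            connected.send(payload + b"-connected")
        received.append(receiver.recvfrom(1024)[0])

        # "Unconnected" way: sendto() carries the destination on every call.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as unconnected:
            unconnected.sendto(payload + b"-unconnected", dest)
        received.append(receiver.recvfrom(1024)[0])
    return received


print(send_both_ways())
```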

New in Datadog Agent Manager 7.22.1 (Oct 7, 2020)

  • Bug Fixes:
  • Define a default logs file (security-agent.log) for the security-agent.
  • Fix segfault when listing Garden containers that are in error state.
  • Do not activate security-agent service by default in the Linux packages of the Agent (RPM/DEB). The security-agent was already properly starting and exiting if not activated in configuration.